Thoughts on the Vision Pro, VisionOS and AR/VR

These are my collected thoughts about the Vision Pro, visionOS and at least some of the future of mobile AR as a medium, written in no particular order. I’ve been very interested in this device, how it is being handled by the news media, and how it is broadening and heightening our expectations about augmented reality as those who can afford it apply it in “fringe” venues (i.e., driving, riding a subway, skiing, cooking). I also have thoughts about whether we really need that many optical lenses/sensors, how Maps software could be used in mobile smartglasses AR, and what VRChat-like software could look like in AR. This is disjointed because I’m not having the best time in my life right now.

Initial thoughts

These were mostly written around February 1.

  • The option to use your eyes + pinch gesture to select keys on the virtual keyboard is an interesting way to type out words.
    • But I’ve realized that this should lead, hopefully, to a VR equivalent of swipe-typing on iOS and Android: holding your pinch while you swipe your eyes quickly between the keys before letting go, and letting the software determine what you were trying to type. This can give your eyes even more of a workout than they’re already getting, but it may cut down the time in typing.
    • I also imagine that the mouth tracking in visionOS could allow for the possibility of reading your lips for words without having to “listen”, so long as you are looking at a microphone icon. Or maybe that may require tongue tracking, which is a bit more precise.
  • The choice to have menus pop up to the foreground in front of a window is also distinctive from the QuestOS.
  • The World Wide Web in VR can look far better. This opens an opportunity for reimagining what Web content can look like beyond the WIMP paradigm, because the small text of a web page in desktop view may not cut it.
    • At the very least, a “10-foot interface” for browsing the web in VR should be possible and optional.
  • The weight distribution issue will be interesting to watch unfold as the devices go out to consumers. 360 Rumors sees Apple’s deliberate choice to load the weight on the front as a fatal flaw that the company is too proud to resolve. Might be a business opportunity for third party accessories, however.

Potential future gestures and features

Written February 23.

The Vision Pro’s gestures show an increase in options for computing input beyond pointing a joystick:

  • Eye-tracked gaze to hover over a button
  • Single quick Pinch to trigger a button
  • Multiple quick pinches while hovering over keys in order to type on a virtual keyboard
  • Dwell
  • Sound actions
  • Voice control

There are even more possible options for visionOS 2.0 both within and likely outside the scope of the Vision Pro’s hardware:

  • My ideas
    • Swiping eye-tracking between keys on a keyboard while holding a pinch in order to quickly type
    • Swiping a finger across the other hand while gazing at a video in order to control playback
    • Scrolling a thumb over a finger in order to scroll up or down a page or through a gallery
    • Optional Animoji, Memoji and filters in visionOS Facetime for personas
    • Silent voice commands via face and tongue tracking
  • Other ideas (sourced from this concept, these comments, this video)
    • Changing icon layout on home screen
    • Placing app icons in home screen folders
    • Ipad apps in Home View
    • Notifications in Home View sidebar
    • Gaze at iphone, ipad, Apple Watch, Apple TV and HomePod to unlock and receive notifications
    • Dock for recently closed apps
    • Quick access to control panel
    • Look at hands for Spotlight or Control Center
    • Enable dark mode for ipad apps
    • Resize ipad app windows to create desired workspace
    • Break reminders, reminders to put the headset back on
    • Swappable shortcuts for Action button 
    • User Profiles
    • Unlock and interact with HomeKit devices
    • Optional persistent Siri in space
    • Multiple switchable headset environments 
    • Casting to iphone/ipad/Apple TV via AirPlay
    • (realtime) Translate
    • Face detection
    • Spatial Find My
    • QR code support
    • Apple Pencil support
    • Handwritten notes detection
    • Widget support
    • 3D (360) Apple maps 
    • 3D support
    • Support for iOS/ipadOS keyboard

VRChat in AR?

Written February 23.

  • What will be to augmented reality what VRChat is to VR headsets and Second Life to desktops?
    • Second Life has never been supported by Linden Labs on VR headsets
    • No news or interest from VRChat about a mixed reality mode
    • (Color) Mixed reality is a very early, very open space
    • The software has yet to catch up
    • The methods of AR user input are being fleshed out
    • The user inputs for smartphones and VR headsets have largely settled
    • Very likely that AR headset user input will involve more reading of human gestures, less use of controllers
  • But what could an answer to VRChat or Second Life look like in visionOS or even Quest 3?
    • Issues
      • VRChat (VR headset) and Second Life (desktop) are about full-immersion social interaction in virtual reality
      • Facetime-like video chat with face-scanned users in panels is the current extent
      • Hardware weight, cost, size all limit further social avatars
      • Device can’t be used outside of stationary settings as per warranty and company policy
      • Lots of limitations to VRChat-like applications which involve engagement with meatspace
  • What about VRChat-like app in full-AR smartglasses?
    • Meeting fellow wearers IRL who append filters to themselves which are visible to others
    • Geographic AR layers for landmarks
    • 3D AR guided navigation for maps
    • Casting full personal view to other stationary headset/smartglass users
    • Having other users’ avatars visit you at a location and view the location remotely but semi-autonomously

Google Maps Immersive View

Written back on December 23.

Over a year and a half ago, Google announced Immersive View, a feature of Google Maps which would use AI tools like predictive modeling and NeRF fields to generate 3D images from Street View and aerial images of both exteriors and interiors of locations, as well as generate information and animations about locations from historical and environmental data for forecasts like weather and traffic. Earlier this year, they announced an expansion of Immersive View to routes (by car, bike or walk).

This, IMO, is one of Google’s more worthwhile deployments of AI: applying it to mashup data from other Google Maps features, as well as the library of content built by Google and third-party users of Google Maps, to create more immersive features.

I just wonder when they will apply Immersive View to Google Earth.

Granted, Google Earth already has had 3D models of buildings for a long time, initially with user-generated models in 2009 which were then replaced with autogenerated photogrammetric models starting in 2012. By 2016, 3D models had been generated in Google Earth for locations, including their interiors, in 40 countries, including locations in every U.S. state. So it does seem that Immersive View brings the same types of photogrammetric 3D models of select locations to Google Maps.

The differences between Immersive View and Google Earth seem to be the following:

  • animations of moving cars simulating traffic
  • predictive forecasts of weather, traffic and busyness outward to a month ahead, with accompanying animation, for locations
  • all of the above for plotted routes as well

But I think there is a good use case for the idea of Immersive View in Google Earth. Google touts Immersive View in Maps as “getting the vibe” of a location or route before one takes it. Google Earth, which shares access to Street View with Google Maps, is one of a number of “virtual globe” apps made to give cursory, birds-eye views of the globe (and other planetary bodies). But given the use of feature-rich virtual globe apps in VR headsets like Meta Quest 3 (see: Wooorld VR, AnyWR VR, which both have access to Google Earth and Street View’s data), I am pretty sure that there is a niche overlap of users who want to “slow-view” Street View locations and routes for virtual tourism purposes without leaving their house, especially using a VR headset.

But an “Immersive View” for Google Earth and associated third-party apps may have go in a different direction than Immersive View in Maps.

The AI-driven Immersive View can easily fit into Google Earth as a tool, smoothing over more of the limitations of virtual globes as a virtual tourism and adding more interactivity to Street View.

Sonar+AI in AR/VR?

Written around February 17.

Now if someone can try hand-tracking, or maybe even eye-tracking, using sonar. The Vision Pro’s 12 cameras (out of 23 total sensors) need at least some replacement with smaller analogues:

  • Two main cameras for video and photo
  • Four downward, 2 TrueDepth and 2 sideways world-facing tracking cameras for detecting your environment in stereoscopic 3D
  • four Infrared internal tracking cameras that track every movement your eyes make, as well as an undetermined number of infrared cameras outside to see despite lighting conditions
  • LiDAR
  • Ambient light sensor
  • 2 infrared illuminators
  • Accelerometer & Gyroscope

Out of these, perhaps the stereoscopic cameras are the best candidates for replacement with sonar components.

I can see hand-tracking, body-tracking and playspace boundary tracking being made possible with the Sonar+AI combination.

Leave a comment