I was blown away. One week ago I was in Cupertino, CA for Apple’s annual Worldwide Developers Conference and was invited to demo their forthcoming Vision Pro mixed reality headset. The physical design of the headset is beautiful: curved glass, sculpted aluminum and woven fabrics work together to create a distinctly Apple product. Inside, the technology is wildly impressive and the thoughtfulness behind the interaction design equally so. The “spatial computing” experience was as good as their announcement video showcased. I needed to sit with my first impressions for a minute to get over the initial wow-factor and think through how, and if, this device will find its way in the world.
“Pro” is in the name for a reason, and it’s not only to justify the $3,499 price point. In Apple’s world, “Pro” products are made for creative professionals who truly maximize a product’s capabilities, and they’re more broadly bought by those who aspire to do so, or who simply want top-of-the-line and are willing to pay extra for it. Yes, the vision Apple shares in their announcement video includes many consumer scenarios because there are great opportunities for entertainment and communication, but the path to broad consumer adoption will likely be slow and steady. No doubt that over time we’ll see them release spatial computing devices that boast the same quality of experience the Vision Pro offers in a more affordable and wearable form. This is not to say that the Vision Pro is cumbersome when you put it on: it weighs about a pound, and the “Light Seal” and “Head Band” comfortably distribute the headset’s weight. It’s just a very significant piece of hardware to be seen wearing on your face, even with EyeSight rendering the user’s eyes onto the exterior screen of Vision Pro. Fans and critics are already speculating about a Vision “Air” that could be lower in price and more like glasses than goggles.
Watching a clip from Avatar 2 was a significantly better 3D experience than seeing it in the theater wearing flimsy plastic glasses, and the 180-degree Immersive Video (an Apple proprietary format) examples were more transportive than any screen-based video experience I’ve ever had. Then there was an interactive 3D demo with a butterfly that landed on my finger and a dinosaur that followed me around the room, making eye contact the entire time, all rendered with extreme detail, flawless animation and perfectly accurate placement. Even simply watching a movie clip in the “Cinema Environment” was deeply engaging. These incredible visual experiences were complemented by Personalized Spatial Audio, a seemingly gimmicky innovation Apple introduced to AirPods last year that manipulates sound so it feels like it’s coming from the screen, regardless of how you move your head. Now we know this was simply their way of testing a key component for spatial computing in plain sight.
The foundation of Apple’s spatial computing is visionOS. One of its core tenets is that by blending digital content and experiences with the real world, users can maintain physical presence and connection while in a headset experience. When you put Vision Pro on you aren’t transported to a virtual world or the Metaverse; you still see the physical space you’re actually in. The home view is a familiar grid of application icons floating in front of you in the room, and instead of pointing and clicking, the interaction is looking and pinching: cameras inside the headset track your eye movement and cameras on the outside track your hand movement. When I scanned the grid of icons, each became highlighted as I looked at it. A quick pinch of my thumb and first finger (from either hand) acted as a click. It took only a couple of minutes for looking and pinching to become a natural interaction that didn’t require forethought. This is Apple’s brilliance: designing hardware and software in harmony. I knew that if they were releasing a headset they would shed the typical VR handheld controllers, and I had a hunch the solution would include eye tracking, but experiencing it firsthand was still magical.
I glanced at the Safari icon and opened a new window to browse the web, then easily resized it and moved it to a comfortable distance in front of me in the room. Perhaps the most exciting moments in the demo, however, were during a collaborative Freeform session with an Apple spokesperson, who joined me over FaceTime while also wearing a Vision Pro. She appeared as her “Persona,” a rendered version of herself. The headset scans the wearer’s face to make an initial 3D model, then uses inputs from the eye tracking and downward-facing cameras to accurately render blinks, glances and mouth movements in real time. I looked, pinched and gestured her over to the left side of my field of view, just past the other Apple spokesperson who was physically in the room with me and visible as part of the video feed of the surrounding space. We then opened up a Freeform board and both smoothly moved and resized elements. She then dropped the proverbial mic by adding a 3D model of an apartment and zooming in. I realized I could look around the model, so I jumped up from my seat and started inspecting the furniture and layout as naturally as if they were physically there. This is another example of Apple building key components of spatial computing in plain sight: the augmented reality toolset (ARKit) that enabled this part of the demo has been around and included in iPhone and iPad apps since 2017. This idea of a virtual walk-around is nothing new. Automotive and other product design teams have been using VR for years to evaluate early concepts before moving into physical iterations. The headsets they use, though, rely on complicated software and hardware that’s significantly more expensive than Vision Pro.
EyeSight is the name for Vision Pro’s rendering of the wearer’s eyes on the outside screen of the headset, which shows people nearby when they are visible to the wearer. Using the same Persona feature that creates the wearer’s avatar in a FaceTime call, EyeSight shows blinks, gazes and eye movements in real time. I didn’t get to see this in action, but its intent is to facilitate communication between the user and someone in the room with them. The idea is to help you “remain more connected to the people around you,” and it’s important to realize this is about being more connected while wearing the headset, not more connected in general. I’m eager to try this feature in the future. Persona works really well in a FaceTime environment, but how will it translate for someone who is looking at a Vision Pro wearer with real eyes?
My demo was over after about 30 minutes but I would have loved to go on for much longer. I spent the rest of the day with my head flooded with questions. Will Vision Pro represent the beginning of a seismic shift akin to the iPod or iPhone? Or might it be more niche? One thing is for sure: Apple is brilliant at building product and service ecosystems spanning hardware and software. Having released and improved several of the foundational components for spatial computing over the last six years, it’s now clear they were building toward visionOS all along. What else have they been experimenting toward? Whether Vision Pro and the presumably forthcoming other Vision devices are destined to be mass or niche, they represent a major milestone as Apple’s newest product category. As excited as I am for it, I also can’t help but start looking for signals of what might come next.
Apple Vision Pro will be available in early 2024 for $3,499.
Hero image by Josh Rubin