Apple’s VisionOS Makes a Bold Leap in Computer Interface

Apr 26, 2023

Steven Levy

Like everyone else who got to test Apple's new Vision Pro after its unveiling at the Worldwide Developers Conference in Cupertino, California, this week, I couldn't wait to experience it. But when an Apple technician at the ad hoc test facility used an optical device to check out my prescription lenses, I knew that there might be a problem. The lenses in my spectacles have prisms to address a condition that otherwise gives me double vision. Apple has a set of preground Zeiss lenses to handle most of us who wear glasses, but none could address my problem. (Since the Vision Pro is a year or so away from launch, I wouldn't have expected them to handle all prescriptions in this beta version; even after years of operation, Warby Parker still can't grind my lenses.) In any case, my fears were justified: When I got to the demo room, the setup for eye-tracking—a critical function of the device—didn't work. I was able to experience only a subset of the demos.

What I did see was enough to convince me that this is the world's most advanced consumer AR/VR device, and I was dazzled by the fidelity of both the virtual objects and icons floating in the artificially rendered room I was sitting in, and the alternate realities delivered in immersion mode, including sports events that put me at the sidelines, a 3D mindfulness dome that enveloped me in comforting petal shapes, and a stomach-churning excursion to a mountaintop that equalled the best VR I’d ever sampled. (You can read Lauren Goode's description of the full demo.)

Unfortunately, my eye-tracking issue meant I didn't get to sample what might be the most significant part of the Vision Pro: Apple's latest leap in computer interface. Without a mouse, a keyboard, or a touch-sensitive display screen, the Vision Pro lets you navigate simply by looking at the images beamed to two high-resolution micro-OLED displays and making finger gestures like tapping to choose menu items, scroll, and manipulate artificial objects. (The only other controls are a knob called a digital crown and a power button.) Apple describes this as "spatial computing," but you could also call it naked computing. Or maybe that appellation has to wait until the approximately 1-pound scuba-style facemask is swapped out in a future version for supercharged eyeglasses. Those who did test it said they could master the tools almost instantly and found themselves easily calling up documents, surfing through Safari, and grabbing photos.

VisionOS, as it's called, is a significant step in a half-century journey away from computing's original prison of an interface: the awkward and inflexible command line, where nothing happened until you invoked a stream of alphanumeric characters with your keyboard, and everything that happened after that was an equally constricting keyboard workaround. Beginning in the 1960s, researchers led an assault on that command line, starting with Stanford Research Institute's Doug Engelbart, whose networked "augmenting computing" system introduced an external device called the mouse to move the cursor around and select options via menu choices. Later, scientists at Xerox PARC adapted some of those ideas to create what was to be called the graphical user interface (GUI). PARC's most famous innovator, Alan Kay, drew up plans for an ideal computer he called the Dynabook, which was sort of a holy grail of portable, intuitive computing. After viewing PARC's innovations in a 1979 lab visit, Apple engineers brought the GUI to the mass market, first with the Lisa computer and then the Macintosh. More recently, Apple provided a paradigm with the iPhone's multi-touch interface; those pinches and swipes were intuitive ways of accessing the digital faculties of the tiny but powerful phones and watches we carried in our pockets and on our wrists.

The mission of each of those computing shifts was to lower the barrier for interacting with the powerful digital world, making it less awkward to take advantage of what computers had to offer. This came at a cost. Besides being intuitive by design, the natural gestures we use when we’re not computing are free. But it's expensive to make the computer as easy to navigate and as vivid as the natural world. It required a lot more computation when we moved from the command line to bit-mapped displays that could represent alphanumeric characters in different fonts and let us drag documents that slid into file folders. The more the computer mimicked the physical world and accepted the gestures we used to navigate actual reality, the more work and innovation was required.

Vision Pro takes that to an extreme. That's why it costs $3,500, at least in this first iteration. (There's an argument to be made that the Vision Pro is a 2023 version of Apple's 1983 Lisa, a $10,000-plus computer which first brought bit-mapping and the graphical interface to a consumer device—and then got out of the way for the Macintosh, which was 75 percent cheaper and also much cooler.) Inside that facemask, Apple has crammed one of its most powerful microprocessors; another piece of custom silicon specifically designed for the device; a 4K-plus display for each eye; 12 cameras, including a lidar scanner; an array of sensors for head- and eye-tracking, 3D mapping, and previewing hand gestures; dual-driver audio pods; exotic textiles for the headband; and a special seal to prevent reality's light from seeping in.

Armed with all that hardware, software, and the bounty from over 5,000 patents, the Vision Pro—and, implicitly, its successors—presumes to guide us on an ascent to the summit of natural computing. But during the demo, when I wasn't immersed in lifelike 3D depictions of baseball games, a recording studio, and a tightrope between some mountains, I sensed that this step takes us into uncharted territory. Previous interface leaps were all directed to help us reach inside the digital world to exploit its power; when you selected a folder on the Macintosh, you were putting your hand in the soup of computing. But the Vision Pro puts us inside the digital world, separating our senses from the physical realm. Even in the mode where you’re not using any apps, the Vision Pro's cameras and displays are working hard to render the actual room you’re sitting in. It looks real, but that reality is as ephemeral as the app icons floating in space, waiting until you select one by your glance. This raises the question of whether we’ve gone too far in pushing a natural interface: Do people want to leave the real world to do their work and other digital tasks? The question is open.

As it turns out, the territory has been somewhat charted, though Apple has gone farther than any other company in actually taking us there. In an email exchange with me this week, Alan Kay told me about a slide he made in the early 1970s imagining multiple forms for the Dynabook. The 1972 sketch pictured not only a modern tablet version of the Dynabook, but a pocket-size machine inside a pair of eyeglasses, and a hand gesture to control some unseen screen. And the hand is wearing what looks like an Apple Watch!

Kay says he drew on ideas from computer pioneers Ivan Sutherland and Nicholas Negroponte for the sketch. He also pointed me to a paper Sutherland wrote in 1965 that was even wilder. "The ultimate display would, of course, be a room within which the computer can control the existence of matter," Sutherland wrote, the year Tim Cook turned 5 years old. "A chair displayed in such a room would be good enough to sit in. Handcuffs displayed in such a room would be confining, and a bullet displayed in such a room would be fatal. With appropriate programming such a display could literally be the Wonderland into which Alice walked." Though boasting that it has indeed created a wonderland, Apple hasn't made that happen. But the Vision Pro makes you feel almost like it could, boldly making its case that immersive, er, spatial computing, is the platform of the future.

But hold on. The platform of the right now—generative AI—begins with a blank field. But instead of requiring an incantation in computer-speak, you open up the powers of the digital world via the natural language of your choice. A simple prompt can produce computer code, essays, original art, freshly composed music—and a Pandora's box of fabrications and biases. The command line isn't dead yet.

The Vision Pro's origins hearken back to efforts in the 1960s and ’70s to make interfaces friendlier and more powerful. In Insanely Great, my book about the Macintosh, I described how PARC's Alan Kay was working on a forgettable project called Flex, which was a bust. But the lessons learned ultimately led to Dynabook, which inspired the Macintosh.

[Flex's] failure led Kay to an examination of what a "user interface" meant. The term commonly referred to a set of screen prompts and commands that allow a person to communicate his or her wishes to the computer. "The practice of user interface design has been around at least since humans invented tools," Kay later noted. Yet very little thinking had been devoted to promoting friendly, intuitive computer interfaces.

Could an interface be designed that ordinary people could use? This was an unconventional question in those days, when it was rarely assumed that ordinary people would ever have reason to belly up to a computer keyboard. But Kay was already pondering ideas like people relating to a computer intimately. He found himself reading Marshall McLuhan's Understanding Media and pondering its seminal koan, "The medium is the message." Then he had his flash of enlightenment, "a shock that reverberates even now," he wrote over 20 years later in The Art of Human-Computer Interface.

"The computer is a medium! I had always thought of it as a tool, perhaps a vehicle—a much weaker conception. What McLuhan was saying is that if the personal computer is truly a new medium then the very use of it would actually change thought patterns of an entire civilization."

Greg writes, "In the first season of Better Call Saul, what began as a small case of nursing home overbilling ballooned into a multistate case of corporate fraud. The overwhelming amount of research required meant that Saul's baby had to be ported out to two huge firms. It occurred to me that this would be a perfect job for AI. Yet in all the articles I've read, the focus is limited to AI's impact on creatives. Law would seem a perfect candidate for this kind of brute force research."

Greg, where have you been? AI has already transformed the law profession. Instead of having law associates painstakingly go through depositions and filings, firms can take advantage of outside services that Hoover up case files, analyze them, and zero in on key points. OpenAI itself has partnered with law firms to help the process. Had Jimmy McGill (he wasn't Saul yet) filed the Sandpiper Crossing class action suit in 2023 and not in 2002—and had it been a real lawsuit and not fictional—the massive documentation involved would have almost certainly been processed by AI tools, including large language model chatbots. The media has been all over this disruption.

But one thing that AI can't do yet is produce reliable court filings on its own. Last month, when a lawyer outsourced the actual writing of a brief to ChatGPT, in a case against Avianca airlines, the opposing lawyers discovered that the chatbot had produced quotes and citations from cases that didn't exist. Would Jimmy have tried this? It might have made for a great Better Call Saul plot twist. But I bet that his diligent partner Kim Wexler would have overruled him. Maybe Kim would have signed the AI-pause letter, too.

You can submit questions to [email protected]. Write ASK LEVY in the subject line.

New York City turns orange. Not in a good way.

WIRED reported on the Vision Pro from the Apple Park launch.

Here's everything else Apple announced at WWDC. A 15-inch MacBook Air sounds pretty cool.

A delicious deep dive into one of the many companies trying to clone Twitter.

A contrarian view on venture capitalist Marc Andreessen's 7,000 word screed that declares AI the savior of the world.

Bonus sweepstakes plug: WIRED is partnering with Speck to give two readers the chance to win a Samsung Galaxy S23 with two Speck cases. Enter here.
