Apple's long-term strategy is firmly anchored in the development of spatial computing, a commitment it emphasized at its developers conference (WWDC) this year. Spatial computing is not synonymous with the metaverse, nor is it simply a new iteration of VR or AR devices. So, what exactly is spatial computing? In this essay, we aim to delineate how Apple understands and applies spatial computing, and how it is steering its strategy away from the prevailing skepticism surrounding VR/AR and the buzz of the metaverse. True to its idiosyncratic brand identity, Apple has been crafting a distinct narrative in this space, which it calls spatial computing and which we have previously explored under the name of the disappearing computer paradigm (here and here).
What comes next? It's a question we frequently ask in tech. Many commentators would reply: the metaverse. Apple, however, advocates for something else: spatial computing. This isn't necessarily a pathway to a metaverse; it represents a more fundamental shift. Spatial computing is a framework that reshapes our understanding of computing and of how we interact with technology, laying the groundwork on which a metaverse, a virtual 3D world, could be built.
In recent years, numerous tech companies have jumped on the metaverse bandwagon or at least repositioned themselves to accommodate the hype within their existing strategy. Apple has adopted a more cautious approach, a stance elucidated with the introduction of the Vision Pro and its focus on spatial computing.
The prevailing vision of the metaverse is a fully immersive and interconnected 3D virtual ecosystem accessible through various devices, including phones and, ideally, virtual reality goggles and haptic suits that offer deeper immersion into a vibrant 3D universe. Metaverse experts largely agree that the path to a full-blown metaverse involves crafting an entirely new virtual reality that offers an escape to another world, as depicted in the film Ready Player One.
This vision foresees the internet's evolution into a continuous virtual reality inhabited by many, a stark contrast to today's internet. In our opinion, however, this should not be interpreted as a transition from a physical universe to a virtual one, nor from an offline 'first life' to an online 'second life'. Instead, we prefer to see the shift as gradual and hybrid. We already ‘live’ to a great extent in virtual worlds, albeit very basic ones. When we ‘surf’ the web, we use spatial metaphors ('visiting' 2D websites, 'opening' apps, 'entering' 2D game worlds) to describe our current digital interactions. This terminology emphasizes the spatiality of digital realms but also highlights their lack of full immersion.
While the metaverse narrative champions the exploration of this ‘other place’, Apple maintains a focus on ‘this place’. The Vision Pro goggles, set for release next year, embody this philosophy. As tech analyst Benedict Evans recently explained, when you put on the Vision Pro, you are not going anywhere. It doesn't transport you elsewhere; it starts with a projection of your immediate surroundings. It digitally simulates your environment, offering not a gateway but a digitally augmented mediation of your existing reality. Technically you are watching a screen, but the base experience is that of a pair of glasses. So, in the spatial computing paradigm of the Vision Pro, you stay rooted in the physical world first.
With the Vision Pro, users can then choose to incorporate layers, apps, or digital items into a '3D desktop', enhancing their physical space with digital extensions that could range from a movie screen to an extension of your MacBook, and embracing a digitally mediated 'window' onto reality. Unfortunately for Apple, that brand name is no longer up for grabs.
Adding layers to our reality: that sounds a lot like augmented reality (AR), doesn’t it? One might argue that spatial computing bears a strong resemblance to AR, and indeed, the spatial computing paradigm aligns more closely with AR than with VR. However, it also incorporates significant VR functionality, giving users the option to immerse themselves fully in a simulated environment, disconnected from their actual surroundings.
Given this, it seems prudent to separate the discussion of spatial computing and the Vision Pro from the existing VR/AR discourse. Moreover, by positioning spatial computing as something else, Apple also shields itself from the long shadow cast by numerous AR/VR setbacks. The industry is currently navigating a “VR winter”, marked by the ongoing struggle to find the 'killer app' that would convince users to embrace often cumbersome and discomfort-inducing VR goggles. Despite VR's lengthy history, that killer app has yet to materialize.
Following a period of experimentation in the 1980s and 1990s, contemporary attempts by companies such as Meta, Magic Leap, Snap, and Google to popularize these technologies have largely ended in disillusionment. Today, most major tech corporations have shifted their focus towards generative AI, resulting in many layoffs in divisions working on mixed reality. Apple, however, seems to spark more enthusiasm, likely due to its more cautious approach and its reputation for launching novel computing platforms.
Spatial computing thus stands apart from both the metaverse and AR/VR; it is not defined by them, nor does it herald their advent. The metaverse is primarily a concept or an idea; VR and AR are interfaces or technologies. What, then, does spatial computing signify? We consider spatial computing the successor to desktop computing and mobile computing. It stands for a novel approach to computing that hints at a forthcoming paradigm shift.
This idea of spatial computing thus transcends being merely a future interface. Consequently, the Vision Pro is a means to realize this new paradigm, not the end goal. While the Vision Pro is poised to be a critical conduit in the spatial computing environment, Apple remains careful, refraining from declaring it the definitive interface for the future of spatial computing. Instead, it is positioning the Vision Pro as a strategic long-term investment, encouraging developers to explore its potential without expecting immediate mass consumer adoption. In this sense, the low projected sales will not be a sign of a failed product, at least not in the short term.
Nevertheless, a new interface like the Vision Pro will affect how we engage with our environment. Mobile technology offers a precedent: its emergence did not lead to the abandonment of laptops and televisions. It did, however, drastically alter our engagement with digital systems, introducing us to a world maneuvered by swiping thumbs and birthing the app and platform economy, which subsequently paved the way for the ubiquity of social media and the phenomenon of short videos, among other developments. With the iPhone, Apple played an important role in this paradigm shift, for example by rebranding software applications as 'apps' and rejuvenating user interfaces with retro skeuomorphic designs (in which the digital representation carries over elements of the original physical object). The iPhone spearheaded an era marked by the influential roles of content creators, influencers, and gig economy workers, including Uber drivers and Airbnb hosts.
A new computing paradigm is thus more about a broader reconfiguration of computing devices, applications, interfaces, and business models, that is, of the entire digital Stack (read more about our framework here), and about how users are embedded within it. Instead of yearning for a watershed moment and wondering when the new computing platform will finally arrive, it's more realistic to envision the coming decade(s) as a steady progression towards this paradigm, in which Apple and its competitors slowly encapsulate their users.
While the integration of spatial computing is anticipated to be gradual, Apple has boldly designated the Vision Pro as its flagship embodiment in this emergent computing paradigm. This initiative alone warrants close monitoring of the device, not only for its technical specifications but also for the pivotal role it is slated to play in the reshaping of the computing landscape.
We consider the Vision Pro a prototype that will continue to be refined before it becomes attractive to larger audiences. Despite its developmental status, the prototype showcases distinctive features that set it apart from its counterparts, positioning it as a premium device in the market. It holds the promise of being more than just a device, potentially becoming an important next step in the unfolding narrative of the new computing epoch. This merits a closer look at its technical features.
The Vision Pro has outstanding features and characteristics, very different from those of comparable devices. As Benedict Evans recently put it in a podcast, “It is not a VR goggle or AR glass of any type linked to a computer, but a computer built in a completely new shape”. This hints at a new paradigm. By building such a high-end device at such a premium price, Apple seems to be saying: we either do this well or we do not do it at all.
Despite Evans's claim, the Vision Pro still looks like a bulky set of goggles that nobody will wear, and certainly not in public. Nevertheless, a deeper examination of the device's specifications reveals technical features and use cases that are indeed suggestive of the future of spatial computing.
One of the features pointing towards the future paradigm is the multimodal user interface, an area in which Apple has always excelled. Multimodal interfaces accept and process user input in two or more modalities, such as speech, pen, touch, gestures, gaze, and virtual keyboard. These modalities can coexist on a single interface and be used either simultaneously or alternately.
With the Vision Pro, Apple is leveraging all the spatial computing building blocks it has developed over the past years, such as voice recognition with Siri, biometric authentication with smartphone sensors, and spatial audio with its earbuds. They all come together in visionOS, the new operating system running on the Vision Pro. As a result, the Vision Pro supports 'natural control' through hand, eye, and voice input, enabling more intuitive interaction in a blended environment of physical and virtual objects.
Furthermore, the field of view (FOV) of this immersive environment is presumably superior to that of competitors. Most VR/AR devices give a sensation akin to peering through a keyhole: you have the feeling you are watching a square surrounded by blackness. We are still left guessing about the Vision Pro's FOV, but its 23 million pixels and remarkable computing power suggest a potential breakthrough.
About that computing power: the device runs on two chips, the M2, which Apple originally developed for its Macs, and the R1, a new chip designed specifically for the Vision Pro. The M2 runs the operating system and apps, while the R1 delivers real-time, energy-efficient processing of sensor data. In doing so, it can project your physical surroundings onto the displays and let you ‘see through’ the goggles, lending them the AR characteristic we discussed earlier.
Moreover, the division of labor between these two chips loosely resembles that of our brain, with one processor handling primary sensory data and the other acting like deeper brain structures. We could even say that the Vision Pro augments our brain with these artificial ones, or potentially integrates the two, thus transforming us into (very basic) cybernetic organisms, i.e., cyborgs. What we see of our environment is not what our eyes see, but what our artificial senses capture and pass along.
However, this cyborg state will be brief, as the external battery pack lasts only up to two hours. The Vision Pro is not a product to wear all day and, if we believe the marketing videos, not something you take outside either. Given the high price ($3,499), only a few will buy it. The 'Pro' in its name suggests that a more affordable version may arrive in three to five years. For now, it’s not surprising that Apple is marketing it as a complement to, rather than a replacement for, other devices.
What are we going to do with the Vision Pro? Apple seems to envision the device being used predominantly in private spaces, such as homes or possibly during flights, rather than as a companion in public. The focus is evidently tilted towards individual home entertainment and work-related functionality rather than fostering social interaction in virtual realms or enhancing communal spaces, paths previously explored by rivals like Meta and Snap.
The Vision Pro encourages users to momentarily disconnect from the traditional interfaces of smartphones, desktops, or smart TVs, allowing more immersive interaction with digital content by integrating it into the physical space around them. Whether it’s revisiting family holiday photos, binge-watching the latest series, or crafting a dynamic keynote presentation, the Vision Pro’s goal is to facilitate a rich and personal computing experience. Moreover, the displays, with more pixels per eye than a 4K TV, promise image quality on par with high-end (O)LED screens, considerably enhancing the viewing experience.
With its roots firmly planted in home entertainment, the Vision Pro offers a broad spectrum of use cases, reflecting a prudent strategy on Apple's part. As with the App Store, it leaves developers and users to explore the scope of its functionality and decide, over the coming years, what resonates and what falls short.
In this evolving digital landscape, apps, tasks, and features are allocated to whichever interface best suits the user’s immediate environment and needs. Consider, for instance, the discomfort most people feel when sending a voice message on a crowded train; text input remains a potent tool for communication, a testament to the brilliance of the current graphical user interface (GUI).
The Vision Pro opens up new possibilities in this space. Imagine collaborating on a keynote presentation not just through laptops, keyboards, and collaboration software, but also through intuitive modes of input such as drawing gestures and voice. This is the frontier Apple aims to conquer as it ventures deeper into spatial computing.
While certain practices, such as filming a child with the headset, might seem unconventional today, it is vital to remember the initial skepticism surrounding smartphone cameras. Today, everyone films everything. This underscores technology's propensity to reshape norms, possibly leading to unforeseen and innovative uses that significantly alter how we interact digitally. However, this reshaping of norms does not always imply uncritical acceptance of something that was previously taboo.
Predicting the influence on social norms is far from simple, as it involves the intricate dynamics of society’s interactions with emerging technologies. This goes beyond simply being an early or late adopter, as marketers and tech companies often suggest. Rather, it concerns the qualitative transformations technologies can bring into our lives, sometimes even stirring ‘techlashes’ because of their pervasive influence in mediating relationships and shaping our day-to-day lives. In other words, culture also has the capacity to say no. Many high schools, for example, are now banning smartphones.
Devices like the Vision Pro compel us to grapple with substantial qualitative and ethical dilemmas. We remember the trajectory of Google Glass and the privacy controversies it spurred, in an era when Mark Zuckerberg had also prematurely declared the end of privacy. Yet culture resisted, halting the constant surveillance facilitated by wearable technology and reaffirming the value of privacy.
Apple seems to be taking a more strategic path, aligning its product with cultural sensitivities, albeit with an undertone of risk given its longstanding commitment to privacy. Despite its efforts to distinguish itself from the privacy transgressions of Meta and Google, the Vision Pro represents a paradigm in which the very tools that promise connectivity also carry the potential for invasive surveillance, constantly monitoring users in their most intimate spaces. It’s a precarious path, inviting us to question how society will perceive this initiative from a behemoth like Apple.
Moreover, as the debate over smartphones in high schools shows, it is not simply about solving privacy issues. It has to do with the deeper ramifications of technology for human relations and cognitive development: not being able to concentrate, not paying attention to others, and even struggling to nurture meaningful relationships.
Smartphones have become a ubiquitous part of our lives to the point where we feel incomplete, almost ‘amputated’ when we forget to take them with us. Currently, these devices serve as both a connector and a barrier. They link us to distant loved ones while sometimes distancing us from the people right beside us.
The Vision Pro, as a prototype of spatial computing, heralds an intensified iteration of this phenomenon, introducing a paradigm in which digital devices are not just handheld but worn, immersing us even deeper in a technology-mediated reality. This leap forward (or perhaps inward) perfectly embodies the profound paradox of digital technology. On the one hand, it amplifies the barrier smartphones created, placing a literal screen 'in between us'. On the other hand, it offers unprecedented levels of connectivity, facilitating a constant virtual presence with others through a multitude of digital platforms, albeit mediated through layers of technology.
The disappearing computer paradigm foretells the elimination of bulky desktops and (television) screens with obtrusive wires and cables. Instead, we interact seamlessly with technology through elements like voice and touch interfaces and projected screens. Ultimately, 'the computer' becomes so deeply embedded into our environment that it essentially disappears as an isolated entity, becoming fully ubiquitous.
This paradigm ushers in the era of spatial computing, in which the physical world is no longer merely a host for digital objects and technologies but interacts and integrates intricately with digital space. The Vision Pro marks the start of Apple’s quest to establish a human-computer interaction paradigm in which the computer becomes practically invisible. In this paradigm, computing devices have dissolved into our everyday environment, or we have merged with them (thus becoming “cyborgs”).
One direction in which our computers can disappear in this future world is by saturating our physical environments with sensors, AI and natural interfaces that enable input through touch, gestures, speech and/or biometric authentication and output through displays, LEDs, speakers and even actuators (e.g. smart appliances).
Thus, spatial computing is more about a blend of physical and virtual spaces rather than a migration to the latter. Seen from this perspective, purchasing groceries in a cashier-less store, talking to your smart speaker to order groceries while cooking, or using augmented maps to view restaurant details, are now common manifestations of this disappearing computer paradigm.
Another direction becomes visible in the way technology entwines with our physical bodies. Wearables such as earbuds and smartwatches are far more intertwined with our bodies than desktops and laptops ever were, providing subtle tactile and auditory feedback to aid us in our daily activities.
The smartphone appears to serve as the bridge connecting these two realms of consumer devices. But instead of one primary computing device accompanied by a few smart accessories and work-related powerhouses, it is more accurate to see spatial computing as a distributed array of interfaces, including laptops, mobiles, goggles, earbuds, watches, etc.
In this approaching era of spatial computing, our reality is filtered and articulated through advanced devices integrated into our very being and environment. We find ourselves in simulated landscapes, experiencing the world and each other through digitally-augmented lenses. It hints at a future where computers are so ingrained in our daily experiences that they seemingly vanish, becoming a natural and indistinguishable extension of ourselves.
This is a future brimming with possibilities and challenges, prompting us to navigate the complex landscape of a world where technology is not just a tool, but a continuous, ever-present companion in our interactions with reality and with each other. The question that looms is how we will balance this intricate play between connectivity and isolation, immersion and authenticity, as we step into a world where computers, while disappearing, permeate every facet of our existence.
As for Apple, its vision of spatial computing and the upcoming Vision Pro mark a new episode in its ongoing journey towards the disappearing computer. Rumors are already circulating about a second edition with a lower price and enhanced specs, but the leap to ubiquitous public use of the Vision Pro remains a formidable (cultural) challenge. Moreover, Apple's steady encroachment into users' lives in the spatial computing era is both intriguing and alarming. It invites scrutiny of the swelling dominance of tech giants, at a time when society is increasingly resisting their overreach. This emerging era puts Apple in the spotlight, potentially marking a pivotal juncture in its trajectory within the slowly unfolding spatial computing landscape.