We don't yet know exactly what the Metaverse will look like, but if we assume it will be a very large digital world with lots of people moving around and doing interesting things together, we can draw up a rough list of the key components required beyond VR hardware and a fast internet connection:
- 3D Audio
- Big Crowds
- Interconnected Spaces
- Infinite Detail
- Live Editing
- Programmable Atoms
When near other people, we most often communicate using our voices, so there needs to be a way to hear many other people talking and also to hear things at a great distance. When two or more people talk at the same time, 3D audio is what allows you to tell them apart and understand them. If you are in a classroom or at a concert in the Metaverse, you will want to hear the teacher or performer, as well as your friend sitting right next to you, and also that group of people somewhat further away. You will also want to hear music and sound effects, not just voice. The delay between when someone talks (or gestures) and when you hear (or see) it needs to be low enough for things to feel natural, which turns out to be about a tenth of a second (or even less, if you want to play music with people). Finally, if you want to stay logged in and listening for a long time without fatigue, the audio quality needs to be outstanding.
This is a difficult problem. Most existing internet audio systems are designed for one-on-one calls or small groups and send you a separate audio stream for everyone who is speaking. When 3D spatialization is added at all (which is rare), the receiving client has to apply the effect to each speaker separately, which limits the number of audible sources at one time to just a few. Also, many game VoIP chat systems add closer to a second of latency, making natural conversation impossible.
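One way around the per-speaker stream problem is to do the mixing on the server: for each listener, the server attenuates and pans every source according to its position relative to that listener, then sends back a single stereo stream. The sketch below is a minimal illustration under simplifying assumptions. The inverse-distance gain, the constant-power pan law, the 2D coordinates, and the names `Source` and `mix_for_listener` are all hypothetical, and a real system would also need HRTF filtering, codecs and jitter buffers.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    x: float
    y: float
    samples: list[float]  # mono samples for the current audio frame


def mix_for_listener(lx: float, ly: float, facing: float,
                     sources: list[Source], frame_len: int) -> tuple[list[float], list[float]]:
    """Mix every source into a single stereo frame as heard from (lx, ly).

    `facing` is the listener's heading in radians (0 = looking along +x).
    """
    left = [0.0] * frame_len
    right = [0.0] * frame_len
    for s in sources:
        dx, dy = s.x - lx, s.y - ly
        # Inverse-distance attenuation, clamped so a very close voice isn't amplified.
        gain = 1.0 / max(math.hypot(dx, dy), 1.0)
        # Direction of the source relative to where the listener is facing.
        rel = math.atan2(dy, dx) - facing
        # pan = 0 is hard left, 1 is hard right (+y is to the left when facing +x).
        pan = (1.0 - math.sin(rel)) / 2.0
        lg = gain * math.cos(pan * math.pi / 2)  # constant-power pan law
        rg = gain * math.sin(pan * math.pi / 2)
        for i, sample in enumerate(s.samples[:frame_len]):
            left[i] += sample * lg
            right[i] += sample * rg
    return left, right


# A friend 2 m to the left and a stranger 10 m to the right, both saying "1.0".
left, right = mix_for_listener(0.0, 0.0, 0.0,
                               [Source(0, 2, [1.0, 1.0]), Source(0, -10, [1.0, 1.0])], 2)
```

Whatever the number of speakers, each listener receives exactly one stereo stream; the cost moves to the server, whose work grows with the number of listeners times the number of sources.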
Many important human experiences that we would like to replicate in the Metaverse require a lot of people to be together and visible to each other in the same place at the same time. Examples include lecture halls, company all-hands meetings, public streets, music concerts and political rallies. Doing this while preserving the detailed appearance of each person and transmitting their movement and body language with low latency is a largely unsolved problem. Current game engines and video/audio conferencing solutions can typically support only dozens, or at most hundreds, of concurrent people in one meeting room or space. We are social animals, and the powerful sense of moving in synchrony as part of a larger group is compelling and well documented. To reach its full potential, the Metaverse will need to enable this experience for any number of people.
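A common technique for keeping crowd traffic manageable, usually called interest management, is to send each client updates about nearby avatars at full rate and about distant ones at progressively lower rates, so that bandwidth grows much more slowly than the crowd. The sketch below is illustrative only: the distance bands in `RATE_BANDS`, the update size, and the function names are made-up assumptions, not measured values.

```python
import math

# Hypothetical (max_distance_in_meters, updates_per_second) bands, nearest first.
RATE_BANDS = [(5.0, 30), (20.0, 10), (100.0, 2)]
# Beyond the last band, no per-avatar updates are sent (such avatars could be
# folded into an aggregate crowd representation instead).
FAR_RATE = 0


def update_rate(viewer: tuple[float, float], other: tuple[float, float]) -> int:
    """How often the viewer should receive movement updates for `other`."""
    d = math.dist(viewer, other)
    for max_d, rate in RATE_BANDS:
        if d <= max_d:
            return rate
    return FAR_RATE


def bandwidth_budget(viewer: tuple[float, float],
                     others: list[tuple[float, float]],
                     bytes_per_update: int = 64) -> int:
    """Bytes per second the viewer receives for all other avatars."""
    return sum(update_rate(viewer, o) for o in others) * bytes_per_update
```

For example, `bandwidth_budget((0, 0), [(3, 4), (0, 50)])` sums 30 and 2 updates per second at an assumed 64 bytes each, giving 2048 bytes per second.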
We will need the option to have a variable degree of pseudonymity in the Metaverse. As in the real world, context and environment will dictate what elements of your identity you will need or want to disclose. In a public space, that's probably very little. In a private company meeting, probably a lot.
Since this means you will often meet people you don't know, who often won't share their real names, there needs to be a way to learn something about them, such as whether they are known and well-liked by your friends. You may also need to know whether someone has a good credit rating or has earned a certain college degree. Aspects of your identity will need to be stored separately and revealed selectively, which makes it important to have many ways of gathering information about what others think of you or can corroborate about you.
Reputation will also be needed to determine your ability to enter or make changes to the virtual environment. Totally anonymous access, combined with the superpowers we can have in a digital world (like action at a distance, creating things out of thin air, flying or making loud noises), would be an unstable situation. But with a good reputation system, we can confer appropriate rights and access in most cases without needing a real-world ID.
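As a rough illustration of conferring rights by reputation rather than real-world ID, the sketch below gates each digital "superpower" behind a minimum score and lets a space's owner grant explicit exceptions. The 0 to 100 score scale, the action names, the thresholds, and the `Visitor` type are hypothetical, and how a score would actually be computed (for example from friends' endorsements) is left open.

```python
from dataclasses import dataclass, field

# Hypothetical minimum reputation (0-100 scale) needed for each action in one space.
THRESHOLDS = {
    "enter": 0,
    "speak": 10,
    "fly": 30,
    "create_object": 50,
    "broadcast_loud_audio": 80,
}


@dataclass
class Visitor:
    name: str
    reputation: int
    granted: set[str] = field(default_factory=set)  # explicit grants from the space owner


def allowed(visitor: Visitor, action: str) -> bool:
    """Explicit grants win; otherwise compare reputation to the action's threshold."""
    if action in visitor.granted:
        return True
    required = THRESHOLDS.get(action)
    # Actions the space doesn't know about are denied by default.
    return required is not None and visitor.reputation >= required
```

A pseudonymous newcomer with a score of 15 could enter and speak but not fly, unless the owner of the space explicitly granted that right.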
Suppose that in the Metaverse your company owns an office, in a building, in a city. It is easy to imagine how this might happen. The city might be a popular destination for commerce, with appealing tax rates for the type of work your company does, and the building you work in might be a popular destination for companies offering the same kind of services you do. Furthermore, the appeal of having the office, building and city all be connected spaces (meaning you can look out the window of the building and see other buildings) comes from the fact that our memories work best when things are organized as physical structures that don’t change much over time. So we will want a way to connect spaces that are hosted, served or operated by many different people and companies into a stable, larger space, and this mechanism must be easy for those content creators and operators to edit.
With the web, the connective tissue is the hyperlink, where one site links to another through the "doorway" of clicking on a block of text or media. For the Metaverse, the likely strategy seems to be one in which developers can create volumes of space, or literal doorways, that link to other servers. So the owner of the "building" server can link the space contained within one of its offices to your "office" server, and vice versa. In this way, larger spaces (the city, for example) can be composed of many different nested layers of servers, with the boundaries between these servers not needing to be visible at all.
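A minimal sketch of how such nesting might be resolved, assuming each server publishes axis-aligned boxes that hand a region of its space off to another server: to find who hosts a given point, start at the outermost server and keep descending into whichever linked volume contains the point. The `Volume` type, the box-shaped volumes, and the `.example` server names are assumptions for illustration.

```python
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class Volume:
    server: str                       # hypothetical address of the server hosting this space
    lo: Point                         # minimum corner of the box
    hi: Point                         # maximum corner of the box
    children: list["Volume"] = field(default_factory=list)  # linked sub-volumes

    def contains(self, p: Point) -> bool:
        return all(l <= c < h for l, c, h in zip(self.lo, p, self.hi))


def resolve(root: Volume, p: Point) -> str:
    """Return the server responsible for point p, descending through nested links."""
    node = root
    while True:
        for child in node.children:
            if child.contains(p):
                node = child
                break
        else:
            # No linked sub-volume contains p, so this server owns it.
            return node.server


office = Volume("office.example", (10, 0, 10), (20, 5, 20))
building = Volume("building.example", (0, 0, 0), (50, 100, 50), [office])
city = Volume("city.example", (-1000, 0, -1000), (1000, 500, 1000), [building])
```

Here `resolve(city, (15, 2, 15))` lands in the office, `(30, 2, 30)` in the building's shared space, and a point on the street outside in the city server. Each server only needs to know about the links it owns, which keeps the whole structure editable by separate operators.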
Infinite Level of Detail
Reaching the scale and detail needed to create places where many people are doing many different things will require a different approach from the one video game engines have traditionally taken to render large scenes. The largest content sets used by "open world" video games (like Assassin’s Creed or Grand Theft Auto) are still very small compared to actual world-scale content sets.
A visitor to a virtual version of Manhattan — ascending from the subway at Columbus Circle — would be surrounded by an almost infinite amount of content, visible to them in all directions. Presenting this view to our visitor without too much loading delay requires that faraway objects be transmitted with progressively less and less information, and that objects even further away be somehow grouped together so that they can be sent as a single approximate chunk.
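The two ideas in that paragraph can be sketched as follows, under assumed numbers: an object's level of detail drops one step each time its distance doubles, and objects beyond a cutoff are bucketed into coarse grid cells so that each cell can be sent as a single approximate chunk. The doubling rule, the `near` distance, the cutoff, the cell size and the function names are illustrative assumptions, not values from any existing engine.

```python
import math
from collections import defaultdict


def lod_level(distance: float, near: float = 10.0, max_level: int = 8) -> int:
    """0 = full detail; one level coarser each time distance doubles beyond `near`."""
    if distance <= near:
        return 0
    return min(max_level, int(math.log2(distance / near)) + 1)


def plan(viewer: tuple[float, float],
         objects: dict[str, tuple[float, float]],
         cutoff: float = 1000.0,
         cell: float = 500.0):
    """Split objects into individually sent (id, level) pairs and far-away grid cells."""
    individual = []
    clusters = defaultdict(list)
    for obj_id, pos in objects.items():
        d = math.dist(viewer, pos)
        if d < cutoff:
            individual.append((obj_id, lod_level(d)))
        else:
            # Everything in the same far-away cell is sent as one approximate chunk.
            key = (math.floor(pos[0] / cell), math.floor(pos[1] / cell))
            clusters[key].append(obj_id)
    return individual, dict(clusters)


individual, clusters = plan((0, 0), {"a": (5, 0), "b": (40, 0),
                                     "c": (2000, 0), "d": (2100, 100)})
```

Here "a" is sent at full detail, "b" at level 3, and the two distant objects share one cluster, so the amount of data sent grows with how much of the scene is nearby rather than with the total amount of content.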
Moreover, all this content is constantly changing. The appearance of New York from one day or week or year to the next is very different. This means that the mechanisms that create these lower-resolution approximations have to run continuously, updating them as the content changes. For example, the view of a distant building is the view into a huge collection of office windows, each with constantly changing content. This capability is not yet built into any existing 3D engine, and it needs to be created in a way that scales across content provided by many different servers. Imagine if all the billions of live pages on the web were tiles on a floor, and you could fly above that floor at any altitude and zoom in to look at things the way you can explore the world with Google Earth. That is the Level of Detail problem the Metaverse will need to solve.
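One way to keep such approximations current without rebuilding everything, sketched here as an assumption rather than a description of any existing engine, is to arrange content in a tree of tiles: when a leaf changes, it marks itself and its ancestors as stale, and only stale tiles have their summaries rebuilt. The `Tile` class is hypothetical, and its "summary" is just the mean of a single value per child; in practice it would be something like a simplified mesh or a rendered image.

```python
class Tile:
    """A node in a tile hierarchy; inner tiles cache a coarse summary of their children."""

    def __init__(self, value: float = 0.0):
        self.value = value                     # leaf content, or cached summary
        self.children: list["Tile"] = []
        self.parent: "Tile | None" = None
        self.dirty = True

    def add(self, child: "Tile") -> "Tile":
        child.parent = self
        self.children.append(child)
        self.mark_dirty()
        return child

    def set_leaf(self, value: float) -> None:
        self.value = value
        self.mark_dirty()

    def mark_dirty(self) -> None:
        # Stop at the first tile that is already stale: its ancestors are stale too.
        node = self
        while node is not None and not node.dirty:
            node.dirty = True
            node = node.parent

    def refresh(self) -> int:
        """Rebuild stale summaries bottom-up; return how many tiles were rebuilt."""
        if not self.dirty:
            return 0
        rebuilt = sum(child.refresh() for child in self.children)
        if self.children:
            # Placeholder low-resolution approximation: the mean of the children.
            self.value = sum(c.value for c in self.children) / len(self.children)
        self.dirty = False
        return rebuilt + 1


root = Tile()
a, b = root.add(Tile()), root.add(Tile())
a.add(Tile(1.0)); a.add(Tile(3.0))
window = b.add(Tile(5.0))
```

After an initial full `root.refresh()`, changing one "window" with `window.set_leaf(7.0)` means the next refresh rebuilds only that leaf and its two ancestors, leaving the untouched branch alone. Across many servers, the same idea would require each server to publish its own summaries.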
As with the real world, the Metaverse will need to be a collective work that we can create and change together, in real time.
For example, if I come to your virtual office for a meeting, I might want to show you the latest version of a 3D model of a building my team has been working on. I’ll want to drop it on the table where we can both see it, and then you might want to change it to demonstrate your idea for a different roof design. Doing that means we are both editing the building live, the same way two people can edit a Google document together while talking on the phone. But doing this with 3D content, which any number of people might be able to see (someone walking by outside the office, for example), creates a very complex set of challenges related to privacy, permissions and streaming.
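One small piece of the live-editing problem, keeping everyone's copy of the model consistent, can be illustrated with a last-writer-wins register per property, one of the simplest conflict-free replicated data types. Every edit carries a Lamport timestamp plus the editor's ID, and every replica keeps whichever edit has the larger (timestamp, ID) pair, so all copies converge regardless of the order in which edits arrive. This is one known technique chosen for illustration, not a claim about how any Metaverse platform works, and it does not address privacy or permissions. The `Replica` class and the property names are hypothetical.

```python
from dataclasses import dataclass, field

Op = tuple[str, int, str, object]  # (property, lamport_time, writer_id, value)


@dataclass
class Replica:
    """One participant's copy of a shared object, e.g. the 3D building model."""
    node_id: str
    clock: int = 0
    # property name -> (lamport_time, writer_id, value)
    props: dict[str, tuple[int, str, object]] = field(default_factory=dict)

    def edit(self, key: str, value: object) -> Op:
        """Apply a local edit and return the operation to broadcast to others."""
        self.clock += 1
        op = (key, self.clock, self.node_id, value)
        self.apply(op)
        return op

    def apply(self, op: Op) -> None:
        key, ts, writer, value = op
        # Advance the local clock past anything seen, so later local edits sort after it.
        self.clock = max(self.clock, ts)
        current = self.props.get(key)
        # Keep the edit with the larger (timestamp, writer) pair; equal times tie-break on ID.
        if current is None or (ts, writer) > (current[0], current[1]):
            self.props[key] = (ts, writer, value)

    def get(self, key: str) -> object:
        return self.props[key][2]


alice, bob = Replica("alice"), Replica("bob")
op1 = alice.edit("roof", "flat")
op2 = bob.edit("roof", "gabled")  # concurrent with op1
alice.apply(op2)
bob.apply(op1)                    # delivered in the opposite order
```

Both replicas end up showing the gabled roof even though they received the two concurrent edits in different orders; a later edit by either person then overrides it everywhere.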
All virtual worlds are filled with objects of some kind (avatars, 3D models, sounds): the things you experience and interact with. As in the real world, you can break them down into smaller and smaller pieces until you get to the smallest ones. Let's call these atoms. Digital atoms need to be much bigger than physical atoms, because with the amount of computing power we have available for now, there can be far fewer of them. A single grain of sand, which is roughly the smallest thing we can see or pick up with our fingers, contains 50 billion billion atoms! And there are hundreds of billions of grains of sand on even a very small beach. So there need to be far fewer virtual atoms to fit in the amount of memory we have today: maybe one atom per blade of grass.
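Multiplying those two figures shows the size of the gap, taking "hundreds of billions" at its low end of 10^11 grains:

```latex
5 \times 10^{19}\ \frac{\text{atoms}}{\text{grain}} \times 10^{11}\ \text{grains} = 5 \times 10^{30}\ \text{atoms}
```

Even at a single byte per atom, one small beach would need about 5 × 10^30 bytes, many orders of magnitude more memory than any computer has.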
As another example, if you want to put a picture on a wall in the Metaverse, you probably wouldn’t do it by making a tiny colored atom for each pixel in the image. Instead, you’d like to have a smart billboard object that you could command, using a programming language, to “display the image from the following URL.” Having a smaller number of atoms with potentially fairly complex behaviors suggests that programs will somehow need to be attached to objects in the Metaverse. And if you move that billboard object around in the world, its code has to keep running for the billboard to work correctly. If you move it across servers (maybe the billboard is on the side of a bus, for example), how does the first server pass along the state of your billboard to the next one?
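One possible answer to that last question, sketched under assumptions: if each programmable object carries a reference to its code plus an explicit, serializable state, the first server can pause the object, serialize its state, and hand the snapshot to the next server, which rebuilds it and keeps running it. The `Billboard` class, the JSON message format, and the `hand_off`/`receive` functions are hypothetical, and the URL is a placeholder.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Billboard:
    """A 'programmable atom': its code is fixed, its state travels with it."""
    object_id: str
    image_url: str
    ticks: int = 0  # example of runtime state the object's script accumulates

    def tick(self) -> None:
        # Stand-in for the object's script running once per simulation step.
        self.ticks += 1


def hand_off(obj: Billboard) -> str:
    """Called by the old server: snapshot the object's state for transfer."""
    return json.dumps({"type": "Billboard", "state": asdict(obj)})


def receive(message: str) -> Billboard:
    """Called by the new server: rebuild the object so it can keep running."""
    data = json.loads(message)
    if data["type"] != "Billboard":
        raise ValueError("unknown object type")
    return Billboard(**data["state"])


bb = Billboard("bus-42-ad", "https://example.com/ad.png")
bb.tick(); bb.tick()            # runs on the first server
moved = receive(hand_off(bb))   # the bus crosses a server boundary
moved.tick()                    # keeps running on the second server
```

Keeping state explicit, rather than trying to migrate a running interpreter's memory, is what keeps the snapshot small and portable; the trade-off is that the object's code must store everything it needs in that state.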
There are broader economic questions here too. When the bus with the billboard drives around some digital city, whose CPU is executing that billboard code? Who gets charged for that? Compute doesn't seem likely to be "too cheap to meter" in the Metaverse, so some kind of sharing arrangement or market economy needs to exist.
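A toy version of such metering, assuming each server times the scripts it runs and bills each object's owner at a posted price per CPU-second, might look like the sketch below. The `Meter` class, the price and the owner names are hypothetical, and the measurement is the whole process's CPU time during the call, which is only a rough stand-in for per-script accounting.

```python
import time
from collections import defaultdict
from typing import Any, Callable


class Meter:
    """Accumulates CPU time per object owner on one server and prices it."""

    def __init__(self, price_per_cpu_second: float):
        self.price = price_per_cpu_second
        self.cpu_seconds: dict[str, float] = defaultdict(float)

    def run(self, owner: str, script: Callable[..., Any], *args: Any) -> Any:
        """Run an object's script, charging the CPU time consumed to its owner."""
        start = time.process_time()
        try:
            return script(*args)
        finally:
            self.cpu_seconds[owner] += time.process_time() - start

    def invoice(self) -> dict[str, float]:
        """Amount owed by each owner at the posted price."""
        return {owner: secs * self.price for owner, secs in self.cpu_seconds.items()}


meter = Meter(price_per_cpu_second=0.5)
total = meter.run("billboard-owner", sum, range(1000))
bill = meter.invoice()
```

Whether the server hosting the street or the billboard's owner should pay is exactly the open question above; the sketch simply assumes the owner does.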
If you want to buy something from or work for another avatar in the Metaverse, chances are the person behind that avatar is not from the same real-world country as you. In fact, there is no better than about a 30% chance, even if you are from a big country. So you probably don’t share a currency or payment system. Venmo, PayPal or Visa won’t work. Since digital things tend to be inexpensive and numerous, you need a payment system with low transaction fees. Because avatars will be both buyers and sellers, and many of them will work in the Metaverse in some capacity, you also need a system that lets a person easily convert the money they earned as an avatar into a real-world local currency so that they can buy food or pay rent. Supporting Visa or Venmo alone won’t do the trick.
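The arithmetic behind "low transaction fees" is easy to sketch. Assuming, purely for illustration, a card-style fee of a fixed 30 cents plus 2.9% per transaction, the fixed part swamps small purchases; one common workaround, also sketched below, is to net many small purchases into a balance and settle it in a single transaction. The fee figures and function names are hypothetical.

```python
from decimal import Decimal

FIXED_FEE = Decimal("0.30")     # hypothetical fixed fee per transaction, in dollars
PERCENT_FEE = Decimal("0.029")  # hypothetical proportional fee


def fee(amount: Decimal) -> Decimal:
    return FIXED_FEE + PERCENT_FEE * amount


def fee_share_individual(purchases: list[Decimal]) -> Decimal:
    """Fees as a fraction of spend if every purchase settles on its own."""
    return sum((fee(p) for p in purchases), Decimal(0)) / sum(purchases, Decimal(0))


def fee_share_batched(purchases: list[Decimal]) -> Decimal:
    """Fees as a fraction of spend if purchases are netted and settled once."""
    total = sum(purchases, Decimal(0))
    return fee(total) / total


purchases = [Decimal("0.10")] * 100  # one hundred ten-cent digital items
```

Under these assumed fees, settling the hundred ten-cent purchases individually costs about 303% of the money spent, while netting them into a single $10 settlement costs about 5.9%.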
Blockchains certainly look like a viable direction for this sort of payment system. However, they're not yet fully operational or legal everywhere, and they don't yet have the transaction capacity and speed the Metaverse will require when fully populated. Alternatively, existing payment systems might become widely distributed, fast and cheap enough to provide this capability, as systems like WeChat in China suggest is possible.
I've tried to focus on specific technical challenges for creating the Metaverse rather than discussing its specific applications, governance, open source nature or any of the many other interesting topics surrounding it. There are many great articles and references that capture other descriptions of or ideas about the Metaverse. Here are a couple to get you started: