High Fidelity’s architecture lets anyone who wants to create a virtual space put up a master server (called a “domain”), which can then recruit multiple child servers to provide service for a location or event. For this test, people were visiting our welcome domain, TheSpot (hifi://thespot). More than 100 domains have been put up this way by beta server operators using High Fidelity.
If you were at our event on Friday, you might have noticed on a few occasions that all the avatars would disappear for a few seconds, while the sound of everyone’s voices and your view of the surrounding environment remained. This was a live example of this distributed processing — the server we refer to as the “avatar mixer” was failing and restarting, while the other servers stayed connected to you. Each server is independent, and your client can connect or reconnect within a few seconds.
There are five types of servers, each used for different functions — 3D audio, avatar information, simulation of nearby objects, message transmission, and asset streaming. Additionally, domains will soon be able to use more than one of each of these servers, to scale to arbitrarily large audiences and complex environments.
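To make the division of labor concrete, here is a minimal sketch of a domain recruiting child servers, one per role. The class names, role names, and addresses are illustrative assumptions, not High Fidelity’s actual code; the five roles mirror the list above.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical names for the five server roles described above.
class ServerType(Enum):
    AUDIO_MIXER = "audio-mixer"        # 3D audio
    AVATAR_MIXER = "avatar-mixer"      # avatar information
    ENTITY_SERVER = "entity-server"    # simulation of nearby objects
    MESSAGES_MIXER = "messages-mixer"  # message transmission
    ASSET_SERVER = "asset-server"      # asset streaming

@dataclass
class Domain:
    """A master server that recruits child servers, one or more per role."""
    name: str
    servers: dict = field(default_factory=dict)  # ServerType -> list of addresses

    def recruit(self, server_type: ServerType, address: str) -> None:
        # A domain will soon be able to run more than one server of each
        # type, which is what allows scaling to arbitrarily large audiences.
        self.servers.setdefault(server_type, []).append(address)

domain = Domain("TheSpot")
domain.recruit(ServerType.AUDIO_MIXER, "10.0.0.1:4242")
domain.recruit(ServerType.AVATAR_MIXER, "10.0.0.2:4242")
```

Because each role lives on its own server, one role can fail and restart (as the avatar mixer did during the event) while the others keep serving clients.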
In the future, people will also be able to register their devices to provide service to each other in this way in exchange for payment in HFC, enabling a compute and storage marketplace for VR. For this test, we registered five large Amazon EC2 instances to provide a separate machine for each server type. Each of these big Amazon machines has 72 cores and 25 Gbps network connectivity.
3D Audio for Crowds
To let everyone in a large crowd hear each other while keeping bandwidth constant, a server needs to mix the sounds of everyone else into one compressed audio stream for each connected client. To make the audio 3D for an HMD-wearing user, this means a different stream for each ear, updated 90 times a second as you move your head around. This was made possible by a specialized server written to quickly compute and sum audio streams while keeping the delay low.
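The key property is that each listener receives one fixed-size stereo frame no matter how many people are speaking. The toy mixer below illustrates that idea; the inverse-distance gain and simple left/right pan are my own stand-ins for a real spatializer, not High Fidelity’s actual audio pipeline.

```python
import numpy as np

# A toy per-listener mixer: sum every other speaker's mono frame into one
# stereo frame, attenuated by distance and panned left/right by direction.
# The gain and pan models here are illustrative assumptions.

def mix_for_listener(listener_pos, sources, frame_len=240):
    """sources: list of (position, mono_frame) for every other client."""
    left = np.zeros(frame_len)
    right = np.zeros(frame_len)
    for pos, frame in sources:
        offset = np.asarray(pos, dtype=float) - np.asarray(listener_pos, dtype=float)
        dist = max(np.linalg.norm(offset), 1.0)
        gain = 1.0 / dist                      # simple inverse-distance falloff
        pan = 0.5 + 0.5 * (offset[0] / dist)   # crude left/right pan on the x-axis
        left += frame * gain * (1.0 - pan)
        right += frame * gain * pan
    # One fixed-size stereo frame per listener, regardless of source count:
    # this is what keeps per-client bandwidth constant as the crowd grows.
    return np.clip(np.stack([left, right]), -1.0, 1.0)
```

The server does this work once per listener per frame, so the cost grows with the crowd on the server side while each client’s download stays flat.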
Doing the Wave
Similar to audio, HMD-wearing avatars generate lots of information as they move their bodies around. A special server computes where everyone is looking and sends each person an optimized stream for the avatars they can see, adjusted for how far away each avatar is and how close it is to the center of their field of view. This must be done with the same low latency as the audio (about a tenth of a second), so that if you see someone waving and shouting at you from the audience, or have everyone do the wave (which I frequently do), the experience feels completely normal.
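One way to picture this prioritization is a score per avatar that rises with view alignment and falls with distance, with the top-scoring avatars getting the most detailed updates. The scoring function and budget below are assumptions for illustration, not High Fidelity’s actual heuristic.

```python
import math

# Illustrative priority ranking: send each client a stream weighted toward
# avatars that are close and near the center of view.

def view_priority(viewer_pos, view_dir, avatar_pos):
    dx = [a - b for a, b in zip(avatar_pos, viewer_pos)]
    dist = math.sqrt(sum(c * c for c in dx)) or 1e-6
    unit = [c / dist for c in dx]
    # cos of the angle between the view direction and the direction to the
    # avatar: 1.0 = dead center, -1.0 = directly behind the viewer.
    alignment = sum(a * b for a, b in zip(view_dir, unit))
    return alignment / dist  # nearer and more central => higher priority

def pick_detail_levels(viewer_pos, view_dir, avatars, budget=2):
    """avatars: list of (id, position). The top `budget` get full detail."""
    ranked = sorted(avatars,
                    key=lambda a: view_priority(viewer_pos, view_dir, a[1]),
                    reverse=True)
    return [aid for aid, _ in ranked[:budget]]
```

Someone waving at you from two meters away in front of you outranks a distant avatar behind you, so the limited per-client bandwidth goes where it is most visible.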
Next Steps on the Road to One Billion in VR
As we bring larger and larger crowds into these tests, we will need to successfully complete additional planned phases of our development. With the largest available machines topping out at support for a few hundred connected clients, we will next need to support multiple avatar and audio servers in one space, with people moving between them. We will also need to improve our avatar level-of-detail systems and compression to enable getting thousands of people into view at the same time. For complex environments such as cities, we will additionally use the ability to nest object servers within each other, serving you data only from the servers you are inside or near.
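The nesting idea can be sketched as a containment test: each object server covers a bounding region, and a client only subscribes to servers whose region it is inside or near. The region layout and the margin parameter below are hypothetical, chosen just to show the selection logic.

```python
# A sketch of nested object servers: each serves an axis-aligned bounding
# box, and a client only receives data from servers whose box it is inside
# of or within `margin` meters of. Names and margin are illustrative.

def servers_in_range(client_pos, servers, margin=5.0):
    """servers: list of (name, (min_corner, max_corner)) bounding boxes."""
    active = []
    for name, (lo, hi) in servers:
        # Squared distance from the client to the box (0 if inside).
        d2 = 0.0
        for c, l, h in zip(client_pos, lo, hi):
            if c < l:
                d2 += (l - c) ** 2
            elif c > h:
                d2 += (c - h) ** 2
        if d2 <= margin ** 2:
            active.append(name)
    return active

city = [
    ("city",     ((0, 0, 0), (1000, 100, 1000))),    # outer server
    ("building", ((100, 0, 100), (120, 30, 120))),   # nested inside the city
]
```

Standing inside the building, you would receive data from both servers; across town, only the outer city server needs to talk to you, so the total load fans out across many small servers.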
The real world doesn’t impose limits on how many people or things can show up in the same place at the same time, and neither should VR. Join us in our monthly load tests (the next one will be on Friday, August 3rd) to help out and see how we are doing on our road to this ultimate goal.