Popularized by the rise of VR, spatial audio is the technique whereby sounds are processed to make them appear to come from their real location in space. For example, a barking dog might sound as if it is behind you and to the left. VR headsets and spatially tracked headphones make this effect more lifelike than normal speakers can, as the spatial audio is constantly corrected for your actual head position.
Spatial audio is computationally intensive, and existing solutions are used almost entirely for local sound effects, where the receiver’s computer or phone can do the CPU work to spatialize the sound. The effect involves a series of subtle changes to the timing and frequencies of sounds to make your ears believe that they came from a certain direction.
How Does Spatial Audio Work?
Two techniques are used to move sounds left/right and in front of or behind you. Combined, these techniques are often referred to by the acronym HRTF, which stands for "Head-Related Transfer Function." The long name really captures two fairly simple things. The first is just the time delay between your ears: a sound coming from your right reaches your right ear a little before it reaches your left ear, so we delay its arrival at your "far" ear to create the effect.
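To get a feel for how small this delay is, here is a minimal Python sketch that estimates it from the direction of the sound. It uses a textbook spherical-head approximation (often credited to Woodworth), not necessarily what any particular product does. The head radius, the speed of sound, and the function names are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
HEAD_RADIUS_M = 0.0875      # assumed: a commonly quoted average head radius


def interaural_delay_seconds(azimuth_deg: float) -> float:
    """Approximate how much later a sound reaches the far ear.

    azimuth_deg: 0 = straight ahead, +90 = directly right, -90 = directly left.
    The result is positive when the right ear hears the sound first and
    negative when the left ear does.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers sources in front of the listener")
    theta = math.radians(azimuth_deg)
    # Spherical-head approximation: delay = (r / c) * (theta + sin(theta))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def delay_in_samples(azimuth_deg: float, sample_rate_hz: int = 48_000) -> int:
    """Convert the delay to a whole number of audio samples for the far ear."""
    return round(abs(interaural_delay_seconds(azimuth_deg)) * sample_rate_hz)
```

With these assumed constants, a sound directly to your right gives a delay of roughly two thirds of a millisecond, or about 31 samples at 48 kHz. That is far too short to hear as an echo, but it is enough for your brain to pick a direction.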
The second effect is a little more complicated, and is why our outer ears (each called a "pinna") have such a funny shape. Basically, the different frequencies of a sound are boosted or muted differently for every direction the sound could be coming from. Your brain is trained to recognize those changes for any sound, and tell you which direction the sound is coming from. For example, the higher frequencies of a sound coming from behind you are harder to hear because they get muted by the skin of your ear, which they have to pass through. You can experience this yourself if you find a very quiet room and then rub your thumb against your fingers while moving them around your head. You will hear the balance of high and low frequencies change for different directions and angles. We (and also dogs) even instinctively tilt our heads to the side to better spatialize sounds.
So to spatialize audio, we take the original sound and do these two things — shift the time delay between the two channels and adjust the loudness of the frequencies — according to where that sound is supposed to be relative to your head.
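The two steps can be sketched in a few lines of Python. This is a toy illustration, not a real HRTF: production systems use filters measured for many directions, whereas here the frequency shaping is a single one-pole low-pass filter on the far ear. The function name, the 2 kHz cutoff, the head radius, and the speed of sound are all assumptions made for the example.

```python
import math


def spatialize(mono, azimuth_deg, sample_rate_hz=48_000):
    """Return (left, right) channels for a mono signal placed at azimuth_deg.

    azimuth_deg: 0 = straight ahead, +90 = directly right, -90 = directly left.
    Only sources in front of the listener are handled in this sketch.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers sources in front of the listener")

    # Step 1: time delay for the far ear (spherical-head approximation,
    # assumed head radius 0.0875 m and speed of sound 343 m/s).
    theta = math.radians(abs(azimuth_deg))
    delay_s = (0.0875 / 343.0) * (theta + math.sin(theta))
    delay = round(delay_s * sample_rate_hz)

    # Step 2: frequency shaping for the far ear. A one-pole low-pass filter
    # (hypothetical 2 kHz cutoff) is blended in more strongly as the source
    # moves further to the side, which dulls the high frequencies.
    shadow = abs(azimuth_deg) / 90.0  # 0 = no filtering, 1 = fully filtered
    alpha = 1.0 - math.exp(-2.0 * math.pi * 2_000.0 / sample_rate_hz)

    near = list(mono) + [0.0] * delay  # padded so both channels match in length
    far = []
    state = 0.0
    for x in [0.0] * delay + list(mono):
        state += alpha * (x - state)  # low-pass filter update
        far.append((1.0 - shadow) * x + shadow * state)

    # A source on the right makes the right ear the near ear, and vice versa.
    return (far, near) if azimuth_deg >= 0 else (near, far)
```

For example, `spatialize([1.0, 0.0, 0.0, 0.0], 90)` leaves the click untouched in the right channel, while the left channel receives it 31 samples later and smeared by the filter. At an azimuth of 0 both channels are identical to the input.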
When Should You Use Spatial Audio?
When people talk at the same time, you need spatial audio to be able to understand them. This is the ‘cocktail party’ effect, where the separation of people’s voices in space is what the brain uses to understand multiple voices at once. So if you need to have an online meeting where everyone can talk at the same time and chit-chat as we would normally, you need spatial audio to make it work. This is one of the reasons why Zoom and other videoconferencing solutions are frustrating and tiring: only one person can talk at once.
Spatial Audio API Is Now Available
Many apps and games beyond videoconferencing would benefit from spatial audio for the reason above: social voice chat apps, games, live streaming events, virtual reality, and more. An API that allows you to integrate High Fidelity’s live spatial audio into web apps and games is now available. Learn more here and create a developer account.