You walk into your local multiplex, find your seat, and after a few minutes, the lights slowly start to dim. As the lights flicker above, that now-famous Hans-Zimmer-Inception-inspired sound thunders around you and reverberates throughout the theatre. It’s a sound that has become recognizable and increasingly popular among sound engineers at major motion picture studios. (Check out this audio clip mashup.)
Sound is a critical component of entertainment experiences and is intended to draw in the audience or participants, as in The Void’s Star Wars VR experience. To date, this type of audio experience has mostly been achieved through stereo sound. However, rapid advancements in audio have changed how sound can be developed, processed, and experienced. Interestingly, spatial audio first appeared briefly in video games in the 1990s (a blog post for another time), but then disappeared until VR began to surge in 2012 when Oculus first emerged. Audio tech is now experiencing a new wave of momentum, and industry-leading audio engineers, sound designers, and developers are starting to push the limits of audio technology.
So what are the major differences between stereo audio, binaural audio, and spatial audio? And why would you choose one over the other two?
Stereo Audio vs. Binaural Audio vs. Spatial Audio: Major Differences
First, we should define what we mean when we say stereo audio, binaural audio, and spatial audio.
Stereo audio is audio recorded and played back through two distinct channels. As a result, the audio coming through one side of the headset differs from what’s coming through the other. In this instance, stereo audio creates the illusion of immersion, but does not create a multidimensional or interactive soundscape.
Binaural audio technically refers to audio captured so that a listener hears the sound as they would in the real world. The audio is captured using a dummy head with microphones embedded in its ears, placed in the environment where the sound is being made. For example, a drum set played in a small, mostly tiled bathroom is going to sound very different from the same drum set played outdoors. The catch is that everyone’s ears are a little bit different: unless the dummy head is an exact replica of your own head, the recording won’t sound exactly the same to you as the real thing. That said, unless you have a highly trained ear, you’re most likely not going to be able to tell much of a difference. Binaural audio recordings can be captured using very old tech. In most cases, it’s as simple as drilling holes in a dummy head, putting small microphones in the ears, and then placing the dummy in the middle of a symphonic orchestra. When a listener puts on headphones, it will sound as if she or he has the best seat in the house.
And then there is spatial audio. Spatial audio simulates sound as it behaves in real life, which we will come back to at the end of this post. High Fidelity co-founder and CEO Philip Rosedale describes spatial audio as the “technique whereby sounds are processed to make them appear to come from their real location in space” relative to your head and the direction it is facing. For example, a barking dog might sound as if it is behind you. In this way, sound appears more lifelike and real. Most virtual reality headsets and spatially-tracked headphones “correct” the audio relative to your position. In this way, as you slowly rotate your body to face the dog, the sound inside your headset will “move” accordingly, until the dog sounds as if it is in front of you.
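To make the barking-dog example concrete, here is a minimal sketch of the "correction" a head-tracked system has to compute: the direction of a sound relative to where the listener is facing. The function name and the angle convention (yaw 0 faces +y, angles increase clockwise) are assumptions chosen for illustration, not part of any particular product's API.

```python
import math

def relative_azimuth_deg(listener_xy, listener_yaw_deg, source_xy):
    """Direction of a source relative to the listener's facing direction.

    Assumed convention: yaw 0 faces +y and increases clockwise.
    Result: 0 = straight ahead, +90 = to the right, +/-180 = directly behind.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    # World-space bearing of the source, measured clockwise from +y.
    bearing_deg = math.degrees(math.atan2(dx, dy))
    # Subtract the head yaw and wrap the result into [-180, 180).
    return (bearing_deg - listener_yaw_deg + 180.0) % 360.0 - 180.0

# A dog two meters behind a listener standing at the origin.
dog = (0.0, -2.0)
for yaw in (0.0, 90.0, 180.0):
    print(yaw, relative_azimuth_deg((0.0, 0.0), yaw, dog))
```

As the listener's yaw goes from 0 to 180 degrees, the dog's relative direction goes from directly behind, to the right-hand side, to straight ahead. That changing angle is what drives the spatial processing.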
Recording in Stereo Audio vs. Binaural Audio vs. Spatial Audio
A stereo audio recording is captured using two microphones recording simultaneously. The mono signal from each microphone is assigned to the left or right channel. The stereo effect is achieved through the slight variation in the sound between the left and right channels. Because the microphones are placed at slightly different locations, the recorded sound arrives at each microphone at a slightly different time and at a different level. A difference of just a few milliseconds is enough to create the illusion of width and space absent in a mono recording.
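As a rough illustration of how small those differences are, the sketch below estimates the arrival-time and level difference between two spaced microphones. The microphone positions, the source position, and the simple inverse-distance level model are all assumptions for the example; real rooms add reflections that this ignores.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, dry air at about 20 degrees C

def arrival_delay_ms(source, mic_left, mic_right):
    """How much later (in ms) the sound reaches the right mic than the left.

    A negative value means the right mic hears it first.
    """
    d_left = math.dist(source, mic_left)
    d_right = math.dist(source, mic_right)
    return (d_right - d_left) / SPEED_OF_SOUND_M_S * 1000.0

def level_difference_db(source, mic_left, mic_right):
    """Left-minus-right level in dB, assuming simple inverse-distance falloff."""
    d_left = math.dist(source, mic_left)
    d_right = math.dist(source, mic_right)
    return 20.0 * math.log10(d_right / d_left)

# Hypothetical setup: mics 0.5 m apart, source 2 m to the left and 1 m in front.
left_mic, right_mic = (-0.25, 0.0), (0.25, 0.0)
source = (-2.0, 1.0)
print(arrival_delay_ms(source, left_mic, right_mic))
print(level_difference_db(source, left_mic, right_mic))
```

With this layout the right microphone hears the source roughly 1.3 ms after the left one, which is in the "few milliseconds" range described above. A source placed exactly between the microphones produces no difference at all.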
The first known binaural recording appeared in the late 19th century. Today, a binaural audio recording is typically captured using a specialized microphone, like this one; they start at about $400. The obvious drawback of binaural audio is that it is incredibly costly. It’s even expensive to purchase existing binaural audio tracks, and chances are there aren’t many situations where it is really necessary. Early binaural recordings are essentially the first demonstrations of what spatial audio would eventually become. Another drawback is that the effect is baked into the recording, so it cannot account for dynamic changes to the source positions or your head orientation.
When recording spatial audio, you first have to consider the listener’s point of view and place the microphone at that location. Specifically, as a sound engineer or developer you need to decide the x, y, and z coordinates of the listener and which direction they’re facing. Each sound is then processed with two techniques that together move it left/right and front/back. When combined, these techniques are often referred to as a Head-Related Transfer Function (HRTF). HRTF-processed audio can be played through stereo headphones, but will sound as if it is coming from all directions, rather than just from the two points of the left and right channels. Subtle changes to the sound's timing and frequency content trick your ears into believing the sound is coming from a specific direction.
“To spatialize audio, we take the original sound and do these two things – shift the time delay between the two channels and adjust the loudness of the frequencies – according to where that sound is supposed to be relative to your head,” says Rosedale.
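The two steps Rosedale describes can be sketched in a few lines. This is a deliberately simplified illustration, not High Fidelity's implementation: it delays and attenuates the channel for the far ear, using the Woodworth approximation for the time delay and a single broadband gain in place of the frequency-dependent filtering a real HRTF applies. The head radius, the 6 dB maximum level difference, and the function name are assumptions.

```python
import math

SAMPLE_RATE = 48_000          # samples per second (assumed)
HEAD_RADIUS_M = 0.0875        # rough average head radius (assumed)
SPEED_OF_SOUND_M_S = 343.0

def spatialize(mono, azimuth_deg, max_ild_db=6.0):
    """Return (left, right) sample lists for a source at azimuth_deg.

    0 = straight ahead, +90 = hard right, -90 = hard left.
    Angles beyond +/-90 are clamped; this sketch ignores front/back cues.
    """
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    # Step 1: time delay between the ears (Woodworth approximation).
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (abs(theta) + math.sin(abs(theta)))
    delay = round(itd_s * SAMPLE_RATE)
    # Step 2: loudness difference. A real HRTF varies this per frequency;
    # here one broadband gain stands in for it.
    far_gain = 10.0 ** (-max_ild_db * abs(math.sin(theta)) / 20.0)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in mono]
    # Source on the right: the right ear is the near ear.
    return (far, near) if theta > 0 else (near, far)

left, right = spatialize([1.0, 0.0, 0.0], azimuth_deg=90.0)
print(len(left), right[0], left.index(max(left)))
```

Under these assumptions a source directly to the right reaches the left ear roughly 0.66 ms late (31 samples at 48 kHz) and about half as loud, which is already enough for headphones to pull the sound to one side.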
What’s particularly different about High Fidelity’s Spatial Audio is that it’s cloud-based. In near real-time, it takes the positions of hundreds of people (or sound sources), mixes all of those sounds on the server, and then delivers a single mixed and spatialized stream back to each listener.
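The server-side idea can be sketched as a loop that builds one stereo mix per listener from everyone else's audio. This is an illustrative toy, not High Fidelity's actual mixer or API: the `Participant` class, the constant-power panning, and the inverse-distance falloff are all assumptions, and panning by the sine of the angle cannot distinguish front from back the way an HRTF does.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Participant:
    name: str
    x: float
    y: float
    yaw_deg: float                              # assumed: 0 faces +y, clockwise positive
    frame: list = field(default_factory=list)   # mono samples being sent right now

def mix_for(listener, participants):
    """Build one stereo frame (left, right) for a single listener."""
    n = len(listener.frame)
    left, right = [0.0] * n, [0.0] * n
    for p in participants:
        if p is listener:
            continue  # nobody hears their own voice in their mix
        dx, dy = p.x - listener.x, p.y - listener.y
        dist = max(1.0, math.hypot(dx, dy))      # clamp so nearby voices do not blow up
        azimuth = math.atan2(dx, dy) - math.radians(listener.yaw_deg)
        pan = math.sin(azimuth)                  # -1 = hard left, +1 = hard right
        angle = (pan + 1.0) * math.pi / 4.0      # constant-power pan law
        gain_l, gain_r = math.cos(angle) / dist, math.sin(angle) / dist
        for i, s in enumerate(p.frame[:n]):
            left[i] += s * gain_l
            right[i] += s * gain_r
    return left, right

def mix_all(participants):
    """One spatialized stream per listener, keyed by name."""
    return {p.name: mix_for(p, participants) for p in participants}
```

Each listener gets a different mix of the same sources, because the gains depend on that listener's own position and facing direction. That is why the mixing has to be done once per listener rather than once per room.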
Listening to Stereo Audio vs. Binaural Audio vs. Spatial Audio
With stereo audio, sound comes from two fixed positions: the left and right channels. In this scenario, if you take out your left earbud, you will only hear sound mapped to the right position. If you are not using headphones, you need to be equally distant from the right and left speakers, as your placement within a room relative to the speakers will affect stereo imaging. This all gets more complex as you add speakers and vary their placement around the room, plus the acoustics of that room.
Listening to binaural audio places you center-stage at the orchestra, in a truly immersive experience. In this scenario, sound is positioned in a 360-degree space around you. However, binaural recordings sound the same no matter where your head is positioned. It really only makes sense to listen to a binaural audio recording with headphones, to replicate a specific listening experience.
Listening to spatial audio places you at the center of a truly immersive environment. In fact, spatial audio allows you to experience the audio as if you are an active part of a soundscape. The sound changes as you turn your head or rotate your body, whether you’re wearing headphones or a VR headset that tracks your head movement or controlling an avatar with your keyboard, mouse, or touch screen. For this reason, spatial audio is an invaluable tool for VR content creators and for other platforms trying to create a realistic sense of place with dynamic audio experiences.
Example of Stereo Audio vs. Binaural Audio
Experience binaural audio vs. stereo audio for yourself. For this example, we recommend wearing headphones.
So Where Does Spatial Audio Fit In?
What if Zoom had spatial audio? At the end of the day, you’d rise from the desk feeling less tired. If you don’t believe us, we recommend putting on a pair of headphones and listening to the recording below.
“When people talk at the same time, you need spatial audio to be able to understand them,” said Rosedale in 'What if Zoom Had Spatial Audio?'. “This is the ‘cocktail party’ effect, where the separation of people’s voices in space is what the brain uses to understand multiple voices at once.”
Why Use Spatial Audio?
Today, more and more applications and games are benefitting from spatial audio. The use cases are boundless, from social chat apps and immersive games to live streaming events and virtual reality experiences.
Getting started with High Fidelity’s Spatial Audio API is easy. We recommend following this simple guide to begin. The guides page has additional detailed walkthroughs of example applications built using the API.
With just a few lines of code, you can deliver a real-world audio experience to your digital application.