“Sound is touch at a distance,” says Anne Fernald, Professor of Psychology at Stanford University. Just as our brains catalogue human touch emotionally, they do the same with what we hear. “Sounds and emotions are linked in a fundamental way. Hearing is a crucial sense in our process of gathering and storing information for our safety and survival, as well as informing and enriching our emotional experiences as we make our way through the world,” writes Sherry Pickett, Doctor of Audiology at Greentree Hearing & Audiology.
Sound design is of course an incredibly broad field that touches many disciplines: music, theatre, film, television, video game audio, advertising, and, especially in the past couple of years, a huge variety of online networking events, meetings, conferences, concerts, and general social gatherings.
Today we’ll focus specifically on those virtual events, though much of this applies to video game audio too.
Matthew Bennett, head of sound and sensory design at Microsoft, says that we need to shift the way we think about the sound of our digital world.
“The world is not a flat screen; we are designed for a richer existence. Our brains and bodies are hungry for different kinds of information — and for layered multisensory experiences. We expect excellence in visual and hardware design, yet we’ve become accustomed to the idea that our technology often naturally sounds annoying. Understanding sound as a sensory experience can transform the way we listen, the way we think and feel, and the way we design,” Bennett writes.
Keeping those thoughts in mind, we’ll go through research that highlights three areas that contribute to excellent sound design, particularly when it is happening in real time (as so many of our meetings and events are these days). After all, technology shouldn’t sound annoying.
3 Components of Awesome Sound Design
So then: How can we design technology so that it sounds good?
Although we are focusing on sound design for real-time virtual events and for chatting while playing video games, an old saying from the filmmaking business still applies: “Sound is more than half the picture.”
Indeed it is.
1. Increased Audio Intelligibility
When sound is rendered so that the people speaking are easier to understand, everyone benefits. How can this be done?
First, let’s define intelligibility: “In speech communication, intelligibility is a measure of how comprehensible speech is in given conditions. Intelligibility is affected by the level (loud but not too loud) and quality of the speech signal, the type and level of background noise, reverberation (some reflections but not too many), and, for speech over communication devices, the properties of the communication system.”
Where does this come into play for virtual events and game audio?
When we meet in person for networking, gather at a conference, or play games together, individual voices are separated by distance and appear at specific points around the physical space. This spatial presentation makes it fairly easy for us to tell who is speaking. But on many virtual meeting platforms, every voice comes from the same place and at the same distance from the listener. This creates an artificial soundscape that is the opposite of clearly intelligible speech.
There is an easy way to fix this, though: Integrate spatial audio into your platform or app.
Spatial audio delivers sounds so each source comes from a defined location in space. Put on headphones to experience what a virtual concert feels and sounds like with this sort of 3D audio…
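To make "a defined location in space" a little more concrete, here is a minimal Python sketch of one simple way to place a voice: constant-power stereo panning combined with inverse-distance attenuation. It is an illustration only, not how any particular platform or API works, and production spatial audio typically goes further (for example, with head-related transfer functions for full 3D cues). The function name `stereo_gains`, the coordinate convention (listener at the origin facing +y, units in metres), the `ref_distance` parameter, and the participant names are all assumptions made for this example.

```python
import math


def stereo_gains(x: float, y: float, ref_distance: float = 1.0) -> tuple[float, float]:
    """Return (left, right) gains for a source at (x, y) around the listener.

    Assumed convention: listener at the origin facing +y, +x is to the right.
    """
    distance = math.hypot(x, y)
    # Lateral position in [-1, 1]: -1 is hard left, 0 is centre, +1 is hard right.
    # Plain stereo cannot tell front from back, so only the sideways part is used.
    pan = x / distance if distance > 0 else 0.0
    # Constant-power pan law: left^2 + right^2 stays at 1 for every pan value.
    theta = (pan + 1.0) * math.pi / 4.0
    # Inverse-distance attenuation, with no boost inside the reference distance.
    attenuation = ref_distance / max(distance, ref_distance)
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


# Hypothetical participants placed around the listener (positions in metres).
participants = {"Ana": (-2.0, 1.0), "Ben": (0.0, 1.5), "Caro": (3.0, 0.5)}
for name, (x, y) in participants.items():
    left, right = stereo_gains(x, y)
    print(f"{name}: left gain {left:.2f}, right gain {right:.2f}")
```

Multiplying each person's audio by their own pair of gains before mixing gives every voice its own apparent direction and distance, which is the cue listeners rely on in a physical room.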
2. More Natural Conversations
It was discovered quite a while ago, in 2001, that the majority of the conversation in business meetings with 4-8 participants is made up of overlapping talk spurts. Elizabeth Shriberg et al. write, “Results show that both meetings and telephone conversations have high rates of overlap, suggesting that overlap is an important inherent characteristic that should not be ignored in computational models of conversation.”
What does this mean in simpler terms? It’s important that the audio at virtual events isn’t mono (that is, played from one channel). If it’s mono, everything will sound as if it is coming from a single source, and overlapping voices will be difficult to understand.
Think about people playing video games, too (Discord usage is also increasing rapidly). When players speak, it’s often during game action... meaning excited, overlapping speech.
Therefore, real-time sound needs to be processed spatially, so people’s voices aren’t lost in a sea of jumbled conversation.
Instead, take a listen to this: Imagine Zoom with spatial audio. When software is able to process people talking at the same time spatially — in real-time — this makes them more clearly understandable.
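The difference between a mono mix and a spatial mix of overlapping talkers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real-time audio pipeline: the helper names `seat_pans`, `mix_mono`, and `mix_spatial` are hypothetical, the "voices" are short lists of made-up sample values, and positions are spread evenly from left to right with a simple constant-power pan.

```python
import math


def seat_pans(count: int) -> list[float]:
    """Spread `count` talkers evenly from hard left (-1.0) to hard right (+1.0)."""
    if count == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (count - 1) for i in range(count)]


def mix_mono(voices: list[list[float]]) -> list[float]:
    """Sum every voice into one channel: overlapping speech collapses together."""
    return [sum(frame) for frame in zip(*voices)]


def mix_spatial(voices: list[list[float]]) -> tuple[list[float], list[float]]:
    """Pan each voice to its own position, then sum into left and right channels."""
    pans = seat_pans(len(voices))
    left, right = [], []
    for frame in zip(*voices):
        l = r = 0.0
        for sample, pan in zip(frame, pans):
            theta = (pan + 1.0) * math.pi / 4.0  # constant-power pan law
            l += sample * math.cos(theta)
            r += sample * math.sin(theta)
        left.append(l)
        right.append(r)
    return left, right


# Two toy "voices" that overlap on the last sample.
voice_a = [1.0, 0.0, 1.0]
voice_b = [0.0, 1.0, 1.0]

print("mono:", mix_mono([voice_a, voice_b]))
left, right = mix_spatial([voice_a, voice_b])
print("left:", [round(s, 3) for s in left])
print("right:", [round(s, 3) for s in right])
```

In the mono mix the overlapping sample is a single combined value, so nothing in the signal says who contributed what. In the spatial mix each talker keeps a distinct left/right balance, which gives the listener a cue for separating simultaneous voices.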
3. Immersive, Comfortable Environments
Combining audio intelligibility and the ability for many people to speak at once is a great start. Finally, using sound to create an immersive, comfortable environment where participants feel present and engaged is the icing on our sound design cake.
Step back for a moment and remember our initial thoughts from Matthew Bennett: “Understanding sound as a sensory experience.” Sound is emotional. Marci D. Cottingham and Rebecca J. Erickson’s research examined how audio diaries might be used to capture candid emotions. “Research on emotion is fraught with methodological limitations, as feelings can have non-discrete, ephemeral, and ineffable qualities. Audio diaries offer a method for capturing the sequential and varied experience of emotions as they emerge from everyday life.”
How can we apply that concept to enjoying immersive, real-time communication at virtual events and while playing games? Check out Francisco Cuadrado et al.’s research from November 2020: “Sound from media increases the immersion of the audience, adding credibility to the narration but also generating emotions in the spectator. Results showed higher emotional impact of the arousal and 3D audio conditions when both variables were combined.”
3D audio, or spatial audio, comes up again here.
James Broderick et al. also write in IEEE about the importance of spatial audio in virtual environments. “Not only does well made spatial audio allow a user to become more immersed in their virtual experience, it is an important channel for information about their environment. With the advent of VR, games are putting more work into spatial audio and audio design, and now the results are becoming available for both research and game development.”
Integrate Spatial Audio for Amazing Sound
Seth Horowitz, an auditory neuroscientist at Brown University, has a few thoughts we’ll close with:
“Sound and the mind are very, very intricately linked, and yet we almost never pay attention to sound. Sound is always there. It's our early warning system. It's also our emotional driver. It's our attentional driver. Everything you hear has some kind of an impact on you and changes how you respond to the rest of the world.”
With our spatial audio API, we’re endeavoring to bridge the gap between the “real-life” experience of being face-to-face with someone and virtual communication. Whether that is in an online conference, networking event, virtual meeting, or while playing a game, having good spatial sound makes a difference.