
When writing an application using High Fidelity’s Spatial Audio API, it is important to understand all of the variables that affect the way your app’s virtual environment sounds to users. When you determine the correct values for these variables, users of your application will happily benefit from natural-sounding, immersive spatial audio. While default values for these variables will usually sound great, incorrectly setting these variables can result in problems like users sounding too quiet, users sounding too far away, or unintentional echo.

Let’s get right into the various virtual knobs and switches that you can control while you’re developing your application. The first section below pertains to browser controls not specific to the Spatial Audio API; these controls apply to WebJS and NodeJS applications. Later sections of this document apply to all Client Library languages.

 

Arguments to getUserMedia()

If your Web application requires access to a user’s camera or microphone, you’re probably calling getUserMedia() to obtain access. One of the arguments to that function call is a JS Object which defines “media constraints” associated with an audio or video stream.

In Guides such as our “Simple Web Application” guide, we encourage developers to call getUserMedia() with the following constraints:

navigator.mediaDevices.getUserMedia({ audio: HighFidelityAudio.getBestAudioConstraints(), video: false });

HighFidelityAudio.getBestAudioConstraints() tells the browser that the audio stream returned by the function call should have the following properties:

  • Echo Cancellation is disabled.
  • Noise Suppression is disabled.
  • Automatic Gain Control is disabled.
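
As a rough illustration, the three properties above correspond to the standard MediaTrackConstraints keys echoCancellation, noiseSuppression, and autoGainControl. The sketch below builds an equivalent constraints object by hand. The helper names buildAudioConstraints and getMicrophoneStream are hypothetical, and HighFidelityAudio.getBestAudioConstraints() may set additional properties that are not shown here.

```javascript
// Build an audio constraints object from three boolean switches.
// All three default to false, matching the "disabled" recommendation above.
function buildAudioConstraints({
  echoCancellation = false,
  noiseSuppression = false,
  autoGainControl = false,
} = {}) {
  return { echoCancellation, noiseSuppression, autoGainControl };
}

// Hypothetical helper: request a microphone stream in the browser.
// navigator.mediaDevices only exists in a browser context.
async function getMicrophoneStream(options) {
  return navigator.mediaDevices.getUserMedia({
    audio: buildAudioConstraints(options),
    video: false,
  });
}
```

An application with different priorities could call, for example, getMicrophoneStream({ echoCancellation: true }) and leave the other two switches off.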

We encourage developers to keep these three audio properties disabled for the highest input audio quality possible. However, your application may have different priorities. Read on to discover what each of these three constraints does:

Echo Cancellation (EC)

tl;dr: Encourage users to wear stereo headphones. Keep echo cancellation disabled.

Consider two users in a virtual audio environment: User A and User B.

  • Both of these users are using bookshelf speakers as their audio output devices.
  • Both of these users are using a desktop microphone as their audio input devices.
  • All microphones are unmuted.
  • Echo Cancellation is off.
  • User A and User B are in close proximity to each other in the virtual environment.

In this situation, if User A speaks into their microphone:

  1. User A’s voice is processed by the microphone, operating system, browser, and application.
  2. User A’s voice data is sent to User B’s computer.
  3. User A’s voice is output through User B’s bookshelf speakers.
  4. User A’s voice is picked up through User B’s microphone.
  5. Steps 1-4 repeat indefinitely, swapping User A and User B on each loop.

This situation manifests itself as echo, which can be annoying, discombobulating, and sometimes physically painful for users.

Thus, if you cannot guarantee that your users will be using headphones, or you cannot guarantee that users with speakers will mute their input devices when they aren’t speaking, we recommend that you turn on echo cancellation.

However, echo cancellation comes at a cost. If User A and User B from the above scenario both had echo cancellation on, and were both talking at the same time, echo cancellation would cause both of them to be quiet or completely inaudible to a third User C. This artifact might be described as User A or User B sounding far away, or like they’re talking through a sock.

Therefore, we recommend that you encourage your users to wear stereo headphones, and keep echo cancellation disabled for maximum audio quality and minimum frustration.

Noise Suppression (NS)

tl;dr: Noise Suppression (NS) impacts audio input quality, but the impact of turning it on is usually minimal. NS can improve perceived audio quality in some cases.

Microphones and the devices to which they are connected attempt to capture audio energy from the air and turn that energy into 1s and 0s, which can then be processed and transmitted to other users in a virtual environment.

However, Things Can Go Wrong in this process between when a user speaks and when the 1s and 0s are transmitted to other users. Most commonly, electrical noise is introduced into the audio input signal. This electrical noise — which can sound like crackling or humming — can have many sources, including:

  • A low-quality microphone
  • A low-quality cable
  • A low-quality sound card

Only extremely high-quality (and expensive) audio equipment is perceptually free of electrical noise. To mitigate electrical noise and other sources of constant, predictable noise (such as the hum of an airplane engine or water through a showerhead), it is possible to enable noise suppression.

In 2021, noise suppression algorithms are quite good. They do a good job of removing certain kinds of noise from an audio input signal [1], and their impact on audio quality is small. That impact is still negative, however.

Thus, for maximum audio quality and audio accuracy, keep noise suppression disabled. Consider enabling it by default if your users complain of hissing or humming noises in others’ voices.

[1] Products such as Krisp and Nvidia’s RTX Voice aim to remove other kinds of “noise” from an audio input signal. Here’s a very impressive demo video of RTX Voice (the video is loud, so turn your headphones down).
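
Returning to the noise suppression switch itself: if users report hiss or hum, one possible approach is to toggle the constraint on an existing stream with the standard MediaStreamTrack.applyConstraints() method. This is a minimal sketch; the helper name setNoiseSuppression is hypothetical, and not every browser honors a change to this constraint on a live track, in which case requesting a new stream with getUserMedia() is a fallback.

```javascript
// Toggle noise suppression on every audio track of a MediaStream.
// applyConstraints() replaces the track's previous constraint set, so the
// existing constraints are merged in to avoid silently dropping them.
async function setNoiseSuppression(stream, enabled) {
  const results = [];
  for (const track of stream.getAudioTracks()) {
    try {
      const merged = { ...track.getConstraints(), noiseSuppression: enabled };
      await track.applyConstraints(merged);
      results.push(true);
    } catch (err) {
      // The browser rejected the change for this track.
      results.push(false);
    }
  }
  return results; // one boolean per audio track
}
```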

Automatic Gain Control (AGC)

tl;dr: Automatic Gain Control (AGC) is useful for users who can’t or don’t know how to modulate their audio input device’s gain, but it has a big negative impact on perceived audio quality. Keep AGC off if possible.

Automatic Gain Control, or AGC, aims to normalize users’ audio input device capture volume algorithmically.

Calibrating an audio input device’s capture volume is extremely important, and extremely difficult. Two users can have the exact same microphone, and, for reasons such as hardware settings or operating system settings, one can sound significantly louder than the other. No good!

To solve this problem, there are several solutions:

  • Have your users calibrate their audio input device’s levels before entering your virtual environment via a wizard or via documentation such as this guide.
    • Ugh. Nobody wants to do this.
  • Allow a space administrator to manually modulate a user’s HiFiGain setting.
    • See below for more details on HiFiGain.
    • This means a space administrator has to be paying attention. Not great.
  • Allow users to manually modulate other users’ perceived volume for them only.
    • See below for documentation regarding setOtherUserGainForThisConnection().
    • This is a fine solution, but it still requires users to take action to correct the settings of another user.
  • Enable Automatic Gain Control for users who are “too loud” or “too quiet”.
    • You might be able to do this programmatically, but, realistically, this would have to be a manual action.
  • Enable Automatic Gain Control for all users of your application.
    • This has a significant negative impact on perceived audio quality, but reduces the likelihood that two users will sound different in terms of loudness.

Take a look at the solutions above, and employ the one that best fits the needs of your application.
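
If you want to experiment with detecting “too loud” or “too quiet” users programmatically, one possible starting point is to measure the RMS level of captured samples in dBFS and compare it against thresholds. The sketch below assumes samples are floats in the range -1 to 1 (as the Web Audio API provides them). The function names and the -40 and -10 dBFS thresholds are hypothetical and would need tuning, and a real implementation should only measure while the user is speaking.

```javascript
// Root-mean-square level of a block of samples, in dB relative to full scale.
function rmsDbfs(samples) {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Classify a block of speech samples against two hypothetical thresholds.
function classifyLevel(samples, { quietBelowDb = -40, loudAboveDb = -10 } = {}) {
  const level = rmsDbfs(samples);
  if (level < quietBelowDb) return "too quiet";
  if (level > loudAboveDb) return "too loud";
  return "ok";
}
```

A user who is consistently classified as "too quiet" or "too loud" could then be offered AGC, or flagged for a space administrator.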

 

Spatial Audio API Controls

There are several controls that developers using the Spatial Audio API can use to modify the way users in your app’s virtual environment hear things. It is challenging to enumerate all of the ways that these controls interact with each other. If you have any questions about these controls, or anything else regarding the Spatial Audio API, ask in our Discord server (the invite link is at the bottom of your "Welcome" developer email from us), or visit highfidelity.com/support.

User Position and User Orientation

Due to the realistic way that our spatial audio technology renders audio, the relative distance and orientation between two users has a significant impact on the way those two users sound to each other.

Here are a few key pieces of information to consider when developing your application:

  • Be sure that the distances between entities in your application are set in meters. For maximum realism, the Spatial Audio API relies on the distance between any two entities in the virtual environment being expressed in meters.
  • Ensure that your application’s render scale is also set to meters. It’s critical that the way your application sounds and the way your application looks match.
  • Understand that two users facing away from each other will sound quieter to each other than if the two users were facing towards each other (just like in real life).

Set the values of position and orientation for a user by supplying those values in the argument to HiFiCommunicator.updateUserDataAndTransmit().
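
As an illustration of the meters requirement, the sketch below converts a position from render units (pixels here) to meters before handing it to updateUserDataAndTransmit(). The scale of 50 pixels per meter and the helper names are hypothetical. The plain objects used for position and orientation are placeholders too: the Client Library may expect its own types for these values, so check the API documentation for the exact shape.

```javascript
const PIXELS_PER_METER = 50; // hypothetical render scale for this sketch

// Convert a position in pixels to a position in meters.
function pixelsToMeters(pixelPosition, pixelsPerMeter = PIXELS_PER_METER) {
  return {
    x: pixelPosition.x / pixelsPerMeter,
    y: pixelPosition.y / pixelsPerMeter,
    z: (pixelPosition.z ?? 0) / pixelsPerMeter,
  };
}

// Send the user's transform. `communicator` is assumed to be a connected
// HiFiCommunicator instance; the shape of `orientation` is left to the caller.
function transmitUserTransform(communicator, pixelPosition, orientation) {
  const userData = { position: pixelsToMeters(pixelPosition), orientation };
  communicator.updateUserDataAndTransmit(userData);
  return userData;
}
```

Keeping the conversion in one function makes it harder for the visual scale and the audio scale to drift apart.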

HiFiGain

Click here to read documentation about HiFiGain on docs.highfidelity.com.

The HiFiGain value:

  • Affects how loud User A will sound to User B at a given distance in 3D space.
  • Affects the distance at which User A can be heard in 3D space.

A higher value for User A means that User A will sound louder to nearby users, and that User A will be audible from a greater distance.

Set the value of HiFiGain for a user by supplying it in the argument to HiFiCommunicator.updateUserDataAndTransmit().
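
For example, an administrator control that nudges a user’s gain up or down might look like the sketch below. The property key hiFiGain, the helper name, and the bounds of 0 and 4 are assumptions made for illustration; consult the HiFiGain documentation for the actual key and valid range.

```javascript
// Adjust a user's gain by `delta`, clamped to hypothetical bounds, and send it.
// `communicator` is assumed to be that user's connected HiFiCommunicator.
function adjustGain(communicator, currentGain, delta, { min = 0, max = 4 } = {}) {
  const hiFiGain = Math.min(max, Math.max(min, currentGain + delta));
  communicator.updateUserDataAndTransmit({ hiFiGain }); // key name is an assumption
  return hiFiGain;
}
```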

setOtherUserGainForThisConnection()

Click here to read documentation about setOtherUserGainForThisConnection() on docs.highfidelity.com.

It is often useful to expose a per-user volume control to all users of your application. For example, User A may find that User B is too loud, and User A may find that User C is too quiet. Instead of requiring that User B and User C modify their input volumes manually, one solution to this problem is to expose a control in your application which allows User A to modify User B and User C’s perceived volume.

Do this by calling HiFiCommunicator.setOtherUserGainForThisConnection() when, for example, a user moves a volume slider associated with another user.

HiFiCommunicator.setOtherUserGainsForThisConnection() (plural) lets you set several users’ gains at once.
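
A volume slider handler along these lines is one way to wire this up. In this sketch, the slider range of 0 to 100, the maximum gain of 2, the helper names, and the argument order (an identifier for the other user, then the gain) are all assumptions; check the linked documentation for the real signature.

```javascript
// Map a 0..100 slider position to a gain between 0 and maxGain.
function sliderToGain(sliderValue, maxGain = 2) {
  const clamped = Math.min(100, Math.max(0, sliderValue));
  return (clamped / 100) * maxGain;
}

// Called when the local user moves the slider associated with another user.
// `communicator` is assumed to be the local user's connected HiFiCommunicator.
async function onVolumeSliderChange(communicator, otherUserId, sliderValue) {
  const gain = sliderToGain(sliderValue);
  await communicator.setOtherUserGainForThisConnection(otherUserId, gain);
  return gain;
}
```

With these defaults, the slider midpoint (50) maps to a gain of 1, which leaves the other user’s volume unchanged if a gain of 1 means “no change”.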

volumeThreshold

Click here to read documentation about volumeThreshold on docs.highfidelity.com.

It can be useful to set a volume threshold associated with a specific client — either manually or programmatically — in order to prevent background noise from being unintentionally transmitted to other users in a virtual environment. This can improve user comfort.

If you notice that certain users of your application are always transmitting an audio signal, even if they aren’t speaking, raise the volumeThreshold associated with that user.

Set the value of volumeThreshold for a user by supplying it in the argument to HiFiCommunicator.updateUserDataAndTransmit().
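
To pick a threshold programmatically, one option is to measure the input level while the user is known to be silent and place the threshold slightly above the loudest of those measurements. This sketch assumes volumeThreshold is expressed in the same decibel units as the measurements, which you should confirm in the volumeThreshold documentation. The function name and the 6 dB margin are hypothetical.

```javascript
// Suggest a threshold a few dB above the loudest background-noise measurement.
// `silentLevelsDb` holds levels (in dB) captured while the user was not speaking.
function suggestVolumeThreshold(silentLevelsDb, marginDb = 6) {
  if (silentLevelsDb.length === 0) return null; // nothing measured yet
  return Math.max(...silentLevelsDb) + marginDb;
}
```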

Space Attenuation Coefficient and Space Frequency Rolloff

In a given High Fidelity Space, the attenuation coefficient and frequency rolloff values affect the way that sound travels through the virtual environment.

Note that it is easiest to empirically determine the best values for these variables for your application. The default values sound the most realistic, but you may find that your application has different needs.

Attenuation Coefficient

The following language regarding the attenuation coefficient is taken from the Administrative REST API documentation:

The attenuation coefficient affects how quickly the volume of a sound decreases as one moves away from the sound source. This does not affect the initial volume of the sound at the sound source.

By default, this setting is null, which means the space will instead have a global attenuation value of 0.5. The default approximates real-world sound attenuation.

More attenuation will cause sounds to decrease in volume more quickly over the same distance. Less attenuation will cause sounds to decrease in volume more slowly.

Attenuation behaves differently depending on the number used:

  • A setting between 0 and 1 represents logarithmic attenuation.
    • This sounds more natural and is recommended.
    • Settings closer to 0 represent less attenuation. For example, 0.2 attenuates less than 0.5, and 0.02 attenuates less still.
    • A number such as 0.00001 effectively disables attenuation within a reasonably sized space, so that all sound sources can be heard.
  • A setting less than 0 represents linear attenuation.
    • This does not sound as natural as logarithmic attenuation.
    • Linear attenuation causes a sound source to become completely silent after a distance in meters.
      • For example, with an attenuation setting of -10.0, the sound from a source becomes silent at 10 meters from the source.
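
To build intuition for these two modes, here is an illustrative model, not the formula the API is confirmed to use. For negative settings it assumes gain falls in a straight line to zero at the stated distance. For settings between 0 and 1 it assumes the volume is multiplied by (1 - coefficient) each time the distance doubles beyond 1 meter, which for 0.5 gives the familiar halving of volume per doubling of distance. Both shapes, and the function name, are assumptions.

```javascript
// Illustrative gain model (1 = full volume at the source, 0 = silent).
function modelGain(attenuation, distanceMeters) {
  const d = Math.max(distanceMeters, 0);
  if (attenuation < 0) {
    // Linear mode: silent at |attenuation| meters (straight-line ramp assumed).
    const silentAt = -attenuation;
    return Math.max(0, 1 - d / silentAt);
  }
  // Logarithmic mode: assumed per-doubling falloff beyond 1 meter.
  const doublings = Math.log2(Math.max(d, 1));
  return Math.pow(1 - attenuation, doublings);
}
```

Under this model, modelGain(-10, 10) is 0, matching the example above, while a tiny coefficient such as 0.00001 leaves the gain very close to 1 even at 100 meters.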

The attenuation coefficient value can be modified in several ways:

Space Frequency Rolloff

The following language regarding frequency rolloff is taken from the Administrative REST API documentation:

The frequency rolloff value affects how quickly the volume of a sound decreases at high frequencies as one moves away from the sound source. This causes sounds to seem more 'muffled' at greater distances.

By default, this setting is null, which means the space will instead have a global frequency rolloff of 16.0. This means that at 16.0 meters, the high-frequency parts of sounds (above 1 kHz) will be reduced by a fixed amount. This is effectively a distance-dependent lowpass filter.

Like attenuation, frequency rolloff makes sounds less distinguishable at greater distances, but it does so without quieting the sound entirely.

The frequency rolloff value can be modified in several ways:

Conclusion

That’s a lot of controls! We hope that you’ve learned something by reading this document, and we hope that you’re better able to build the immersive audio application of your dreams. Please do feel free to ask any questions you may have — about these controls, or otherwise — in our Discord server, or visit highfidelity.com/support.