
Frequently Asked Questions

Introduction

What is Spatial Audio?

Popularized by the rise of VR, spatial audio is the technique whereby sounds are processed to make them appear to come from their real location in space. For example, a barking dog might sound as if it is behind you and to the left. VR headsets and spatially-tracked headphones make this effect more lifelike than with normal speakers, as the spatial audio is constantly corrected for your actual head position.

Spatial audio is computationally intensive, and existing solutions are used almost entirely for local sound effects, where the receiver’s computer or phone can do the CPU work to spatialize the sound. The effect involves a series of subtle changes to the timing and frequencies of sounds to make your ears believe that they came from a certain direction.


How does Spatial Audio work?

There are two techniques used to move sounds left/right and in front of/behind you. Combined, these techniques are often referred to by the acronym HRTF, which stands for "Head-Related Transfer Function." The long name really captures two fairly simple things. The first is just the time delay between your ears: a sound coming from your right reaches your right ear a little before it reaches your left ear, so we delay the arrival at your "far" ear to create the effect.
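The inter-ear delay just described can be sketched numerically. The Woodworth spherical-head formula and the constants below are standard textbook values, not something High Fidelity's API exposes:

```javascript
// Interaural time difference (ITD) sketch, using Woodworth's classic
// spherical-head approximation: ITD = (r / c) * (sin(theta) + theta),
// where theta is the sound's azimuth away from straight ahead.
// The head radius and speed of sound are typical textbook values.
function interauralTimeDelaySeconds(azimuthRadians, headRadiusMeters = 0.0875) {
  const speedOfSoundMetersPerSecond = 343; // in air at room temperature
  const theta = Math.abs(azimuthRadians); // the delay is symmetric left/right
  return (headRadiusMeters / speedOfSoundMetersPerSecond) * (Math.sin(theta) + theta);
}
```

A spatializer would delay the far-ear channel by this amount; for a sound directly to one side (azimuth of π/2), the delay works out to roughly 0.66 milliseconds.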

The second effect is a little more complicated, and is why our ears have such a funny shape (the outer ear is called the "pinna"). Essentially, the different frequencies of a sound are shifted differently for every direction the sound could come from. Your brain is trained to recognize those changes in any sound and tell you which direction it came from. For example, the higher frequencies of a sound coming from behind you are harder to hear because they are attenuated by the skin of your ear that they have to pass through. You can experience this yourself: find a very quiet room, then rub your thumb against your fingers while moving your hand around your head. You will hear the frequencies shift higher and lower at different directions and angles. We (and also dogs) even instinctively tilt our heads to the side to better spatialize sounds.

So to spatialize audio, we take the original sound and do these two things — shift the time delay between the two channels and adjust the loudness of the frequencies — according to where that sound is supposed to be relative to your head.


When should I use Spatial Audio?

When people talk at the same time, you need spatial audio to be able to understand them. This is the ‘cocktail party’ effect: the separation of people’s voices in space is what the brain uses to understand multiple voices at once. So if you need an online meeting where everyone can talk at the same time and chit-chat as we normally would, you need spatial audio to make it work. This is one of the reasons why Zoom (and other videoconferencing solutions) are frustrating and tiring: only one person can talk at once.



Using High Fidelity's Spatial Audio API

How do I get started using High Fidelity’s Spatial Audio API?

There are several ways to get started quickly with the High Fidelity Spatial Audio API. The first thing that you will need to do is create a free account.

Within a few minutes, you can use this guide to build a simple Web App that makes use of the Spatial Audio API.

Alternatively, you can dive right in to the API Documentation.

Visit our Guides page for detailed walkthroughs of example applications built using the Spatial Audio API.


What is a “Space”?

A Space is a virtual 3D environment that runs on the High Fidelity Spatial Audio API Servers.

Users who enter a Space can move around that virtual 3D environment and communicate with others using their microphone or another input device.

The audio that a user emits into a Space is spatialized and then sent to all other users inside that Space.


What do I do if my client gets disconnected from High Fidelity?

In uncommon circumstances, such as network outages, hosting issues, or a bug in our server code, the connection between your Client and the High Fidelity Server may be unexpectedly broken.

You can handle instances of this occurrence inside a custom onConnectionStateChanged handler. This handler is passed to the constructor of the HiFiCommunicator class in your application code.

This handler is called every time the connection state between the Client and Server changes. The handler's only argument contains information about the new state of the connection between Client and Server. For example, the new connection state might be "Disconnected" or "Failed".
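As a sketch, a handler might treat certain states as a cue to reconnect. The state strings follow the examples above; the reconnect policy itself is hypothetical:

```javascript
// Hypothetical reconnect policy for an onConnectionStateChanged handler.
// "Disconnected" and "Failed" are the example state strings from the FAQ.
function shouldAttemptReconnect(newConnectionState) {
  return newConnectionState === "Disconnected" || newConnectionState === "Failed";
}

// Wiring sketch (HiFiCommunicator comes from the hifi-spatial-audio library):
// const communicator = new HighFidelityAudio.HiFiCommunicator({
//   onConnectionStateChanged: (state) => {
//     if (shouldAttemptReconnect(state)) {
//       // e.g. retry communicator.connectToHiFiAudioAPIServer(jwt) with backoff
//     }
//   }
// });
```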

For more information about onConnectionStateChanged, visit the API Documentation.


I already use another video API for video communication within my app. Can I use the Spatial Audio API for audio communication within my app?

Yes! We highly encourage developers to implement spatial audio in applications that include video communication. Doing so is a great way to reduce the mental fatigue users experience when parsing simultaneous, non-spatialized audio and video from multiple sources.

The following approach will help you integrate video chat into your Spatial Audio application:

Most video communication APIs will accept media streams that don't contain an audio track. Pass a video-only media stream to your video API.

Then, pass an audio-only media stream to the High Fidelity Spatial Audio API.
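The two steps above can be sketched as follows. The pure helper runs anywhere; the commented browser wiring assumes the hifi-spatial-audio library's HiFiCommunicator.setInputAudioMediaStream() and a hypothetical video-provider API:

```javascript
// Split a list of media tracks into audio-only and video-only groups,
// mirroring the stream-splitting steps above. In a browser you would
// wrap each group in new MediaStream(tracks); the helper itself is pure.
function splitTracksByKind(tracks) {
  return {
    audioTracks: tracks.filter((t) => t.kind === "audio"),
    videoTracks: tracks.filter((t) => t.kind === "video"),
  };
}

// Browser usage sketch:
// const { audioTracks, videoTracks } = splitTracksByKind(stream.getTracks());
// videoApi.publish(new MediaStream(videoTracks)); // your video provider (hypothetical API)
// hifiCommunicator.setInputAudioMediaStream(new MediaStream(audioTracks));
```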

Doing this means your application's audio and video will take different amounts of time to propagate between clients; audio and video latency will differ. In our internal testing, we have found that this difference in latency is small enough to keep your users comfortable.


I'm hearing an echo of my own voice in my Spatial Audio apps. How do I prevent echo?

When two or more people are communicating via the High Fidelity Spatial Audio API, it is possible for User A's voice to "echo" through User B's microphone. There are several potential causes for this echo:

  1. User B is using speakers instead of headphones
  2. User B is wearing an open-backed headset which "leaks" audio into its microphone
  3. (Unlikely) User B is playing a rude prank and purposefully routing User A's audio back through their microphone feed

While having a stern conversation with User B might be the only fix for Cause #3, other causes of echo are straightforward to fix by enabling "Echo Cancellation" on the user's input MediaStream.

In your application code, you can apply MediaTrackConstraints to your call to getUserMedia(). In most of our JavaScript documentation and example code, we suggest calling navigator.mediaDevices.getUserMedia({ audio: HighFidelityAudio.getBestAudioConstraints(), video: false }) to obtain the highest-quality audio feed from the user's audio input device.

However, using HighFidelityAudio.getBestAudioConstraints() will, if possible, disable Echo Cancellation, Noise Suppression, and Automatic Gain Control.

You should call getUserMedia() with constraints best suited for your application. To enable Echo Cancellation for users of your application, call:

navigator.mediaDevices.getUserMedia({ audio: {echoCancellation: true}, video: false })

Be mindful that enabling Echo Cancellation, Noise Suppression, and/or Automatic Gain Control has a negative impact on audio input quality.
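If you want the quality benefits of getBestAudioConstraints() for headphone users but need echo cancellation for others, one approach (a sketch; getBestAudioConstraints() is the library call mentioned above) is to merge the constraint sets:

```javascript
// Take a base audio constraint set and force echo cancellation on;
// all other constraints in the base set are left untouched.
function withEchoCancellation(baseAudioConstraints) {
  return { ...baseAudioConstraints, echoCancellation: true };
}

// Browser usage sketch:
// const audio = withEchoCancellation(HighFidelityAudio.getBestAudioConstraints());
// const stream = await navigator.mediaDevices.getUserMedia({ audio, video: false });
```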

Another potential solution to prevent echo is "push to talk", which requires users to hold down a button when they speak.
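A minimal push-to-talk sketch, assuming a standard MediaStream whose audio tracks can be toggled via their enabled flag (the key binding is arbitrary):

```javascript
// Push-to-talk sketch: enable the microphone only while a key is held.
// Disabling a track keeps the stream alive but transmits silence.
function setMicrophoneEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled;
  }
}

// Browser wiring sketch:
// document.addEventListener("keydown", (e) => { if (e.code === "Space") setMicrophoneEnabled(micStream, true); });
// document.addEventListener("keyup",   (e) => { if (e.code === "Space") setMicrophoneEnabled(micStream, false); });
```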


How can I improve compatibility between my Spatial Audio application and older browsers?

High Fidelity’s Spatial Audio API makes use of the latest technologies built into modern browsers. Older browsers may not work properly with the Spatial Audio API, and you may want to perform browser feature detection within your application.

However, it is possible to improve compatibility by working around issues in different browsers' WebRTC implementations. You can do this by using "WebRTC adapter.js".

For more information about the WebRTC adapter, visit this link on the Mozilla Developer Network.
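A minimal feature-detection sketch along these lines; the exact list of required features below is an assumption, so consult the API Documentation for authoritative requirements:

```javascript
// Check for the browser capabilities a WebRTC-based spatial audio app
// typically relies on. Takes a window-like object so it can be tested.
function supportsSpatialAudioPrerequisites(w) {
  return Boolean(
    w.RTCPeerConnection &&
    w.WebSocket &&
    w.navigator &&
    w.navigator.mediaDevices &&
    w.navigator.mediaDevices.getUserMedia
  );
}

// In a browser: if (!supportsSpatialAudioPrerequisites(window)) showUpgradeNotice();
// Loading "webrtc-adapter" before your own code additionally smooths over
// cross-browser differences in WebRTC implementations.
```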


How do I use High Fidelity's Spatial Audio API in my serverless web application?

High Fidelity provides a JavaScript library that allows anyone to embed High Fidelity's Spatial Audio in a web application that supports JavaScript. This library requires the use of JSON Web Tokens (JWTs) when making connections to the High Fidelity servers, which ensures that only the people that you have authorized can access your High Fidelity spaces.

However, the challenge for a single-page, serverless application that wants to use the Spatial Audio API is how to securely generate these tokens without exposing sensitive code to end-user browser sessions. While there are plenty of libraries that will allow you to generate JWTs using JavaScript, there is still a need to store sensitive information (specifically, your application secret) without exposing it in your browser-side source code. That application secret is used to sign your tokens, and is what ensures that connections made from your application to the High Fidelity service are authorized. Given that any information included in client-side JavaScript code is inherently not secure, there's no way to protect your application secret without involving some sort of server-side application.

One possibility is that if you require that users log in or otherwise identify themselves before using your application, you might be able to leverage that piece of middleware to also generate signed JWTs, without including your application secret in JavaScript. Another approach, if you're not already using some sort of middleware for functionality such as authentication, would be to use something like AWS Lambda or Google Firebase's Cloud Functions to generate signed JWTs for your users on demand.

Which approach to use comes down to how much (and how) you want to limit access to your application. The most important thing to keep in mind is that anyone who can access your application secret can use it to generate a JWT and connect to the High Fidelity Spatial Audio API by "pretending" to be your application. Be aware that if you don't require any sort of authentication to connect to your application, you're effectively allowing any user to connect to the Spatial Audio API "as you" (that is, using your application secret) and rack up usage minutes, which could then be billed back to you by High Fidelity.

One final note: we strongly recommend setting expiration times on your JWTs regardless of how they are created, as those JWTs will always be exposed client-side. Setting an expiration time prevents a malicious user from reusing a JWT to connect to our Spatial Audio API "as you," and protecting access to your application secret through a server-side process (as described above) prevents a malicious user from creating new JWTs of their own. (Please also note that if your application secret is ever exposed, you can regenerate it from the account pages.)



Authorized Access and JWTs

How do I ensure that only authorized users can connect to my High Fidelity Space?

When your application's client code calls connectToHiFiAudioAPIServer(), the function must be called with a JWT as its first argument. This JWT ensures that only authorized users can connect to your High Fidelity Space.

There are two methods of obtaining a JWT for passing into this function:

  1. (For testing purposes only) Manually via the High Fidelity Spatial Audio API Developer Console
  2. Programmatically

What is a JWT (JSON Web Token)?

JSON Web Tokens (JWTs) are used by High Fidelity's Spatial Audio API to authenticate and direct incoming connections into your spaces. General information about JWTs can be found at jwt.io and on Wikipedia.

When using them with High Fidelity, JWTs can be thought of simply as encoded sets of keys and values. The values you will need to include for a given High Fidelity connection are:

  • Your application's UUID
  • Your space's UUID
  • (Optional) A string defined by your application that can be used to identify this particular user's connection

Most JWT libraries allow you to manage these values as a structure or dictionary in your application's native language.
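The values above can be assembled into a claims object before signing. The claim names used here ("app_id", "space_id", "user_id") are assumptions; confirm the exact keys in the "Get a JWT" Guide:

```javascript
// Sketch of assembling the three FAQ-listed values into JWT claims.
// The claim names are hypothetical; check the "Get a JWT" Guide.
function buildHiFiJwtClaims(appId, spaceId, userId) {
  const claims = { app_id: appId, space_id: spaceId };
  if (userId !== undefined) {
    claims.user_id = userId; // optional per-connection identifier
  }
  return claims;
}
```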

Refer to documentation at jwt.io and in the "Get a JWT" Guide for more details and example code.


Do JWTs expire?

During Alpha, Test JWTs generated on the "Space Details" page of the Developer Console do not expire. This is one reason why these Test JWTs are not suitable for production environments.

When generating a JWT dynamically, you may provide an "nbf" (Not Before) claim and/or an "exp" (Expiration Time) claim. The High Fidelity Spatial Audio API Server will honor these claims when your client connects to the server:

  • Connections made before the "nbf" time will be rejected.
  • Connections made after the "exp" time will be rejected.
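The server-side check these rules describe can be sketched as a small predicate over the token's claims (both claims are expressed in Unix seconds):

```javascript
// A token is usable only inside its [nbf, exp) window; a missing claim
// simply means that bound is not enforced.
function jwtTimeWindowAllows(claims, nowSeconds) {
  if (claims.nbf !== undefined && nowSeconds < claims.nbf) return false; // too early
  if (claims.exp !== undefined && nowSeconds >= claims.exp) return false; // expired
  return true;
}
```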

If you are using jose for NodeJS as per our "Get a JWT" guide, see SignJWT.setExpirationTime() and SignJWT.setNotBefore() for how to implement these claims.


Is Spatial Audio data encrypted end-to-end?

Not fully end-to-end, no. In order to combine and spatialize the audio streams, they do need to be briefly decrypted within our mixer. However, they are NEVER recorded or stored on our servers.

We use WebRTC to communicate between Spatial Audio clients and servers, and thus all data streams are encrypted in transit. The Spatial Audio Client Library sends audio data over Secure RTP (SRTP). The Library sends other data, such as client position and orientation data, over DTLS. We use TLS for signaling transport.

Once the data streams reach our servers, we must briefly decode the audio stream data into PCM format within the server process in order to mix all incoming streams. The mixed stream is then immediately sent back to connected clients via SRTP and DTLS again. Our server processes live within an AWS VPC, and those processes aren't exposed externally apart from the required WebRTC and signaling-related entry points.

Additionally, we use JWTs to authenticate users attempting to connect to our servers. Please read this guide for more information about how to use JWTs to ensure that only authorized users are allowed to enter your space.

For more information about WebRTC security, see: Is WebRTC Safe?

