Many developers working in VR right now use Unity or Unreal. For them, producing good VR content boils down to optimizing assets and non-rendering behaviors to allow enough rendering headroom to be comfortable on their target platform. Since most of these developers are also creating VR-only experiences on the PC, they can also assume a minimum hardware specification below which they’re free to say “Sorry, we’re not going to support you. Go get a better computer / GPU”. I am not one of those developers.
I work on an open-source, cross-platform, VR-capable application. This has some serious implications. Open-source means that we can’t use a commercial engine like Unity or Unreal for rendering. Cross-platform means we need to use a rendering API like OpenGL or Vulkan that’s widely available across many platforms. VR-capable means that while we support VR, we also support conventional 2D monitors.
In addition, our application is designed to be driven almost entirely by user-generated content, so we can’t reliably optimize asset production or make assumptions about the quality of assets. We’re limited to strategies that rely on the rendering engine itself. Our target is much broader than VR-capable desktop machines running Windows. Ideally, we’d like users to be able to pull out a newish laptop, browse content, and have a reasonable (if less awesome than VR) experience.
Given that, I’d like to talk about some of the challenges presented by making a VR-capable app and the solutions we’ve built.
The Golden Rules of VR
There are a few things that are commonly drilled into you in the VR developer equivalent of boot camp.
- Minimize latency
- Hit your frame rate target
Why are these goals so important? They both have to do with making the image seen in an HMD match as precisely as possible the orientation of your head. If you turn your head to the left and the rendered world in the headset doesn’t rotate to the right at exactly the same rate, then the spell of presence is broken. As a result, at best, the world in the HMD seems fake, and at worst some part of your brain, unable to reconcile your physical movements with the world you see around you, decides something is very wrong and that it’s time to make you throw up.
Note that the main focus here is specifically on orientation or rotation of the head. The human visual system is finely tuned to handle rotation in a particular way. If we go back to basics, you can see why: If you’re a primitive human and hunting (or being hunted by) some large animal, a split second change in your environment could mean the difference between going home happy or hungry (or being eaten). It would be a very bad thing if, say, every time you turned your head you went completely blind.
We all come equipped to handle these kinds of problems. To maintain your ability to perceive your surroundings while turning your head, your eyes automatically track in the opposite direction to the rotation. Turn to the left 10 degrees and your eyes will rotate to the right 10 degrees over the same period of time. Furthermore, because your eyes retain their original rotation relative to the world, your brain expects that the world around you will remain fixed in position.
Together, this and other strategies for perceiving a dynamic environment create a set of rules that govern the interaction between your movement and your vision, and those rules are central to how you see the world.
With that in mind, let’s return to those hard and fast requirements for VR.
Minimizing latency means managing the time between when we measure the head position and when we send an image to your display. If you turn your head and the image on the display starts turning even a tenth of a second later, we’re back to ruining the experience and potentially triggering nausea.
In fact, a tenth of a second is far too long a delay. A commonly accepted maximum allowable latency between head movement and the display reflecting that movement is one fiftieth of a second, or 20 milliseconds, although some suggest it should be as low as 7 ms. So it’s really important that when you turn your head, the HMD display reflects the new rotation as soon as possible.
Of course, there are a few ways to cheat the system. For one thing, all HMD SDKs use prediction. When we ask the SDK for a head pose, what we really want is not the pose now, but the pose the head will be in when the pixels light up. The current SDKs therefore use information about how fast the head is moving right now to anticipate where it will be when you present the frame. This improves the experience somewhat, but obviously this prediction can’t anticipate changes in head movement, such as starting a head turn or changing direction. Also, the greater the amount of time between requesting the data and actually displaying the scene, the less accurate the predicted value will be.
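To make the effect of the prediction window concrete, here is a minimal sketch that assumes the simplest possible model: constant angular velocity around a single axis. Real SDKs use their own, more sophisticated filters, and the function names here are hypothetical.

```python
import math


def predict_yaw(yaw_rad, yaw_rate_rad_per_s, lookahead_s):
    """Extrapolate yaw, assuming the head keeps turning at a constant rate.

    This constant-velocity model is an assumption for illustration only.
    """
    return yaw_rad + yaw_rate_rad_per_s * lookahead_s


def missed_rotation(yaw_accel_rad_per_s2, lookahead_s):
    """Error of the constant-velocity guess if the head is in fact
    accelerating at a constant rate: the unmodeled term is 0.5 * a * t^2."""
    return 0.5 * yaw_accel_rad_per_s2 * lookahead_s ** 2
```

Under this model, a head turning at 100 degrees per second is predicted to be 2 degrees further along after 20 ms. The error term grows with the square of the lookahead time, so doubling the time between sampling the pose and displaying the frame quadruples the error whenever the head speeds up or slows down.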
To help with this, the Oculus SDK provides a feature called timewarp. When we provide the rendered image to the SDK, we also provide the head pose used to render it. The Oculus runtime can then compute the difference between the rendered pose and the actual head pose when the pixels are displayed and rotate the image slightly to compensate for any difference.
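The correction itself is just the rotation between two head orientations. The sketch below is not the Oculus implementation; it assumes poses are unit quaternions in (w, x, y, z) order that map head-local directions to world space, and all function names are hypothetical.

```python
import math


def quat_from_yaw(angle_rad):
    """Unit quaternion (w, x, y, z) for a rotation about the +Y (up) axis."""
    half = angle_rad / 2.0
    return (math.cos(half), 0.0, math.sin(half), 0.0)


def quat_conjugate(q):
    """Inverse of a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_multiply(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def timewarp_delta(render_pose, display_pose):
    """Rotation that takes a direction expressed in the display-time head
    frame into the head frame the image was rendered with."""
    return quat_multiply(quat_conjugate(render_pose), display_pose)


def rotation_angle(q):
    """Magnitude of the rotation encoded by a unit quaternion, in radians."""
    w = max(-1.0, min(1.0, abs(q[0])))  # clamp against rounding error
    return 2.0 * math.acos(w)
```

If the frame was rendered with a yaw of 10 degrees and the head is at 12 degrees when the pixels light up, the delta is a 2 degree rotation about the up axis, which is the amount the image has to be rotated to compensate.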
Frame Rate Targets
Both the Vive and Rift consumer versions display at 90 Hz. Ideally this means the application should be generating a new frame 90 times a second to keep up with your motion. If we fail to hit this frame rate target, then when the HMD is ready to display the next frame and no new frame has been provided, the HMD may show the previous frame again. Since your head may have turned in that amount of time, the view in the HMD is no longer what it should be.
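The arithmetic behind the frame rate target is worth spelling out. The sketch below assumes a simplified model in which a frame can only appear on a refresh boundary and nothing reprojects old frames; the function names are hypothetical.

```python
import math


def frame_budget_ms(refresh_hz):
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def refreshes_per_frame(render_ms, refresh_hz):
    """How many display refreshes each rendered frame stays on screen,
    assuming frames are only swapped on refresh boundaries."""
    return max(1, math.ceil(render_ms / frame_budget_ms(refresh_hz)))


def effective_fps(render_ms, refresh_hz):
    """Rate at which new images actually reach the display."""
    return refresh_hz / refreshes_per_frame(render_ms, refresh_hz)
```

At 90 Hz the budget is about 11.1 ms per frame. Under this model, a frame that takes 12 ms misses the refresh by less than a millisecond, yet every frame is then shown twice and the effective rate drops to 45 frames per second.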
The result of this effect is perceived as ‘flicker’, ‘judder’, or ‘ghosting’ when you turn your head. You end up seeing doubled images, because as you turn your head your eyes counter-rotate in the opposite direction. Even though the same display pixels as the previous frame are lit, your eyes have rotated but are still processing the after-image of the last frame. You now perceive the same image in two different locations.
Oculus implemented asynchronous timewarp (ATW) to help with this class of problems. With this feature, the Oculus runtime detects that no new frame has been provided, so it takes the last frame along with the last head pose and does the appropriate rotation.
However, ATW only works with Rift HMDs using the 1.3 production runtime. OpenVR and the Vive HMD don’t support synchronous timewarp, much less the asynchronous version.
In order to make sure we can satisfy those two requirements for a good VR experience, I’ve done a fair bit of architectural work on the application to support:
- Display plugin architecture
- Threaded present
- Manual timewarp for non-Oculus HMDs
- Maintaining support for non-Windows platforms
We abstracted our interaction with the current display device into a plugin model, where each kind of display (Vive, Rift, 2D Monitor, 3D Monitor) is a distinct plugin that knows how to interact with that device. A display plugin reports to the application whether it’s stereo, an HMD, etc. The application renders the scene and the UI and hands them each separately to the display plugin. Not only did this take a lot of complexity out of the core codebase, but it made it easy to create debugging plugins that allow us to simulate the rendering load of a given device without actually having to have or use that device. But we still had plenty of work remaining.
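The shape of such a plugin model might look something like the sketch below. It is illustrative only: the class and method names are hypothetical and are not taken from our codebase, and the real plugins talk to device SDKs rather than storing frames in lists.

```python
from abc import ABC, abstractmethod


class DisplayPlugin(ABC):
    """Hypothetical interface: one subclass per kind of display."""

    def __init__(self):
        self.scenes = []    # rendered scene frames handed over by the app
        self.overlays = []  # UI layers handed over by the app

    @property
    @abstractmethod
    def is_stereo(self):
        """True if the device needs a separate image per eye."""

    @property
    @abstractmethod
    def is_hmd(self):
        """True if the device is a head-mounted display."""

    def submit_scene(self, frame):
        """Receive the rendered scene; kept separate from the UI."""
        self.scenes.append(frame)

    def submit_overlay(self, overlay):
        """Receive the rendered UI layer."""
        self.overlays.append(overlay)


class Monitor2DPlugin(DisplayPlugin):
    """Conventional monitor: mono, not head-mounted."""
    is_stereo = False
    is_hmd = False


class DebugHmdPlugin(DisplayPlugin):
    """Reports itself as a stereo HMD so the application renders as it
    would for a headset, without any headset attached."""
    is_stereo = True
    is_hmd = True
```

Because the application only ever asks the plugin what it is and hands it finished images, swapping a real HMD plugin for a debugging one requires no changes to the rendering code.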
Often when you see a diagram talking about how a 3D application or game works, it’s broken down into two broad categories: ‘rendering’ and ‘everything else’, where ‘everything else’ includes input handling, physics simulation, AI, etc., and ‘rendering’ includes executing the commands to the underlying drawing API (usually OpenGL or Direct3D) and presentation of the result of the completed commands to the output device. Sometimes these will all be on one thread; sometimes rendering will be on its own thread. For our purposes, it’s actually important to break the rendering block down into those two distinct parts:
- executing the commands, which we still call rendering; and
- sending the result to the output device, which we will simply refer to as presentation
When I started working on Interface (the working title for our client), almost everything that wasn’t network or audio handling was on one thread: the main thread. This presented a challenge, because of the conflict between blocking display functions and the nature of event-oriented applications.
Because you don’t know how long rendering will take, you want to start as early as possible after the most recent frame has been displayed. However, most functions that actually present a finished image to the display block until the next vertical sync for the display. This means that on the thread where we do our rendering, we end up spending almost all the time either rendering the scene or waiting for that scene to be displayed. If this thread is also the thread where we handle input, window movement messages, etc., then all that other work can pile up while we’re doing the render/wait tango, and we end up starving out input handling, with results like laggy, jerky mouse and keyboard response.
The ideal solution to this is to have neither rendering a scene nor presenting a scene on the same thread as the input handling. Moving rendering off of the main thread is currently a work in progress, but moving presentation off the main thread was actually pretty straightforward.
Here’s what we do: Once the application is done rendering, the resulting scene is handed off to the display plugin. The display plugin then queues it for presentation on a separate thread. This neatly addresses part of the goal of hitting the target frame rate. The presentation thread’s one job is to take the latest frame it has received and put it on the output device. The application is then free to render as fast as it can, up to the device frame rate, hopefully hitting that goal.
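The hand-off between the two threads can be sketched as a one-slot mailbox that only ever holds the newest frame. This is a simplified illustration rather than our actual code: the names are hypothetical, and `present` and `wait_for_vsync` stand in for whatever blocking calls the display API provides.

```python
import threading


class FrameMailbox:
    """Holds only the most recent frame. An older frame that was never
    presented is simply replaced."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None

    def submit(self, frame):
        """Called by the render thread when a frame is finished."""
        with self._lock:
            self._latest = frame

    def latest(self):
        """Called by the present thread once per display refresh."""
        with self._lock:
            return self._latest


def present_loop(mailbox, present, wait_for_vsync, refresh_count):
    """Body of the presentation thread: on every refresh, show the newest
    frame available, even if it is the same one as last time."""
    for _ in range(refresh_count):
        frame = mailbox.latest()
        if frame is not None:
            present(frame)
        wait_for_vsync()  # only this thread blocks on the display
```

The render thread calls `submit` and never waits on the display, while a thread started with `threading.Thread(target=present_loop, ...)` absorbs the blocking. If no new frame arrives between two refreshes, the loop presents the previous one again.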
If the rate at which the application renders frames is lower than the display refresh rate, we are still able to keep sending the most recently rendered frame to the display device on the present thread. Prior to the release of the consumer Rift, this allowed us to get the same benefits as asynchronous timewarp without support in the Oculus runtime. It drastically improved the experience on the Rift devices. The Rift was able to adjust the most recent image and display it, even if the rendering/main thread was still busy generating the next frame.
Unfortunately, this change did not improve the experience on the Vive. In fact, because the timing of the presentation thread typically introduces an additional one-frame delay into the end-to-end process, it increased the average latency between the sensor sample and displaying the frame. Since the Vive has no mechanism like timewarp to correct for this, the experience actually went down in quality. So how do we fix this?
The solution is that where an HMD SDK doesn’t support timewarp natively, we do it ourselves. The presentation thread already does a small amount of work, compositing the rendered scene, the user interface layer and the mouse cursor together. It’s relatively simple to modify that compositing code to do the same kind of rotational adjustment that the Rift runtime performs.
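The core of a rotation-only reprojection is deciding, for each pixel on the display, where to sample the already rendered image. The sketch below shows that lookup on the CPU under several assumptions: a camera looking down -Z with +X right and +Y up, a symmetric field of view, and a yaw-only helper for building the delta. In practice this would run in a shader, and every name here is hypothetical.

```python
import math


def yaw_matrix(angle_rad):
    """3x3 rotation about the +Y (up) axis; a positive angle turns left."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def rotate(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def reproject_ndc(x, y, delta, tan_half_fov_x, tan_half_fov_y):
    """Map an output pixel, given in normalized device coordinates
    (-1..1), to the coordinates to sample in the rendered image.

    `delta` rotates directions from the display-time head frame into the
    render-time head frame. Returns None if the ray falls behind the
    rendered view.
    """
    # View ray through the output pixel.
    ray = (x * tan_half_fov_x, y * tan_half_fov_y, -1.0)
    rx, ry, rz = rotate(delta, ray)
    if rz >= 0.0:
        return None
    # Project the rotated ray back onto the rendered image plane.
    return (rx / -rz / tan_half_fov_x, ry / -rz / tan_half_fov_y)
```

With no head movement the lookup returns the same coordinates it was given. If the head has turned 2 degrees to the left since the frame was rendered, the center of the display samples a point slightly left of the center of the rendered image, which is what keeps the world visually fixed in place. Samples that land outside the -1..1 range have no rendered content behind them and have to be filled some other way, for example with black.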
There are some limitations to this approach. In the Rift runtime, Oculus is able to take advantage of GPU driver extensions that allow one context to operate at a higher priority than others. This means that if the GPU is running near capacity, the timewarping operation can still be relied on to complete in a predictable amount of time, because the GPU driver will pause working on commands in other contexts when the timewarp work is sent. Unfortunately, such extensions are currently only available under NDA to HMD manufacturers, so we can’t take advantage of them. However, because virtually all our content is user-generated, we aren’t yet at the point where content will often drive an HMD-capable GPU to that level, leaving us plenty of headroom to do this reprojection.