
Fixing Time.deltaTime in Unity 2020.2 for smoother gameplay: What did it take?

October 1, 2020

Unity 2020.2 beta introduces a fix to an issue that afflicts many development platforms: inconsistent Time.deltaTime values, which lead to jerky, stuttering movements. Read this blog post to understand what was going on and how the upcoming version of Unity helps you create smoother gameplay.

Since the dawn of gaming, achieving framerate-independent movement in video games has meant taking frame delta time into account:

void Update()
{
    transform.position += m_Velocity * Time.deltaTime;
}

This achieves the desired effect of an object moving at constant average velocity, regardless of the frame rate the game is running at. It should, in theory, also move the object at a steady pace if your frame rate is rock solid. In practice, the picture is quite different. If you looked at actual reported Time.deltaTime values, you might have seen this:

6.854 ms
7.423 ms
6.691 ms
6.707 ms
7.045 ms
7.346 ms
6.513 ms
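
If you want to see these numbers for yourself, a minimal logging component along these lines prints each frame's delta time in milliseconds (the component name and log format are just for illustration):

using UnityEngine;

// Attach to any GameObject to print each frame's delta time in milliseconds.
public class DeltaTimeLogger : MonoBehaviour
{
    void Update()
    {
        Debug.Log($"{Time.deltaTime * 1000f:F3} ms");
    }
}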

This is an issue that affects many game engines, including Unity – and we’re thankful to our users for bringing it to our attention. Happily, Unity 2020.2 beta begins to address it.

So why does this happen? Why, when the frame rate is locked to a constant 144 fps, is Time.deltaTime not equal to 1/144 seconds (~6.94 ms) every time? In this blog post, I’ll take you on the journey of investigating and ultimately fixing this phenomenon.

What is delta time and why is it important?

In layman’s terms, delta time is the amount of time your last frame took to complete. It sounds simple, but it’s not as intuitive as you might think. In most game development books you’ll find this canonical definition of a game loop:

while (true)
{
    ProcessInput();
    Update();
    Render();
}

With a game loop like this, it’s easy to calculate delta time:

var time = GetTime();
while (true)
{
    var lastTime = time;
    time = GetTime();
    var deltaTime = time - lastTime;
    ProcessInput();
    Update(deltaTime);
    Render(deltaTime);
}

While this model is simple and easy to understand, it’s highly inadequate for modern game engines. To achieve high performance, engines nowadays use a technique called “pipelining,” which allows an engine to work on more than one frame at any given time.

Compare sequential execution of the game loop stages:

To pipelined execution, where stages of different frames run in parallel:

In both of these cases, individual parts of the game loop take the same amount of time, but the second case executes them in parallel, which allows it to push out more than twice as many frames in the same amount of time. Pipelining the engine changes the frame time from being equal to the sum of all pipeline stages to being equal to the longest one.
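
To put some purely hypothetical numbers on it (these are not taken from the figures): if ProcessInput takes 2 ms, Update takes 3 ms and Render takes 3 ms, the sequential loop finishes a frame every 2 + 3 + 3 = 8 ms (125 fps), while the pipelined loop is limited only by its slowest stage and finishes a frame every 3 ms (~333 fps).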

However, even that is a simplification of what actually happens every frame in the engine:

  • Each pipeline stage takes a different amount of time every frame. Perhaps this frame has more objects on the screen than the last, which would make rendering take longer. Or perhaps the player rolled their face on the keyboard, which made input processing take longer.
  • Since different pipeline stages take different amounts of time, we need to artificially halt the faster ones so they don’t get ahead too much. Most commonly, this is implemented by waiting until some previous frame is flipped to the front buffer (also known as the screen buffer). If VSync is enabled, this additionally synchronizes to the start of the display’s VBLANK period. I’ll touch more on this later.

With that knowledge in mind, let’s take a look at a typical frame timeline in Unity 2020.1. Since platform selection and various settings significantly affect it, this article will assume a Windows Standalone player with multithreaded rendering enabled, graphics jobs disabled, VSync enabled and QualitySettings.maxQueuedFrames set to 2, running on a 144 Hz monitor without dropping any frames:

Unity’s frame pipeline wasn’t implemented from scratch. Instead, it evolved over the last decade to become what it is today. If you go back to past versions of Unity, you will find that it changes every few releases.

You may immediately notice a couple of things about it:

  • Once all the work is submitted to the GPU, Unity doesn’t wait for that frame to be flipped to the screen: instead, it waits for the previous one. This is controlled by the QualitySettings.maxQueuedFrames API. This setting describes how far the frame that is currently being displayed can be behind the frame that’s currently rendering. The minimum possible value is 1, since the best you can do is render frame N+1 while frame N is being displayed on the screen. Since it is set to 2 in this case (which is the default), Unity makes sure that frame N gets displayed on the screen before it starts rendering frame N+2 (for instance, before Unity starts rendering frame 5, it waits for frame 3 to appear on the screen).
  • Frame 5 takes longer to render on the GPU than a single refresh interval of the monitor (7.22 ms vs. 6.94 ms); however, no frames are dropped. This happens because QualitySettings.maxQueuedFrames set to 2 delays when the frame actually appears on the screen, which creates a time buffer that safeguards against dropped frames, as long as the “spike” doesn’t become the norm. If it were set to 1, Unity would surely have dropped the frame, since the work would no longer overlap.
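
If you want to reproduce the VSync and frame-queuing part of this configuration from a script, a minimal sketch could look like this (the component name is arbitrary; multithreaded rendering and graphics jobs are configured in the Player settings rather than from code):

using UnityEngine;

// Applies the VSync and frame-queuing configuration assumed throughout this article.
public class FramePipelineConfig : MonoBehaviour
{
    void Start()
    {
        QualitySettings.vSyncCount = 1;      // sync presentation to every VBLANK
        QualitySettings.maxQueuedFrames = 2; // the default, and the value assumed here
    }
}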

Even though screen refresh happens every 6.94 ms, Unity’s time sampling presents a different image:

t_deltaTime(5) = 1.4 + 3.19 + 1.51 + 0.5 + 0.67 = 7.27 ms
t_deltaTime(6) = 1.45 + 2.81 + 1.48 + 0.5 + 0.4 = 6.64 ms
t_deltaTime(7) = 1.43 + 3.13 + 1.61 + 0.51 + 0.35 = 7.03 ms

The delta time average in this case ((7.27 + 6.64 + 7.03)/3 = 6.98 ms) is very close to the actual monitor refresh rate (6.94 ms), and if you were to measure this for a longer period of time, it would eventually average out to exactly 6.94 ms. Unfortunately, if you use this delta time as it is to calculate visible object movement, you will introduce a very subtle jitter. To illustrate this, I created a simple Unity project. It contains three green squares moving across the world space:

The camera is attached to the top cube, so it appears perfectly still on the screen. If Time.deltaTime is accurate, the middle and bottom cubes would appear to be still as well. The cubes move twice the width of the display every second: the higher the velocity, the more visible the jitter becomes. To illustrate movement, I placed purple and pink non-moving cubes in fixed positions in the background so that you can tell how fast the cubes are actually moving.
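
The demo project’s code isn’t reproduced in this post, but the movement script is conceptually just the Update shown at the top of the article. A rough sketch of what it could look like, assuming an orthographic camera (the component name and setup are illustrative, not the actual project code):

using UnityEngine;

// Moves the object horizontally at twice the camera's visible width per second,
// scaled by Time.deltaTime - the exact pattern whose jitter this post examines.
public class ConstantMover : MonoBehaviour
{
    Vector3 m_Velocity;

    void Start()
    {
        // For an orthographic camera, the visible world-space width is
        // 2 * orthographicSize * aspect.
        var cam = Camera.main;
        float visibleWidth = 2f * cam.orthographicSize * cam.aspect;
        m_Velocity = new Vector3(2f * visibleWidth, 0f, 0f);
    }

    void Update()
    {
        transform.position += m_Velocity * Time.deltaTime;
    }
}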

In Unity 2020.1, the middle and the bottom cubes don’t quite match the top cube movement – they jitter slightly. Below is a video captured with a slow-motion camera (slowed down 20x):

 

Identifying the source of delta time variation

So where do these delta time inconsistencies come from? The display shows each frame for a fixed amount of time, changing the picture every 6.94 ms. This is the real delta time because that’s how much time it takes for a frame to appear on the screen and that’s the amount of time the player of your game will observe each frame for.

Each 6.94 ms interval consists of two parts: processing and sleeping. The example frame timeline shows that the delta time is calculated on the main thread, so it will be our main focus. The processing part of the main thread consists of pumping OS messages, processing input, calling Update and issuing rendering commands. “Wait for render thread” is the sleeping part. The sum of these two intervals is equal to the real frame time:

t_processing + t_waiting = 6.94 ms

Both of these timings fluctuate for various reasons every frame, but their sum remains constant. If the processing time increases, the waiting time decreases and vice versa, so their sum always comes out to exactly 6.94 ms. In fact, the sum of all the parts leading up to and including the wait always equals 6.94 ms:

t_issueGPUCommands(4) + t_pumpOSMessages(5) + t_processInput(5) + t_Update(5) + t_wait(5) = 1.51 + 0.5 + 0.67 + 1.45 + 2.81 = 6.94 ms
t_issueGPUCommands(5) + t_pumpOSMessages(6) + t_processInput(6) + t_Update(6) + t_wait(6) = 1.48 + 0.5 + 0.4 + 1.43 + 3.13 = 6.94 ms
t_issueGPUCommands(6) + t_pumpOSMessages(7) + t_processInput(7) + t_Update(7) + t_wait(7) = 1.61 + 0.51 + 0.35 + 1.28 + 3.19 = 6.94 ms

However, Unity queries time at the beginning of Update. Because of that, any variation in time it takes to issue rendering commands, pump OS messages or process input events will throw off the result.

A simplified Unity main thread loop can be defined like this:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    SampleTime(); // We sample time here!
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

The solution to this problem seems to be straightforward: just move the time sampling to after the wait, so the game loop becomes this:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    Update();
    WaitForRenderThread();
    SampleTime();
    IssueRenderingCommands();
}

However, this change doesn’t work correctly: rendering has different time readings than Update(), which has adverse effects on all sorts of things. One option is to save the sampled time at this point and update engine time only at the beginning of the next frame. However, that would mean the engine would be using time from before rendering the latest frame.

Since moving SampleTime() to after the Update() is not effective, perhaps moving the wait to the beginning of the frame will be more successful:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForRenderThread();
    SampleTime();
    Update();
    IssueRenderingCommands();
}

Unfortunately, that causes another issue: now the render thread must finish rendering almost as soon as requested, which means that the rendering thread will benefit only minimally from doing work in parallel.

Let’s look back at the frame timeline:

Unity enforces pipeline synchronization by waiting for the render thread each frame. This is needed so that the main thread doesn’t run too far ahead of what is being displayed on the screen. The render thread is considered to be “done working” when it finishes rendering and waits for a frame to appear on the screen. In other words, it waits for the back buffer to be flipped and become the front buffer. However, the render thread doesn’t actually care when the previous frame was displayed on the screen – only the main thread is concerned about it because it needs to throttle itself. So instead of having the render thread wait for the frame to appear on the screen, this wait can be moved to the main thread. Let’s call it WaitForLastPresentation(). The main thread loop becomes:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForLastPresentation();
    SampleTime();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

Time is now sampled just after the wait portion of the loop, so the timing will be aligned with the monitor’s refresh rate. Time is also sampled at the beginning of the frame, so Update() and Render() see the same timings.

It is very important to note that WaitForLastPresentation() does not wait for frame N-1 to appear on the screen. If that were the case, no pipelining would happen at all. Instead, it waits for frame N-QualitySettings.maxQueuedFrames to appear on the screen, which allows the main thread to continue without waiting for the last frame to complete (unless maxQueuedFrames is set to 1, in which case every frame must be completed before a new one starts).

Achieving stability: We need to go deeper!

After implementing this solution, delta time became much more stable than it was before, but some jitter and occasional variance still occurred. We depend on the operating system waking up the engine from sleep on time. This can take multiple microseconds and therefore introduce jitter to the delta time, especially on desktop platforms where multiple programs are running at the same time.

So what do you do now? It turns out that most graphics APIs/platforms allow you to extract the exact timestamp of a frame being presented to the screen (or an off-screen buffer). For instance, Direct3D 11 and 12 have IDXGISwapChain::GetFrameStatistics, while macOS provides CVDisplayLink. There are a few downsides with this approach, though:

  • You need to write separate extraction code for every supported graphics API, which means that time measurement code is now platform-specific and each platform has its own separate implementation. Since each platform behaves differently, a change like this runs the risk of catastrophic consequences.
  • With some graphics APIs, to obtain this timestamp, VSync must be enabled. This means if VSync is disabled, the time must still be calculated manually.

However, I believe this approach is worth the risk and effort. The result obtained using this method is very reliable and produces the timings that directly correspond to what is seen on the display.

Since we now extract sampling time from the graphics API, the WaitForLastPresentation() and SampleTime() steps are combined into a new step:

while (!ShouldQuit())
{
    PumpOSMessages();
    UpdateInput();
    WaitForLastPresentationAndGetTimestamp();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

With that, the problem of jittery movement is solved.

Input latency considerations

Input latency is a tricky subject. It’s not easy to measure accurately, and it can be introduced by many different factors: input hardware, the operating system, drivers, the game engine, game logic, and the display. Here I’ll focus on the game engine’s contribution to input latency, since Unity can’t affect the other factors.

Engine input latency is the time between an input OS message becoming available and the image that reflects it getting dispatched to the display. Given the main thread loop, you can trace input latency through the code (assuming QualitySettings.maxQueuedFrames is set to 2):

PumpOSMessages(); // Pump input OS messages for frame 0
UpdateInput(); // Process input for frame 0
——————— // Earliest input event from the OS that didn’t become part of frame 0 arrives here!
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -2 to appear on the screen
Update(); // Update game state for frame 0
WaitForRenderThread(); // Wait until all commands from frame -1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 0 to the rendering thread
PumpOSMessages(); // Pump input OS messages for frame 1
UpdateInput(); // Process input for frame 1
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -1 to appear on the screen
Update(); // Update game state for frame 1, finally seeing the input event that arrived
WaitForRenderThread(); // Wait until all commands from frame 0 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 1 to the rendering thread
PumpOSMessages(); // Pump input OS messages for frame 2
UpdateInput(); // Process input for frame 2
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 0 to appear on the screen
Update(); // Update game state for frame 2
WaitForRenderThread(); // Wait until all commands from frame 1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 2 to the rendering thread
PumpOSMessages(); // Pump input OS messages for frame 3
UpdateInput(); // Process input for frame 3
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 1 to appear on the screen. This is where the changes from our input event appear.

Phew, that’s it! Quite a lot of things happen between input being available as an OS message and its results being visible on the screen. If Unity is not dropping frames, and the game loop spends most of its time waiting rather than processing, the worst-case input latency from the engine at a 144 Hz refresh rate is 4 * 6.94 ms = 27.76 ms, because we wait for previous frames to appear on the screen four times (that is, four refresh intervals).

You can improve latency by pumping OS events and updating input after waiting for the previous frame to be displayed:

while (!ShouldQuit())
{
    WaitForLastPresentationAndGetTimestamp();
    PumpOSMessages();
    UpdateInput();
    Update();
    WaitForRenderThread();
    IssueRenderingCommands();
}

This eliminates one wait from the equation, and now the worst-case input latency is 3 * 6.94 = 20.82 ms.

It is possible to reduce input latency even further by reducing QualitySettings.maxQueuedFrames to 1 on platforms that support it. Then, the chain of input processing looks like this:

——————— // Input event arrives from the OS!
WaitForLastPresentationAndGetTimestamp(); // Wait for frame -1 to appear on the screen
PumpOSMessages(); // Pump input OS messages for frame 0
UpdateInput(); // Process input for frame 0
Update(); // Update game state for frame 0 with the input event that we are measuring
WaitForRenderThread(); // Wait until all commands from frame -1 are submitted to the GPU
IssueRenderingCommands(); // Send rendering commands for frame 0 to the rendering thread
WaitForLastPresentationAndGetTimestamp(); // Wait for frame 0 to appear on the screen. This is where the changes from our input event appear.

Now, the worst-case input latency is 2 * 6.94 = 13.88 ms. This is as low as we can possibly go when using VSync.

Warning: Setting QualitySettings.maxQueuedFrames to 1 will essentially disable pipelining in the engine, which will make it much harder to hit your target frame rate. Keep in mind that if you do end up running at a lower frame rate, your input latency will likely be worse than if you kept QualitySettings.maxQueuedFrames at 2. For instance, if it causes you to drop to 72 frames per second, your input latency will be 2 * 1/72 s ≈ 27.8 ms, which is worse than the previous latency of 20.82 ms. If you want to make use of this setting, we suggest you add it as an option to your game settings menu so gamers with fast hardware can reduce QualitySettings.maxQueuedFrames, while gamers with slower hardware can keep the default setting.
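
As a rough sketch of what wiring up that option could look like (the class and method names here are illustrative, not part of the Unity API):

using UnityEngine;

// Example settings-menu hook: players with fast hardware can trade pipelining
// for lower input latency; everyone else keeps the default of 2.
public static class LatencySettings
{
    public static void SetLowLatencyMode(bool enabled)
    {
        QualitySettings.maxQueuedFrames = enabled ? 1 : 2;
    }
}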

VSync effects on input latency

Disabling VSync can also help reduce input latency in certain situations. Recall that input latency is the amount of time that passes between an input becoming available from the OS and the frame that processed the input being displayed on the screen or, as a mathematical equation:

latency = t_display - t_input

Given this equation, there are two ways to reduce input latency: either make t_display lower (get the image to the display sooner) or make t_input higher (query input events later).

Sending image data from the GPU to display is extremely data-intensive. Just do the math: to send a 2560×1440 non-HDR image to the display 144 times per second requires transmitting 12.7 gigabits every second (24 bits per pixel * 2560 * 1440 * 144). This data cannot be transmitted in an instant: the GPU is constantly transmitting pixels to the display. After each frame is transmitted, there’s a brief break, and transmitting the next frame begins. This break period is called VBLANK. When VSync is enabled, you’re essentially telling the OS to flip the frame buffer only during VBLANK:

When you turn VSync off, the back buffer gets flipped to the front buffer the moment rendering is finished, which means that the display will suddenly start taking data from the new image in the middle of its refresh cycle, causing the upper part of the frame to be from the older frame and the lower part of the frame to be from the newer frame:

This phenomenon is known as “tearing.” Tearing allows us to reduce t_display for the lower part of the frame, sacrificing visual quality and animation smoothness for input latency. This is especially effective when the game’s frame rate is lower than the display’s refresh rate, which allows a partial recovery of the latency caused by a missed VSync. It is also more effective in games where the upper part of the screen is occupied by UI or a skybox, which makes the tearing harder to notice.

Another way disabling VSync can help reduce input latency is by increasing t_input. If the game is capable of rendering at a much higher frame rate than the refresh rate (for instance, at 150 fps on a 60 Hz display), then disabling VSync will make the game pump OS events several times during each refresh interval, which will reduce the average time they’re sitting in the OS input queue waiting for the engine to process them.

Keep in mind that disabling VSync should ultimately be up to the player of your game since it affects visual quality and can potentially cause nausea if the tearing ends up being noticeable. It is a best practice to provide a settings option in your game to enable/disable it if it’s supported by the platform.
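
A minimal sketch of such a toggle (again, the class and method names are illustrative):

using UnityEngine;

// Example settings-menu hook for toggling VSync at runtime.
public static class DisplaySettings
{
    public static void SetVSync(bool enabled)
    {
        // QualitySettings.vSyncCount: 0 disables VSync, 1 syncs to every VBLANK.
        QualitySettings.vSyncCount = enabled ? 1 : 0;
    }
}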

Conclusion

With this fix implemented, Unity’s frame timeline looks like this:

But does it actually improve the smoothness of object movement? You bet it does!

We ran the Unity 2020.1 demo shown earlier in this post in Unity 2020.2.0b1. Here is the resulting slow-motion video:

 

This fix is available in the 2020.2 beta for these platforms and graphics APIs:

  • Windows, Xbox One, Universal Windows Platform (D3D11 and D3D12)
  • macOS, iOS, tvOS (Metal)
  • PlayStation 4
  • Switch

We plan to implement this for the remainder of our supported platforms in the near future.

Follow this forum thread for updates, and let us know what you think about our work so far.

Further reading on frame timing

Unity 2020.2 beta and beyond

If you’re interested in learning more about what’s available in 2020.2, check out the beta blog post and register for the Unity 2020.2 beta webinar. We’ve also recently shared our roadmap plans for 2021.

29 replies on “Fixing Time.deltaTime in Unity 2020.2 for smoother gameplay: What did it take?”

Why did you start with this code:

void Update()
{
    transform.position += m_Velocity * Time.deltaTime;
}

While **everywhere** – including the official Unity documentation – you can read that you should **NOT** change/calculate physics in Update, but in FixedUpdate, using Time.fixedDeltaTime.

Why is that? How can we trust an article beginning with that? What opinion/best practice should we trust in Unity?

Thank you for addressing this issue! It’s remarkable how intricate deltaTime calculation can be. Great work by the team!

Speaking of deltaTime, when you have a game with VSync on, Application.targetFrameRate is 60 and you’re playing on a 144 Hz monitor, the game will run as if it were in fast-forward. Is there any chance there might be a “cappedDeltaTime” of sorts exposed to Animator, Timeline, ParticleSystem, Shaders etc., or a more global setting to set the FPS limit to Application.targetFrameRate with VSync on?

Thanks for all the hard work. Thanks for taking the time to detail the issue and solution. Looking forward to the fix.

Good that it’s fixed, but with Unity 2020b5 I’ve run into a bug – Time.deltaTime cannot be larger than 0.1f for some reason. So if my framerate is lower than 10 fps (and I’m trying to create a benchmark, where such low framerates are a common thing on low-end hardware), it’s not working :(

This improvement is really welcome.
Unfortunately, I am one of the people who spent a good amount of time trying to figure out why my delta time was so inconsistent.
I am not angry that it took so long; what is weird is the fact that the issue was only recently acknowledged as a problem with the engine.
I have seen many topics talking about it, and the most common outcome was: “Your code is wrong, it’s not Unity’s fault”.
The main issue I see is that it took 10 years to acknowledge there is an issue in the first place.
How many things like that cost developers time because the only answer they can find on the internet is: “It’s the developer’s problem, not a Unity issue”?
I hope we won’t see anything else like this go down that path.

Nice work! Sometimes it’s amazing how you can still find & fix issues like these that have been around for 10 years (and using ‘standard’ methods that have been around for ages). In my engine I found the same issue and tried fixed timesteps (assuming you’d have no dropped frames, which is of course not really realistic). Unfortunately my whole timing structure was based on integer milliseconds, and doing 60 Hz meant alternately jumping 16 and 17 ms at times. :-) Similar issues will come up anyway due to the 1 ms difference, although subtle.

The depth of this article!! Thanks so much for breaking this down and for doing the work to make delta time smoother. Please more blog posts like this :)

This is very very nice!

The following 2 questions come to my mind:

1) At the beginning of the post it’s mentioned that several assumptions are made (graphics jobs disabled, etc.):
does this mean the fix only works under those settings, or was it mentioned just to keep the same game loop structure across the examples?

2) Does the fix work regardless of whether we are using URP or the Built-in Renderer?

What a nice, in-depth article. I for one really appreciate the time that went into both implementing these improvements and writing such a detailed report.

Nice work. Now, if you could just figure out how to break the rendering buffer up into layers so user input could jump to the next rendering frame set, we could almost eradicate input latency, or at least get it down to nearly next-frame timings. E.g., the player shoots and a muzzle flash is displayed in the next frame set to render.

This is actually what the VR headset drivers do. They implement a technique called late latching: the VR headset orientation is updated in GPU memory after the GPU command buffer has been recorded, just before the GPU renders that queued frame. This is similar to the technique used by the hardware mouse cursor.

This oculus blog post describes the technique in more detail:
https://developer.oculus.com/blog/optimizing-vr-graphics-with-late-latching/

This technique works well with hardware devices that you can poll at precise time intervals (a modern mouse can be polled 1000 times per second). However, the GPU can’t poll the CPU to give it the exact game state at this exact moment. It takes significant time for the CPU to calculate one frame, so we must start calculating it in advance. But there’s no guarantee that a full CPU frame is ready when the GPU would need to late-latch the data. It might be ready every other time and miss the deadline every other frame. This would cause massive judder in the animation. The CPU and the GPU run asynchronously for a reason (both can run at the same time without waiting). If we add a wait for late-latched data, we lose this asynchronous execution. It’s not an easy problem to solve in the generic case. High-frequency hardware sensors such as a mouse or a VR headset can be made to work acceptably well, but even in these cases, if you make sudden large movements, you might turn the camera too much and see missing objects at the screen edges. The visibility culling algorithm generated draw calls 1-2 frames ago; just updating the camera matrix isn’t enough.

Sad that it took you 10+ years to finally look into it. Imagine how many users suffered because of this over those years.
But hey! At least you have something to write about in the blog!

“What is delta time and why is it important?”
A great headline for a blog post from an engine team.
It reflects quite well how Unity perceives its users.

Is this response really necessary?
Have you addressed every possible issue you are aware of in your own projects?
Did Unity work on any other features other than this issue?
Is it possible there were other priorities that superseded this work?
Have any successful games been shipped with Unity with the delta time issue?
Do you evaluate your own work with the same level of negativity?

When commenting, ask yourself what your intended outcome is. If the answer is ultimately to hurt someone, just don’t. Please.

Yes, this new approach hasn’t been in the engine for the past 10 years until now. But fun fact: it hasn’t been in most competing game engines out there either, at least not in the ones you can get access to. What Unity has done in the past has been the pretty standard way of doing it. So you are essentially blaming Unity here for doing something in the past that most engines have done and still do for delta time measurement.

Now that they’ve made it better, Unity actually has an edge over competing engines in this regard, yet you manage only to leave a negative comment? I guess some people are never happy. I know I’m thrilled about this change :)

I’d be thrilled about this change if I hadn’t been using hacks to get around the deltatime issues for seven years now. The simple reality is that not addressing known issues in the engine for years, up to a decade in time, is the standard for Unity and it’s what’s caused a lot of people, myself included, to give up on the engine entirely.
