Optimizing loading performance: Understanding the Async Upload Pipeline

October 8, 2018

Nobody likes loading screens. Did you know that you can quickly adjust Async Upload Pipeline (AUP) parameters to significantly improve your loading times? This article details how meshes and textures are loaded through the AUP. This understanding could help you speed up loading time significantly – some projects have seen over 2x performance improvements!

Read on to learn how the AUP works from a technical standpoint and what APIs you should be using to get the most out of it.

Try it Out

The latest, most optimized implementation of the Async Upload Pipeline is available in the 2018.3 beta.




First, let’s take a detailed look at when the AUP is used and how the loading process works.

When is the Async Upload Pipeline used?

Prior to 2018.3, the AUP only handled textures. Starting with 2018.3 beta, the AUP now loads textures and meshes, but there are some exceptions. Textures that are read/write enabled, or meshes that are read/write enabled or compressed, will not use the AUP. (Note that Texture Mipmap Streaming, which was introduced in 2018.2, also uses AUP.)

How the loading process works

During the build process, the Texture or Mesh Object is written to a serialized file and the large binary data (texture or vertex data) is written to an accompanying .resS file. This layout applies to both player data and asset bundles. Separating the object from its binary data allows for faster loading of the serialized file (which will generally contain small objects), and it enables streamlined loading of the large binary data from the .resS file afterwards. When the Texture or Mesh Object is deserialized, it submits a command to the AUP’s command queue. Once that command completes, the Texture or Mesh data has been uploaded to the GPU and the object can be integrated on the main thread.

Figure: Layout of mesh and texture data when serialized for a build.

During the upload process, the large binary data from the .resS file is read to a fixed-sized ring buffer. Once in memory, the data is uploaded to the GPU in a time-sliced fashion on the render thread. The size of the ring buffer and the duration of the time-slice are the two parameters that you can change to affect the behavior of the system.

The Async Upload Pipeline has the following process for each command:

  1. Wait until the required memory is available in the ring buffer.
  2. Read data from the source .resS file to the allocated memory.
  3. Perform post-processing (texture decompression, mesh collision generation, per-platform fixup, etc.).
  4. Upload in a time-sliced manner on the render thread.
  5. Release ring buffer memory.

Multiple commands can be in progress simultaneously, but all must allocate their required memory out of the same shared ring buffer. When the ring buffer fills up, new commands will wait; this waiting will not cause main-thread blocking or affect frame rate, it simply slows the async loading process.
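
The interaction between the shared ring buffer and the per-frame upload can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Unity's implementation: the function name `simulate_uploads` and its parameters are hypothetical, the time slice is approximated as a fixed number of megabytes uploaded per frame, reads are assumed to complete instantly, and post-processing is ignored.

```python
from collections import deque


def simulate_uploads(command_sizes_mb, buffer_mb, uploaded_per_frame_mb):
    """Count the frames needed to push every command through a shared buffer.

    Hypothetical model: commands are admitted in FIFO order while they fit
    in the buffer (steps 1-2), the render thread uploads up to
    `uploaded_per_frame_mb` each frame (step 4), and a command's memory is
    released only once it has been fully uploaded (step 5).
    """
    if uploaded_per_frame_mb <= 0:
        raise ValueError("per-frame upload budget must be positive")
    if any(size > buffer_mb for size in command_sizes_mb):
        raise ValueError("oversized commands are not modeled here")

    waiting = deque(command_sizes_mb)
    in_flight = deque()  # entries are [total_size, remaining_to_upload]
    free_mb = buffer_mb
    frames = 0

    while waiting or in_flight:
        # Admit commands while the head of the queue fits in the buffer.
        while waiting and waiting[0] <= free_mb:
            size = waiting.popleft()
            free_mb -= size
            in_flight.append([size, size])

        # Spend this frame's upload budget on the admitted data.
        budget = uploaded_per_frame_mb
        while budget > 0 and in_flight:
            command = in_flight[0]
            step = min(budget, command[1])
            command[1] -= step
            budget -= step
            if command[1] == 0:
                free_mb += command[0]  # release the buffer memory
                in_flight.popleft()

        frames += 1

    return frames
```

In this model, sixteen 2MB commands with an 8MB-per-frame upload budget take 8 frames with a 4MB buffer but only 4 frames with a 16MB buffer: the small buffer never holds enough data to fill the frame's upload budget.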

A summary of these impacts is as follows:

Load Pipeline Comparison

                 | Without AUP                                                              | AUP                                       | Impact on you
-----------------+--------------------------------------------------------------------------+-------------------------------------------+-------------------------------
Memory Usage     | Allocate as data is read out of default heap (high memory watermarks)   | Fixed-size ring buffer                    | Reduced high memory watermarks
Upload Process   | Upload as data is available                                              | Amortized uploading with fixed time slice | Hitchless uploading
Post Processing  | Performed on loading thread (blocks loading thread)                      | Performed on jobs in background           | Faster loading

What public APIs are available to adjust loading parameters

To take full advantage of the AUP in 2018.3, there are three parameters that can be adjusted at runtime for this system:

  • QualitySettings.asyncUploadTimeSlice – The amount of time in milliseconds spent uploading textures and mesh data on the render thread for each frame. When an async load operation is in progress, the system will perform two time slices of this size. The default value is 2ms. If this value is too small, you could become bottlenecked on texture/mesh GPU uploading. A value too large, on the other hand, might result in framerate hitching.
  • QualitySettings.asyncUploadBufferSize – The size of the ring buffer in megabytes. When the upload time slice occurs each frame, we want to be sure that we have enough data in the ring buffer to utilize the entire time slice. If the ring buffer is too small, the upload time slice will be cut short. The default was 4MB in 2018.2 but has been increased to 16MB in 2018.3.
  • QualitySettings.asyncUploadPersistentBuffer – Introduced in 2018.3, this flag determines whether the upload ring buffer is kept allocated after all pending reads are complete. Allocating and deallocating this buffer can often cause memory fragmentation, so it should generally be left at its default (true). If you really need to reclaim memory when you are not loading, you can set this value to false.

These settings can be adjusted through the scripting API or via the QualitySettings menu.

Example workflow

Let’s examine a workload with lots of textures and meshes being uploaded through the Async Upload Pipeline using the default 2ms time slice and a 4MB ring buffer (the 2018.2 default size). Since we’re loading, we get two time slices per render frame, so we should have 4 milliseconds of upload time. Looking at the profiler data, we only use about 1.5 milliseconds. We can also see that immediately after the upload, a new read operation is issued now that memory is available in the ring buffer. This is a sign that a larger ring buffer is needed.
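
The arithmetic in this paragraph can be written out as a short sketch. The function names are hypothetical and this is not Unity code; it only encodes the rule stated earlier that two time slices run per frame while an async load is in progress, and one otherwise.

```python
def upload_budget_ms(time_slice_ms, loading):
    """Upload time available per render frame, in milliseconds.

    Two slices run per frame while an async load is in progress,
    otherwise one.
    """
    slices_per_frame = 2 if loading else 1
    return slices_per_frame * time_slice_ms


def slice_utilization(used_ms, time_slice_ms, loading=True):
    """Fraction of the per-frame upload budget that was actually used."""
    return used_ms / upload_budget_ms(time_slice_ms, loading)
```

With the numbers above, `upload_budget_ms(2, True)` gives 4 milliseconds and `slice_utilization(1.5, 2)` gives 0.375, meaning well under half of the available upload time is being used.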

Let’s try increasing the ring buffer and, since we’re in a loading screen, the upload time slice as well. Here’s what a 16MB ring buffer and a 4-millisecond time slice look like:

Now we can see that we are spending almost all our render thread time uploading, and just a short time between uploads rendering the frame.

Below are the loading times of the sample workload with a variety of upload time slices and ring buffer sizes. Tests were run on a MacBook Pro with a 2.8GHz Intel Core i7 running OS X El Capitan. Upload speeds and I/O speeds will vary on different platforms and devices. The workload is a subset of the Viking Village sample project that we use internally for performance testing. Because other objects are being loaded at the same time, we can’t isolate the precise performance gain of each combination of values. It’s safe to say in this case, however, that texture and mesh loading is at least twice as fast when switching from the 4MB/2ms settings to the 16MB/4ms settings.

Experimenting with these parameters outputs the following results.

To optimize loading times for this particular sample project, we should, therefore, configure settings like this:

Takeaways and recommendations

General recommendations for optimizing loading speed of textures and meshes:

  • Choose the largest QualitySettings.asyncUploadTimeSlice that doesn’t result in dropping frames.
  • During loading screens, temporarily increase QualitySettings.asyncUploadTimeSlice.
  • Use the profiler to examine the time slice utilization. The time slice will show up as AsyncUploadManager.AsyncResourceUpload in the profiler. Increase QualitySettings.asyncUploadBufferSize if your time slice is not being fully utilized.
  • Things will generally load faster with a larger QualitySettings.asyncUploadBufferSize, so if you can afford the memory, increase it to 16MB or 32MB.
  • Leave QualitySettings.asyncUploadPersistentBuffer set to true unless you have a compelling reason to reduce your runtime memory usage while not loading.
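
The "temporarily increase during loading screens" recommendation amounts to a save, override and restore pattern. The sketch below models that pattern in Python rather than in Unity's scripting API: `UploadSettings` is a hypothetical stand-in for the three QualitySettings properties (with the 2018.3 defaults quoted in this article), and `loading_screen` is an illustrative helper, not an existing function.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class UploadSettings:
    """Hypothetical stand-in for the three QualitySettings properties."""
    async_upload_time_slice_ms: int = 2
    async_upload_buffer_size_mb: int = 16
    async_upload_persistent_buffer: bool = True


@contextmanager
def loading_screen(settings, time_slice_ms):
    """Raise the upload time slice for the duration of a loading screen."""
    previous = settings.async_upload_time_slice_ms
    settings.async_upload_time_slice_ms = time_slice_ms
    try:
        yield settings
    finally:
        # Restore the gameplay value even if loading fails part-way.
        settings.async_upload_time_slice_ms = previous
```

Restoring the previous value in a `finally` block keeps a failed or cancelled load from leaving the larger time slice active during gameplay, where it could cause frame rate hitching.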


Q: How often will time-sliced uploading occur on the render thread?

  • Time-sliced uploading will occur once per render frame, or twice during an async load operation. VSync affects this pipeline. While the render thread is waiting for a VSync, you could be uploading. If you are running at 16ms frames and then one frame goes long, say 17ms, you will end up waiting 15ms for the next VSync. In general, the higher the frame rate, the more frequently upload time slices will occur.
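
The 15ms figure follows from rounding the frame time up to the next VSync boundary. A minimal sketch of that calculation, using a hypothetical helper name and assuming a fixed VSync interval:

```python
import math


def vsync_wait_ms(frame_ms, vsync_interval_ms):
    """Idle time until the next VSync after a frame that took `frame_ms`."""
    intervals = math.ceil(frame_ms / vsync_interval_ms)
    return intervals * vsync_interval_ms - frame_ms
```

For a 17ms frame with a 16ms interval, the next VSync falls at 32ms, so `vsync_wait_ms(17, 16)` is 15. A frame that finishes exactly on the boundary waits 0ms.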

Q: What is loaded through the AUP?

  • Textures that are not read/write-enabled are uploaded through the AUP.
  • As of 2018.2, texture mipmaps are streamed through the AUP.
  • As of 2018.3, meshes are also uploaded through the AUP so long as they are uncompressed and not read/write enabled.

Q: What if the ring buffer is not large enough to hold the data being uploaded (for example, a really large texture)?

  • Upload commands that are larger than the ring buffer will wait until the ring buffer is fully consumed, then the ring buffer will be reallocated to fit the large allocation. Once the upload is complete, the ring buffer will be reallocated to its original size.
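
The behavior described in this answer can be sketched as a small capacity model. The class `UploadBuffer` and its methods are hypothetical; the sketch assumes that "fully consumed" means no other command still holds buffer memory, and that the buffer returns to its configured size as soon as the oversized command releases its memory.

```python
class UploadBuffer:
    """Toy model of the shared upload buffer's capacity (sizes in MB)."""

    def __init__(self, base_mb):
        self.base_mb = base_mb      # configured buffer size
        self.capacity_mb = base_mb  # current, possibly enlarged, size
        self.used_mb = 0

    def try_allocate(self, size_mb):
        """Return True if the command got memory, False if it must wait."""
        if size_mb > self.base_mb:
            # Oversized command: wait until the buffer has drained,
            # then grow the buffer to fit it.
            if self.used_mb > 0:
                return False
            self.capacity_mb = size_mb
        elif self.used_mb + size_mb > self.capacity_mb:
            return False
        self.used_mb += size_mb
        return True

    def release(self, size_mb):
        """Give memory back; shrink to the configured size once drained."""
        self.used_mb -= size_mb
        if self.used_mb == 0:
            self.capacity_mb = self.base_mb
```

With a 16MB buffer holding an 8MB command, a 64MB request has to wait. Once the 8MB command is released, the request succeeds and the capacity grows to 64MB, then drops back to 16MB when the large upload finishes.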

Q: How do synchronous load APIs work? For example, Resources.Load, AssetBundle.LoadAsset, etc.

  • Synchronous loading calls use the AUP and will essentially block the main thread until the async upload operation completes. The type of loading API used is not relevant.

Tell us what you think

We’re always looking for feedback. Let us know what you think in the comments or on the Unity 2018.3 beta forum!

32 replies on “Optimizing loading performance: Understanding the Async Upload Pipeline”

Thank you for good information.
I want to apply this but I can’t find asyncUpload info in profiler.
async mesh upload, resource upload etc I can’t see anything.
I use LoadSceneAsync in coroutine.
Is there setting to use async upload?
I use Unity 2018.3.0b9 and 2017.3.1p4
Thank you

Still confused about the time slice. Is there any useful resource for understanding the time slice in the asynchronous upload process?
For example, why was the time slice feature added, how does this value affect the asynchronous upload, and what is the relationship between the time slice and the frame rate?

About the ring buffer… Do I properly understand that having a larger ring buffer would mean that stutters are more likely to occur during the upload time?

Thanks so much!
Now the big problem is asynchronous loading of shaders: when shaders first appear on screen, it takes 100-600ms on the render thread on an iPhone 6, which makes the game totally jerky…

You can try our project: MadOut2 BigCityOnline

We need the ability to load and compile shaders asynchronously too!

Awesome feature. Question to Unity staff: how does this feature affect multi scene async loading, if it does? I would like to load in sections of my level as I move around. Also, what API is there to manage meshes and audio (if any)? Texture mip streaming was a great start but there is not much information about everything else.

Thanks again!


thank you very much for this post, great information!

You wrote “the higher the frame rate, the more frequently upload time slices will occur”. Does this mean if I turn off VSync and set the applicationTargetFramerate as high as possible, it affects the loading time in a positive fashion?

I’m asking, because I did the exact opposite. I reduced framerate to 20fps and turned on VSync during loading screens, thinking it would give Unity more resources to actually load scenes, assets and integrate those faster. I thought I trade faster loading for more hiccups in framerate.

Thanks for your answer in advance.

What about triggering the upload of resources? Is this still bound to renderers becoming visible on cameras and does it still require “trick”s like rendering one frame behind a full-screen overlay? Or are there proper APIs for ensuring resources that I know will be needed can be loaded/uploaded completely during a loading screen?

As a console developer who is stuck with Unity 2017 … it pisses me off that all the cool stuff is only in Unity 2018, without any support for 2017. It makes me regret that I didn’t switch to Unreal Engine.

Uh… how does it screw over older versions when you can just… download all the previous versions if you want to?

Soooo… if I understand your post… are you saying: “INJUSTICE!!! I picked and am clinging to an older version of your software and it doesn’t do what the newer version does – you suck, It would never happen in other superior engines, I demand support for everything everywhere”. Dude, either get with the newer versions or go to Unreal and deal with their pros and cons there. Porting your project to a newer version of Unity will probably be easier on your soul than porting it to Unreal… You are always limited to the functionality offered in your version, maybe sometimes you get support later on, but they are primarily trying to make new stuff and implement it in the newer versions and move forward so you can get new S*!t in the newer versions FASTER and more robust instead of using countless resources focusing on compatibility issues for people using outdated versions. Maybe you wanna work on your new VR project and you will choose Unity 4.2, but you want job system and ECS and VR and basically everything? And I bet if I go find and install several versions of Unreal I will probably find dozens of examples with similar/same problems. I’m not sure why your post triggered me but I find your logic a bit faulty. Or just perhaps Unity might be looking for people that are willing to make packages to offer support of new functionalities for older versions and by what I read you are the right man/woman/apache for the job.

Any source to back this up? First time I hear about it but I haven’t dabbled in consoles with Unity yet.
