
The Unity 2020.2 release features several optimizations that are now available for testing in beta. Read on to see where you can expect to see major speed-ups and get behind-the-scenes insights into what we’ve done to make these improvements.

Writing high-performance code is an integral part of efficient software development and has always been part of the development process at Unity. Two years ago, we took the bold step of forming a dedicated Optimization Team to focus on performance as a feature in its own right, which I now have the privilege of leading. See below for an overview of what we’ve got for the Unity 2020.2 release, and check out the Unity 2020.2 beta release notes for a list of all the other improvements.

Nested Prefab optimizations

The Optimization Team worked closely with the original developers of this feature, the Scene Management Team, on various optimizations to Nested Prefabs, including: 

  • Reduced modifications of dynamic array of Properties
  • Changed the sorting strategy for Modification array
  • Changed to using a hash set for faster lookups

When loading instances of Prefabs, we apply modifications to the various properties that are different in the instance, compared to the original Prefab asset. These are PropertyModifications. When merging PropertyModifications, there are updates and insertions to a dynamic array, and as this struct is very large, this quickly becomes costly. By not erasing the PropertyModifications from the new Property array, but keeping track of the already updated properties, the method was sped up by 60x (from 3,300 ms to 54 ms on a test project).
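To make the idea concrete, here is a minimal Python sketch of merging modifications while keeping track of which incoming entries have already been applied, rather than erasing them from the incoming array. It is not Unity's C++ implementation: the `PropertyModification` fields, the `merge_modifications` function, and the (target, path) key are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PropertyModification:
    # Hypothetical, simplified stand-in for the engine's (much larger) struct.
    target_id: int
    property_path: str
    value: str


def merge_modifications(existing, incoming):
    """Return `existing` updated by `incoming`, followed by the new insertions."""
    # Index the incoming modifications once, so each lookup is a hash lookup.
    incoming_by_key = {(m.target_id, m.property_path): m for m in incoming}
    applied = set()  # keys already used as updates; `incoming` is never mutated
    merged = []
    for mod in existing:
        key = (mod.target_id, mod.property_path)
        replacement = incoming_by_key.get(key)
        if replacement is not None:
            merged.append(replacement)
            applied.add(key)
        else:
            merged.append(mod)
    # Append the incoming modifications that did not update an existing entry.
    merged.extend(
        m for m in incoming if (m.target_id, m.property_path) not in applied
    )
    return merged
```

The point of the sketch is that nothing is removed from the middle of `incoming`, so no large elements have to be shifted around; the small `applied` set carries the bookkeeping instead.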

When updating PropertyModifications, the modification list might not be sorted correctly, so a new sort is needed. Previously this was done by sorting the old modifications into a new array. On a test project, this took 11 seconds. By instead sorting containers that point to the modifications, rather than moving the modifications themselves, the sorting was brought down to 44 ms (250x faster), and in the case where no modifications were needed down to 11 ms (800x faster).
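As a rough illustration of sorting references rather than the records themselves, the Python sketch below sorts a list of small integer indices and leaves the original list untouched. The function name `sorted_order` is hypothetical, and the early exit for already-ordered input is an assumption about what a cheap "nothing to do" path could look like, not a description of the engine code.

```python
def sorted_order(modifications, key):
    """Return indices that visit `modifications` in sorted order without moving them."""
    keys = [key(m) for m in modifications]
    # Fast path (assumed): if the keys are already in order, no sort is needed.
    if all(keys[i] <= keys[i + 1] for i in range(len(keys) - 1)):
        return list(range(len(modifications)))
    # Otherwise sort the small indices, not the large records themselves.
    return sorted(range(len(modifications)), key=keys.__getitem__)
```

Callers then walk the records through the returned indices, for example `[mods[i] for i in sorted_order(mods, key=...)]`, so the cost of a swap no longer depends on the size of each record.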

Additionally, searching for the propertyPath in the list of modifications was sped up by 50x (from 300 ms to 6 ms) by changing the container to a hash set. This gives an overall optimization in generating property diffs.

ScriptedImporter optimization

  • Optimized a nested loop of linear searches in RegisterScriptedImporters

Database scalability tests showed that the performance of the Editor’s scripted importer registration function scaled badly as the number of importers being registered increased. The function was optimized by storing importers in a dictionary keyed by file extension, to speed up searching for conflicts. Overall, the optimized function was found to be from 12 to over 800 times faster when processing 100 to 5,000 importers (for overall improvement, see the graph on the right).
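The change from nested linear searches to a dictionary lookup can be sketched in a few lines of Python. This is only an illustration of the technique: the function name `find_extension_conflicts`, the importer names, and the (name, extensions) input shape are all made up for the example and do not reflect the real RegisterScriptedImporters code.

```python
def find_extension_conflicts(importers):
    """importers: iterable of (importer_name, [extensions]).

    Returns {extension: [importer names]} for every extension claimed more than once.
    """
    by_extension = {}
    for name, extensions in importers:
        for ext in extensions:
            # One hash lookup per extension instead of rescanning all importers.
            by_extension.setdefault(ext, []).append(name)
    return {ext: names for ext, names in by_extension.items() if len(names) > 1}
```

With this layout, the work grows with the total number of registered extensions rather than with the square of the number of importers.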

Editor workflow optimizations

  • Reduced string copies and allocs in key Editor tasks
  • Optimized find references in scenes by using temp memory

The team replaced lots of slow string memory allocations with temp memory labels, for strings that only exist within a single frame. In practice, most strings in Unity exist only as local variables within a single function call, so they can use a fast memory allocator. Most string utility functions now use temp memory.

Some of this work has already landed in 2020.1. Below are some graphs showing how many slow string allocations were removed by this work. The graphs show the number of slow string allocations between 2020.1.0a12 and 2020.2.0a20 in different projects, over several iterations of improvements (x-axis is iteration, y-axis is the number of string allocations):

Another Editor workflow optimization came from FindReferencesInScene. Previously, right-clicking an asset in the Project View and selecting Find References in Scene could be slow in large scenes.

By avoiding excessive smart pointer dereferences, and making use of temp memory, we improved the speed for general use cases by approximately 10%.

In cases where the Scene was missing references, attempting to dereference their smart pointers meant trying to load an invalid filename from the filesystem every time. By detecting the invalid filename, and avoiding asking the filesystem to open a file that we know will fail, we reduced the search time by up to 3x.

Job System

  • JobQueue optimization giving a ~2x speed-up for scheduling of large parallel jobs

In collaboration with other internal teams, we have been working on optimizations to the JobQueue. This started with profiling the DOTS Sample project early in the year, which highlighted an unexpectedly high cost in AtomicStack::Pop(). Further investigation showed that the problem was in the memory management system in the JobQueue, especially for the JobInfo, which was using an AtomicStack as a memory management pool of items.

In the Data-Oriented Technology Stack (DOTS), there are ForEach jobs that require a Pop() per element in the ForEach for memory allocation and a Push() per element for memory deallocation. This leads to contention on the head item in the AtomicStack. 

Another team within Unity implemented a new atomic container specifically for the memory management use case, with support for allocating chunks of elements as a single operation to avoid the Pop() per element in a ForEach job.
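The difference between per-element and chunked allocation can be sketched as follows. This Python example is a deliberately simplified model, not the new container itself: a lock stands in for the contended atomic head, and the `ChunkedPool` class, its method names, and the `head_operations` counter are assumptions introduced only to make the contention visible.

```python
import threading


class ChunkedPool:
    """Free-list pool; a lock stands in for the atomic head of a lock-free stack."""

    def __init__(self, size):
        self._free = list(range(size))
        self._lock = threading.Lock()
        self.head_operations = 0  # how many times the shared head was touched

    def pop(self):
        # One synchronized operation per element (the old pattern).
        with self._lock:
            self.head_operations += 1
            return self._free.pop()

    def pop_chunk(self, count):
        # One synchronized operation for `count` elements (the chunked pattern).
        with self._lock:
            self.head_operations += 1
            if count > len(self._free):
                raise IndexError("pool exhausted")
            start = len(self._free) - count
            chunk = self._free[start:]
            del self._free[start:]
            return chunk

    def push_chunk(self, items):
        # Return a whole chunk to the pool with a single synchronized operation.
        with self._lock:
            self.head_operations += 1
            self._free.extend(items)
```

Allocating 100 items one at a time touches the shared head 100 times, while `pop_chunk(100)` touches it once, which is the kind of contention reduction the chunked approach is after.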

Early local performance testing results were encouraging, showing improved performance scaling of up to 2x as the number of job worker threads increases.

A member of the DOTS Team pointed out the use case where the new container should show a performance benefit, i.e., the JobQueue ForEach job. 

This example is on Android. Green is the new code, Red is the old code:

Optimized Camera.main

  • Eliminated unnecessary searching by storing a dedicated list of main camera nodes

Using Camera.main has always been ill-advised, because of the searching it performs. Previously, all GameObjects with tags were searched, and any GameObjects with a matching tag were pulled out into a temporary array. That temporary array was then searched, and the first object found with an enabled camera component was returned.

The new approach stores a dedicated list of objects with the MainCamera tag, and does not use a secondary array of potential matches. Instead, the list is queried directly, and as soon as a match is found, it is returned. All objects that are considered are objects with the MainCamera tag, so the chance of success is much higher.
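The two lookup strategies described above can be compared in a small Python sketch. Everything in it is a simplified stand-in: the `GameObject` dataclass, the `find_main_camera_by_scan` function, and the `MainCameraList` class are hypothetical names, and the real engine code is C++ with different data structures.

```python
from dataclasses import dataclass

MAIN_CAMERA_TAG = "MainCamera"


@dataclass
class GameObject:
    # Hypothetical, minimal model of a tagged object.
    name: str
    tag: str = "Untagged"
    has_enabled_camera: bool = False


def find_main_camera_by_scan(tagged_objects):
    """Old approach: scan every tagged object, copy the matches, then scan the copy."""
    candidates = [obj for obj in tagged_objects if obj.tag == MAIN_CAMERA_TAG]
    for obj in candidates:
        if obj.has_enabled_camera:
            return obj
    return None


class MainCameraList:
    """New approach: a dedicated list that only ever holds MainCamera-tagged objects."""

    def __init__(self):
        self._objects = []

    def add(self, obj):
        if obj.tag == MAIN_CAMERA_TAG:
            self._objects.append(obj)

    def remove(self, obj):
        if obj in self._objects:
            self._objects.remove(obj)

    def find(self):
        # Query the list directly and return as soon as a match is found.
        for obj in self._objects:
            if obj.has_enabled_camera:
                return obj
        return None
```

In a scene with thousands of tagged objects and a single main camera, `find()` inspects only the handful of entries in the dedicated list, whereas the scan visits every tagged object and builds a temporary list on each call.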

In contrived test cases containing 50,000 objects, we saw speed-ups of 21,000x to 51,000x.

In a Spotlight Team customer project (shown below) many hundreds of milliseconds vanished to nothing after this improvement.

Optimized RenderManager camera usage

  • Reduced the impact of sorting the cameras in RenderManager

Previously, every time a camera was added to, or removed from, the RenderManager class, a linked list was updated to keep the active cameras sorted by depth. Every change required memory allocations and pointer dereferences to check the depth of each camera, which could be slow with many cameras.

Now, the list is sorted only when ordering is needed – because only rendering cares about a sorted list. So during loading, cameras can be added/removed to a flat array (fewer allocations!), and the sorting happens only on the first time a sorted list is requested (during rendering). This test shows the performance improvement in the final timings (the orange bar on the far right is the new code).
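The "sort only when someone asks for a sorted list" pattern can be sketched with a dirty flag. The Python below is an illustration under assumptions, not the RenderManager code: the `CameraList` class, its method names, and the `sort_count` counter are invented for the example.

```python
class CameraList:
    """Flat array of cameras that is sorted lazily, on first request after a change."""

    def __init__(self):
        self._cameras = []    # flat list of (depth, name) tuples
        self._is_sorted = True
        self.sort_count = 0   # number of sorts actually performed

    def add(self, name, depth):
        # Appending is cheap; just remember that the order may now be wrong.
        self._cameras.append((depth, name))
        self._is_sorted = False

    def remove(self, name):
        # Removing keeps the relative order, so the sorted flag is left unchanged.
        self._cameras = [c for c in self._cameras if c[1] != name]

    def sorted_by_depth(self):
        # Sort only if something was added since the last request.
        if not self._is_sorted:
            self._cameras.sort(key=lambda c: c[0])
            self._is_sorted = True
            self.sort_count += 1
        return [name for _, name in self._cameras]
```

Adding many cameras during loading therefore costs one sort in total, performed the first time rendering asks for the ordered list, instead of one ordered insertion per camera.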

Texture loading optimizations

  • 2D texture and Cubemap creation occur on a thread on most graphics backends
  • 2D texture and Single Mip Cubemap loading optimized on consoles

To reduce hitches during texture loading, we moved 2D texture creation from the graphics thread to a worker thread. Unity 2019 releases included this optimization for most graphics backends. In Unity 2020.2, we fixed an additional case for DirectX 12, removing an 80 ms stall with an 8K texture.

We optimized Texture2D loading for consoles in Unity 2020.1 by moving the texture swizzling offline and loading directly to GPU memory. Performance gains are up to 30% for a 2D texture load, depending on texture size and platform.

In Unity 2020.2, we also optimized cubemap loading for consoles. For a 2K cubemap, some consoles saw savings of 30 ms on the job thread, cutting up to 15 ms of overall loading time for an individual texture.

Profile Analyzer 1.0.0

  • Released Profile Analyzer 1.0.x as a verified package in 2020.2

Profiling and analysis always guide our performance optimization efforts, using a combination of platform-specific profiling tools and Unity’s own custom Profiler. To assist our profiling efforts, we wrote the Profile Analyzer tool, which will be available as a verified package in 2020.2. 

Profile Analyzer 1.0.0 and 1.0.x updates include numerous quality-of-life bug fixes, a few performance optimizations, and the addition of some small features, such as:

  • Optional column to show the threads a marker appears in
  • Multi-selection support in the frame time graph UI
  • Sorting in the thread selection UI

The Profiler Team will be leading the future development of the tool.

Like what you see?

These updates are just part of our ongoing performance-enhancing contributions at Unity. We’re already hard at work to bring you more performance improvements in 2021. Please continue to send us feedback, and let us know in the comments if there are areas you would like us to focus on.

Take Unity 2020.2 beta for a test run

With Unity 2020.2, we’re continuing our 2020 focus on performance, stability and workflow improvements. Join the beta program and let us know what you think about all the upcoming updates on the 2020.2 beta forum.

 

25 replies on “New performance improvements in Unity 2020.2”

This is all great work! However, why, when running on my 32-core/64-thread Threadripper with 64 GB of RAM and an NVMe drive with read/write speeds of 7,000/5,000 MB/s, do I still wait just as long for anything to load? Texture imports don’t seem to benefit from more cores; there’s really no benefit from having more cores.

Texture imports do benefit from cores, but not the whole process. Texture compression is multi-threaded and should scale to all cores, as well as some parts of other texture processing (sRGB conversions, etc.).

We are optimizing texture import time as we speak; some optimizations are already in current 2021.1 alpha builds. More coming along the way.

That said, some parts of texture importing won’t ever scale to multiple cores — e.g. if source texture file is a .PNG file, then decoding of the PNG can’t be multi-threaded (it just can’t; that’s how PNG works). To get more core scaling there (and for other similar situations), Unity would have to import multiple assets in parallel — today it does not, but there’s also work underway to make that happen eventually.

Can we use Camera.main without any performance issue, or should we use the traditional way of declaring a Camera variable and assigning it via the Inspector?

Declaring a Camera variable and assigning it via the Inspector will still be marginally faster than accessing Camera.main, because all function calls into the Unity engine code come with a small amount of extra cost, vs. using a cached variable in your script.

Great work! But could you also fix/optimize the static/dynamic batch generator? Currently in one of my projects the batch generator breaks, because the object is affected by multiple forward lights even though I have only one light (spotlight) active in my scene… I get like 100 extra drawcalls because of this and I think I need to merge the meshes using code, because the meshes are small, and just let the GPU decide which of them are visible and which are not…

Great news! I hope the assets importing time will be minimized in Unity 2020.2, it takes about an hour whenever I open a project on a new machine.

Asset import optimizations (mostly focusing on textures & meshes right now) are happening as we speak. Meanwhile, using Accelerator (née Cache Server) might help to reduce import work across multiple machines.

Why not use open hashing for an easier search of objects with a certain tag like Laurent suggested? Let’s say you have n different tags (buckets) and m objects for each tag. You could store a list (call it BucketList) of n pointers to linked lists (call them List_i) where each one of those linked lists corresponds to a different tag and when you want to find an object with a certain tag, you calculate the hash of that tag which gives you a position in the list BucketList, and that tells you which List_i contains objects with that specific tag. Using a good hash function (like this one which operates on strings http://www.azillionmonkeys.com/qed/hash.html) makes this lookup fast and the memory needed is O(number of tagged objects).

Incredible work guys! Really appreciate short overview articles like these.
Really good news ahead for 2020.2!
The Unity devs community is rooting for you!!!

I never use tags because they’re so slow.
Now, if you generalize the Camera.main optimization to all tags and make queries against a hash of the tag name just as fast, then tags become very AI-friendly.

The optimization that I want the most is lazy importing of assets. So that I don’t have to wait a few hours the first time I open a project on a new machine. There was talk of it last fall. Please make that happen and don’t drop the ball on it.

Yes, I’m also looking forward to not having to compress all the textures at build time, and only compressing the ones actually used in the build.

Reading between the lines I can see there’s a lot of really questionable historical decisions in the Unity codebase! Optimisations are the best thing for us in VR land, so thank you for keeping the code simpler and faster :-)

In Unity 2020, why do model files containing blendshapes have a smaller size than in 2019 (same model file)? Is there anything new about blendshapes? I couldn’t find anything in the release notes. As far as I know, Unity manages blendshapes with a special shader that avoids re-uploading the mesh to the GPU. Is this still available in 2020? Thanks!
