In Unity 4.6 / 5.0, the generation of batches for rendering of the UI system is very slow. This is due to a few factors, but ultimately our deadlines kept us from dedicating the time to polishing this part of the UI, instead focusing on the usability and API side of things.

In the final sprints of finishing the UI, we were lucky enough to have some help with optimisation. After we shipped, we decided to take a step back and analyse exactly why things were slow and how we could fix them.

If you want the quick and dirty: we managed to move everything (apart from job scheduling) off the main thread, and we drastically improved some of the algorithms we were using in the batch sorting.

Performance Project

We developed a few UI performance test scenes to get a good baseline to work with when testing the performance changes. They stress the UI in a variety of ways. In the test most applicable to sorting / batch generation, the canvas is completely filled with ‘buttons’. There are overlaps between the text on the button and the button background, so there will always be some overhead in calculating what can batch with what. The test constantly modifies the UI elements so that rebatching is required every frame.

The test can be configured to place UI elements in an ordered way (taking advantage of spatial closeness), or a random way (potentially stressing the sorting algorithms more). It was clear to us that batch sorting needed to be fast in both scenarios. In 4.6 / 5.0 it is fast in neither.

It should be noted that the performance test tends to have ~10k UI elements. This is not something we would expect to see in a ‘real’ UI; most UIs we’ve seen have ~300 items per canvas.

All performance testing and profiling was done on my MacBook Air (13-inch, Mid 2013).

Portion of the test scene: [image]

Original (pre 4.6, no stats)

During the 4.6 betas we were getting feedback that batch sorting was very slow when there were many elements on the canvas. This was because there was basically NO smartness when we were trying to figure out batch draw order. We would simply iterate the elements on the canvas, see what each one collided with, and then assign a depth based on some rules. Every element ended up being tested against every other element, so as we added more elements to the scene things got much slower: the cost grew as O(N^2). This is ‘bad vibes’ in terms of performance.
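As a rough illustration of why this scales badly, here is a minimal Python sketch of a naive pass of this kind. The `Rect` type, the `overlaps` helper and the depth rule (an element sits one level above the deepest earlier element it overlaps) are assumptions made for the example; they are not the engine's actual code or rules.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True when two axis-aligned rectangles share some area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max and
            a.y_min < b.y_max and b.y_min < a.y_max)


def assign_depths_naive(rects: List[Rect]) -> List[int]:
    """Assign each element a depth one above the deepest earlier element it overlaps.

    Every element is tested against every earlier element, so the number of
    overlap tests is N * (N - 1) / 2, which is O(N^2).
    """
    depths: List[int] = []
    for i, rect in enumerate(rects):
        depth = 0
        for j in range(i):  # hypothetical rule: compare against all earlier elements
            if overlaps(rect, rects[j]):
                depth = max(depth, depths[j] + 1)
        depths.append(depth)
    return depths
```

With ~10k elements, as in the test scene, the inner loop runs roughly 50 million times per rebatch, which is why this approach cannot keep up.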

4.6 / 5.0 release (baseline)

We did some work on the sorting that took advantage of the idea that elements drawn in order would normally be in a similar location on the screen. From this, a bounding box was built (per group of n elements), and new elements were collided with this group box before being collided with individual elements. This led to a decent performance increase in scenes that had locality between UI elements, but in randomly ordered scenes, or scenes where elements were spaced far apart, the improvements were only marginal.
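The group idea can be sketched as follows. This is a simplified Python illustration, not the shipped implementation: the class name `GroupedRects`, the fixed group size and the test counter are all assumptions added for the example.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class GroupedRects:
    """Elements in draw order, chunked into groups that share one bounding box."""

    def __init__(self, group_size: int = 4) -> None:
        self.group_size = group_size
        self.groups: List[List[Rect]] = []
        self.bounds: List[Rect] = []

    def add(self, rect: Rect) -> None:
        if not self.groups or len(self.groups[-1]) == self.group_size:
            self.groups.append([rect])
            self.bounds.append(rect)
        else:
            self.groups[-1].append(rect)
            self.bounds[-1] = union(self.bounds[-1], rect)

    def query(self, rect: Rect) -> Tuple[List[Rect], int]:
        """Return (overlapping elements, number of overlap tests performed)."""
        hits: List[Rect] = []
        tests = 0
        for bound, members in zip(self.bounds, self.groups):
            tests += 1
            if not overlaps(rect, bound):
                continue  # the whole group is rejected with a single test
            for member in members:
                tests += 1
                if overlaps(rect, member):
                    hits.append(member)
        return hits, tests
```

When consecutive elements are far apart, each group's bounding box grows to cover most of the canvas, the early rejection almost never fires, and the cost falls back towards testing every element.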

If we take a look at this version, you can see that when placing random elements the batch performance massively breaks down, taking roughly 100ms to sort and populate a scene… that’s for reals slow.

[image]

Looking at this in the timeline profiler also reveals another worrying situation… we are completely blocking anything else from happening. The batch generation is run just before UI is rendered, this is after a late update and often after scene cameras are rendered. It looks like it would make sense to bring the batch generation to be right after late update so that it can happen while a scene would normally be rendered.

[image]

Improved sorting (Take 1)

We did a first pass on improving sorting. It was still based on the idea of element locality, but with a few more smarts. It would try and keep groups ‘batchable’, so we could include / exclude batchability on a whole group level. It was faster, but still fell down when given very spatially separate scenes and did not scale well with the number of renderable elements.

Non-spatially grouped input: [image]

Spatially grouped input: [image]

This is pretty poor. It was clear that we needed a new approach.

Improved sorting (Take 2)

As mentioned earlier, sorting tends to break down and be slow in larger UI scenes with spread elements. We took a step back and thought about what might be a better approach. In the end we decided to implement a canvas grid structure. Each grid square becomes a ‘bucket’ and any UI element that touches a square gets added to that bucket. This means that when adding a new UI element we only need to look into the buckets that the element touches to find what it can / can’t batch with. This led to significant performance improvements when the scene was ordered randomly.
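A minimal Python sketch of such a grid is shown below. The class name `CanvasGrid`, the fixed cell size and the method names are assumptions for illustration; the real structure lives in the engine's C++ code and will differ in detail.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


class CanvasGrid:
    """Uniform grid over the canvas; each cell is a bucket of element indices."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.rects: List[Rect] = []
        self.buckets: Dict[Cell, List[int]] = defaultdict(list)

    def _cells(self, rect: Rect) -> Iterator[Cell]:
        """Yield every grid cell the rectangle touches."""
        x0 = int(rect[0] // self.cell_size)
        y0 = int(rect[1] // self.cell_size)
        x1 = int(rect[2] // self.cell_size)
        y1 = int(rect[3] // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def candidates(self, rect: Rect) -> Set[int]:
        """Indices of elements sharing at least one bucket with `rect`."""
        found: Set[int] = set()
        for cell in self._cells(rect):
            found.update(self.buckets.get(cell, ()))
        return found

    def overlapping(self, rect: Rect) -> List[int]:
        """Exact overlap test, run only against the bucket candidates."""
        return sorted(i for i in self.candidates(rect) if overlaps(rect, self.rects[i]))

    def add(self, rect: Rect) -> int:
        index = len(self.rects)
        self.rects.append(rect)
        for cell in self._cells(rect):
            self.buckets[cell].append(index)
        return index
```

The cost of a lookup now depends on how many elements share the touched buckets rather than on the total number of elements or the order in which they were added, which is why random and ordered layouts end up performing similarly.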

Non-spatially grouped input: [image]

Spatially grouped input: [image]

Comparable performance between setups!

Geometry Job

We reached the first step on the path to pulling the UI off the main thread by using the new Geometry Job system, which was introduced in Unity 5. This is an internal feature that can be used to populate vertex / index buffers in a threaded way. The changes that were made here allowed us to move a whole bunch of code off the main thread, as the timeline below shows. There is some small overhead in managing the geometry job (we have to create the job and job instructions, for example, which requires some memory), but this is negligible compared to the previous main thread cost.

[image]

Simplifying the batch sort

During the optimisation process, we did a bunch of smaller, profiler-guided optimisations. The biggest gain was probably when we vectorised a bunch of our rectangular overlap checks in the sorting. Basically, we got our data into a super nice data-oriented design (DOD) layout ready for overlap checking, then did the checking with one call. Before this, the overlap checks accounted for ~60% of the sort time; afterwards they were no longer a hot spot in the C++ profiler. As you can see, doing this really helped our sort performance a bunch. But there was still a ways to go, and that was taking the 7ms off the main thread.
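To show what a data-oriented layout with a single batched check might look like, here is a small Python sketch. Python has no SIMD, so this only illustrates the struct-of-arrays layout and the one-call shape; in C++ the inner loop over the four parallel arrays is the part that SIMD instructions would process several rectangles at a time. The `RectSoA` class and its method names are assumptions made for the example.

```python
from array import array
from typing import List


class RectSoA:
    """Rectangles stored as four parallel arrays instead of a list of objects."""

    def __init__(self) -> None:
        self.x_min = array("d")
        self.y_min = array("d")
        self.x_max = array("d")
        self.y_max = array("d")

    def append(self, x_min: float, y_min: float, x_max: float, y_max: float) -> None:
        self.x_min.append(x_min)
        self.y_min.append(y_min)
        self.x_max.append(x_max)
        self.y_max.append(y_max)

    def overlap_mask(self, x_min: float, y_min: float,
                     x_max: float, y_max: float) -> List[bool]:
        """Test one rectangle against every stored rectangle in a single call."""
        return [
            ax0 < x_max and x_min < ax1 and ay0 < y_max and y_min < ay1
            for ax0, ay0, ax1, ay1 in zip(self.x_min, self.y_min, self.x_max, self.y_max)
        ]
```

Keeping each coordinate in its own contiguous array means the same comparison is applied to neighbouring values in memory, which is the access pattern that vector instructions need.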

Vectorised overlap code: [image]

Taking it all off (the main thread)

The next logical step for us was to remove UI generation from the main thread. For this, we used the internal Job system to schedule a number of tasks. Some of them are serial, others are able to go wide and execute in parallel. Here is the breakdown:

1) Split incoming UI instructions into renderable instructions (1 UI instruction can contain many draw calls due to submeshes and multiple materials). This task goes wide. It allocates memory to accommodate the maximum possible number of renderable instructions. The incoming instructions are then processed in parallel and placed into the output array. This array is then ‘compressed’ down in a combine job into a contiguous section of memory just containing the valid instructions.

2) Sort the renderable instructions. Compare depths, overlaps, etc. Basically, sort for a command buffer that requires the LEAST amount of state change when rendering.

3) Batch Generation

  1. Generate the render command buffer. Create draw calls (batches / sub batches).
  2. Generate the transform instructions that the geometry job can use.

The jobs are scheduled right after LateUpdate. This allows them to execute while a normal scene would be rendering and before the UI would be displayed. When these jobs are scheduled, a fence is held by the main thread. Both the canvas rendering call and the Geometry job wait on this fence until all the required data has been generated.
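The first step in the breakdown above (go wide into a worst-case-sized array, then compact) can be sketched as follows. This is an illustration only: the instruction tuples, the `MAX_PER_INSTRUCTION` bound and the function name are assumptions, and Python threads will not give a real speed-up here because of the interpreter lock. The point is the shape: each task writes only to its own fixed slice, so no locking is needed, and a serial combine step removes the gaps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# Hypothetical stand-ins: a UI instruction is (element_id, materials) and each
# material yields one renderable instruction (element_id, material).
UIInstruction = Tuple[int, List[str]]
Renderable = Tuple[int, str]

MAX_PER_INSTRUCTION = 4  # assumed upper bound on renderables per UI instruction


def split_and_compact(instructions: List[UIInstruction],
                      workers: int = 4) -> List[Renderable]:
    # Allocate room for the worst case so every task owns a disjoint slice.
    output: List[Optional[Renderable]] = [None] * (len(instructions) * MAX_PER_INSTRUCTION)

    def split(index: int) -> None:
        element_id, materials = instructions[index]
        if len(materials) > MAX_PER_INSTRUCTION:
            raise ValueError("instruction exceeds the assumed maximum")
        base = index * MAX_PER_INSTRUCTION
        for offset, material in enumerate(materials):
            output[base + offset] = (element_id, material)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so an exception in any task is raised here.
        list(pool.map(split, range(len(instructions))))

    # Combine step: squeeze out unused slots while preserving order.
    return [item for item in output if item is not None]
```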

In the example below, you can see the geometry job ‘stall’ as it waits for the batch generation to be completed. We need to do more testing around this, but as these scenes do not have any renderable elements aside from UI, we expect this issue to decrease as the complexity of the scene increases.
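The fence behaviour, including the stall, can be mimicked with a future from Python's standard library. This is only an analogy under assumed names (`generate_batches`, `geometry_job`, `render_frame`); it is not the engine's job system API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


def generate_batches(elements: List[str]) -> List[str]:
    """Stand-in for the sort and batch generation jobs."""
    return sorted(elements)


def geometry_job(fence: "Future[List[str]]") -> int:
    """Blocks ('stalls') on the fence until batch data exists, then consumes it."""
    batches = fence.result()
    return len(batches)


def render_frame(elements: List[str]) -> Tuple[List[str], int]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Scheduled right after LateUpdate; the returned future acts as the fence.
        fence = pool.submit(generate_batches, elements)
        geometry = pool.submit(geometry_job, fence)
        # Scene cameras would render here on the main thread in a real frame.
        # Canvas rendering waits on the same fence before it can draw.
        batches = fence.result()
        return batches, geometry.result()
```

If the main thread has real scene rendering to do between scheduling and waiting, the batch data is more likely to be ready by the time anything asks for it, so neither wait blocks for long.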

[image]

Executing on a machine that has a few more cores than my MacBook Air: [image]

So there we have it: 0.4ms on the main thread for a very expensive UI.

Other performance things we did

  • 2D Rect clipping (most UIs don’t really need the stencil buffer, it turns out, and this reduces draw calls and state change).
  • 2D Rect culling (if your element is out of render bounds… cull it).
  • Smarter canvas command buffer
    • Allow text / normal elements to share the same shaders / materials
    • Massively reduce set pass calls
    • Push a lot of UI specific data into material property blocks
    • Normally 1 set pass call for a UI, then multiple draw calls
  • Combine UI into 1 mesh / index buffer
    • Use DrawIndexRange for rendering
    • One VBO / index buffer that resizes as needed
    • Splits to a new draw call when > 2^16 indices
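The last bullet, splitting one shared index buffer into draw ranges, can be sketched like this. The function name and the exact splitting rule (start a new range when the next element would push the current one past 2^16 indices) are assumptions; the engine's actual rule may differ.

```python
from typing import List, Tuple

MAX_INDICES_PER_DRAW = 1 << 16  # 65,536


def split_draw_ranges(index_counts: List[int],
                      limit: int = MAX_INDICES_PER_DRAW) -> List[Tuple[int, int]]:
    """Return (first_index, index_count) ranges into one shared index buffer.

    Elements are appended in order; a new range starts whenever adding the
    next element would push the current range past `limit` indices.
    """
    ranges: List[Tuple[int, int]] = []
    start = 0
    count = 0
    for n in index_counts:
        if n > limit:
            raise ValueError("a single element exceeds the per-draw index limit")
        if count + n > limit:
            ranges.append((start, count))
            start += count
            count = 0
        count += n
    if count:
        ranges.append((start, count))
    return ranges
```

For example, 11,000 quads at 6 indices each need 66,000 indices, so they are drawn as two ranges: 10,922 quads (65,532 indices) followed by the remaining 78 quads (468 indices).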

Next Steps

Right now, the sorting / batch generation is behaving acceptably; there are, of course, things we can do to make it faster, but the biggest issue is the time it takes to process the geometry job. As it’s now off the main thread and an isolated job, it’s a good candidate for tidying and speeding up. I’m fairly certain we are doing some dumb things still (is that branching in a tight inner loop?), and it’s also using a bunch of slow maths that could handle being vectorised very nicely.
At a higher level it is also worth looking at the situations that lead to a rebatch happening and attempting to minimise those. As always there is more work to do, but what is described here is in Unity 5.2 and already a significant improvement.

Take Away

Many of the new features in Unity 5.2 are pretty great. They allowed us to minimise the cost of the UI system on the main thread, as well as optimise the batching in general. When we were working, we used a strongly profiler-guided approach to find out where the issues were; in one or two places, we decided to completely step back and try again when we realised the old solution was inadequate. Internally at Unity we are doing a lot more of this kind of work, really trying to address pain points and issues that you are reporting to us in a way that makes Unity better for everyone. Thank you for reporting bugs and sending us real projects that have issues for us to investigate.
UI team

42 Comments

Comments are closed.

  1. The UI is still broken in the latest 5.2.0p1. Half of the GUI is unclickable. Raycasts go through.
    Maybe that’s why the GUI is faster, because IT IS NOT WORKING.

    1. What I want to know is if these improvements can be applied to the Legacy GUI system.
      The UGUI system is kinda unwieldy, and it’s difficult to make procedural layouts that use data, which is something the Legacy GUI is incredibly good at doing.

      After a little bit of fiddling with both, I’d even say the Legacy GUI is easier to make look good at different resolutions than UGUI is, and it doesn’t clutter up the hierarchy with tons of objects.

    2. If there is an EventSystem in the scene (and no Graphic Raycasters), then your UI will not be clickable.

      For every canvas root node you should ensure you have a Graphic Raycaster, as a GraphicRaycaster does not raycast nested UI elements on canvas root nodes other than the one the raycaster itself belongs to.

      You also need some kind of InputModule in the scene, either one of the default InputModules, or one created by yourself.

      I have no problems making the UGUI system work for me, using my own InputManager / InputModule handlers.

  2. I also upgraded to 5.2 and for my iOS game the CPU usage dropped from 45% to 20%. Great! But on the other hand the GPU usage increased, which now makes my game a lot slower. In the xcode performance profiler the renderer now runs constantly at 100%, before it only ran at 33%.

    Very strange…:(

  3. I upgraded to 5.2 yesterday and it’s killing me softly. So many weird issues now. My Android game is pretty much unplayable at this point. Reverting to 5.1.3.

  4. I see, tried the 5.2.0 and I think it’s faster because it is not working. Complete canvases can’t get raycasts.
    Buttons not working.
    As you can see from here, others also noticed it: http://forum.unity3d.com/threads/upgrading-to-unity-5-2-ui-problem-with-raycast-target.353586/
    The whole game is now broken. Had to fall back to 5.1.3p2 and now everything works again.

  5. I don’t see shadows or other mesh modifiers in the Performance Project (looking at the picture); they heavily influence UI performance / memory allocations and are the bottleneck of the whole system.

  6. Interesting! I will update to Unity 5.2~

  7. This rules! Thanks for going so in depth. We are updating our UI heavy project, XPETS, to 5.2 right now. Excited to see if we get some performance increases!

  8. First tests show a dramatic decrease in frame rate for our UI-based game.

    For one Android phone the frame rate dropped to about half of Unity 5.1.2. Will analyse why this is happening in the coming week. We had performance problems before and they are far worse now…

  9. Is the 2D Rect Mask only for rectangular masks, and not for non-rectangular masks?

  10. Great job, one more reason to complete my transition to the new UI.

    Quick question: should we expect benefits from multithreading when making 3D UIs? Are they also rendered at the end of the frame? (I would assume they are not.)
    What about rendering the UI to a render texture? I suspect multithreading would not provide the same gain, for similar reasons.

  11. Miguel Ferreira

    September 8, 2015 9:10 AM

    Hi,

    Great news, I just have one small question. What exactly do you mean with “vectorised a bunch of our rectangular overlap checks in the sorting” and “and it’s also using a bunch of slow maths that could handle being vectorised very nicely”? Could you give one small example?

    1. It allows us to perform the same operation on multiple pieces of data in parallel. https://en.wikipedia.org/wiki/SIMD

  12. Did you rewrite the whole UI system and change its API? Is it expensive to update an existing project to Unity 5.2? Finally, is the new UI system compatible with the old UI system’s API?

    1. This was for the in game 4.6+ UI system. API is the same, just a backend upgrade.

  13. Since you mention command buffers, will we have the ability to use the UI system with the existing command buffer API? Specifically, I want to render canvas renderers to a render texture, but I think it only works with mesh renderers. Maybe also expose the UI rendering step in the CameraEvent. This would be quite useful for making special effects for the UI, especially for text.

  14. Yeah, just when I need it.

  15. That parallel UI/geometry rendering scheme is a thing of beauty! Great job guys!

    We are currently in the process of updating ALL of KSP’s UI to use 4.6/5 canvas components (from the many redundant/conflicting UI solutions we had before)… That in itself is already a massive load off our frame times, so Unity 5.2 should make our UI overhaul that much more worth the effort!

    Very great news indeed! Many thanks from the KSP team!

    Cheers

    1. Looking forward to seeing what you come up with :)

  16. Is there any hope of UI performance improvements in 4.6.x? (The move to 5 is not a trivial thing for projects with a lot of baked lighting…)

    1. 6 comments below, Tim wrote “Only for 5.2”.

  17. Francesco Miglietta

    September 7, 2015 6:32 PM

    Good Job UI Team!

    You don’t know how many games are just UI-based ;)

    Cheers

  18. Is Unity 5.2 still slated for tomorrow?

    1. Aras Pranckevičius

      September 7, 2015 6:03 PM

      Yes

  19. Hey, I’d love a blog post talking about what you are planning to visual scripting :D

  20. Does this UI optimisation benefit 2D sprite sorting for isometric maps?

  21. Also take a look at UI performance when used with Unity’s animation system, for me, completely killing performance.

    1. I’m experiencing the same problem when attempting to fade in/out a group of UI elements.

  22. Only for Unity 5.2? Not 4.x?

    1. Only for 5.2

  23. “We reached the first step on the path to pulling the UI off the main thread by using the new Geometry Job system which was introduced in Unity 5. This is an internal feature that can be used to populate a vertex / index buffers in a threaded way.”

    In this sentence, the word “internal” really sucks :D.

    Any idea when mere mortals… I mean when “us users” can implement and schedule jobs like this? ;)

    1. I can’t say currently. I’m not sure if it’s on the roadmap.

      1. Aras Pranckevičius

        September 7, 2015 5:44 PM

        Ability to efficiently “create” geometry from threads (i.e. exposing our “geometry job” thing to scripting) is on the wish list of things we want to do. We have some experiments in that area, but nothing we’re ready to ship/test yet. Stay tuned!

        1. we cannot wait to give this a go as the current dynamic mesh generation is the main cpu bottleneck for us

  24. Now this is finally great news, especially for slow but heavily multicore oriented platforms like Android or iOS in the future (the ipad air2 sits on 3 cores) :D

    Congratulations, looking forward to running further tests on 5.2 sooner rather than later.

  25. Great to see this sort of a hard focus on optimization, this actually might be the last kick I need to migrate over to the new UI & update to 5.2 once it’s out!