As part of a recent session at Unite Now, we discussed how technology in the Burst compiler enables developers who are building projects with Unity to take advantage of the Arm Neon instruction set. You can use the Burst compiler when targeting Android devices to improve the performance of Unity projects running on the Arm architecture.

Unity and Arm have formed a partnership to enhance the mobile game development experience for the billion-plus Arm-powered mobile devices in the Android ecosystem. 

For game developers, performance is paramount. Year after year, Arm invests in improving its CPU and GPU technologies to provide the advances in performance and efficiency needed to build richer experiences. Recently, Arm announced two new products: Cortex-A78, which provides greatly improved power efficiency, and the even more impressive Cortex-X1. These hardware developments are complemented by advances in compiler technology for the Arm architecture. Compilers ensure that the high-performance games you develop are translated and optimized into efficient binaries that make the best use of the Arm architecture’s features.

About Burst

Burst is an ahead-of-time compiler technology that can be used to accelerate the performance of Unity projects made using the new Data-Oriented Technology Stack (DOTS) and the Unity Job System. Burst works by compiling a subset of the C# language, known as High-Performance C# (HPC#), to make efficient use of a device’s power by deploying advanced optimizations built on top of the LLVM compiler framework.

Burst is great for exploiting hidden parallelism in your applications. Using Burst from a DOTS project is easy, and it can unlock big performance benefits in CPU-bound algorithms. In this video, you can see a side-by-side comparison of a scripted run-through in a demo environment with and without Burst enabled.

The demo shows three examples of simulations using Unity Physics. You will see that the Burst-compiled code is able to compute frames with higher numbers of physics elements faster, allowing for better performance, less thermal throttling, lower battery consumption, and more engaging content. 

How does Burst work?

We say that Burst brings performance for free, but how does that work?

Burst transforms HPC# code into LLVM IR, an intermediate language used by the LLVM compiler framework. This allows the compiler to take full advantage of LLVM’s support for code generation for the Arm architecture to generate efficient machine code optimized around the data flow of your program. A diagram of this flow is shown below. 

Mike Acton has given a talk called “Data-oriented design and C++,” which features the key line “know your hardware, know your data” as a means of achieving maximum performance. Burst works well because it gives visibility to the constraints on array aliasing that are guaranteed by the HPC# language and the DOTS framework, and it can make use of LLVM’s knowledge of your hardware architecture. This enables Burst to make target-specific transformations based on the properties of scripts written against the Unity APIs.  

How to program for Burst

You can use Burst to compile C# scripts that make use of the Unity Job System in DOTS. This is done by adding the [BurstCompile] attribute to your job definition.
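As a minimal sketch of what such a job might look like, the example below adds two arrays element by element. It is not the listing from the original session: the names AddArraysJob, AddArraysExample, and Run are hypothetical and chosen only for illustration. It assumes the Burst, Collections, and Jobs packages are installed in the project.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical example job (not from the original talk).
// The [BurstCompile] attribute asks Burst to compile Execute().
[BurstCompile]
public struct AddArraysJob : IJob
{
    // Inputs are only read, the output is only written.
    [ReadOnly] public NativeArray<float> A;
    [ReadOnly] public NativeArray<float> B;
    [WriteOnly] public NativeArray<float> Result;

    public void Execute()
    {
        // A simple loop over contiguous data.
        for (int i = 0; i < Result.Length; i++)
        {
            Result[i] = A[i] + B[i];
        }
    }
}

// Hypothetical helper that schedules the job and returns the result.
public static class AddArraysExample
{
    public static float[] Run(float[] a, float[] b)
    {
        // Copy the managed arrays into unmanaged NativeArray storage.
        var na = new NativeArray<float>(a, Allocator.TempJob);
        var nb = new NativeArray<float>(b, Allocator.TempJob);
        var result = new NativeArray<float>(a.Length, Allocator.TempJob);

        var job = new AddArraysJob { A = na, B = nb, Result = result };

        // Schedule the job and wait for it to finish.
        job.Schedule().Complete();

        float[] output = result.ToArray();

        // Native containers must be disposed explicitly.
        na.Dispose();
        nb.Dispose();
        result.Dispose();
        return output;
    }
}
```

Calling AddArraysExample.Run from a script with two arrays of equal length should return their element-wise sum. The job reads and writes only NativeArray data and blittable types, which keeps it within the HPC# subset that Burst compiles.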

We can use the Burst Inspector, found in the Jobs menu, to see what code will be generated. Note that for this demonstration, we have disabled Safety Checks and are using Burst 1.3.3. 

In the Burst Inspector that appears, we enable code generation for Armv8-A by selecting the ARMV8A_AARCH64 target.

We can now see the AArch64 code that will be generated for our C# loop, including a core loop using the Neon instruction set.

For more details on using the Burst compiler, please see the instruction manual, check out this Unite Now talk, where we go through the steps above in more detail, or head to the forums to get more information or ask questions about using Burst in your next project.

24 replies on “Enhancing mobile performance with the Burst compiler”

Thank you for the great job! We’re working together with the Arm compiler team on more optimizations.

We had a more than 10x performance boost in CPU-heavy parts of a mobile game, and it took just under a week to convert all related code to use Burst and unmanaged memory in a proper manner.

In our case access to raw textures and meshes helped a lot as well.

I wanted to do that right after the video went public, but it slipped through the cracks somehow. Let me see what I can do.

Now that Unity Learn is free, I would love to see more stuff on Burst. Thanks!
Also there is a typo, “Mike Action” > “Mike Acton”

What point is there to Burst coupled with ECS if systems’ main thread work and job scheduling exceeds the execution time of Burst jobs most of the time?

Hey there Nikolai. Thanks for reaching out! Wanted to pass along from the team that they are aware of this overhead, and are actively working on reducing it.

As Trey said, job scheduling optimizations are on their way. The goal of the team working on it is to ensure that the overhead of scheduling a job is minimal (thus greatly improving the throughput). We’ll keep you posted!

Burst is one of the best techs to come out of Unity. It gave us a 10x-20x speed increase, truly amazing.

We have to think about the future too. Your current projects will probably just stick to current tech, but future projects can be made with DOTS from the get go.

If we only want solutions that remain 100% compatible with the old MonoBehaviour flow, we’ll never see anything that brings huge improvements. Not even the best engineers in the world could make all that performance happen without requiring rewrites.

I guess “for free” means “without using more memory”, which is a common trade-off for many optimizations like pooling, caching, baking etc.

Bit sensitive over an optimisation article? No need to call someone a liar. Frank is right that ‘for free’ means without any trade-offs, which is welcome.

If you are creating a new game with ECS, then you don’t have to rewrite anything. It is true that you can’t take your existing game, add [BurstCompile] and enjoy performance improvements – I’ve also covered that in the slide explaining the limitations. However, as people have answered, there is no memory overhead or other kind of impact, so it’s _almost_ for free.

Would be interesting to see a comparison between “regular PhysX”, “Havok Physics” and “Unity Physics”.

The one shown here feels very flawed, comparing “Unity Physics, optimized for Burst but not using Burst” with “Unity Physics as it was designed”… This does not show the power of Burst at all.

(I think Burst is great, but your example doesn’t make sense)

Thank you for the kind words.
I would rather say Unity.Physics is designed for ECS (pure C#, stateless) and not exactly optimized for Burst – there is still room for optimizations, especially with the hardware intrinsics in Burst. We’ll be working on that.
