Enhanced Aliasing with Burst

September 7, 2020 in Technology | 18 min. read

The Unity Burst Compiler transforms your C# code into highly optimized machine code. Since the first stable release of Burst Compiler a year ago, we have been working to improve the quality, experience, and robustness of the compiler. As we’ve released a major new version, Burst 1.3, we would like to take this opportunity to give you more insights about why we are excited about a key performance focused feature - our new enhanced aliasing support.

The new compiler intrinsics Unity.Burst.CompilerServices.Aliasing.ExpectAliased and Unity.Burst.CompilerServices.Aliasing.ExpectNotAliased allow users to gain deep insight into how the compiler understands the code they write. Combined with extended support for the [Unity.Burst.NoAlias] attribute, these new intrinsics give our users a new superpower in the quest for performance.

Takeaways

In this blog post we will explain the concept of aliasing, how to use the [NoAlias] attribute to explain how the memory in your data structures aliases, and how to use our new aliasing compiler intrinsics to be certain the compiler understands your code the way you do.

Aliasing

Aliasing is when two pointers to data happen to be pointing to the same memory allocation.

int Foo(ref int a, ref int b)
{
    b = 13;
    a = 42;
    return b;
}

The above is a classic performance-related aliasing problem - without any external information the compiler cannot know whether a aliases with b, and so it produces the following suboptimal assembly:

mov     dword ptr [rdx], 13
mov     dword ptr [rcx], 42
mov     eax, dword ptr [rdx]
ret

As can be seen it:

  • Stores 13 into b.
  • Stores 42 into a.
  • Reloads the value from b to return it.

It has to reload b because the compiler does not know whether a and b are backed by the same memory or not - if they are backed by the same memory then b will contain the value 42, and if they are not it will contain the value 13.
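To see the two outcomes concretely, here is a minimal sketch in plain C# (a console program using top-level statements and a static local function, outside Unity and Burst; the variable names x, y and z are ours, not part of the original example). It calls the same Foo once with two distinct variables and once with the same variable passed for both parameters:

```csharp
using System;

// The same Foo as above, written as a static local function.
static int Foo(ref int a, ref int b)
{
    b = 13;
    a = 42;
    return b;
}

// Case 1: a and b refer to different variables.
int x = 0, y = 0;
int distinct = Foo(ref x, ref y);

// Case 2: a and b refer to the same variable, so they alias.
int z = 0;
int aliased = Foo(ref z, ref z);

Console.WriteLine($"distinct: {distinct}, aliased: {aliased}");
// prints "distinct: 13, aliased: 42"
```

Both results are correct for their inputs, which is exactly why the compiler cannot replace the final load with the constant 13 unless it knows the second case cannot happen.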

A More Complex Example

Let's look at the following simple job:

[BurstCompile]
private struct CopyJob : IJob
{
    [ReadOnly]
    public NativeArray<float> Input;

    [WriteOnly]
    public NativeArray<float> Output;

    public void Execute()
    {
        for (int i = 0; i < Input.Length; i++)
        {
            Output[i] = Input[i];
        }
    }
}

The above job is simply copying from one buffer to another. If Input and Output do not alias, i.e. none of the memory locations backing them overlap, then after the job runs Output simply holds a copy of every element of Input.

If a compiler is aware that these two buffers do not alias, like Burst is with the above code example, then the compiler can vectorize the code such that it copies N elements at a time instead of one at a time.

Let's look at what would happen if Input and Output happened to alias above. Firstly, the safety system will catch these common kinds of cases and provide user feedback if a mistake has been made. But let's assume you've turned safety checks off, what would happen?

Because the memory locations slightly overlap, the value a from the Input ends up propagated across the entirety of Output. Let's assume that the compiler also vectorized this example because it wrongly thought the memory locations did not alias, what would happen now?

Very bad things happen - the Output will not contain the data you expected.
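Both overlapping cases can be reproduced in a small plain C# sketch (outside Unity; a console program with top-level statements). It assumes a single 9-element buffer in which Output starts one element after Input. ScalarCopy mirrors the loop in CopyJob, while ChunkedCopy is a hypothetical stand-in for a vectorized loop (load 4 elements, then store 4 elements) and is not what Burst actually emits:

```csharp
using System;

// Scalar copy: one element at a time, like the loop in CopyJob.Execute.
static void ScalarCopy(char[] buf, int inStart, int outStart, int count)
{
    for (int i = 0; i < count; i++)
        buf[outStart + i] = buf[inStart + i];
}

// Stand-in for a vectorized copy: load 4 elements, then store 4 elements.
// For brevity it assumes count is a multiple of 4.
static void ChunkedCopy(char[] buf, int inStart, int outStart, int count)
{
    var lane = new char[4];
    for (int i = 0; i < count; i += 4)
    {
        Array.Copy(buf, inStart + i, lane, 0, 4);   // "vector load"
        Array.Copy(lane, 0, buf, outStart + i, 4);  // "vector store"
    }
}

// Input is elements 0..7 and Output is elements 1..8 of the same buffer.
char[] scalar = "abcdefghi".ToCharArray();
ScalarCopy(scalar, 0, 1, 8);

char[] chunked = "abcdefghi".ToCharArray();
ChunkedCopy(chunked, 0, 1, 8);

string scalarResult = new string(scalar);
string chunkedResult = new string(chunked);

Console.WriteLine(scalarResult);   // prints "aaaaaaaaa"
Console.WriteLine(chunkedResult);  // prints "aabcddfgh"
```

The scalar loop smears the first value across the whole buffer, and the chunked loop produces yet another result, so a compiler may only switch between the two forms when it knows the buffers do not overlap.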

Aliasing limits the Burst compiler's ability to optimize code. It takes an especially heavy toll on vectorization - if the compiler thinks that any of the variables being used in the loop can alias, it generally cannot safely vectorize the loop. In Burst 1.3.0 and later, with our extended and improved aliasing support, we have vastly improved our performance story around aliasing.

Introducing the [NoAlias] Attribute

In Burst 1.3.0 we've extended where the [NoAlias] attribute can be placed to four places:

  • On a function parameter it signifies that the parameter does not alias with any other parameter to the function, or with the ‘this’ pointer.
  • On a field it signifies that the field does not alias with any other field of the struct.
  • On a struct itself it signifies that the address of the struct cannot appear within the struct itself.
  • On a function return value it signifies that the returned pointer does not alias with any other pointer ever returned from the same function.

In cases of fields and parameters, if the field type or parameter type is a struct, "does not alias with X" means that all pointers that can be found through any of the fields (even indirectly) of that struct are guaranteed not to alias with X.

In cases of parameters, note that a [NoAlias] attribute on a parameter guarantees it does not alias with this, which is often a job struct that contains all the data for the job. In Entities.ForEach() scenarios, this will contain all the variables that were captured by the lambda.

We will now go through an example of each of these uses in turn.

NoAlias Function Parameter

If we look again at the example with Foo above, we can now add a [NoAlias] attribute and see what we get:

int Foo([NoAlias] ref int a, ref int b)
{
    b = 13;
    a = 42;
    return b;
}

Which turns into:

mov     dword ptr [rdx], 13
mov     dword ptr [rcx], 42
mov     eax, 13
ret

Notice that the load from ‘b’ has been replaced with moving the constant 13 into the return register.

NoAlias Struct Field

Let's take the same example from above but apply it to a struct instead:

struct Bar
{
    public NativeArray<int> a;
    public NativeArray<float> b;
}

int Foo(ref Bar bar)
{
    bar.b[0] = 42.0f;
    bar.a[0] = 13;
    return (int)bar.b[0];
}

The above produces the following assembly:

mov       rax, qword ptr [rcx + 16]
mov       dword ptr [rax], 1109917696
mov       rcx, qword ptr [rcx]
mov       dword ptr [rcx], 13
cvttss2si eax, dword ptr [rax]
ret

Translated into plain English, this:

  • Loads the address of the data in ‘b’ into rax.
  • Stores 42 into it (1109917696 is 0x42280000 which is 42.0f).
  • Loads the address of the data in ‘a’ into rcx.
  • Stores 13 into it.
  • Reloads the data in ‘b’ and converts it to an integer for returning.
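The integer constant in the store can be checked with a short plain C# sketch (outside Unity). It uses the standard BitConverter.SingleToInt32Bits method to reinterpret the bits of 42.0f as a 32-bit integer; the variable names are ours:

```csharp
using System;

// Reinterpret the bits of 42.0f as a 32-bit integer.
int bits = BitConverter.SingleToInt32Bits(42.0f);
string hex = bits.ToString("X8");

Console.WriteLine($"{bits} = 0x{hex}");
// prints "1109917696 = 0x42280000"
```

The compiler stores the float through an integer move because the bit pattern is known at compile time.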

If you as the user know that the two NativeArrays are not backed by the same memory, you could write:

struct Bar
{
    [NoAlias]
    public NativeArray<int> a;

    [NoAlias]
    public NativeArray<float> b;
}

int Foo(ref Bar bar)
{
    bar.b[0] = 42.0f;
    bar.a[0] = 13;
    return (int)bar.b[0];
}

By attributing both a and b with [NoAlias] we have told the compiler that they definitely do not alias with each other within the struct, which produces the following assembly:

mov     rax, qword ptr [rcx + 16]
mov     dword ptr [rax], 1109917696
mov     rax, qword ptr [rcx]
mov     dword ptr [rax], 13
mov     eax, 42
ret

Notice that the compiler can now just return the integer constant 42!

NoAlias on a Struct

For nearly all structs you create as a user, it is safe to assume that the pointer to the struct does not appear within the struct itself. Let's take a look at a classic example where this is not true:

unsafe struct CircularList
{
    public CircularList* next;

    // The 'empty' list just points to itself.
    public static void InitEmpty(CircularList* list)
    {
        list->next = list;
    }
}

Lists are one of the few structures where it is normal to have the pointer to the struct accessible from somewhere within the struct itself.

Now onto a more concrete example of where [NoAlias] on a struct can help:

unsafe struct Bar
{
    public int i;
    public void* p;
}

float Foo(ref Bar bar)
{
    *(int*)bar.p = 42;
    return ((float*)bar.p)[bar.i];
}

Which produces the following assembly:

mov     rax, qword ptr [rcx + 8]
mov     dword ptr [rax], 42
mov     rax, qword ptr [rcx + 8]
mov     ecx, dword ptr [rcx]
movss   xmm0, dword ptr [rax + 4*rcx]
ret

As can be seen it:

  • Loads ‘p’ into rax.
  • Stores 42 into the memory that ‘p’ points to.
  • Loads ‘p’ into rax again!
  • Loads ‘i’ into ecx.
  • Returns the element of ‘p’ at index ‘i’.

Notice that it loaded ‘p’ twice - why? The reason is that the compiler does not know whether ‘p’ points to the address of the struct bar itself - so once it has stored 42 into ‘p’, it has to reload the address of ‘p’ from ‘bar’, just in case. A wasted load!

Let's add [NoAlias] now:

[NoAlias]
unsafe struct Bar
{
    public int i;
    public void* p;
}

float Foo(ref Bar bar)
{
    *(int*)bar.p = 42;
    return ((float*)bar.p)[bar.i];
}

Which produces the following assembly:

mov     rax, qword ptr [rcx + 8]
mov     dword ptr [rax], 42
mov     ecx, dword ptr [rcx]
movss   xmm0, dword ptr [rax + 4*rcx]
ret

Notice that it only loaded the address of ‘p’ once, because we've told the compiler that ‘p’ cannot be the pointer to ‘bar’.

NoAlias Function Return

Some functions can only return a unique pointer. For instance, malloc will only ever give you a unique pointer. For these cases [return:NoAlias] can provide the compiler with some useful information.

Let's take an example using a bump allocator backed with a stack allocation:

// Only ever returns a unique address into the stackalloc'ed memory.
// We've made this no-inline as the compiler will always try and inline
// small functions like these, which would defeat the purpose of this
// example!
[MethodImpl(MethodImplOptions.NoInlining)]
unsafe int* BumpAlloc(int* alloca)
{
    int location = alloca[0]++;
    return alloca + location;
}

unsafe int Func()
{
    int* alloca = stackalloc int[128];

    // Store our size at the start of the alloca.
    alloca[0] = 1;

    int* ptr1 = BumpAlloc(alloca);
    int* ptr2 = BumpAlloc(alloca);

    *ptr1 = 42;
    *ptr2 = 13;

    return *ptr1;
}

Which produces the following assembly:

push    rsi
push    rdi
push    rbx
sub     rsp, 544
lea     rcx, [rsp + 36]
movabs  rax, offset memset
mov     r8d, 508
xor     edx, edx
call    rax
mov     dword ptr [rsp + 32], 1
movabs  rbx, offset "BumpAlloc(int* alloca)"
lea     rsi, [rsp + 32]
mov     rcx, rsi
call    rbx
mov     rdi, rax
mov     rcx, rsi
call    rbx
mov     dword ptr [rdi], 42
mov     dword ptr [rax], 13
mov     eax, dword ptr [rdi]
add     rsp, 544
pop     rbx
pop     rdi
pop     rsi
ret

It's quite a lot of assembly, but the key bit is that it:

  • Has ‘ptr1’ in rdi.
  • Has ‘ptr2’ in rax.
  • Stores 42 into ‘ptr1’.
  • Stores 13 into ‘ptr2’.
  • Loads ‘ptr1’ again to return it.

Let's now add our [return: NoAlias] attribute:

// We've made this no-inline as the compiler will always try and inline
// small functions like these, which would defeat the purpose of this
// example!
[MethodImpl(MethodImplOptions.NoInlining)]
[return: NoAlias]
unsafe int* BumpAlloc(int* alloca)
{
    int location = alloca[0]++;
    return alloca + location;
}

unsafe int Func()
{
    int* alloca = stackalloc int[128];

    // Store our size at the start of the alloca.
    alloca[0] = 1;

    int* ptr1 = BumpAlloc(alloca);
    int* ptr2 = BumpAlloc(alloca);

    *ptr1 = 42;
    *ptr2 = 13;

    return *ptr1;
}

Which produces:

push    rsi
push    rdi
push    rbx
sub     rsp, 544
lea     rcx, [rsp + 36]
movabs  rax, offset memset
mov     r8d, 508
xor     edx, edx
call    rax
mov     dword ptr [rsp + 32], 1
movabs  rbx, offset "BumpAlloc(int* alloca)"
lea     rsi, [rsp + 32]
mov     rcx, rsi
call    rbx
mov     rdi, rax
mov     rcx, rsi
call    rbx
mov     dword ptr [rdi], 42
mov     dword ptr [rax], 13
mov     eax, 42
add     rsp, 544
pop     rbx
pop     rdi
pop     rsi
ret

And notice that the compiler doesn't reload ‘ptr1’ but simply moves 42 into the return register.

[return: NoAlias] should only ever be used on functions that are 100% guaranteed to produce a unique pointer, like with the bump-allocating example above, or with things like malloc. It is also important to note that the compiler aggressively inlines functions for performance considerations, and so small functions like the above will likely be inlined into their parents and produce the same result without the attribute (which is why we had to force no-inlining on the called function).
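The uniqueness guarantee can be illustrated without unsafe code. The following plain C# sketch (outside Unity) is an index-based analogue of the bump allocator; BumpAllocIndex and the arena array are hypothetical names of ours, and the function hands out slot indices instead of pointers:

```csharp
using System;

// Index-based analogue of BumpAlloc: slot 0 of the arena holds the next free
// slot, and each call hands out that slot and then advances the counter.
static int BumpAllocIndex(int[] arena)
{
    int location = arena[0]++;
    return location;
}

int[] arena = new int[128];
arena[0] = 1;   // slot 0 is the counter, so the first free slot is 1

int slot1 = BumpAllocIndex(arena);
int slot2 = BumpAllocIndex(arena);

arena[slot1] = 42;
arena[slot2] = 13;

int result = arena[slot1];
Console.WriteLine($"slot1: {slot1}, slot2: {slot2}, result: {result}");
// prints "slot1: 1, slot2: 2, result: 42"
```

Because every call returns a different slot, the store of 13 can never disturb the 42, and that is the property [return: NoAlias] promises to the compiler for the pointer version.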

Function Cloning for Better Aliasing Deduction

For function calls where Burst knows about the aliasing between parameters to the function, Burst can infer the aliasing and propagate this onto the called function to allow for greater optimization opportunities. Let's look at an example:

// We've made this no-inline as the compiler will always try and inline
// small functions like these, which would defeat the purpose of this
// example!
[MethodImpl(MethodImplOptions.NoInlining)]
int Bar(ref int a, ref int b)
{
    a = 42;
    b = 13;
    return a;
}

int Foo()
{
    var a = 53;
    var b = -2;

    return Bar(ref a, ref b);
}

Previously the code for Bar would be:

mov     dword ptr [rcx], 42
mov     dword ptr [rdx], 13
mov     eax, dword ptr [rcx]
ret

This is because within the Bar function, the compiler did not know the aliasing of ‘a’ and ‘b’. This is in line with what other compiler technologies will do with this code snippet.

Burst is smarter than this though, and through a process of function cloning Burst will create a copy of Bar where the aliasing properties of ‘a’ and ‘b’ are known not to alias, and replace the original call to Bar with a call to the copy. This results in the following assembly:

mov     dword ptr [rcx], 42
mov     dword ptr [rdx], 13
mov     eax, 42
ret

Which as we can see doesn't perform the second load from ‘a’.
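To make the idea of the clone more tangible, here is a plain C# sketch (outside Unity). BarAssumingNoAlias is a hypothetical, hand-written stand-in for the copy that Burst generates internally; you never write it yourself. It returns the constant directly, which is only valid at call sites where the two arguments are known to be different variables:

```csharp
using System;

// The original Bar: it must re-read a, because b might be the same variable.
static int Bar(ref int a, ref int b)
{
    a = 42;
    b = 13;
    return a;
}

// Hand-written stand-in for the clone: with a and b known not to alias,
// the final read of a can be replaced by the constant 42.
static int BarAssumingNoAlias(ref int a, ref int b)
{
    a = 42;
    b = 13;
    return 42;
}

// Distinct locals, as in Foo above: both versions agree.
int p = 53, q = -2;
int original = Bar(ref p, ref q);
int r = 53, s = -2;
int clone = BarAssumingNoAlias(ref r, ref s);

// The same variable passed twice: the versions disagree.
int t = 0;
int originalAliased = Bar(ref t, ref t);
int u = 0;
int cloneAliased = BarAssumingNoAlias(ref u, ref u);

Console.WriteLine($"{original} {clone} {originalAliased} {cloneAliased}");
// prints "42 42 13 42"
```

The disagreement in the aliased case is why the original Bar has to stay around: the clone may only replace calls where the compiler can see, as in Foo, that the arguments are separate local variables.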

Aliasing Checks

Since aliasing is so key to the compiler's ability to optimize for performance, we've added some aliasing intrinsics:

  • Unity.Burst.CompilerServices.Aliasing.ExpectAliased expects that the two pointers do alias, and generates a compiler error if not.
  • Unity.Burst.CompilerServices.Aliasing.ExpectNotAliased expects that the two pointers do not alias, and generates a compiler error if not.

An example:

using static Unity.Burst.CompilerServices.Aliasing;

[BurstCompile]
private struct CopyJob : IJob
{
    [ReadOnly]
    public NativeArray<float> Input;

    [WriteOnly]
    public NativeArray<float> Output;

    public unsafe void Execute()
    {
        // NativeContainer attributed structs (like NativeArray) cannot alias with each other in a job struct!
        ExpectNotAliased(Input.GetUnsafePtr(), Output.GetUnsafePtr());

        // NativeContainer structs cannot appear in other NativeContainer structs.
        ExpectNotAliased(in Input, in Output);
        ExpectNotAliased(in Input, Input.GetUnsafePtr());
        ExpectNotAliased(in Input, Output.GetUnsafePtr());
        ExpectNotAliased(in Output, Input.GetUnsafePtr());
        ExpectNotAliased(in Output, Output.GetUnsafePtr());

        // But things definitely alias with themselves!
        ExpectAliased(in Input, in Input);
        ExpectAliased(Input.GetUnsafePtr(), Input.GetUnsafePtr());
        ExpectAliased(in Output, in Output);
        ExpectAliased(Output.GetUnsafePtr(), Output.GetUnsafePtr());
    }
}

These intrinsics allow you to be certain that the compiler has all the information that you as the user know. These are compile-time checks. When the code you write to produce the arguments for the intrinsics does not have side effects, there is no runtime cost for these aliasing intrinsics. They are particularly useful when you have performance-sensitive code and want to be sure that later changes do not alter the assumptions the compiler can make about aliasing. With Burst, and the control we have over the compiler, we can provide this sort of in-depth feedback from the compiler to our users to ensure your code remains as optimized as you intended.

Job System Aliasing

The Unity Job System has some built-in assumptions it can make about aliasing. The rules are:

  1. Any struct with a [JobProducerType] (EG. anything like IJob, IJobParallelFor, etc) knows that any field of that struct that is a [NativeContainer] (EG. NativeArray, NativeSlice, etc) cannot alias with any other field that is also a [NativeContainer].
  2. The above is true except for fields that have the [NativeDisableContainerSafetyRestriction] attribute on them. For these fields, the user has explicitly told the Job System that this field can alias with any other field of the struct.
  3. Any struct with a [NativeContainer] cannot have the ‘this’ pointer of that struct within the struct itself.

OK, with the formal definitions over, let's look at some code to better explain the above rules:

[BurstCompile]
private struct JobSystemAliasingJob : IJobParallelFor
{
    public NativeArray<float> a;
    public NativeArray<float> b;

    [NativeDisableContainerSafetyRestriction]
    public NativeArray<float> c;

    public unsafe void Execute(int i)
    {
        // a & b do not alias because they are [NativeContainer]'s.
        ExpectNotAliased(a.GetUnsafePtr(), b.GetUnsafePtr());

        // But since c has [NativeDisableContainerSafetyRestriction] it can alias them.
        ExpectAliased(b.GetUnsafePtr(), c.GetUnsafePtr());
        ExpectAliased(a.GetUnsafePtr(), c.GetUnsafePtr());

        // No [NativeContainer]'s this pointer can appear within itself.
        ExpectNotAliased(in a, a.GetUnsafePtr());
        ExpectNotAliased(in b, b.GetUnsafePtr());
        ExpectNotAliased(in c, c.GetUnsafePtr());
    }
}

Walking through the above aliasing checks:

  • a and b do not alias since they are both [NativeContainer]'s contained within a [JobProducerType] struct.
  • But since c has the field attribute [NativeDisableContainerSafetyRestriction] it can alias with a or b.
  • And the pointers to each of a, b, or c cannot appear within them (e.g. in this case the memory holding the NativeArray struct itself cannot be the memory backing the contents of the array).

These built-in aliasing rules allow Burst to perform pretty darn good optimizations for most user code, allowing the performance by default that we strive for.

Common Use Case Scenario

Many users will write code along the lines of BasicJob below:

[BurstCompile]
private struct BasicJob : IJobParallelFor
{
    public NativeArray<float> a;
    public NativeArray<float> b;
    public NativeArray<float> c;
    public NativeArray<float> o;

    public void Execute(int i)
    {
        o[i] = a[i] * b[i] + c[i];
    }
}

The code is loading from three arrays, combining their results, and storing it to a fourth array. This kind of code is great for the compiler because it allows it to generate vectorized code, making the most of the powerful CPUs we all have in our mobiles and desktop computers today.

If we look at the Burst Inspector view of the above job, we can see the code is vectorized - the compiler has done a good job here! The compiler is able to vectorize because, as we explained above, the Unity Job System has rules that each variable in a job struct cannot alias any other member in the struct.

But there are cases that can be seen in the wild where developers are building up data structures where Burst has no information on how the aliasing works with those structures, for example:

[BurstCompile]
private struct NotEnoughAliasingInformationJob : IJobParallelFor
{
    public struct Data
    {
        public NativeArray<float> a;
        public NativeArray<float> b;
        public NativeArray<float> c;
        public NativeArray<float> o;
    }

    public Data d;

    public void Execute(int i)
    {
        d.o[i] = d.a[i] * d.b[i] + d.c[i];
    }
}

In the above example we've just wrapped the data members from the BasicJob in a new struct Data, and stored this struct as the only variable in the parent job struct. Let's see what the Burst Inspector shows us now:

Burst has been smart enough to vectorize this example - but at the cost of having to check that all of the pointers being used are not overlapping at the start of the loop.
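The general shape of such a runtime guard can be sketched in plain C# (outside Unity). This is only an illustration under our own assumptions: the exact checks Burst emits are not shown in this post, the helper names Overlaps and CanUseVectorPath are hypothetical, and addresses are modelled as plain long values:

```csharp
using System;

// Two half-open byte ranges [start, start + length) overlap when each one
// begins before the other one ends.
static bool Overlaps(long startA, long startB, long length) =>
    startA < startB + length && startB < startA + length;

// Hypothetical guard: the fast path is only safe when the output range
// overlaps none of the input ranges.
static bool CanUseVectorPath(long o, long a, long b, long c, long length) =>
    !Overlaps(o, a, length) && !Overlaps(o, b, length) && !Overlaps(o, c, length);

// Four well separated buffers: the guard passes.
bool separate = CanUseVectorPath(o: 3000, a: 0, b: 1000, c: 2000, length: 400);

// The output starts 4 bytes into the first input: the guard fails.
bool shifted = CanUseVectorPath(o: 4, a: 0, b: 1000, c: 2000, length: 400);

Console.WriteLine($"separate: {separate}, shifted: {shifted}");
// prints "separate: True, shifted: False"
```

Checks of this kind run every time the loop is entered, which is the cost that better aliasing information lets the compiler avoid.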

Burst needs these checks because the Job System aliasing rules only give Burst guarantees about direct fields of a job struct - not anything nested inside them. So Burst has to assume that the native arrays a, b, c, and o could be backed by the same memory - hence the complicated and performance-draining dance of 'Do any of these pointers actually overlap?'. So how can we fix this? By using our [NoAlias] attribute to explain this to Burst!

[BurstCompile]
private struct WithAliasingInformationJob : IJobParallelFor
{
    public struct Data
    {
        [NoAlias]
        public NativeArray<float> a;
        [NoAlias]
        public NativeArray<float> b;
        [NoAlias]
        public NativeArray<float> c;
        [NoAlias]
        public NativeArray<float> o;
    }

    public Data d;

    public void Execute(int i)
    {
        d.o[i] = d.a[i] * d.b[i] + d.c[i];
    }
}

In the WithAliasingInformationJob job above, we can see that there are new [NoAlias] attributes set on the fields of Data. These [NoAlias] attributes are telling Burst that:

  • a, b, c, and o do not alias with any other member of Data that has a [NoAlias] attribute.
  • So each variable does not alias with any other variable in the struct because they all have the [NoAlias] attribute.

Looking at the Burst Inspector again, this change has removed all those expensive runtime pointer checks, and we can just get on with running the vectorized loop - nice!

Using the new Unity.Burst.CompilerServices.Aliasing intrinsics will ensure that you never accidentally change the code to affect aliasing again in the future. For example:

[BurstCompile]
private struct WithAliasingInformationAndIntrinsicsJob : IJobParallelFor
{
    public struct Data
    {
        [NoAlias]
        public NativeArray<float> a;
        [NoAlias]
        public NativeArray<float> b;
        [NoAlias]
        public NativeArray<float> c;
        [NoAlias]
        public NativeArray<float> o;
    }

    public Data d;

    public unsafe void Execute(int i)
    {
        // Check a does not alias with the other three.
        ExpectNotAliased(d.a.GetUnsafePtr(), d.b.GetUnsafePtr());
        ExpectNotAliased(d.a.GetUnsafePtr(), d.c.GetUnsafePtr());
        ExpectNotAliased(d.a.GetUnsafePtr(), d.o.GetUnsafePtr());

        // Check b does not alias with the other two (it has already been checked against a above).
        ExpectNotAliased(d.b.GetUnsafePtr(), d.c.GetUnsafePtr());
        ExpectNotAliased(d.b.GetUnsafePtr(), d.o.GetUnsafePtr());

        // Check that c and o do not alias (the other combinations have been checked above).
        ExpectNotAliased(d.c.GetUnsafePtr(), d.o.GetUnsafePtr());

        d.o[i] = d.a[i] * d.b[i] + d.c[i];
    }
}

These checks do not cause a compiler error in the above job - which means, as we have already seen, that Burst has enough information from the added [NoAlias] attributes to detect and optimize this case.

Now while this is a contrived example for the sake of conciseness in this blog, these kinds of aliasing hints can provide very real-world performance benefits in your code. As we always recommend, using the Burst Inspector when iterating on code modifications you have made will ensure that you keep stepping towards a more optimized future.

Conclusion

With the release of Burst 1.3.0 we provided you another set of tools to get the maximum performance from your code. With the extended and enhanced [NoAlias] support you can perfectly control how your data structures work. And the new compiler intrinsics give you a meaningful insight into how the compiler understands your code.

If you haven’t started with Burst yet and would like to learn more about our work on the new Data-Oriented Technology Stack (DOTS), head over to our DOTS pages, where we will be adding more learning resources and links to talks from our teams as more becomes available. 

We always welcome your feedback - join the forum here to let us know how we can help you level up your Burst code in future.
