The Unity Burst Compiler transforms your C# code into highly optimized machine code. One question that we often get from our amazing forum users, like @dreamingimlatios, concerns in parameters to functions within Burst code: should developers use them, and where? We’ve put together this post to explain them in a bit more detail.

What are in parameters

C# 7.2 introduced in parameter modifiers as a way to pass something by reference to a function where the called function is not allowed to modify the data.

In parameters are a really useful language concept because they enforce a contract between the developer and the compiler as to how data will be used and modified. They pair up with the out parameter modifier (where the parameter must be assigned by the function) and the ref parameter modifier (where the parameter value may be modified).
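As a minimal sketch of the three modifiers side by side, consider the plain C# below. The Vec2 struct and the method names are made up for illustration and are not part of any Unity API.

```csharp
// Hypothetical types for illustration only.
public struct Vec2
{
    public float X;
    public float Y;
}

public static class ParameterModifiers
{
    // in: passed by reference, read-only inside the method.
    public static float LengthSquared(in Vec2 v)
    {
        // v.X = 0f; // would not compile: an in parameter cannot be assigned to
        return v.X * v.X + v.Y * v.Y;
    }

    // ref: passed by reference, the method may read and write it.
    public static void Scale(ref Vec2 v, float factor)
    {
        v.X *= factor;
        v.Y *= factor;
    }

    // out: passed by reference, the method must assign it before returning.
    public static void MakeUnitX(out Vec2 v)
    {
        v = new Vec2 { X = 1f, Y = 0f };
    }
}
```

All three pass the address of the caller’s variable; the difference is only in what the callee is allowed (or required) to do with it.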

Indirect arguments and the ABI

Let’s look at the following simple job:
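A job of this shape might look like the following sketch. The name MyJob, the layout of SomeStruct and the body of DoSomething are assumptions, modelled on the variant a reader posted in the comments; what matters is that DoSomething takes two large structs by value and is not inlined. It assumes a Unity project with the Burst, Jobs and Mathematics packages installed.

```csharp
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct MyJob : IJob
{
    // Assumed layout: 12 + 64 = 76 bytes of floats, far too large for registers.
    public struct SomeStruct
    {
        public float3 Position;
        public float4x4 Rotation;
    }

    public SomeStruct InDataA;
    public SomeStruct InDataB;
    public float3 OutData;

    // NoInlining keeps the call (and its argument passing) visible in the assembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static float3 DoSomething(SomeStruct a, SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public void Execute()
    {
        // Both structs are passed by value.
        OutData = DoSomething(InDataA, InDataB);
    }
}
```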

The above code can be broken down into:

  •  Call the DoSomething method which takes two structs passed by value.
  •  It performs some operation on the data, then returns the result (the operation doesn’t really matter for the purposes of this demo).
  •  Note that we’ve placed [MethodImpl(MethodImplOptions.NoInlining)] on the DoSomething method. We do this for two reasons:
    • It lets us pinpoint the specific method in the resulting assembly using the Burst Inspector.
    • It lets us simulate what would happen if the DoSomething method was really a very large function that Burst would not have inlined anyway. 

Now if we pull up the Burst Inspector, we can begin to dive into what is actually produced by the compiler for the above code:

Note the assembly we’ve highlighted in the red box – this is the number of bytes of stack required by the function. And now the Execute method itself:

And again note the highlighted red rectangle region – this is doing a bunch of copies between some memory address in the register rax, and the stack in rsp. So why is it doing this, you might ask?

Welcome to the wonderful world of ABI – Application Binary Interface. Aeons and aeons ago when computers were bigger than most modern houses, some smart computer people realised that if two different people were going to write programs such that code from both of these people could be used together – they’d have to agree on the rules for doing that.

When a caller passes data to a callee through a function call, both sides have to agree on where the function parameters are located, so that the caller knows where to put the data and the callee knows where to retrieve it from.

These rules are called calling conventions, and there are lots of weird and wonderful varieties. Each operating system tends to have a different convention, and some operating systems have several, but what is important is that both sides follow the same rules, so that neither behaves in a way the other didn’t expect!

Most calling conventions allow simple data (primitive types or small structs) to be passed by value and in registers – the most efficient way to pass data. But large structs, anything more than 16 bytes in size or so, will generally have to be passed indirectly. Returning to the simple job we showed above, we’ve now modified it to show you what the compiler has had to do to the code to conform to the ABI:
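Expressed back in C#, that rewrite could look roughly like the sketch below. This is not compiler output, just a hand-written approximation: MyJobLowered is a hypothetical name, the struct layout is an assumption, and ref stands in for the pointer the ABI actually passes.

```csharp
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct MyJobLowered : IJob
{
    // Assumed layout, as in the by-value job.
    public struct SomeStruct
    {
        public float3 Position;
        public float4x4 Rotation;
    }

    public SomeStruct InDataA;
    public SomeStruct InDataB;
    public float3 OutData;

    // The large arguments now arrive by reference (indirectly).
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static float3 DoSomething(ref SomeStruct a, ref SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public void Execute()
    {
        // By-value semantics promise the callee its own copy, so the caller
        // has to materialize one on the stack before passing its address.
        SomeStruct InDataACopy = InDataA;
        SomeStruct InDataBCopy = InDataB;
        OutData = DoSomething(ref InDataACopy, ref InDataBCopy);
    }
}
```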

So the compiler has:

  •  Changed the arguments a and b of the DoSomething function to be passed by reference instead.
  •  Added two new local variables InDataACopy and InDataBCopy in the Execute method.
  •  Copied the data from InDataA and InDataB into these variables.
  •  Called the DoSomething function, passing these local variables by reference.

Now if we look again at the Burst Inspector output:

This is the assembly that the compiler-generated copies map to: we’re copying a bunch of data. Now let’s instead look at the same example but using in parameters:
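A sketch of the in version, under the same assumptions as before (the struct layout and the DoSomething body are illustrative, and MyJobWithIn is a hypothetical name). The only changes are the in modifiers on the parameters and at the call site:

```csharp
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct MyJobWithIn : IJob
{
    // Assumed layout, as in the by-value job.
    public struct SomeStruct
    {
        public float3 Position;
        public float4x4 Rotation;
    }

    public SomeStruct InDataA;
    public SomeStruct InDataB;
    public float3 OutData;

    // in: by reference and read-only, so no private copy is owed to the callee.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static float3 DoSomething(in SomeStruct a, in SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public void Execute()
    {
        // The job's own fields are passed by reference directly.
        OutData = DoSomething(in InDataA, in InDataB);
    }
}
```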

Now let us look again at the stack allocation size of this new job:

We can see that the stack size has shrunk to just 32 bytes from 192 previously. Next let’s look at the call to the DoSomething function:

Here we can see that the loads and stores that were previously needed to copy InDataA and InDataB are now gone, because we’ve told the compiler that it doesn’t need them. Nice! Using in parameters here lets us tell the compiler how to do a better job at generating code, and if you imagine the DoSomething method was inside a really performance-sensitive inner loop, then we’ve just cut out a ton of instructions from that code.

NativeArray: note of caution

One slightly strange bit of C#’s in parameters is that you don’t have to explicitly mark the call site argument as having in, like you would with ref:
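A small plain C# sketch of the difference at the call site (Twice and AddOne are hypothetical helpers, not taken from the original listing):

```csharp
public static class CallSiteDemo
{
    public static int Twice(in int value) => value * 2;

    public static void AddOne(ref int value) => value += 1;

    public static int Run()
    {
        int x = 20;
        AddOne(ref x);        // ref is mandatory at the call site; x is now 21
        int a = Twice(in x);  // in written explicitly
        int b = Twice(x);     // in omitted: still compiles
        int c = Twice(42);    // even a literal is accepted when in is omitted
        return a + b + c;     // 42 + 42 + 84
    }
}
```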

What happens behind the scenes is that the compiler will insert a local variable, store 42 into it, and then pass that by in for you, like:
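The hidden temporary can be made observable in plain C#. In the sketch below (all names are hypothetical), the callee changes a static field before reading its in parameter: if the parameter really refers to the field, the callee sees the change; if it refers to a temporary, it does not.

```csharp
public static class InTemporaries
{
    private static int s_value;

    // Modifies the static first, then reads whatever the in parameter refers to.
    private static int BumpThenRead(in int v)
    {
        s_value++;
        return v;
    }

    // The field itself is passed by reference, so the bump is visible.
    public static int ThroughReference()
    {
        s_value = 42;
        return BumpThenRead(in s_value);
    }

    // The argument is an expression, not a variable: the compiler stores it
    // in a hidden local and passes a reference to that local instead.
    public static int ThroughHiddenTemporary()
    {
        s_value = 42;
        return BumpThenRead(s_value + 0);
    }

    // The same thing, with the temporary written out by hand.
    public static int ThroughExplicitTemporary()
    {
        s_value = 42;
        int temp = s_value;
        return BumpThenRead(in temp);
    }
}
```

Here ThroughReference returns 43, while both temporary variants return 42, because they read from a copy taken before the bump.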

So even though we’ve added in to the function, we’re still getting the copy that we were trying to avoid. One case where this comes up is with NativeArray – whose indexer returns the T by value and not by reference. We do this so as to avoid any dangling references to destroyed data in the NativeArray, and to ensure that no memory violations occur.

Let’s add a variant of our job to explore this:
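Such a variant could be sketched as follows. As before, the struct layout and the DoSomething body are assumptions, MyParallelJob is a hypothetical name, and the [ReadOnly] and [WriteOnly] attributes are an illustrative choice rather than something the original listing is known to contain.

```csharp
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct MyParallelJob : IJobParallelFor
{
    // Assumed layout, as in the single-element job.
    public struct SomeStruct
    {
        public float3 Position;
        public float4x4 Rotation;
    }

    [ReadOnly] public NativeArray<SomeStruct> InDataA;
    [ReadOnly] public NativeArray<SomeStruct> InDataB;
    [WriteOnly] public NativeArray<float3> OutData;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static float3 DoSomething(in SomeStruct a, in SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public void Execute(int index)
    {
        // The indexer returns each element by value, so the compiler stores
        // it in a hidden local before passing that local by reference.
        OutData[index] = DoSomething(InDataA[index], InDataB[index]);
    }
}
```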

In the new job we have:

  •  Changed it to be an IJobParallelFor.
  •  Made it run across arrays of data instead of a single element.
  •  Left the DoSomething callsite without an explicit in, because the NativeArray indexer returns a value and not a reference.

And let’s look at the assembly as shown in the Burst Inspector:

The highlighted region shows that the loads and stores we were previously avoiding by using in parameters have returned, and we’re having to do them for every iteration of the loop now – doh! So how can we avoid this copy? By using a helper function as provided by UnsafeUtility:
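A sketch of such a helper and its use is below. GetElementAsRef is the helper name used in this post, but its exact signature here is an assumption, as are the job name MyParallelJobWithRef, the struct layout, and the choice of GetUnsafeReadOnlyPtr for the [ReadOnly] inputs. The assembly containing it must have unsafe code enabled.

```csharp
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct MyParallelJobWithRef : IJobParallelFor
{
    // Assumed layout, as in the earlier jobs.
    public struct SomeStruct
    {
        public float3 Position;
        public float4x4 Rotation;
    }

    [ReadOnly] public NativeArray<SomeStruct> InDataA;
    [ReadOnly] public NativeArray<SomeStruct> InDataB;
    [WriteOnly] public NativeArray<float3> OutData;

    // Returns a reference straight into the array's memory.
    // No bounds check, and the reference dangles once the array is disposed.
    private static unsafe ref T GetElementAsRef<T>(NativeArray<T> array, int index)
        where T : unmanaged
    {
        return ref UnsafeUtility.ArrayElementAsRef<T>(array.GetUnsafeReadOnlyPtr(), index);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static float3 DoSomething(in SomeStruct a, in SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public void Execute(int index)
    {
        // Explicit in: this only compiles because the helper returns by reference.
        OutData[index] = DoSomething(
            in GetElementAsRef(InDataA, index),
            in GetElementAsRef(InDataB, index));
    }
}
```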

In the above example we’ve added a new helper method GetElementAsRef. This just takes a native array and an index, and uses the UnsafeUtility.ArrayElementAsRef helper to return a reference to the element, rather than returning it by value. This code is unsafe because if you disposed of the NativeArray, and thus freed the memory backing the allocation, referencing the element of the array would result in reading from either dead, or potentially reused, memory. As long as you’ve taken this into account, you can now pass references into your native arrays explicitly by in to the DoSomething method, and if we look in the Burst Inspector once again:

We can see that the loads and stores to take a copy of the data are gone, and we’re back to efficient and performant code – nice!

Defensive Copies in C#

When the developers of C# announced in parameters they wrote a blog post talking about the performance characteristics of using them. One line I’ll quote from the post is: ‘It means that you should never pass a non-readonly struct as an in parameter.’

The reason for this advice is that if you call an instance method on a non-readonly struct passed as an in parameter, the compiler has to generate a copy of the in parameter, in case the instance method modifies the struct’s state. Let’s look at an example of this:
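A minimal plain C# sketch of the situation follows. SomeStruct, SomeMethod and s are the names used in this post; the members Value, GetValue and Bump, and the extra BumpTwice method, are assumptions added to make the hidden copy observable.

```csharp
public struct SomeStruct
{
    public int Value;

    // Not marked readonly, so the C# compiler must assume it may write to 'this'.
    public int GetValue()
    {
        return Value;
    }

    // Actually writes to 'this'.
    public int Bump()
    {
        Value++;
        return Value;
    }
}

public static class DefensiveCopyDemo
{
    public static int SomeMethod(in SomeStruct s)
    {
        // The compiler copies s into a hidden local and calls GetValue on the copy.
        return s.GetValue();
    }

    public static int BumpTwice(in SomeStruct s)
    {
        s.Bump();        // increments a hidden copy, not s
        return s.Bump(); // a fresh copy of the unchanged s is incremented again
    }
}
```

Starting from Value = 0, BumpTwice returns 1 rather than 2 and leaves the caller’s struct untouched, which is the defensive copy at work.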

So in the above example we’re passing a SomeStruct as an in parameter to SomeMethod, and then calling an instance method on the struct. The C# compiler will notice this and generate a defensive copy of s in SomeMethod:

This is the IL generated by the compiler – and we can see that it will perform an ldobj and stloc.0 to take a copy of the in parameter.

In nearly all cases, as long as the instance method does not modify the state of the struct, Burst can deduce this and remove the defensive copy:

In the code above we can see that because the instance method did not modify the in parameter’s data, Burst has completely removed the copy. So although the general advice for C# code might be to only use in parameters with readonly structs, in Bursted HPC# as long as you do not store into the in parameter data you should be fine.

Conclusion

In parameters are a really powerful and useful language construct that provides a contract between developers and the compiler, a contract that lets you get optimal performance. As we’ve explored in this blog post:

  •  If you have non-inlined functions that take large structs by value, making these in parameters instead can lead to performance gains.
  •  You must be careful that the data at the callsite can actually be passed by in without resulting in copies. Explicitly using the in modifier on the callsite will let the compiler tell you whether this is the case.
  •  Using the Burst Inspector like we’ve shown here can give you tremendous insights into your code, so please use it!
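To illustrate the second point in plain C#, the sketch below uses List<int> as a stand-in for any container whose indexer returns by value (as the NativeArray indexer does). The names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ExplicitInDemo
{
    public static int Twice(in int value) => value * 2;

    public static int Run()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };

        // An array element is a variable, so it can be passed by reference.
        int a = Twice(in array[0]);

        // The List<T> indexer returns a value: without in this compiles,
        // but silently goes through a hidden temporary.
        int b = Twice(list[0]);

        // With an explicit in, the compiler rejects the same call,
        // which is how the hidden copy gets flagged:
        // int c = Twice(in list[0]); // does not compile

        return a + b;
    }
}
```

Writing in at every performance-sensitive callsite turns a silent copy into a compile error.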

If you haven’t started with Burst yet and would like to learn more about our work on the new Data-Oriented Technology Stack (DOTS), head over to our DOTS pages, where we will be adding more learning resources and links to talks from our teams as more becomes available. 

We always welcome your feedback – join the forum here to let us know how we can help you level up your Burst code in future.

9 replies on “In parameters in Burst”

I really feel that the Burst Inspector needs a friendlier UI for those of us who cannot really read machine code like this. It would be great if your system could detect these potential performance improvements, and highlight them.

For example, if you added a Roslyn compiler plugin for JetBrains Rider, we could have these sorts of tips show up directly in the code editor, which would be very, very useful.

Indeed. I’m reluctant to use this window even though I’m generally considered a low-level guy among my peers.

I was playing around with in parameters a few months back, and in my case they turned out to add more assembly lines (probably overhead?) in jobs.
I tend to have many small utility functions that get used commonly in the codebase [e.g. float CalculateTriangleArea(float4 a, float4 b, float4 c)].
It is also common to have some of their parameters hardcoded in the job.

I presumed that this was a case of the compiler being forced to take the address of a local temporary that is a compile-time constant, and did not investigate further.

Example with inlined function:

[BurstCompile]
public struct MyJob : IJob
{
    public readonly struct SomeStruct
    {
        public readonly float3 Position;
        public readonly float4x4 Rotation;

        public SomeStruct(float3 position, float4x4 rotation)
        {
            Position = position;
            Rotation = rotation;
        }
    }

    public SomeStruct InDataA;
    public float3 OutData;

    private static float3 DoSomething(SomeStruct a, SomeStruct b)
    {
        return math.rotate(a.Rotation, a.Position) +
               math.rotate(b.Rotation, b.Position);
    }

    public unsafe void Execute()
    {
        OutData = DoSomething(InDataA, new SomeStruct(math.float3(1,2,3), float4x4.identity));
    }
}

emits:

vmovsd xmm0, qword ptr [rcx]
vbroadcastss xmm1, xmm0
vmulps xmm1, xmm1, xmmword ptr [rcx + 12]
vpermilps xmm0, xmm0, 213
vmulps xmm0, xmm0, xmmword ptr [rcx + 28]
vaddps xmm0, xmm1, xmm0
vbroadcastss xmm1, dword ptr [rcx + 8]
vmulps xmm1, xmm1, xmmword ptr [rcx + 44]
vaddps xmm0, xmm0, xmm1
vaddps xmm0, xmm0, xmmword ptr [rip + __xmm@0000000040400000400000003f800000]
vmovss dword ptr [rcx + 76], xmm0
vextractps dword ptr [rcx + 80], xmm0, 1
vextractps dword ptr [rcx + 84], xmm0, 2
ret

whereas when we add in we get an extra vinsertps instruction.

Thus for:

private static float3 DoSomething(in SomeStruct a, in SomeStruct b){…}

we get:

vmovsd xmm0, qword ptr [rcx]
vinsertps xmm1, xmm0, dword ptr [rcx + 8], 32
vbroadcastss xmm2, xmm0
vmulps xmm2, xmm2, xmmword ptr [rcx + 12]
vpermilps xmm0, xmm0, 213
vmulps xmm0, xmm0, xmmword ptr [rcx + 28]
vaddps xmm0, xmm2, xmm0
vpermilps xmm1, xmm1, 234
vmulps xmm1, xmm1, xmmword ptr [rcx + 44]
vaddps xmm0, xmm1, xmm0
vaddps xmm0, xmm0, xmmword ptr [rip + __xmm@0000000040400000400000003f800000]
vmovss dword ptr [rcx + 76], xmm0
vextractps dword ptr [rcx + 80], xmm0, 1
vextractps dword ptr [rcx + 84], xmm0, 2
ret

Second example, for which I cannot determine which assembly is “better”: MultiplyRefJob vs MultiplyInRefJob vs MultiplyInRef2Job:

[MethodImpl(MethodImplOptions.NoInlining)]
public static float4 Multiply(float4 vec, float v)
{
    return vec * v;
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static float4 MultiplyIn(in float4 vec, in float v)
{
    return vec * v;
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static float4 MultiplyIn2(in float4 vec, float v)
{
    return vec * v;
}

[BurstCompile]
public struct MultiplyRefJob : IJob
{
    public NativeArray<float4> x;
    public NativeArray<float4> y;

    public unsafe void Execute()
    {
        UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
            Multiply(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
    }
}

[BurstCompile]
public struct MultiplyInRefJob : IJob
{
    public NativeArray<float4> x;
    public NativeArray<float4> y;

    public unsafe void Execute()
    {
        UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
            MultiplyIn(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
    }
}

[BurstCompile]
public struct MultiplyInRef2Job : IJob
{
    public NativeArray<float4> x;
    public NativeArray<float4> y;

    public unsafe void Execute()
    {
        UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
            MultiplyIn2(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
    }
}

MultiplyRefJob has a bigger stack (48) than MultiplyInRefJob (32), but Multiply’s second parameter, 1000f, has been hardcoded into the function body:

“BurstFoos.Multiply(Unity.Mathematics.float4 vec, float v)_49D95762617E63CF”:
vbroadcastss xmm0, dword ptr [rip + __real@447a0000]
vmulps xmm0, xmm0, xmmword ptr [rcx]
ret

Probably the winner is MultiplyInRef2Job, which has a 32-byte stack and 1000f hardcoded as the second parameter. (Assuming optimization for speed, not size, right?)

Thus, it seems that having in float v prevents 1000f from being hardcoded into the MultiplyIn function. I have no idea how this propagates through chains of function calls.

PS. Burst Inspector converts in into ref in its call name, which suggests that in == ref readonly is not supported.
PPS. I’ve used the newest Burst package version 1.4.1

“NativeArray – whose indexer returns the T by value and not by reference. We do this so as to avoid any dangling references to destroyed data in the NativeArray, and to ensure that no memory violations occur.”

Dangling references? Returning ref T would be fine and is already used in Span<T>.

Ref variables can only be used in local scope and can’t be used as class/struct fields, thus greatly reducing the possibility of dangling references.

Couldn’t you just add a compiler Tag e.g.

[GetElementsAsRef]
OutData[i] = DoSomething(InDataA[i], InDataB[i]);

To get around the boilerplate code bloat you would otherwise need.
