
In this final episode of our IL2CPP micro-optimization miniseries, we’ll explore the high cost of something called “boxing”, and we’ll see how IL2CPP can avoid it when it is done unnecessarily.

Heap allocations are slow

Like many programming languages, C# allows the memory for objects to be allocated on the stack (a small, “fast”, scope-specific block of memory) or the heap (a large, “slow”, global block of memory). Allocating space for an object on the heap is usually much more expensive than allocating space on the stack. It also involves tracking that allocated memory in the garbage collector, which has an additional cost. So we try to avoid heap allocations where possible.

C# lets us do this by separating types into value types (which can be allocated on the stack) and reference types (which must be allocated on the heap). Types like int and float are value types; string and object are reference types. User-defined value types use the struct keyword. User-defined reference types use the class keyword. Note that a value type can never hold the value null. In C#, the null value can only be assigned to reference types. Keep this distinction in mind as we continue.
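C# and C++ differ in many ways. Because IL2CPP emits C++, though, a rough C++ analogy may make the distinction concrete.

The sketch below makes two assumptions:
* A value type behaves like a plain struct held by value. It is copied on assignment and is never null.
* A reference type can be approximated by a std::shared_ptr handle to a heap object. The handle is shared on assignment and may be null.

Point and the three demo functions are hypothetical names, used for illustration only.

```cpp
#include <cassert>
#include <memory>

// Hypothetical value type: stored inline (for example on the stack) and copied on assignment.
struct Point {
    int x;
    int y;
};

// Value semantics: modifying the copy leaves the original untouched.
// Returns the original's x after the copy has been changed.
int value_copy_demo() {
    Point a{1, 2};
    Point b = a;  // b is an independent copy
    b.x = 10;
    return a.x;   // still 1
}

// Reference semantics, approximated with a shared heap pointer:
// both handles refer to the same heap object.
int reference_copy_demo() {
    auto r1 = std::make_shared<Point>(Point{1, 2});
    auto r2 = r1;  // copies the handle, not the object
    r2->x = 10;
    return r1->x;  // 10: the change is visible through r1
}

// Only a handle can be null; a Point object always holds a value.
bool default_handle_is_null() {
    std::shared_ptr<Point> handle;  // a default-constructed handle is empty (null)
    return handle == nullptr;
}
```

Here value_copy_demo() returns 1 and reference_copy_demo() returns 10. This is the copy-versus-share behavior that separates struct from class in C#.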

Being good performance citizens, we try to avoid heap allocations unless they are necessary. But sometimes we need to convert a value type on the stack into a reference type on the heap. This process is called boxing. Boxing:

  1. Allocates space on the heap
  2. Informs the garbage collector about the new object
  3. Copies the data from the value type object into the new reference type object
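As a rough illustration only, the three boxing steps can be modeled in C++, the language IL2CPP emits.

Box, box and g_tracked_objects are hypothetical names. A plain counter stands in for the garbage collector's bookkeeping. This is not how Unity's runtime actually implements boxing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for the garbage collector's bookkeeping: counts every "boxed" object.
// (Hypothetical; a real collector tracks far more than a counter.)
std::size_t g_tracked_objects = 0;

// Hypothetical box: a heap object that holds a copy of a value.
template <typename T>
struct Box {
    T value;
};

// Models the three boxing steps listed above.
template <typename T>
std::unique_ptr<Box<T>> box(const T& v) {
    auto boxed = std::make_unique<Box<T>>();  // 1. allocate space on the heap
    ++g_tracked_objects;                      // 2. "inform" the collector
    boxed->value = v;                         // 3. copy the value into the new object
    return boxed;
}
```

Consider `int x = 42; auto b = box(x);`. Changing x afterwards leaves b->value at 42, because the box holds its own copy. Every call also pays for one heap allocation.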

Ugh, let’s add boxing to our list of things to avoid!

That pesky compiler

Suppose we are happily writing code, avoiding unnecessary heap allocations and boxing. Maybe we have some trees for our world, represented by a Tree value type (a struct), and each tree has a size which scales with its age.

Elsewhere in our code, we have a convenient generic method, TotalSize<T>, that sums up the size of many things (possibly including Tree objects). It skips any entry that fails an if (things[i] != null) check.

This looks safe enough, but let’s peer into a little bit of the Intermediate Language (IL) code that the C# compiler generates for that null check.

The C# compiler has implemented the if (things[i] != null) check using boxing!

If the type T is already a reference type, then the box opcode is pretty cheap: it just returns the existing reference stored in the array element. But if T is a value type (like our Tree type), then that box opcode is very costly, since it performs a heap allocation and a copy.

Of course, value types can never be null, so why do we need to implement the check in the first place? And what if we need to compute the size of one hundred Tree objects, or maybe one thousand? That unnecessary boxing, one heap allocation and copy per element, quickly adds up.
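To see why this matters, here is a rough C++ model of the generic null check. It rests on two assumptions:
* A pointer element stands in for a reference type, and "boxing" it just reuses the existing pointer.
* A value element must be copied to the heap before it can be compared with null.

All names here are hypothetical and are neither the article's code nor IL2CPP's output: Tree (with an assumed Size() method and an arbitrary age * 3 formula), boxed_is_not_null, size_of, TotalSizeGeneric and g_box_allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Counts heap allocations made purely to perform the null check.
std::size_t g_box_allocations = 0;

// Hypothetical value type standing in for the article's Tree struct.
// The factor of 3 is an arbitrary assumption; only "size grows with age" matters.
struct Tree {
    int age;
    int Size() const { return age * 3; }
};

// Models "box, then compare with null".
// Pointer T (reference-like): no allocation, the existing pointer is tested.
// Value T: a heap copy is made just so it can be tested, and it is never null.
template <typename T>
bool boxed_is_not_null(const T& item) {
    if constexpr (std::is_pointer_v<T>) {
        return item != nullptr;
    } else {
        ++g_box_allocations;
        auto boxed = std::make_unique<T>(item);  // heap allocation + copy
        return boxed != nullptr;                 // always true for a value
    }
}

// Reads Size() whether T is a value or a pointer.
template <typename T>
int size_of(const T& item) {
    if constexpr (std::is_pointer_v<T>) {
        return item->Size();
    } else {
        return item.Size();
    }
}

// One-size-fits-all version: every element is "boxed" before the null check.
template <typename T>
int TotalSizeGeneric(const std::vector<T>& things) {
    int total = 0;
    for (std::size_t i = 0; i < things.size(); ++i) {
        if (boxed_is_not_null(things[i])) {
            total += size_of(things[i]);
        }
    }
    return total;
}
```

Summing 1,000 Tree values this way, the model counts 1,000 boxing allocations. Each one answers a question whose answer is always "not null". Summing pointers counts none.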

The fastest code is anything you don’t execute

The C# compiler needs to provide a general implementation that works for any possible type T, so it is stuck with this slower code. But a code generator like IL2CPP, which produces the code that will actually be executed, can be more aggressive: it can avoid generating code that will never be needed!

IL2CPP creates an implementation of the TotalSize<T> method specifically for the case where T is Tree. In the generated C++ code for that implementation, the box opcode used for the null check no longer appears.

IL2CPP recognized that the box opcode is unnecessary for a value type, because it can prove ahead of time that a value type object will never be null. In a tight loop, removing an unnecessary allocation and data copy can have a significant positive impact on performance.
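The idea can be sketched in C++ under similar assumptions: pointers stand in for reference types, and Tree, its Size() method and TotalSizeSpecialized are hypothetical.

Each instantiation is compiled for one concrete T. That lets if constexpr drop the null check entirely from the value-type version. This is similar in spirit to the per-type code the article describes IL2CPP generating, but it is not IL2CPP's actual output.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical value type standing in for the article's Tree struct
// (the factor of 3 is an arbitrary assumption).
struct Tree {
    int age;
    int Size() const { return age * 3; }
};

// Per-type implementation: the null check only exists for pointer
// (reference-like) element types.
template <typename T>
int TotalSizeSpecialized(const std::vector<T>& things) {
    int total = 0;
    for (const T& item : things) {
        if constexpr (std::is_pointer_v<T>) {
            if (item != nullptr) {  // a reference may be null, so keep the check
                total += item->Size();
            }
        } else {
            total += item.Size();   // a value can never be null: no check, no box
        }
    }
    return total;
}
```

With a std::vector<Tree> of 1,000 elements, the loop makes no allocations at all. The std::vector<Tree*> instantiation keeps its nullptr check.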

Wrapping up

As with the other micro-optimizations discussed in this series, this one is a common optimization for .NET code generators. All of the scripting backends used by Unity currently perform this optimization for you, so you can get back to writing your code.

We hope you have enjoyed this miniseries about micro-optimizations. As we continue to improve the code generators and runtimes used by Unity, we’ll offer more insight into the micro-optimizations that go on behind the scenes.

17 Comments


  1. On the Learn page, there is nothing about IL2CPP:
    https://unity3d.com/learn/tutorials/topics/scripting
    A video would fit in a new ADVANCED GAMEPLAY SCRIPTING section.
    It could cover complex calculations for third-person camera positioning, mesh impact deformation, alternative physics for video games, aerodynamic drag…

    1. Yes, nothing is there yet. I’ll keep your suggestions in mind, and try to determine the best way to present this material. Thanks!

  2. theothermonarch

    August 16, 2016 at 1:46 am

    Not going to work for nullable types, I assume.

  3. Robert Cummings

    August 12, 2016 at 12:14 am

    Mad science. Love it.

  4. “IL2CPP will create an implementation of The TotalSize method specifically for the case where T is a Tree.”

    The method is generic, so why does it create an implementation for this specific case? It’s not really clear from the post.

    Also, if IL2CPP can make these assumptions, why doesn’t the Mono compiler work in the same way?

    1. I think what they mean is “IL2CPP will create an implementation of the TotalSize method specifically for the case where T is a value type (as is the case for Tree).”

      So they mean, IL2CPP will recognise that Tree is a value type, and react accordingly by omitting the boxing line.

    2. I’m sorry that this isn’t clear; let me try to explain it a bit more. The method is generic in C# code, but when executable code is created by some engine (either JIT, like Mono on desktop, or AOT, like IL2CPP), that engine is responsible for creating an implementation of TotalSize for any possible T.

      A JIT engine does this as it encounters each usage of a type for T. An AOT engine inspects all of the code first to find all of the places T is actually used. In both cases, the result is the same – a specific implementation for each type T. Often the engine can share the implementation of the generic method to reduce code size.
      See this post for more details about how IL2CPP does this.

      But at execution time, there is code created specifically for TotalSize with T as Tree, so IL2CPP can make this optimization. The Mono C# compiler can’t make the same optimization, since it is not generating executable code. But the Mono JIT and the Mono AOT engine both do make this optimization.

      1. …Just to add, the way generics are implemented in the .NET CLR (and by IL2CPP), i.e. generating an implementation for any possible T, is in itself an optimisation of sorts. The other well-known way of doing this is called type erasure, and it is used in the Java universe. In that approach, only one implementation of the generic method is generated: the type parameter is essentially replaced with the base class Object, and casts are applied where necessary. This implementation, however, has many disadvantages compared to the .NET/IL2CPP approach. The only real advantage of type erasure is that the resulting binary could be significantly smaller.

  5. Now that live trainings are back, would a live training about IL2CPP for beginners be possible?

    1. What kind of topics do you have in mind? Ideally, everything related to IL2CPP should “just work”, as it is pretty low-level and behind-the-scenes, so I’m not sure what we could cover. But I’m open to ideas.

      1. Well, in my case my knowledge of IL2CPP is null. I don’t know whether it is C++ that I can write as a script and attach in the Inspector, a Unity optimisation such as a new compiler, or a C# conversion that happens when we press Build (or something else). So a didactic introduction video, made by a teacher, about what it is in the first place and how to use it could help beginners (visual artists with visual skills who also do coding optimisation) shed some light on it. The ABC. At that point, the Unity user can decide whether to investigate IL2CPP further. There is documentation and there are videos about IL2CPP (the BCDE...), but they look more advanced. A live training could be an A.B.C.D.E. that introduces what IL2CPP is, what it is for, and how to use it. This blog article could be an example of how to use it: I can understand it, but not how to implement it. Thanks for the consideration.

      2. After explaining the ABC, you could cover a 3D or 2D vector calculation. That way it stays generic, and cases like optimising a random vector, wind, lift force, light calculations, or shadows are covered. Or maybe how to speed up game Start operations when generating procedural content?

        1. I really like these suggestions, Alan. It feels like what we need is some kind of high level overview of how scripting works in Unity. What are the alternatives for users? What are the trade-offs for each alternative? These answers might really help organize the process of thinking about scripting in Unity. You’ve spurred some ideas in my mind about this now. Thanks!

  6. Unfortunately, C# does not treat constraints as part of a method’s signature for overload resolution. So if you write two TotalSize methods, one with the constraint “where T: struct, HasSize” and the other with “where T: class, HasSize”, the compiler will complain that the same method is defined twice.

    Although some may argue about whether the “if (things[i] != null)” check should be included in TotalSize, or whether the two methods should be given different names (one for class types and another for structs), that optimization is indeed, imvho, a great feature in IL2CPP.

    So, nice addition. Thanks a lot!

    1. The example code I’ve used here is a bit contrived to show off this optimization, for sure. But we have seen significant performance improvement in real-world projects.

  7. Can IL2CPP prevent boxing when enums are used as keys for dictionaries? I can of course implement the IEqualityComparer, but I’m just curious whether the compiler could do something in this case.

    1. That is a bit more difficult to do. Enum types override ToString, so IL2CPP can’t make many assumptions about when to avoid boxing them, as that could change program behavior. It’s probably best to use IEqualityComparer explicitly.