
This is the second blog post in the IL2CPP Internals series. In this post, we will investigate the C++ code generated by il2cpp.exe. Along the way, we will see how managed types are represented in native code, take a look at runtime checks used to support the .NET virtual machine, see how loops are generated and more!

We will get into some very version-specific code that is certainly going to change in later versions of Unity. Still, the concepts will remain the same.

Example project

I’ll use the latest version of Unity available, 5.0.1p1, for this example. As in the first post in this series, I’ll start with an empty project and add one script file. This time, it has the following contents:



I’ll build this project for WebGL, running the Unity editor on Windows. I’ve selected the Development Player option in the Build Settings, so that we can get relatively nice names in the generated C++ code. I’ve also set the Enable Exceptions option in the WebGL Player Settings to Full.

Overview of the generated code

After the WebGL build is complete, the generated C++ code is available in the Temp\StagingArea\Data\il2cppOutput directory in my project directory. Once the editor is closed, this directory will be deleted. As long as the editor is open though, this directory will remain unchanged, so we can inspect it.

The il2cpp.exe utility generated a number of files, even for this small project. I see 4625 header files and 89 C++ source code files. To get a handle on all of this code, I like to use a text editor which works with Exuberant Ctags. Ctags will usually generate a tags file quickly for this code, which makes it easier to navigate.

Initially, you can see that many of the generated C++ files are not from the simple script code, but instead are the converted version of the code in the standard libraries, like mscorlib.dll. As mentioned in the first post in this series, the IL2CPP scripting backend uses the same standard library code as the Mono scripting backend. Note that we convert the code in mscorlib.dll and other standard library assemblies each time il2cpp.exe runs. This might seem unnecessary, since that code does not change.

However, the IL2CPP scripting backend always uses byte code stripping to decrease the executable size. So even small changes in the script code can cause many different parts of the standard library code to be used or not, depending on the situation. Therefore, we need to convert the mscorlib.dll assembly each time. We are researching better ways to do incremental builds, but we don’t have any good solutions yet.

How managed code maps to generated C++ code

For each type in the managed code, il2cpp.exe will generate one header file for the C++ definition of the type and another header file for the method declarations for the type. For example, let’s look at the contents of the converted UnityEngine.Vector3 type. The header file for the type is named UnityEngine_UnityEngine_Vector3.h. The name is built from the name of the assembly, UnityEngine.dll, followed by the namespace and name of the type. The code looks like this:



The il2cpp.exe utility has converted each of the three instance fields, and done a little bit of name mangling to avoid conflicts and reserved words. By using leading underscores, we are using some reserved names in C++, but so far we’ve not seen any conflicts with C++ standard library code.
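
As a rough illustration of the kind of struct described above, here is a minimal sketch. The type name Vector3_t0 and the field names are hypothetical and only imitate the mangling style; they are not copied from real il2cpp.exe output, and Vector3_sum is a helper invented for this example.

```cpp
#include <cstddef>

// Hypothetical stand-in for a generated value type. The "_t0" suffix and the
// leading-underscore field names only imitate the style described in the text.
struct Vector3_t0
{
    float ___x_1; // System.Single UnityEngine.Vector3::x
    float ___y_2; // System.Single UnityEngine.Vector3::y
    float ___z_3; // System.Single UnityEngine.Vector3::z
};

// The managed struct has three float fields, so this sketch expects the
// native struct to be three consecutive floats with no padding.
static_assert(sizeof(Vector3_t0) == 3 * sizeof(float), "no padding expected");
static_assert(offsetof(Vector3_t0, ___y_2) == sizeof(float), "y follows x");

// Invented helper: sums the fields to show they are plain C++ data members.
inline float Vector3_sum(const Vector3_t0& v)
{
    return v.___x_1 + v.___y_2 + v.___z_3;
}
```

The point of the sketch is only that a managed value type can map to a plain C++ struct whose fields keep their order; the exact names in a real build will differ.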

The UnityEngine_UnityEngine_Vector3MethodDeclarations.h file contains the method declarations for all of the methods in Vector3. For example, Vector3 overrides the Object.ToString method:

Note the comment, which indicates the managed method this native declaration represents. I often find it useful to search the files in the output for the name of the managed method in this format, especially for methods with common names, like ToString.

Notice a few interesting things about all methods converted by il2cpp.exe:

  • These are not member functions in C++. All methods are free functions, where the first argument is the “this” pointer. For static functions in managed code, IL2CPP always passes a value of NULL for this first argument. By always declaring methods with the “this” pointer as the first argument, we simplify the method generation code in il2cpp.exe and we make invoking methods via other methods (like delegates) simpler for generated code.
  • Every method has an additional argument of type MethodInfo* which includes the metadata about the method that is used for things like virtual method invocation. The Mono scripting backend uses platform-specific trampolines to pass this metadata. For IL2CPP, we’ve decided to avoid the use of trampolines to aid in portability.
  • All methods are declared extern “C” so that il2cpp.exe can sometimes lie to the C++ compiler and treat all methods as if they had the same type.
  • Types are named with a “_t” suffix. Methods are named with a “_m” suffix. Naming conflicts are resolved by appending a unique number to each name. These numbers will change if anything in the user script code changes, so you cannot depend on them from build to build.

The first two points imply that every method has at least two parameters: the “this” pointer and the MethodInfo pointer. Do these extra parameters cause unnecessary overhead? They clearly add some overhead, but so far profiling has not shown a measurable difference in performance from the extra arguments.
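
The calling convention described in the list above can be sketched as follows. This is a simplified illustration, not real generated code: the MethodInfo struct here is a hypothetical placeholder, the function names and numeric suffixes are invented, and placing the MethodInfo pointer last is an assumption of this sketch.

```cpp
#include <cmath>

// Hypothetical placeholders; the real MethodInfo carries far more metadata.
struct MethodInfo { const char* name; };
struct Vector3_t0 { float x, y, z; };

// An instance method becomes a free function that takes "this" explicitly,
// plus a trailing MethodInfo pointer (the position is assumed here).
extern "C" float Vector3_get_magnitude_m0(Vector3_t0* __this, const MethodInfo* method)
{
    (void)method; // metadata is unused in this sketch
    return std::sqrt(__this->x * __this->x + __this->y * __this->y + __this->z * __this->z);
}

// A static method has the same shape; callers pass NULL for "this".
extern "C" float Mathf_Max_m1(void* __this, float a, float b, const MethodInfo* method)
{
    (void)__this;
    (void)method;
    return a > b ? a : b;
}
```

A caller would write Vector3_get_magnitude_m0(&v, NULL) for the instance method and Mathf_Max_m1(NULL, 1.0f, 2.0f, NULL) for the static one, which shows why every converted method ends up with at least two parameters.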

We can jump to the definition of this ToString method using Ctags. It is in the Bulk_UnityEngine_0.cpp file. The code in that method definition doesn’t look too much like the C# code in the Vector3::ToString() method. However, if you use a tool like ILSpy to reflect the code for the Vector3::ToString() method, you’ll see that the generated C++ code looks very similar to the IL code.

Why doesn’t il2cpp.exe generate a separate C++ file for the method definitions for each type, as it does for the method declarations? This Bulk_UnityEngine_0.cpp file is pretty large, 20,481 lines actually! We found the C++ compilers we were using had trouble with a large number of source files. Compiling four thousand .cpp files took much longer than compiling the same source code in 80 .cpp files. So il2cpp.exe batches the method definitions for types into groups and generates one C++ file per group.

Now jump back to the method declarations header file and notice this line near the top of the file:



The il2cpp-codegen.h file contains the interface which generated code uses to access the libil2cpp runtime services. We’ll discuss some ways that the runtime is used by generated code later.

Method prologues

Let’s take a look at the definition of the Vector3::ToString() method. Specifically, it has a common prologue that is emitted in all methods by il2cpp.exe.


The first line of this prologue creates a local variable of type StackTraceSentry. This variable is used to track the managed call stack, so that IL2CPP can report it in calls like Environment.StackTrace. Emitting this variable is actually optional, and is enabled in this case by the --enable-stacktrace option passed to il2cpp.exe (since I set the Enable Exceptions option in the WebGL Player Settings to Full). For small functions, we found that the overhead of this variable has a negative impact on performance. So for iOS and other platforms where we can use platform-specific stack trace information, we never emit this line into generated code. For WebGL, we don’t have platform-specific stack trace support, so this variable is necessary to allow managed code exceptions to work properly.
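
To make the idea of a stack trace sentry more concrete, here is a minimal sketch of how such a variable could track the managed call stack with a constructor and destructor. The class StackTraceSentrySketch, the managed_stack function, and the two demo functions are all hypothetical; the real StackTraceSentry in libil2cpp may work quite differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-thread shadow stack of managed method names.
inline std::vector<std::string>& managed_stack()
{
    thread_local std::vector<std::string> frames;
    return frames;
}

// RAII guard: pushes a frame on construction and pops it on destruction,
// so the shadow stack stays correct even when an exception unwinds.
class StackTraceSentrySketch
{
public:
    explicit StackTraceSentrySketch(const char* method) { managed_stack().push_back(method); }
    ~StackTraceSentrySketch() { managed_stack().pop_back(); }
    StackTraceSentrySketch(const StackTraceSentrySketch&) = delete;
    StackTraceSentrySketch& operator=(const StackTraceSentrySketch&) = delete;
};

// Two invented "methods", each starting with the sentry as its first local.
inline std::size_t inner_depth()
{
    StackTraceSentrySketch sentry("Inner");
    return managed_stack().size(); // number of frames currently tracked
}

inline std::size_t outer_depth()
{
    StackTraceSentrySketch sentry("Outer");
    return inner_depth();
}
```

Because each call pays for a push and a pop, a design like this also suggests why the overhead is noticeable for very small functions.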

The second part of the prologue does lazy initialization of type metadata for any array or generic types used in the method body. Here, ObjectU5BU5D_t4 is the generated name for the type System.Object[]. This part of the prologue is only executed once and often does nothing if the type was already initialized elsewhere, so we have not seen any adverse performance implications from this generated code.

Is this code thread safe though? What if two threads call Vector3::ToString() at the same time? Actually, this code is not problematic, since all of the code in the libil2cpp runtime used for type initialization is safe to call from multiple threads. It is possible (maybe even likely) that the il2cpp_codegen_class_from_type function will be called more than once, but the actual work it does will only occur once, on one thread. Method execution won’t continue until that initialization is complete. So this method prologue is thread safe.
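
The argument above can be sketched in a few lines. This is an illustration under stated assumptions, not the libil2cpp implementation: class_from_type_sketch stands in for a runtime helper such as il2cpp_codegen_class_from_type, and the mutex-based locking, the TypeInfoSketch struct, and the work counter are all invented for this example.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical type metadata with a flag recording whether it is set up.
struct TypeInfoSketch { bool initialized = false; };

static TypeInfoSketch g_object_array_type;
static std::mutex g_type_init_mutex;
static std::atomic<int> g_init_work_count{0}; // counts how often real work ran

// Stand-in for the runtime helper: callable many times from many threads,
// but the expensive initialization runs only once, under a lock.
inline TypeInfoSketch* class_from_type_sketch()
{
    std::lock_guard<std::mutex> lock(g_type_init_mutex);
    if (!g_object_array_type.initialized)
    {
        ++g_init_work_count;
        g_object_array_type.initialized = true;
    }
    return &g_object_array_type;
}

// Stand-in for a generated method with a lazy-initialization prologue.
inline bool method_with_prologue()
{
    static std::atomic<TypeInfoSketch*> s_type{nullptr};
    TypeInfoSketch* type = s_type.load(std::memory_order_acquire);
    if (type == nullptr)
    {
        // Several threads may reach this call, which is harmless.
        type = class_from_type_sketch();
        s_type.store(type, std::memory_order_release);
    }
    return type->initialized; // the method body runs only after this point
}
```

In this sketch the helper may be entered more than once, but the counter shows the initialization work itself happens a single time, and no caller proceeds until it has finished.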

Runtime checks

The next part of the method creates an object array, stores the value of the x field of Vector3 in a local, then boxes the local and adds it to the array at index zero. Here is the generated C++ code (with some annotations):

The three runtime checks are not present in the IL code, but are instead injected by il2cpp.exe.

  • The NullCheck code will throw a NullReferenceException if the value of the array is null.
  • The IL2CPP_ARRAY_BOUNDS_CHECK code will throw an IndexOutOfRangeException if the array index is outside the bounds of the array.
  • The ArrayElementTypeCheck code will throw an ArrayTypeMismatchException if the type of the element being added to the array is not correct.

These three runtime checks are all guarantees provided by the .NET virtual machine. Rather than injecting code, the Mono scripting backend uses platform-specific signaling mechanisms to handle these same runtime checks. For IL2CPP, we wanted to be more platform agnostic and support platforms like WebGL, where there is no platform-specific signaling mechanism, so il2cpp.exe injects these checks.
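
As a rough sketch of what injected checks around an array store amount to, consider the code below. Everything in it is hypothetical: the helper functions only mirror the roles of NullCheck, IL2CPP_ARRAY_BOUNDS_CHECK, and ArrayElementTypeCheck, the exception structs are empty placeholders for the managed exceptions, and element types are reduced to integer ids for brevity.

```cpp
#include <cstddef>
#include <vector>

// Placeholder exception types standing in for the managed exceptions.
struct NullReferenceSketch {};
struct IndexOutOfRangeSketch {};
struct ArrayTypeMismatchSketch {};

// Hypothetical managed array: an element type id plus the stored items.
struct ManagedArraySketch
{
    int element_type_id;
    std::vector<int> items;
};

inline void null_check(const void* p)
{
    if (p == nullptr) throw NullReferenceSketch();
}

inline void bounds_check(const ManagedArraySketch* a, std::size_t index)
{
    if (index >= a->items.size()) throw IndexOutOfRangeSketch();
}

inline void element_type_check(const ManagedArraySketch* a, int value_type_id)
{
    if (a->element_type_id != value_type_id) throw ArrayTypeMismatchSketch();
}

// The three checks run explicitly before the store, instead of relying on
// a platform-specific signal when something goes wrong.
inline void store_element(ManagedArraySketch* array, std::size_t index, int value_type_id, int value)
{
    null_check(array);
    bounds_check(array, index);
    element_type_check(array, value_type_id);
    array->items[index] = value;
}

// Invented helper: reports which check fired (0 means the store succeeded).
inline int try_store(ManagedArraySketch* array, std::size_t index, int value_type_id)
{
    try { store_element(array, index, value_type_id, 42); return 0; }
    catch (const NullReferenceSketch&) { return 1; }
    catch (const IndexOutOfRangeSketch&) { return 2; }
    catch (const ArrayTypeMismatchSketch&) { return 3; }
}
```

Each check is an ordinary branch in portable C++, which is what makes the approach work on a platform like WebGL, and also why the cost can add up inside a tight loop.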

Do these runtime checks cause performance problems though? In most cases, we’ve not seen any adverse impact on performance and they provide the benefits and safety which are required by the .NET virtual machine. In a few specific cases though, we are seeing these checks lead to degraded performance, especially in tight loops. We’re working on a way now to allow managed code to be annotated to remove these runtime checks when il2cpp.exe generates C++ code. Stay tuned on this one.

Static Fields

Now that we’ve seen how instance fields look (in the Vector3 type), let’s see how static fields are converted and accessed. Find the definition of the HelloWorld_Start_m3 method, which is in the Bulk_Assembly-CSharp_0.cpp file in my build. From there, jump to the Important_t1 type (in the AssemblyU2DCSharp_HelloWorld_Important.h file):



Notice that il2cpp.exe has generated a separate C++ struct to hold the static field for this type, since the static field is shared between all instances of this type. So at runtime, there will be one instance of the Important_t1_StaticFields type created, and all of the instances of the Important_t1 type will share that instance of the static fields type. In generated code, the static field is accessed like this:

The type metadata for Important_t1 holds a pointer to the single instance of the Important_t1_StaticFields type, and that instance is used to obtain the value of the static field.
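
The relationship between instances, type metadata, and the static fields struct can be sketched as follows. The struct names, the shared_value and instance_value fields, and the accessor functions are hypothetical, chosen only to illustrate the layout described above.

```cpp
// Hypothetical struct holding the static fields of one managed type.
struct Important_StaticFieldsSketch
{
    int shared_value;
};

// Hypothetical type metadata: it owns the pointer to the static fields.
struct TypeInfoSketch
{
    void* static_fields;
};

// Hypothetical instance: it points at its type metadata and carries only
// its own instance fields.
struct ImportantSketch
{
    TypeInfoSketch* klass;
    int instance_value;
};

// One static-fields object and one metadata object exist for the type.
static Important_StaticFieldsSketch g_important_statics = { 42 };
static TypeInfoSketch g_important_type = { &g_important_statics };

// Reaches the static fields by going through the type metadata.
inline Important_StaticFieldsSketch* static_fields_of(const ImportantSketch& obj)
{
    return static_cast<Important_StaticFieldsSketch*>(obj.klass->static_fields);
}

inline int read_static_field(const ImportantSketch& obj)
{
    return static_fields_of(obj)->shared_value;
}
```

Two instances created with the same klass pointer observe the same shared_value, while each keeps its own instance_value, which matches the single shared instance of the static fields type described in the text.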


Exceptions

Managed exceptions are converted by il2cpp.exe to C++ exceptions. We chose this path, again, to avoid platform-specific solutions. When il2cpp.exe needs to emit code to raise a managed exception, it calls the il2cpp_codegen_raise_exception function.

The code in our HelloWorld_Start_m3 method to throw and catch a managed exception looks like this:



All managed exceptions are wrapped in the C++ Il2CppExceptionWrapper type. When the generated code catches an exception of that type, it unpacks the C++ representation of the managed exception (which has type Exception_t8). In this case, we’re looking only for an InvalidOperationException, so if we don’t find an exception of that type, a copy of the C++ exception is thrown again. If we do find the correct type, the code jumps to the implementation of the catch handler, and writes out the exception message.
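
The wrap, catch, and rethrow flow described above can be sketched like this. The types and functions are hypothetical placeholders: ExceptionWrapperSketch plays the role of Il2CppExceptionWrapper, raise_exception_sketch plays the role of il2cpp_codegen_raise_exception, and comparing type names as strings is a simplification chosen for this sketch rather than how type checks are really done.

```cpp
#include <string>

// Hypothetical C++ representation of a managed exception object.
struct ExceptionSketch
{
    std::string type_name;
    std::string message;
};

// Hypothetical wrapper: the only thing ever thrown as a C++ exception.
struct ExceptionWrapperSketch
{
    ExceptionSketch* ex;
};

// Stand-in for the runtime helper that raises a managed exception.
[[noreturn]] inline void raise_exception_sketch(ExceptionSketch* ex)
{
    throw ExceptionWrapperSketch{ex};
}

// Sketch of a method whose managed code catches only
// InvalidOperationException and returns its message.
inline std::string run_with_handler(ExceptionSketch* to_throw)
{
    std::string caught_message;
    try
    {
        raise_exception_sketch(to_throw);
    }
    catch (ExceptionWrapperSketch& e)
    {
        // Unpack the managed exception and test its type.
        if (e.ex->type_name != "InvalidOperationException")
        {
            throw e; // not ours: a copy of the wrapper is thrown again
        }
        caught_message = e.ex->message; // the managed catch handler body
    }
    return caught_message;
}
```

With this shape, a single C++ catch clause serves every managed catch block, and the managed type test decides whether the handler runs or the exception keeps propagating.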


Goto statements

This code brings up an interesting point. What are those labels and goto statements doing in there? These constructs are not necessary in structured programming! However, IL does not have structured programming concepts like loops and if/then statements. Since IL is lower-level, il2cpp.exe follows lower-level concepts in generated code.

For example, let’s look at the for loop in the HelloWorld_Start_m3 method:



Here the V_2 variable is the loop index. It starts off with a value of 0, then is incremented at the bottom of the loop in this line:



The ending condition in the loop is then checked here:

As long as V_2 is less than 3, the goto statement jumps to the IL_00af label, which is the top of the loop body. You might be able to guess that il2cpp.exe is currently generating C++ code directly from IL, without using an intermediate abstract syntax tree representation. If you guessed this, you are correct. You may have also noticed in the Runtime checks section above, some of the generated code looks like this:

Clearly, the L_2 variable is not necessary here. Most C++ compilers can optimize away this additional assignment, but we would like to avoid emitting it at all. We’re currently researching the possibility of using an AST to better understand the IL code and generate better C++ code for cases involving local variables and for loops, among others.
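
To tie the loop discussion together, here is a minimal sketch of a three-iteration for loop written the way a direct IL-to-C++ translation might emit it. The label names, the accumulator V_0, and the function itself are hypothetical; only the V_2 loop index, the comparison against 3, and the redundant L_2 temporary follow the description above.

```cpp
// Sketch of "for (int i = 0; i < 3; i++) sum += i;" expressed with labels
// and goto statements instead of a structured loop.
inline int sum_with_goto_loop()
{
    int V_0 = 0; // hypothetical accumulator local
    int V_2 = 0; // loop index, starts at 0
    goto IL_check; // IL loops usually jump to the condition first

IL_body:
    {
        int L_2 = V_2;   // redundant temporary, mirroring an IL stack slot
        V_0 = V_0 + L_2; // loop body
    }
    V_2 = V_2 + 1;       // increment at the bottom of the loop

IL_check:
    if (V_2 < 3)
    {
        goto IL_body;    // jump back to the top of the loop body
    }
    return V_0;
}
```

An optimizing C++ compiler will typically turn this back into an ordinary loop and drop the L_2 copy, which is why the extra assignment is mostly a readability issue in the generated source.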


Conclusion

We’ve just scratched the surface of the C++ code generated by the IL2CPP scripting backend for a very simple project. If you haven’t done so already, I encourage you to dig into the generated code in your project. As you explore, keep in mind that the generated C++ code will look different in future versions of Unity, as we are constantly working to improve the build and runtime performance of the IL2CPP scripting backend.

By converting IL code to C++, we’ve been able to obtain a nice balance between portable and performant code. We can have many of the nice developer-friendly features of managed code, while still getting the benefits of the quality machine code that C++ compilers provide for various platforms.

In future posts, we’ll explore more generated code, including method calls, sharing of method implementations and wrappers for calls to native libraries. But next time we will debug some of the generated code for an iOS 64-bit build using Xcode.

24 replies on “IL2CPP Internals: A Tour of Generated Code”



Just curious, why haven’t you used an AST from the beginning for code generation? With an AST you do have a lot of info about the context.

Looking at this code I’m wondering if this il2cpp is good enough that we don’t need plugins in C++ ?


Why don’t you pre-compile the .net core dlls in advance and then link them when compiling our custom C++ code? As I understand this could heavily reduce the compile time. You could at least do this as a fast-compile option so when we don’t care about the file size could test the game on the device quickly, and then when we want to release the game use the more optimized compilation path.

IL2CPP has a long way to go before it reaches maturity but it’s definitely many steps in the right direction. Given how much more work has to be done to get it there, I think it’s about time Unity seriously considered open sourcing the IL2CPP project. I trust that the guys at Unity Tech are very capable of pulling this off, but at this rate, it will be another year and a half (maybe 2) before we have a trusty IL2CPP.exe that can take just about any IL and spit out highly optimised, lean and error free C++ code. Besides I also think it’s a very inefficient use of your software engineering talent, who should be focusing their energies on game engine tech, not a general purpose IL to C++ transpiler. I appreciate all the work that is being done and the weekly patches that are fixing bugs as quickly as possible. But I can’t help but feel like it’s far too big a project for a team as small as yours and it brings back memories of all the man hours that were wasted porting the engine to flash!

Yes !

Please Unity, open source it, wait until it matures for a few years, see how you could benefit from the community… and in the meantime update Mono.

Does/will il2cpp support all CLR 2.0 assemblies, including those compiled from C# 6.0 sources with all that fancy language features like async/await etc?

This was a nice read. Thanks for sharing it, much appreciated! Can hardly wait for the next post :)

I wonder if all the C# optimization knowledge we built up over the years still applies when using Il2Cpp.

For example, in C# it is often beneficial to use ValueType’s rather than ReferenceType’s, or to avoid using an enum to index into a dictionary, or the whole 2d arrays vs jagged arrays topic, or to cache anonymous delegates to avoid memory allocation each time the runtime comes across them and all these things we now do to squeeze out the last bit of performance.

Can we forget all these C#-specific optimizations when targeting IL2CPP?

Nice post ! very informative :)

From my experience (and also from looking at how the generated code looks like), it seems that in some cases, it could help to have a dedicated c++ implementation (not a generated one).

You said that you guys didn’t want to implement all the Mono class libraries, but what about the engine itself? do you have the option to “plug in” a c++ implementation instead of generating code ?
(I realize that probably most of the engine *is* native already, but for the managed parts, is this possible ?)

I like the fact that you guys are working on implementing annotations to disable array bounds checks for specified places in the code. It was one of the first things that came to my mind the moment loops were mentioned in the intro of this post. I was glad to find out it was already being worked on.

The tech looks promising, surely keeping my eyes open for more blog posts about this!

This doesn’t look like C++ to me; it is more like C-style classes. I can understand that choice because of its simplicity. I am surprised you found that generating more .cpp files increases compile time so drastically. In my experience, this is related to the number of header include preprocessor directives per .cpp file, because the compiler needs to recompile them for every .cpp file. This could be solved with precompiled headers, which is the preferred way.
And I don’t really understand the double underscore; it’s really a no-go by default.
Very interesting!
