Search Unity

An introduction to IL2CPP internals

May 6, 2015 in Engine & platform | 10 min. read
Topics covered
Share

Is this article helpful for you?

Thank you for your feedback!

Almost a year ago now, we started to talk about the future of scripting in Unity. The new IL2CPP scripting backend promised to bring a highly-performant, highly-portable virtual machine to Unity. In January, we shipped our first platform using IL2CPP, iOS 64-bit. The Unity 5 release brought another platform, WebGL. Thanks to the input from our tremendous community of users, we have shipped many patch release updates for IL2CPP, steadily improving its compiler and runtime. We have no plans to stop improving IL2CPP, but we thought it might be a good idea to take a step back and tell you a little bit about how IL2CPP works from the inside out. Over the next few months, we’re planning to write about the following topics (and maybe others) in this IL2CPP Internals series of posts:

  1. The basics - toolchain and command line arguments (this post)
  2. A tour of generated code
  3. Debugging tips for generated code
  4. Method calls (normal methods, virtual methods, etc.)
  5. Generic sharing implementation
  6. P/invoke wrappers for types and methods
  7. Garbage collector integration
  8. Testing frameworks and usage

In order to make this series of posts possible, we’re going to discuss some details about the IL2CPP implementation that will surely change in the future. Hopefully we can still provide some useful and interesting information.

What is IL2CPP?

The technology that we refer to as IL2CPP has two distinct parts.

  • An ahead-of-time (AOT) compiler
  • A runtime library to support the virtual machine

The AOT compiler translates Intermediate Language (IL), the low-level output from .NET compilers, to C++ source code. The runtime library provides services and abstractions like a garbage collector, platform-independent access to threads and files, and implementations of internal calls (native code which modifies managed data structures directly).

The AOT compiler

The IL2CPP AOT compiler is named il2cpp.exe. On Windows you can find it in the Editor\Data\il2cpp directory. On OSX it is in the Contents/Frameworks/il2cpp/build directory in the Unity installation.

The il2cpp.exe utility is a managed executable, written entirely in C#. We compile it with both .NET and Mono compilers during our development of IL2CPP. The il2cpp.exe utility accepts managed assemblies compiled with the Mono compiler that ships with Unity and generates C++ code which we pass on to a platform-specific C++ compiler.

You can think about the IL2CPP toolchain like this:

il2cpp toolchain smaller

The runtime library

The other part of the IL2CPP technology is a runtime library to support the virtual machine. We have implemented this library using almost entirely C++ code (it has a little bit of platform-specific assembly code, but let’s keep that between the two of us). We call the runtime library libil2cpp, and it is shipped as a static library linked into the player executable. One of the key benefits of the IL2CPP technology is this simple and portable runtime library.

You can find some clues about how the libil2cpp code is organized by looking at the header files for libil2cpp we ship with Unity (you’ll find them in the  Editor\Data\PlaybackEngines\webglsupport\BuildTools\Libraries\libil2cpp\include directory on Windows, or the Contents/Frameworks/il2cpp/libil2cpp directory on OSX). For example, the interface between the C++ code generated by il2cpp.exe and the libil2cpp runtime is located in the codegen/il2cpp-codegen.h header file.

One key part of the runtime is the garbage collector. We’re shipping Unity 5 with libgc, the Boehm-Demers-Weiser garbage collector. However, libil2cpp has been designed to allow us to use other garbage collectors. For example, we are researching an integration of the Microsoft GC which was open-sourced as part of the CoreCLR. We’ll have more to say about this in our post about garbage collector integration later in the series.

How is il2cpp.exe executed?

Let’s take a look at an example. I’ll be using Unity 5.0.1 on Windows, and I’ll start with a new, empty project. So that we have at least one user script to convert, I’ll add this simple MonoBehaviour component to the Main Camera game object:


using UnityEngine;

public class HelloWorld : MonoBehaviour {
void Start () {
Debug.Log("Hello, IL2CPP!");
}
}

When I build for the WebGL platform, I can use Process Explorer to see the command line Unity used to run il2cpp.exe:


"C:\Program Files\Unity\Editor\Data\MonoBleedingEdge\bin\mono.exe" "C:\Program Files\Unity\Editor\Data\il2cpp/il2cpp.exe" --copy-level=None --enable-generic-sharing --enable-unity-event-support --output-format=Compact --extra-types.file="C:\Program Files\Unity\Editor\Data\il2cpp\il2cpp_default_extra_types.txt" "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\Assembly-CSharp.dll" "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\UnityEngine.UI.dll" "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\il2cppOutput"

That command line is pretty long and horrible, so let’s unpack it. First, Unity is running this executable:


"C:\Program Files\Unity\Editor\Data\MonoBleedingEdge\bin\mono.exe"

The next argument on the command line is the il2cpp.exe utility itself.


"C:\Program Files\Unity\Editor\Data\il2cpp/il2cpp.exe"

The remaining command line arguments are passed to il2cpp.exe, not mono.exe. Let’s look at them. First, Unity passes five flags to il2cpp.exe:

  • --copy-level=None
    • Specify that il2cpp.exe should not perform an special file copies of the generated C++ code.
  • --enable-generic-sharing
    • This is a code and binary size reduction feature. IL2CPP will share the implementation of generic methods when it can.
  • --enable-unity-event-support
    • Special support to ensure that code for Unity events, which are accessed via reflection, is correctly generated.
  • --output-format=Compact
    • Generate C++ code in a format that requires fewer characters for type and method names. This code is difficult to debug, since the names in the IL code are not preserved, but it often compiles faster, since there is less code for the C++ compiler to parse.
  • --extra-types.file="C:\Program Files\Unity\Editor\Data\il2cpp\il2cpp_default_extra_types.txt"
    • Use the default (and empty) extra types file. This file can be added in a Unity project to let il2cpp.exe know which generic or array types will be created at runtime, but are not present in the IL code.

It is important to note that these command line arguments can and will be changed in later releases. We’re not at a point yet where we have a stable and supported set of command line arguments for il2cpp.exe. Finally, we have a list of two files and one directory on the command line:

  • "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\Assembly-CSharp.dll"
  • "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\Managed\UnityEngine.UI.dll"
  • "C:\Users\Josh Peterson\Documents\IL2CPP Blog Example\Temp\StagingArea\Data\il2cppOutput"

The il2cpp.exe utility accepts a list of all of the IL assemblies it should convert. In this case they are the assembly containing my simple MonoBehaviour, Assembly-CSharp.dll, and the GUI assembly, UnityEngine.UI.dll. Note that there are a few conspicuously missing assembles here. Clearly, my script references UnityEngine.dll, and that references at least mscorlib.dll, and maybe other assemblies. Where are they? Actually, il2cpp.exe resolves those assemblies internally. They can be mentioned on the command line, but they are not necessary. Unity only needs to mention the root assemblies (those which are not referenced by any other assembly) explicitly.

The last argument on the il2cpp.exe command line is the directory where the output C++ files should be created. If you are curious, have a look at the generated files in that directory, they will be the subject of the next post in this series. Before you do though, you might want to choose the “Development Player” option in the WebGL build settings. That will remove the --output-format=Compact command line argument and give you better type and method names in the generated C++ code.

Try changing various options in the WebGL or iOS Player Settings. You should be able to see different command line options passed to il2cpp.exe to enable different code generation steps. For example, changing the “Enable Exceptions” setting in the WebGL Player Settings to a value of “Full” adds the --emit-null-checks, --enable-stacktrace, and --enable-array-bounds-check arguments to the il2cpp.exe command line.

What does IL2CPP not do?

I’d like to point out one of the challenges that we did not take on with IL2CPP, and we could not be happier that we ignored it. We did not attempt to re-write the C# standard library with IL2CPP. When you build a Unity project which uses the IL2CPP scripting backend, all of the C# standard library code in mscorlib.dll, System.dll, etc. is the exact same code used for the Mono scripting backend.

We rely on C# standard library code that is already well-known by users and well-tested in Unity projects. So when we investigate a bug related to IL2CPP, we can be fairly confident that the bug is in either the AOT compiler or the runtime library, and nowhere else.

How we develop, test, and ship IL2CPP

Since the initial public release of IL2CPP at version 4.6.1p5 in January, we’ve shipped 6 full releases and 7 patch releases (across versions 4.6 and 5.0 of Unity). We have corrected more than 100 bugs mentioned in the release notes.

In order to make this continuous improvement happen, we develop against only one version of the IL2CPP code internally, which sits on the bleeding edge of the trunk branch in Unity used to ship alpha and beta releases. Just before each release, we port the IL2CPP changes to the specific release branch, run our tests, and verify all of the bugs we fixed are corrected in that version. Our QA and Sustained Engineering teams have done incredible work to make delivery at this rate possible. This means that our users are never more than about one week away from the latest fixes for IL2CPP bugs.

Our user community has proven invaluable by submitting many high quality bug reports. We appreciate all the feedback from our users to help continually improve IL2CPP, and we look forward to more of it.

The development team working on IL2CPP has a strong test-first mentality. We often employee Test Driven Design practices, and seldom merge a pull request without good tests. This strategy works well for a technology like IL2CPP, where we have clear inputs and outputs. It means that the vast majority of the bugs we see are not unexpected behavior, but rather unexpected cases (e.g. it is possible to use an 64-bit IntPtr as a 32-bit array index, causing clang to fail with a C++ compiler error, and real code actually does this!). That difference allows us to fix bugs quickly with a high degree of confidence.

With the help of our community, we’re working hard to make IL2CPP as stable and fast as possible. By the way, if any of this excites you, we’re hiring (just sayin’).

More to come

I fear that I’ve spent too much time here teasing future blog posts. We have a lot to say, and it simply won’t all fit in one post. Next time, we’ll dig into the code generated by il2cpp.exe to see how your project actually looks to the C++ compiler.

May 6, 2015 in Engine & platform | 10 min. read

Is this article helpful for you?

Thank you for your feedback!

Topics covered
Related Posts