The scripting virtual machine team at Unity is always looking for ways to make your code run faster. This is the first post in a three-part miniseries about a few micro-optimizations performed by the IL2CPP AOT compiler, and how you can take advantage of them. While nothing here will make code run two or three times as fast, these small optimizations can help in important parts of a game, and we hope they give you some insight into how your code is executing.

Modern compilers are excellent at performing many optimizations to improve run-time performance. As developers, we can often help our compilers by making information we know about the code explicit to the compiler. Today we’ll explore one micro-optimization for IL2CPP in some detail, and see how it might improve the performance of your existing code.

Devirtualization

There is no other way to say it: virtual method calls are always more expensive than direct method calls. We’ve been working on some performance improvements in the libil2cpp runtime library to cut back the overhead of virtual method calls (more on this in the next post), but they still require a runtime lookup of some sort. The compiler cannot know which method will be called at run time. Or can it?

Devirtualization is a common compiler optimization tactic which changes a virtual method call into a direct method call. A compiler might apply this tactic when it can prove, at compile time, exactly which method will be called. Unfortunately, this fact can often be difficult to prove, as the compiler does not always see the entire code base. But when it is possible, it can make virtual method calls much faster.

The canonical example

As a young developer, I learned about virtual methods with a rather contrived animal example. This code might be familiar to you as well:
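A minimal sketch of such a hierarchy, assuming an abstract Animal base class with a single Speak method and two derived types. The Pig class and its sound are assumptions made for illustration:

```csharp
// Base type: every animal can speak, but only derived types know how.
public abstract class Animal
{
    public abstract string Speak();
}

public class Cow : Animal
{
    public override string Speak()
    {
        return "Moo";
    }
}

// Assumed second animal, so the farm holds more than one derived type.
public class Pig : Animal
{
    public override string Speak()
    {
        return "Oink";
    }
}
```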

Then in Unity (version 5.3.5) we can use these classes to make a small farm:
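A sketch of what such a script might look like, assuming a MonoBehaviour named Farm that logs from Start. The log messages and the Pig type are assumptions, and the animal classes are included in compact form so the script compiles on its own:

```csharp
using UnityEngine;

public abstract class Animal
{
    public abstract string Speak();
}

public class Cow : Animal
{
    public override string Speak() { return "Moo"; }
}

public class Pig : Animal
{
    public override string Speak() { return "Oink"; }
}

// Attach to any GameObject; Unity expects this class in a file named Farm.cs.
public class Farm : MonoBehaviour
{
    void Start()
    {
        Animal[] animals = { new Cow(), new Pig() };

        // First call site: the static type is Animal, so the target is unknown.
        foreach (var animal in animals)
            Debug.LogFormat("Some animal says '{0}'", animal.Speak());

        // Second call site: the static type is Cow.
        var cow = new Cow();
        Debug.LogFormat("The cow says '{0}'", cow.Speak());
    }
}
```

The two call sites differ only in the static type of the reference, which is what the rest of the post examines.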

Here each call to Speak is a virtual method call. Let’s see if we can convince IL2CPP to devirtualize any of these method calls to improve their performance.

Generated C++ code isn’t too bad

One of the features of IL2CPP I like is that it generates C++ code instead of assembly code. Sure, this code doesn’t look like C++ code you would write by hand, but it is much easier to understand than assembly. Let’s see the generated code for the body of that foreach loop:

I’ve removed a bit of the generated code to simplify things. See that ugly call to Invoke? It is going to look up the proper virtual method in the vtable and then call it. This vtable lookup will be slower than a direct function call, but that is understandable. The Animal could be a Cow or a Pig, or some other derived type.

Let’s look at the generated code for the second call to Debug.LogFormat, which is more like a direct method call:

Even in this case we are still making the virtual method call! IL2CPP is pretty conservative with optimizations, preferring to ensure correctness in most cases. Since it does not do enough whole-program analysis to be sure that this can be a direct call, it opts for the safer (and slower) virtual method call.

Suppose we know that there are no other types of cows on our farm, so no type will ever derive from Cow. If we make this knowledge explicit to the compiler, we can get a better result. Let’s change the class to be defined like this:
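In C#, that change might look like the following sketch. The Animal base class is repeated so the snippet compiles on its own, and the "Moo" return value is assumed:

```csharp
public abstract class Animal
{
    public abstract string Speak();
}

// sealed: no type may derive from Cow, so a Cow reference
// can only ever point at an instance of exactly Cow.
public sealed class Cow : Animal
{
    public override string Speak()
    {
        return "Moo";
    }
}
```

Any later attempt to write a class deriving from Cow is rejected by the C# compiler, so the assumption is enforced rather than merely documented.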

The sealed keyword tells the compiler that no one can derive from Cow (sealed could also be used directly on the Speak method). Now IL2CPP will have the confidence to make a direct method call:

The call to Speak here will not be unnecessarily slow, since we’ve been very explicit with the compiler and allowed it to optimize with confidence.
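The method-level variant mentioned above can be sketched as follows. Here Cow stays open for inheritance and only Speak is sealed. FlyingCow and its Fly method are hypothetical names used purely for illustration:

```csharp
public abstract class Animal
{
    public abstract string Speak();
}

public class Cow : Animal
{
    // sealed override: types may still derive from Cow,
    // but none of them can replace this implementation of Speak.
    public sealed override string Speak()
    {
        return "Moo";
    }
}

// Hypothetical subclass: allowed because Cow itself is not sealed.
public class FlyingCow : Cow
{
    public string Fly()
    {
        return "Whoosh";
    }
}
```

This is useful when a class needs subclasses that differ in other ways but all share the same Speak behavior.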

This kind of optimization won’t make your game dramatically faster, but it is a good practice to express any assumptions you have about the code in the code, both for future human readers of that code and for compilers. If you are compiling with IL2CPP, I encourage you to peruse the generated C++ code in your project and see what else you might find!

Next time we’ll discuss why virtual method calls are expensive, and what we are doing to make them faster.

60 replies on “IL2CPP Optimizations: Devirtualization”

Hi, does this optimization work if I call this.Speak() inside of the Cow class?
Does this optimization work in Mono as well?

A related, more complicated example: Does boxing get optimized out with generics in both Mono and IL2CPP?

T Convert(U other)
{
    return (T)(object)(other);
}

Finally, sealed is used for optimization! As far as I know, the Microsoft guys had some low-priority plans to use it for optimization in their JIT but never got around to it.

Nice article, but I have some questions for you:
Assume that we have a class derived from Cow:

public class FlyingCow : Cow {
    public override string Speak() {
        return "Moooooooo";
    }
}

and the class hierarchy is as follows:
Animal (base class) -> Cow -> FlyingCow

In this case we need to use the “sealed” keyword ONLY on the most derived class (FlyingCow in my example), and the keyword cannot be used on the Cow class (IL2CPP will generate C++ code with the template). Right?

If I’m not mistaken, there are 2 possible cases, depending on the specifications of your Flying cow: does it make the same sound as the Cow or not?

1) If they make different sounds (“Moo” vs “Moooooooo”), it’s the case you state. Flying cow derives from Cow, so obviously the Cow class can’t be sealed. You can only seal the Flying cow class, so just as you stated this class will be the only one benefiting from the devirtualization.

2) Now, if they make the same sound (“Moo”) but the difference lies somewhere else in the class, what you can do is: seal the Flying cow class just like before so it still benefits from the entire devirtualization; but also seal the method Speak() on the Cow class (and of course don’t override it in the Flying cow class, because it’s the same) and then it should still be devirtualized.

C++ also offers virtual and override features. Why implement a “virtual function invoker” instead of simply translating C# classes to C++ classes? As I understand it, in pure C++ the difference would then be just a matter of 4 lines of assembly code against 1. Is that also the case explained in the article?
Thanks! Best regards :)

Hi Josh, thanks for the post.

About your example, how about using an interface instead of an abstract class/method?

Something like:
IAnimal animal = new Cow();
animal.Speak()?

Does the same problem apply to the generated C++ code?

Hi Josh,

Do you have any information on what happens to unused code? I have tons of debug and test code in my project that will never be called in production, but when I check the assemblies (inside the Android APK), all that code is still there.

I wonder if the IL2CPP or the platform compiler removes unused code. I expect that less code would make the game launch a bit faster and the package smaller as well.

Cheers.

Apologies for the double post. I thought that something went wrong when I submitted the first comment.

So this optimization only saves you time if you are calling the method from the inherited class, correct? It does not benefit when you call Animal.Speak()?

Any chance we will see IL2CPP on PC standalone builds anytime soon? I know it’s in the Windows Store builds already… what’s the hold up? Our Garbage Collection is causing hitches and I’ve heard IL2CPP will help that a lot. Love the blog post, by the way.

Does this have any effect on classes directly inheriting from MonoBehaviour, like Farm in the example?
I know the usual Unity methods like Start, Awake and Update aren’t overrides of an abstract method, but is there something else that’s ‘abstract’ in MonoBehaviour?

I must admit this is a pretty interesting read, as I’m very interested in IL2CPP, although I’m not sure how it works entirely under the covers. Additionally, I’ve been wondering for some time why Unity doesn’t simply allow you to write C++ scripts that can be used natively like the C#/UnityScript/Boo scripts. Wouldn’t this overall be a huge performance gain for skilled C++ programmers?

Is the best way using

foreach (var animal in animals)

or maybe

for(var i=0; i<animals.Count; i++)

?

I found these the fastest versions of “for”, since list.Count is evaluated only once; rather than every loop iteration.

for (int i=list.Count-1; i>=0; --i) …
for (int i=0, iend = list.Count; i<iend; ++i) …

Where the decrementing version could run faster on platforms that support the "subs" instruction (eg ARM devices), which means there could be one "cmp" operation less per loop-iteration.

foreach is often considered the slowest method to iterate over a list, because it allocates an enumerator and it calls various methods each iteration (MoveNext and Current).

However, you will never know, if you don't profile your code.

Why can’t you check whether a class has no subclasses, and then not require the “sealed” keyword? Is it because of linked libraries?

Great Article, please keep more like this coming.
I know you plan to have other optimizations discussed, but can you point to a resource (like in the documentation) that talks more about what we can do to make our coding practices better for cpp?

One day I would love to know the purpose of:

int32_t L_6 = V_3;
int32_t L_7 = L_6;

Just passing int around? Sounds wasteful.

Nice post; however, I am not sure what tangible gains something like this will give. Is the cost of a virtual vs. direct method call in IL2CPP really that different?

Yes, the difference is quite notable when in a restrictive environment like iOS, DS or PSP.

A vtable lookup can eat quite a few valuable cycles that would be better used somewhere else. I worked on a PSP game where class hierarchies were destroyed and merged on purpose to reduce the number of lookups in the code paths that were used most per frame.

Great, thanks. And if it is possible, could we get a video tutorial introducing IL2CPP in the learning live training section?

Pretty neat optimisation. And kinda obvious too, but appreciated nonetheless ;-) . At this point I’m wondering if you’re beginning to see the value in open sourcing IL2CPP. The community can figure out many more optimisations at many times the speed at which you’re currently doing it. Do the cons really outweigh the pros of open sourcing?
