
Last time we learned that virtual method calls are slower than direct calls, and we found out how to tell IL2CPP that a given virtual method call can be converted (devirtualized) into a faster direct method call. But what happens when you must make a virtual method call? Let’s at least make it as fast as possible.

What does it take to make a virtual method call?

A virtual method call is a call that must be resolved at run time. The compiler does not know which method will be called when it compiles the code, so it builds an array of methods (called the virtual table, or vtable) for each class. When someone calls one of those methods, the runtime looks up the proper method in the vtable, and calls it. But what happens when things don’t work out, and there is no virtual method to call in the vtable?
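To make the lookup concrete, here is a minimal C++ sketch of a hand-rolled method table. It is not libil2cpp code: ClassInfo, Object, CallVirtual, kSayHelloSlot and the two SayHello functions are hypothetical names chosen for illustration, and a real runtime stores more per slot than a bare function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of a per-class method table.
// None of these names come from libil2cpp.
struct Object;
using MethodPointer = std::string (*)(const Object*);

struct ClassInfo {
    const MethodPointer* vtable;  // one entry per virtual method slot
    std::size_t vtableCount;
};

struct Object {
    const ClassInfo* klass;  // every object points at the class it was created from
};

// Slot index assigned to SayHello when the classes were "compiled".
constexpr std::size_t kSayHelloSlot = 0;

std::string BaseSayHello(const Object*) { return "Hello from Base"; }
std::string DerivedSayHello(const Object*) { return "Hello from Derived"; }

// Each class gets its own table; the derived class overrides slot 0.
const MethodPointer kBaseVTable[] = { BaseSayHello };
const MethodPointer kDerivedVTable[] = { DerivedSayHello };

const ClassInfo kBaseClass = { kBaseVTable, 1 };
const ClassInfo kDerivedClass = { kDerivedVTable, 1 };

// A virtual call: load the slot from the object's class at run time, then call it.
std::string CallVirtual(const Object* obj, std::size_t slot) {
    assert(slot < obj->klass->vtableCount);  // debug-only bounds check
    MethodPointer method = obj->klass->vtable[slot];
    return method(obj);
}
```

Calling CallVirtual with kSayHelloSlot on an Object whose klass is &kDerivedClass runs DerivedSayHello, even though the call site never names that function. The caller only knows the slot number; the object's class decides which method runs.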

When virtual methods go bad

Let’s look at an extreme example, where the object we use has a type created at run time:

Given these types, we can try this code in Unity (I’m using version 5.3.5):

The details of MakeRuntimeBaseClass are not too important. What really matters is the object it creates has a type (GenericDerivedClass<int>) which is created at run time.

This somewhat odd code is no problem for a Just-in-time (JIT) compiler, where the compilation work happens at runtime. If we run it in the Unity editor, we get:

But the story is quite different with an Ahead-of-time (AOT) compiler. If we run this same code for iOS with IL2CPP, we get this exception:

That type created at runtime (GenericDerivedClass<int>) is causing problems for the SayHello virtual method call. Since IL2CPP is an AOT compiler, and there is no source code for the GenericDerivedClass<int> type, IL2CPP did not generate an implementation of the SayHello method.

When you call a method that does not exist

To understand what is happening here, we can create an exception breakpoint in Xcode. That breakpoint is triggered inside the il2cpp::vm::Runtime::GetVirtualInvokeData function, where the libil2cpp runtime is attempting to resolve the virtual method to call. That function looks like this:

The first line does the lookup in the vtable that we discussed above. The second checks to see if the virtual method really exists, and throws the managed exception we saw if the method does not exist.
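The shape of that lookup-then-check logic can be sketched in a few lines of C++. This is an illustration under assumptions, not the libil2cpp source: VirtualInvokeData and ClassInfo here are simplified stand-ins, GetVirtualInvokeDataChecked and MissingMethodError are hypothetical names, and std::runtime_error (with an invented message) stands in for the managed exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for the runtime's types; not the real libil2cpp definitions.
using MethodPointer = int (*)();

struct VirtualInvokeData {
    MethodPointer methodPtr;  // nullptr when the AOT compiler generated no code for the slot
};

struct ClassInfo {
    const VirtualInvokeData* vtable;
    std::size_t vtableCount;
};

// Stands in for the managed exception; the message text is made up for this sketch.
struct MissingMethodError : std::runtime_error {
    MissingMethodError()
        : std::runtime_error("no AOT code was generated for this virtual method") {}
};

int RealMethod() { return 42; }

// Lookup, check, return. The check runs on every virtual call,
// even though it almost never fails.
const VirtualInvokeData& GetVirtualInvokeDataChecked(const ClassInfo& klass, std::size_t slot) {
    assert(slot < klass.vtableCount);  // debug-only bounds check, not part of the three steps
    const VirtualInvokeData& data = klass.vtable[slot];
    if (data.methodPtr == nullptr)
        throw MissingMethodError();
    return data;
}
```

With a two-slot table where slot 0 holds RealMethod and slot 1 holds nullptr, slot 0 resolves and can be called normally, while asking for slot 1 throws before any call is attempted.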

Let’s make this code faster

With only three lines of code here, can we make this any faster? As it turns out, we can! The vtable lookup is necessary, so that has to stay as-is. But what about that if check? Most of the time, the condition will be false (after all, look at the ugly code we needed to use to create a type at runtime and make the condition true). So why should we pay the cost of a branch in the code that we will seldom (or never) take?

Instead, let’s always call a method! When that method is not generated by the AOT compiler, we’ll replace it with a method that throws a managed exception. In Unity 5.5 (currently in closed alpha release), GetVirtualInvokeData looks like this:

IL2CPP now generates a stub method for every different function signature used by any virtual method in the project. If a vtable slot doesn’t have a real method, it gets the proper stub method matching its function signature. In this case, the virtual method we call is:

So the code behaves in the same way, throwing a proper managed exception when the AOT compiler was not able to generate code for a virtual method call. Most importantly though, this behavior now has no cost for the normal case.
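A minimal C++ sketch of the stub idea follows. The types and names (VirtualInvokeData, ClassInfo, MissingMethodStub, PatchEmptySlots, GetVirtualInvokeDataFast, MissingMethodError) are hypothetical and simplified, with a single function signature instead of one stub per signature. The post does not say when IL2CPP places the stubs in the vtable, so filling empty slots in a separate setup step is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for the runtime's types; not the real libil2cpp definitions.
using MethodPointer = int (*)();

struct VirtualInvokeData {
    MethodPointer methodPtr;
};

struct ClassInfo {
    VirtualInvokeData* vtable;
    std::size_t vtableCount;
};

// Stands in for the managed exception; the message text is made up for this sketch.
struct MissingMethodError : std::runtime_error {
    MissingMethodError()
        : std::runtime_error("no AOT code was generated for this virtual method") {}
};

int RealMethod() { return 42; }

// Stub with the same signature as the real methods. Calling it raises the error.
int MissingMethodStub() { throw MissingMethodError(); }

// Assumed setup step, done once per class: every empty slot gets the stub.
void PatchEmptySlots(ClassInfo& klass) {
    for (std::size_t i = 0; i < klass.vtableCount; ++i) {
        if (klass.vtable[i].methodPtr == nullptr)
            klass.vtable[i].methodPtr = MissingMethodStub;
    }
}

// Hot path: no null check. The assert is a debug-only bounds check
// that is compiled out when NDEBUG is defined.
const VirtualInvokeData& GetVirtualInvokeDataFast(const ClassInfo& klass, std::size_t slot) {
    assert(slot < klass.vtableCount);
    return klass.vtable[slot];
}
```

After PatchEmptySlots runs, every slot holds a callable pointer. The error moves from the lookup to the call itself: invoking the method pointer from a patched slot throws MissingMethodError, and slots with real methods never pay for a check.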

How much faster is this?

Now for the bottom line: Does this micro-optimization matter? Yes. Our profiling has shown a 3% to 4% improvement in overall execution time. The improvement varies depending on the number of virtual calls being made and on the processor architecture. Processors with better branch prediction pay a lower cost for the if check, so they see less benefit when it is removed. Processors that don’t handle branch prediction as well get a larger performance benefit.

This is actually a common optimization technique for virtual machines, so we’re happy to be able to bring it to IL2CPP as well. It follows the old performance mantra, “executing no code is better than executing some code.”

Next time we’ll explore another micro-optimization, where IL2CPP can avoid executing code altogether if we can prove that it does not matter.


  1. Thanks for the post! I think IL2CPP is an interesting technology and I have enjoyed reading all the technical information you have posted so far.

    When I saw the older GetVirtualInvokeData implementation, my first thought was: why don’t they use an assertion for the error check, one that is stripped out of release builds through the preprocessor?

    But your new approach is even better, just patching the vtable to detour the call to another method. Then you don’t even have to pay for that check in debug builds. Well done.

    Looking forward to the next post!

  2. Hello! Great news, Unity! But I am interested in IL2CPP on desktop platforms. Are you porting IL2CPP to standalone, or is there any news about that?

    1. We are not currently working on IL2CPP on desktop platforms. It is something we have discussed a good bit internally, but it is not on our roadmap now. We would like to release it at some point in the future though. It will appear on the Unity public roadmap when we are able to get it ready for release.

  3. > Processors with a deeper instruction pipeline and better branch prediction pay a lower cost for the if check

    Does having better branch prediction necessitate/imply/require a deeper instruction pipeline? This may be simply a grammar thing but I read this as saying “Processors with a deeper instruction pipeline and [processors with] better branch prediction…” If that’s the case, I’d expect processors with a shallower instruction pipeline to pay less for the if-check…

    If I’m missing something I’d love to better understand what’s going on!

    1. > Does having better branch prediction necessitate/imply/require a deeper instruction pipeline?

      No, not necessarily. Reading this again, I see it is not as clear as I would like it to be. I think your understanding is correct. The main factor really is the branch prediction. I’ll update the post to remove the mention of the instruction pipeline. Thanks!