Search Unity

First bug ever found. Taped into log for evidence.

GDC 2014 had another such trophy-worthy hunting safari in store for us.  We were five days away from presenting Unity 5 to the world when we «spotted» (well, it was kinda hard to miss) an ugly little elephant of a bug: our shiny new 64-bit editor was randomly crashing on OSX to the point of being completely unusable.  There’s just nothing like being up on stage to showcase how awesome your bug reporter is every couple minutes.

So, Levi, Jonathan and I dropped all the awesome stuff we’re working on (more stories we want to bore our grandkids with) and went stalking.  All we knew at that point was that it crashed somewhere in the native code that Mono generates at run-time.

As every programmer knows, when you’re faced with a bug that isn’t obvious, you simply start by gathering evidence.  Once you’ve learned enough about the bug’s behavioral patterns, you’ll eventually get a shot at it.  And with the clock ticking, we were ready to shoot at pretty much anything.

But we were stumped.  For an elephant, the bug turned out to be surprisingly agile and sneaky.

It seemed to happen only on OSX 10.9 although Kim saw something that looked markedly similar on Windows with his heavy duty memory debugger branch.  And if you enabled Guard Malloc on earlier versions of OSX, you got what looked fairly similar as well.  However, as it was crashing in random script code at arbitrary depths in the call hierarchy, it was difficult to say with certainty what was the same crash and what wasn’t.  And the crash could be consistent for ten consecutive runs only to be totally different for the next five.

So while Kim and I waded knee-high through memory and thigh-high through assembly code, Levi ran an extensive trace on all of Mono’s secret and not so secret activities to generate a gigabyte log and an editor that ran at the speed of my grandma.  This yielded the first interesting insight: apparently we were always compiling the method we crashed in right before things got ugly.

But what made it crash?  The immediate cause was that we were trying to execute code from an invalid address.  How did we get there?  A bug in Mono’s signal handling where we don’t resume properly?  A bug in Mono’s JIT compiler that won’t jump back properly to the compiled code?  A different thread corrupting stack memory on the main thread?  Fairies and grumkins? (for a bit, the latter seemed the most likely).

After two days of hunting, the elephant was still well alive and out and about.

So, Saturday night I equipped myself with a notebook, four different colored pens and an ample supply of beer from our trademark Unity fridge (carefully making sure I don’t touch the awful canned Christmas beer we still have stuck in its crevices ).  Then I spun up Unity instances until I had four different crashes frozen in the debugger, labeled them «Red Crash», «Blue Crash», «Green Crash», and «Black Crash» and went to work with my respectively colored pens to take notes and draw some not-so-pretty diagrams of everything I found.

Here’s my notes for Blue Crash:


And that’s when I made my first discovery: in every case, the stack was 16 bytes larger than it should be!

That then led to the next discovery: for all crashes, looking at those extra 16 bytes turned up a return address back into the function we crashed in.  From a trace it was clear that in all cases we already had executed some calls from the same method, and at first I thought the address was from the last call we had traced.  However, closer inspection revealed that it was actually the return address for a call whose method had not been compiled yet!

This puzzled me for a moment as in some cases there were several calls in-between the last traced method and this call that hadn’t been compiled yet either.  Looking closer, however, revealed that we always had jumped around them.

So, then I looked at that function we apparently were supposed to return from…


And there we have it (highlighted in blue):  We were jumping in the wrong direction!

What Mono does here is create little «trampoline» functions that contain only a call to the JIT compiler and some data encoded into the instruction stream after the call (used by the JIT compiler to know which method to compile).  Once the JIT compiler has done its work, it will delete those trampolines and erase every trace of having hooked into the method call.

However, the call instruction you see there is what is called a «near call» which incidentally uses a signed 32-bit offset to jump relative to the next instruction.

And since a signed 32-bit number can reach only 2GB up and down and we’re running 64-bit here, we suddenly knew why heap memory layout played such a crucial role in reproducing the bug: once Mono’s trampolines were further than 2GB away from the JIT compiler, offsets wouldn’t fit anymore into 32-bit and would get truncated when emitting the call instruction.

At that point, Jonathan quickly pinpointed the right fix and by the time his Sunday was over, we had a stable working build ready in time for GDC.

You all know the history from there.  We successfully demoed Unity 5 at GDC 2014 to rave reviews and after launch, it quickly became the most beloved piece of software ever.  Oh wait, that bit is yet to come…

Before that launch, there’s a whole lot more black and blue crashes to fix :).

20 replies on “On Hunting the Uncommon Elephant”

«One nice thing though about being that deep down is that there’s pretty much nothing underneath you anymore. There’s no libraries, no system calls, no layers of abstraction. Just the machine and raw streams of code. That does have its own appeal.

But yeah, comfortable is something else :)»

That’s Deep!

Unity does the hard work, so we don’t have to :-D .
Thanks Guys and continue the good work!

«Once you’ve learned enough about the bug’s behavioral patterns, you’ll eventually get a shot at it. And with the clock ticking, we were ready to shoot at pretty much anything.»

That really made me laugh out loud :D Nice job on that bug, it really was a nasty one. I personally wouldn’t want to go «down» that deep, my «comfort zone» ends right where the Mono compiler picks up its work ;)

Very nice hunting there! No calls of «Er guys! We’re gonna need a bigger elephant gun!» either.

i follow site every day every 5 min :)
any news from Unity make me happy for my black life :D

So I’m a bit confused, or rather ignorant, on the Mono compiler and target app on osX – is Mono compiler 64 bit and the app you wanted to compile was 64 bit or did you want to compile an app to 32 bit? Or was the Mono JIT code that was crashing happening in the editor for editor function and so on a 64 bit platform it will be 64 bit JIT function being compiled? Or is it dependent on which target publishing platform you currently have selected and whether it is 32 bit or 64 bit?

So you said, this function of the Mono compiler that was returning the address was always returning 32 bit address, when it should have returned 64 bit address but in what situation? Or in all situations when using the 64 bit editor?

Will Unity now have 4 downloads for the editor? 32 bit & 64 bit for Windows & osX?

Ahh, the joys of debugging assembly code. No, wait, scratch that.
Ahh, the joys of debugging C++ code. No, wait, scratch that.
Ahh, the joys of developing in Unity with C#. Yes, much better. We’ve had a good run, you and I, but I don’t miss you at all, C++. :)

Comments are closed.