
Katana: Leveraging Open-Source Tools for Continuous Integration

June 2, 2014

For several years, Unity used TeamCity from JetBrains for automated building and testing. As the R&D team at Unity grew, the demands on our build infrastructure grew along several axes: the number of users, the number of changesets, and the number of simultaneous branches. Eventually we needed to handle several thousand builds per day, and we began seeing problems on multiple fronts: servers became slow to respond, we hit unexplained errors that we could not fix, new changes were picked up so slowly that builds were delayed, and webpages took several minutes to load.

After a year of back-and-forth with the makers of TeamCity, and a steadily worsening build infrastructure, I concluded that our best path forward was to switch to a solution that better suited our particular needs. When your way of working is as particular as ours, and your scale is as large, extensibility and flexibility become necessities in any tool, and both can be hard to get from off-the-shelf proprietary solutions. As a long-time open-source enthusiast, I felt this was an especially good opportunity to use open-source tools to solve our problems. After some research, I decided that we would build a custom solution on top of Buildbot, an open-source continuous integration framework used by Chromium, Mozilla, Python, and various other projects. Buildbot is written in Python on top of Twisted, an event-driven networking engine.

A Look Back

Phase 1: Prototype and Proof-of-Concept

It was now September 2012, and luckily for me, we had just expanded the Build Engineering team with a new hire, Maria, who already had experience working with Buildbot. We knew this would be a large project, so we started with a two-month prototyping and proof-of-concept phase in which Maria explored various aspects of Buildbot to test whether it could scale in the future while keeping the flexibility our complex build chains required. We also knew we wanted to decouple as many parts of the build infrastructure as possible, to make maintenance and debugging easier.

[Figure: An early design diagram for Katana.]

Phase 2: Requirements Gathering and Beginning of Implementation

After around two months of prototyping and proof-of-concept work, we were confident that the toolset we had chosen would work, given some serious investment. The next phase of the project was a feature comparison between Buildbot and TeamCity and an initial attempt to gather requirements for a system that could replace TeamCity in production. This part was hard and took some iteration, because 1) we were still learning about Buildbot's capabilities and limitations, and 2) it was hard to tell which TeamCity features were genuinely useful to us and which we could live without. We started with an initial project plan and schedule, which we revised at regular intervals along the way. At this point, we brought in our IT department to estimate how much hardware we would need to build a fully functioning system without taking resources away from our production instances.

Phase 3: The Front-end

The version of Buildbot we forked from (0.8.7) does come with a user interface, but for people used to TeamCity it was practically unusable, especially with the number of build configurations and builds we have. Performance was also a concern: after our previous experiences, we knew the most important thing was that the UI load fast, and everything else was secondary. We therefore needed someone with UI expertise and a keen eye for design to build a new UI for us. We hired a front-end developer, Simon, who had experience with websites where performance is the main concern, and tasked him with creating a new Buildbot front end.

Phase 4: More Implementation

This is where the bulk of the feature implementation happened. At one point during these months, we reassessed and extended the project schedule after discovering that one of our use cases required significant work on the Buildbot side, but overall the project went well. Here and there, we also needed to make changes in our build system (for example, working around the fact that, on Windows, Python stores and lists environment variable names in upper case) and in our test frameworks (for example, making every test suite output a standardized XML file of test results that we could parse).
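The environment-variable issue is easy to reproduce: on Windows, `os.environ` upper-cases every key, so iterating over it or copying it into a plain `dict` loses the original casing. The sketch below shows one possible way to cope with that. `get_env_ci` and `restore_case` are hypothetical helper names used only for illustration (not Katana or Buildbot code), and the variable names and values in the example are invented.

```python
import os


def get_env_ci(name, env=None, default=None):
    """Look up an environment variable without caring about key case.

    Hypothetical helper. On Windows, os.environ stores keys upper-cased,
    so a plain dict copy of it only answers lookups in upper case.
    """
    env = os.environ if env is None else env
    if name in env:
        return env[name]
    wanted = name.upper()
    for key, value in env.items():
        if key.upper() == wanted:
            return value
    return default


def restore_case(env, known_names):
    """Return a copy of env whose keys use the casing listed in known_names.

    Keys with no match in known_names keep whatever casing they already had.
    """
    preferred = {name.upper(): name for name in known_names}
    return {preferred.get(key.upper(), key): value for key, value in env.items()}


if __name__ == "__main__":
    # A snapshot shaped like dict(os.environ) on Windows (invented values).
    snapshot = {"PATH": "C:/tools", "BUILD_BRANCH": "trunk"}
    print(get_env_ci("Build_Branch", snapshot))
    print(restore_case(snapshot, ["Path"]))
```

Doing the case-insensitive match in one helper keeps the Windows quirk out of the individual build steps, which can then use whatever casing a downstream tool expects.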
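A standardized results file means a single parser can summarize any test suite. As an illustration, and assuming a JUnit-style layout (a `<testsuite>` containing `<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children), a minimal sketch using Python's standard `xml.etree.ElementTree` might look like this. The schema is an assumption, and `summarize_results` and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample report in an assumed JUnit-style format.
SAMPLE_REPORT = """\
<testsuite name="ExampleSuite">
  <testcase classname="Math" name="AddsNumbers"/>
  <testcase classname="Math" name="DividesByZero">
    <failure message="expected an exception"/>
  </testcase>
  <testcase classname="Audio" name="StreamsClip">
    <skipped/>
  </testcase>
</testsuite>
"""


def summarize_results(xml_text):
    """Count passed, failed, and skipped test cases in a JUnit-style report."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "skipped": 0, "failures": []}
    # iter() finds <testcase> elements whether the root is <testsuite> or <testsuites>.
    for case in root.iter("testcase"):
        name = f"{case.get('classname', '')}.{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"] += 1
            summary["failures"].append(name)
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


if __name__ == "__main__":
    print(summarize_results(SAMPLE_REPORT))
```

Keeping the list of failing test names alongside the counts is what lets a front end show a failure report rather than just a red or green status.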

Towards the end of this phase, we moved some internal projects (for example, our internal builds of the Mono runtime and class libraries) from TeamCity to Katana. This gave us valuable user testing and feedback in a real-world scenario. We formed an internal focus group of users working on these “Guinea Pig” projects, and from there we gradually progressed to a more well-rounded feature set.

Phase 5: Production Readiness and Roll-Out

This is where we started counting down the to-do items that had to be finished before we could move the main Unity project over. We use Trello for project management on Katana, and it works very well, particularly toward the end of this project, when the team working on Katana had grown. By this point, Daniel had joined our team and started working on Katana, and I had also begun contributing to Katana development and overseeing the configuration management.

[Figure: Katana’s Trello Board]

Migrating the main project to Katana took quite some time, because the migration was manual and the project is very large. Once it was in place, we invited users to run it alongside TeamCity to verify branches before merging them to trunk. During this period, we fixed more issues and gathered more feedback. In late January of this year, we switched our mainline to build officially on Katana instead of TeamCity. We have been using it ever since, and overall we are very pleased with the improvements it has brought.

The Current State

Katana lives in our Buildbot fork on GitHub under the GPLv2 license. We are still actively developing it; just a few weeks ago we deployed real-time page updates built on Autobahn, a WebSocket library for Python and Twisted.
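At its core, real-time updating means the server pushes each build-status change to every open page instead of waiting for browsers to poll. The sketch below illustrates only that fan-out pattern, using Python's standard `asyncio` queues as stand-ins for client connections. It does not use Autobahn, and the `StatusHub` class, the `collect` helper, and the event fields are hypothetical rather than Katana's actual design.

```python
import asyncio
import json


class StatusHub:
    """Push build-status events to every subscribed client.

    Hypothetical sketch: each asyncio.Queue stands in for one browser's
    WebSocket connection. This is not Autobahn's API or Katana's code.
    """

    def __init__(self):
        self._subscribers = set()

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, event):
        # Serialize once, then push the same message to every current subscriber.
        message = json.dumps(event, sort_keys=True)
        for queue in self._subscribers:
            queue.put_nowait(message)


async def collect(queue, count):
    # A client awaits pushed messages instead of polling the server.
    return [await queue.get() for _ in range(count)]


async def demo():
    hub = StatusHub()
    first, second = hub.subscribe(), hub.subscribe()
    readers = asyncio.gather(collect(first, 2), collect(second, 1))
    hub.publish({"builder": "example-builder", "status": "running"})
    hub.unsubscribe(second)  # e.g. the second browser tab was closed
    hub.publish({"builder": "example-builder", "status": "success"})
    return await readers


if __name__ == "__main__":
    first_msgs, second_msgs = asyncio.run(demo())
    print(len(first_msgs), len(second_msgs))
```

In a real deployment, each queue would be replaced by an open WebSocket connection, and `publish` would be called whenever the build master records a status change.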

Among other things, we have a good overview of our build status on each branch:

[Screenshot: build status overview for each branch]

And also an overview of what our buildslaves are doing:

[Screenshot: overview of buildslave activity]

We can see a detailed breakdown of a build or test process:

[Screenshot: detailed breakdown of a build or test process]

And we have a nice test report to help us when tests fail:

[Screenshot: test report]

Katana’s architecture has grown in complexity, but we have stayed mindful of which elements are important to us. Katana’s production architecture now looks roughly like this:

[Figure: Katana production architecture]

In general, we have seen vast improvements in:

  • Maintainability
  • Flexibility
  • Reliability
  • Performance

Conclusion

Overall, I consider Katana a roaring success, both for the improvements it has brought to R&D at Unity and as a shining example of how to leverage the power of open-source tools. We’re proud to play such an instrumental role in keeping the wheels turning in R&D at Unity, and I hope you all take advantage of build automation in your own studios.

8 replies on “Katana: Leveraging Open-Source Tools for Continuous Integration”

Thanks for this success story !
Sorry if I’m a bit of a troll, here, but I have to ask… Do you think we’ll see a linux Unity 3d Editor anytime soon ? Unity is the only reason I have Windows installed… Thank you !

Very interesting read! Out of curiosity: did you evaluate using Jenkins servers as well? They come with tons of plugins, it’s easy to write own ones etc.

Interesting, especially since Unity is not a CI server company. Most companies just pick up a well established product (commercial or open source).

Regarding the architecture – the Front end is hosted on just 1 server? what’s the load on this server? (e.g: users/requests)

I wonder (as a TeamCity user) why does it get so slow in rendering those pages, as most of the hard work should be done on the build agents anyway.

Didn’t you hear? You can’t be going around calling things “master” and “slave” all willy nilly any longer!
https://github.com/django/django/pull/2692

(just to be clear, I’m kidding. It’s a thread well worth skimming through though, for funzies).

Katana sounds (and looks!) quite marvellous. I hope I get an excuse to use it some time.
