Hi, I’m Yan and for the past two years I’ve been a Toolsmith at Unity. We have grown quite a lot recently, and so have our test suites and the number of unstable tests, slow tests and failures which cannot be reproduced locally. In this blog post, I’ll talk about what we’re doing about it, but first let me tell you a little about our automation environment – just to give you a better understanding of the challenges we are dealing with.

At Unity, we have many different kinds of test frameworks (Figure 1) and test suites:

  • Runtime Tests verify Unity’s public runtime API on all of Unity’s supported platforms.
  • Integration Tests allow testing things that are not easily expressed as runtime tests – they can test Unity Editor features as well as integration with components like the Cache Server and Bug Reporter.
  • Native C++ Tests focus on testing native code directly without going through the scripting layer.
  • Graphics Tests test rendering features by comparing a resulting image with a reference image, which is considered “correct”.
  • Many others (Performance Tests, Load Tests, IMGUI Tests, etc.).
Figure 1. Testing frameworks at Unity

At the highest level, all tests are grouped into subsets based on test framework. However, they are further divided based on platform, run frequency, execution time and some other criteria. Those divisions produce an enormous number of testing points. We’ll discuss these numbers later on.

Having so many frameworks and runners is not easy, so about a year ago we started working on a Unified Test Runner (UTR): a single entry point for running all tests. It serves as a facade (see Figure 2) for all test runners/frameworks. This enables anyone to run any of our test suites from the command line.

Figure 2. Unified Test Runner Facade

All the artifacts that are produced by a test run are copied into the same place and are grouped and organized according to the same conventions everywhere. UTR also provides other services:

  • tests can be filtered the same way everywhere with -testfilter=TestName
  • execution progress is reported the same way for all the test suites
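
The facade idea and the two services above can be sketched in a few lines of Python. This is only an illustration, not UTR's actual code: the `UnifiedRunner` class, the `Outcome` record, the suite names and the test names are all hypothetical, and the filter is assumed to be a plain substring match on the test name, which the post does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A test is modeled as a zero-argument callable that returns True on success.
Check = Callable[[], bool]


@dataclass
class Outcome:
    suite: str
    name: str
    passed: bool


class UnifiedRunner:
    """Hypothetical facade: one entry point in front of several suites."""

    def __init__(self) -> None:
        self._suites: Dict[str, Dict[str, Check]] = {}

    def register(self, suite: str, tests: Dict[str, Check]) -> None:
        self._suites[suite] = dict(tests)

    def run(self, suite: str, testfilter: Optional[str] = None) -> List[Outcome]:
        results = []
        for name, test in self._suites[suite].items():
            # Assumption: the filter is a substring match on the test name.
            if testfilter is not None and testfilter not in name:
                continue
            passed = bool(test())
            # Progress is printed in the same format for every suite.
            print(f"[{suite}] {name}: {'PASS' if passed else 'FAIL'}")
            results.append(Outcome(suite, name, passed))
        return results


# Made-up suites and tests, registered behind the same entry point.
runner = UnifiedRunner()
runner.register("runtime", {"TransformTest": lambda: True,
                            "PhysicsTest": lambda: False})
runner.register("native", {"AllocatorTest": lambda: True})

runner.run("runtime", testfilter="Transform")  # runs only TransformTest
runner.run("native")                           # runs every native test
```

The point of the design is that callers never talk to a suite-specific runner directly, so filtering and progress output only have to be implemented once.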

Initially, UTR was mostly used to run tests locally. Then we switched focus to our Build Farm configurations. We wanted to use the Unified Test Runner there as well. Our goal was to run tests the same way locally and on the build farm. Or in other words: if something failed on the Build Farm – it should be easy to reproduce it locally.

Slowly but surely UTR has become the single entry point which we are using to run tests in Unity. That’s what made it a perfect candidate for another task: collecting test execution data, both from local and Build Farm test runs. Whenever a test run is finished, UTR reports data to a web service. That is how our test data analytics solution, Hoarder, was born. Hoarder’s responsibility is to collect, store and provide access to test execution data. It can present aggregated statistics with the ability to drill down to individual test runs. See Figure 3.

Figure 3. Build agents and humans submit data to Hoarder Web Service. Analytics application fetching it
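
To make the collect, aggregate and drill-down flow concrete, here is a minimal in-memory sketch. Hoarder's real schema and API are not described in this post, so everything below is an assumption: the `RunRecord` fields, the `ResultStore` class and its method names are invented for illustration, and a plain Python object stands in for the web service.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    test: str
    source: str      # assumed values: "build-agent" or "local"
    passed: bool
    seconds: float


class ResultStore:
    """Hypothetical stand-in for a service that collects test runs."""

    def __init__(self) -> None:
        self._runs: Dict[str, List[RunRecord]] = defaultdict(list)

    def submit(self, record: RunRecord) -> None:
        self._runs[record.test].append(record)

    def summary(self) -> Dict[str, Dict[str, float]]:
        # Aggregated view: run count, failure rate and mean duration per test.
        stats = {}
        for test, runs in self._runs.items():
            failures = sum(1 for r in runs if not r.passed)
            stats[test] = {
                "runs": len(runs),
                "failure_rate": failures / len(runs),
                "mean_seconds": sum(r.seconds for r in runs) / len(runs),
            }
        return stats

    def runs_for(self, test: str) -> List[RunRecord]:
        # Drill-down view: the individual runs behind one aggregate.
        return list(self._runs.get(test, []))


# Made-up data: both build agents and local runs report to the same store.
store = ResultStore()
store.submit(RunRecord("PhysicsTest", "build-agent", True, 1.0))
store.submit(RunRecord("PhysicsTest", "build-agent", False, 3.0))
store.submit(RunRecord("PhysicsTest", "local", True, 2.0))
store.submit(RunRecord("PhysicsTest", "local", True, 2.0))

print(store.summary()["PhysicsTest"])
print([r.source for r in store.runs_for("PhysicsTest") if not r.passed])
```

A test whose failure rate is neither 0 nor 1 across many runs is a natural candidate for the "unstable" label, and the drill-down shows which environment the failures came from.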

We discovered a lot of interesting things in the data, which led to a few important decisions. I’m going to talk about how we make informed decisions based on this data in the next blog post.

Comments are closed.

  1. Dear community,
    I have a question that I can’t find any information about:
    I need automated tests for a web enterprise application (Unity player in a web browser), where part of the data is loaded while scenes are rendering (after the play button is pressed). Is it possible in Unity to write and run tests at runtime?

    1. contract wars automated tests too much.

  2. Having been in the software dev industry for 20+ years in customer/provider environments, I think one of the problems with QA resides in the fact that you don’t control the runtime environment. This means that you can put a lot of effort into patterns, good practices and quality controls, but that will cover just 50% of the potential issue surface (related mostly to dev/human errors along dev processes).

    So yes, all of that stuff is incredibly good and should be done and continued. But in order to significantly reduce your bug report count (which is what matters to customers), you should think of:

    – How to extend your quality processes/controls/monitors/detectors to the end-user environment (i.e. my development machine as a Unity dev).
    – Even further, how to extend your quality processes/controls/monitors/detectors to our own users’ environment (i.e. my own customers).

    But most important: understand that asset developers (like me) should be considered part of your development community (being in an outer layer from our common customers’ perspective – gamers and users). What I mean is that as an asset developer I want usage metrics for my assets as you have for Unity in general, and I want Unity to alert me when an upgrade breaks my assets’ functionality for any reason (OK, not daily, but as soon as you publish a beta). I can’t handle the testing needed for every beta put out there for all my assets (well, I can, but I prefer to invest my time in providing real value).

    Keep up the good work guys.

  3. When a reported bug is found to be valid, does Unity add an automated test for that bug to prevent regressions?

    1. This is the developer’s decision. Quite often it happens the way you’ve described.

      1. It doesn’t download.

  4. Sorry to sound a bit cynical but the latest release just broke input for Mac on standalone. How is that possible with such a great testing environment that a major platform becomes completely broken?

    If only we could downgrade to a previous version, though lightmapping is unusable there when mixing the lightmap between Windows and Mac.

    1. Unfortunately, the described solution cannot prevent us from having gaps in our test coverage. It helps to solve another class of problems. Does the issue below describe your problem? If yes – it is fixed in 5.2.2 http://issuetracker.unity3d.com/issues/osx-input-dot-inputstring-does-not-work-on-mac-osx

      1. I do agree with the original poster that I’m surprised that an automated testing suite for a game engine doesn’t have rigorous testing around input handling. It’s an area where bugs have appeared repeatedly (especially on OS X) and it seems like an area that’s well suited to testing.

        1. Step 1. Build automated testing suite.
          Step 2. Create tests.
          Step 3. Users complain that you haven’t created enough tests.
          Step 4. Users complain that they know the internals better than you do.
          Step 5. Bang head on desk.

        2. The problem with input testing is that in order to exercise the code, you have to run through the entire stack, all the way from input device through platform specific code down to the input layers. And many times the problem resides on the outer layers, so you have to somehow initiate the input of the test from a physical device or mimic the physical device on a driver somewhere.

          Full stack integration tests are incredibly expensive to author, maintain and execute. Especially the maintain and execute part is important, because fragile tests are worse than no tests.

          What could be done is to make sure the test coverage on the unit test level is good, but that requires the architecture of the system to be designed so that it is testable. I know that the devs working on the new Input System are creating it with just that in mind.

        3. @Thomas Petersen: does that mean that the input system is not being tested right now and will only be tested once the new one is released?
          QA should play a game built with the release on main platforms as the last integration test :)

      2. Thanks for the reply, I understand software development is very difficult indeed! Though I’m confident such a large issue will be fixed promptly, the fact that it was shipped seems like a pretty gross error and causes us a lot of headaches.

    2. I agree, you’d have thought that if you know that you have something that your automated tests don’t cover, that’s as critical as keyboard input, then you might test it manually before releasing? Or at least fix it in one of the two subsequent patch releases or the point release? Seems like maybe you’re relying on automated tests too much.