
There are dozens of platforms you can deploy to with Unity. It can be difficult for us developers at Unity to maintain visual integrity on all of them. Here is a quick peek into how we ensure graphics features don’t get unintentionally broken.

We have a lot of automated tests: unit tests, integration tests, system tests inside Unity itself in the form of Editor and Playmode tests, and finally graphics tests. A graphics test sets up a scene with specific graphical features turned on or off, builds that scene, runs it on all supported devices, and finally renders the output as an image.

The resulting image is then compared with a previously approved reference image for that combination of scene, graphics settings, and device. Should any of the resulting images differ from the reference images, the test is flagged as failed, and someone needs to manually verify whether the failure is the result of an intentional change or an unintentional one that needs to be fixed.

Since it’s not always easy to spot changes from the reference image to the resulting test image (see the example below), we also generate a diff image for each failed test.

Figure 1. From left to right, the reference image, test result image and finally the diff image.
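The comparison step can be sketched in a few lines. This is a minimal, illustrative version only, not Unity’s actual implementation: images are plain 2D lists of (r, g, b) tuples, a test fails when any channel differs by more than an allowed delta (the per-platform tolerance discussed in the comments below), and the diff image amplifies the differences so they become visible.

```python
def compare_images(reference, result, allowed_delta=0):
    """Return (passed, diff), where diff amplifies per-pixel differences."""
    diff = []
    passed = True
    for ref_row, res_row in zip(reference, result):
        diff_row = []
        for (r1, g1, b1), (r2, g2, b2) in zip(ref_row, res_row):
            dr, dg, db = abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)
            # Fail the test if any channel deviates beyond the tolerance.
            if max(dr, dg, db) > allowed_delta:
                passed = False
            # Scale the differences up so tiny deviations are visible
            # in the diff image.
            diff_row.append((min(dr * 8, 255), min(dg * 8, 255), min(db * 8, 255)))
        diff.append(diff_row)
    return passed, diff

reference = [[(10, 10, 10), (200, 200, 200)]]
result    = [[(10, 10, 12), (200, 200, 200)]]

passed, diff = compare_images(reference, result, allowed_delta=1)
print(passed)      # False: one blue channel differs by 2 > 1
print(diff[0][0])  # (0, 0, 16)
```

A real pipeline would of course operate on decoded image files rather than nested lists, but the pass/fail logic is essentially this per-pixel comparison against a tolerance.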

What makes graphics tests a bit more difficult to work with than normal tests is that they are brittle. Different platforms, device models, and graphics cards produce slightly different results. So in order to get consistent results from graphics tests, they must be executed on the test farm, where we are sure the hardware remains the same. This means that the workflow for updating a test or adding a new one is a bit convoluted; the developer has to:

  1. Make and push their changes
  2. Run the graphics tests on all appropriate devices
  3. Wait for the tests to complete and fail
  4. Download the images for the failed tests from each of the builds
  5. Compare each reference image with the resulting image to ensure that the changes made are the expected changes
  6. Copy all the new images that need to be updated into the graphics tests repository
  7. Commit and push the changes to the graphics tests repository
  8. Run the graphics tests again
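Step 6 above is mostly mechanical. As a hypothetical sketch (the paths, folder names, and file layout here are illustrative, not Unity’s), it amounts to copying every downloaded image into the same relative path inside the graphics tests repository:

```python
import shutil
from pathlib import Path

def update_references(artifacts_dir, repo_dir):
    """Copy every PNG under artifacts_dir into the same relative
    path under repo_dir, creating directories as needed."""
    artifacts_dir, repo_dir = Path(artifacts_dir), Path(repo_dir)
    copied = []
    for image in artifacts_dir.rglob("*.png"):
        # Preserve the directory structure (e.g. per build configuration).
        target = repo_dir / image.relative_to(artifacts_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image, target)
        copied.append(target)
    return copied
```

This only works when the downloaded artifacts already mirror the repository layout, which is exactly what the combined zip file described below provides.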

This entire process can be very time consuming. To help make it a bit easier, we made a small Polymer application with a backend that queries our build statistics system, Hoarder, and finds all the graphics tests for a specific revision. It then downloads the graphics test artifacts from the build system for each of the builds and presents the results on a single web page.

The developer can then see the failed tests and compare them with the reference and diff images. However, the changes between two images aren’t always easy to spot; see the two images below:

So the tool allows the developer to toggle between the test image, the reference image, and the diff image. It’s not always easy to spot a change until you can swap back and forth between the two images, so this makes differences much quicker to see.

The developer can then select the tests they want to update and either get a command line that automatically downloads and applies the selected images to their graphics tests repository, or download a combined zip file with the correct directory structure and copy the images over manually.

With over 13,700 graphics tests distributed across 33 build configurations, and several updates to the graphics tests repository every day, this tool helps make a developer’s life a bit better and reduces some of the manual overhead of working with graphics tests.

Comments are closed.

  1. Have you considered hooking in to something like Applitools?

    They do some nice machine learning to allow for minute changes due to minor rendering differences and have interfaces for marking ignore regions, detecting layout changes etc.

  2. By default Unity.GraphicsTestRunner.exe runs in DX9 mode. How do I run with a DX11 feature level? What value needs to be set for “-dx11featurelevel – Set to force feature level for DX11”?
    Similarly, how do I run in OGL mode?

    1. You probably want to set the “-configuration” parameter instead; you can set it to glcore, glcore43, d3d11, d3d9, and so on.

      The “-dx11featurelevel” values are listed as featurelevelX_Y, for example featurelevel9_0, and the parameter is only for forcing a DX11 feature level.

  3. Is everything in the rendering completely deterministic? I thought there were always slight differences between renderings, and so the tests would always fail.
    That makes me wonder whether physics engine companies also run this sort of test on their physics simulations.

    1. No, they are not entirely deterministic; each platform (and some test combinations) has a configurable allowed delta for tolerating those slight differences.

      1. Just curious, what causes it to not be deterministic? Just a part of how floating point calculations work?

        1. Yes. Take, for example, something as simple as the addition of floating point numbers: it may yield different results depending on the order in which the numbers are added, and the compiler may reorder additions depending on what is currently in the registers.

        2. There are also odder areas: std::sort will sort identical values differently per platform, which affects things such as particle sorting. We now have our own sorting methods that don’t suffer from this.
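The non-associativity mentioned in the replies above is easy to demonstrate in any IEEE 754 environment; here is a quick illustration in Python, where floats are the same doubles a C++ renderer would use:

```python
# Floating point addition is not associative: summing the same numbers
# in a different order yields a slightly different result, which is one
# reason two renders of the same scene can differ at the bit level.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
```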

  4. This would be extremely useful for asset store developers, as we don’t necessarily have the time or all the platforms to test on. I hope some version of this will be made available to us soon (maybe as a Unity service?).

  5. Is this something you could consider open sourcing?

    1. In its current form, no. It has a lot of integrations with other systems which are currently also closed source, such as our build statistics system Hoarder. But I have considered making the Polymer components open source, so it would be easy to set up or build something similar.

    2. We do provide some of our test suites here.