Cache Server 6.0 Release and Retrospective: Optimizing Import
Cache Server makes creating with Unity faster by optimizing the asset import process, either on your local machine or on a dedicated server for teams working in a local area network. Version 6.0 of the remote Cache Server is now available, and it is the culmination of six months of work on improving the quality and performance of the Cache Server. In fact, the improvement is so big that we decided a blog post is in order. Download Remote Cache Server now on GitHub, and read on to learn more.
What problem does Cache Server solve?
When using the Unity Editor, every time an asset is added or modified in a project, an import process is triggered. As your Unity team gets larger, this creates two issues. First, there will be more assets in the project, and second, the assets will change more frequently.
Ultimately, you and your teammates lose valuable development time while the editor calculates and imports project changes. The problem is further compounded for multiplatform projects. When switching target platforms, all platform-dependent assets in the project (e.g. textures, audio) go through the conversion process again, which can take hours for large projects.
To speed up this process, the Asset Cache Server can be deployed on a local system for individual use, or in a LAN/WAN environment for teams collaborating on one or more projects. The Unity Editor uses the Asset Cache Server to store and retrieve multiple platform asset representations while amortizing the cost of asset imports across your entire team.
What were our goals with v6.0?
In the recent past, we heard from many developers who were experiencing poor performance, such as hanging in the Editor, and other problems when using the Cache Server. It was becoming apparent that the Cache Server architecture was strained beyond its original design, and needed some focused attention to bring it up to speed with the current Unity architecture and quality standards.
In the rest of this blog post, I’ll recap our journey through the process of benchmarking performance, identifying bottlenecks, and making targeted fixes to deliver improvements to our developers as quickly as possible.
Because we don’t have a large team working on a big game inside of Unity, we had to simulate a real-world scenario to help us quickly identify performance problems that only manifest at large scale. Our test environment consisted of:
- A synthetic test client written in Python to simulate massive traffic with configurable PUT/GET sizes and concurrent connection counts
- Real Unity Editor testing on a mix of macOS and Windows, using a combination of the Python client and real Unity Editor clients to simulate load
- A mix of demo projects (specifically the Adam interior and exterior environment projects) and very large customer projects in our support library
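The synthetic client itself is not shown in this post. As a rough idea of what "configurable PUT/GET sizes and concurrent connection counts" can look like, here is a minimal Python sketch. Everything in it is hypothetical: the names `LoadConfig`, `InMemoryServer`, and `run_load`, the default values, and above all the in-memory stand-in for the server, which replaces real sockets and does not speak the Cache Server protocol.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LoadConfig:
    connections: int = 12              # concurrent simulated clients
    requests_per_connection: int = 20  # requests issued by each client
    put_ratio: float = 0.3             # fraction of requests that are PUTs
    min_size: int = 1024               # smallest PUT payload, in bytes
    max_size: int = 256 * 1024         # largest PUT payload, in bytes
    seed: int = 1                      # makes a run repeatable


@dataclass
class Stats:
    puts: int = 0
    gets: int = 0
    get_hits: int = 0
    bytes_sent: int = 0


class InMemoryServer:
    """Hypothetical stand-in for a cache endpoint; a real client would use sockets."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        await asyncio.sleep(0)  # yield to other clients, as a network write would
        self.store[key] = data

    async def get(self, key: str) -> Optional[bytes]:
        await asyncio.sleep(0)
        return self.store.get(key)


async def run_client(client_id: int, cfg: LoadConfig, server: InMemoryServer, stats: Stats) -> None:
    rng = random.Random(cfg.seed + client_id)
    for _ in range(cfg.requests_per_connection):
        key = f"asset-{rng.randrange(50)}"
        if rng.random() < cfg.put_ratio:
            size = rng.randint(cfg.min_size, cfg.max_size)
            await server.put(key, bytes(size))
            stats.puts += 1
            stats.bytes_sent += size
        else:
            data = await server.get(key)
            stats.gets += 1
            if data is not None:
                stats.get_hits += 1


async def run_load(cfg: LoadConfig) -> Stats:
    server, stats = InMemoryServer(), Stats()
    # All clients run concurrently on one event loop.
    await asyncio.gather(*(run_client(i, cfg, server, stats) for i in range(cfg.connections)))
    return stats


stats = asyncio.run(run_load(LoadConfig()))
print(stats)
```

Raising `connections` or widening the `min_size` to `max_size` range is how a client like this would be pushed toward the large-scale scenario described below.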
We tested the Cache Server in two configurations: running locally, with a single Unity Editor connected to a Cache Server on the same system, and running as a hosted server on a gigabit Ethernet LAN.
For the hosted server, we tested the following scenarios:
- Two Unity Editor clients connected via Ethernet
- Two Unity Editor clients connected via WiFi
- A large number of clients (a dozen or more), which consisted of a mix of synthetic Python script clients plus real Unity Editor clients
The results of our tests were immediately telling. The single Unity Editor client connecting to a local server showed no discernible problems, with good overall performance. The two-client case on LAN also performed well. Things started going bad with the WiFi-connected clients and became almost unusable in the large-scale synthetic test. Server lock-ups, client disconnects, and other failure scenarios manifested quickly and frequently. We needed to discover where the bottlenecks were.
The Cache Server is a Node.js server application, and Node.js has proven to be a very scalable platform for I/O-centric applications, certainly well beyond the dozen concurrent clients that we tested here. So what could be the problem?
A few obvious things jumped out:
- The code lacked any kind of automated test suite. Without tests, it was going to be impossible to safely make changes to test performance improvements. Furthermore, there could be hidden bugs contributing to some of the problems we observed.
- Lots of synchronous file I/O calls. Because Node.js is single-threaded, synchronous file system calls or CPU-intensive tasks will quickly degrade the performance of the entire server.
- The method for freeing space when the total cache size exceeded the limit was very expensive and would trigger a full directory walk with every new file written to the cache.
- The protocol handling and file system cache code were all intertwined, making it difficult to isolate systems for targeted optimizations.
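To make the third point concrete: a full directory walk on every write costs time proportional to the number of files already in the cache. One common alternative is to walk the directory once at startup and then keep a running byte total that each write adjusts. The Python sketch below illustrates that idea only and is not the Cache Server's implementation. The class name `SizeTrackedCache`, its methods, and the oldest-first eviction order are assumptions made for this example.

```python
import os
import tempfile
from collections import OrderedDict
from typing import List


class SizeTrackedCache:
    """File cache index that keeps a running byte total instead of re-walking the directory."""

    def __init__(self, root: str, max_bytes: int) -> None:
        self.root = root
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first
        os.makedirs(root, exist_ok=True)
        self._scan_once()

    def _scan_once(self) -> None:
        # The only directory walk: done at startup, ordered by modification time.
        files = []
        for name in os.listdir(self.root):
            path = os.path.join(self.root, name)
            if os.path.isfile(path):
                info = os.stat(path)
                files.append((info.st_mtime, name, info.st_size))
        for _, name, size in sorted(files):
            self._entries[name] = size
            self.total_bytes += size

    def put(self, key: str, data: bytes) -> List[str]:
        """Write one item, then evict the oldest items until the cache fits.

        Returns the list of evicted keys.
        """
        with open(os.path.join(self.root, key), "wb") as handle:
            handle.write(data)
        # Adjust the running total; an overwritten key gives back its old size.
        self.total_bytes += len(data) - self._entries.pop(key, 0)
        self._entries[key] = len(data)
        evicted = []
        # Always keep the item that was just written, even if it is oversized.
        while self.total_bytes > self.max_bytes and len(self._entries) > 1:
            old_key, old_size = self._entries.popitem(last=False)
            os.remove(os.path.join(self.root, old_key))
            self.total_bytes -= old_size
            evicted.append(old_key)
        return evicted


# Usage: with a 10-byte limit, the third write pushes out the oldest entry.
with tempfile.TemporaryDirectory() as demo_dir:
    cache = SizeTrackedCache(demo_dir, max_bytes=10)
    cache.put("a", b"12345")
    cache.put("b", b"12345")
    evicted = cache.put("c", b"123")
    remaining = sorted(os.listdir(demo_dir))
```

With this bookkeeping, the cost of a write depends on how many items have to be evicted, not on how many files the cache holds.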
Performance improvement experiments
Before committing to any major course of action, we hypothesized a handful of optimization strategies. We then did some quick implementations to measure their impact.
- Modernization: We did a quick pass to remove as many synchronous calls as possible, and modernized some library usage.
- Clustering: We implemented Node.js clustering, which forks the server into a configurable number of discrete processes.
- Buffering: We attempted to eliminate socket stalls in uploading from server to client by buffering writes to the socket.
- “Cache-Cache”: We implemented an in-memory cache of small cache items (the size limit is configurable, but we targeted items under 64 KB) to reduce file system I/O.
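The reasoning behind removing synchronous calls can be demonstrated with any single-threaded event loop. The sketch below uses Python's asyncio as a stand-in for the Node.js event loop. It is an analogy, not Cache Server code, and the function names and timings are made up for illustration. A "heartbeat" task represents every other client the server should keep serving while one slow read is in progress.

```python
import asyncio
import time


def slow_disk_read() -> bytes:
    # Stand-in for a synchronous file system call; time.sleep blocks the calling thread.
    time.sleep(0.2)
    return b"asset-bytes"


async def heartbeat(ticks: list) -> None:
    # Represents all the other work the event loop should keep doing.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_read() -> bytes:
    # Runs on the event loop thread, so nothing else is scheduled until it returns.
    return slow_disk_read()


async def offloaded_read() -> bytes:
    # Runs on a worker thread; the event loop stays free while waiting.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_disk_read)


async def count_ticks_during(read_asset) -> int:
    """Return how many heartbeat ticks happened while read_asset() was in progress."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    await read_asset()
    after = len(ticks)
    beat.cancel()
    return after - before


blocked_ticks = asyncio.run(count_ticks_during(blocking_read))
offloaded_ticks = asyncio.run(count_ticks_during(offloaded_read))
print(f"ticks during blocking read:  {blocked_ticks}")
print(f"ticks during offloaded read: {offloaded_ticks}")
```

During the blocking read the heartbeat cannot run at all, which is the single-process equivalent of every connected client stalling behind one synchronous file operation.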
The results and observations of these tests were as follows:
- Modernization + Clustering: We observed far greater stability under heavy test load. We were unable to crash or freeze the server, and there were no client disconnects related to timeouts.
- Buffering: Results were mixed. Synthetic tests showed modest improvement, but real client tests were inconclusive. In Cache Server v6.0, we are taking full advantage of the Node.js stream architecture to ensure that the flow of data between client and server is as efficient as possible.
- “Cache-Cache”: This optimization showed huge gains on synthetic tests, and modest but measurable gains in real client testing. This difference had to do with some synthetic tests using small, uniform asset sizes, which were very conducive to this optimization strategy. Real clients tend to have a wide range of unsorted asset sizes, resulting in a much higher “miss” rate.
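The gap between synthetic and real results for the “Cache-Cache” can be reproduced with a small simulation. The Python sketch below models a memory cache that only admits items under 64 KB and compares two made-up workloads: one with uniform 16 KB assets, and one with sizes spread between 1 KB and 64 MB. The class `SmallItemCache`, the 16 MB capacity, the LRU eviction, and both workloads are assumptions chosen for illustration. They are not measurements from our tests.

```python
import random
from collections import OrderedDict
from typing import Dict

SMALL_ITEM_LIMIT = 64 * 1024  # only items under 64 KB are kept in memory


class SmallItemCache:
    """LRU memory cache that admits only items below a size threshold (sizes only, no payloads)."""

    def __init__(self, capacity_bytes: int, item_limit: int = SMALL_ITEM_LIMIT) -> None:
        self.capacity_bytes = capacity_bytes
        self.item_limit = item_limit
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0
        self._sizes: "OrderedDict[str, int]" = OrderedDict()  # least recently used first

    def request(self, key: str, size: int) -> bool:
        """Record one GET. Returns True when it is served from memory."""
        if key in self._sizes:
            self._sizes.move_to_end(key)
            self.hits += 1
            return True
        self.misses += 1  # a miss falls through to the file system
        if size < self.item_limit:
            self._sizes[key] = size
            self.used_bytes += size
            while self.used_bytes > self.capacity_bytes:
                _, old_size = self._sizes.popitem(last=False)
                self.used_bytes -= old_size
        return False


def hit_rate(asset_sizes: Dict[str, int], requests: int, seed: int) -> float:
    """Replay random GETs over the given assets and return the memory hit rate."""
    rng = random.Random(seed)
    cache = SmallItemCache(capacity_bytes=16 * 1024 * 1024)
    keys = list(asset_sizes)
    for _ in range(requests):
        key = rng.choice(keys)
        cache.request(key, asset_sizes[key])
    return cache.hits / requests


size_rng = random.Random(7)
# Synthetic-style workload: every asset is small and the same size.
uniform_assets = {f"small-{i}": 16 * 1024 for i in range(500)}
# Project-style workload: sizes spread evenly on a log scale from 1 KB to 64 MB.
mixed_assets = {f"mixed-{i}": int(2 ** size_rng.uniform(10, 26)) for i in range(500)}

uniform_rate = hit_rate(uniform_assets, requests=20000, seed=1)
mixed_rate = hit_rate(mixed_assets, requests=20000, seed=1)
print(f"uniform small assets: {uniform_rate:.1%} hits")
print(f"mixed asset sizes:    {mixed_rate:.1%} hits")
```

In the uniform workload every asset qualifies for the memory cache, so nearly every repeat request is a hit. In the mixed workload most assets are over the limit and always go to the file system, no matter how often they are requested.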
Phase 1 Improvements – v5.4
Based on our experimentation, we decided on a two-phase course of action. For phase 1, we would target the low-hanging fruit and get the code in shape for future improvements.
- In order to facilitate rapid bug fix deployment, faster release cycles and direct community contributions, development was moved to GitHub and the code was open sourced under the Apache 2.0 license.
- We implemented a full test suite, which surfaced a few critical bugs.
- Node.js clustering support was added, and all file I/O operations were isolated to the main worker. This dramatically improved stability and performance under high load.
- A little bit of refactoring was done to separate protocol and file system functions in order to make maintenance easier going forward.
Version 5.4 of the Cache Server was released last September with these changes, only two weeks after our initial performance investigation. Additionally, the bug fixes uncovered in this process were backported to the Cache Server distributed with the Unity Editor, back to version 2017.2.
Since version 5.4 was released, we’ve received a lot of positive feedback from developers who were previously experiencing problems with the Cache Server.
Phase 2 Improvements – v6.0
Since the release of v5.4, we’ve been rebuilding the Cache Server from the ground up to lay a better foundation for maintainability and future enhancements. We also wanted to provide a high-end solution for the most demanding enterprise environments that currently deploy and maintain their own custom-built Cache Server solution.
Version 6.0 brings improved reliability and performance, as well as a host of new features, including:
- A high-performance, fully in-memory (RAM) cache module, in addition to the standard file-system-backed cache module
- Transaction mirroring, for automatically synchronizing changes to one or more downstream cache servers
- A project import tool you can use to quickly seed a cache server from an existing fully imported Unity project