Every summer, we recruit interns to help us deliver on our mission to empower Unity developers with AI and Machine Learning tools and services. Last summer, the AI@Unity group had four fantastic interns working on impactful projects: Aryan Mann, PSankalp Patro, Ruo-Ping Dong, and Jay Patel. Read on to find out more about their experiences and achievements interning at Unity.

As Unity grows, our internship program grows as well. For 2020, the size of our internship program will increase by 40% globally. The AI@Unity organization has tripled the number of intern positions for 2020, and we are hiring in San Francisco and Bellevue for a total of 19 open positions, ranging from software development to machine learning research. If you are interested in our 2020 internship program, please apply here.

Aryan Mann: Game Simulations SDK

The Game Simulations team aspires to support game developers throughout the game creation process by enabling them to leverage large-scale game simulations to test, validate, and optimize their games. This is intended to augment traditional human playtesting: developers can analyze their game by launching a large number of parallel simulations and studying the resulting data. The simulations and resulting analysis can be used to test the game (ensure there are no crashes), answer critical design questions (is the game balanced?), and even optimize the game (find the game settings that achieve the desired balance). The Game Simulations team builds both an SDK and a cloud service. The SDK enables developers to instrument the data and metrics they wish to track, while the cloud service enables running game builds (instrumented with the SDK) at unprecedented scale and processing and analyzing the resulting data.

Problem: Playtesting takes a lot of time

Playtesting is the process of assessing and validating game design. Before game studios release their game to thousands, even millions of players, they will often run it by a small group of people who examine the various systems that dictate gameplay and give feedback. Playtesting is not limited to polishing the game after it is developed; rather, it is a continuous process that runs from the first prototype up until the release of the game. Currently, playtesting is a very manual process where studios hire people to play the game and fill out surveys about their gameplay experience. Observing hours of gameplay, analyzing it, and gathering feedback can be tedious and downright impractical.

To help explore the value of running game simulations for automated playtesting, I worked closely with Illogika, a veteran studio that has created AAA experiences such as Lara Croft GO and Cuphead. They are currently developing a new racing game called Rogue Racers (pictured above), which blends traditional infinite runner gameplay with competitive arcade-style mechanics such as spells and power-ups. They had a few design questions and wanted to see if the Game Simulations service could help answer those.

Solution: Automated playtesting with simulations

Illogika envisions a game that is competitive and rewards skill, yet remains forgiving to new players. As such, they initially wanted the maximum difference in completion times between two equally skilled players to be five seconds. To help answer this question, I set up a playtest where two bots competed against each other on a specific map, and I used the Game Simulations SDK to track two metrics:

  • “Completion Time” tracked how many seconds it took the first bot to cross the finish line.
  • “Completion Difference” recorded the difference in completion time, in seconds, between the first and the second bot.

We then performed thousands of simulations of the game across different bot skill settings to analyze these two metrics. When looking at the data in the graph below, we found that even between bots of similar skill level, the “Completion Difference” was sometimes higher than the five seconds Illogika wanted.
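
To give a sense of what this kind of analysis looks like, here is a minimal offline sketch of aggregating the simulation results. The CSV file name and column names are assumptions for illustration only, not the actual export format of the Game Simulations service.

```python
# Hypothetical post-processing of exported simulation results.
# Assumes a CSV with columns "bot_skill_a", "bot_skill_b" and
# "completion_difference"; the real export format may differ.
import pandas as pd

results = pd.read_csv("rogue_racers_simulations.csv")

# Keep only runs where both bots were configured with the same skill level.
equal_skill = results[results["bot_skill_a"] == results["bot_skill_b"]]

summary = equal_skill.groupby("bot_skill_a")["completion_difference"].agg(
    mean_difference="mean",
    over_five_seconds=lambda s: (s > 5.0).mean(),  # fraction of runs above the 5 s target
)
print(summary)
```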

While this data is enough to answer the design question, Illogika demonstrated how the data we generated could provide additional insights that had escaped our notice. They noticed that the average “Completion Difference”, shown by the red line above, was merely two seconds. We even found that bots of different skill levels were still close to two seconds apart. This meant that the game did not reward skill as much as Illogika wanted. They hypothesized that this was due to their emergency boost system being too proactive and powerful. When a player fell behind, they would get a speed boost to catch up, which, in its current configuration, made the gameplay require less skill. With this keen insight, Illogika reworked their emergency boost system to provide comeback opportunities while still enabling more skilled players to thrive.

From here, we wanted to explore the game settings that would best achieve the design goals Illogika had in mind. To support this, we expanded our simulations to try out a large number of combinations of game parameters, helping Illogika understand how the two metrics above change with three specific game parameters. An evolution of these experiments was presented at the Unite Copenhagen Keynote. Additionally, I helped evolve the Game Simulations SDK to support time-series metrics. This is helpful for validating a game’s economy and understanding how a player's account balance (e.g. points, coins) evolves as they play a large number of sessions.

PSankalp Patro: Training adaptable agents

The ML-Agents Toolkit is an open-source project that aims to enable developers to leverage Deep Reinforcement Learning (DRL) to train playable and non-playable characters. By simply instrumenting a character's observations (how it perceives the environment), actions (what decisions it can take), and rewards (a signal for achieving a desired behavior), developers can train a character or game entity to learn a desired behavior as a byproduct of repeated interaction between the character and the environment (the world in which the character resides). From each interaction, the environment sends a reward signal to the character. The character then tries to learn the behavior that earns it the maximum reward over time.
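
At its core, this is the standard observation-action-reward loop of reinforcement learning. The sketch below illustrates that loop in the abstract; `env` and `policy` are placeholders, not ML-Agents classes or APIs.

```python
# A minimal sketch of the observation-action-reward loop described above.
# "env" and "policy" are placeholder objects: env.reset() returns the character's
# view of the world, env.step(action) applies a decision and returns the next
# observation, a reward signal, and a flag marking the end of the episode.

def run_episode(env, policy):
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy.decide(observation)           # what decision to take
        observation, reward, done = env.step(action)  # how the world responds
        policy.learn(observation, reward)             # adjust behavior to earn more reward
        total_reward += reward
    return total_reward
```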

Problem: The pitfall of overfitting

Consider the in-house pet Puppo, trained using DRL (if you’re not familiar with Puppo, check out this blog post). The ML-Agents Toolkit enabled us to teach Puppo to fetch on a flat garden. Through repeated trials of throwing the stick, Puppo learned to walk and to fetch the stick, guided by the reward signals it received from the environment every time it retrieved the stick.

But what happens if we want Puppo to play fetch on a garden with rough terrain? The previous ML-Agents setup only allowed us to train Puppo on a single, fixed terrain. When Puppo then plays fetch on a terrain different from the one it was trained on, there is a drop in performance and it often gets stuck.

This is a common pitfall in deep reinforcement learning termed overfitting. Overfitting reduces the reliability, flexibility, and usability of our trained characters at test time. It poses a serious hindrance to developers training their characters, as the characters may display undesirable behavior when the environment is even slightly modified.

Solution: Generalized Training

The project I worked on over the summer aims to mitigate overfitting by training characters (Puppo, in this instance) to learn the task over multiple variations of the training environment, as opposed to a single fixed environment. This allows characters to ignore trivial aspects that do not affect the task at hand. My project alters the conventional training protocol by introducing an additional step in the training pipeline: the periodic modification of the environment (e.g. the roughness of the terrain that Puppo plays on).
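
Conceptually, the change amounts to resampling environment parameters at a fixed interval during training. The sketch below illustrates the idea; in the released feature the resampling is driven by a configuration file, and the names used here (`set_parameter`, `train_for`, the roughness range) are illustrative assumptions rather than the actual API.

```python
# Conceptual sketch of generalized training via periodic environment resampling.
import random

RESAMPLING_INTERVAL = 5000  # training steps between environment changes


def sample_terrain_roughness():
    # Draw a new terrain roughness instead of always training on a flat garden.
    return random.uniform(0.0, 1.0)


def generalized_training(env, trainer, total_steps):
    steps_done = 0
    while steps_done < total_steps:
        # Periodically modify the environment, then continue ordinary DRL updates.
        env.set_parameter("terrain_roughness", sample_terrain_roughness())
        trainer.train_for(env, RESAMPLING_INTERVAL)
        steps_done += RESAMPLING_INTERVAL
```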

Let's look at the performance of Puppo, now trained over multiple terrains. The terrain used for testing here is identical to the terrain used to test Puppo in the earlier setup. As the newly trained Puppo traverses the terrain, we can clearly see how much quicker it is able to fetch the stick. It also doesn't get stuck as often. It seems like Puppo has learned to play fetch better!

The new training procedure is particularly helpful when training characters for tasks in dynamic environments. Agents that generalize better do not need to be retrained as often when the environment changes during game development. Overfitting is an active research area in reinforcement learning (as well as the wider field of machine learning). To learn more about overfitting and the progress made on addressing it, check out the following research paper, which formed the basis of the project: Assessing Generalization in Deep Reinforcement Learning. Visit the ML-Agents Toolkit documentation on Generalized Training for a detailed description of how to use this new training option.

Ruo-Ping Dong: Speeding up ML-Agents training

At the start of the summer, the ML-Agents Toolkit could only be used with CPU or single-GPU training. Training for some complex games can take a long time, since they are data-intensive (large batch sizes) and may use more complex neural networks. For my project, I wanted to understand the impact of GPU use on training performance and speed up some of our slower environments with multi-GPU training.

Problem: Training models takes a lot of time

Training a reinforcement learning algorithm takes a lot of time, most of which is spent either simulating (running the Unity game to collect data) or updating the model using the collected data. In a previous release of ML-Agents, we improved the former by providing a mechanism to launch multiple Unity environments in parallel (on a single machine). This project addresses the latter by providing the ability to leverage multiple GPUs during the model update phase.

Solution: Leveraging multiple GPUs for training

We replaced the original Proximal Policy Optimization (PPO) implementation with a new one that creates one copy of the model for each GPU. All of the copies share the same neural network weights. When there is enough data to perform an update, each GPU processes a subset of the training batch in parallel. The multi-GPU policy then aggregates and averages the gradients from all GPUs and applies the updated weights to all copies.
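
The gradient averaging follows the familiar multi-tower pattern. The sketch below shows that pattern in TensorFlow 1.x style; it is an illustration of the approach rather than the actual ML-Agents implementation, and `build_loss` is a placeholder for the PPO loss construction.

```python
# Illustrative multi-GPU update in the TensorFlow 1.x "tower" style.
import tensorflow as tf


def average_tower_gradients(tower_grads):
    # tower_grads: one list of (gradient, variable) pairs per GPU, all over the
    # same shared variables. Average each variable's gradient across GPUs.
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = tf.stack([g for g, _ in grads_and_vars], axis=0)
        var = grads_and_vars[0][1]
        averaged.append((tf.reduce_mean(grads, axis=0), var))
    return averaged


def build_multi_gpu_update(build_loss, optimizer, minibatches, num_gpus):
    tower_grads = []
    for gpu_id in range(num_gpus):
        # Reuse the same variables on every GPU so all copies share weights.
        with tf.device("/gpu:%d" % gpu_id), \
             tf.variable_scope("ppo_model", reuse=tf.AUTO_REUSE):
            loss = build_loss(minibatches[gpu_id])  # each GPU gets a slice of the batch
            tower_grads.append(optimizer.compute_gradients(loss))
    # Apply the averaged gradients once; the shared weights are updated for all copies.
    return optimizer.apply_gradients(average_tower_gradients(tower_grads))
```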

We tested the effect of multi-GPU training by measuring the update time during a training run of the Obstacle Tower environment. We tested using three separate models available via ML-Agents: a small "Simple" Convolutional Neural Network (CNN), the "Nature" CNN described in Mnih et al., and the ResNet described in Espeholt et al. While multiple GPUs had a minimal impact on performance for the smaller models, we saw a substantial improvement in performance for the larger ResNet model.

Data pipeline optimization

Looking closer at the update time, we noticed that ML-Agents was spending a substantial amount of time feeding data into the graph: pulling stored data from the training buffer, transforming input data into TensorFlow tensors, moving data onto GPU devices, and so on. We wanted to improve this processing time by preparing the data for subsequent update batches in parallel with performing optimization on the current batch.

We implemented this by adapting our trainer to use the TensorFlow Dataset API, which takes care of all data pipeline operations, including batching, shuffling, repeating, and prefetching. The experimental results showed a 20-40% improvement in update time for both CPU and GPU training.
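
As a rough illustration, the pipeline looks something like the sketch below. The field names and buffer size are placeholders; the point is that shuffling, batching, and prefetching happen inside `tf.data`, so the next batch is being prepared while the current one is used for the update.

```python
# A minimal sketch of an update-data pipeline built on the TensorFlow Dataset API.
import tensorflow as tf


def make_update_dataset(observations, actions, advantages, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((observations, actions, advantages))
    dataset = dataset.shuffle(buffer_size=10000)  # decorrelate samples from the buffer
    dataset = dataset.repeat()                    # allow multiple passes over the buffer
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)                 # prepare the next batch during the current update
    return dataset
```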

Jay Patel: Exploring image generation and design

Content creation is a broad and important component of game development, one in which machine learning may play an increasing role in the near future. In particular, the ability of machine learning algorithms to generate novel 2D images has improved dramatically in recent years. The potential applications of this technology for the creation of 3D worlds are plentiful - from machine-generated textures to level design, and more. For my internship, I focused on exploring one particular technique for human-guided image generation: Transparent Latent-space Generative Adversarial Network (TL-GAN).

Problem: Outputs of GANs are hard to control

Before we can understand TL-GAN, we first need to understand GANs, the older and simpler model it is based on. GAN stands for “Generative Adversarial Network,” a deep neural network that can learn to generate new data with the same distribution as the training data it has seen. For example, if our training data consists of a large set of car images, the GAN would train on those images and eventually learn to create new, unique images of cars.
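
For readers who want to see the adversarial setup concretely, here is a toy training step written in TensorFlow 2 style. It is a generic illustration of how a generator and a discriminator are trained against each other, not the specific model or framework version used in this project.

```python
# Toy GAN training step: the generator maps random noise to images, the
# discriminator tries to tell real images from generated ones, and the two
# networks are trained with opposing objectives.
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def train_step(generator, discriminator, g_opt, d_opt, real_images, latent_dim):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        # Discriminator: label real images as 1, generated images as 0.
        d_loss = cross_entropy(tf.ones_like(real_logits), real_logits) + \
                 cross_entropy(tf.zeros_like(fake_logits), fake_logits)
        # Generator: try to fool the discriminator into labeling fakes as real.
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)

    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```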

The primary shortcoming of this approach is that it does not provide a human-understandable manner of controlling the output. A random vector of noise controls the exact image produced by the generator, but exactly what random noise corresponds to which image features is not something a human can understand because the mapping is simply too complex. 

What we want is control over the features in the images. We can generate images of cars, but what if we want to control the color of the car, or maybe the number of seats? Any tool to generate content becomes much more powerful if a human can easily and intuitively direct it. The random noise vector that we feed into the generator is called the latent code. To have control over the features, we need to understand this latent space. This is where TL-GAN comes into the picture.

Solution: TL-GAN

The TL-GAN model provides a way to control the output of the generator, but it requires a trained GAN as one of its components. So, my first goal was to train a GAN. As a test case, I took on the task of generating images of cars, based on a large training set of car images.

TL-GAN has three major components: a GAN, a feature extractor, and a generalized linear model. The roles of these three components are as follows:

  • GAN: Generates synthetic car images from a random noise vector in the latent space, as discussed above. Trained using a large dataset of unlabeled car images.
  • Feature Extractor: A multi-class classifier that outputs labels for a given car image. Trained on a smaller set of labeled car images. Once trained, we can use the feature extractor with the GAN to produce a large labeled dataset of {random latent vector, features} pairs by running synthetic car images through the feature extractor.
  • Generalized Linear Model (GLM): Trained to understand the latent space in terms of the multi-class labels we have. We train it using the large {random latent vector, features} dataset compiled from the (feature extractor + GAN) process above, as sketched after this list.
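
Putting the three pieces together, here is a conceptual sketch of the recipe, with `generator` and `feature_extractor` as placeholder callables standing in for the trained networks, and a plain linear regression standing in for the GLM. It only illustrates the idea of fitting a linear map from latent codes to features and then moving a latent code along one feature's direction; it is not the project's actual implementation.

```python
# Conceptual TL-GAN-style pipeline with placeholder networks.
import numpy as np
from sklearn.linear_model import LinearRegression


def learn_feature_directions(generator, feature_extractor, latent_dim, n_samples=50000):
    # 1. Sample random latent codes and label the images the GAN produces from them.
    latents = np.random.randn(n_samples, latent_dim)
    features = np.stack([feature_extractor(generator(z)) for z in latents])

    # 2. Fit a linear model from latent code to features; each coefficient row
    #    approximates a direction in latent space that controls one feature.
    glm = LinearRegression().fit(latents, features)
    return glm.coef_  # shape: (n_features, latent_dim)


def edit_feature(z, directions, feature_index, amount):
    # Move a latent code along the direction associated with one feature
    # (e.g. car color) while leaving the rest of the code mostly untouched.
    direction = directions[feature_index]
    return z + amount * direction / np.linalg.norm(direction)
```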

Unfortunately, I did not have the time to finish training TL-GAN. However, once trained, this model could allow a user to design the generated car in a tool with a random button and a slider for each supported feature. The exciting thing is that if we can do this for cars, we can do it for anything.

Our 2020 Internship Program

Our 2019 summer interns were a fantastic addition to the Game Simulations, ML-Agents, and Visual Machine Learning teams (some of them will return next year as full-time team members or for another internship). We will continue to expand our internship program in Summer 2020. If you want the opportunity to work on an aspirational project that will have an impact on the experiences of millions of players, please apply!
