How Eidos-Montréal created Grid Sensors to improve observations for training agents
Within Eidos Labs, several projects use machine learning. The Automated Game Testing project tackles the problem of testing the functionality of expansive AAA games by modeling player behavior with agents that have learned behavior using reinforcement learning (RL). In this blog post, we’ll describe how the team at Eidos Labs created the Grid Sensor within the Unity Machine Learning Agents Toolkit (ML-Agents) to better represent the game for machine learning, improving training times and ultimately leading to less expensive models.
Founded in 2007, Eidos-Montréal’s first goal was to rejuvenate old Eidos Interactive franchises such as Deus Ex. In 2009, the studio was acquired by Square Enix. As of 2020, Eidos-Montréal has about 500 employees working on both games and research projects. The studio recently announced the opening of Eidos-Sherbrooke, a regional chapter that houses Eidos Labs, a cutting-edge team dedicated to driving the technological innovation of Eidos-Montréal.
Automated Game Testing using reinforcement learning
Reinforcement learning has produced AI systems that play games such as StarCraft, Dota 2 and the Atari 2600 suite at human to superhuman levels. However, one of the largest challenges for game developers is the compute time and cost required to train these models. Exacerbating this challenge is the sometimes impromptu nature of developing AAA games: developers often add features or update textures and animations, rapidly and dramatically changing a game in its early phases, when testing is most needed. Within the Eidos Labs team, finding a middle ground between model expressiveness and training speed while remaining independent of ever-changing game visuals is one of the core goals of the Automated Game Testing project.
To drive innovation and progress, Eidos Labs partnered with Matsuko, a deep tech company focused on AI and 3D, in the development of the Automated Game Testing project, which led to the creation of the Grid Sensor. The team also leveraged the Unity ML-Agents Toolkit for prototyping. The core team consisted of:
- Jaden Travnik (Eidos Labs)
- Romain Trachel (Eidos Labs)
- Alexandre Peyrot (Eidos Labs)
- Charles Pearson (Matsuko)
- Martin Čertický (Matsuko)
- Erik Gajdos (Matsuko)
Defining observations in RL and the Unity ML-Agents Toolkit
The ability of an agent to observe its environment is a key concept in reinforcement learning (RL). After an agent takes an action based on its policy (which defines how the agent should behave at any given time), the agent observes the resulting state of the environment and the reward it received for that action. Although rewards and actions are solid levers for improving an RL policy, the representation of observations can also significantly affect the agent’s behavior, especially since game engines can take more varied approaches to observations than the real world can offer.
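To make the observe-act-reward cycle concrete, here is a minimal sketch in plain Python. It is not ML-Agents code: the `LineWorld` environment, the hand-written `policy` and the reward values are all hypothetical and chosen only for illustration. The point is that the policy sees nothing but the observation, so how that observation is represented shapes everything the agent can learn.

```python
class LineWorld:
    """Hypothetical 1D environment: the agent starts at 0 and must reach `goal`."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def observe(self):
        # The observation is the signed offset from the agent to the goal.
        return self.goal - self.position

    def step(self, action):
        # `action` is -1 (move left) or +1 (move right).
        self.position += action
        done = self.position == self.goal
        # Assumed reward scheme: small penalty per step, bonus on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.observe(), reward, done


def policy(observation):
    # Hand-written stand-in for a learned policy: always move toward the goal.
    return 1 if observation > 0 else -1


def run_episode(env, max_steps=20):
    observation = env.observe()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)                   # act on the current observation
        observation, reward, done = env.step(action)   # observe the new state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return steps, total_reward


steps, total_reward = run_episode(LineWorld(goal=3))
```

In a real training setup, the hand-written `policy` would be replaced by a neural network whose parameters are updated from the collected rewards.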
In ML-Agents, sensors are the main mechanism to represent observations for training and executing models. In addition to a general interface, ML-Agents provides two types of sensors to generate observations for the agent that are used to train an RL model. The first type is the use of raycasts, which allow agents to observe and collect data about a GameObject down a line of sight. The developer can send not only the distance from the agent to the GameObject but also a reference to it, allowing the agent to look up other data points such as the GameObject’s health or whether it’s a foe or friend.
The second type uses the pixels from the camera that is attached to an agent. The camera provides the agent with either a grayscale or an RGB image of the game environment based on how the camera is rigged and positioned. These images can be used to train convolutional neural networks (CNNs), which learn to understand the nonlinear relationship between pixels. By using CNNs, the camera sensor provides a means by which the agent can include high-dimensional images as inputs into its observation stream.
Although both types of sensors can be used in a variety of scenarios, both raycasts and pixel-based cameras have limitations on training agents to play games:
- GameObjects can remain hidden from the agent’s line of sight. If the observation of these objects is crucial to training an agent, then this limitation must be compensated for by the agent’s network capacity, usually by increasing memory.
- Each raycast is independent, and no inherent spatial information is shared between them.
- Raycasts are usually limited in length, because the agent does not need to know about objects on the other side of a scene, and in number, for computational efficiency. As a result, an agent may not observe objects that lie beyond the rays’ reach or fall between them. The smaller the object, the less likely it is to be detected.
- Using the camera requires rendering of the Scene. This can significantly reduce the number of engine ticks per minute compared with alternatives, such as running headlessly with only discrete observations.
- Headless rendering of a Unity Scene on a remote server requires using Xvfb, which can be prohibitively slow, especially with higher resolution cameras.
- If the textures of the GameObjects in the game are updated, the agent needs to be retrained if it uses camera-based observations.
- The RGB of the camera provides a maximum of three channels to the agent. This limitation reduces the ability to capture other object-specific information, such as depth.
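The raycast limitations in the list above can be illustrated with a small 2D geometry sketch. This is plain Python rather than Unity physics code, and the ray angles, ray length and object sizes are assumed values chosen only for illustration. Objects are modeled as circles, and a ray detects an object when the closest point on the ray segment lies within the circle’s radius.

```python
import math


def ray_hits_circle(angle_deg, max_length, center, radius):
    """Return True if a ray cast from the origin touches a circular object."""
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    cx, cy = center
    # Project the circle center onto the ray, clamped to the ray's finite length.
    t = max(0.0, min(max_length, cx * dx + cy * dy))
    closest_x, closest_y = t * dx, t * dy
    return math.hypot(cx - closest_x, cy - closest_y) <= radius


def detected(ray_angles, max_length, center, radius):
    """An object is observed if at least one ray hits it."""
    return any(ray_hits_circle(a, max_length, center, radius) for a in ray_angles)


ray_angles = [-60, -30, 0, 30, 60]  # five rays fanned out in front of the agent
max_length = 10.0

# A small object sitting between the 0 and 30 degree rays goes unobserved.
small_between = detected(ray_angles, max_length, center=(5.0, 1.5), radius=0.3)
# A larger object at the same position is caught by the 0 degree ray.
large_between = detected(ray_angles, max_length, center=(5.0, 1.5), radius=2.0)
# A small object directly ahead but beyond the ray length is also missed.
small_far = detected(ray_angles, max_length, center=(15.0, 0.0), radius=0.3)
# The same small object placed on a ray and within range is detected.
small_on_ray = detected(ray_angles, max_length, center=(5.0, 0.0), radius=0.3)
```

Adding more or longer rays closes these gaps, but at the cost of more physics queries per step.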
Using RL agents for Automated Game Testing
The Eidos Labs team wanted to successfully train RL agents from the perspective of the player in a 3D environment. One of the core principles of the Automated Game Testing project is to ensure testing is independent of the visuals and properties of the GameObjects. Most games, particularly large projects, evolve continuously. Even small changes to the vertical position, size, color or material of a GameObject require a developer to adjust the properties of the sensors or observations. Hence, testing that is independent of these variables is key to ensuring RL agents in Automated Game Testing can be a viable solution.
For the observations to remain independent of the visuals and properties of the GameObjects, the team tried different implementations of raycasts. However, if the object was small or too far away, it was unlikely that the raycast would hit it, and thus the object was not observed during training. This would lead to undesired agent behavior. The team also tried different implementations of cameras, but this required a lot of rendering overhead and retraining to be robust enough to accommodate changes in the visuals.
Because of the limitations of raycasts and cameras, the Eidos Labs team envisioned a new kind of sensor.
The team realized that the RL agent did not need to be restricted by the kind of information that a human player observes. Raycasts and cameras are easily relatable to humans but not always the best way of representing observations in a game for a machine learning model.
The team needed a sensor for training that could:
- Efficiently detect and observe all GameObjects
- Enable Unity to run headlessly for significantly faster and cheaper collection of observations
- Not be solely camera-based, due to training and rendering constraints
- Leverage existing neural network literature and best practices
The team realized that box colliders provide the same mechanism as raycasts in terms of data collection from a GameObject, but they allow the data to have the same structured organization as pixels on an image. The team was also inspired by the MinAtar paper, which used a 10 x 10 x N binary state representation to represent analogs of Atari games to simplify the representation problem for RL agents. The Eidos Labs team ultimately created Grid Sensors, which were contributed to ML-Agents.
The Grid Sensor combines the generality of data extraction from raycasts with the computational efficiency of CNNs. The Grid Sensor collects data from GameObjects by querying the physics properties and then structures the data into a “height x width x channel” matrix. This matrix is analogous to an image from an orthographic camera but rather than representing red, green, and blue color values of objects, the “pixels” (or cells) of a Grid Sensor represent a vector of arbitrary data of objects relative to the sensor. Another benefit is that the grid can have a lower resolution, which can improve training times. This matrix can then be fed into a CNN and used for either data analysis or to train RL agents.
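The “height x width x channel” layout can be sketched in a few lines of plain Python. This is not the ML-Agents implementation: the function name `build_grid_observation`, the tag names and the grid dimensions are hypothetical, and the sketch bins object positions directly instead of running a physics overlap query per cell. It assumes a square grid centered on the agent, with one binary channel per tag.

```python
def build_grid_observation(objects, tags, grid_size, cell_size):
    """Encode nearby objects as a height x width x channel nested list.

    objects: list of (x, z, tag) tuples, positions relative to the agent.
    tags: ordered list of tag names; each tag gets one binary channel.
    grid_size: number of cells along each side of the square grid.
    cell_size: side length of one cell, in world units.
    """
    half_extent = grid_size * cell_size / 2.0
    grid = [[[0.0] * len(tags) for _ in range(grid_size)] for _ in range(grid_size)]
    for x, z, tag in objects:
        if tag not in tags:
            continue  # untracked objects are ignored
        inside_x = -half_extent <= x < half_extent
        inside_z = -half_extent <= z < half_extent
        if not (inside_x and inside_z):
            continue  # objects outside the grid are not observed
        col = int((x + half_extent) // cell_size)
        row = int((z + half_extent) // cell_size)
        grid[row][col][tags.index(tag)] = 1.0
    return grid


tags = ["enemy", "pickup", "wall"]
objects = [
    (0.5, 0.5, "enemy"),
    (-1.5, 1.2, "pickup"),
    (-0.2, -1.9, "wall"),
    (5.0, 0.0, "enemy"),  # too far away: falls outside the 4 x 4 grid
]
observation = build_grid_observation(objects, tags, grid_size=4, cell_size=1.0)
```

Here `observation` has shape 4 x 4 x 3, so it can be passed to a CNN in the same way as a very small three-channel image. Unlike with raycasts, every object inside the grid bounds lands in some cell, however small it is.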
The arbitrary data collected from GameObjects within a grid cell can include properties similar to those collected from raycasts. Although one of the original benefits of using CNNs on images was to avoid feature engineering, in practice the team found that this level of feature engineering is useful to game designers as a control mechanism. It simplifies the representation learning problem faced by a CNN and reduces the computational resources necessary to train an agent.
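As an illustration of this kind of feature engineering, the sketch below maps one detected object to the vector stored in its grid cell. The `cell_features` function, the dictionary keys and the choice of channels (enemy flag, friend flag, normalized health) are assumptions made for this example, not part of ML-Agents. A designer would pick whichever properties matter for the behavior being tested.

```python
def cell_features(game_object, max_health=100.0):
    """Map one detected object to a fixed-length channel vector for its grid cell."""
    is_enemy = 1.0 if game_object["team"] == "enemy" else 0.0
    is_friend = 1.0 if game_object["team"] == "friend" else 0.0
    # Normalize health to [0, 1] so every channel shares a similar scale.
    health = max(0.0, min(1.0, game_object["health"] / max_health))
    return [is_enemy, is_friend, health]


enemy_cell = cell_features({"team": "enemy", "health": 50.0})
friend_cell = cell_features({"team": "friend", "health": 150.0})
neutral_cell = cell_features({"team": "neutral", "health": 0.0})
```

Because the designer decides which channels exist, removing a channel such as health is a direct way to control what the agent can and cannot react to.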
Lastly, unlike an in-game camera, the Grid Sensor depends only on the physics simulation. One of the biggest benefits is that this allows Scene rendering to be disabled, which greatly increases the number of engine ticks per minute and ultimately speeds up training time while still using high-dimensional data. Additionally, by decoupling rendering and pixels from the input into the machine learning model, the visual aspects of a game, such as textures and lighting, can change without affecting an agent’s behavior, which enables better generalization of RL-trained models as the game is being developed.
In summary, the Grid Sensor enables a developer to collect arbitrary data from any number of GameObjects while enabling much faster simulation and training.
The Grid Sensor has some assumptions about the kinds of tasks in which it can be used. Although it is not a solution to every problem, Eidos Labs hopes that by open sourcing the Grid Sensor in ML-Agents, the game development community can continue to refine it to better meet different needs.
Try out the Grid Sensor in the Unity ML-Agents Toolkit
The Grid Sensor shipped as part of ML-Agents Release 7 in the extensions package, available on GitHub (along with all releases). For more information about the specific implementation of the Grid Sensor, please refer to the Eidos Labs team’s PR.
If you use any of the features provided in this release, we’d love to hear from you. For any feedback, general issues, or questions regarding ML-Agents, please get in touch with us on the ML-Agents forums or email us directly. If you encounter any bugs, please reach out to us on the ML-Agents GitHub issues page.
If you’d like to work on this exciting intersection of machine learning and games, check out our current openings.