Puppo, The Corgi: Cuteness Overload with the Unity ML-Agents Toolkit
Building a game is a creative process that involves many challenging steps including defining the game concept and logic, building assets and animations, specifying NPC behaviors, tuning difficulty and balance and, finally, testing the game with real players before launch. We believe machine learning can be used across the entire creative process and in today’s blog post we will focus on one of these challenges: specifying the behavior of an NPC.
Traditionally, the behavior of an NPC is hard-coded using scripting and behavior trees. These (typically long) lists of rules process information about the surroundings of the NPC (called observations) to dictate its next action. These rules can be time-consuming to write and maintain as the game evolves. Reinforcement learning provides a promising alternative framework for defining the behavior of an NPC. More specifically, instead of defining the observation-to-action mapping by hand, you can train your NPC by providing it with rewards when it achieves the desired goal.
The good puppy, bad puppy method
Training an NPC using reinforcement learning is quite similar to how we train a puppy to play fetch. We present the puppy with a treat and then throw the stick. At first, the puppy wanders around not sure what to do, until it eventually picks up the stick and brings it back, promptly getting a treat. After a few sessions, the puppy learns that retrieving a stick is the best way to get a treat and continues to do so.
That is precisely how reinforcement learning works in training the behavior of an NPC. We provide our NPC with a reward whenever it completes a task correctly. Through multiple simulations of the game (the equivalent of many fetch sessions), the NPC builds an internal model of what action it needs to perform at each step to maximize its reward, which results in the desired behavior. Thus, instead of creating and maintaining low-level actions for each observation of the NPC, we only need to provide a high-level reward when a task is completed correctly and the NPC learns the appropriate low-level behavior.
Puppo, The Corgi
To showcase the effectiveness of this technique, we built a demo game, “Puppo (read as ‘Pup-o’), The Corgi”, and presented it at Unite Berlin. It is a mobile game where you play fetch with a cute little corgi. Throw a stick to Puppo by swiping on the screen and Puppo brings it back. While the higher-level game logic uses traditional scripting, the corgi learns to walk, run, jump and fetch the stick using reinforcement learning. Instead of using animation or scripted behaviors, the movements of the corgi are trained solely with reinforcement learning. Not only does it look super cute, but the corgi’s motion is driven exclusively by the physics engine. This means, for instance, that the motion of the corgi can be affected by surrounding Rigidbodies.
Puppo became so popular at Unite Berlin that many developers asked us how we made it. That’s why we decided to write this blog post and release the project for you to try it out yourself.
To get started, we will cover the requirements and preliminary work that you need to do to train the corgi. Then, we will share our experience in training it. Finally, we will go over the steps we took to create a game with Puppo as its hero.
Before we get into the details, let’s define a few important notions in reinforcement learning. The goal of reinforcement learning is to learn a policy for an agent. An agent is an entity that interacts with its environment: at every learning step, the agent collects observations about the state of the environment, performs an action, and gets a reward for that action. The policy defines how an agent acts based on the observations it perceives. We can develop a policy by rewarding the agent when its behavior is appropriate.
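To make the observe, act, reward cycle concrete, here is a minimal Python sketch of a single episode. It is an illustration only, not code from the Puppo project or the ML-Agents Toolkit: the `LineWorld` environment, the reward values, and the hand-written `policy` function are all hypothetical stand-ins. In real training, the policy would be learned from the rewards rather than written by hand.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LineWorld:
    """Hypothetical 1-D environment: the agent walks along a line to a target."""
    agent: int = 0
    target: int = 5

    def observe(self) -> int:
        # Observation: signed offset from the agent to the target.
        return self.target - self.agent

    def step(self, action: int) -> Tuple[float, bool]:
        # Action: -1 (step left) or +1 (step right).
        self.agent += action
        done = self.agent == self.target
        # Reward (assumed values): bonus on arrival, small penalty otherwise.
        reward = 1.0 if done else -0.01
        return reward, done


def policy(observation: int) -> int:
    """Stand-in policy mapping an observation to an action (not learned)."""
    return 1 if observation > 0 else -1


def run_episode(env: LineWorld, max_steps: int = 100) -> float:
    """Run one episode and return the sum of the rewards collected."""
    total = 0.0
    for _ in range(max_steps):
        observation = env.observe()      # 1. collect observations
        action = policy(observation)     # 2. choose an action
        reward, done = env.step(action)  # 3. act and receive a reward
        total += reward
        if done:
            break
    return total


episode_return = run_episode(LineWorld())
```

A reinforcement learning algorithm repeats episodes like this one many times and adjusts the policy so that the total reward per episode increases.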
In our case, the environment is the game scene and the agent is Puppo. Puppo needs to learn a policy so it can play fetch with us. Similar to how we train a real dog with treats to fetch sticks, we can train Puppo by rewarding it appropriately.
We used a ragdoll to create Puppo and its legs are driven by joint motors. Therefore, for Puppo to learn how to get to the target, it must first learn how to rotate the joint motors so that it can move.
A real dog uses vision and other senses to orient itself and to decide where to go. Puppo follows the same methodology. It collects observations about the scene such as proximity to the target, the relative position between itself and the target and the orientation of its own legs, so it can decide what action to take next. In Puppo’s case, the action describes how to rotate the joint motors in order to move.
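As a rough illustration, the observations listed above could be packed into a flat list of numbers like this. The function name, the argument layout, and the use of joint angles to represent leg orientation are assumptions made for this sketch; the actual project may encode its observations differently.

```python
import math
from typing import List, Sequence


def collect_observations(
    body_position: Sequence[float],
    target_position: Sequence[float],
    leg_joint_angles: Sequence[float],
) -> List[float]:
    """Build a flat observation vector (hypothetical layout)."""
    # Relative position of the target with respect to the body.
    relative = [t - b for t, b in zip(target_position, body_position)]
    # Proximity to the target: length of the relative position vector.
    distance = math.sqrt(sum(c * c for c in relative))
    # Leg orientation, assumed here to be one angle per joint.
    return [distance, *relative, *leg_joint_angles]


observation = collect_observations(
    body_position=(0.0, 0.0, 0.0),
    target_position=(3.0, 0.0, 4.0),
    leg_joint_angles=[0.1, -0.2],
)
```

The policy receives a vector like this at every step and outputs the action, which in Puppo’s case describes how to rotate each joint motor.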
After each action Puppo performs, we give a reward to the agent. The reward consists of:
- Orientation Bonus: We reward Puppo when it is moving towards the target. To do so, we use the Vector3.Dot() method.
- Time Penalty: We give a fixed penalty (negative reward) to Puppo at every action. This way, Puppo will learn to get the stick as fast as possible to avoid a heavy time penalty.
- Rotation Penalty: We penalize Puppo for trying to spin too much. A real dog would get dizzy if it spun too much. To make the motion look realistic, we penalize Puppo when it turns around too fast.
- Getting to the target Reward: Most importantly, we reward Puppo for getting to the target.
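The four reward terms above can be sketched as a single per-step function. This is a Python approximation under stated assumptions, not the project’s actual code: every coefficient is a made-up placeholder, the `dot` helper stands in for Unity’s Vector3.Dot(), and we assume the orientation bonus compares the dog’s velocity with the direction to the target, which is one plausible reading of the description.

```python
import math
from typing import Sequence, Tuple

# Hypothetical coefficients, chosen only for illustration.
ORIENTATION_SCALE = 0.01
TIME_PENALTY = -0.001
ROTATION_PENALTY_SCALE = -0.001
TARGET_REWARD = 1.0


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors (stand-in for Vector3.Dot)."""
    return sum(x * y for x, y in zip(a, b))


def normalized(v: Sequence[float]) -> Tuple[float, ...]:
    """Return v scaled to unit length, or a zero vector if v has no length."""
    length = math.sqrt(dot(v, v))
    if length == 0:
        return tuple(0.0 for _ in v)
    return tuple(x / length for x in v)


def step_reward(
    velocity: Sequence[float],
    to_target: Sequence[float],
    angular_speed: float,
    reached_target: bool,
) -> float:
    """Combine the four reward terms for a single action."""
    # Orientation bonus: positive when moving toward the target,
    # negative when moving away from it.
    reward = ORIENTATION_SCALE * dot(velocity, normalized(to_target))
    # Time penalty: a fixed cost for every action taken.
    reward += TIME_PENALTY
    # Rotation penalty: grows with how fast the body is spinning.
    reward += ROTATION_PENALTY_SCALE * abs(angular_speed)
    # Target reward: granted once when the target is reached.
    if reached_target:
        reward += TARGET_REWARD
    return reward


toward = step_reward((1.0, 0.0, 0.0), (5.0, 0.0, 0.0), 0.0, False)
away = step_reward((-1.0, 0.0, 0.0), (5.0, 0.0, 0.0), 0.0, False)
```

With these placeholder values, `toward` is greater than `away`, so the agent is nudged to head for the target. The relative size of the terms matters: if the time penalty is too large compared with the target reward, the agent has little incentive to keep trying.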
Now Puppo is ready to learn. It took us two hours on a laptop for the dog to learn to run towards the target efficiently. During the training process, we noticed one interesting behavior. The dog learned to walk rather quickly, in about one minute. Then, as the training continued, the dog learned to run. Soon after, it began to flip over when it tried to make a sudden turn while running. Fortunately, the dog learned how to get back up, just as a real dog would. This clumsy behavior is so cute that you could stop the training at this point and use it directly in the game.
If you are interested in training Puppo yourself, you can follow the instructions in the project. They include detailed steps on how to set up the training and what parameters you should choose. For a more detailed tutorial on how to train agents, please visit the ML-Agents documentation site.
Create a game with Puppo
To create the “Puppo, The Corgi” game, we need to define the game logic that lets a player interact with the trained model. Because Puppo has learned to run to a target, we need to implement the logic that changes the target for Puppo within the game.
In game mode, we set the target to be the stick right after the player has thrown it. When Puppo arrives at the stick, we change Puppo’s target to the player’s position in the scene so that Puppo returns the stick to the player. We do this because it’s much easier to train Puppo to move to a target and define the game flow logic with a script. It’s our belief that machine learning and traditional game development methods can be combined to get the best of both approaches. The “Puppo, The Corgi” project includes a pre-trained model for the corgi that you can use immediately and even deploy on mobile devices.
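The target-switching logic described above can be sketched as a small state machine. This Python version is only an illustration: the `FetchGame` class, the `Phase` names, and the event methods are hypothetical and do not come from the project, where this logic would live in a Unity script.

```python
from enum import Enum, auto
from typing import Tuple

Position = Tuple[float, float, float]


class Phase(Enum):
    IDLE = auto()       # waiting for the player to throw the stick
    FETCHING = auto()   # running to the stick
    RETURNING = auto()  # bringing the stick back to the player


class FetchGame:
    """Scripted game flow that only decides where the trained agent should go."""

    def __init__(self, player_position: Position) -> None:
        self.player_position = player_position
        self.target = player_position
        self.phase = Phase.IDLE

    def on_stick_thrown(self, stick_position: Position) -> None:
        # Right after the throw, the stick becomes the target.
        self.target = stick_position
        self.phase = Phase.FETCHING

    def on_target_reached(self) -> None:
        if self.phase is Phase.FETCHING:
            # Puppo has the stick: the player becomes the new target.
            self.target = self.player_position
            self.phase = Phase.RETURNING
        elif self.phase is Phase.RETURNING:
            # The stick is delivered: wait for the next throw.
            self.phase = Phase.IDLE


game = FetchGame(player_position=(0.0, 0.0, 0.0))
game.on_stick_thrown((3.0, 0.0, 4.0))
game.on_target_reached()
```

The trained policy never needs to know which phase the game is in. It only ever sees a target to run to, which is why a single "move to target" skill is enough for the whole fetch loop.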
We hope this blog post has shed some light on what is achievable with the ML-Agents Toolkit for game development.
Want to dive deep into the code of this project? We released the project and you can download it here. To learn more about how to use the ML-Agents Toolkit, you can find our official documentation and a step-by-step beginner’s guide here. If you are interested in getting a deeper understanding of the math, algorithms, and theories behind reinforcement learning, there is a Reinforcement Learning Nanodegree we offer in partnership with Udacity.