Our two previous blog entries implied that there is a role games can play in driving the development of Reinforcement Learning algorithms. As the world’s most popular creation engine, Unity is at the crossroads between machine learning and gaming. It is critical to our mission to enable machine learning researchers with the most powerful training scenarios, and for us to give back to the gaming community by enabling them to utilize the latest machine learning technologies. As the first step in this endeavor, we are excited to introduce Unity Machine Learning Agents Toolkit.

Training Intelligent Agents

Machine Learning is changing the way we expect to get intelligent behavior out of autonomous agents. Whereas in the past the behavior was coded by hand, it is increasingly taught to the agent (either a robot or virtual avatar) through interaction in a training environment. This method is used to learn behavior for everything from industrial robots, drones, and autonomous vehicles, to game characters and opponents. The quality of this training environment is critical to the kinds of behaviors that can be learned, and there are often trade-offs of one kind or another that need to be made. The typical scenario for training agents in virtual environments is to have a single environment and agent which are tightly coupled. The actions of the agent change the state of the environment, and provide the agent with rewards.

The typical Reinforcement Learning training cycle.
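
As a rough illustration of this cycle, the Python sketch below runs one episode against a toy environment. `CorridorEnv`, `run_episode`, and the two policies are hypothetical names invented for this example; they are not part of the ML-Agents toolkit.

```python
import random


class CorridorEnv:
    """Toy environment (hypothetical): walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state the agent observes

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.position = max(0, min(self.length, self.position + action))
        done = self.position == self.length
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """One pass of the cycle: observe the state, act, receive a reward and the next state."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """Untrained behavior: ignore the state and move at random."""
    return random.choice([-1, 1])


def always_right(state):
    """Hand-coded behavior that happens to be optimal in this toy corridor."""
    return 1
```

A learning algorithm would sit where `policy` is, adjusting its choices based on the rewards it collects over many episodes.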

At Unity, we wanted to design a system that provides greater flexibility and ease of use to the growing groups interested in applying machine learning to developing intelligent agents. Moreover, we wanted to do this while taking advantage of the high-quality physics and graphics, and the simple yet powerful developer control, provided by the Unity Engine and Editor. We think that this combination can benefit the following groups in ways that other solutions might not:

  • Academic researchers interested in studying complex multi-agent behavior in realistic competitive and cooperative scenarios.
  • Industry researchers interested in large-scale parallel training regimes for robotics, autonomous vehicles, and other industrial applications.
  • Game developers interested in filling virtual worlds with intelligent agents each acting with dynamic and engaging behavior.

Unity Machine Learning Agents Toolkit

We call our solution the Unity Machine Learning Agents Toolkit (ML-Agents toolkit for short), and are happy to be releasing an open beta version of our SDK today! The ML-Agents SDK allows researchers and developers to transform games and simulations created using the Unity Editor into environments where intelligent agents can be trained using Deep Reinforcement Learning, Evolutionary Strategies, or other machine learning methods through a simple-to-use Python API. We are releasing this beta version of the Unity ML-Agents toolkit as open-source software, with a set of example projects and baseline algorithms to get you started. As this is an initial beta release, we are actively looking for feedback, and encourage anyone interested to contribute on our GitHub page. For more information on the Unity ML-Agents toolkit, continue reading below! For more detailed documentation, see our GitHub Wiki.

Learning Environments

A visual depiction of how a Learning Environment might be configured within Unity ML-Agents Toolkit.

The three main kinds of objects within any Learning Environment are:

  • Agent – Each Agent can have a unique set of states and observations, take unique actions within the environment, and receive unique rewards for events within the environment. An agent’s actions are decided by the brain it is linked to.
  • Brain – Each Brain defines a specific state and action space, and is responsible for deciding which actions each of its linked agents will take. The current release supports Brains being set to one of four modes:
    • External – Action decisions are made using TensorFlow (or your ML library of choice) through communication over an open socket with our Python API.
    • Internal (Experimental) – Action decisions are made using a trained model embedded into the project via TensorFlowSharp.
    • Player – Action decisions are made using player input.
    • Heuristic – Action decisions are made using hand-coded behavior.
  • Academy – The Academy object within a scene also contains as children all Brains within the environment. Each environment contains a single Academy which defines the scope of the environment, in terms of:
    • Engine Configuration – The speed and rendering quality of the game engine in both training and inference modes.
    • Frameskip – How many engine steps to skip between each agent making a new decision.
    • Global episode length – How long the episode will last. When reached, all agents are set to done.
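
To make the Frameskip and Global episode length settings more concrete, here is a minimal Python sketch. `ToyAcademy` and `count_decisions` are hypothetical names, and the sketch assumes that a frameskip of N means agents decide once every N + 1 engine steps; the toolkit's actual bookkeeping may differ.

```python
class ToyAcademy:
    """Hypothetical stand-in for an Academy: tracks frameskip and episode length."""

    def __init__(self, frameskip=4, max_steps=20):
        self.frameskip = frameskip  # engine steps skipped between decisions (assumed meaning)
        self.max_steps = max_steps  # global episode length, in engine steps
        self.step_count = 0
        self.done = False

    def reset(self):
        self.step_count = 0
        self.done = False

    def engine_step(self):
        """Advance one engine step; return True if agents should decide on this step."""
        decide = self.step_count % (self.frameskip + 1) == 0
        self.step_count += 1
        if self.step_count >= self.max_steps:
            self.done = True  # global episode length reached: all agents are done
        return decide


def count_decisions(academy):
    """Run one episode and count how many times agents were asked for a decision."""
    academy.reset()
    decisions = 0
    while not academy.done:
        if academy.engine_step():
            decisions += 1
    return decisions
```

With `frameskip=4` and `max_steps=20`, agents in this sketch decide on engine steps 0, 5, 10, and 15, and repeat their last action in between.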

The states and observations of all agents with brains set to External are collected by the External Communicator, and communicated to our Python API for processing using your ML library of choice. By setting multiple agents to a single brain, actions can be decided in a batch fashion, opening the possibility of getting the advantages of parallel computation, when supported. For more information on how these objects work together within a scene, see our wiki page.
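
The batching idea can be sketched in a few lines of Python. This is an illustration under assumed data shapes, not the toolkit's communication code: `decide_in_batches` and its dictionary arguments are hypothetical.

```python
from collections import defaultdict


def decide_in_batches(agent_brains, observations, policies):
    """Group observations by brain, call each brain once, and scatter actions back.

    agent_brains: dict mapping agent id -> brain name
    observations: dict mapping agent id -> observation
    policies: dict mapping brain name -> function that takes a list of
              observations and returns a list of actions in the same order
    """
    batches = defaultdict(list)
    for agent_id in sorted(observations):
        batches[agent_brains[agent_id]].append(agent_id)

    actions = {}
    for brain, agent_ids in batches.items():
        batch = [observations[agent_id] for agent_id in agent_ids]
        # One call per brain, however many agents are linked to it.
        for agent_id, action in zip(agent_ids, policies[brain](batch)):
            actions[agent_id] = action
    return actions
```

Because each brain's policy is called once per step with every linked agent's observation, a model behind that call is free to evaluate the whole batch in parallel.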

Flexible Training Scenarios

With the Unity ML-Agents toolkit, a variety of training scenarios are possible, depending on how agents, brains, and rewards are connected. We are excited to see what kinds of novel and fun environments the community creates. For those new to training intelligent agents, below are a few examples that can serve as inspiration. Each is a prototypical environment configuration with a description of how it can be created using the ML-Agents SDK.

  • Single-Agent – A single agent linked to a single brain. The traditional way of training an agent. An example is any single-player game, such as Chicken. (Demo project included – “GridWorld”)
  • Simultaneous Single-Agent – Multiple independent agents with independent reward functions linked to a single brain. A parallelized version of the traditional training scenario, which can speed up and stabilize the training process. An example might be training a dozen robot arms to each open a door simultaneously. (Demo project included – “3DBall”)
  • Adversarial Self-Play – Two interacting agents with inverse reward functions linked to a single brain. In two-player games, adversarial self-play can allow an agent to become increasingly more skilled, while always having the perfectly matched opponent: itself. This was the strategy employed when training AlphaGo, and more recently used by OpenAI to train a human-beating 1v1 Dota 2 agent. (Demo project included – “Tennis”)
  • Cooperative Multi-Agent – Multiple interacting agents with a shared reward function linked to either a single or multiple different brains. In this scenario, all agents must work together to accomplish a task that couldn’t be done alone. Examples include environments where each agent only has access to partial information, which needs to be shared in order to accomplish the task or collaboratively solve a puzzle. (Demo project coming soon)
  • Competitive Multi-Agent – Multiple interacting agents with inverse reward functions linked to either a single or multiple different brains. In this scenario, agents must compete with one another to either win a competition, or obtain some limited set of resources. All team sports would fall into this scenario. (Demo project coming soon)
  • Ecosystem – Multiple interacting agents with independent reward functions linked to either a single or multiple different brains. This scenario can be thought of as creating a small world in which animals with different goals all interact, such as a savanna in which there might be zebras, elephants, and giraffes, or an autonomous driving simulation within an urban environment. (Demo project coming soon)
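
The scenarios above differ mainly in how rewards are wired between agents. The Python sketch below shows one possible reading of "independent", "inverse", and "shared" reward functions; the function names are hypothetical, and modeling inverse rewards as a two-agent zero-sum split is an assumption made for this example.

```python
def independent_rewards(scores):
    """Each agent is rewarded only for its own outcome."""
    return dict(scores)


def inverse_rewards(scores):
    """Two-agent zero-sum wiring (assumed): one agent's gain is the other's loss."""
    (a, score_a), (b, score_b) = scores.items()
    return {a: score_a - score_b, b: score_b - score_a}


def shared_reward(scores):
    """Every agent receives the same team reward."""
    team = sum(scores.values())
    return {agent: team for agent in scores}
```

For instance, `inverse_rewards({"left": 1.0, "right": 0.0})` gives the left player +1.0 and the right player -1.0, while `shared_reward` would give both the same team total.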

Additional Features

Beyond the flexible training scenarios made possible by the Academy/Brain/Agent system, the Unity ML-Agents toolkit also includes other features which improve the flexibility and interpretability of the training process.

  • Monitoring Agent’s Decision Making – Since communication in Unity ML-Agents toolkit is a two-way street, we provide an Agent Monitor class in Unity which can display aspects of the trained agent, such as policy and value output within the Unity environment itself. By providing these outputs in real-time, researchers and developers can more easily debug an agent’s behavior.

Above each agent is a value estimate, corresponding to how much future reward the agent expects. When the right agent misses the ball, the value estimate drops to zero, since it expects the episode to end soon, resulting in no additional reward.
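
A value estimate of this kind approximates the discounted sum of future rewards. The short Python sketch below computes that quantity for a recorded reward sequence; the discount factor and the reward numbers are illustrative assumptions, not values taken from the Tennis demo.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

If the remaining rewards in an episode are all zero, as after a missed ball, the return from that point on is zero, which is the drop the value estimate learns to predict.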

  • Curriculum Learning – It is often difficult for agents to learn a complex task at the beginning of the training process. Curriculum learning is the process of gradually increasing the difficulty of a task to allow more efficient learning. The Unity ML-Agents toolkit supports setting custom environment parameters every time the environment is reset. This allows elements of the environment related to difficulty or complexity to be dynamically adjusted based on training progress.

Different possible configurations of the GridWorld environment with increasing complexity.
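
One simple way to drive such a curriculum is to pick the environment parameters at each reset from the agent's recent mean reward. The sketch below assumes hypothetical parameter names (`grid_size`, `num_obstacles`) and thresholds; it does not reproduce the toolkit's own configuration format.

```python
def select_lesson(mean_reward, thresholds):
    """Return the lesson index: the number of consecutive thresholds already reached."""
    lesson = 0
    for threshold in thresholds:
        if mean_reward >= threshold:
            lesson += 1
        else:
            break
    return lesson


# Hypothetical GridWorld-style lessons, ordered from easiest to hardest.
LESSONS = [
    {"grid_size": 3, "num_obstacles": 0},
    {"grid_size": 5, "num_obstacles": 1},
    {"grid_size": 7, "num_obstacles": 3},
]
THRESHOLDS = [0.5, 0.8]  # mean reward needed to advance past lessons 0 and 1


def reset_parameters(mean_reward):
    """Environment parameters to apply at the next reset."""
    return LESSONS[select_lesson(mean_reward, THRESHOLDS)]
```

An agent averaging 0.6 reward would therefore be reset into the 5x5 grid, and would only see the 7x7 grid once its mean reward reaches 0.8.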

  • Complex Visual Observations – Unlike other platforms, where the agent’s observation might be limited to a single vector or image, the Unity ML-Agents toolkit allows multiple cameras to be used for observations per agent. This enables agents to learn to integrate information from multiple visual streams, as would be the case when training a self-driving car which requires multiple cameras with different viewpoints, a navigational agent which might need to integrate aerial and first-person visuals, or an agent which takes both a raw visual input and a depth map or object-segmented image.

Two different camera views on the same environment. When both are provided to an agent, it can learn to utilize both first-person and map-like information about the task to defeat the opponent.
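
How several camera images reach a model is a design choice. One simple option, sketched below in plain Python, is to stack same-sized grayscale images as channels of a single observation; `stack_cameras` is a hypothetical helper, and this is not necessarily how the toolkit combines camera inputs (a separate encoder per camera is another common approach).

```python
def stack_cameras(*images):
    """Combine same-sized grayscale images into one H x W x C observation.

    Each image is a list of rows, and each row is a list of pixel intensities.
    """
    height, width = len(images[0]), len(images[0][0])
    for image in images:
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError("all camera images must share the same resolution")
    # For every pixel, collect one value per camera.
    return [
        [[image[y][x] for image in images] for x in range(width)]
        for y in range(height)
    ]
```

Channel stacking only makes sense when the cameras share a resolution, which is why the helper rejects mismatched inputs.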

  • Imitation Learning (Coming Soon) – It is often more intuitive to simply demonstrate the behavior we want an agent to perform, rather than attempting to have it learn via trial-and-error methods. In a future release, the Unity ML-Agents toolkit will provide the ability to record all state/action/reward information for use in supervised learning scenarios, such as imitation learning. By utilizing imitation learning, a player can provide demonstrations of how an agent should behave in an environment, and then utilize those demonstrations to train an agent either in a standalone fashion or as a first step in a reinforcement learning process.
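
Since this feature is not yet released, the following is only a conceptual Python sketch of the record-then-imitate idea, using a tabular form of behavior cloning on discrete states. `record_demonstration` and `clone_behavior` are hypothetical names and say nothing about the eventual toolkit API.

```python
from collections import Counter, defaultdict


def record_demonstration(states, player_policy):
    """Log (state, action) pairs while a player (here, a function) controls the agent."""
    return [(state, player_policy(state)) for state in states]


def clone_behavior(demonstrations, default_action=None):
    """Fit a tabular policy: for each state, pick the action demonstrated most often."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    table = {state: c.most_common(1)[0][0] for state, c in counts.items()}
    # States never seen in the demonstrations fall back to the default action.
    return lambda state: table.get(state, default_action)
```

With continuous states, the lookup table would be replaced by a supervised model trained on the same recorded pairs, and the result could serve as the starting point for reinforcement learning.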

An Evolving Platform

As mentioned above, we are excited to be releasing this open beta version of Unity Machine Learning Agents Toolkit today, which can be downloaded from our GitHub page. This release is only the beginning, and we plan to iterate quickly and provide additional features both for those of you who are interested in Unity as a platform for Machine Learning research and for those of you who are focused on the potential of Machine Learning in game development. While this beta release is more focused on the former group, we will be increasingly providing support for the latter use case. We are especially interested in hearing about use cases and features you would like to see included in future releases of Unity ML-Agents Toolkit, and we welcome Pull Requests made to the GitHub repository. Please feel free to reach out to us to share feedback and thoughts. If the project sparks your interest, come join the Unity Machine Learning team!

Happy training!

72 replies on “Introducing: Unity Machine Learning Agents Toolkit”

Damn that’s cool! :)

So, what I’ve noticed is that agents have a list of states with a fixed size, which is ok when you have a constant environment like 1 ball and 1 platform that tries to keep the ball on it.

But how about having enemies which spawn dynamically? Or when these enemies shoot bullets? We’d need a dynamic list of states for that. How would you implement this scenario?

This seems like a great toolbox for integrating Unity with Python. A quick question though. Any clue when the imitation learning tool would be available?
Thank you

It’s a great tool and I’ve been really enjoying working with it the last few days.
Though, while messing around with it, I’ve noticed that the training process itself uses only about 12% of GPU power and around 50% of my cpu.
Am I missing some feature that would let me use the GPU to its full potential?

Thanks for sharing such a wonderful article with us
We are expecting more articles from this blog

Sorry, I failed to understand: after running the 3DBall training, what’s the outcome of the training? I mean, is there a result file that we can get and reuse?

Just curious. Can I use this on Android, IOS platform?
Even if it did, I guess with Unity as middle interpreter of python code, it will be overkill for normal phone CPU.

I really love this. I’ve been playing with it for the last three days.

What I really need now, as a ML newb, is a step-by-step guide that answers a few questions. I understand that this isn’t a good place for answering questions. I don’t want answers here. Just hoping that in time these questions will be answered in the documentation section of ml-agents on github. Thanks!

What is the workflow? When I run the training, do I “load saved model”? Should I only be running a training once? If I run it multiple times does it keep learning? It seems my agent gets WORSE at tasks, not better even though I feel like I’m rewarding correctly. There is clearly some learning going on but if I leave it going overnight (6-8 hours) there doesn’t seem to be any improvement.

Do I run a training many times in a row by finishing one, then immediately rerunning the PPO script? Or do I need to set everything up. Set the steps to a crazy number. Run once. Export model?

How do I set up the cameras? Is there more than just adding them to the agent? Do I need to set up the “resolution” stuff in the brain?
Along with a tuning guide. Steps. Learning rate. What to change if things don’t seem to go right. And how long for simple tasks.

Will there be future training with the teacher? Learning for the known action for the state list. Data, for example, is collected in Player mode. Data type state list -> action list

I’m trying to follow the Getting Started with the Balance Ball Example tutorial, and I managed to get through the tutorial on how to set up Python/TensorFlow via the tutorial that was linked in the balance ball example tutorial. But when I try to run the jupyter PPO table I get this error:
ModuleNotFoundError Traceback (most recent call last)
in ()
1 import numpy as np
2 import os
—-> 3 import tensorflow as tf
5 from ppo.history import *

ModuleNotFoundError: No module named ‘tensorflow’

I’m not sure what I did wrong or what to try to do to fix this :( Any ideas at all would be helpful.

Ok, so I was actually able to get past that last error above. I just did not realize I needed to install tensor into the python folder, my bad. But I do have another question. I am having trouble observing the training process. I open anaconda and put in the tensorboard --logdir='summaries' line and it runs, but I’m not able to do anything else without stopping it and I don’t see anything that shows me how the training is going. I let the training go on for a few minutes and then continued to the last cell and then continue past that and then stop running the table by pressing the interrupt kernel button. I look in the models folder but I don’t see an exported model. I do see model-50000.cptk.index and some other files but not a 3DBall.bytes, so I am confused on how to properly end running the cells or if I did something else wrong. Any help would be welcome.

Is there a timeline for supervised learning support? I have an application that would be much easier to train with supervision, and am not sure if I should wait or try to get it to train with the existing RL support.


Could you help me and stitch together your implementation of A3C algo into this?
I am looking at it but porting model seems to be above my skillset.
If not, I would gladly accept a short how-to of how shall I do it. Thanks!

I got the 3DBall project to work using the jupyter notebook.
But it looks like unity is running at 1 fps. Also it starts in a tiny window,

Is it supposed to be so slow when learning? I tried on a Nvidia 1060 and 1070

Oh, …. you just made my day. I was attempting to use Unity a few months ago as the environment in my AI research, but was struggling with implementing the ML algorithms I needed (CNNs, ANNs, etc) with C#. I put it on ice and switched to a home-brew 2D environment in Python on Linux so I could make progress on the AI, keeping basic Unity-like structure so I could switch back at some point easily. Looks like I can switch back now :)

This is so perfect.
Thank You!!

Oh and well I could at least try to help you out if you insist on continuing down this space when you could be spending the money completing features you have been putting off for a couple of years now which are more important…..

You should talk to some of the colleges that focus on behavioural analysis and study. is an example there are 2 or 3 focused on this niche of research, that particular one uses mocap systems, taking volunteers to do natural motion, then studying and programming computers to recognize gender, attitude, and emotion from body motion. Another focuses on interaction between humans, and even more on general physics and flow of movement. A last one I try to avoid because it is funded by DARPA and that just spooks me. It is the recognition from a distance project. Ties right into this, and those schools would likely be happy to provide you with their research, publications, and findings if you in turn expand their knowledgebase and cases by publishing your own findings based on the research guidelines defined in the project licenses to use the mocap databases. AI learning should definitely be learning cases of how to react based on subtle actions of the thing it is interfacing against.

Eh this is really interesting stuff, however as a content creation/game development/high end rendering platform which is awesome but has a lot of bugs and an undeserved bad reputation, I really do not think you need to be here. Besides aren’t you guys afraid of AI…maybe it is time for hollywood to remake the old HAL “Would you like to play a game” ….

Funny I saw an asset the other day that started with HAL, I don’t even know what it was, I saw that much and changed the page.

Really cool demo! I was able to run the 3DBall code and it worked nicely. A few questions though …

How do you run the Tennis application? I did not find any instructions, so I assumed it would also require the PPO notebook, but training took a very long time. When I stopped training after 1000 iterations, and tried to persist the binary model, I got the error AttributeError: ‘NoneType’ object has no attribute ‘model_checkpoint_path’. Maybe I forgot something?

Second, why do you call the internal Brain model experimental? Is that simply because you load the Tensorflow library into the Unity engine which may cause instabilities?

And finally, what are your plans regarding Unity libraries? Do you intend to develop your own ML models, or is the job of ML-agent essentially to provide convenient bindings to existing ML frameworks?

I’m currently using V-Rep for my research with RL. The project is game-related, but it’s also related to robotics, which it’s what prevents me from trying Unity for this specific use case. For example, having a NAO robot fully working ready for importing was really important.
I know it’s not directly related to ML, but do you guys have any plans to expand more towards the field of robotics / having robot models available to researches?
Having said that, I’ll definitely try it on different projects!

That’s amazing. I have one question: can I pass a previously made dataset to the brain? Let’s say, for example, I have a script that saves all the inputs of my players, and I want to turn this input into a dataset to start training a brain based on it.

Is that possible?

What about the performance on mobile phones with games that contain a huge number of states? A game like Poker, for example, has a huge number of states and a partially observable environment where the agent can’t see the opponent’s cards. Is it possible to make use of your ML for a game like Poker on mobile platforms?

I must yes, it’s Good

But the Question here is Why is that Unity is Copying OpenAI strategy in Unity as is, AI learning from players is a Maths project from Dota2 But there are flaws with the system learning as with the kind of Pitch one has made

Perfect! I have done my Machine Learning Subject Project in Game Machine Learning… I didn’t quite get anything sophisticated… I wish Unity Machine Learning would have introduced 5 months ago.???

Very cool, I was already wanting to do a game that implemented reinforcement learning in Unity and this will go a long way towards that. In my case the agent would start off with a trained behavior with the player being able to modify the behavior as part of the gameplay.

Ai and an external database can be good for automatically adjust the starting rendering settings looking to the player hardware when the game starts.

This is immensely useful for game development and AI researchers. Unity is going the right direction.
