Our two previous blog entries implied that there is a role games can play in driving the development of Reinforcement Learning algorithms. As the world’s most popular creation engine, Unity sits at the crossroads between machine learning and gaming. It is critical to our mission to provide machine learning researchers with the most powerful training scenarios, and to give back to the gaming community by enabling them to utilize the latest machine learning technologies. As the first step in this endeavor, we are excited to introduce the Unity Machine Learning Agents Toolkit.

Training Intelligent Agents

Machine Learning is changing the way we expect to get intelligent behavior out of autonomous agents. Whereas in the past the behavior was coded by hand, it is increasingly taught to the agent (either a robot or virtual avatar) through interaction in a training environment. This method is used to learn behavior for everything from industrial robots, drones, and autonomous vehicles, to game characters and opponents. The quality of this training environment is critical to the kinds of behaviors that can be learned, and there are often trade-offs of one kind or another that need to be made. The typical scenario for training agents in virtual environments is to have a single environment and agent which are tightly coupled. The actions of the agent change the state of the environment, and provide the agent with rewards.
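To make this loop concrete, here is a minimal, generic sketch of the interaction cycle: a toy environment and a random placeholder policy, not part of ML-Agents.

```python
# A minimal, generic sketch of the reinforcement learning loop described above.
# The environment and policy here are toy stand-ins, not part of ML-Agents.
import random

class ToyEnvironment:
    """1-D world: the agent tries to reach position +5."""
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial state

    def step(self, action):
        self.position += action              # the action changes the environment state
        reward = 1.0 if self.position == 5 else -0.01
        done = self.position == 5
        return self.position, reward, done   # new state and reward flow back to the agent

def policy(state):
    # Placeholder policy: move right or left at random.
    return random.choice([-1, 1])

env = ToyEnvironment()
state = env.reset()
total_reward = 0.0
for _ in range(100):
    action = policy(state)
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print("episode reward:", total_reward)
```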

The typical Reinforcement Learning training cycle.

At Unity, we wanted to design a system that provides greater flexibility and ease of use to the growing groups interested in applying machine learning to developing intelligent agents. Moreover, we wanted to do this while taking advantage of the high-quality physics and graphics, and the simple yet powerful developer control, provided by the Unity Engine and Editor. We think this combination can benefit the following groups in ways that other solutions might not:

  • Academic researchers interested in studying complex multi-agent behavior in realistic competitive and cooperative scenarios.
  • Industry researchers interested in large-scale parallel training regimes for robotics, autonomous vehicles, and other industrial applications.
  • Game developers interested in filling virtual worlds with intelligent agents each acting with dynamic and engaging behavior.

Unity Machine Learning Agents Toolkit

We call our solution the Unity Machine Learning Agents Toolkit (ML-Agents toolkit for short), and are happy to be releasing an open beta version of our SDK today! The ML-Agents SDK allows researchers and developers to transform games and simulations created using the Unity Editor into environments where intelligent agents can be trained using Deep Reinforcement Learning, Evolutionary Strategies, or other machine learning methods through a simple-to-use Python API. We are releasing this beta version of the Unity ML-Agents toolkit as open-source software, with a set of example projects and baseline algorithms to get you started. As this is an initial beta release, we are actively looking for feedback, and encourage anyone interested to contribute on our GitHub page. For more information on the Unity ML-Agents toolkit, continue reading below! For more detailed documentation, see our GitHub Wiki.

Learning Environments

A visual depiction of how a Learning Environment might be configured within Unity ML-Agents Toolkit.

The three main kinds of objects within any Learning Environment are:

  • Agent – Each Agent can have a unique set of states and observations, take unique actions within the environment, and receive unique rewards for events within the environment. An agent’s actions are decided by the brain it is linked to.
  • Brain – Each Brain defines a specific state and action space, and is responsible for deciding which actions each of its linked agents will take. The current release supports Brains being set to one of four modes:
    • External – Action decisions are made using TensorFlow (or your ML library of choice) through communication over an open socket with our Python API.
    • Internal (Experimental) – Action decisions are made using a trained model embedded into the project via TensorFlowSharp.
    • Player – Action decisions are made using player input.
    • Heuristic – Action decisions are made using hand-coded behavior.
  • Academy – The Academy object within a scene contains all Brains within the environment as children. Each environment contains a single Academy, which defines the scope of the environment in terms of:
    • Engine Configuration – The speed and rendering quality of the game engine in both training and inference modes.
    • Frameskip – How many engine steps to skip between each agent making a new decision.
    • Global episode length – How long the episode will last. When reached, all agents are set to done.

The states and observations of all agents with brains set to External are collected by the External Communicator and passed to our Python API for processing using your ML library of choice. By setting multiple agents to a single brain, actions can be decided in a batched fashion, opening up the possibility of taking advantage of parallel computation where supported. For more information on how these objects work together within a scene, see our wiki page.
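To make this flow concrete, here is a rough sketch of what the Python side of that exchange can look like. The package, class, and attribute names (UnityEnvironment, brain_names, brains, action_space_size) follow the beta release of the Python API and may change in later versions.

```python
# A rough sketch based on the beta ML-Agents Python API; names may change
# across releases. Requires a built Unity environment with an External brain.
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="3DBall")     # path to a built Unity environment
brain_name = env.brain_names[0]                # a Brain set to "External" in the editor
action_size = env.brains[brain_name].action_space_size

info = env.reset(train_mode=True)[brain_name]  # states/observations for all linked agents
for _ in range(1000):
    # One batched decision covering every agent linked to this brain.
    actions = np.random.randn(len(info.agents), action_size)
    info = env.step(actions)[brain_name]       # new states, rewards, and done flags
env.close()
```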

Flexible Training Scenarios

With the Unity ML-Agents toolkit, a variety of training scenarios are possible, depending on how agents, brains, and rewards are connected. We are excited to see what kinds of novel and fun environments the community creates. For those new to training intelligent agents, below are a few examples that can serve as inspiration. Each is a prototypical environment configuration, with a description of how it can be created using the ML-Agents SDK.

  • Single-Agent – A single agent linked to a single brain. The traditional way of training an agent. An example is any single-player game, such as Chicken. (Demo project included – “GridWorld”)
  • Simultaneous Single-Agent – Multiple independent agents with independent reward functions linked to a single brain. A parallelized version of the traditional training scenario, which can speed up and stabilize the training process. An example might be training a dozen robot arms to each open a door simultaneously. (Demo project included – “3DBall”)
  • Adversarial Self-Play – Two interacting agents with inverse reward functions linked to a single brain. In two-player games, adversarial self-play can allow an agent to become increasingly more skilled, while always having the perfectly matched opponent: itself. This was the strategy employed when training AlphaGo, and more recently used by OpenAI to train a human-beating 1v1 Dota 2 agent. (Demo project included – “Tennis”)
  • Cooperative Multi-Agent – Multiple interacting agents with a shared reward function linked to either a single or multiple different brains. In this scenario, all agents must work together to accomplish a task that couldn’t be done alone. Examples include environments where each agent only has access to partial information, which needs to be shared in order to accomplish the task or collaboratively solve a puzzle. (Demo project coming soon)
  • Competitive Multi-Agent – Multiple interacting agents with inverse reward functions linked to either a single or multiple different brains. In this scenario, agents must compete with one another to either win a competition, or obtain some limited set of resources. All team sports would fall into this scenario. (Demo project coming soon)
  • Ecosystem – Multiple interacting agents with independent reward functions linked to either a single or multiple different brains. This scenario can be thought of as creating a small world in which animals with different goals all interact, such as a savanna in which there might be zebras, elephants, and giraffes, or an autonomous driving simulation within an urban environment. (Demo project coming soon)

Additional Features

Beyond the flexible training scenarios made possible by the Academy/Brain/Agent system, the Unity ML-Agents toolkit also includes other features which improve the flexibility and interpretability of the training process.

  • Monitoring Agent’s Decision Making – Since communication in Unity ML-Agents toolkit is a two-way street, we provide an Agent Monitor class in Unity which can display aspects of the trained agent, such as policy and value output within the Unity environment itself. By providing these outputs in real-time, researchers and developers can more easily debug an agent’s behavior.

Above each agent is a value estimate, corresponding to how much future reward the agent expects. When the right agent misses the ball, the value estimate drops to zero, since it expects the episode to end soon, resulting in no additional reward.
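As a rough illustration of this two-way communication, the sketch below sends per-agent value estimates back to Unity alongside the chosen actions. The value keyword argument is an assumption based on the beta Python API and may differ; the environment name is illustrative.

```python
# A hedged sketch of the two-way channel used by the Agent Monitor: the Python
# side can pass per-agent value estimates back alongside the actions. The
# "value" keyword argument is an assumption based on the beta API.
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Tennis")       # a built environment, for illustration
brain_name = env.brain_names[0]
action_size = env.brains[brain_name].action_space_size
info = env.reset(train_mode=True)[brain_name]

for _ in range(100):
    actions = np.random.randn(len(info.agents), action_size)
    value_estimates = np.zeros(len(info.agents))  # would normally come from your value network
    # The value estimates can then be displayed above each agent in the scene.
    info = env.step(actions, value=value_estimates)[brain_name]
env.close()
```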

  • Curriculum Learning – It is often difficult for agents to learn a complex task at the beginning of the training process. Curriculum learning is the process of gradually increasing the difficulty of a task to allow more efficient learning. The Unity ML-Agents toolkit supports setting custom environment parameters every time the environment is reset. This allows elements of the environment related to difficulty or complexity to be dynamically adjusted based on training progress.

Different possible configurations of the GridWorld environment with increasing complexity.
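To make the curriculum mechanism above concrete, here is a hedged sketch of passing reset-time parameters from Python. The config argument follows the beta Python API, and the parameter names ("grid_size", "num_goals") are hypothetical; they must match Reset Parameters defined on your own Academy.

```python
# A hedged sketch of curriculum-style resets through the beta Python API.
# The parameter names below are illustrative, not part of any shipped example.
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="GridWorld")
brain_name = env.brain_names[0]

# Start with an easy configuration and grow the world as training progresses.
curriculum = [
    {"grid_size": 3.0, "num_goals": 1.0},
    {"grid_size": 5.0, "num_goals": 1.0},
    {"grid_size": 7.0, "num_goals": 2.0},
]

for lesson in curriculum:
    info = env.reset(train_mode=True, config=lesson)[brain_name]
    # ... train on this lesson until a reward threshold is reached ...
env.close()
```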

  • Complex Visual Observations – Unlike other platforms, where the agent’s observation might be limited to a single vector or image, the Unity ML-Agents toolkit allows multiple cameras to be used for observations per agent. This enables agents to learn to integrate information from multiple visual streams, as would be the case when training a self-driving car which requires multiple cameras with different viewpoints, a navigational agent which might need to integrate aerial and first-person visuals, or an agent which takes both a raw visual input as well as a depth-map or object-segmented image.

Two different camera views on the same environment. When both are provided to an agent, it can learn to utilize both first-person and map-like information about the task to defeat the opponent.
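As a rough illustration, the sketch below shows how multiple camera observations per agent might be consumed from the beta Python API; the environment name is hypothetical and attribute names may differ across releases.

```python
# A rough sketch of consuming multiple camera observations per agent; the
# environment name is hypothetical and attribute names follow the beta API.
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="MyVisualEnv")   # a build whose Brain defines two cameras
brain_name = env.brain_names[0]
info = env.reset(train_mode=True)[brain_name]

# One entry per camera defined on the Brain; each entry is a batch of images,
# one image per agent linked to that brain.
for camera_index, images in enumerate(info.observations):
    print("camera", camera_index, "batch shape:", images.shape)
env.close()
```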

  • Imitation Learning (Coming Soon) – It is often more intuitive to simply demonstrate the behavior we want an agent to perform, rather than attempting to have it learn via trial-and-error methods. In a future release, the Unity ML-Agents toolkit will provide the ability to record all state/action/reward information for use in supervised learning scenarios, such as imitation learning. By utilizing imitation learning, a player can provide demonstrations of how an agent should behave in an environment, and then use those demonstrations to train an agent either in a standalone fashion, or as a first step in a reinforcement learning process.
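Since this feature is not yet released, the sketch below is purely conceptual: it shows the general idea of behavioral cloning on recorded state/action pairs (here with synthetic data and a tiny softmax policy), not the forthcoming ML-Agents implementation.

```python
# A conceptual sketch of behavioral cloning on recorded (state, action) pairs.
# This is not the forthcoming ML-Agents feature, just an illustration of the idea.
import numpy as np

# Pretend these were recorded while a player controlled the agent.
demo_states = np.random.randn(500, 8)             # 500 frames of an 8-float state
demo_actions = np.random.randint(0, 4, size=500)  # one of 4 discrete actions per frame

# Tiny softmax policy trained with gradient descent to mimic the demonstrations.
weights = np.zeros((8, 4))
for _ in range(200):
    logits = demo_states @ weights
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy gradient: predicted probabilities minus one-hot demonstrated actions.
    probs[np.arange(len(demo_actions)), demo_actions] -= 1.0
    weights -= 0.01 * demo_states.T @ probs / len(demo_actions)

predicted = (demo_states @ weights).argmax(axis=1)
print("agreement with demonstrations:", (predicted == demo_actions).mean())
```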

An Evolving Platform

As mentioned above, we are excited to be releasing this open beta version of the Unity Machine Learning Agents Toolkit today, which can be downloaded from our GitHub page. This release is only the beginning, and we plan to iterate quickly and provide additional features both for those of you who are interested in Unity as a platform for Machine Learning research, and for those of you who are focused on the potential of Machine Learning in game development. While this beta release is more focused on the former group, we will increasingly provide support for the latter use case. We are especially interested in hearing about use cases and features you would like to see included in future releases of the Unity ML-Agents Toolkit, and we welcome Pull Requests made to the GitHub Repository. Please feel free to reach out to us at ml-agents@unity3d.com to share feedback and thoughts. If the project sparks your interest, come join the Unity Machine Learning team!

Happy training!

72 Comments

Comments are closed.

  1. Damn that’s cool! :)

    So, what I’ve noticed is that agents have a list of states with a fixed size, which is ok when you have a constant environment like 1 ball and 1 platform that tries to keep the ball on it.

    But how about having enemies which spawn dynamically? Or when these enemies shoot bullets? We’d need a dynamic list of states for that. How would you implement this scenario?

  2. Heads up, you have a dead link:

    > For more information on how these objects work together within a scene, see our wiki page.

    “wiki page” currently points here: https://github.com/Unity-Technologies/python-rl-control/wiki

    That repo doesn’t appear to exist anymore.

  3. I’m happy because this looks cool. But I’m sad because I didn’t think of it first! :'(

  4. Awesome!
    When will we have more algorithms than just PPO?

  5. Thanks for sharing such a wonderful article.
    http://www.broachindia.com/

  6. Shrey Pareek

    October 31, 2017 5:09 pm

    Hello,
    This seems like a great toolbox for integrating Unity with Python. A quick question though. Any clue when the imitation learning tool would be available?
    Thank you

  7. Michael Knight

    October 30, 2017 2:21 am

    Thanks for writing this article. I’ve always been interested in A.I. I would love to apply some of this to our Knight O.S. project.
    http://myknightrider2000.blogspot.ca

  8. Arthur Drikis

    October 27, 2017 11:35 am

    Hi,
    It’s a great tool and I’ve been really enjoying working with it the last few days.
    Though, while messing around with it, I’ve noticed that the training process itself uses only about 12% of GPU power and around 50% of my CPU.
    Am I missing some feature that would let me use the GPU to its full potential?

    1. Arthur Juliani

      October 27, 2017 5:11 pm

      Hi Arthur,

      You are correct in noticing that the reinforcement learning algorithm we use (PPO) isn’t as GPU efficient as it could be. The problem stems from the fact that the network is used in two ways: deciding actions (inference) and training (gradient descent). If the batch size is large enough, the training step can fully utilize the GPU. However, during experience collection, the network is only being used to pick actions, and this is much less computationally efficient. There are possible methods for better utilizing the GPU, such as having separate threads collect experience and update the network. These however add complexity to the system, and require tuning in and of themselves to ensure they are providing the level of benefit desired. It is something we are aware of, and will be keeping in mind as we develop future algorithms.

      1. Arthur Drikis

        October 31, 2017 2:58 pm

        Thanks for the reply!

        I would like to ask another question, though.
        So far I haven’t been able to successfully train a neural network in any environment other than 3DBall.

        In tennis I got to around 18M steps, and the cumulative reward seems to be stagnant with seemingly random spikes: http://prntscr.com/h4dhaq

        While training with GridWorld it looks like I’ve been getting worse and worse results. So far I’m at 4.5M steps, and the reward is just going down. http://prntscr.com/h4diha

        I tried to mess around with buffer size, batch size and hidden unit amount, but it didn’t seem to make much of a difference.

        What am I doing wrong here?
        Should I just wait for more steps?

        Also, it would be really cool if we could get our hands on the values that were used for training the pre-trained TF models for each of the examples.

  9. Data Science Training In Hyderabad

    October 25, 2017 2:43 pm

    Hi,
    Thanks for sharing such a wonderful article with us
    We are expecting more articles from this blog

  10. Thank you for the information. Machine learning has its roots in statistics and mathematical optimization. Machine learning covers techniques in supervised and unsupervised learning for applications in prediction, analytics, and data mining. If you want machine learning services. Visit:https://www.usmsystems.com/machine-learning/

  11. AnalyticsPath

    October 23, 2017 1:13 pm

    Thanks For the Valuable information About Machine Learning and Other Professional Courses Trend Setting Today.
    If Any Doubts regarding Machine Learning Please Visit this Website

    http://www.analyticspath.com/machine-learning-training-in-hyderabad

  12. Sorry, I failed to understand: after running the 3DBall training, what’s the outcome from the training? I mean, is there a result file that we can get and reuse?

    1. Arthur Juliani

      October 21, 2017 8:46 pm

      Yep! You get a model file that you can add back into the project itself. See the Getting Started Guide for more info: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md#embedding-trained-brain-into-unity-environment-experimental

  13. Just curious. Can I use this on the Android and iOS platforms?
    Even if I could, I guess with Unity as a middle interpreter of Python code, it would be overkill for a normal phone CPU.

    1. Arthur Juliani

      October 20, 2017 3:01 am

      Hi Zara, by using our “Internal” brain, and the TensorFlowSharp plugin, you can put trained brains into projects for Android and iOS. You just won’t be able to do the actual training on those devices. For more information, see here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-TensorFlow-Sharp-in-Unity-(Experimental).md

  14. I really love this. I’ve been playing with it for the last three days.

    What I really need now, as a ML newb, is a step-by-step guide that answers a few questions. I understand that this isn’t a good place for answering questions. I don’t want answers here. Just hoping that in time these questions will be answered in the documentation section of ml-agents on github. Thanks!

    What is the workflow? When I run the training, do I “load saved model”? Should I only be running a training once? If I run it multiple times does it keep learning? It seems my agent gets WORSE at tasks, not better even though I feel like I’m rewarding correctly. There is clearly some learning going on but if I leave it going overnight (6-8 hours) there doesn’t seem to be any improvement.

    Do I run a training many times in a row by finishing one, then immediately rerunning the PPO script? Or do I need to set everything up. Set the steps to a crazy number. Run once. Export model?

    How do I set up the cameras? Is there more than just adding them to the agent? Do I need to set up the “resolution” stuff in the brain?
    Along with a tuning guide. Steps. Learning rate. What to change if things don’t seem to go right. And how long for simple tasks.

    1. Arthur Juliani

      October 16, 2017 10:44 pm

      Hi Jeffrey,

      Glad to hear that you are enjoying playing around with it!

      When you run the PPO notebook (or ppo.py) the neural network model will be trained for the number of steps set in max_steps. If you interrupt training, you can set load_model to true, and then continue training. Once you reach the max_steps, if your model isn’t sufficiently trained you can continue training by increasing max_steps. I would recommend looking at the TensorBoard logs though to track performance. If your reward isn’t gradually increasing over time, there may be issues with the reward structure of the environment, or the agent may not be getting the information it needs to solve the task. We have a preliminary “Best Practices” document for creating environments https://github.com/Unity-Technologies/ml-agents/blob/master/docs/best-practices.md , but I agree that it would be a good idea to add a similar page for the training process itself, and that is something we will work on putting together.

  15. This is a great topic to dive into!

  16. swordmaster swordmaster

    October 13, 2017 3:37 pm

    Great! I developed an asset about machine learning for Unity as well:
    https://www.assetstore.unity3d.com/en/#!/content/93236
    This artificial neural network uses the backpropagation algorithm to learn, and the demo is about a car learning how to get out of a maze.

  17. Will there be training with a teacher in the future? That is, learning the known action for a given state. Data, for example, would be collected in Player mode, with type state list -> action list.

    1. Arthur Juliani

      October 7, 2017 10:23 pm

      Hi Sergei,

      This feature is what we are calling “Imitation Learning.” It will be coming in the next release, which we hope to have out in the next few weeks!

  18. I’m trying to follow the Getting Started with the Balance Ball Example tutorial and I managed to get through the tutorial on how to set up Python/TensorFlow (http://blog.nitishmutha.com/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html) via the tutorial that was linked in the balance ball example tutorial. But when I try to run the jupyter PPO notebook I get this error:
    ModuleNotFoundError Traceback (most recent call last)
    in ()
    1 import numpy as np
    2 import os
    ----> 3 import tensorflow as tf
    4
    5 from ppo.history import *

    ModuleNotFoundError: No module named ‘tensorflow’

    I’m not sure what I did wrong or what to try to do to fix this :( Any ideas at all would be helpful.

    1. Ok, so I was actually able to get past that last error above. I just did not realize I needed to install TensorFlow into the Python folder, my bad. But I do have another question. I am having trouble observing the training process. I open Anaconda and put in the tensorboard --logdir=summaries line and it runs, but I’m not able to do anything else without stopping it and I don’t see anything that shows me how the training is going. I let the training go on for a few minutes and then continued to the last cell, then continue past that and then stop running the notebook by pressing the interrupt kernel button. I look in the models folder but I don’t see an exported model. I do see model-50000.cptk.index and some other files but not a 3DBall.bytes, so I am confused on how to properly end running the cells or if I did something else wrong. Any help would be welcome.

      1. Arthur Juliani

        October 6, 2017 1:49 am

        Hi Michael, you need to launch a web browser and navigate to localhost:6006 to view the training information once launching Tensorboard.

        I am unsure why you aren’t seeing the .bytes file. Do you get a message letting you know it was created when you run that cell?

  19. Andre Infante

    September 29, 2017 1:00 am

    Is there a timeline for supervised learning support? I have an application that would be much easier to train with supervision, and am not sure if I should wait or try to get it to train with the existing RL support.

    1. Arthur Juliani

      September 30, 2017 5:53 am

      Hi Andre,

      We are currently actively developing the imitation learning support, and hope to have it out within a few weeks. If you are adventurous, you can check out the dev-broadcast branch of the repository, where we are developing it. The feature allows brain types besides external to “broadcast” their states/actions to the python api for use in supervised learning. Of course, it is still in development so there will likely be bugs. If you happen to find any, please let us know!

      1. Andre Infante

        October 1, 2017 10:08 pm

        I will check it out, thank you!

  20. Arthur,

    Could you help me and stitch together your implementation of the A3C algorithm into this?
    I am looking at it, but porting the model seems to be above my skill set.
    If not, I would gladly accept a short how-to on how I should do it. Thanks!

    1. Arthur Juliani

      September 26, 2017 6:18 pm

      Hi Max,

      We include an implementation of PPO with ML-Agents. PPO is a more reliable and efficient algorithm than A3C, so it is included instead. One difference is that the included PPO isn’t asynchronous, but it could be made to be through some adjustments. https://github.com/Unity-Technologies/ml-agents/tree/master/python

      1. Thanks for pointing me in the right direction!
        I reevaluated my approach and found a way to do my thing with PPO, where I will (maybe, will see how it will work without it) apply GA for evolutionary reasons.

        I have another question tho, when I set 2 brains in academy, I change every “brain_name” to “name” inside “for name in env.brain_names:”, but get this error: ” You have 2 brains, you need to feed a dictionary of brain names a keys, and actions as values”, at line “new_info = trainer.take_action(info, env, name)”. Can you show me how to properly feed it?

        Thanks!

  21. Bart Burkhardt

    September 26, 2017 10:44 am

    I got the 3DBall project to work using the Jupyter notebook.
    But it looks like Unity is running at 1 fps. Also, it starts in a tiny window.

    Is it supposed to be so slow when learning? I tried on an Nvidia 1060 and 1070.

    1. Arthur Juliani

      September 26, 2017 6:16 pm

      Hi Bart. When training, we speed up the engine to 100x, which causes a drop in frame-rate. Although it looks slow, the engine is actually processing thousands of steps of simulation, and is running correctly.

  22. Ryan Potter

    September 25, 2017 7:29 pm

    Oh, …. you just made my day. I was attempting to use Unity a few months ago as the environment in my AI research, but was struggling with implementing the ML algorithms I needed (CNNs, ANNs, etc) with C#. I put it on ice and switched to a home-brew 2D environment in Python on Linux so I could make progress on the AI, keeping basic Unity-like structure so I could switch back at some point easily. Looks like I can switch back now :)

    This is so perfect.
    Thank You!!

  23. Where are the sample project files for the ML games used in this blog post?

    1. Arthur Juliani

      September 25, 2017 8:12 pm

      Hi Lee,

      You can find the example projects in the repository here: https://github.com/Unity-Technologies/ml-agents .

  24. Exciting direction… Any plans to work with Apple’s ML framework and Swift?

  25. Tejas Ramanuj

    September 25, 2017 4:40 am

    How do I get started with Unity Machine Learning?

  26. Oh, and well, I could at least try to help you out if you insist on continuing down this path, when you could be spending the money completing features you have been putting off for a couple of years now which are more important…

    You should talk to some of the colleges that focus on behavioural analysis and study. http://paco.psy.gla.ac.uk/index.php?option=com_jdownloads&view=viewcategories&Itemid=62 is an example; there are 2 or 3 focused on this niche of research. That particular one uses mocap systems, taking volunteers to do natural motion, then studying and programming computers to recognize gender, attitude, and emotion from body motion. Another focuses on interaction between humans, and even more on general physics and flow of movement. A last one I try to avoid because it is funded by DARPA and that just spooks me. It is the recognition from a distance project. Ties right into this, and those schools would likely be happy to provide you with their research, publications, and findings if you in turn expand their knowledge base and cases by publishing your own findings based on the research guidelines defined in the project licenses to use the mocap databases. AI learning should definitely be learning cases of how to react based on subtle actions of the thing it is interfacing against.

  27. Eh, this is really interesting stuff. However, as a content creation/game development/high-end rendering platform which is awesome but has a lot of bugs and an undeserved bad reputation, I really do not think you need to be here. Besides, aren’t you guys afraid of AI… maybe it is time for Hollywood to remake the old HAL “Would you like to play a game” …

    Funny I saw an asset the other day that started with HAL, I don’t even know what it was, I saw that much and changed the page.

  28. Michael Bechauf

    September 22, 2017 1:17 am

    Really cool demo! I was able to run the 3DBall code and it worked nicely. A few questions though …

    How do you run the Tennis application? I did not find any instructions, so I assumed it would also require the PPO notebook, but training took a very long time. When I stopped training after 1000 iterations, and tried to persist the binary model, I got the error AttributeError: ‘NoneType’ object has no attribute ‘model_checkpoint_path’. Maybe I forgot something?

    Second, why do you call the internal Brain model experimental? Is that simply because you load the Tensorflow library into the Unity engine which may cause instabilities?

    And finally, what are your plans regarding Unity libraries? Do you intend to develop your own ML models, or is the job of ML-Agents essentially to provide convenient bindings to existing ML frameworks?

    1. Arthur Juliani

      September 22, 2017 6:55 pm

      Hi Michael,

      You are right that training Tennis takes longer – at least 1 million steps. It is often the case that complex Reinforcement Learning problems take steps in the millions. For example, many ATARI games take roughly 200 million steps of training to achieve super-human performance.

      In order to create the binary file you need to have at least one model checkpoint saved. To make those saves happen more frequently, you can adjust save_freq in the hyperparameters.

      We refer to internal as “experimental” because it hasn’t been thoroughly tested enough for us to recommend to game developers as a method for actually controlling game AI in released games. In the coming months we hope to provide a version we feel strongly enough about to take the “experimental” tag off.

      With ML-Agents we are currently supporting TensorFlow integration, but if other solutions present themselves in the future we may explore them as well.

  29. Renato Vargas

    September 21, 2017 6:08 pm

    I’m currently using V-Rep for my research with RL. The project is game-related, but it’s also related to robotics, which is what prevents me from trying Unity for this specific use case. For example, having a NAO robot fully working and ready for importing was really important.
    I know it’s not directly related to ML, but do you guys have any plans to expand more towards the field of robotics / having robot models available to researchers?
    Having said that, I’ll definitely try it on different projects!

  30. Jordy Henry

    September 21, 2017 4:23 pm

    That’s amazing, I have one question: can I pass a previously made dataset to the brain? Let’s say, for example, I have one script that saves all the inputs of my players, and I have access to it, and I want to turn all this input into a dataset to start to train a brain based on it.

    Is it possible?

    1. Arthur Juliani

      September 22, 2017 6:56 pm

      Hi Jordy,

      This feature is coming soon. You can read a little more about it above, under “Imitation Learning.”

  31. Ramy Dergham

    September 21, 2017 2:57 pm

    What about the performance on mobile phones with games that contain a huge number of states? A game like Poker, for example, has a huge number of states and a partially observable environment where the agent can’t see the opponent’s cards. Is it possible to make use of your ML for a game like Poker on mobile platforms?

    1. Arthur Juliani

      September 22, 2017 7:05 pm

      Hi Ramy,

      This is a good question! The state-space size of the problem doesn’t necessarily increase the complexity of the model. For example, learning from an 8-bit 128x128x3 pixel image involves a huge state space (256^49152 possible images), yet we can use convolutional networks with a few layers to learn to generalize between them. A network like that can actually run relatively easily on a phone.

      Of course, on the other side of that is something like AlphaGo, which ran on a supercomputer… I think for many games though it will be possible to distill the important information into a relatively small network which can generalize within the domain. Especially as smartphones begin to integrate more powerful ML-specific hardware, like Apple is doing with iPhone X.

  32. Hi, when is machine learning run? In the Editor, at runtime, or both?
    Also, can it be done offline?

  33. Can I use PyTorch instead of TensorFlow?

    1. Arthur Juliani

      September 22, 2017 7:06 pm

      Hi,

      You can definitely use PyTorch for training! The only thing that won’t be possible is to embed the trained PyTorch model back into the Unity game/simulation itself.

  34. Do we need Python language knowledge to use ML-Agents?

    1. Arthur Juliani

      September 20, 2017 10:27 pm

      Good question. Right now we are including a pre-made reinforcement learning algorithm called PPO with ML-Agents. You should be able to use it to train simple agents without needing to understand how to modify it yourself.

      We understand though that many Unity game developers mainly have expertise in C#, so we are exploring ways to enable developers to train agents without the need to manually interact with python.

      1. Thank you for the answer.
        Really hope you’ll adapt the feature to C#.

        Right now, I am stuck at the “Installing Dependencies” step and sent an email for details.

  35. Until now I had only seen the Udacity self-driving car demo that also worked with a socket connection. I could only get it to run on my old MacBook though. Also, I did the training on a cloud instance:

    https://medium.com/towards-data-science/introduction-to-udacity-self-driving-car-simulator-4d78198d301d

    Really looking forward to running the projects from this blog post as well.

  36. I’m having an issue with the installation of TensorFlow. I’m running Windows 8.1 64-bit, Python 3.6.1 64-bit, Anaconda 3:
    https://drive.google.com/open?id=0B6Px6xu8RYExa2M2cG5BeFBSOE1iTkNRcDRpY1lqaVNkMEtR

  37. Great!
    How are trained brains saved?

    1. Arthur Juliani

      September 20, 2017 6:56 pm

      If you use TensorFlow to create the neural network, the trained Brain is saved as a TensorFlow model, which can be converted to a .bytes file and embedded into the Unity project directly. For more information, check out this walkthrough: https://github.com/Unity-Technologies/ml-agents/wiki/Getting-Started-with-Balance-Ball.

  38. I must say yes, it’s good.

    But the question here is: why is Unity copying OpenAI’s strategy as-is? AI learning from players is a maths project from Dota 2, but there are flaws with the system’s learning, as with the kind of pitch one has made.

  39. Does this work on Windows too or Linux only? (Sorry if asking stupid question :D:D)

    1. Arthur Juliani

      September 20, 2017 6:38 pm

      We are targeting support for Windows, Mac, and Linux (Plus eventually mobile and console). Currently Mac and Linux are the more heavily tested environments. If you encounter an issue on Windows, please let us know here: https://github.com/Unity-Technologies/ml-agents/issues.

      1. Renato Vargas

        September 21, 2017 5:55 pm

        It seems like good timing for an official release of the Linux editor :)

  40. Very cool project!!
    I’m looking forward to support for (spoken) language learning.

    1. Arthur Juliani

      September 20, 2017 6:36 pm

      Hi! Unity actually already has a solution for speech recognition in games: https://labs.unity.com/article/speech-recognition-and-vr . I hope that is what you are looking for!

  41. Perfect! I have done my Machine Learning subject project in game machine learning… I didn’t quite get anything sophisticated… I wish Unity Machine Learning had been introduced 5 months ago. 🤔😥😍

  42. Perfect! I have done my Machine Learning subject project in game machine learning… I didn’t quite get anything sophisticated… I wish Unity Machine Learning had been introduced 5 months ago. 🤔😥

  43. Samuel Otero

    September 19, 2017 9:19 pm

    Very cool, I was already wanting to do a game that implemented reinforcement learning in Unity and this will go a long way towards that. In my case the agent would start off with a trained behavior with the player being able to modify the behavior as part of the gameplay.

  44. Great stuff, but I wish to see more examples of applications for these agents.

    1. Arthur Juliani

      September 20, 2017 6:31 pm

      More demo projects and videos are definitely on the way! We are also interested in sharing example projects that others might make.

  45. Alan Mattano

    September 19, 2017 7:02 pm

    AI and an external database could be good for automatically adjusting the starting rendering settings based on the player’s hardware when the game starts.

  46. kamran bigdely shamloo

    September 19, 2017 6:39 pm

    This is immensely useful for game development and AI researchers. Unity is going the right direction.

  47. Cannot wait to see the Ecosystem demos!