Reinforcement learning is a rapidly developing branch of machine learning. Some of the most striking recent achievements in AI stem from rapid progress in deep reinforcement learning. In this blog post, I’ll show you why reinforcement learning needs simulation and provide an example model with source files and instructions for you to download and try.
Deep reinforcement learning success
Probably the most famous example of deep reinforcement learning is the defeat of Go world champion Lee Sedol by DeepMind’s AlphaGo. Although the rules are simple, the complexity of Go makes it formidably difficult, and it was seen as the biggest challenge in classical games for artificial intelligence to master. It's estimated that there are more valid ways to play the game to conclusion than there are atoms in the observable universe.
#AlphaGo won game 3, claims match victory against best Go player of last decade, Lee Sedol → https://t.co/MbtYm64lhL pic.twitter.com/goHJvxCPUI— Google (@Google) March 12, 2016
AlphaGo accomplished this seemingly unattainable goal by combining deep neural networks with reinforcement learning, refining its play over the course of millions of games against itself. Its successor, AlphaGo Zero, went a step further: it learned to play the game from scratch, without human game records, and accumulated the equivalent of thousands of years of human knowledge in the span of a few days.
To better understand the AlphaGo success, we should look at how computers learn. Broadly speaking, people learn in two ways: either by knowledge transfer (from a teacher or a book), or by trial and error. The same is true for computers.
For computer programs, the knowledge transfer method is like hard coding chess rules and strategies into a computer so it can then use them to play chess. In contrast, the trial and error method is similar to a computer repeatedly playing chess until it develops its own knowledge and intuition about what is considered superior gameplay.
For trial and error, a computer program needs a playground to try its ideas and to learn from its mistakes and achievements. Such an environment can either be in the real world (on private roads, in restricted airspace, or on a mock assembly line, for example) or it can be virtual.
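To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy learner on a toy "playground" with three possible actions. Everything in it is an assumption made for illustration: the payout probabilities, the function name `run_bandit`, and its parameters are hypothetical and are not taken from any system mentioned in this post.

```python
import random

def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn the value of each action purely by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payout_probs)  # running average reward per action
    counts = [0] * len(payout_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(payout_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(payout_probs)), key=lambda a: estimates[a])
        # The playground answers with a reward of 1 (success) or 0 (failure).
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical playground: the third action pays off most often.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best_action)
print("estimated values:", [round(e, 2) for e in estimates])
```

The program is never told which action is best. It only sees the rewards its own attempts produce, which is why it needs somewhere safe and cheap to make thousands of attempts.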
Although real-world playgrounds can be more lifelike, they have many disadvantages compared to simulation environments, such as their acquisition and construction costs, as well as possible risks to lives and surroundings. Regulatory red tape can also limit experimentation.
In contrast, simulation models avoid most of these limitations: they are comparatively cheap to run and can be set up in a very controlled fashion. Models in virtual environments can also run faster than real time, since they are not bound by the passage of time in the same way. This advantage was made clear by OpenAI, whose OpenAI Five system defeated the world champions of the sophisticated cooperative strategy game Dota 2. In ten months of training, the OpenAI system completed 45,000 human years’ worth of practice.
OpenAI Five is now the first AI to beat the world champions in an esports game. Here's what happened, and how we made our comeback since losing to pros in Aug 2018: https://t.co/QH6yj0Gmz3 pic.twitter.com/WvV4ERTvZt— OpenAI (@OpenAI) April 15, 2019
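As a quick back-of-the-envelope check, the figures quoted above (45,000 years of practice in ten months) can be turned into an aggregate speed-up over real time. The variable names are my own, and the calculation says nothing about how the speed-up was achieved (faster clocks, parallel copies of the game, or both).

```python
practice_years = 45_000   # human-years of play quoted above
training_months = 10      # wall-clock training time quoted above

# Years of simulated experience gained per calendar year of training.
speedup = practice_years * 12 / training_months
print(f"roughly {speedup:,.0f} years of play per calendar year")
```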
While deep reinforcement learning is a new development in the world of artificial intelligence, and still mainly considered a research topic, simulation modeling has been in daily practical use for decades. It has a very mature community with a vast body of real-world examples.
Common practice in the simulation community is to take simulation models, run experiments (optimization, Monte Carlo, parameter variation, etc.), and use the outputs to make better decisions about a model’s real-world counterpart. With this approach, a human is needed to experiment with the simulation model and extract information from it.
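The kind of experiment described above can be sketched in a few lines. The example below is a hypothetical stand-in, not an AnyLogic model: it simulates a single-server queue, varies one parameter (the service rate), and uses Monte Carlo replications to estimate the average wait for each setting. The arrival rate, the service rates, and the function names are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_mean_wait(arrival_rate, service_rate, customers, rng):
    """One replication of a single-server queue; returns the mean wait in queue."""
    wait = 0.0
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        service = rng.expovariate(service_rate)  # this customer's service time
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        # Lindley recursion: how long the next customer waits.
        wait = max(0.0, wait + service - gap)
    return total_wait / customers

def parameter_variation(service_rates, replications=200, seed=42):
    """Run many random replications for each candidate service rate."""
    rng = random.Random(seed)
    results = {}
    for rate in service_rates:
        runs = [simulate_mean_wait(1.0, rate, 500, rng) for _ in range(replications)]
        results[rate] = (statistics.mean(runs), statistics.stdev(runs))
    return results

results = parameter_variation([1.2, 1.5, 2.0])
for rate, (mean_wait, spread) in results.items():
    print(f"service rate {rate}: mean wait {mean_wait:.2f} (std {spread:.2f})")
```

A person then reads the output and decides, for instance, whether the shorter wait justifies paying for a faster server. That human-in-the-loop step is what a learning agent can take over.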
As mentioned earlier, recent developments in deep reinforcement learning have clearly demonstrated that learning agents (computer algorithms) are also very capable of extracting useful decisions (policies) from simulated systems. So, it makes sense to combine simulation modeling environments with machine learning, especially as interest moves away from gaming challenges and towards business-oriented objectives.
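As a rough illustration of an agent extracting a policy from a simulated system, the sketch below trains a tabular Q-learning agent on a deliberately tiny simulator of a two-approach signalized intersection. It is a toy under stated assumptions: the arrival probabilities, the queue cap, the reward, and every function name are hypothetical, and plain Q-learning stands in here for the deep reinforcement learning methods discussed in this post.

```python
import random

MAX_Q = 5                   # queues are capped so the state space stays small
ARRIVAL_PROB = (0.3, 0.2)   # assumed per-step arrival probability per approach

def step(state, action, rng):
    """Advance the toy intersection by one time step.

    state  = (queue_0, queue_1, green), green being the approach with a green light
    action = 0 keeps the current phase, 1 switches it (the step is lost to yellow)
    """
    queues = [state[0], state[1]]
    green = state[2]
    if action == 1:
        green = 1 - green                  # switching: nobody passes this step
    elif queues[green] > 0:
        queues[green] -= 1                 # one car clears the green approach
    for i in (0, 1):
        if rng.random() < ARRIVAL_PROB[i]:
            queues[i] = min(MAX_Q, queues[i] + 1)
    reward = -(queues[0] + queues[1])      # fewer waiting cars is better
    return (queues[0], queues[1], green), reward

def train(steps=200_000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}                                 # (state, action) -> estimated value
    state = (0, 0, 0)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q.get((state, a), 0.0))
        next_state, reward = step(state, action, rng)
        best_next = max(q.get((next_state, a), 0.0) for a in (0, 1))
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
    return q

def average_queue(policy, steps=20_000, seed=1):
    """Average number of waiting cars when following a fixed policy."""
    rng = random.Random(seed)
    state, total = (0, 0, 0), 0
    for _ in range(steps):
        state, reward = step(state, policy(state), rng)
        total -= reward
    return total / steps

q_table = train()
learned = average_queue(lambda s: max((0, 1), key=lambda a: q_table.get((s, a), 0.0)))
never_switch = average_queue(lambda s: 0)
print(f"learned policy:  {learned:.2f} cars waiting on average")
print(f"never switching: {never_switch:.2f} cars waiting on average")
```

The simulator plays the role of the playground, and the learned table of values is the policy. In a realistic model the state space is far too large for a table, which is where neural networks come in.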
Reinforcement learning example model
To showcase the capabilities of a powerful general-purpose simulation tool as a training environment, AnyLogic worked with Pathmind to develop a simple but illustrative example model based on the simulation of a traffic light-controlled intersection. A similar version of this model was demonstrated at the 2019 AnyLogic Conference in Austin, Texas, as part of a presentation [video] on the practical application of deep reinforcement learning using AnyLogic.
This example model has been superseded and there are now multiple example models. Please find them for the following integrations:
- Microsoft Project Bonsai
- h2o.ai Automatic Machine Learning
To learn about AnyLogic and AI in general, visit our dedicated AI page.
⭐ Many thanks to the Pathmind team – particularly Samuel Audet and Eduardo Gonzalez – for their involvement in this project; their contributions were invaluable. Further questions about the DL4J library used in the example model can be asked on their Gitter page.