Our primary mission at DeepMind is to push the boundaries of AI, developing programs that can learn to solve any complex problem without needing to be taught how. Our reinforcement learning agents have achieved breakthroughs in Atari 2600 games and the game of Go. Such systems, however, can require a lot of data and a long time to learn so we are always looking for ways to improve our generic learning algorithms.
Our recent paper “Reinforcement Learning with Unsupervised Auxiliary Tasks”introduces a method for greatly improving the learning speed and final performance of agents. We do this by augmenting the standard deep reinforcement learning methods with two main additional tasks for our agents to perform during training.
A visualisation of our agent in a Labyrinth maze foraging task can be seen below.
The first task involves the agent learning how to control the pixels on the screen, which emphasises learning how your actions affect what you will see rather than just prediction. This is similar to how a baby might learn to control their hands by moving them and observing the movements. By learning to change different parts of the screen, our agent learns features of the visual input that are useful for playing the game and getting higher scores.
In the second task the agent is trained to predict the onset of immediate rewards from a short historical context. In order to better deal with the scenario where rewards are rare we present the agent with past rewarding and non-rewarding histories in equal proportion. By learning on rewarding histories much more frequently, the agent can discover visual features predictive of reward much faster.
The combination of these auxiliary tasks, together with our previous A3C paper is our new UNREAL agent (UNsupervised REinforcement and Auxiliary Learning). We tested this agent on a suite of 57 Atari games as well as a 3D environment called Labyrinth with 13 levels. In all the games, the same UNREAL agent is trained in the same way, on the raw image output from the game, to produce actions to maximise the score or reward of the agent in the game. The behaviour required to get game rewards is incredibly varied, from picking up apples in 3D mazes to playing Space Invaders – the same UNREAL algorithm learns to play these games often to human level and beyond. Some results and visualisations can be seen in the video below.
Read More at source