At A Glance – Reinforcement Learning

Where AI becomes both student and teacher

Reinforcement learning is a branch of machine learning in which computer programs known as ‘agents’ learn how to behave through trial and error. Researchers place an agent in a specific environment, define the outcome that they want to see and leave the agent to work out how to achieve it. It does this by trying out different behaviours, often at random to begin with, to see what works and what doesn’t.

Each action an agent takes in reinforcement learning earns a reward, which can be positive or negative. Actions which bring about the desired change in the environment receive a positive reward, whilst those which don’t are penalised. The goal of the agent is to maximise the total reward it collects over time. Reinforcement learning runs as a loop: the agent acts, observes the result and the reward, adjusts its behaviour and tries again, so that it gradually learns how to behave. This is similar to the way that humans learn: we try out different ways of interacting with the things and people around us to see what happens, repeating the actions which achieve what we want and discarding the ones which don’t.
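To make the reward loop concrete, here is a minimal sketch in Python. It uses tabular Q-learning, one common reinforcement learning method chosen purely for illustration. The environment is a hypothetical five-square corridor in which the agent starts at the left end and is rewarded for reaching the right end. The function names, reward values and settings (alpha, gamma, epsilon) are all assumptions made for this example, not part of any particular system.

```python
import random


def train_corridor_agent(n_states=5, episodes=200, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are squares 0..n_states-1. The agent starts on square 0 and the
    goal is the right-most square. Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    # q[state][action] is the agent's current estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the number of steps in one episode
            # Trial and error: act at random some of the time (or when the
            # estimates are tied), otherwise repeat what has worked best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The environment responds with a new state and a reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else -0.01  # small penalty for each wasted step

            # Learn: nudge the estimate towards reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the preferred action for every square except the goal."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nobody tells the agent that moving right is the answer. It only ever sees rewards and penalties, and with these settings it should come to prefer stepping right on every square. The epsilon setting controls how often it still experiments rather than repeating what has worked so far.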

Reinforcement learning techniques have been instrumental to many of the big AI breakthroughs of the past few years. In one famous example, researchers at DeepMind created an AI which mastered the Chinese game of Go by teaching itself. AlphaGo Zero began life as a neural network which knew nothing about the game beyond its rules. After three hours of playing against itself, it had reached the level of a human beginner, and after three days it beat an earlier version of AlphaGo by 100 games to zero. Over just a few days, the computer gained knowledge that it took humans thousands of years to acquire. As this example shows, reinforcement learning is an incredibly powerful technique, and one which highlights how quickly artificial intelligence can improve.
