Reinforcement Learning and AlphaGo

Andy Barto and Sridhar Mahadevan of Computer Science appeared last week in a piece by the UMass news office on the role of Reinforcement Learning in Google DeepMind’s AlphaGo. Andy followed up with this interesting analysis of the factors behind AlphaGo’s success, noting that the news office piece may have given a bit too much credit to Reinforcement Learning:

I am writing a description of AlphaGo for the 2nd edition of our Reinforcement Learning book. I think what makes it work as well as it does is that it cleverly combines three existing technologies: 1) a type of multi-layer neural network called a deep convolutional network, which is specialized for processing images or other spatial arrays of data; 2) value function learning from self-play, which descends from Samuel’s famous checkers-playing program of the 1960s and Tesauro’s backgammon-playing system of the 1990s (this is mostly where reinforcement learning plays a part); and 3) Monte Carlo Tree Search, which is relatively recent and was a big advance for Go-playing programs. They did clever engineering in combining these methods. A lot of smart people at DeepMind.
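
For readers who want a concrete feel for ingredient 2, here is a minimal sketch of learning a value function purely from self-play, applied to a toy subtraction game rather than Go. AlphaGo trains a deep network over board positions; the tabular setup, the game, and names such as N_STONES and choose_move below are purely illustrative assumptions, not anything from DeepMind’s system.

```python
import random
from collections import defaultdict

# Toy illustration of learning a value function from self-play on a
# simple subtraction game (not Go): players alternately remove 1-3
# stones from a pile, and whoever takes the last stone wins. All names
# and constants are illustrative.

N_STONES = 21      # starting pile size
ALPHA = 0.1        # step size for the TD-style value update
EPSILON = 0.1      # exploration rate for epsilon-greedy self-play
EPISODES = 50_000

# V[s]: estimated probability that the player about to move with s
# stones remaining goes on to win.
V = defaultdict(lambda: 0.5)
V[0] = 0.0         # no stones left: the player to move has already lost

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def choose_move(stones):
    """Epsilon-greedy: usually pick the move that leaves the opponent
    in the lowest-valued position; occasionally explore."""
    moves = legal_moves(stones)
    if random.random() < EPSILON:
        return random.choice(moves)
    return min(moves, key=lambda m: V[stones - m])

for _ in range(EPISODES):
    stones = N_STONES
    while stones > 0:
        nxt = stones - choose_move(stones)
        # The mover's winning chances are one minus the opponent's
        # chances in the position the mover leaves behind.
        V[stones] += ALPHA * ((1.0 - V[nxt]) - V[stones])
        stones = nxt

# Multiples of 4 are the known losing positions for the player to move,
# so their learned values should come out much lower than the rest.
for s in (4, 8, 12, 5, 9, 13):
    print(s, round(V[s], 3))
```

After enough self-play games the values of positions that are multiples of four sink well below the others, which is the same learn-your-own-evaluation-function idea, scaled down enormously, that runs from Samuel’s and Tesauro’s programs to AlphaGo’s value network.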