An interesting (and sane) conversation about the defeat of AlphaGo by AlphaGo Zero. It almost completely avoids the science-fiction-y media coverage that tends to emphasise the potential for artificial general intelligence and instead focuses on the following key points:
- Go is a stupendously difficult board game for computers to play, but it’s a game in which both players have perfect information and the rules are relatively simple. That is not the situation in almost any real-world decision-making scenario, so mastering Go is necessarily a very narrow demonstration of what an intelligent machine can do.
- AlphaGo Zero represents an order-of-magnitude improvement over AlphaGo in both algorithmic modelling and power consumption. In other words, it does a lot more with a lot less.
- Related to this, AlphaGo Zero started from scratch, with humans providing only the rules of the game. Zero used reinforcement learning (rather than supervised learning on human games) to work out the same moves, and in some cases better ones, that human beings have discovered over the last thousand years or so.
- It’s an exciting achievement but shouldn’t be conflated with any significant step towards machine intelligence that transfers beyond highly constrained scenarios.
Here’s the abstract from the publication in Nature:
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
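To make the "becomes its own teacher" idea a little more concrete, here is a minimal toy sketch of learning from self-play when only the rules are given. It is emphatically not AlphaGo Zero's method: there is no neural network and no tree search, just tabular Q-learning on a tiny Nim-style game (take 1 to 3 stones, whoever takes the last stone wins). The game, the function names (`legal_moves`, `best_move`, `train`) and all the parameter values are assumptions made up for this illustration.

```python
import random

MAX_TAKE = 3    # assumed rule: a player removes 1..3 stones per turn
MAX_PILE = 10   # assumed largest starting pile used during training


def legal_moves(pile):
    """The rules of the game: the only knowledge supplied by a human."""
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(q, pile):
    """Greedy move for the player to act, according to the learned values."""
    return max(legal_moves(pile), key=lambda m: q[(pile, m)])


def train(episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Self-play: one value table plays both sides and learns from the results."""
    rng = random.Random(seed)
    # q[(pile, move)] estimates the outcome for the player making the move:
    # +1 means a win, -1 a loss. Everything starts at 0 (no prior knowledge).
    q = {(p, m): 0.0 for p in range(1, MAX_PILE + 1) for m in legal_moves(p)}
    for _ in range(episodes):
        pile = rng.randint(1, MAX_PILE)  # random start so every position is seen
        while pile > 0:
            if rng.random() < epsilon:
                move = rng.choice(list(legal_moves(pile)))  # explore
            else:
                move = best_move(q, pile)                   # exploit
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so their best result is our worst.
                target = -max(q[(remaining, m)] for m in legal_moves(remaining))
            q[(pile, move)] += alpha * (target - q[(pile, move)])
            pile = remaining
    return q


if __name__ == "__main__":
    q = train()
    for pile in (5, 6, 7):
        print(pile, "->", best_move(q, pile))
```

Starting from a table of zeros, the program should end up preferring moves that leave the opponent a multiple of four stones, which is the known winning strategy for this game. That is the narrow sense in which a system can rediscover human knowledge from the rules alone, and also a reminder of how constrained the setting has to be for this to work.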