“We reached a superior level of performance after training for just 72 hours with AlphaGo Zero,” he says. Only 4.9 million simulated games were needed to train Zero, compared to the 30 million used for the original AlphaGo. After those three days of learning, Zero was able to defeat the Lee Sedol-conquering version 100-0. After it had been playing the game for 40 days, Zero defeated DeepMind’s previous strongest version of AlphaGo, called Master, which beat Chinese master Ke Jie in May… The new system also uses only one neural network instead of two, and four of Google’s AI processors compared with the 48 needed to beat Lee.
“It is possible to train to superhuman level, without human examples or guidance, given no knowledge of the domain beyond basic rules,” the research paper concludes. The system learned common human moves and tactics and supplemented them with its own, more efficient moves. “It found these human moves, it tried them and then ultimately it found something it prefers,” Silver says.
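The core idea the paper describes — reaching strong play from self-play alone, given nothing but the rules — can be sketched in miniature. The toy below is my own illustration, not DeepMind's method: it swaps Go for tic-tac-toe and replaces the neural network and Monte Carlo tree search with simple tabular value updates, but the loop is the same in spirit: the agent generates all of its own training games by playing both sides, with no human examples.

```python
import random
from collections import defaultdict

# Winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

# Q maps (board, move) to an estimated value from the mover's perspective.
Q = defaultdict(float)

def choose(board, eps):
    # Epsilon-greedy: mostly play the best-known move, sometimes explore.
    moves = legal_moves(board)
    if random.random() < eps:
        return random.choice(moves)
    return max(moves, key=lambda m: Q[(board, m)])

def self_play_episode(alpha=0.5, eps=0.1):
    # One complete game in which the learner plays both X and O.
    board, player = ' ' * 9, 'X'
    history = []  # (board, move, player) for every move made
    while True:
        move = choose(board, eps)
        history.append((board, move, player))
        board = board[:move] + player + board[move + 1:]
        w = winner(board)
        if w or not legal_moves(board):
            # Monte Carlo update: +1 for the winner's moves,
            # -1 for the loser's, 0 for a draw.
            for b, m, p in history:
                r = 0.0 if w is None else (1.0 if p == w else -1.0)
                Q[(b, m)] += alpha * (r - Q[(b, m)])
            return w
        player = 'O' if player == 'X' else 'X'

random.seed(0)
for _ in range(20000):
    self_play_episode()

# The greedy opening move for X after training, learned without any
# human games — only the rules encoded above.
best_first = max(legal_moves(' ' * 9), key=lambda m: Q[(' ' * 9, m)])
print(best_first)
```

The scale here is of course nothing like Go, whose state space makes tabular methods hopeless — that gap is exactly what AlphaGo Zero's deep network and tree search bridge — but the sketch shows how a system can "find something it prefers" purely by playing itself.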
This article really gives a sense of the pace of progress in machine learning (in this case, reinforcement learning). Yes, the algorithms are limited. No, they cannot generalise across contexts. And they only work on very specific tasks in very narrowly constrained situations. The point is that they keep getting better, and the rate of change is accelerating.