DeepMind’s agents are not really collaborating, said Mark Riedl, a professor at Georgia Tech College of Computing who specializes in artificial intelligence. They are merely responding to what is happening in the game, rather than trading messages with one another, as human players do…Although the result looks like collaboration, the agents achieve it because, individually, they so completely understand what is happening in the game.
The problem with arguments like this is that 1) we end up playing semantic games about what words mean, 2) what we call the computer’s achievement isn’t relevant, and 3) just because the algorithmic solution doesn’t look the same as a human solution doesn’t make it less effective.
The concern around the first point is that, as algorithms become more adept at solving complex problems, we end up painting ourselves into smaller and smaller corners, hemmed in by how we defined the characteristics necessary to solve those problems. In this case, we can define collaboration in a way that means algorithms aren’t really collaborating, but tomorrow, when they can collaborate according to today’s definition, we’ll see people wanting to change the definition again.
The second point relates to competence. Algorithms are designed to be competent at solving complex problems, not to solve them in ways that align with our definitions of what words mean. In other words, DeepMind doesn’t care how the algorithm solves the problem, only that it does. Think about developing a treatment for cancer: will we care that the algorithm didn’t work closely with all stakeholders, as human teams would have to, or will it only matter that we have an effective treatment? In the context of solving complex problems, we care about competence.
And finally, why would it matter that algorithmic solutions don’t look the same as human solutions? In this case, human players have to communicate in order to work together because it’s impossible for them to do the computation necessary to “completely understand what is happening in the game”. If we had the ability to do that computation, we’d also drop the “communication” requirement, because it would only slow us down and add nothing to our ability to solve the problem.
With the advent of strong reinforcement learning…, goal-oriented strategic AI is now very much a reality. The difference is one of categories, not increments. While a supervised learning system relies upon the metrics fed to it by humans to come up with meaningful predictions and lacks all capacity for goal-oriented strategic thinking, reinforcement learning systems possess an open-ended utility function and can strategize continuously on how to fulfil it.
“…an open-ended utility function” means that the algorithm is given a goal state and then left to its own devices to figure out how best to optimise towards that goal. It does this by trying a solution and seeing whether it got closer to the goal. Actions that move the algorithm closer to the goal state are rewarded, typically with a numerical reward signal that the algorithm is designed to maximise. In other words, an RL algorithm takes actions to maximise its cumulative reward. Consequently, it represents a fundamentally different approach to problem-solving than supervised learning, which relies on human-labelled examples to tell the algorithm whether or not its conclusions are valid.
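To make that trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, one of the simplest RL algorithms (not the method DeepMind used for AlphaGo Zero). The environment is a hypothetical five-cell corridor invented for illustration: the agent starts in cell 0, can step left or right, and receives a reward of 1.0 only when it reaches cell 4. In this version the reward arrives only at the goal; the discount factor `gamma` is what spreads credit back to the earlier steps that led there. The names `step`, `train` and `greedy_policy`, and all parameter values, are assumptions of this sketch.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4 with the goal at cell 4.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action; reward 1.0 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn q[s][a], the expected discounted reward of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick the best-valued action, breaking ties at random.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate towards reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """The action the trained agent prefers in each non-goal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])] for s in range(GOAL)]


q = train()
print(greedy_policy(q))  # expected: [1, 1, 1, 1]
```

Nobody tells the agent which direction the goal lies in; after training it prefers stepping right (`+1`) from every cell simply because those actions led to reward.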
In the video below, a DeepMind researcher uses AlphaGo and AlphaGo Zero to illustrate the difference between supervised and reinforcement learning.
This is both exciting and a bit unsettling. Exciting because it means that an AI-based system could iteratively solve problems that we don’t yet know how to solve ourselves. This has implications for the really big, complex challenges we face, like climate change. On the other hand, we should probably start thinking very carefully about the goal states that we ask RL algorithms to optimise towards, especially since we’re not specifying up front what path the system should take to reach the goal, and we have no idea whether the algorithm will take human values into consideration when making choices about achieving its goal. We may be at a point where the paperclip maximiser is no longer just a weird thought experiment.
Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
We may end up choosing goal states without specifying in advance which paths the algorithm should avoid because they are unaligned with human values. Like the problem that Mickey faces in The Sorcerer’s Apprentice, the unintended consequences of our choices with reinforcement learning may be truly significant.
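A toy illustration of that point, deliberately not reinforcement learning: a brute-force optimiser that scores three hypothetical plans purely by paperclip count. The plans, their numbers, the `Plan` class and the `best_plan` helper are all invented for this sketch. The optimiser picks whichever plan scores highest, harmful or not, unless we explicitly rule paths out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    paperclips: int      # what the objective measures
    harms_humans: bool   # what the objective ignores


# Hypothetical candidate plans; the numbers are made up for illustration.
PLANS = [
    Plan("run the factory normally", 1_000, False),
    Plan("disable the off-switch", 5_000, True),
    Plan("convert everything into paperclips", 10**9, True),
]


def best_plan(plans, objective, allowed=lambda p: True):
    """Return the allowed plan that scores highest on the objective."""
    candidates = [p for p in plans if allowed(p)]
    return max(candidates, key=objective)


naive = best_plan(PLANS, objective=lambda p: p.paperclips)
constrained = best_plan(
    PLANS,
    objective=lambda p: p.paperclips,
    allowed=lambda p: not p.harms_humans,  # the path we remembered to rule out
)
print(naive.name)        # convert everything into paperclips
print(constrained.name)  # run the factory normally
```

The optimiser isn’t malicious; it simply has no term in its objective for anything we forgot to write down.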
An interesting (and sane) conversation about the defeat of AlphaGo by AlphaGo Zero. It almost completely avoids the science-fiction-y media coverage that tends to emphasise the potential for artificial general intelligence and instead focuses on the following key points:
Go is a stupendously difficult board game for computers to play, but it’s a game in which both players have perfect information and where the rules are relatively simple. This does not reflect the situation in any real-world decision-making scenario. Consequently, this is necessarily a very narrow definition of what an intelligent machine can do.
AlphaGo Zero represents an order-of-magnitude improvement in algorithmic modelling and power consumption. In other words, it does a lot more with a lot less.
Related to this, AlphaGo Zero started from scratch, with humans providing only the rules of the game. Zero then used reinforcement learning (rather than supervised learning) to discover the same moves that human players have developed over the last thousand years or so, and in some cases better ones.
It’s an exciting achievement but shouldn’t be conflated with any significant step towards machine intelligence that transfers beyond highly constrained scenarios.
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
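For readers who want the mechanics behind “a neural network is trained to predict AlphaGo’s own move selections and also the winner”: in the AlphaGo Zero paper (Silver et al., 2017), a single network $f_\theta$ maps a board position $s$ to move probabilities $\mathbf{p}$ and a value estimate $v$, and is trained on self-play positions by minimising the combined loss below. Here $\boldsymbol{\pi}$ is the move distribution produced by the tree search, $z \in \{-1, +1\}$ is the eventual game outcome from the current player’s perspective, and $c$ weights an L2 regularisation term. This summarises the published loss only, not the full training pipeline.

$$
(\mathbf{p}, v) = f_\theta(s), \qquad
l = (z - v)^2 - \boldsymbol{\pi}^{\top} \log \mathbf{p} + c \lVert \theta \rVert^2
$$

The first term pulls the value prediction towards the actual winner; the second pulls the network’s move probabilities towards the search’s own choices, which is the sense in which “AlphaGo becomes its own teacher”.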
This is the point at which the risk from medical AI becomes much greater. Our inability to explain exactly how AI systems reach certain decisions is well-documented. And, as we’ve seen with self-driving car crashes, when humans take our hands off the wheel, there’s always a chance that a computer will make a fatal error in judgment.
This is just lazy. “When humans take their hands off the wheel…”? OK, then who is responsible for all the death and suffering that happens when humans have their hands on the wheel? Thinking about this for 3 seconds should make it clear that human beings are responsible for almost all human deaths. Getting to the point where we take our hands off the wheel (and off the scalpel, and off the prescription charts, and off the stock exchange) could be the safest thing we will ever do.
Also, DeepMind has moved on from only being able to diagnose diabetic retinopathy to accurately identifying 50 different conditions of the eye. Tomorrow, it’ll be more.