The AI Threat to Democracy

With the advent of strong reinforcement learning…, goal-oriented strategic AI is now very much a reality. The difference is one of categories, not increments. While a supervised learning system relies upon the metrics fed to it by humans to come up with meaningful predictions and lacks all capacity for goal-oriented strategic thinking, reinforcement learning systems possess an open-ended utility function and can strategize continuously on how to fulfil it.

Source: Krumins, A. (2018). The AI Threat to Democracy.

“…an open-ended utility function” means that the algorithm is given a goal state and then left to its own devices to figure out how best to optimise towards that goal. It does this by trying an action and seeing whether it got closer to the goal. Every step that moves the algorithm closer to the goal state is rewarded (typically with a numerical reward signal that the algorithm is built to maximise). In other words, an RL algorithm takes actions to maximise reward. Consequently, it represents a fundamentally different approach to problem-solving than supervised learning, which relies on human-provided labels to tell the algorithm whether or not its conclusions are valid.
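To make the reward loop concrete, here is a minimal sketch, assuming tabular Q-learning (one common RL method; neither quoted source names a specific algorithm). The corridor environment, the function names and every parameter value are hypothetical and chosen only for illustration. The agent is told nothing except that it receives a reward of 1 when it reaches the rightmost cell.

```python
import random


def train_corridor_agent(n_states=6, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D corridor; the goal is the rightmost cell."""
    rng = random.Random(seed)
    goal = n_states - 1
    moves = (-1, +1)  # action 0 steps left, action 1 steps right
    # q[state][action] is the agent's current estimate of future reward.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                # Explore: occasionally try a random action.
                action = rng.randrange(2)
            else:
                # Exploit: take the best-looking action, ties broken at random.
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])

            # The environment responds; walls keep the agent inside the corridor.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Nudge the estimate towards reward plus discounted future value.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])

            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Preferred action in each non-goal cell according to the learned values."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

Nobody labels individual moves as right or wrong here. The only feedback is the reward at the goal, and the preference for stepping right in every cell emerges from trial and error.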

In the video below, a DeepMind researcher uses AlphaGo and AlphaGo Zero to illustrate the difference between supervised and reinforcement learning.

This is both exciting and a bit unsettling. Exciting because it means that an AI-based system could iteratively solve problems that we don’t yet know how to solve ourselves. This has implications for the really big, complex challenges we face, like climate change. On the other hand, we should probably start thinking very carefully about the goal states that we ask RL algorithms to optimise towards, especially since we’re not specifying up front what path the system should take to reach the goal, and we have no idea whether the algorithm will take human values into consideration when making choices about achieving its goal. We may be at a point where the paperclip maximiser is no longer just a weird thought experiment.

Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

Source: Bostrom, N. (2003). Ethical Issues in Advanced Artificial Intelligence.

We may end up choosing goal states without specifying in advance what paths the algorithm should not take because they would be unaligned with human values. Like the problem that Mickey faces in the Sorcerer’s Apprentice, the unintended consequences of our choices with reinforcement learning may be truly significant.

The OpenAI Dota 2 bots just defeated a team of former pros – The Verge

Those artificial intelligence agents learned everything by themselves, exploring and experimenting on the complex Dota playing field at a learning rate of 180 years per day

Source: The OpenAI Dota 2 bots just defeated a team of former pros – The Verge

Here is yet another important idea that’s often lost in the noise of reporting on AI: machine learning takes place at a rate that we can’t match. And as computation gets more powerful at marginal increases in cost, machine learning will continue accelerating. In four months the AI bots developed to the point where they were beating human players who had many years of experience at the professional level.

The developers of OpenAI noted that the OpenAI Five were losing to amateur players within their team back in May. By June, the AI had matured enough to defeat casual players, and today, it’s shown that it’s capable of overwhelming people who’ve been playing Dota 2 literally since its inception.

Another thing to take note of: Dota 2 is a complex strategy game that requires skill in planning, navigation, and decision-making. Even if algorithms do nothing but continue advancing in terms of efficiency and brute-force computation, it’s obvious that they’re going to outpace human beings in every task that requires what we call “intelligence”.

Update (28-08-18): When OpenAI put their bots up against a team of professional Dota 2 players, it became clear that humans still have the upper hand. See also this post on the more technical aspects of how the OpenAI bots play.