Human Compatible: Artificial Intelligence and the Problem of Control

Stuart Russell’s newest work, Human Compatible: Artificial Intelligence and the Problem of Control, is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.

Perry, L. (2019). AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control with Stuart Russell. Future of Life Institute.

This episode of the Future of Life Institute’s podcast series on AI alignment is an interview with Stuart Russell on his new book, Human Compatible: Artificial Intelligence and the Problem of Control.

The control problem is about ensuring that AI is aligned with human values, which is difficult when we can’t really define what these are.

It’s really hard to specify in advance what we mean when we say “human values” because the answer is likely to differ depending on which humans we ask. This is a significant problem in health systems, where clinical AI will increasingly make decisions that affect patient outcomes, given how many points within that system involve ethical judgement influencing the choices being made. For example:

  • Micro: What is the likely prognosis for this patient? Do we keep them in the expensive ICU considering that the likelihood of survival is 37%, or do we move them onto the ward? Or send them home for palliative care? These all have cost implications that are weighted differently depending on the confidence we have in the predicted prognosis.
  • Macro: How are national health budgets developed? Do we invest more in infrastructure that is high impact (saves lives, usually in younger patients) but which touches relatively few people, or in services (like physiotherapy) that improve quality of life for many more patients, who may nonetheless be unlikely to contribute to the state’s revenue base?

An example of tool AI is a system that aims to predict who is likely to be readmitted to hospital following discharge. It provides answers but can’t take action.

In the context of tool AI it’s relatively simple to specify what the utility function should be. In other words, we can be quite confident that we can simply tell the system what the goal is and then reward it when it achieves that goal. As Russell says, “this works when machines are stupid.” If the AI gets the goal wrong it’s not a big deal because we can reset it and then try to figure out where the mistake happened. Over time we can keep iterating until the goal that’s achieved by the system starts to approximate the goal we care about.
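To make the idea of a fully specified objective concrete, here is a minimal sketch of a readmission-risk tool. Everything in it is an assumption made for illustration: the `Patient` features, the weights and the scoring rule are made up, not taken from Russell’s book or from any real clinical model. The point is only that the tool returns a number and that its utility function can be written down completely in advance.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical features, chosen only for illustration.
    age: int
    prior_admissions: int
    length_of_stay_days: int


# Made-up weights; a real model would be fitted to data.
WEIGHTS = {"age": 0.02, "prior_admissions": 0.6, "length_of_stay_days": 0.05}
BIAS = -3.0


def readmission_risk(p: Patient) -> float:
    """Return a probability in (0, 1). The tool only answers; it never acts."""
    z = (
        BIAS
        + WEIGHTS["age"] * p.age
        + WEIGHTS["prior_admissions"] * p.prior_admissions
        + WEIGHTS["length_of_stay_days"] * p.length_of_stay_days
    )
    return 1.0 / (1.0 + math.exp(-z))


def utility(predicted: float, readmitted: bool) -> float:
    """Fixed, fully specified objective: negative squared prediction error."""
    target = 1.0 if readmitted else 0.0
    return -((predicted - target) ** 2)
```

If the predictions turn out to be poor, nothing has been done to a patient: we can adjust the weights, score the tool again with `utility`, and repeat.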

But at some point we’re going to move towards clinical AI that makes a decision and then acts on it, which is where we need to have a lot more trust that the system is making the “right choice”. In this context, “right” means a choice that’s aligned with human values. For example, we may decide that in certain contexts the cost of an intervention shouldn’t be considered (because it’s the outcome we care about and not the expense), whereas in other contexts we really do want to say that certain interventions are too expensive relative to the expected outcomes.

See also The Guardian’s book review of Human Compatible.

Since we can’t specify up front what the “correct” decision in certain kinds of ethical scenarios should be (because the answer is almost always, “it depends”) we need to make sure that clinical AI really is aligned with what we care about. But if we can’t use formal rules to determine how AI should integrate human values into its decision-making, how do we move towards a point where we can trust the decisions – and actions – taken by machines?

Russell suggests that, rather than begin with the premise that the AI has perfect knowledge of the world and of our preferences, we could begin with an AI that knows something about our contextual preferences but doesn’t fully understand them. In this context the AI model has only imperfect or partial knowledge of the objective, which means that it can never be certain whether it has achieved it. This may lead to situations where the AI must always first check in with a human being because it never knows what the full objective is or whether it has been achieved.
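One way to picture this is an agent that holds several hypotheses about what we want and defers to a human when those hypotheses disagree. The sketch below is a toy illustration under stated assumptions, not Russell’s formal proposal: the `Action` class, the `cost_weight` hypotheses, the `min_confidence` threshold and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    benefit: float  # expected clinical benefit, arbitrary units
    cost: float     # expected cost, arbitrary units


def utility(action: Action, cost_weight: float) -> float:
    """Utility under one hypothesis about how much cost should matter."""
    return action.benefit - cost_weight * action.cost


def decide(actions: list[Action], belief: dict[float, float],
           min_confidence: float = 0.9) -> str:
    """Act only if enough belief mass agrees on the best action; else defer.

    `belief` maps candidate cost weights to probabilities that sum to 1.
    """
    support: dict[str, float] = {}
    for cost_weight, prob in belief.items():
        best = max(actions, key=lambda a: utility(a, cost_weight))
        support[best.name] = support.get(best.name, 0.0) + prob
    choice, mass = max(support.items(), key=lambda kv: kv[1])
    if mass >= min_confidence:
        return choice
    return "ask_human"


# Hypothetical options for a single patient.
actions = [
    Action("keep_in_icu", benefit=10.0, cost=8.0),
    Action("move_to_ward", benefit=6.0, cost=2.0),
]

# The agent is unsure whether cost should count at all in this context.
uncertain_belief = {0.0: 0.5, 1.0: 0.5}
print(decide(actions, uncertain_belief))
```

With the belief split evenly, the two hypotheses favour different actions, so the agent checks in rather than acting. If the human’s answers shift the belief strongly towards one hypothesis, the same function starts returning an action instead.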

Instead of building AI that is convinced of the correctness of its knowledge and actions, Russell suggests that we build doubt into our AI-based systems. Considering the high value of doubt in good decision-making, this is probably a good idea.

By Michael Rowe

I'm a lecturer in the Department of Physiotherapy at the University of the Western Cape in Cape Town, South Africa. I'm interested in technology, education and healthcare and look for places where these things meet.

1 reply on “Human Compatible: Artificial Intelligence and the Problem of Control”

Thanks for yet another highly interesting post on a key issue we will have to deal with in healthcare and society at large. I really enjoy following your blogposts. In regard to this one, I really like the idea of building doubt into AI. Before reading the final two sentences, my first thought was not to give AI the ability to decide, but merely to inform. I still lean to that personally, but am afraid it’s not likely to happen that way. Next I was thinking that maybe the key thing we value is rarely actually the decision itself, but the possibility of continuously discussing, or questioning it and answering it otherwise. I suppose this aligns with the idea re doubt and AI having to check in.
