Stuart Russell’s newest work, Human Compatible: Artificial Intelligence and the Problem of Control, is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which bring us to a better understanding of what it will take to create beneficial machine intelligence.Perry, L. (2019). AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control with Stuart Russell. Future of Life Institute.
It’s really hard to specify in advance what we mean when we say “human values” because it’s something that’s likely to be different depending on which humans we ask. This is a significant problem in health systems when clinical AI will increasingly make decisions that affect patient outcomes, considering all the points within that system where ethical judgement influences the choices being made. For example:
- Micro: What is the likely prognosis for this patient? Do we keep them in the expensive ICU considering that the likelihood of survival is 37%, or do we move them onto the ward? Or send them home for palliative care? These all have cost implications that are weighted differently depending on the confidence we have in the predicted prognosis.
- Macro: How are national health budgets developed? Do we invest more in infrastructure that is high impact (saves lives, usually in younger patients) but which touches relatively few people, or in services (like physiotherapy) that help many more patients improve quality of life but who may be unlikely to contribute to the state’s revenue base?
In the context of tool AI it’s relatively simple to specify what the utility function should be. In other words we can be quite confident that we can simply tell the system what the goal is and then reward it when it achieves that goal. As Russell says, “this works when machines are stupid.” If the AI gets the goal wrong it’s not a big deal because we can reset it and then try to figure out where the mistake happened. Over time we can keep reiterating until the goal that’s achieved by the system starts to approximate the goal we care about.
But at some point we’re going to move towards clinical AI that makes a decision and then acts on it, which is where we need to have a lot more trust that the system is making the “right choice”. In this context, “right” means a choice that’s aligned with human values. For example, we may decide that in certain contexts the cost of an intervention shouldn’t be considered (because it’s the outcome we care about and not the expense), whereas in other contexts we really do want to say that certain interventions are too expensive relative to the expected outcomes.
Since we can’t specify up front what the “correct” decision in certain kinds of ethical scenarios should be (because the answer is almost always, “it depends”) we need to make sure that clinical AI really is aligned with what we care about. But, if we can’t use formal rules to determine how AI should integrate human values into its decision-making then how do we move towards a point where we can trust the decisions – and actions – taken by machines?
Russell suggests that, rather than begin with the premise that the AI has perfect knowledge of the world and of our preferences, we could begin with an AI that only knows something about our contextual preferences but that it doesn’t understand them. In this context the AI model only has imperfect or partial knowledge of the objective, which means that it can never be certain of whether it has achieved it. This may lead to situations where the AI must always first check in with a human being because it never knows what the full objective is or if it has been achieved.
Instead of building AI that is convinced of the correctness of its knowledge and actions, Russell suggests that we build doubt into our AI-based systems. Considering the high value of doubt in good decision-making, this is probably a good idea.