In 2009, Peter Singer wrote the first edition of The Life You Can Save to demonstrate why we should care about and help those living in global extreme poverty, and how easy it is to improve and even save lives by giving effectively.
This morning I listened to an 80,000 Hours podcast episode with Peter Singer and learned that, on the 10th anniversary of its publication, his book The Life You Can Save is now available as a free ebook and audiobook (you can get the audiobook as a podcast subscription, which is very convenient). Singer’s ideas in this book, and in Practical Ethics, have been hugely influential in my thinking and teaching, and I thought that more people might be interested in the ideas that he shares.
Click on the image below to get to the download page.
If we get to create robots that are also capable of feeling pain then that will be somewhere else that we have to push the circle of moral concern backwards because I certainly think we would have to include them in our moral concern once we’ve actually created beings with capacities, desires, wants, enjoyments, miseries that are similar to ours.
Peter Singer makes a compelling argument that sentient robots (assuming we reach the stage where we develop Artificial General Intelligence) ought to be treated in the same way that we treat each other, since they would exhibit the same capacity for pain, desire, joy, etc. as human beings.
I’m interested in what happens when we push the moral boundary further though, since there’s no reason to think that human beings represent any kind of ceiling on what’s possible when it comes to what can be felt and experienced. Will artificially created sentient beings deserve “more” or different rights than human beings, based on their increased capacity for experiencing a wider range of feelings than what is available to us? Will it get to the point where we are to AI-based systems what pigs are to us?
As we dig deeper, it seems that the problems faced by driverless cars and by human drivers are much the same. We try to avoid crashes and collisions, and we have to make split-second decisions when we can’t. Those decisions are governed by our programming and experience. The differences are that computers can think a lot faster, but they can also avoid many crashes that a human driver wouldn’t have avoided. These differences pull in different directions, but they don’t cancel each other out.
Sorrel, C. (2019). Self-Driving Mercedes Will Be Programmed To Sacrifice Pedestrians To Save The Driver. Fast Company.
Initially I thought that this was a presumptuous decision but, after thinking about it for a few seconds, I realised that this is exactly what I would do if I were the driver. And given that I’d likely have my family in the car, I’d double down on this choice. Regardless of how many different scenarios you come up with where it makes sense to sacrifice the vehicle occupants, the reality is that human drivers are making these choices every day, and we’re simply not capable of doing the calculations in real time. We’re going to react to save ourselves, every time.
Car manufacturers and software engineers should just program the car to save the driver, regardless of the complexity of the scenario, because this is what human drivers do, and we’re fine with it.
…for ML systems to truly be successful, they need to understand human values. More to the point, they need to be able to weigh our competing desires and demands, understand what outcomes we value most, and act accordingly.
This article identifies three kinds of autonomous agents that help us understand why this is an important question.
Reflex agents react to predetermined changes in the environment e.g. a thermostat regulates the temperature in a house.
Goal-based agents keep working until a predetermined goal has been reached e.g. analyse and identify every image in a set.
Utility-based agents make tradeoffs as part of following decision paths that maximise the total rewards e.g. route planning as part of a navigation app on your phone.
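The three agent types can be sketched in a few lines of Python. This is a toy illustration; the function names, thresholds and examples below are my own, not from the article:

```python
# Hypothetical sketches of the three agent types; names and numbers are
# illustrative only.

def reflex_agent(temperature, setpoint=20.0):
    """Reflex agent: reacts to a predetermined change in the environment."""
    return "heat_on" if temperature < setpoint else "heat_off"

def goal_based_agent(images, classify):
    """Goal-based agent: keeps working until every image is identified."""
    labels = {}
    while len(labels) < len(images):      # stop only when the goal is met
        for img in images:
            if img not in labels:
                labels[img] = classify(img)
    return labels

def utility_based_agent(options, utility):
    """Utility-based agent: picks the decision path with the highest reward."""
    return max(options, key=utility)
```

The thermostat never plans ahead, the classifier stops when its goal is satisfied, and the utility-based agent trades options off against each other, which is why it is the one that needs to know what we value.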
The post makes clear that current ML systems rely on utility-based agents but that these agents must assume a set of priorities that don’t change. To stick with the route planning example, you may want to take the longer route when driving because you prefer to save money even if it costs you in time. However, when you’re late for an important meeting you may value your time more than the money, in which case you’ll want the shorter – more expensive – route. In other words, your values are dynamic and change depending on the context.
We get around this now by being presented with options when we first identify the route we want to take; the phone tells us that the shorter route has a toll road but the longer route will add 15 minutes to our trip. The software is sophisticated enough to know that these are differences that matter to us, but it is impossible for it to know what option is best today, in this moment.
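One way to picture the gap: a hypothetical utility function whose weights encode how much time and money matter right now. Those weights are exactly what the software cannot know on a given day (all names and numbers below are invented):

```python
# Toll vs. free route from the example: the "right" choice depends on how the
# user weighs time against money today, which the software cannot know.
routes = {
    "toll": {"minutes": 30, "cost": 5.00},   # shorter but more expensive
    "free": {"minutes": 45, "cost": 0.00},   # 15 minutes longer, no toll
}

def best_route(weight_time, weight_money):
    """Pick the route with the lowest weighted cost for this context."""
    def weighted_cost(name):
        r = routes[name]
        return weight_time * r["minutes"] + weight_money * r["cost"]
    return min(routes, key=weighted_cost)

# Ordinary day: money matters more than time -> take the free road.
# Late for a meeting: time dominates -> pay the toll.
```

The software can compute `weighted_cost` perfectly; what it can’t do is set `weight_time` and `weight_money` for you, today, in this moment.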
This is just a simple example of finding an optimal route for you while driving, so imagine how complex the decision-making becomes when AI-based systems are implemented in hospitals and schools. While it makes sense to be asked what route you’d prefer when driving to a meeting, we can’t have situations where we’re asked every 5 minutes which of an arbitrary number of choices we’d prefer, given a wide variety of contexts. We’re going to have to give up some of the decision-making authority to machines. Which is why it really matters that we figure out how to get them to include human values in their choices.
Stuart Russell’s newest work, Human Compatible: Artificial Intelligence and the Problem of Control, is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.
It’s really hard to specify in advance what we mean when we say “human values” because it’s something that’s likely to be different depending on which humans we ask. This is a significant problem in health systems when clinical AI will increasingly make decisions that affect patient outcomes, considering all the points within that system where ethical judgement influences the choices being made. For example:
Micro: What is the likely prognosis for this patient? Do we keep them in the expensive ICU considering that the likelihood of survival is 37%, or do we move them onto the ward? Or send them home for palliative care? These all have cost implications that are weighted differently depending on the confidence we have in the predicted prognosis.
Macro: How are national health budgets developed? Do we invest more in infrastructure that is high impact (saves lives, usually in younger patients) but which touches relatively few people, or in services (like physiotherapy) that help many more patients improve quality of life but who may be unlikely to contribute to the state’s revenue base?
In the context of tool AI it’s relatively simple to specify what the utility function should be. In other words, we can be quite confident that we can simply tell the system what the goal is and then reward it when it achieves that goal. As Russell says, “this works when machines are stupid.” If the AI gets the goal wrong it’s not a big deal because we can reset it and then try to figure out where the mistake happened. Over time we can keep iterating until the goal that’s achieved by the system starts to approximate the goal we care about.
But at some point we’re going to move towards clinical AI that makes a decision and then acts on it, which is where we need to have a lot more trust that the system is making the “right choice”. In this context, “right” means a choice that’s aligned with human values. For example, we may decide that in certain contexts the cost of an intervention shouldn’t be considered (because it’s the outcome we care about and not the expense), whereas in other contexts we really do want to say that certain interventions are too expensive relative to the expected outcomes.
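To make the “it depends” concrete, here is a deliberately crude sketch. Every number below is invented (including the payoffs attached to the 37% figure from earlier): the point is only that the same prognosis supports different choices once the weight on cost changes with context:

```python
# Hypothetical numbers only: the same 37% prognosis points to different
# choices once the weight given to cost changes with the context.

def expected_value(option, cost_matters):
    """Crude expected utility: survival benefit, minus cost if it counts."""
    p, benefit, cost = option["p_survival"], option["benefit"], option["cost"]
    penalty = cost if cost_matters else 0.0
    return p * benefit - penalty

options = {
    "icu":        {"p_survival": 0.37, "benefit": 100.0, "cost": 50.0},
    "ward":       {"p_survival": 0.30, "benefit": 100.0, "cost": 10.0},
    "palliative": {"p_survival": 0.05, "benefit": 100.0, "cost": 2.0},
}

def choose(cost_matters):
    return max(options, key=lambda k: expected_value(options[k], cost_matters))

# cost_matters=False -> keep the patient in ICU; cost_matters=True -> the ward.
```

The hard part isn’t the arithmetic; it’s that no one can write down in advance when `cost_matters` should be true.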
Since we can’t specify up front what the “correct” decision in certain kinds of ethical scenarios should be (because the answer is almost always, “it depends”) we need to make sure that clinical AI really is aligned with what we care about. But, if we can’t use formal rules to determine how AI should integrate human values into its decision-making then how do we move towards a point where we can trust the decisions – and actions – taken by machines?
Russell suggests that, rather than begin with the premise that the AI has perfect knowledge of the world and of our preferences, we could begin with an AI that knows something about our contextual preferences but doesn’t fully understand them. In this framing the AI model has only imperfect or partial knowledge of the objective, which means that it can never be certain of whether it has achieved it. This may lead to situations where the AI must first check in with a human being, because it never knows what the full objective is or whether it has been achieved.
Instead of building AI that is convinced of the correctness of its knowledge and actions, Russell suggests that we build doubt into our AI-based systems. Considering the high value of doubt in good decision-making, this is probably a good idea.
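A toy version of that idea, reduced to a single confidence threshold (this simplification is mine, not Russell’s actual formulation): the agent holds a belief about which option the human prefers, and defers to the human whenever it isn’t confident enough:

```python
# My own simplification of "build doubt into the system": the agent keeps a
# probability for each candidate action and defers when none is dominant.

def act_or_ask(belief, ask_human, threshold=0.9):
    """belief: dict mapping each candidate action to P(human prefers it)."""
    action, confidence = max(belief.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return action                    # confident enough to act
    return ask_human(list(belief))       # otherwise check in with the human

# With belief {"route_a": 0.55, "route_b": 0.45} the agent asks;
# with {"route_a": 0.95, "route_b": 0.05} it acts on route_a.
```

The design choice is the threshold: set it too high and the AI pesters us constantly; too low and it acts on objectives it never fully understood.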
…we need social scientists with experience in human cognition, behavior, and ethics, and in the careful design of rigorous experiments. Since the questions we need to answer are interdisciplinary and somewhat unusual relative to existing research, we believe many fields of social science are applicable, including experimental psychology, cognitive science, economics, political science, and social psychology, as well as adjacent fields like neuroscience and law.
The development of AI and its implications across society is too important to leave to computer scientists, especially when it comes to AI safety and alignment. The uncertainty around how we think about human values makes them difficult to encode in software, since they involve human rationality, bias and emotion. But because the alignment of our values with AI systems is so fundamental to the ability of those systems to make good decisions, we need a wide variety of perspectives aimed at addressing the problem.
…in practice, ‘the robots are coming for our jobs’ usually means something more like ‘a CEO wants to cut his operating budget by 15 percent and was just pitched on enterprise software that promises to do the work currently done by thirty employees in accounts payable.’
It’s important to understand that “technological progress” is not an inexorable march towards an inevitable conclusion that we are somehow powerless to change. We – people – make decisions that influence where we’re going and to some extent, where we end up is evidence of what we value as a society.
The Organisation for Economic Co-operation and Development (OECD) has just released a list of recommendations to promote the development of AI that is “innovative and trustworthy and that respects human rights and democratic values”. The principles are meant to complement existing OECD standards around security, risk management and business practices, and could be seen as a response to concerns around the potential for AI systems to undermine democracy.
The principles were developed by a panel consisting of more than 50 experts from 20 countries, as well as leaders from business, civil society, academic and scientific communities. It should be noted that these principles are not legally binding and should be thought of as suggestions that might influence the decision-making of the stakeholders involved in AI development i.e. all of us. The OECD recognises that:
AI has pervasive, far-reaching and global implications that are transforming societies, economic sectors and the world of work, and are likely to increasingly do so in the future;
AI has the potential to improve the welfare and well-being of people, to contribute to positive sustainable global economic activity, to increase innovation and productivity, and to help respond to key global challenges;
And that, at the same time, these transformations may have disparate effects within, and between societies and economies, notably regarding economic shifts, competition, transitions in the labour market, inequalities, and implications for democracy and human rights, privacy and data protection, and digital security;
And that trust is a key enabler of digital transformation; that, although the nature of future AI applications and their implications may be hard to foresee, the trustworthiness of AI systems is a key factor for the diffusion and adoption of AI; and that a well-informed whole-of-society public debate is necessary for capturing the beneficial potential of the technology [my emphasis], while limiting the risks associated with it;
The recommendations identify five complementary values-based principles for the responsible stewardship of trustworthy AI (while these principles are meant to be general, they’re clearly also appropriate in the more specific context of healthcare):
AI should benefit people and the planet by driving inclusive growth, sustainable development and well-being.
AI systems should be designed in a way that respects the rule of law, human rights, democratic values and diversity, and they should include appropriate safeguards – for example, enabling human intervention where necessary – to ensure a fair and just society.
There should be transparency and responsible disclosure around AI systems to ensure that people understand AI-based outcomes and can challenge them.
AI systems must function in a robust, secure and safe way throughout their life cycles and potential risks should be continually assessed and managed.
Organisations and individuals developing, deploying or operating AI systems should be held accountable for their proper functioning in line with the above principles.
The OECD also provides five recommendations to governments:
Facilitate public and private investment in research & development to spur innovation in trustworthy AI.
Foster accessible AI ecosystems with digital infrastructure and technologies and mechanisms to share data and knowledge.
Ensure a policy environment that will open the way to deployment of trustworthy AI systems.
Empower people with the skills for AI and support workers for a fair transition.
Co-operate across borders and sectors to progress on responsible stewardship of trustworthy AI.
For a more detailed description of the principles, as well as the background and plans for follow-up and monitoring processes, see the OECD Legal Instrument describing the recommendations.
Is it acceptable for algorithms today, or an AGI in a decade’s time, to suggest withdrawal of aggressive care and so hasten death? Or alternatively, should it recommend persistence with futile care? The notion of “doing no harm” is stretched further when an AI must choose between patient and societal benefit. We thus need to develop broad principles to govern the design, creation, and use of AI in healthcare. These principles should encompass the three domains of technology, its users, and the way in which both interact in the (socio-technical) health system.
With the advent of strong reinforcement learning…, goal-oriented strategic AI is now very much a reality. The difference is one of categories, not increments. While a supervised learning system relies upon the metrics fed to it by humans to come up with meaningful predictions and lacks all capacity for goal-oriented strategic thinking, reinforcement learning systems possess an open-ended utility function and can strategize continuously on how to fulfil it.
“…an open-ended utility function” means that the algorithm is given a goal state and then left to its own devices to figure out how best to optimise towards that goal. It does this by trying a solution and seeing if it got closer to the goal. Every step that moves the algorithm closer to the goal state is rewarded (typically by a token that the algorithm is conditioned to value). In other words, an RL algorithm takes actions to maximise reward. Consequently, it represents a fundamentally different approach to problem-solving than supervised learning, which requires human intervention to tell the algorithm whether or not its conclusions are valid.
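The reward-maximisation loop described above can be sketched in a few lines. This is a minimal, assumption-laden Q-learning example; the environment, rewards and hyperparameters are all invented for illustration:

```python
import random

# Minimal Q-learning sketch (every name and number is illustrative): a
# corridor of 5 cells where cell 4 is the goal state and pays a reward of 1.
# The agent is never told the path, only the reward, and learns by trial.
random.seed(0)

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                        # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration

for episode in range(500):
    s = random.randrange(N_STATES - 1)    # start in a random non-goal cell
    for _ in range(100):                  # cap the episode length
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0    # reward arrives only at the goal
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS)
                              - q[(s, a)])
        s = s2
        if s == GOAL:
            break

# With enough episodes the greedy policy steps right (+1) from every
# non-goal cell, even though no one ever specified the route.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)]
```

Notice that we only ever specified the reward, never the path; that gap between “what we rewarded” and “how it got there” is exactly where the alignment worries below come in.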
In the video below, a DeepMind researcher uses AlphaGo and AlphaGo Zero to illustrate the difference between supervised and reinforcement learning.
This is both exciting and a bit unsettling. Exciting because it means that an AI-based system could iteratively solve problems that we don’t yet know how to solve ourselves. This has implications for the really big, complex challenges we face, like climate change. On the other hand, we should probably start thinking very carefully about the goal states that we ask RL algorithms to optimise towards, especially since we’re not specifying up front what path the system should take to reach the goal, and we have no idea if the algorithm will take human values into consideration when making choices about achieving its goal. We may be at a point where the paperclip maximiser is no longer just a weird thought experiment.
Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
We may end up choosing goal states without specifying in advance what paths the algorithm should not take because they would be unaligned with human values. Like the problem that Mickey faces in the Sorcerer’s Apprentice, the unintended consequences of our choices with reinforcement learning may be truly significant.
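A toy sketch of this failure mode, with entirely invented numbers: an optimiser scored only on the stated objective picks the plan a human would reject, and the outcome flips only once the unstated value is written into the reward itself:

```python
# Toy illustration (all numbers invented): a planner that maximises only
# "paperclips made" versus one whose reward also includes an unstated human
# value (a penalty for harm done along the way).
plans = [
    {"name": "modest",     "paperclips": 10,  "harm": 0},
    {"name": "aggressive", "paperclips": 100, "harm": 40},
    {"name": "ruthless",   "paperclips": 120, "harm": 500},
]

def best_plan(harm_weight):
    """Pick the plan maximising paperclips minus the weighted harm."""
    return max(plans, key=lambda p: p["paperclips"] - harm_weight * p["harm"])

# harm_weight = 0  -> the pure maximiser picks "ruthless", side effects be damned.
# harm_weight = 1  -> "aggressive"; harm_weight = 10 -> "modest".
```

If the harm term is left out of the reward, no amount of optimisation power will put it back in; that is the paperclip problem in miniature.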