10 recommendations for the ethical use of AI

In February the New York Times hosted the New Work Summit, a conference that explored the opportunities and risks associated with the emergence of artificial intelligence across all aspects of society. Attendees worked in groups to compile a list of recommendations for building and deploying ethical artificial intelligence, the results of which are listed below.

  1. Transparency: Companies should be transparent about the design, intention and use of their A.I. technology.
  2. Disclosure: Companies should clearly disclose to users what data is being collected and how it is being used.
  3. Privacy: Users should be able to easily opt out of data collection.
  4. Diversity: A.I. technology should be developed by inherently diverse teams.
  5. Bias: Companies should strive to avoid bias in A.I. by drawing on diverse data sets.
  6. Trust: Organizations should have internal processes to self-regulate the misuse of A.I. (e.g. a chief ethics officer, an ethics board, etc.).
  7. Accountability: There should be a common set of standards by which companies are held accountable for the use and impact of their A.I. technology.
  8. Collective governance: Companies should work together to self-regulate the industry.
  9. Regulation: Companies should work with regulators to develop appropriate laws to govern the use of A.I.
  10. “Complementarity”: Treat A.I. as a tool for humans to use, not a replacement for human work.

The list of recommendations seems reasonable enough on the surface, although I wonder how practical they are given the business models of the companies most active in developing AI-based systems. As long as Google, Microsoft, Facebook, etc. generate the bulk of their revenue from advertising powered by the data we give them, they have little incentive to be transparent, to disclose, or to be regulated. If we opt our data out of the AI training pool, the resulting algorithms are more susceptible to bias and less useful/accurate, so more data is almost always better for algorithm development; there's little incentive to make opting out easy. And relying on internal processes to build trust? That seems odd.

However, even though it’s easy to find issues with all of these recommendations it doesn’t mean that they’re not useful. The more of these kinds of conversations we have, the more likely it is that we’ll figure out a way to have AI that positively influences society.

Comment: In competition, people get discouraged by competent robots

After each round, participants filled out a questionnaire rating the robot’s competence, their own competence and the robot’s likability. The researchers found that as the robot performed better, people rated its competence higher, its likability lower and their own competence lower.

Lefkowitz, M. (2019). In competition, people get discouraged by competent robots. Cornell Chronicle.

This is worth noting since it seems increasingly likely that we'll soon be working not only with more competent robots but also with more competent software. There are already concerns around how clinicians will respond to the recommendations of clinical decision-support systems, especially when those systems make suggestions that are at odds with the clinician's intuition.

Paradoxically, the effect may be even worse with expert clinicians, who may not always be able to explain their decision-making. Novices, who use more analytical frameworks (or even basic algorithms like IF this, THEN that) may find it easier to modify their decisions because their reasoning is more “visible” (System 2). Experts, who rely more on subconscious pattern recognition (System 1), may be less able to identify where in their reasoning process they fell victim to confounders like confirmation or availability bias, and so may be less likely to modify their decisions.

It seems really clear that we need to start thinking about how we’re going to prepare current and future clinicians for the arrival of intelligent agents in the clinical context. If we start disregarding the recommendations of clinical decision support systems, not because they produce errors in judgement but because we simply don’t like them, then there’s a strong case to be made that it is the human that we cannot trust.


Contrast this with automation bias, which is the tendency to give more credence to decisions made by machines because of a misplaced notion that algorithms are simply more trustworthy than people.

MIT researchers show how to detect and address AI bias without loss in accuracy

The key…is often to get more data from underrepresented groups. For example…an AI model was twice as likely to label women as low-income and men as high-income. By increasing the representation of women in the dataset by a factor of 10, the number of inaccurate results was reduced by 40 percent.

Source: MIT researchers show how to detect and address AI bias without loss in accuracy | VentureBeat

What many people don't understand about algorithmic bias is that it can be corrected relatively easily, at least compared to the challenge of correcting bias in human beings. If machine learning outputs are biased, we can change the algorithm and we can change the datasets. What's the plan for changing human bias?
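As a very rough illustration of what "changing the dataset" can look like in practice, here's a minimal sketch that rebalances training data by oversampling an underrepresented group before fitting a model. The data, column names and model are all hypothetical; this is not the method used in the MIT work quoted above, just the general idea.

```python
# Minimal sketch of reducing bias by rebalancing training data: duplicate
# rows from an underrepresented group before fitting a model. The data,
# column names and model are hypothetical, for illustration only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic, heavily imbalanced dataset: 900 men, 100 women
df = pd.DataFrame({
    "gender": ["male"] * 900 + ["female"] * 100,
    "years_experience": rng.uniform(0, 20, 1000),
})
df["high_income"] = (df["years_experience"] + rng.normal(0, 3, 1000) > 10).astype(int)

def rebalance(data: pd.DataFrame, group_col: str, minority: str, factor: int) -> pd.DataFrame:
    """Duplicate rows belonging to the underrepresented group `factor` times."""
    minority_rows = data[data[group_col] == minority]
    return pd.concat([data] + [minority_rows] * (factor - 1), ignore_index=True)

balanced = rebalance(df, group_col="gender", minority="female", factor=10)

X = balanced[["years_experience"]]          # features (sensitive attribute excluded)
y = balanced["high_income"]
model = LogisticRegression().fit(X, y)
print(balanced["gender"].value_counts())    # groups are now roughly balanced
```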

The AI Threat to Democracy

With the advent of strong reinforcement learning…, goal-oriented strategic AI is now very much a reality. The difference is one of categories, not increments. While a supervised learning system relies upon the metrics fed to it by humans to come up with meaningful predictions and lacks all capacity for goal-oriented strategic thinking, reinforcement learning systems possess an open-ended utility function and can strategize continuously on how to fulfil it.

Source: Krumins, A. (2018). The AI Threat to Democracy.

“…an open-ended utility function” means that the algorithm is given a goal state and then left to its own devices to figure out how best to optimise towards that goal. It does this by trying a solution and seeing if it got closer to the goal. Every step that moves the algorithm closer to the goal state is rewarded (typically with a token that the algorithm is conditioned to value). In other words, an RL algorithm takes actions to maximise reward. Consequently, it represents a fundamentally different approach to problem-solving than supervised learning, which requires human intervention to tell the algorithm whether or not its conclusions are valid.
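To make the contrast concrete, here's a toy reinforcement learning sketch: tabular Q-learning on a made-up corridor environment (nothing to do with AlphaGo). The agent is never told the correct move; it only receives a reward for reaching the goal state and works out a policy by trial and error.

```python
# Toy reinforcement learning example: a 1-D corridor where the agent is
# rewarded only for reaching the goal state. No human labels the "right"
# move; the agent discovers it by trial and error. Entirely illustrative.
import random

N_STATES = 6          # states 0..5, goal is state 5
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Learned policy: move right (+1) from every non-goal state
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```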

In the video below, a DeepMind researcher uses AlphaGo and AlphaGo Zero to illustrate the difference between supervised and reinforcement learning.

This is both exciting and a bit unsettling. Exciting because it means that an AI-based system could iteratively solve problems that we don't yet know how to solve ourselves. This has implications for the really big, complex challenges we face, like climate change. On the other hand, we should probably start thinking very carefully about the goal states that we ask RL algorithms to optimise towards, especially since we're not specifying up front what path the system should take to reach the goal, and we have no idea if the algorithm will take human values into consideration when making choices about achieving its goal. We may be at a point where the paperclip maximiser is no longer just a weird thought experiment.

Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

Bostrom, N. (2003). Ethical Issues in Advanced Artificial Intelligence.

We may end up choosing goal states without specifying in advance what paths the algorithm should not take because they would be unaligned with human values. Like the problem that Mickey faces in the Sorcerer’s Apprentice, the unintended consequences of our choices with reinforcement learning may be truly significant.

When AI Misjudgment Is Not an Accident

The conversation about unconscious bias in artificial intelligence often focuses on algorithms that unintentionally cause disproportionate harm to entire swaths of society…But the problem could run much deeper than that. Society should be on guard for another twist: the possibility that nefarious actors could seek to attack artificial intelligence systems by deliberately introducing bias into them, smuggled inside the data that helps those systems learn.

Source: Yeung, D. (2018). When AI Misjudgment Is Not an Accident.

I’m not sure how this might apply to clinical practice but, given our propensity for automation bias, it seems that this is the kind of thing that we need to be aware of. It’s not just that algorithms will make mistakes but that people may intentionally set them up to do so by introducing biased data into the training dataset. Instead of hacking into databases to steal data, we may start seeing database hacks that insert new data into them, with the intention of changing our behaviour.
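As a crude sketch of what that kind of attack might look like, the example below flips a fraction of the labels for one class in a hypothetical training set and shows how the model's output shifts, without the attacker ever touching the model itself. The data, features and attack are entirely made up for illustration.

```python
# Illustrative label-flipping "poisoning" attack on hypothetical training data.
# The attacker never modifies the model; corrupting a slice of the dataset is enough.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical clinical-style dataset: one feature, one binary outcome
df = pd.DataFrame({"biomarker": rng.normal(0, 1, 2000)})
df["disease"] = (df["biomarker"] + rng.normal(0, 0.5, 2000) > 0).astype(int)

clean_model = LogisticRegression().fit(df[["biomarker"]], df["disease"])

# Poison the data: flip the label on 20% of the positive cases
poisoned = df.copy()
flip = poisoned[poisoned["disease"] == 1].sample(frac=0.2, random_state=0).index
poisoned.loc[flip, "disease"] = 0

poisoned_model = LogisticRegression().fit(poisoned[["biomarker"]], poisoned["disease"])

# The same patient now gets a noticeably lower estimated probability of disease
patient = pd.DataFrame({"biomarker": [0.3]})
print("clean:   ", clean_model.predict_proba(patient)[0, 1])
print("poisoned:", poisoned_model.predict_proba(patient)[0, 1])
```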

What this suggests is that bias is a systemic challenge—one requiring holistic solutions. Proposed fixes to unintentional bias in artificial intelligence seek to advance workforce diversity, expand access to diversified training data, and build in algorithmic transparency (the ability to see how algorithms produce results).

Mozilla’s Common Voice project

Any high-quality speech-to-text engine requires thousands of hours of voice data for training, but publicly available voice data is very limited and the cost of commercial datasets is exorbitant. This prompted the question: how might we collect large quantities of voice data for Open Source machine learning?

Source: Branson, M. (2018). We’re intentionally designing open experiences, here’s why.

One of the big problems with the development of AI is that few organisations have the large, inclusive, diverse datasets that are necessary to reduce the inherent bias in algorithmic training. Mozilla's Common Voice project is an attempt to create a large, multi-language dataset of human voices with which to train natural language AI.

This is why we built Common Voice. To tell the story of voice data and how it relates to the need for diversity and inclusivity in speech technology. To better enable this storytelling, we created a robot that users on our website would “teach” to understand human speech by speaking to it through reading sentences.

I think that voice and audio are probably going to be the next computer-user interface, so this is an important project to support if we want to make sure that Google, Facebook, Baidu and Tencent don't have a monopoly on natural language processing. I see this project existing on the same continuum as OpenAI, which aims to ensure that “…AGI’s benefits are as widely and evenly distributed as possible.” Whatever you think about the possibility of AGI arriving anytime soon, I think it's a good thing that people are working to ensure that the benefits of AI aren't mediated by a few gatekeepers whose primary function is to increase shareholder value.

Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Common Voice, a project to help make voice recognition open and accessible to everyone. Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web. Read a sentence to help machines learn how real people speak. Check the work of other contributors to improve the quality. It’s that simple!

The datasets are openly licensed and available for anyone to download and use, alongside other open language datasets that Mozilla links to on the page. This is an important project that everyone should consider contributing to. The interface is intuitive and makes it very easy either to submit your own voice or to validate the recordings that other people have made. Why not give it a go?
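If you'd rather explore the data than contribute a recording, a downloaded release can be inspected with a few lines of code. The sketch below assumes a locally extracted Common Voice archive containing a validated.tsv metadata file with columns like path and sentence; the exact layout and column names may differ between releases.

```python
# Sketch: peek at a locally downloaded Common Voice release.
# Assumes the archive has been extracted to ./cv-corpus/en with a
# validated.tsv metadata file; file layout may vary between releases.
import pandas as pd

meta = pd.read_csv("cv-corpus/en/validated.tsv", sep="\t")

print(len(meta), "validated clips")
print(meta[["path", "sentence"]].head())   # audio filename + the sentence read aloud

# Rough sense of how diverse the contributions are (these columns are optional,
# so guard against releases that don't include them)
for col in ("gender", "age", "accent"):
    if col in meta.columns:
        print(meta[col].value_counts(dropna=True).head())
```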

The Future of Artificial Intelligence Depends on Trust

To open up the AI black box and facilitate trust, companies must develop AI systems that perform reliably — that is, make correct decisions — time after time. The machine-learning models on which the systems are based must also be transparent, explainable, and able to achieve repeatable results.

Source: Rao, A. & Cameron, E. (2018). The Future of Artificial Intelligence Depends on Trust.

It still bothers me that we insist on explainability for AI systems while we’re quite happy for the decisions of clinicians to remain opaque, inaccurate, and unreliable. We need to move past the idea that there’s anything special about human intuition and that algorithms must satisfy a set of criteria that we would never dream of applying to ourselves.

Want Less-Biased Decisions? Use Algorithms.

At the heart of this work is the concern that algorithms are often opaque, biased, and unaccountable tools being wielded in the interests of institutional power. So how worried should we be about the modern ascendance of algorithms?

These critiques and investigations are often insightful and illuminating, and they have done a good job in disabusing us of the notion that algorithms are purely objective. But there is a pattern among these critics, which is that they rarely ask how well the systems they analyze would operate without algorithms. And that is the most relevant question for practitioners and policy makers: How do the bias and performance of algorithms compare with the status quo? Rather than simply asking whether algorithms are flawed, we should be asking how these flaws compare with those of human beings.

Source: Miller, P. (2018). Want Less-Biased Decisions? Use Algorithms.

From where I'm standing this isn't even news. Anyone who has worked with other human beings has first-hand experience of our ability to make bad choices. In retrospect, we look back at those decisions and wonder how it was possible for anyone to be so blind to what was obviously an awful decision. And we're predictable in how consistently we make those bad choices. To think that there is something special about human intelligence is to wilfully ignore the evidence.

In all the examples mentioned…, the humans who used to make decisions were so remarkably bad that replacing them with algorithms both increased accuracy and reduced institutional biases.

Yes, algorithms are biased, but they aren't any more biased than human beings. In fact, the evidence seems to show that they are less biased, more accurate and faster to reach conclusions than we are. There's nothing special about having a human in the decision-making loop, and sometimes I wonder whether this requirement will simply add more noise to the system. Whereas an algorithm can support its decision with a direct link back to the data, we'll never really know what informs human-derived conclusions. We're probably moving towards a future where trust in machines will be the norm, and this is going to have implications for how we prepare future healthcare professionals for clinical decision-making.

Defensive Diagnostics: the legal implications of AI in radiology

Doctors are human. And humans make mistakes. And while scientific advancements have dramatically improved our ability to detect and treat illness, they have also engendered a perception of precision, exactness and infallibility. When patient expectations collide with human error, malpractice lawsuits are born. And it’s a very expensive problem.

Source: Defensive Diagnostics: the legal implications of AI in radiology

There are a few things to note in this article. The first, and most obvious, is that we have a much higher standard for AI-based expert systems (i.e. algorithmic diagnosis and prediction) than we do for human experts. Our expectations for algorithmic clinical decision-making are far more exacting than those we have for physicians, and it seems strange that we accept the fallibility of human beings but expect nothing less than perfection from AI-based systems. [1]

Medical errors are more frequent than anyone cares to admit. In radiology, the retrospective error rate is approximately 30% across all specialities, with real-time error rates in daily practice averaging between 3% and 5%.

The second takeaway is that one of the most significant areas of influence for AI in clinical settings may not be the primary diagnosis but rather the follow-up analysis that highlights potential mistakes the clinician may have made. These applications of AI for secondary diagnostic review will be cheap and won't add any additional workload for healthcare professionals. They will simply review the clinician's conclusion and flag those cases that may benefit from additional testing. Of course, this will probably be driven by patient litigation.
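A secondary-review workflow along those lines could be as simple as comparing a model's estimated probability with the clinician's recorded conclusion and flagging disagreements for another look. The sketch below is entirely hypothetical; the threshold, field names and logic are mine, not the article's.

```python
# Hypothetical sketch of AI-assisted secondary diagnostic review: the model
# never makes the primary call, it only flags cases where it strongly
# disagrees with the clinician's recorded conclusion.
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    clinician_diagnosis: bool   # True = abnormality reported by the clinician
    model_probability: float    # model's estimated probability of abnormality

DISAGREEMENT_THRESHOLD = 0.8    # arbitrary; in practice this would need validation

def needs_second_look(case: Case) -> bool:
    """Flag cases where the model is confident the clinician's call is wrong."""
    if case.clinician_diagnosis and case.model_probability < 1 - DISAGREEMENT_THRESHOLD:
        return True
    if not case.clinician_diagnosis and case.model_probability > DISAGREEMENT_THRESHOLD:
        return True
    return False

cases = [
    Case("A-001", clinician_diagnosis=False, model_probability=0.92),  # flagged
    Case("A-002", clinician_diagnosis=True,  model_probability=0.60),  # not flagged
]
for c in cases:
    if needs_second_look(c):
        print(f"{c.case_id}: recommend additional review")
```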


[1] Incidentally, the same principle seems to be true for self-driving cars; we expect nothing but a perfect safety record from autonomous vehicles but are quite happy with the status quo for human drivers (1.2 million traffic-related deaths in a single year). Where is the moral panic around the mass slaughter of human beings by human drivers? If an algorithm is only slightly safer than a human being behind the wheel of a car, it would result in thousands fewer deaths per year. And yet it feels like we're going to delay the introduction of autonomous cars until they meet some perfect standard. To me at least, that seems morally wrong.
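To put rough numbers on that claim, here's the back-of-the-envelope arithmetic using the 1.2 million figure cited above; the safety-improvement percentages are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the footnote above.
# 1.2 million annual traffic deaths is the figure cited in the text;
# the safety-improvement percentages are illustrative assumptions.
ANNUAL_TRAFFIC_DEATHS = 1_200_000

for improvement in (0.01, 0.05, 0.10):
    lives_saved = ANNUAL_TRAFFIC_DEATHS * improvement
    print(f"{improvement:.0%} safer than human drivers -> ~{lives_saved:,.0f} fewer deaths per year")
```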

Fairness matters: Promoting pride and respect with AI

We’re creating an open dataset that collects diverse statements from the LGBTIQ+ community, such as “I’m gay and I’m proud to be out” or “I’m a fit, happy lesbian that has just retired from a wonderful career” to help reclaim positive identity labels. These statements from the LGBTIQ+ community and their supporters will be made available in an open dataset, which coders, developers and technologists all over the world can use to help teach machine learning models how the LGBTIQ+ community speak about ourselves.

Source: Fairness matters: Promoting pride and respect with AI

It's easy to say that algorithms are biased, because they are. It's much harder to ask why they're biased. They're biased for many reasons, but one of the biggest contributors is that we simply don't have diverse and inclusive data sets to train them on. Human bias and prejudice are reflected in our online interactions: the way we speak to each other on social media, the things we write about on blogs, the videos we watch on YouTube, the stories we share and promote. Project Respect is an attempt to increase the set of inclusive and diverse training data for better and less biased machine learning.

Algorithms are biased because human beings are biased, and the ways that those biases are reflected back to us may be why we find them so offensive. Maybe we don’t like machine bias because of what it says about us.