AI Blindspots are oversights in a team’s workflow that can generate harmful unintended consequences. They can arise from our unconscious biases or structural inequalities embedded in society. Blindspots can occur at any point before, during, or after the development of a model. The consequences of blindspots are challenging to foresee, but they tend to have adverse effects on historically marginalized communities.
This is a good resource to help teams work through the different ways in which their plans around the use of AI may introduce problems into the system. All of these are relevant for health contexts. The resource covers three main areas.
There’s a lot going on in this article, some of which I agree with and some of which I think is not useful. For me, the takeaway is that AI-based systems in the workplace really have the potential to improve our interactions with each other, but that there will be powerful incentives to use them for surveillance of employees.
The article focuses on software that analyses the conversation between call centre agents and customers and provides on-screen guidance to the agent on how to “improve” the quality of the interaction. Using natural language processing to provide real-time feedback to call centre workers is, in my opinion, more like coaching than having an AI as “your boss”. We’re all biased, forgetful, get tired, have bad days, etc. and I think that a system that helped me to get around those issues would be useful.
The article presents this as some kind of dystopia where our decision making (or performance, or behaviour) will be subject to algorithmic manipulation. There are two things to note here:
We’re already subject to algorithmic manipulation (see Twitter, Netflix, email marketing, etc.);
Sometimes I want my performance to be optimised. When I’m running I get constant feedback on my pace, heart rate, distance, etc. all of which give me a sense of whether or not I’m working in an optimal zone for improving my cardiac fitness. Then I choose whether or not to adjust my pace, based on that real-time feedback.
Having said that, there are other aspects of the programme that move us into a more problematic scenario: one where your performance and behaviours (e.g. did you minimise the feedback and ignore it?) are reported to a supervisor, which may or may not influence your continued employment. This feels more like surveillance than coaching, where employees are less likely to use the system to improve their performance, and more likely to figure out how to avoid the punishment. When the aim of the system is to improve the relationship or interaction with customers it’s easier to get behind. But when it moves into judgement, it becomes more difficult to support.
This brings me to another aspect of the story that’s problematic: algorithms that evaluate performance against a set of metrics that are undefined or invisible to the user (i.e. you don’t know what you’re being compared to), and that then make a decision independently with a real-world consequence (e.g. you get fired because you’re “underperforming”). If supervisors regard the information from the system as representing some kind of ground truth and use it for their own decision making, it’s likely to have negative consequences. For example, when employees are ranked from “most productive” to “least productive” based on some set of criteria that were easy to optimise for but which may have limited validity, and this output is simply accepted as “the truth”, then it will essentially be the system making the decision rather than the supervisor.
But framing the problem as if it’s the algorithms that are the issue – “automated systems can dehumanize and unfairly punish employees” – misses the point that it’s human beings who are actually acting with agency in the real world. Unless we’re able to help people figure out how to use the information provided by algorithms, and understand that they don’t represent ground truth, we’re going to see more and more examples of people being taken out of the loop, with damaging consequences.
There’s another aspect of the story that I found worrying and it’s about the relationship between training data and user behaviour. In the example of the AI system that gives the user feedback on the quality of the conversation with the customer, the system uses different criteria to come up with an empathy score. When the agent scores low on empathy, the system suggests that they need to be more empathic. However, the way to do this is, apparently, to “mirror the customer’s mood”, which seems problematic for a few reasons:
If the customer is angry, should the agent reflect that anger back to them?
How do you determine the customer’s and agent’s moods?
Savvy employees will focus on getting higher empathy scores by using a checklist to work through the variables that the AI uses to calculate the score. But as a supervisor you don’t care about the empathy score, you care about satisfied customers. (See this earlier post about encouraging people to aim for higher scores on metrics, rather than the actual outcomes you care about.)
Using AI to correct for human biases is a good thing. But as more AI enters the workplace, executives will have to resist the temptation to use it to tighten their grip on their workers and subject them to constant surveillance and analysis.
I’m reading the collection of responses to John Brockman’s 2015 Edge.org question: What to think about machines that think and wanted to share an idea highlighted by Peter Norvig in his short essay called “Design machines to deal with the world’s complexity”.
Pessimists warn that we don’t know how to safely and reliably build large, complex AI systems. They have a valid point. We also don’t know how to safely and reliably build large, complex non-AI systems. We need to do better at predicting, controlling, and mitigating the unintended consequences of the systems we build.
For example, we invented the internal combustion engine 150 years ago, and in many ways it has served humanity well, but it has also led to widespread pollution, political instability over access to oil, more than a million traffic deaths per year, and (some say) a deterioration in the social cohesiveness of neighborhoods.
Norvig, P. (2015). Design machines to deal with the world’s complexity. In, Brockman, J. What to think about machines that think.
There’s a lot of justified concern about how we’re going to use AI in society in general, and healthcare in particular, but I think it’s important to point out that it does us no good to blame algorithms as if they had any agency (I’m talking about narrow, or weak AI, rather than artificial general intelligence, which will almost certainly have agency).
It’s human beings who will make choices about how this technology is used and, as with previous decisions, it’s likely that those choices will have unintended consequences. The next time you read a headline decrying the dangers presented by AI, take a moment to reflect on the dangers presented by human beings.
You can see the entirety of Norvig’s contribution here (all of the Edge.org responses are public), although note that the book chapters have different titles to the original contributions.
We can start by agreeing that algorithms are biased. Unfortunately, this is where most people stop and I think it’s because it presents a conclusion to a narrative that sits well with them. The narrative goes something like this: “Human beings are special and there are some things that computers will never be able to replicate.” The “algorithms-are-biased” conclusion helps to support that narrative because it’s a reason for why we shouldn’t delegate responsibility to them.
The thing is, algorithms are biased because they reflect the bias inherent to the data they’re trained on. Briefly, machine learning algorithms are trained on massive data sets that are generated by human beings. Sometimes the data sets are collated and labeled by people, and sometimes they’re generated through our interactions in the world. Either way, our implicit biases are encoded within those data sets and it is these biases that are reflected in the outcomes generated by the algorithms.
What’s great is that the bias in AI-based systems is often explicit (i.e. we can see it), which means that we can act on it and improve the outputs. Contrast this with human beings, who are often not even aware of the cognitive biases that nudge us towards predetermined outcomes. And if we can’t even recognise it in ourselves, how are we possibly going to reduce its influence? Working in groups – especially diverse groups – means that there may be others who can hold us to account and help us to recognise our biases. But even working in groups is no guarantee that we’ll avoid the trap of following our preconceived notions of what we think ought to happen. And we often make decisions alone, which means it’s even harder to recognise. Even when we want to do the right thing, we may not recognise when we’re not doing it.
One reason to be optimistic about algorithmic bias is that it’s relatively easy to correct. Once we see the bias in an algorithm we can make changes to it at different points in the system, from being more careful about gathering representative samples of data on which to train the algorithms, to modifying the software itself. And once that algorithm is less biased then everything it touches is also less biased. Try doing that with even a handful of people.
Here’s the thing that few people seem willing to confront: algorithms are biased because we are biased. But it’s so much easier to say that we can’t trust machine learning because it’s biased than to acknowledge that machine learning is simply making explicit the biases that we don’t want to see in ourselves. And this is one reason why finding biases in algorithms is a Good Thing. Because we can make the algorithm do better by holding it to a higher standard than we’re capable of holding ourselves to.
In February the New York Times hosted the New Work Summit, a conference that explored the opportunities and risks associated with the emergence of artificial intelligence across all aspects of society. Attendees worked in groups to compile a list of recommendations for building and deploying ethical artificial intelligence, the results of which are listed below.
Transparency: Companies should be transparent about the design, intention and use of their A.I. technology.
Disclosure: Companies should clearly disclose to users what data is being collected and how it is being used.
Privacy: Users should be able to easily opt out of data collection.
Diversity: A.I. technology should be developed by inherently diverse teams.
Bias: Companies should strive to avoid bias in A.I. by drawing on diverse data sets.
Trust: Organizations should have internal processes to self-regulate the misuse of A.I. Have a chief ethics officer, ethics board, etc.
Accountability: There should be a common set of standards by which companies are held accountable for the use and impact of their A.I. technology.
Collective governance: Companies should work together to self-regulate the industry.
Regulation: Companies should work with regulators to develop appropriate laws to govern the use of A.I.
“Complementarity”: Treat A.I. as tool for humans to use, not a replacement for human work.
The list of recommendations seems reasonable enough on the surface, although I wonder how practical they are given the business models of the companies most active in developing AI-based systems. As long as Google, Microsoft, Facebook, etc. are generating the bulk of their revenue from advertising that’s powered by the data we give them, they have little incentive to be transparent, to disclose, to be regulated, etc. If we opt our data out of the AI training pool, the AI is more susceptible to bias and less useful/accurate, so having more data is usually better for algorithm development. And having internal processes to build trust? That seems odd.
However, even though it’s easy to find issues with all of these recommendations it doesn’t mean that they’re not useful. The more of these kinds of conversations we have, the more likely it is that we’ll figure out a way to have AI that positively influences society.
After each round, participants filled out a questionnaire rating the robot’s competence, their own competence and the robot’s likability. The researchers found that as the robot performed better, people rated its competence higher, its likability lower and their own competence lower.
This is worth noting since it seems increasingly likely that we’ll soon be working, not only with more competent robots but also with more competent software. There are already concerns around how clinicians will respond to the recommendations of clinical decision-support systems, especially when those systems make suggestions that are at odds with the clinician’s intuition.
Paradoxically, the effect may be even worse with expert clinicians, who may not always be able to explain their decision-making. Novices, who use more analytical frameworks (or even basic algorithms like IF this, THEN that) may find it easier to modify their decisions because their reasoning is more “visible” (System 2). Experts, who rely more on subconscious pattern recognition (System 1), may be less able to identify where in their reasoning process they fell victim to confounders like confirmation or availability bias, and so less likely to modify their decisions.
It seems really clear that we need to start thinking about how we’re going to prepare current and future clinicians for the arrival of intelligent agents in the clinical context. If we start disregarding the recommendations of clinical decision support systems, not because they produce errors in judgement but because we simply don’t like them, then there’s a strong case to be made that it is the human that we cannot trust.
Contrast this with automation bias, which is the tendency to give more credence to decisions made by machines because of a misplaced notion that algorithms are simply more trustworthy than people.
The key…is often to get more data from underrepresented groups. For example…an AI model was twice as likely to label women as low-income and men as high-income. By increasing the representation of women in the dataset by a factor of 10, the number of inaccurate results was reduced by 40 percent.
What many people don’t understand about algorithmic bias is that it’s corrected quite easily, relative to the challenge of correcting bias in human beings. If machine learning outputs are biased, we can change the algorithm, and we can change the datasets. What’s the plan for changing human bias?
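To make the mechanics concrete, here’s a minimal sketch of the oversampling correction described in the quote above. The toy dataset and labels are entirely invented for illustration (real income datasets have many more features); the point is only to show how duplicating rows from an underrepresented group rebalances what a downstream model gets to learn from.

```python
from collections import Counter

# Hypothetical toy dataset: each row is (gender, income_label).
# Women are deliberately underrepresented (10 rows vs 100 for men),
# mirroring the kind of skew described in the quote above.
rows = ([("man", "high")] * 60 + [("man", "low")] * 40
        + [("woman", "high")] * 4 + [("woman", "low")] * 6)

print(Counter(g for g, _ in rows))  # man: 100, woman: 10

# One common correction: oversample the underrepresented group before
# training, here by duplicating the women's rows tenfold so both groups
# contribute equally to whatever model is fitted afterwards.
minority = [row for row in rows if row[0] == "woman"]
balanced = rows + minority * 9

print(Counter(g for g, _ in balanced))  # man: 100, woman: 100
```

Oversampling is only one option: the same rebalancing can be achieved by collecting more real data from the underrepresented group (preferable, since duplicated rows add no new information) or by weighting samples during training.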
With the advent of strong reinforcement learning…, goal-oriented strategic AI is now very much a reality. The difference is one of categories, not increments. While a supervised learning system relies upon the metrics fed to it by humans to come up with meaningful predictions and lacks all capacity for goal-oriented strategic thinking, reinforcement learning systems possess an open-ended utility function and can strategize continuously on how to fulfil it.
“…an open-ended utility function” means that the algorithm is given a goal state and then left to its own devices to figure out how best to optimise towards that goal. It does this by trying a solution and seeing if it got closer to the goal. Every step that moves the algorithm closer to the goal state is rewarded (typically by a token that the algorithm is conditioned to value). In other words, an RL algorithm takes actions to maximise reward. Consequently, it represents a fundamentally different approach to problem-solving than supervised learning, which requires human intervention to tell the algorithm whether or not its conclusions are valid.
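To illustrate the difference, here’s a minimal tabular Q-learning sketch on a toy “corridor” environment (states 0–4, goal at state 4). Everything here – the environment, the parameters, the names – is an invented teaching example, not anything from the article: the agent is never told the correct action, it only receives a reward on reaching the goal, and it still discovers a policy that gets there.

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4
ACTIONS = [+1, -1]                 # step right or left along the corridor
alpha, gamma, epsilon = 0.5, 0.9, 0.1

# Q[(state, action)] is the learned estimate of future reward.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):               # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0   # reward arrives only at the goal
        # standard Q-learning update: move estimate towards reward plus
        # discounted value of the best action from the next state
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The greedy policy the agent settles on: "move right" in every state
# on the way to the goal.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

Note that nowhere did a human label any state–action pair as correct; the behaviour emerges purely from maximising the reward signal, which is exactly why the choice of goal state (and reward) matters so much.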
In the video below, a Deepmind researcher uses AlphaGo and AlphaGo Zero to illustrate the difference between supervised and reinforcement learning.
This is both exciting and a bit unsettling. Exciting because it means that an AI-based system could iteratively solve for problems that we don’t yet know how to solve ourselves. This has implications for the really big, complex challenges we face, like climate change. On the other hand, we should probably start thinking very carefully about the goal states that we ask RL algorithms to optimise towards, especially since we’re not specifying up front what path the system should take to reach the goal, and we have no idea if the algorithm will take human values into consideration when making choices about achieving its goal. We may be at a point where the paperclip maximiser is no longer just a weird thought experiment.
Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
We may end up choosing goal states without specifying in advance what paths the algorithm should not take because they would be unaligned with human values. Like the problem that Mickey faces in the Sorcerer’s Apprentice, the unintended consequences of our choices with reinforcement learning may be truly significant.
The conversation about unconscious bias in artificial intelligence often focuses on algorithms that unintentionally cause disproportionate harm to entire swaths of society…But the problem could run much deeper than that. Society should be on guard for another twist: the possibility that nefarious actors could seek to attack artificial intelligence systems by deliberately introducing bias into them, smuggled inside the data that helps those systems learn.
I’m not sure how this might apply to clinical practice but, given our propensity for automation bias, it seems that this is the kind of thing that we need to be aware of. It’s not just that algorithms will make mistakes but that people may intentionally set them up to do so by introducing biased data into the training dataset. Instead of hacking into databases to steal data, we may start seeing database hacks that insert new data into them, with the intention of changing our behaviour.
What this suggests is that bias is a systemic challenge—one requiring holistic solutions. Proposed fixes to unintentional bias in artificial intelligence seek to advance workforce diversity, expand access to diversified training data, and build in algorithmic transparency (the ability to see how algorithms produce results).
Any high-quality speech-to-text engines require thousands of hours of voice data to train them, but publicly available voice data is very limited and the cost of commercial datasets is exorbitant. This prompted the question, how might we collect large quantities of voice data for Open Source machine learning?
One of the big problems with the development of AI is that few organisations have the large, inclusive, diverse datasets that are necessary to reduce the inherent bias in algorithmic training. Mozilla’s Common Voice project is an attempt to create a large, multilanguage dataset of human voices with which to train natural language AI.
This is why we built Common Voice. To tell the story of voice data and how it relates to the need for diversity and inclusivity in speech technology. To better enable this storytelling, we created a robot that users on our website would “teach” to understand human speech by speaking to it through reading sentences.
I think that voice and audio are probably going to be the next computer-user interface, so this is an important project to support if we want to make sure that Google, Facebook, Baidu and Tencent don’t have a monopoly on natural language processing. I see this project existing on the same continuum as OpenAI, which aims to ensure that “…AGI’s benefits are as widely and evenly distributed as possible.” Whatever you think about the possibility of AGI arriving anytime soon, I think it’s a good thing that people are working to ensure that the benefits of AI aren’t mediated by a few gatekeepers whose primary function is to increase shareholder value.
Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Common Voice, a project to help make voice recognition open and accessible to everyone. Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web. Read a sentence to help machines learn how real people speak. Check the work of other contributors to improve the quality. It’s that simple!
The datasets are openly licensed and available for anyone to download and use, alongside other open language datasets that Mozilla links to on the page. This is an important project that everyone should consider contributing to. The interface is intuitive and makes it very easy to either submit your own voice or to validate the recordings that other people have made. Why not give it a go?