Heaven, W.D. (2021). Hundreds of AI tools have been built to catch covid. None of them helped. MIT Technology Review.
In the end, many hundreds of predictive tools were developed. None of them made a real difference, and some were potentially harmful. That’s the damning conclusion of multiple studies published in the last few months. In June, the Turing Institute, the UK’s national center for data science and AI, put out a report summing up discussions at a series of workshops it held in late 2020. The clear consensus was that AI tools had made little, if any, impact in the fight against covid.
The article links to a few good overview studies and reports that provide more detail about the methodological flaws behind the tools referred to in the title. There’s also this excellent thread by Cory Doctorow on the garbage-in, garbage-out problem that we find in many ML studies more generally.
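To make the garbage-in, garbage-out point concrete: one failure mode the article highlights is models being evaluated on duplicates of their own training data, which makes them look far better than they are. Here’s a minimal, entirely synthetic sketch of that problem; the data and setup are made up for illustration and aren’t taken from any of the covid studies.

```python
# A hypothetical illustration of one "garbage in" failure mode: duplicate
# records leaking across the train/test split. Everything here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Noisy synthetic "patients": the features carry only a weak signal.
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=3.0, size=500) > 0).astype(int)

# Simulate a dataset spliced together from overlapping sources:
# every record ends up in it twice.
X_dup, y_dup = np.vstack([X, X]), np.concatenate([y, y])

def test_accuracy(features, labels):
    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

print(f"clean split:      {test_accuracy(X, y):.2f}")          # modest, honest score
print(f"leaky duplicates: {test_accuracy(X_dup, y_dup):.2f}")  # inflated, misleading score
```

The second number looks better only because the model has memorised test records it already saw during training; no clinical signal has been learned.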
First of all, it’s obviously great news that we’re identifying the areas where ML falls short of expectations. We cannot be in a situation where claims are simply accepted because they align with our hopes and beliefs. This is why we publish our methods: we want others to find the mistakes in our work. This is what progress looks like.
But I’ll also add that we don’t need AI to make false claims based on poor evidence; there’s good evidence that most of the research we publish isn’t worth paying attention to anyway. See Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124, for example. So there’s nothing special about using ML, or any other technique, to publish poor research; people have been doing it for far longer than we’ve been using machine learning.
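To give a sense of where that claim comes from: the core of Ioannidis’s argument is a simple calculation of the probability that a statistically significant finding is actually true, given the pre-study odds R that a tested relationship is real, the significance threshold α, and the study’s power. Here’s a back-of-the-envelope sketch; the formula is the one in the paper, but the parameter values are illustrative assumptions of mine.

```python
# A back-of-the-envelope sketch of the calculation at the heart of
# Ioannidis (2005): the positive predictive value (PPV) of a claimed finding.
# The parameter values below are illustrative, not taken from the paper.

def ppv(R, alpha=0.05, power=0.8):
    """PPV = (1 - beta) * R / (R - beta * R + alpha), with beta = 1 - power."""
    beta = 1 - power
    return power * R / (R - beta * R + alpha)

# If 1 in 10 hypotheses tested in a field is actually true, even a
# well-powered study leaves roughly a 62% chance a "finding" is real...
print(f"R = 1:10:  PPV = {ppv(0.10):.2f}")  # ~0.62
# ...and long-shot, exploratory fields fare far worse.
print(f"R = 1:100: PPV = {ppv(0.01):.2f}")  # ~0.14
```

Add in bias and multiple teams chasing the same result, and the paper shows the odds get worse still; none of that has anything to do with machine learning.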
And finally, it’s worth pointing out that all of the studies referenced in the MIT Technology Review article would have been conducted in the very earliest stages of the pandemic, with researchers trying to accelerate progress in an effort to limit the effects of a global virus outbreak. Journals relaxed publication criteria so that information could be shared as soon as possible, and the incentive structure around early publication of potentially groundbreaking research doesn’t exactly encourage slow and considered reflection. I’m not arguing that any of this is the way it should be, only that the problem is more complicated than simply highlighting the flaws in early papers. The response to Covid included a massive spike in related publications, and many of them would’ve relied on data that was gathered quickly and analysed poorly, in papers that were published in a hurry. No one is doing the analyses to show how little those articles have contributed to solving the problem.
From the excerpt above: “None of them made a real difference, and some were potentially harmful.” This is true. But we can say the same thing about medical practice for almost all of human history. The methods of medicine before the 20th century caused an enormous amount of pain and suffering. And yet we still have doctors. They just had to up their game.
We need more research that takes a critical position on the development of clinical interventions, whether or not those interventions include AI. That’s just good science. But we also need to make sure we don’t demonise a technology that’s being implemented poorly by people, in the same way that we don’t demonise cars when they’re involved in pileups. I feel like this is a point I’m going to keep having to make: there’s a ridiculous double standard at work when we evaluate the performance of AI while ignoring all the ways that people stuff things up.
These are early days for clinical and health AI, and there’s a lot that doesn’t work. Scientific progress is nothing if not a collection of stories about how we’ve failed. But every now and again, something works. And it only works because we’ve spent ages figuring out all the ways it doesn’t.