Separating the Art of Medicine from Artificial Intelligence

Writing a radiology report is an extreme form of data compression — you are converting around 2 megabytes of data into a few bytes, in effect performing lossy compression with a huge compressive ratio.

Source: Separating the Art of Medicine from Artificial Intelligence

For me, there were a few useful takeaways from this article. The first is that data analysis and interpretation are, in effect, data compression problems. The trick is to discard information that isn’t useful while preserving the relevant message. Consider the patient interview, where you take 15-20 minutes of audio (about 10-15 MB as mp3) and convert it to roughly a page of text (a few kilobytes at most). The subjective decisions we make about what to discard and what to highlight have a real impact on our final conclusions and management plans.
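To make those ratios concrete, here is a minimal back-of-the-envelope sketch in Python. The byte sizes are my own rough assumptions (a ~1 KB report rather than the article’s “few bytes”, a ~3 KB clinic note), not figures from the article.

```python
# Rough back-of-the-envelope compression ratios for the two examples above.
# All sizes are assumptions/approximations, not measurements.

def compression_ratio(input_bytes: float, output_bytes: float) -> float:
    """Ratio of raw input size to the size of the summary that survives."""
    return input_bytes / output_bytes

MB = 1_000_000
KB = 1_000

# Chest X-ray (~2 MB image) reduced to a short written report (assume ~1 KB of text).
xray_ratio = compression_ratio(2 * MB, 1 * KB)

# 15-20 minute patient interview as mp3 (~10-15 MB) reduced to ~1 page of notes (assume ~3 KB).
interview_ratio = compression_ratio(12.5 * MB, 3 * KB)

print(f"X-ray -> report:     ~{xray_ratio:,.0f}:1")
print(f"Interview -> note:   ~{interview_ratio:,.0f}:1")
```

On those assumptions the ratios come out in the low thousands to one, which is the point of the analogy: almost everything in the raw signal is thrown away, and what survives is a judgement call.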

Human radiologists are so bad at interpreting chest X-rays and/or agreeing what findings they can see, that the ‘report’ that comes with the digital image is often either entirely wrong, partially wrong, or omits information.

This is not just a problem in radiology. I haven’t looked for evidence of this, but from personal experience I have little doubt that the inter- and intra-rater reliability of physiotherapy assessment is similarly low. And even where the diagnosis and interventions are the same, there would likely be a lot of variation in how the report is worded and structured. This links to the last thing that I found thought-provoking:

…chest X-ray reports were never intended to be used for the development of radiology artificial intelligence. They were only ever supposed to be an opinion, an interpretation, a creative educated guess…A chest X-ray is neither the final diagnostic test nor the first, it is just one part of a suite of diagnostic steps in order to get to a clinical end-point.

We’re using unstructured medical data, captured in a variety of contexts, to train AI-based systems, but the data were never obtained, captured or stored in a system designed for that purpose. The implication is that the data we’re using to train medical AI simply isn’t fit for purpose. As long as we don’t collect the metadata (i.e. the contextual information “around” a condition), and continue to use poorly labelled information and non-standardised language, we’re going to have problems training machine learning algorithms. If we want AI-based systems to be anything more than basic triage tools, these are important problems to address.
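As a rough illustration of that gap, here is a hypothetical Python sketch contrasting the free-text report we actually train on with the kind of explicitly labelled, metadata-rich record a purpose-built dataset might contain. The field names and values are invented for illustration; they are not a real schema or ontology.

```python
# Hypothetical sketch of the gap between the data we have and the data we'd need.
# Field names and values are illustrative assumptions, not a real standard.
from dataclasses import dataclass, field

# What we typically have: free text written for another clinician, not for a model,
# with hedging, synonyms and the clinical context left implicit.
unstructured_report = (
    "Heart size upper limits of normal. Patchy opacity at the right base, "
    "could represent consolidation or atelectasis. Clinical correlation advised."
)

@dataclass
class LabelledFinding:
    """What a purpose-built training record might capture instead."""
    finding: str                   # standardised term from a shared vocabulary
    present: bool                  # explicit positive/negative label
    confidence: float              # reader certainty, not buried in hedging phrases
    clinical_context: dict = field(default_factory=dict)  # the metadata "around" the case

structured_example = LabelledFinding(
    finding="consolidation_right_lower_zone",
    present=True,
    confidence=0.6,
    clinical_context={"indication": "cough and fever", "prior_imaging": False},
)
```

The point is not this particular schema, but that the second form makes the label, the uncertainty and the surrounding context explicit, whereas the first leaves all three for an algorithm to guess at.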