https://ai.nejm.org/doi/full/10.1056/AIp2300031
“We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis; however, further improvements, validation, and addressing of ethical considerations are needed before clinical implementation.”
Just a reminder that this isn’t a model that’s been fine-tuned on medical databases (like MedPaLM, for example); this is the same GPT-4 that anyone can get access to with a free Microsoft account.
Also, a reminder that this technology didn’t exist a year ago.
Even if we get no further down the path of developing these tools, and this represents the best it’s ever going to get, the future looks interesting.
Caveat: the researchers are comparing diagnostic outcomes with medical-journal readers, not with expert clinicians. That doesn’t change the ‘correct diagnosis’ figure; GPT-4 still got 57% of the diagnoses right. It’s the ‘outperforming 99.98% of readers’ part that we needn’t pay too much attention to.