Michael Rowe

Trying to get better at getting better

Brodeur, P.G. et al. (n.d.). Superhuman performance of a large language model on the reasoning tasks of a physician.

…We sought to evaluate OpenAI’s o1-preview model, a model developed to increase run-time via chain of thought processes prior to generating a response. We characterize the performance of o1-preview with five experiments including differential diagnosis generation, display of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts with validated psychometrics… This study highlights o1-preview’s ability to perform strongly on tasks that require complex critical thinking such as diagnosis and management while its performance on probabilistic reasoning tasks was similar to past models.

This paper evaluates the performance of OpenAI’s o1-preview model on complex medical reasoning tasks, comparing it with GPT-4 and human physicians. The study found that o1-preview demonstrated superhuman performance in differential diagnosis, diagnostic clinical reasoning, and management reasoning, outperforming both previous AI models and human physicians in several areas. However, it showed no significant improvement over GPT-4 in probabilistic reasoning or triage differential diagnosis.

