Michael Rowe

Trying to get better at getting better

Article: Towards Expert-Level Medical Question Answering with Large Language Models

Singhal, K., et al. (2023). Towards Expert-Level Medical Question Answering with Large Language Models (arXiv:2305.09617). arXiv.

From the abstract: We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form “adversarial” questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

See also this brief overview from The Verge.

Here’s another one: Beam, Kristyn, et al. “Performance of a Large Language Model on Practice Questions for the Neonatal Board Examination.” JAMA Pediatrics, July 2023.

