Comment: Lessons learned building natural language processing systems in health care

Many people make the mistake of assuming that clinical notes are written in English. That happens because that’s how doctors will answer if you ask them what language they use.

Talby, D. (2019). Lessons learned building natural language processing systems in health care. O’Reilly.

This is an interesting post making the point that medical language – especially when written in clinical notes – is not the same as other, more typical, human languages. This is important to recognise in the context of training natural language processing (NLP) models in the healthcare context because medical languages have different vocabularies, grammatical structure, and semantics. Trying to get an NLP system to “understand”* medical language is a fundamentally different problem to understanding other languages.

The lessons from this article are slightly technical (although not difficult to follow) and do a good job highlighting why NLP in health systems is seeing slower progress than the NLP running on your phone. You may think that, since Google Translate does quite well translating between English and Spanish, for example, it should also be able to translate between English and “Radiography”. This article explains why that problem is not only harder than “normal” translation, but also different.

* Note: I’m saying “understand” while recognising that current NLP systems understand nothing. They’re statistically modelling the likelihood that certain words follow certain other words and have no concept of what those words mean.