Machine learning (ML) – and more recently deep learning (DL) – are transforming the field of natural language processing (NLP) in clinical data management, allowing more and more complex use cases to be tackled.
One such example is medical coding, or mapping reported medical terms such as adverse events and medications to the corresponding entry of a standard dictionary, a workflow that typically involves the use of a rules-based automated coding tool followed by two lines of manual review by human coders (see Figure 1). The auto-coding tool receives each input verbatim term and, based on predefined rules, searches for a match in the dictionary. If a match is not found, the input term is sent to a first-line medical coder who looks through the dictionary to suggest the correct matching entry. This entry is then reviewed by a second medical coder for final approval or rejection and re-coding. With simple rules, auto-coding tools are typically able to successfully code 50-60% of the input terms. These numbers can be increased through the addition of a synonym library to capture the various valid ways that a given medical term is expressed. However, this library typically takes years to build and requires significant ongoing effort to be maintained.

This resource-intensive process can be significantly streamlined with the help of ML, for example by using an ML solution to perform the first-line coding review. In this model, a rules-based coding tool codes the easier terms without relying on a synonym list, which is lengthy to develop and arduous to maintain. The remaining verbatim terms that cannot be coded are sent to an ML model that finds the best match in the dictionary. The ML coding recommendation is then reviewed by the second-line medical coder, who accepts or rejects the suggestion. Overall, this solution removes the need for a first-line medical coding review and for building and maintaining a synonym list.
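The routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular coding tool: the toy `DICTIONARY`, the exact-match rule, and the `dummy_model` stand-in for a trained ML model are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical toy dictionary: normalised term -> dictionary entry.
DICTIONARY = {
    "headache": "Headache",
    "nausea": "Nausea",
    "fruit allergy": "Fruit allergy",
}


def rule_based_code(verbatim: str) -> Optional[str]:
    """Simple rule: exact match after lower-casing and collapsing whitespace."""
    key = " ".join(verbatim.lower().split())
    return DICTIONARY.get(key)


def code_term(verbatim: str, ml_suggest: Callable[[str], str]) -> dict:
    """Auto-code the term if a rule matches; otherwise ask the ML model."""
    entry = rule_based_code(verbatim)
    if entry is not None:
        return {"term": verbatim, "entry": entry,
                "source": "rules", "needs_review": False}
    # The ML suggestion stands in for the first-line coder;
    # a second-line human coder still accepts or rejects it.
    return {"term": verbatim, "entry": ml_suggest(verbatim),
            "source": "ml", "needs_review": True}


def dummy_model(verbatim: str) -> str:
    """Stand-in for a trained model (assumption); always returns one entry."""
    return "Fruit allergy"
```

Calling `code_term("  Headache ", dummy_model)` is resolved by the rule, while `code_term("Honeydew melon allergy", dummy_model)` falls through to the model and is flagged for second-line review.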
For this ML model to yield accurate suggestions, it needs to read the input term and the dictionary terms, understand their meaning, and retrieve the proper entry. As such, it needs to be able to process and understand natural language, making medical coding a perfect NLP use case.
ML and DL are particularly well suited to this task, but what do we mean when we talk about leveraging ML and DL for NLP, and what outcomes can we expect to see? (see Figure 2)
Techniques for Natural Language Processing
There are different existing techniques available to tackle an NLP use case. Among them, three main groups can be highlighted:
- Symbolic methods, for example, rule-based parsing or approximate string matching.
- ML, for example, support vector machines or logistic regression.
- DL, for example, recurrent neural networks or large language models.
While sharing the same goal of processing text, each of these techniques has different levels of complexity and requires different efforts. For example, symbolic methods require deep investigations of the use case to be able to build complex logic that captures all possible scenarios. ML – and even more so DL – techniques do not require every single scenario to be encoded. Instead, they learn them through data.
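As an illustration of the symbolic group, approximate string matching can be sketched with Python's standard `difflib` module. The entry list and the 0.8 cutoff are assumptions chosen for the example. The sketch shows what such a method captures (a typo) and what it misses (a term that is related in meaning but not in spelling).

```python
import difflib

# Hypothetical dictionary entries, for illustration only.
ENTRIES = ["Headache", "Nausea", "Abdominal pain", "Fruit allergy"]


def fuzzy_code(verbatim, entries=ENTRIES, cutoff=0.8):
    """Return the closest entry by character similarity, or None."""
    lookup = {entry.lower(): entry for entry in entries}
    matches = difflib.get_close_matches(
        verbatim.lower().strip(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

Here `fuzzy_code("Headach")` recovers 'Headache', but `fuzzy_code("Honeydew melon allergy")` returns `None`: nothing in the spelling links 'melon' to 'Fruit', so an extra rule or synonym would have to be written by hand.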
While DL is a subset of ML, it is helpful to separate the two as they have different capabilities and limitations. ML typically requires heavier data preparation in which domain knowledge is used to incorporate assumptions that will help the model learn the task. For example, text can be cleansed to remove useless parts. With DL, the idea is to send everything with minimal data preparation and let the model learn what is useful. For this, the model needs to be more complex and, thus, the amount of training data needed is greater.
When describing an NLP use case, it is useful to specify which technique is involved, as each has pros and cons that affect the way the solution will be used. Some core differences between the techniques include:
- Performance.
- The kind of mistakes made.
- Interpretability, for example, large language models have difficulties providing interpretability while symbolic methods provide it easily.
- Maintenance, for example, rule-based algorithms typically need much more manual updating.
Leveraging Deep Learning for Medical Coding
When implementing a solution for a new use case, its specific challenges should guide the choice of the technology. Medical coding brings specific challenges including:
- Complex vocabulary – medical terminology, drug names.
- Frequent new words – COVID-19, new treatments.
- Many possible outputs – dictionary entries.
Thanks to its capabilities, DL can address these challenges. Firstly, as explained above, the idea when leveraging DL is to send the input term with minimal data preparation and rely on the flexibility of the model to adapt to any scenario. This allows, for example, the model to learn any relationship across words within the input term and not rely on assumptions that might be wrong for specific cases. This also removes the need to implement organization-specific scenarios. However, as this flexibility is obtained by increasing the complexity of the model, it comes with the cost of requiring enough training data, typically hundreds of thousands of samples in the case of medical coding.
Secondly, the DL model can automatically learn new words when it encounters them. This means that, when the model is trained on data containing a previously unknown word, it automatically learns the word and can handle it properly afterward. As there is no manual intervention for this step, it increases the ability of the model to adapt to changes in medical coding vocabulary.
DL can also leverage semantics. This is done using embeddings, i.e., vectors of real numbers that encode the meaning of words. For example, ‘ache’ and ‘pain’ have very similar vectors, which allows the model to understand that their meanings are close to each other. This helps the model select the right dictionary entry from many choices and deal with the high variability with which input terms express the same concept.
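Closeness between embeddings is commonly measured with cosine similarity. The sketch below uses hand-picked three-dimensional vectors purely for illustration. Real embeddings are learned by the model and have many more dimensions, and the values here are assumptions, not the output of any model.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
EMBEDDINGS = {
    "ache": [0.90, 0.10, 0.05],
    "pain": [0.85, 0.15, 0.10],
    "melon": [0.05, 0.10, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With these toy values, `cosine_similarity(EMBEDDINGS["ache"], EMBEDDINGS["pain"])` is close to 1, while the similarity between 'ache' and 'melon' is much lower.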
Thanks to this technique, and others like transfer learning, it is also possible to include a priori medical knowledge – knowledge the DL model has before being trained for the task of medical coding. This means we can leverage external sources of information for the model to know in advance medical concepts and terms. In addition to making the model globally better for the task of medical coding, it also allows it to handle terms never seen in training data.
The ability to handle unseen dictionary entries means the solution is able to handle different dictionary versions without retraining. The only required change is to point the model to the new version of the dictionary.
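One way to picture this is to treat coding as a nearest-neighbour search over embedded dictionary entries, so that the dictionary is simply an argument to the search. The sketch below is a simplified assumption about how such a solution could be organized. The two-dimensional vectors, the two dictionary versions, and the embedding of the input term are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def nearest_entry(term_vec, dictionary):
    """Return the dictionary entry whose vector is closest to the term."""
    return max(dictionary, key=lambda entry: cosine(term_vec, dictionary[entry]))


# Hypothetical embedded dictionary versions: entry -> vector.
DICT_V1 = {"Infection": [0.9, 0.1], "Headache": [0.1, 0.9]}
DICT_V2 = {**DICT_V1, "Suspected COVID-19": [0.8, 0.6]}  # new entry added

# Hypothetical embedding of the input term 'Probable COVID-19 infection'.
term_vec = [0.75, 0.65]
```

The same `nearest_entry` function is used for both versions: against `DICT_V1` it falls back to 'Infection', and against `DICT_V2` it picks the newly added 'Suspected COVID-19', with no change to the model itself.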
Instead of outputting only one dictionary entry, the solution can also suggest several entries to review together with a confidence score. This can be used to bring attention to terms that have a low confidence score and may even result in them being handled differently than those with a high score.
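A ranked list with confidence scores could be produced as sketched below. The article does not specify how the confidence score is computed. Applying a softmax to the model's raw scores is one common choice and is an assumption here, as are the example scores. Softmax outputs are not guaranteed to be well calibrated, so in practice they would need to be validated against review outcomes.

```python
import math


def top_k_suggestions(scores, k=3):
    """Turn raw scores (entry -> score) into the k best (entry, confidence) pairs."""
    if not scores:
        return []
    highest = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {entry: math.exp(s - highest) for entry, s in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(entry, weight / total) for entry, weight in ranked[:k]]


# Hypothetical raw scores for the input term 'Probable COVID-19 infection'.
example_scores = {"Suspected COVID-19": 4.0, "COVID-19": 3.0, "Infection": 1.0}
```

`top_k_suggestions(example_scores, k=2)` returns the two most likely entries, best first, each paired with a confidence between 0 and 1.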
DL solutions can achieve accuracies higher than 90% in both adverse events and medications. For example, when seeing the input term ‘Probable COVID-19 infection’, some models can properly code it into ‘Suspected COVID-19’. They can ignore the less meaningful ‘infection’ to focus on ‘COVID-19’ as well as understand that ‘probable’ is similar to ‘suspected’ in this situation. Such models have also been able to code the input term ‘Honeydew melon allergy’ to the dictionary term ‘Fruit allergy’, as they understand that melon is a fruit.
These results demonstrate that DL is suitable for medical coding. It performs well on both adverse events and medications, allowing teams to use the same solution for both applications, by training on different data sets.
Next Steps
The next steps for ML and DL in medical coding are query detection and direct coding of high-confidence terms.
Currently, auto-coded input terms that should have a query are not found until there is a quality control process. ML and DL can be used to support the creation of queries within the medical coding process itself. A simple example is when an input term is found in the wrong dataset, such as when the surgical operation ‘Wisdom tooth removal’ is found in the AE dataset.
DL also offers the opportunity for a more efficient final coding review, because some terms can be coded directly with higher accuracy and queries can be raised automatically. This requires achieving high confidence in the model output, leveraging the confidence score described above. The idea is to establish a confidence score threshold above which terms can be directly coded and approved without manual review. Lower-confidence terms would still require human coding.
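The thresholding idea can be sketched as a simple routing function. The 0.95 threshold and the example suggestions are hypothetical. In practice the threshold would have to be chosen and validated on the organization's own data.

```python
def route_by_confidence(suggestions, threshold=0.95):
    """Split (term, entry, confidence) tuples into direct-coded and review queues."""
    direct, review = [], []
    for term, entry, confidence in suggestions:
        if confidence >= threshold:
            direct.append((term, entry, confidence))   # coded without manual review
        else:
            review.append((term, entry, confidence))   # sent to a human coder
    return direct, review


# Hypothetical model outputs, for illustration only.
example = [
    ("Probable COVID-19 infection", "Suspected COVID-19", 0.98),
    ("Honeydew melon allergy", "Fruit allergy", 0.81),
]
```

With these values, the first term is coded directly and the second is queued for human review.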
Automated query detection and direct coding, combined with removing the need for auto-coder tools, first-line coding review, and more efficient processes for dictionary updates, make a compelling case for the role of DL and ML in the future of medical coding. The use case reinforces that, when leveraged properly, DL can bring significant value to clinical data management.
Author Details
Nicolas Huet, Machine Learning Manager, CluePoints
Nicolas Huet, CluePoints Machine Learning Manager, discusses developments in the use of machine learning and deep learning for natural language processing and how they can be pragmatically implemented in clinical data management.
Publication Details
This article appeared in Pharmaceutical Outsourcing: Vol. 25, No. 2, Apr/May/Jun 2024, Pages: 14-16