NHS England Data Science PhD Internships

Evaluating the Implementation of Automated Clinical Coding on Neurology Clinic Letters - working with LTH

Keywords: NLP, MedCAT, Text

Need: The majority of neurology care is conducted in outpatient settings without mandated diagnostic coding. As a result, most clinical records are stored as unstructured data such as clinical letters, whose inconsistent layouts, idiosyncratic abbreviations and redundant tokens make them extremely burdensome to code effectively. There is a significant opportunity to reduce this burden and increase the value of these data through automated clinical coding.

Our previous work in this area evaluated pre-trained named entity recognition and linking (NER+L) models. These models group identified concepts together and assign SNOMED-CT classifications. The Medical Concept Annotation Toolkit (MedCAT) was used within the LANDER (Lancashire Data Science Environment) Trusted Research Environment (TRE) in Azure to identify and link concepts. This was applied to 110,831 sentences from 3,000 neurology clinic letters (from a total of >6M clinic letters). As the data was unlabelled, the work focussed on evaluating what a small range of NER+L models had learned and how they compared.
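One simple way to compare NER+L models on unlabelled text is to measure how often they produce the same annotation. The sketch below is illustrative only and is not the method used in the previous project: the function name `jaccard_agreement`, the annotation layout (start offset, end offset, concept ID) and the concept IDs are all assumptions made up for this example.

```python
from typing import Set, Tuple

# Assumed layout: (start_char, end_char, concept_id). All values below are invented.
Annotation = Tuple[int, int, str]


def jaccard_agreement(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Share of annotations produced by both models out of those produced by either."""
    if not a and not b:
        return 1.0  # two empty outputs agree trivially
    return len(a & b) / len(a | b)


# Hypothetical outputs of two models on the same sentence.
model_a = {(0, 8, "C001"), (20, 28, "C002"), (40, 46, "C003")}
model_b = {(0, 8, "C001"), (20, 28, "C004"), (40, 46, "C003")}

agreement = jaccard_agreement(model_a, model_b)
print(f"Agreement: {agreement:.2f}")
```

Here the two models agree on two spans and disagree on the concept linked to the third, so agreement is 2 out of 4 distinct annotations. Agreement of this kind says nothing about correctness, which is why labelled data is still needed.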

This project would aim to continue this investigation by exploring an implementation of MedCATtrainer for domain expert labelling of training data, evaluation of different embedding types, and further investigation into how to effectively evaluate these models. There is also an aim to scale up this pipeline in LANDER to act on the larger set of letters.
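Once domain experts have labelled a sample of letters, model output can be scored against those labels. The following is a minimal sketch of exact-match precision, recall and F1, assuming each annotation is a (start offset, end offset, concept ID) triple; the `score` function, the `Scores` type and the example values are hypothetical and do not reflect the MedCAT or MedCATtrainer APIs.

```python
from typing import NamedTuple, Set, Tuple

# Assumed layout: (start_char, end_char, concept_id). All values below are invented.
Annotation = Tuple[int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score(predicted: Set[Annotation], gold: Set[Annotation]) -> Scores:
    """Exact-match scoring: a prediction counts only if span and concept both match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f1)


# Hypothetical expert labels and model predictions for one letter.
gold = {(0, 8, "C001"), (20, 28, "C002"), (40, 46, "C003"), (60, 70, "C005")}
predicted = {(0, 8, "C001"), (20, 28, "C004"), (40, 46, "C003")}

result = score(predicted, gold)
print(f"P={result.precision:.2f} R={result.recall:.2f} F1={result.f1:.2f}")
```

Exact matching is the strictest choice. A real evaluation might also credit overlapping spans or related concepts, which is part of what this project would investigate.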

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects: Enriching Neurology Information Using MedCat, working with Lancashire Teaching Hospitals (LTH)

Enables Future Work: Both the learning from applying MedCAT to neurology clinical letters and the implementation within the LANDER environment feed future projects.

Outcome/Learning Objectives: A closed implementation codebase built in LANDER. An open codebase with non-sensitive content focussed on demonstrating methods of evaluation. A published report setting out the learning from the project.

Datasets: LANDER clinical letters dataset

Desired skill set: When applying, please highlight experience with automated clinical coding, Natural Language Processing, neurology, containerisation, and any other data science experience you feel is relevant.
