NHS England Data Science PhD Internships

Enriching Neurology Patient Information using MedCAT

Keywords: NLP, CogStack, Text

Need: Clinical letters contain a wealth of information that may not be available elsewhere in structured data fields. However, it is challenging to extract and structure this information systematically, especially at scale, in a way that supports tasks such as filtering and linkage to reference data.

This project will investigate a dataset of over 6 million clinical letters, stored alongside a CogStack pipeline (including a MedCAT component and negspaCy) within the LANDER (Lancashire Data Science Environment) Trusted Research Environment (TRE) in Azure. This environment was designed to support innovative projects on large, secure, linked data sources spanning multiple healthcare data collections.

The project will focus on the neurology specialty (in particular, epilepsy) and use MedCAT to map information held only in the unstructured letters to appropriate SNOMED CT concepts, thereby enriching the available structured data.
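As a minimal sketch of what "enriching the structured data" could look like downstream of the concept-linking step, the Python below flattens one letter's annotations into structured rows. It assumes each annotation is a dictionary loosely modelled on MedCAT entity output (`cui`, `pretty_name`, `start`, `end`, `acc`) plus a hypothetical boolean `negated` flag that a negation component such as negspaCy might supply. The concept identifiers in `EPILEPSY_CONCEPTS` and the `min_acc` threshold are illustrative placeholders, not real SNOMED CT codes or project settings.

```python
from dataclasses import dataclass

# Hypothetical placeholder IDs standing in for an epilepsy SNOMED CT subset;
# a real concept set would be agreed with clinicians.
EPILEPSY_CONCEPTS = {"CUI_EPILEPSY", "CUI_FOCAL_SEIZURE", "CUI_LEVETIRACETAM"}


@dataclass(frozen=True)
class StructuredRow:
    letter_id: str
    cui: str
    name: str
    start: int
    end: int


def to_structured_rows(letter_id, annotations, concept_set, min_acc=0.5):
    """Flatten one letter's annotations into rows for a structured table.

    `annotations` is assumed to map an entity id to a dict with 'cui',
    'pretty_name', 'start', 'end', 'acc' and an optional (hypothetical)
    boolean 'negated' flag.
    """
    rows = []
    for ent in annotations.values():
        if ent["cui"] not in concept_set:
            continue  # outside the neurology scope of interest
        if ent.get("negated", False):
            continue  # e.g. "no history of seizures" should not become a record
        if ent["acc"] < min_acc:
            continue  # low-confidence links are left out (e.g. for review)
        rows.append(StructuredRow(letter_id, ent["cui"], ent["pretty_name"],
                                  ent["start"], ent["end"]))
    # Keep rows in the order the mentions appear in the letter.
    return sorted(rows, key=lambda r: r.start)


# Illustrative annotations for a single made-up letter.
example = {
    0: {"cui": "CUI_EPILEPSY", "pretty_name": "Epilepsy",
        "start": 20, "end": 28, "acc": 0.9, "negated": False},
    1: {"cui": "CUI_FOCAL_SEIZURE", "pretty_name": "Focal seizure",
        "start": 40, "end": 53, "acc": 0.8, "negated": True},
    2: {"cui": "CUI_HYPERTENSION", "pretty_name": "Hypertension",
        "start": 5, "end": 17, "acc": 0.95},
    3: {"cui": "CUI_LEVETIRACETAM", "pretty_name": "Levetiracetam",
        "start": 60, "end": 73, "acc": 0.3},
}

for row in to_structured_rows("letter-001", example, EPILEPSY_CONCEPTS):
    print(row)
```

With the default threshold only the affirmed, high-confidence epilepsy mention survives. The negated seizure, the out-of-scope hypertension concept and the low-confidence medication link are all dropped. Where such thresholds should sit is exactly the kind of decision that would need clinician input.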

Further, the project offers the opportunity to explore different workflows for evaluating, augmenting, and improving the underlying extraction approaches, for example by incorporating additional techniques or components into the established pipeline, whilst working closely with clinicians to iterate towards a suitably balanced solution.
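One possible evaluation workflow, sketched here under the assumption that clinicians review a sample of letters and record which concepts each letter should yield, is to compare the pipeline's document-level concept sets against that review using micro-averaged precision, recall and F1. The function name `micro_prf` and the letter and concept IDs are hypothetical; this is one common metric choice, not a method defined by the project.

```python
from typing import Dict, Set, Tuple


def micro_prf(predicted: Dict[str, Set[str]],
              gold: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over document-level concept sets.

    `predicted` maps a letter id to the concepts found by the pipeline;
    `gold` maps a letter id to the concepts recorded by clinician review.
    A letter missing from either side is treated as having no concepts there.
    """
    tp = fp = fn = 0
    for letter_id in predicted.keys() | gold.keys():
        pred = predicted.get(letter_id, set())
        true = gold.get(letter_id, set())
        tp += len(pred & true)   # concepts both agree on
        fp += len(pred - true)   # extracted but not confirmed by clinicians
        fn += len(true - pred)   # confirmed by clinicians but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative sample: three reviewed letters with made-up concept IDs.
predicted = {"L1": {"A", "B"}, "L2": {"C"}}
gold = {"L1": {"A"}, "L2": {"C", "D"}, "L3": {"E"}}

p, r, f = micro_prf(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Reporting precision and recall separately, rather than F1 alone, may make it easier to discuss with clinicians whether missed mentions or spurious records matter more for a given use case.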

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects: N/A

Enables Future Work: Both the lessons learned from applying MedCAT to neurology clinical letters and the implementation within the LANDER environment will feed into future projects.

Outcome/Learning Objectives:

Datasets: LANDER clinical letters dataset

Desired skill set: This project requires technical proficiency in Python and focusses on the “translational aspects” of NLP/CogStack, with opportunities to refine the underlying models for neurology. When applying, please highlight experience with NLP frameworks, cloud engineering (including message queues and APIs), orchestration and containerisation, and any other data science experience you feel is relevant.
