NHS England Data Science PhD Internships

Evaluating NER-focussed models and LLMs for identifying key entities in histopathology reports – working with GOSH DRIVE

Keywords: NLP, NER, Text

Need: Information retrieval and knowledge extraction from corpora of domain-specific documents is a challenging task. Recently, Large Language Models (LLMs) have become popular for their state-of-the-art performance and their capacity for in-context learning. However, despite this success, recent work has shown that the performance of LLMs on Named Entity Recognition (NER) tasks often falls below supervised baselines (see, for instance, Wang et al., 2023).

This project will study existing NER models, trained either in a supervised manner or with a weakly supervised approach, alongside large language models, for identifying information that can be considered "entities", i.e. for performing entity detection.

The project will also analyse the potential benefits of such models for identifying entities in histopathology reports, and their limitations in identifying domain-specific, emergent, or granular entity information.

Finally, it will explore how these different methods may be combined to create better overall systems, how such systems could be integrated with human annotation cycles, and how best to evaluate these hybrid approaches.
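One common way to compare NER systems, including simple hybrids of them, is strict entity-level precision, recall and F1, where a predicted entity counts as correct only if both its span and its label exactly match a gold annotation. The sketch below is illustrative only: the (start, end, label) representation, the entity labels, the toy annotations and the union-based "hybrid" are all assumptions made for this example, not part of the project specification or any particular library.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: an entity is (start_char, end_char, label).
Entity = Tuple[int, int, str]


def strict_prf(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1: a predicted entity is correct
    only if its span boundaries and its label both match a gold entity."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def compare_systems(
    gold: Iterable[Entity], systems: Dict[str, Iterable[Entity]]
) -> Dict[str, Tuple[float, float, float]]:
    """Score several systems (e.g. supervised NER, LLM, hybrid) against the same gold annotations."""
    gold_list = list(gold)
    return {name: strict_prf(gold_list, preds) for name, preds in systems.items()}


if __name__ == "__main__":
    # Invented toy annotations for a single report; not real data.
    gold = [(0, 9, "SPECIMEN"), (24, 35, "DIAGNOSIS"), (40, 47, "SITE")]
    supervised = [(0, 9, "SPECIMEN"), (40, 47, "SITE")]
    llm = [(0, 9, "SPECIMEN"), (24, 35, "DIAGNOSIS"), (50, 55, "SITE")]
    # A naive hybrid for illustration: the union of both systems' predictions.
    hybrid = set(supervised) | set(llm)
    scores = compare_systems(gold, {"supervised": supervised, "llm": llm, "hybrid": hybrid})
    for name, (p, r, f) in scores.items():
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this toy data the union trades precision for recall (P=0.75, R=1.00), which is the kind of trade-off a hybrid evaluation would need to surface; real comparisons would also need to decide how to treat partial span overlaps, which strict matching scores as errors.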

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects: Enriching Neurology Patient Information using MedCAT

Enables Future Work: Working with Great Ormond Street Hospital's Data Research, Innovation and Virtual Environments (GOSH DRIVE) to build knowledge of suitable techniques for enriching their datasets for downstream tasks, and to provide a benchmark for emerging tools such as LLMs on a common NLP task in a domain-specific setting.

Outcome/Learning Objectives: Understanding whether these existing tools can detect information that constitutes "entities"; the results and analysis from this will help us to focus on:

Datasets:

Desired skill set: When applying, please highlight any experience with natural language processing, large language models, evaluation frameworks, Python coding and software development (including any coding in the open), and any other data science experience you feel is relevant.