NHS England Data Science PhD Internships

NHS Semantic Search

Keywords: NLP, SemanticSearch, MultiModalData

Need: Often in healthcare data and information, due to disparate coding practices, multiple data collection systems, and the complexity of accurately coding healthcare, it can be difficult to be sure that all relevant records have been returned for a specific query. Recently, semantic search techniques have been used to greatly enhance query broad ranging data sources. Semantic search extends a basic search to consider intent and contextual meaning of the user. This project would seek to apply a specific application relating to technology innovation in healthcare to create a clear, timely and complete picture of innovation whilst demonstrating the technique.

Current Knowledge/Examples & Possible Techniques/Approaches: Recently large scale open source language models (e.g. GPT-2/BioBERT/GPT-Neo) have provided us with a base to perform various NLP tasks due to training on large unstructured datasets, alongside other more established open source approaches already available. These have been further integrated into various natural language interfaces for tasks such as Question Answering and Semantic Document Search (e.g. Haystack). In the medical space tools like CogStack, which is an information retrieval and extraction platform developed by researchers at the NIHR Maudsley Biomedical Research Centre, allow the implementation of best-of-breed enterprise search. This has been used to unlock the unstructured information in health records, and can be utilised in search across other forms of medical text data.

Related Previous Internship Projects: n/a as first year of the scheme.

Enables Future Work: The specific application would be useful itself for collaborating around innovation for healthcare but the main purpose of this work is to enable the application of the technique to more restricted medical documents.

Outcome/Learning Objectives: The main aim of this work is to demonstrate how the NHS can interact and use available tools for this sort of analysis. The specific outcome would be a demonstration of an online search based on a broad question for innovation in healthcare.

Datasets: Publicly available data as application is to search promotional materials across the web for projects relating to healthcare innovation.

Desired skill set: When applying please highlight any experience around work with text data and specifically medical text data, natural language processing, searching/trawling, coding experience (including any coding in the open), and any other data science experience you feel relevant.

Return to list of all available projects.