NHS England Data Science PhD Internships

Graph-based representations and techniques for Healthcare Data

Keywords: Graphs, Longitudinal, Tabular

Need: The complexity and depth of information captured in Electronic Health Records (EHR) require a variety of analysis methods. Graph representations are one such method that can help extract insight from EHR and other healthcare data sources. Using graph representations and graph neural network (GNN) approaches to explore the data has exciting potential, but these approaches are not yet widely used. Other modern approaches for encoding sequential information, such as language-model-derived embeddings, would provide a solid benchmark to help demonstrate the power of graph-based techniques.

This work also looks to make use of the SAIL (Secure Anonymised Information Linkage) Databank, which provides anonymised person-based data for research, powered by the Secure e-Research Platform. Collating linked health datasets into a single research environment creates several large opportunities for insight and analysis. Whilst anonymised extracts from these environments have excellent value, there is an even larger opportunity to develop methods for deploying models to the data remotely.

Recent work using graph models to build simpler knowledge discovery systems opens up the potential for increased prediction accuracy, a reduced pre-processing burden, and models that can capture more of the complexity of our data. These models have been shown to handle messy data effectively and to learn representations of key factors directly from the data, whilst storing the information in a format that is often more interpretable and closer to our understanding of the world.

In this internship, we would look to explore how such techniques can be applied in diverse ways such as:

Please note that the scope of the work is not at all limited to these areas and will depend on early discussions off the back of some initial reading and your thoughts on other interesting directions.

This work fits within a longer-term research collaboration between SAIL and NHS England. The longer-term goals of the project are to explore and extend (hyper-)GNN approaches.

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects:

  1. Outputs from the first internship on this project will soon be available as a preprint and codebase that explore using hypergraphs for an improved understanding of multimorbidity, including how to incorporate demographic information and how to model ordered disease progression by adding directionality to the hypergraphs
  2. An app explaining the approaches, developed as part of the second internship, can be found at Streamlit: Hypergraphs for Multimorbidity
  3. Outputs from the third internship will be made available on completion of the project
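As a rough illustration of the hypergraph idea behind the first item, the sketch below is not the codebase from the earlier internships. It assumes a toy input in which each patient is a time-ordered list of hypothetical condition names (real SAIL records would first need to be extracted and mapped to conditions). Each patient's set of conditions becomes an undirected hyperedge, weighted by how many patients share exactly that set. A directed variant links the set of previously diagnosed conditions (the tail) to each newly diagnosed condition (the head), which is one simple way to encode ordered disease progression. The function names are illustrative only.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy input: each patient is a time-ordered list of condition names.
PatientHistory = List[str]


def undirected_hyperedges(patients: Dict[str, PatientHistory]) -> Counter:
    """Weight each hyperedge (an unordered set of conditions) by the number of
    patients whose full set of recorded conditions is exactly that set."""
    edges: Counter = Counter()
    for history in patients.values():
        conditions = frozenset(history)
        if len(conditions) >= 2:  # a single condition is not multimorbidity
            edges[conditions] += 1
    return edges


def directed_hyperedges(patients: Dict[str, PatientHistory]) -> Counter:
    """Count directed hyperedges tail -> head, where the tail is the set of
    conditions already diagnosed and the head is the next new condition."""
    edges: Counter = Counter()
    for history in patients.values():
        seen: List[str] = []
        for condition in history:
            if condition in seen:
                continue  # ignore repeat records of an existing condition
            if seen:
                edges[(frozenset(seen), condition)] += 1
            seen.append(condition)
    return edges


def incidence_matrix(
    edges: Counter,
) -> Tuple[List[str], List[FrozenSet[str]], List[List[int]]]:
    """Build a node-by-hyperedge 0/1 incidence matrix for undirected hyperedges."""
    nodes = sorted({node for edge in edges for node in edge})
    edge_list = sorted(edges, key=lambda e: sorted(e))
    matrix = [[1 if node in edge else 0 for edge in edge_list] for node in nodes]
    return nodes, edge_list, matrix


if __name__ == "__main__":
    patients = {
        "p1": ["diabetes", "hypertension", "ckd"],
        "p2": ["hypertension", "diabetes"],
        "p3": ["diabetes", "hypertension"],
        "p4": ["asthma"],
    }
    print(undirected_hyperedges(patients))
    print(directed_hyperedges(patients))
    print(incidence_matrix(undirected_hyperedges(patients)))
```

Counting only exact condition sets keeps the example small; a real analysis would need to decide how to treat subsets, time windows and demographic attributes, which this sketch does not attempt. The incidence matrix is the usual starting point for hypergraph neural network layers.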

Enables Future Work: Any open demonstration of analysis of large linked datasets is of use to the NHS, and the application of these techniques to data in SAIL even more so.

Outcome/Learning Objectives:

Datasets: SAIL Data Bank Datasets

Desired skill set: When applying please highlight any experience around graph representations, hypergraphs, graph machine learning or neural networks, coding experience (including any coding in the open), and any other data science experience you feel relevant.
