NHS England Data Science PhD Internships

Transforming Healthcare Data with Graph-based Techniques Using SAIL DataBank - Next Steps

Keywords: Graphs, Hypergraphs, Tabular

Need: Collecting linked health datasets into a single research environment creates substantial opportunities for insight and analysis. Whilst anonymised extracts from these environments have great value, there is an even larger opportunity to develop methods for deploying models to the data remotely. The SAIL Databank (Secure Anonymised Information Linkage Databank) provides anonymised person-based data for research and is powered by the Secure e-Research Platform.

Recent work using graph models to build simpler knowledge discovery systems opens up the potential for increased prediction accuracy, a reduced pre-processing burden, and the application of more complex models to our data. These models have been shown to handle messy data effectively and to learn representations of key factors directly from the data (rather than relying on a pre-chosen set of predictor variables).

The recent paper "Ranking Sets of Morbidities using Hypergraph Centrality" and its accompanying repository demonstrate the application of hypergraph analysis to a multi-morbidity investigation. Where possible, this project would seek to build on this work by adding further complexity, such as demographics, directionality in the graphs, or time evolution.
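As an illustration of the hypergraph representation discussed above, the following minimal sketch treats each distinct set of co-occurring conditions as a weighted hyperedge and ranks individual conditions with a simple eigenvector-style centrality computed by power iteration. The toy patient data, the function names, and the choice of centrality are all assumptions made for illustration; this is not the method or the code of the cited paper or its repository.

```python
from collections import Counter

# Hypothetical toy data: each patient is the set of conditions they have.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "ckd"},
    {"hypertension", "ckd"},
    {"diabetes", "hypertension"},
    {"asthma"},
]


def build_hyperedges(patient_sets):
    """Map each distinct set of conditions to the number of patients with exactly that set."""
    return Counter(frozenset(p) for p in patient_sets)


def node_centrality(hyperedges, iterations=100):
    """Power iteration on H W H^T, where H is the node-by-hyperedge incidence
    matrix and W holds the hyperedge weights (patient counts)."""
    nodes = sorted(set().union(*hyperedges))
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: 0.0 for n in nodes}
        for edge, weight in hyperedges.items():
            # Every node in a hyperedge receives the weighted sum of its members' scores.
            total = weight * sum(score[n] for n in edge)
            for n in edge:
                new[n] += total
        norm = sum(new.values())
        score = {n: v / norm for n, v in new.items()}
    return score


hyperedges = build_hyperedges(patients)
centrality = node_centrality(hyperedges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking)
```

In this toy example the condition that appears in the most heavily weighted hyperedges ranks first, while a condition that never co-occurs with others receives a score close to zero. A real analysis would need a carefully chosen weighting scheme and would run inside the secure research environment.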

The project as a whole looks to show the potential of these additional data representations for downstream prediction and analytics tasks, or to explore how these approaches could be best used to enrich structures when representing patient pathways.

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects: Outputs from the first stage of this project will be available in late 2022. That stage explores using hypergraphs to improve understanding of multi-morbidity, including how to incorporate demographic information and how to represent ordered disease progression through directionality in the hypergraphs.
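One possible way to picture directionality is as directed hyperarcs, where the set of conditions a patient already has (the tail) points to the next condition diagnosed (the head). The sketch below is an assumption made purely for illustration: the toy pathways and the helper `directed_hyperarcs` are hypothetical and do not describe how the first-stage project encodes disease progression.

```python
from collections import Counter

# Hypothetical toy data: each patient's conditions in order of first diagnosis.
pathways = [
    ["hypertension", "diabetes", "ckd"],
    ["hypertension", "diabetes"],
    ["diabetes", "hypertension", "ckd"],
]


def directed_hyperarcs(ordered_conditions):
    """Yield (tail, head) pairs: the set of earlier conditions and the next condition."""
    for i in range(1, len(ordered_conditions)):
        yield frozenset(ordered_conditions[:i]), ordered_conditions[i]


# Count how many patients follow each directed hyperarc.
arc_counts = Counter(
    arc for pathway in pathways for arc in directed_hyperarcs(pathway)
)

for (tail, head), count in arc_counts.items():
    print(sorted(tail), "->", head, ":", count)
```

Because the tail is a set, two patients who reach the same combination of conditions in a different order share the arc into the next condition, while the single-condition arcs still record which condition came first.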

Enables Future Work: Any open demonstration of the analysis of large linked datasets is of use to the NHS, and the application of remote modelling even more so. Further, this work would fit within a longer-term research collaboration between SAIL and NHS England.

Outcome/Learning Objectives: Extension of the work performed as part of the first internship, for instance by adding a single demographic variable to each technique and showing the potential of these additional data.

Datasets: SAIL Databank Datasets (Core-set)

Desired skill set: When applying, please highlight any experience with graph and hypergraph representations, graph neural networks (GNNs), network analysis, and coding (including any coding in the open), along with any other data science experience you feel is relevant.

