NHS England Data Science PhD Internships

Building Multimodal Patient Representations for Clinical Prediction

Keywords: Machine Learning, Single Patient Record, Multimodal

Need: Previous NHS projects, such as mm-healthfair, have demonstrated that combining data across multiple modalities (e.g., structured data, free text, and imaging) can both improve performance and introduce new fairness concerns. Those projects highlighted how models can become unfairly biased depending on the modality fusion strategy and the richness of the non-tabular data sources.

This project builds on that work by moving beyond fairness diagnosis to focus on developing robust, task-relevant multimodal patient representations. Specifically, it will create patient-level embeddings across structured (EHR), unstructured (clinical notes), and imaging (e.g., X-ray or CT metadata) data, and assess their utility across a range of downstream use cases, including:

By using consistent embedding frameworks across modalities, this project will lay a foundation for future fairness auditing, interpretability analysis, and explainability research, supporting NHS use cases that require traceable model reasoning.
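As an illustration only, and not a method prescribed by this project, one simple embedding-level fusion baseline for the patient-level representations described under Need concatenates a fixed-size embedding from each modality and appends a flag recording which modalities were available. The modality names, dimensions, and the function `fuse_patient_embedding` are hypothetical; the per-modality vectors are assumed to come from separate encoders (e.g., a tabular model for EHR data and a language model for clinical notes).

```python
from typing import Dict, List, Optional

# Hypothetical embedding sizes per modality; real sizes depend on the encoders used.
MODALITY_DIMS: Dict[str, int] = {"ehr": 4, "notes": 3, "imaging": 2}


def fuse_patient_embedding(
    embeddings: Dict[str, Optional[List[float]]],
    dims: Dict[str, int] = MODALITY_DIMS,
) -> List[float]:
    """Concatenate per-modality embeddings into one patient-level vector.

    Missing modalities (absent key or None) are zero-filled, and one 0/1
    availability flag per modality is appended so a downstream model can
    distinguish "modality absent" from "embedding values near zero".
    """
    fused: List[float] = []
    flags: List[float] = []
    # Iterate in the fixed order of `dims` so features line up across patients.
    for name, dim in dims.items():
        vec = embeddings.get(name)
        if vec is None:
            fused.extend([0.0] * dim)
            flags.append(0.0)
        else:
            if len(vec) != dim:
                raise ValueError(f"{name}: expected {dim} values, got {len(vec)}")
            fused.extend(float(x) for x in vec)
            flags.append(1.0)
    return fused + flags


# Example: a patient with EHR and notes embeddings but no imaging.
patient = {"ehr": [0.1, 0.2, 0.3, 0.4], "notes": [0.5, 0.6, 0.7], "imaging": None}
print(fuse_patient_embedding(patient))
```

Keeping an explicit availability flag may matter for the fairness concerns noted above: if one modality is missing more often for some patient groups, a model given only zero-filled vectors could conflate missingness with the absence of findings. Learned alternatives, such as attention-based fusion over modality embeddings, are equally plausible starting points.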

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects:

Enables Future Work:

Outcome/Learning Objectives:

Datasets: Accessible datasets such as MIMIC-IV

Desired skill set: When applying, please highlight any experience with healthcare data, multimodal embedding and learning, and coding (including any coding in the open), as well as any other data science experience you feel is relevant.
