NHS England Data Science PhD Internships

Mechanistic Interpretability for AI Systems in Healthcare

Keywords: Explainability, Circuits, Multi-modal

Need: AI models in healthcare, whether trained on tabular data, clinical notes, or multi-modal records, often perform well, but their internal mechanisms remain opaque. Traditional explainability tools (such as SHAP, LIME, or saliency maps) offer high-level justifications for individual predictions, but do not provide visibility into a model's internal logic or failure modes. This limits trust, reduces safety, and impairs regulatory and ethical assurance.
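
For context, a minimal sketch of the kind of post-hoc attribution these tools produce (the data, model, and outcome below are hypothetical stand-ins, not project artefacts): SHAP assigns each input feature a contribution to a single prediction, but says nothing about the computation inside the model.

```python
# Hedged sketch: SHAP attributions for a tabular classifier on synthetic stand-in data.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))                      # stand-in tabular features (e.g. lab values)
y = (X[:, 0] + X[:, 1] > 1).astype(int)       # stand-in binary outcome

model = RandomForestClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])   # per-feature attributions for 10 rows,
                                              # not a description of internal mechanisms
```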

Mechanistic Interpretability (MI) offers a more ambitious approach: probing the internals of models to uncover how decisions are made at the level of neurons, features, and circuits. This project proposes exploring MI methods on a well-defined clinical prediction task, using small-to-medium scale models (e.g. a small LLM trained on synthetic or open data), to examine how internal representations correspond to known clinical factors (e.g. lab results, symptoms, or diagnosis codes).
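
As one illustration of a technique in scope, the sketch below (all data, the model, and the "clinical factor" are hypothetical stand-ins) fits a linear probe on a hidden layer's activations to test whether that layer linearly encodes a known factor; the same idea could be applied to the residual stream of a small LLM.

```python
# Hedged sketch: a linear probe on hidden activations of a stand-in clinical model.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

# Stand-in for a small clinical prediction model (in the project this would be trained).
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

activations = {}
def save_hidden(module, inputs, output):
    # Cache the post-activation output of the hooked layer for later probing.
    activations["hidden"] = output.detach()

model[3].register_forward_hook(save_hidden)     # hook the second hidden layer's ReLU

X = torch.randn(500, 20)                         # stand-in tabular clinical features
factor = (X[:, 0] > 0).long()                    # stand-in known factor, e.g. an abnormal-lab flag

with torch.no_grad():
    model(X)                                     # forward pass fills activations["hidden"]

# Fit the probe: high accuracy suggests the layer linearly represents the factor.
probe = LogisticRegression(max_iter=1000)
probe.fit(activations["hidden"].numpy(), factor.numpy())
print("probe accuracy:", probe.score(activations["hidden"].numpy(), factor.numpy()))
```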

This will:

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects:

Enables Future Work:

Outcome/Learning Objectives:

Datasets: Open datasets such as MIMIC-III or MIMIC-IV

Desired skill set: When applying, please highlight any experience with neural networks, explainability, and deep learning; Python coding experience (including any coding in the open); and any other data science experience you feel is relevant.

