Completed Intern Projects
Wave 6 - January to June 2024
-
P62 - NHS Monitor Corpus
Samuel Hollands
Scraping and curation of NHS website data to create a corpus of representative text data
Graph-based Web Scraping Text Data Corpus -
P61 - Understanding Fairness and Explainability in Multimodal Approaches within Healthcare
Sophie Martin
Pipeline to compare the impact on fairness of using a fusion model versus a single modality model
Fairness MultiModal Python
Wave 5 - June to November 2023
-
P53 - Transforming Healthcare Data with Graph-based Techniques using SAIL Databank
Chris Tomlinson
This project explored topological and semantic graph embeddings of relevant healthcare ontologies to enhance performance on downstream tasks.
Representations of Data Embeddings Python -
P52 - Exploring Process Mining with East Midlands Ambulance Service
Alex Coles
In collaboration with East Midlands Ambulance Service, this work explored using Process Mining techniques to better understand the processes within the service.
Process Mining Ambulances Python - PM4Py -
P51 - Investigating Privacy Concerns and Mitigations for Language Models in Healthcare
Vicky Smith
An initial exploration of privacy risks in healthcare language models, including privacy-preserving techniques applied before or after model training, and evaluating their effectiveness with privacy attacks.
NLP Foundation Models Python
Wave 4 - January to May 2023
-
P43 - Enriching Neurology Patient Information using MedCAT
Aizaan Anwar
In collaboration with Lancaster Teaching Hospitals NHS Foundation Trust and Lancaster University, this work explored how to evaluate the embedding space generated for automated clinical coding tasks in Neurology.
NLP Neurology Python - MedCAT -
P42 - Including Mortality in Hypergraphs for Multi-morbidity
Zoe Hancox
Building on previous hypergraphs work (P34) that can extract the impact of predecessor and successor diseases on disease progression pathways, this work looked to include an implicit relationship to demographics and consider the impact of mortality.
Representations of Data Hypergraphs Python - Numba -
P41 - NHS Synth
Harrison Wilde
This project focused on building a package for generating useful synthetic data, audited and assessed along the dimensions of utility, privacy and fairness. It gives the ability to experiment with different model architectures to find which are the most promising for real-world usage.
Synthetic Data VAE Python
Wave 3 - June to December 2022
-
P34 - Using Hypergraphs to Investigate the Impact of Comorbidities
Jamie Burke
In collaboration with Swansea University and the SAIL Databank, this work focused on the generation of hypergraphs for investigating the individual and joint impact of comorbidities on a patient pathway.
Representations of Data Hypergraphs Python - Numba -
P33 - Exploring Large-scale Language Models with NHS Incident Data
Niall Taylor
In collaboration with the NHS England patient safety data team, an investigation into how to produce a useful and valid representation space when training a language model for a healthcare task.
NLP DeCLUTR Python - Transformers -
P32 - Predicting the Impact of Health Inequalities - Diabetes
Stephen Richer
In collaboration with East Suffolk and North Essex Foundation Trust, this project applied a suite of data science techniques to a large population health dataset including both primary and secondary care data.
RAP Code PHM Data Python - OSMnx -
P31 - Txt-Ray Align Continued
Sarah Hickman
Continuing Txt-Ray Align (P22), this project sought to identify the clinical application, pipeline and validation metrics for the approach.
Multi-modal Validation in Healthcare Python
Wave 2 - January to May 2022
-
P24 - Using LIME to explain facial disease classification
Anwesha Mohanty
Application of Local Interpretable Model-agnostic Explanations (LIME) to an InceptionV3 classifier looking at rosacea
Model Explainability LIME Python -
P23 - STM for survey data
Anna Linton
Development of R code for investigating the topics found in free-text survey data, using a technique that accounts for both the content of the responses and the metadata.
RAP code Topic Modelling R -
P22 - Txt-Ray Align
Dekai Zhang
An investigation of extracting insight from multi-modal text and imaging data using contrastive learning.
Multi-modal Contrastive Learning Python -
P21 - SynthVAE Continued
David Brind
Building on SynthVAE (P12), this project focused on non-Gaussian input data, hyperparameter tuning, improving the codebase and starting to consider how fairness in the created data can be assessed and implemented.
Synthetic Data VAE Python - PyTorch, Opacus
Wave 1 - April to September 2021
-
P14 - Model Class Reliance
Elizabeth Dolan
Investigating the use of Model Class Reliance (MCR) to identify the value of including commercial sales data in respiratory predictions
Model Explainability Commercial Data Python - mcrforest, SHAP -
P13 - NHS Text Data Exploration
Beth Rushton-Woods
Using a pre-defined toolset, this project looked to understand how to ingest NHS.UK text data into a curated form.
NLP Weak Supervision Python - scispaCy -
P12 - SynthVAE
Dom Danks
Initial creation of a variational autoencoder with differential privacy for generating single-table tabular Gaussian data.
Synthetic Data VAE Python - PyTorch, Opacus -
P11 - SynPathDiabetes
Tiyi Morris
Exploratory work on incorporating learning into a pathway simulator for diabetes.
Simulation Patient Pathways Python