NHS England Data Science PhD Internships

Extending NHSSynth into Multi-table, Multi-modal, and Longitudinal Data

Keywords: Synthetic, VAE, Tabular

Need: Over the course of three internship projects, we have developed NHSSynth, a modular pipeline built around a Variational AutoEncoder (VAE) with differential privacy. It generates single-table tabular synthetic data and includes an evaluation metric suite, a fairness toolset, and an adversarial attack suite.

This project would investigate extending the tool to generate multi-table, longitudinal, or multi-modal data, drawing on recent advances in the field.

Current Knowledge/Examples & Possible Techniques/Approaches:

In terms of:

Related Previous Internship Projects: The first two projects can be seen in SynthVAE, with the most recent work in NHSSynth.

Enables Future Work: Allows NHS England to generate a wider range of synthetic data for internal and external use.

Outcome/Learning Objectives: Extension of the toolset into a new functional area.

Datasets:
MIMIC-III is our standard dataset for this work.

Desired skill set: When applying, please highlight any experience working with synthetic data, variational autoencoders, or other generative techniques; Python coding and software development experience (including any coding in the open); and any other data science experience you feel is relevant.

