Keywords: Synthetic, VAE, TabularData
Need: Creating high-fidelity, realistic health data is complex and raises multiple information governance considerations. A particularly promising technique for creating realistic synthetic data is the variational autoencoder (VAE). However, attempts to put VAEs into practice have so far struggled because confidence in their appropriate usage, and in the privacy of the ground truth data, has not been sufficient. This project would use an already developed VAE to investigate and discuss its potential for healthcare data.
Current Knowledge/Examples & Possible Techniques/Approaches: Colleagues in the NHSD data science and innovation team have built a VAE to generate realistic health data.
Related Previous Internship Projects:
Enables Future Work: Depending on the recommendations from this work, further projects may use or build upon the model.
Outcome/Learning Objectives: Application of the model to open data, resulting in a published discussion of appropriate usage. There is also interest in coupling the current VAE with differential privacy.
Datasets: Open transactional data with rare values, to simulate the basic structure of health activity data.
Desired skill set: When applying, please highlight any experience with synthetic data generation (especially variational autoencoders), differential privacy, and coding (including any coding in the open), along with any other data science experience you feel is relevant.