Keywords: Synthetic, Adversarial, Tabular
The privacy of data in the NHS is a key issue. While synthetic data generation can overcome some privacy concerns, there are still no robust metrics that can demonstrate with a high degree of confidence that a synthetic dataset is indeed private. An alternative approach to proving the privacy of a dataset is to show that a range of common attacks yield little success. The Synthetic Adversarial Suite was developed as a way of running membership inference attacks (among others) to demonstrate whether a generated dataset leaks private information. The initial development of this suite was successful, but it currently includes only two attack avenues. This project would extend the suite by investigating and implementing additional attacks that could be applied to a range of synthetic datasets.
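To illustrate the kind of attack the suite implements, below is a minimal sketch of one common membership inference approach against synthetic tabular data: distance-to-closest-record (DCR). The function names, the Euclidean metric, and the attacker-chosen threshold are illustrative assumptions, not a description of the closed codebase.

```python
import numpy as np

def dcr_membership_scores(candidates: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Distance-to-closest-record (DCR) membership scores.

    For each candidate row, compute the Euclidean distance to its nearest
    synthetic record. Unusually small distances suggest the candidate may
    have been memorised from the generator's training set.
    """
    # Broadcast to pairwise differences: (n_candidates, n_synthetic, n_features)
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return dists.min(axis=1)

def infer_membership(candidates: np.ndarray, synthetic: np.ndarray,
                     threshold: float) -> np.ndarray:
    # Flag candidates whose nearest-synthetic distance falls below the
    # (attacker-chosen) threshold as suspected training-set members.
    return dcr_membership_scores(candidates, synthetic) < threshold
```

In practice the threshold would be calibrated on a held-out reference population, and categorical columns would need an appropriate encoding before a distance metric is meaningful.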
Current Knowledge/Examples & Possible Techniques/Approaches: The current codebase is closed for security reasons but would be accessible once the intern is onboarded. A brief description of this codebase can be found here.
See below for background reading in this area:
Related Previous Internship Projects: Not a previous intern project, but the Synthetic Adversarial Suite was commissioned and developed in early 2022 and is currently contained in a closed repository.
Enables Future Work: The aim is to use the suite both as part of our synthetic data generation pipeline and as a tool for one-off assessments of privacy leakage from datasets.
Outcome/Learning Objectives: Additional attack algorithms added to the suite, with a clear narrative of the attack avenues and threat models.
Datasets: CTDC or similar published synthetic dataset with known model architecture.
Desired skill set: When applying, please highlight any experience with privacy techniques and adversarial machine learning (including membership inference and model inversion), code development, Python coding experience (including any coding in the open), and any other data science experience you feel is relevant.