Important: Disclaimer
This site is no longer active.
Since August 2023 we have moved our content to a central Signpost Site.
This is not the official site but a store of technical documents and ongoing work. Opinions expressed in posts are not representative of the views of NHS England and any content here should not be regarded as official output in any form. For more information about NHS England please visit our official website
Data Science Internships - Student Experiences
Date: June 2022
Post author: David Brind
Git repository: SynthVAE - Development
My name is Dave and I am a PhD student in healthcare data science at The University of Birmingham. I am affiliated with HDRUK as part of the second-year cohort of their PhD programme. My research focuses on machine learning applications to cardiac datasets such as ultrasounds and routine EHR data. At the beginning of 2022 I was fortunate enough to undertake an internship within the Innovation branch of the NHSX Analytics Unit. I was supervised by Dr Jonathan Pearson. My project aimed to build on previous work by Dom Danks on variational autoencoders for synthetic data generation.
Briefly, our project initially aimed to improve the fidelity and privacy of the synthetic data that we generated. We wanted a method to create data that retained important trends from the original set but was privatised in such a way that information about patients in the original set could not be identified. Jonny gave me the flexibility and independence to take the project in my own direction, and I introduced the idea of “de-biasing” datasets to SynthVAE. Bias and fairness are key issues in healthcare data, and as synthetic data providers we have an opportunity to break this trend by trying to mitigate any potential bias in the original training dataset. The project linked loosely to my PhD in that the methods used (deep generative models) were like the ones I have used in my research. However, the modality was vastly different, as most of my work has focused on imaging thus far. Privacy and fairness are also topics that interest me, but I had not previously had the opportunity to explore them. This internship gave me the perfect opportunity to simultaneously advance my knowledge in key PhD-related areas, build knowledge in areas of interest that I might not otherwise get to experience and, finally, shape the project in the direction that I thought would provide the biggest benefit.
My main motive for getting involved with the scheme was to experience the difference between working in a research environment and working in a more business-style environment. I found the scheme hugely beneficial in this respect and I would thoroughly recommend it to other PhD students. This internship will give you skills that transfer beyond your PhD, and it will also help you build experience in applying the knowledge gained through your PhD to other topic areas. Make use of all the opportunities the team provides, such as team meetings and engaging with the whole analytics unit, to get the best experience.
Good luck with your application!
Date: April 2022
Post author: Anna Linton
Git repository: Structural Topic Modelling for NHS Survey Data
My name is Anna. I am a PhD student in the UKRI CDT in AI for Medical Diagnosis and Care at the University of Leeds. I did the PhD data science internship with NHSX in the spring of 2022. I worked on the project Structural Topic Modelling to Analyse NHS Survey Text Data. My project evaluated the use of structural topic modelling and other machine learning methods to gain insight readily from free-text responses to NHS surveys.
I was fortunate that the internship connected nicely with my PhD project, which looks at free text responses generated in PROM surveys. As such, this internship was a perfect chance to use my knowledge of the area to develop the project but also expand my understanding by applying it to a broader field and experience how a similar problem is tackled by others using different methods. Structural topic modelling was a new technique for me. I enjoyed exploring the potential of this method and am intrigued to see how another intern will further it.
The internship was an enjoyable challenge. You have a lot of freedom for innovation and to try out your ideas. You can take the project in a direction you would like to explore. This is really encouraged. Even during the application and interview, feel free to share your ideas for the direction you would like to take the project in. Throughout the internship, as you take on the project, you are very well supported by knowledgeable and friendly leads.
This internship was about more than just data analysis. It also called for communication skills: you communicate regularly with your line managers, with stakeholders who often do not share your technical background, and with the wider team. It was also a great chance to talk to and learn from other members of the team, picking up both technical and non-technical skills and knowledge.
Good luck with your application!
Date: September 2021
Post author: Tiyi Morris
Git repository: SynPath - Diabetes
My name is Tiyi and I’m a PhD student in Health Economics at NIHR North Thames ARC at UCL. This summer I worked on a simulation model project as part of my internship at NHSX. Here are a couple of quick thoughts and tips for others applying to the scheme.
I gained a lot from the internship. It was a fantastic opportunity to take a project idea and really run with it, with lots of support. I met lots of key people working in NHSX Innovation and Digital Transformation and the NHS England Diabetes team. I learned about key priorities and policy goals for the NHS now, which will be helpful for my PhD. I had a lot of freedom to say what the important components to include in the model were from my perspective, and to collaborate with a team inside and outside of the NHS. I also worked with the teams at Faculty and Hash (specialists in Data Science outside the NHS) to develop the methods used in the project and the plan for building the intelligence layer in the future.
The project was initially about longitudinal synthetic data, and I combined this theme with the case of digital health for type 2 diabetes that I’m looking at in my PhD project. I was really excited by the potential for using synthetic data in large simulations and the policy relevance of looking at this case. It’s important to say that there’s lots of flexibility in the projects. If you have a great idea that links to the themes of the project, don’t be afraid to put your own spin on it.
Finally, you should make sure that you think about how your PhD work and experience show that you have transferable skills. Sometimes quantitative disciplines use different words for techniques that are similar to those in another field. Make sure you can communicate that the skills you have would make you a great intern.
Good luck with your application!
Date: October 2021
Post author: Dom Danks
Git repository: SynthVAE
My name is Dom and I’m a PhD student affiliated with The Alan Turing Institute and the University of Birmingham focussing on the development of machine learning (ML) methodology with applications to a variety of health data settings. In the summer of 2021 I was fortunate to undertake a PhD Data Science Internship within the Innovation Branch of the NHSX Analytics Unit (NHSXAU). My project was aimed at applying the Variational Autoencoder to the problem of synthetic data generation in the context of the NHS.
Given the nature of potential NHS use cases, it was important for the project to consider both the fidelity of the synthetic data and the extent to which the privacy of individuals’ data was retained. Prior to the project I had worked extensively with Variational Autoencoders, but had not been directly involved with synthetic data generation or formal privacy-preserving ML approaches. I therefore saw the project as an excellent opportunity to work on something which both utilised knowledge I had developed during my PhD and introduced me to additional areas of the field. It also represented the opportunity to see how approaches and priorities may differ between the academic world which I was accustomed to and that of a research-aware public sector setting like the NHSXAU.
I highly recommend the scheme and strongly encourage you to apply if the internship scheme’s projects and format appeal to you. Within your application be sure to mention explicit experiences you may have had working with data and the tools you used. Also communicate why you have chosen the particular project(s) you have applied for. It may be that it is particularly related (or unrelated!) to your normal work - whatever it is, be sure to make it clear. Finally, do make use of the interview to ask questions about the project and put forward the ways in which you think you may develop it. This will show that you have thought through the process thoroughly and will more than likely lead to a fun and free-flowing discussion between you and the panel.
Best of luck with your application!