NHS England Data Science PhD Internships

Exploring Medical Visual Question Answering Approaches and Advancements

Keywords: NLP, VQA, Multi-modal

Need: Visual Question Answering (VQA) is an interesting challenge that combines several disciplines, including computer vision, natural language understanding, and deep learning. In the medical domain, VQA spans areas such as supporting diagnosis, helping patients understand their medical conditions, and accurately answering questions about images in unlabelled datasets.

Further, when considering Medical VQA we must be aware that images and their descriptions generated in the healthcare sector are often vastly different from images and text taken from the general domain. There is therefore a gap between this problem and many pre-trained models that could otherwise be useful. There is also a large cost associated with finding healthcare professionals with suitable expertise to annotate these images. Evaluating success is challenging too, because the evaluation space needs to be constrained without stifling the flexibility the approach requires.

The project would look to understand the current state of Medical VQA, the available datasets and challenges, and how well current approaches perform on this task. It would need to consider how recent advances in multimodal foundation models have advanced the area. The interpretability and explainability of technical approaches would also be of interest.

Current Knowledge/Examples & Possible Techniques/Approaches:

Related Previous Internship Projects: TxtRayAlign

Enables Future Work: Demonstration and deeper understanding of VQA as a component of explainability in medical imaging and working with multi-modal datasets.

Outcome/Learning Objectives: The project would look to build an understanding of the state of the field of VQA and the challenges specific to the healthcare domain.


Desired skill set: When applying, please highlight any experience with computer vision, natural language processing, multi-modal data, and coding (including any coding in the open), and any other data science experience you feel is relevant.
