Keywords: Explainability, ASR, Audio
Need: Many NHS consultations, Multi-Disciplinary Team meetings, and patient interactions are recorded as speech. Converting this audio into accurate, concise, and clinically meaningful summaries could reduce administrative burden, improve documentation quality, and enhance patient safety.
However, current automatic speech recognition (ASR) and summarisation models face challenges in real-world NHS contexts. Standard metrics such as word error rate (WER) are limited because they treat all errors equally, even though some (e.g. medication name errors) can significantly change clinical interpretation. Disordered or accented speech, emotion, and interruptions often reduce ASR accuracy in healthcare settings.
This project aims to evaluate how decisions made about audio processing, transcription, and summarisation influence patient outcomes. It will explore more nuanced evaluation methods (including semantic similarity, clinical relevance scoring, and error weighting) and demonstrate how different ASR systems perform in healthcare-like scenarios.
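To illustrate the limitation of WER and the idea of error weighting, here is a minimal sketch rather than a proposed project method. The function name `weighted_wer`, the choice to charge a substitution the larger of the two word weights, and the example weights are all assumptions made for illustration; the weights are not clinically validated. With no weights supplied, the function reduces to standard WER.

```python
from typing import Dict, List, Optional


def weighted_wer(reference: str, hypothesis: str,
                 term_weights: Optional[Dict[str, float]] = None) -> float:
    """Edit-distance error rate where each word may carry its own cost.

    Hypothetical helper: words missing from `term_weights` cost 1.0, so
    calling it without weights gives standard WER,
    (substitutions + deletions + insertions) / number of reference words.
    """
    weights = term_weights or {}
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    def cost(word: str) -> float:
        return weights.get(word, 1.0)

    # dp[i][j] = minimum cost of turning ref[:i] into hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = dp[i - 1][0] + cost(ref[i - 1])      # delete every ref word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = dp[0][j - 1] + cost(hyp[j - 1])      # insert every hyp word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            r, h = ref[i - 1], hyp[j - 1]
            # Assumption: a substitution costs the larger of the two weights.
            substitution = 0.0 if r == h else max(cost(r), cost(h))
            dp[i][j] = min(
                dp[i - 1][j - 1] + substitution,  # match or substitute
                dp[i - 1][j] + cost(r),           # deletion
                dp[i][j - 1] + cost(h),           # insertion
            )

    # Normalise by total reference weight (equals word count when unweighted).
    return dp[len(ref)][len(hyp)] / sum(cost(w) for w in ref)


if __name__ == "__main__":
    reference = "patient takes metformin twice daily"
    drug_error = "patient takes metoprolol twice daily"   # wrong medication
    tense_error = "patient took metformin twice daily"    # harmless slip
    # Illustrative weights only: medication names count five times as much.
    weights = {"metformin": 5.0, "metoprolol": 5.0}

    for label, hyp in [("drug error", drug_error), ("tense error", tense_error)]:
        print(label,
              "WER:", round(weighted_wer(reference, hyp), 3),
              "weighted:", round(weighted_wer(reference, hyp, weights), 3))
```

Both hypotheses contain a single substitution in five words, so plain WER scores them identically at 0.2. Under the illustrative weights, the medication error scores about 0.56 and the tense error about 0.11, which is the kind of distinction a clinically weighted metric is meant to surface.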
Current Knowledge/Examples & Possible Techniques/Approaches: Recent advances in ASR and summarisation have led to high-performing open-source and commercial systems:
Related Previous Internship Projects:
Enables Future Work:
Outcome/Learning Objectives:
Datasets: Public-facing data to begin with, such as the Kaggle competition on Patient Health Detection using Vocal Audio, LibriSpeech, or CommonVoice.
Desired skill set: When applying, please highlight any experience with audio processing and evaluation, NLP and/or speech models, LLMs and HuggingFace, model evaluation and bias/factuality concerns, and Python (including PyTorch or TensorFlow), along with any other data science experience you feel is relevant.