Keywords: Explainability, ASR, Audio
Need: The accuracy and efficacy of automatic speech recognition (ASR) and downstream summarisation technologies need clear validation and benchmarking to give the NHS confidence in the performance and safety of these technologies.
Whilst metrics such as word error rate (WER) are commonly used, there are well-documented issues with relying on WER alone to evaluate ASR systems, especially in the medical setting, where speech can be affected by speech impediments, emotion, and accents. In addition, WER treats all errors as equal, when in fact some have a much greater impact on downstream tasks, particularly summarisation. A wider set of metrics, including semantic models and clinical coding models, therefore needs to be incorporated alongside WER, with errors weighted by the change in meaning they cause.
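To make the limitation concrete, a minimal word-level edit-distance implementation of WER is sketched below (function name and example sentences are illustrative, not taken from any specific dataset). It shows that a clinically critical substitution and a harmless one produce identical scores:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

ref = "patient denies chest pain"
# A meaning-reversing error and a cosmetic error score the same:
print(wer(ref, "patient reports chest pain"))  # 0.25, clinically dangerous
print(wer(ref, "patient denies chest pains"))  # 0.25, clinically harmless
```

Both hypotheses score 0.25, which is exactly the gap a semantically weighted metric would need to close.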
This project would seek to demonstrate the issues and errors that need to be addressed when combining ASR with summarisation tasks, how to identify those issues, and the impact of audio-processing choices on transcription accuracy.
Current Knowledge/Examples & Possible Techniques/Approaches:
Related Previous Internship Projects:
Enables Future Work:
Outcome/Learning Objectives:
Datasets: Public facing data to begin with such as the Kaggle competition around Patient Health Detection using Vocal Audio
Desired skill set: When applying please highlight any experience around audio processing and evaluation, python coding experience, and any other data science experience you feel relevant.