
Structural Topic Modelling for NHS survey data

This model is not currently suitable for predicting patient non-attendance in a real-world healthcare environment.

Note: All example data used in this repository is simulated and for illustrative purposes only. The dataset used in the analysis is provided; it originates from Nottinghamshire Healthcare NHS Foundation Trust's CDU Data Science Team.

See code README for installation and usage instructions.

Overview

A reusable codebase with example data for applying structural topic modelling (STM) to survey data. This technique allows contextual information (e.g. question number) to be included in the topic allocation.

The codebase includes example data preprocessing, n-gram analysis, sentiment analysis and the structural topic modelling itself. There is also a text search function that uses WordNet.
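As a rough illustration of the n-gram analysis step, the sketch below counts word n-grams across a handful of free-text responses using only the Python standard library. It is not taken from this repository: the function names `tokenize` and `ngram_counts`, the simple regex tokeniser and the sample responses are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(responses, n=2):
    """Count word n-grams over a list of free-text responses (hypothetical helper)."""
    counts = Counter()
    for response in responses:
        tokens = tokenize(response)
        # Zipping n shifted copies of the token list yields consecutive n-grams;
        # responses shorter than n tokens contribute nothing.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Invented survey-style responses, for illustration only.
responses = [
    "The staff were kind and helpful",
    "The staff were very busy",
    "Waiting times were too long",
]

bigrams = ngram_counts(responses, n=2)
print(bigrams.most_common(3))
```

Counting n-grams per response, rather than over one concatenated string, avoids creating spurious n-grams that span two different respondents' answers.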

To visualise and interpret the topic models, we examined the range of outputs available in the stm R package, such as word clouds, plots of the estimated effect of the metadata, and lists of the words most associated with each topic. The toLDAvis function was used to visualise the topic-word distributions in an interactive pop-out window, which gave an overview of topic quality in terms of topic content and similarity. The stminsights package was used to produce an interactive dashboard for detailed inspection of the model and topics.

Figure 1: Example screenshot from stminsights