This site is no longer active
Since August 2023 we have moved our content to a central Signpost Site
This is not the official site but a store of technical documents and ongoing work. Opinions expressed in posts are not representative of the views of NHS England and any content here should not be regarded as official output in any form. For more information about NHS England please visit our official website
DART Innovation Branch - Ways of Working
Reproducible Analytical Pipelines (RAP)
Reproducible Analytical Pipelines (RAP) are becoming the standard for creating analytical outputs in government, combining a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Recently the Goldacre Review identified RAP as the essential element to ensure high-quality analysis.
In both our internal and external work we keep one eye on how to increase the level of RAP and support reuse of our work.
Code First (mostly Python, R)
Whilst most of our work is Python-based, we aim to use and support a wider variety of languages, recognising that no single language is currently, or will indefinitely remain, optimal for every task.
For new members of the team looking to learn Python or R we recommend starting with:
- Python - see nhs-pycom.net/resources and the wider website as a start. Engage with the nhs-pycom Slack (link on the website) as a secondary point to see what the wider community recommends. You can also see our peer-to-peer internal training here.
- R - see nhsrcommunity.com as a start. Engage with the NHSR Slack (link on the website).
Levels of code output
We have three levels of code, depending on the envisaged end use case:
- Prototypes - these pieces of code are delivered as working examples of a method or tool set but are not then continually maintained. Their main purpose is to back up the technical report with shared code and clear examples.
- Standalone - these pieces of code are designed for reuse by a developer/analyst. They will need tweaking for the local situation and should only be applied with domain- and data-specific knowledge to ensure they are not misused. These pieces can serve any ICS/trust project as a suite of possible starting points for applying a range of data science techniques. These code bases need a code owner to monitor their status and ensure they remain active and updated over time.
- Integrated - these pieces of code are eventually intended to become tools that can be integrated alongside NHS England infrastructure and data to create new capabilities for our analysts. These will need maintenance to keep them active but, more importantly, a full software engineering cycle, including requirements, a full refactor to reach an Alpha point, and then a full testing programme to move through Beta and into release. This is beyond the current capabilities of the team and so requires support from DMIS or additional resourcing.
These rough definitions help us to prioritise code development. They prompt us to ask when we need to care about maintaining a codebase and pushing best practice on it, and when it is enough to ensure the code has reached a stable state that we can come back to at a later date.
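As an illustration only, the three levels above could be recorded against each codebase so that the maintenance question can be asked consistently. The following is a minimal sketch; the names `CodeLevel`, `REQUIREMENTS`, and `needs_ongoing_maintenance` are hypothetical and are not part of any team tooling.

```python
from enum import Enum


class CodeLevel(Enum):
    """The three levels of code output described above (names are illustrative)."""
    PROTOTYPE = "prototype"
    STANDALONE = "standalone"
    INTEGRATED = "integrated"


# What each level asks of the team, paraphrased from the definitions above.
REQUIREMENTS = {
    CodeLevel.PROTOTYPE: [
        "working example that backs up the technical report",
    ],
    CodeLevel.STANDALONE: [
        "code owner to monitor status",
        "kept active and updated over time",
    ],
    CodeLevel.INTEGRATED: [
        "maintenance to keep it active",
        "requirements",
        "full refactor to reach Alpha",
        "testing programme through Beta and into release",
    ],
}


def needs_ongoing_maintenance(level: CodeLevel) -> bool:
    """Prototypes are left in a stable state; the other levels are maintained."""
    return level is not CodeLevel.PROTOTYPE


if __name__ == "__main__":
    for level in CodeLevel:
        print(level.value, needs_ongoing_maintenance(level), REQUIREMENTS[level])
```

Keeping the level alongside the repository (for example in its README) would make it clear at a glance whether a codebase is expected to be actively maintained.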
We use GitHub as our main collaboration tool when the code is not sensitive. To support our work in GitHub we use a standard project template, with branches of this template that include a more detailed cookiestructure, hooks for simple code quality checks, tests, MkDocs documentation, and Docker setup.
These templates and branches are aimed at supporting gradual development of the codebase towards higher RAP standards, as demonstrated by this example flow.
We've also set out our thinking behind the mandate for, and approach to, sharing code in the open in our thinking section.
We have also published, and use, an open code checklist as a starting point for making our code open, with a series of appropriate checks.