Rapid response to:


How can we improve the quality of data collected in general practice?

BMJ 2023; 380 doi: (Published 15 March 2023) Cite this as: BMJ 2023;380:e071950

Rapid Response:

Transforming Primary Healthcare through the Power of Natural Language Processing and Big Data Analytics

Dear Editor,

Delivering quality care to a patient population increasing in age, size and complexity is a growing challenge for primary care in the NHS (1). A promising solution exists in a learning health system (LHS) approach. An LHS treats each patient encounter as a resource for continual innovation, identifying evidence-based improvements informed by analysis of routine consultation data (2). However, the authors note that poor data quality is a barrier to realising the potential of an LHS in primary care (3). We believe that Natural Language Processing (NLP), an artificial intelligence technology that mines data using deep learning and text analytics, is a promising means of collecting data efficiently and facilitating an LHS (4).

Firstly, NLP adoption will ensure existing clinical documentation can be transformed into high-quality data. Over 80% of healthcare documentation exists in Electronic Health Records (EHRs) as unstructured free text, and manual extraction has proven resource-intensive and unreliable (5,6). NLP algorithms do not require data to be collected in a specific structure, nor do they require clinicians to retrospectively codify their notes. When compared with manual expert review, NLP significantly reduces documentation time whilst maintaining the quality and usability of the data stored (7).

Secondly, using dictation tools alongside NLP to remove the administrative burden of manual data input would enhance patient-centred consultations whilst reducing an unpopular part of General Practitioner (GP) workload (8,9). Documenting in EHRs was found to take up more than half of a GP’s working day, twice the time spent with patients; many GPs feel these time constraints impede provision of holistic care (10). Furthermore, enabling patients to review the generated clinical summary would empower them whilst yielding additional medico-legal protections (11). Envisage a reality where every GP’s undivided attention is upon the patient in front of them; NLP offers the opportunity for happier patients, more satisfied GPs, and optimisation of continuity of care through rich data collection.

Despite the justified optimism surrounding NLP utilisation, barriers to implementation must be addressed. Clinicians often use abbreviations and ambiguous free text when recording data, which may go unrecognised by machine learning algorithms. Expanding and training NLP models to use existing repositories such as SNOMED-CT has been proposed to prevent such data loss (12). A significant concern regarding NLP uptake is patient identifiability; a proposed method to overcome this is pseudonymisation, whereby patient names are replaced with pseudonyms at all levels of their clinical record. CRATE, an open-source software system which takes advantage of pseudonymisation, has shown promise in achieving this with accuracy and speed (13).
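To make these two mitigations concrete, the following is a minimal Python sketch under stated assumptions, not a description of how CRATE or any SNOMED-CT-based tool works. The abbreviation dictionary, the function names, the example note and the secret key are all hypothetical, and a keyed hash (HMAC-SHA256) is used here simply as one common way of deriving stable pseudonyms.

```python
import hashlib
import hmac
import re

# Hypothetical abbreviation lexicon; a real system would draw on a curated
# terminology rather than a hand-written dictionary.
ABBREVIATIONS = {
    "htn": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
    "sob": "shortness of breath",
}


def expand_abbreviations(note: str) -> str:
    """Replace known abbreviations (whole words, any case) with full terms."""
    alternatives = "|".join(re.escape(abbr) for abbr in ABBREVIATIONS)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], note)


def pseudonym(name: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a name using a keyed hash."""
    digest = hmac.new(secret_key, name.lower().encode("utf-8"), hashlib.sha256)
    return "PATIENT_" + digest.hexdigest()[:8].upper()


def pseudonymise(note: str, known_names: list[str], secret_key: bytes) -> str:
    """Replace each known patient name in the note with its pseudonym."""
    for name in known_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        note = pattern.sub(pseudonym(name, secret_key), note)
    return note


# Illustrative note and key; both are invented for this example.
note = "Jane Doe reports SOB on exertion. PMH: HTN, T2DM."
key = b"example-key-held-by-the-data-controller"
clean = pseudonymise(expand_abbreviations(note), ["Jane Doe"], key)
print(clean)
```

Because the pseudonym depends only on the name and the key, the same patient receives the same pseudonym across every document, which preserves record linkage without exposing the name. This sketch only handles names supplied in advance; detecting unlisted identifiers in free text is a harder problem that it does not attempt.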

A potentially key hurdle before NLP can be adopted in practice is gauging the perspective of doctors, which has been sparsely explored in the literature. The Technology Acceptance Model describes how perceived usefulness directly influences intention to use (14). Physicians may not trust new technology or recognise the potential of NLP to empower their own clinical practice. Therefore, it is vital that clinicians are able to collaborate with data scientists, which is the underlying ethos of an LHS. In order to maximise the potential of NLP, GPs must understand how it works and support the iterative process. This is in line with the NHS England Transformation Directorate's aim to build a skilled, digitally literate workforce.

The NHS Long Term Plan calls for digital development to be at the heart of efforts to modernise our health system (15). Incorporating NLP across the primary care sector provides a springboard for large databases to be built around research-driven practices. Though some drawbacks do exist, the possibilities NLP presents for data utilisation in primary care are immense, and NLP should therefore be implemented in everyday practice.

(1) Health and Social Care Committee. The future of general practice. UK House of Commons. Report number: 4, 2022-23.

(2) Hardie T, Horton T, Thornton-Lee N, Home J, Pereira P. Developing learning health systems in the UK: Priorities for action. The Health Foundation; 2022.

(3) Shemtob L, Beaney T, Norton J, Majeed A. How can we improve the quality of data collected in general practice? BMJ. 2023 Mar 15;380:e071950.

(4) Harrison CJ, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction to natural language processing. BMC Medical Research Methodology. 2021 Dec;21(1):1-1.

(5) Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Medical Informatics and Decision Making. 2020;20(1):280.

(6) Williamson EJ, McDonald HI, Bhaskaran K, et al. Risks of covid-19 hospital admission and death for people with learning disability: population based cohort study using the OpenSAFELY platform. BMJ. 2021;374:n1592.

(7) Wi S, Goldhoff PE, Fuller LA, Grewal K, Wentzensen N, Clarke MA, et al. Using Natural Language Processing to Improve Discrete Data Capture From Interpretive Cervical Biopsy Diagnoses at a Large Health Care Organization. Archives of Pathology & Laboratory Medicine (1976). 2023;147(2):222-226.

(8) Stobbe EJ, Groenewegen PP, Schäfer W. Job satisfaction of General Practitioners: A cross-sectional survey in 34 countries. Human Resources for Health. [Online] 2021;19(1).

(9) Lanier C, Dominicé Dao M, Baer D, Haller DM, Sommer J, Junod Perron N. How do patients want us to use the computer during medical encounters?—a discrete choice experiment study. Journal of General Internal Medicine. [Online] 2021;36(7): 1875–1882.

(10) Arndt BG, Beasley JW, Watkinson MD, Temte JL, Tuan W-J, Sinsky CA, et al. Tethered to the EHR: Primary care physician workload assessment using EHR event log data and time-motion observations. The Annals of Family Medicine. [Online] 2017;15(5): 419–426.

(11) Barach P, Bettinger J, Charpak Y et al. Exploring patient participation in reducing health-care-related safety risks. World Health Organization. 2013.

(12) Locke S, Bashall A, Al-Adely S, Moore J, Wilson A, Kitchen GB. Natural language processing in medicine: a review. Trends in Anaesthesia and Critical Care. 2021 Jun 1;38:4-9.

(13) Cardinal RN. Clinical records anonymisation and text extraction (CRATE): an open-source software system. BMC Medical Informatics and Decision Making. 2017 Dec;17:1-2.

(14) Davis FD. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly. 1989 Sep 1:319-40.

(15) NHS. The NHS long term plan. 2019.

Competing interests: No competing interests

02 April 2023
Yasin Uddin
Medical Student
Abhinav Nair, Sameed Shariq, Sehaan Hasan Hannan
Imperial College London
Imperial College London, Exhibition Road, South Kensington, London SW7 2BX