Temporal trends in use of tests in UK primary care, 2000-15: retrospective analysis of 250 million tests

Abstract
Objectives To assess the temporal change in test use in UK primary care and to identify tests with the greatest increase in use.
Design Retrospective cohort study.
Setting UK primary care.
Participants All patients registered at UK general practices in the Clinical Practice Research Datalink, 2000/1 to 2015/16.
Main outcome measures Temporal trends in test use, and crude and age and sex standardised rates of total test use and of 44 specific tests.
Results 262 974 099 tests were analysed over 71 436 331 person years. Age and sex adjusted use increased by 8.5% annually (95% confidence interval 7.6% to 9.4%), from 14 869 tests per 10 000 person years in 2000/1 to 49 267 in 2015/16, a 3.3-fold increase. Patients in 2015/16 had on average five tests per year, compared with 1.5 in 2000/1. Test use also increased statistically significantly across all age groups, in both sexes, across all test types (laboratory, imaging, and miscellaneous), and in 40 of the 44 tests that were studied specifically.
Conclusion Total test use has increased markedly over time, in both sexes, across all age groups and test types (laboratory, imaging, and miscellaneous), and in 40 of the 44 specifically studied tests. Of the patients who underwent at least one test annually, the proportion who had more than one test increased significantly over time.


Text S2: Extended included tests
Temporal changes in age and sex standardised rates were modelled with joinpoint regression [1]. A joinpoint regression model consists of straight-line segments connected at joinpoints, the estimated locations of significant changes in the slope of the trend line. The joinpoint software starts from a null hypothesis model of zero joinpoints and tests whether the alternative hypothesis model, with up to the maximum number of joinpoints specified, has a statistically significantly lower residual sum of squares [2,3]. Permutation tests are applied sequentially until a model of best fit is reached [3]. Because multiple tests are performed, the significance level of each test is adjusted to control the overall type I error at a specified level (0.05) [2]. The maximum number of joinpoints depends on the number of observations in the model; for our model a maximum of two joinpoints was allowed, as recommended [2].
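The model-selection idea can be illustrated with a minimal sketch. This is not the Joinpoint software used in the analysis (which selects the number of joinpoints with sequential permutation tests); it only shows, on synthetic data, how a continuous piecewise-linear trend with zero versus one joinpoint is fitted and the residual sums of squares compared. All data values below are hypothetical.

```python
# Illustrative sketch only, not the NCI Joinpoint software: fit a continuous
# piecewise-linear trend with 0 vs 1 joinpoint by ordinary least squares and
# compare residual sums of squares (SSE), grid-searching candidate break years.

def solve(A, b):
    """Solve a small k x k linear system (normal equations) by Gaussian
    elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(xs, ys, tau=None):
    """SSE of the OLS fit y = b0 + b1*x, plus a hinge term b2*max(0, x - tau)
    when a single joinpoint at tau is included (continuity is automatic)."""
    cols = [[1.0] * len(xs), list(xs)]
    if tau is not None:
        cols.append([max(0.0, xv - tau) for xv in xs])
    k = len(cols)
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(ci * yv for ci, yv in zip(cols[i], ys)) for i in range(k)]
    beta = solve(A, b)
    return sum((yv - sum(beta[i] * cols[i][t] for i in range(k))) ** 2
               for t, yv in enumerate(ys))

# Synthetic rates: flat until 2005, then rising -- a true joinpoint at 2005.
years = list(range(2000, 2016))
xs = [y - 2000 for y in years]          # centre years for numerical stability
rates = [100.0 + (0.0 if y <= 2005 else 12.0 * (y - 2005)) for y in years]

sse0 = sse(xs, rates)                   # straight line, no joinpoint
tau, sse1 = min(((t, sse(xs, rates, t)) for t in xs[2:-2]),
                key=lambda p: p[1])     # best single joinpoint
print(2000 + tau, sse1 < 1e-6 < sse0)   # the one-joinpoint model fits far better
```

In the real procedure, whether the SSE reduction is large enough to keep the extra joinpoint is decided by a permutation test rather than by inspection.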
Locations of a significant change in temporal slope ('joinpoints') were identified, and the annual percentage change (APC) of the slope between joinpoints was determined. The calculated APCs therefore represent the annual percentage change for the years between joinpoints. To determine the annual percentage change over the entire study period (2000/1 to 2015/16) and over the post-QOF period (2004/5 to 2015/16), we calculated the average annual percentage change (AAPC). The AAPC is computed as a weighted average of the APCs within the joinpoint model, with the weights equal to the lengths of the respective APC intervals; it is thus valid even if the joinpoint model indicates that there were changes in trend over the interval [4]. Calculation of the AAPC allows tests to be compared over the same time interval, and time intervals to be compared with each other.
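The weighted-average computation can be sketched as follows. The averaging is done on the log scale, as in the joinpoint software; the numerical values are hypothetical, not results from our data.

```python
import math

def aapc(segments):
    """Average annual percentage change (AAPC) from a joinpoint model.
    `segments` is a list of (APC in percent, segment length in years);
    segment APCs are averaged on the log scale, weighted by segment length."""
    total = sum(length for _, length in segments)
    log_mean = sum(length * math.log(1 + apc / 100.0)
                   for apc, length in segments) / total
    return (math.exp(log_mean) - 1) * 100.0

# Hypothetical example: a model with an APC of 12% over the first 5 years
# and 6% over the remaining 10 years of a 15-year period.
print(round(aapc([(12.0, 5), (6.0, 10)]), 2))   # ~7.96% per year
```

When the model has no joinpoints, the AAPC reduces to the single segment's APC, as expected.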
To examine the temporal change in utilisation of specific tests, we selected 44 specific tests (28 laboratory, 11 imaging, and five other, miscellaneous tests; supplementary file). Twenty-five tests were selected based on the following criteria:
1. Specific guidance on their use in primary care is stated in one or more of the following guidelines/frameworks: the Quality and Outcomes Framework (QOF), National Institute for Health and Care Excellence (NICE) guidelines, Choosing Wisely, or the NICE Do Not Do recommendations.
2. They were one of the two most frequently ordered laboratory or imaging tests in Oxfordshire primary care (data obtained directly from Oxford University Hospitals (OUH)).
This list of tests was discussed and refined in consultation with our patient and public involvement group and agreed upon by all authors.
A further 19 tests were identified during data cleaning and in consultation with GPs as tests that are typically ordered as one test but return many results. For instance, a liver function test returns results for alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), and gamma-glutamyl transferase (GGT), amongst others, all of which have individual Read codes and are recorded separately. Selection of these additional specific tests occurred before data analysis. The full codes are listed in Supplementary Text S1.
We originally planned to exclude tests unlikely to be ordered by GPs: uric acid blood level, serum chloride, serum bicarbonate, anion gap, blood gases, ascitic fluid examination, and the Schilling test. Upon further expert General Practitioner review, before the data were analysed, it was decided to include these tests in the temporal analysis of total tests. The expert GP review concluded that it is likely, or at least plausible, that these tests are ordered by UK General Practitioners, so their exclusion would be inappropriate. Moreover, given the tiny proportion of total test use made up by these tests, their inclusion in the total test analysis is very unlikely to affect the results. All clinicians in the author group (JOS, FDRH, CS, PL, BG, and CH, with JA) discussed and agreed on the above process for selecting the included tests.

Text S3: Extended Discussion
We took many measures to ensure that coding was consistent throughout the study period. As stated in the manuscript, the CPRD only includes general practices with robust, up-to-date, and valid data [5]. We also considered ordering of a test from two perspectives: a record of the test having been ordered, and the result of the test (for instance, via a letter or numerical results). Letter correspondence is often incorporated into the CPRD, with around 90% compliance [6,7]. We also believe the following points further support the validity of coding.
Significant clinical event coding. For General Practices to be included in the CPRD, they must report 'significant clinical events' [8,9], including test ordering.
Secondary care data contamination. The CPRD contains primary care data, with additional linkage to secondary care if requested (we did not obtain linked secondary care data as it was beyond the aim of our study). Given the quality and validation of CPRD data, it is therefore unlikely, although possible, that some of the test ordering we present reflects tests ordered by secondary care clinicians. Assessment of the temporal trends in two specific tests (knee MRI and brain MRI) suggests there is little, if any, secondary care contamination in our data. The provision of direct access MRI has been inconsistent around the UK [10], but a substantial national effort to increase direct access MRI for GPs occurred around 2005/6 [11,12]; these changes are reflected in the substantial increase in knee and brain MRIs around the same time. The rates of knee and brain MRIs in our results are very low before 2005 (the least and third least utilised tests, respectively). It should be noted that the other MRI test in our analysis (lumbar spine) has been available to GPs by direct access since the mid-to-late 1990s [13].

Independent validation of test codes in CPRD.
A systematic review reports that all abnormal test results are recorded accurately in CPRD [14]. A separate study [9] reports that from 2000 the proportion of abnormal results, out of the total number of tests ordered, has remained relatively constant in CPRD (28.7% from 2000-2004 and 27.0% from 2005 onwards). These results suggest that the coding of all test results (normal and abnormal results) has been consistent from 2000.

Decreasing use of some specific tests before 2004 (QOF introduction).
The results from our analysis of 44 specific tests support the conclusion that coding before 2004/5 was valid. We were concerned that test use coding before 2004 would be inconsistent; in particular, that not all tests that were ordered were coded. With the advent of electronic ordering we were less concerned, as coding then became automated, and we thus conducted a sensitivity analysis restricted to the period when 90% of general practices were ordering tests electronically (after 2004). Nevertheless, our results for three tests (urine drug monitoring, vaginal swabbing, and lumbar spine radiography) add weight to the conclusion that coding was valid before 2004. The use of these three tests fell significantly from 2000 to 2003/4. If the data were not coded appropriately, and tests were thus missed before QOF, we would not have anticipated any reductions in use for any test. It is further encouraging that we noted reductions in use across all three test types (laboratory, imaging, and miscellaneous).