Investigating the relationship between quality of primary care and premature mortality in England: a spatial whole-population study
BMJ 2015; 350 doi: https://doi.org/10.1136/bmj.h904 (Published 02 March 2015) Cite this as: BMJ 2015;350:h904
- Evangelos Kontopantelis, senior research fellow12,
- David A Springate, research fellow23,
- Mark Ashworth, senior lecturer4,
- Roger T Webb, reader5,
- Iain E Buchan, professor1,
- Tim Doran, professor6
- 1Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
- 2NIHR School for Primary Care Research, Centre for Primary Care, Institute of Population Health, University of Manchester
- 3Centre for Biostatistics, Institute of Population Health, University of Manchester
- 4Primary Care and Public Health Sciences, King’s College London, London, UK
- 5Centre for Mental Health and Risk, University of Manchester
- 6Department of Health Sciences, University of York, York, UK
- Correspondence to: E Kontopantelis e.kontopantelis{at}manchester.ac.uk
- Accepted 13 January 2015
Abstract
Objectives To quantify the relationship between a national primary care pay-for-performance programme, the UK’s Quality and Outcomes Framework (QOF), and all-cause and cause-specific premature mortality linked closely with conditions included in the framework.
Design Longitudinal spatial study, at the level of the “lower layer super output area” (LSOA).
Setting 32 482 LSOAs (neighbourhoods of 1500 people on average), covering the whole population of England (approximately 53.5 million), from 2007 to 2012.
Participants 8647 English general practices participating in the QOF for at least one year of the study period, including over 99% of patients registered with primary care.
Intervention National pay-for-performance programme incentivising performance on over 100 quality-of-care indicators.
Main outcome measures All-cause and cause-specific mortality rates for six chronic conditions: diabetes, heart failure, hypertension, ischaemic heart disease, stroke, and chronic kidney disease. We used multiple linear regressions to investigate the relationship between spatially estimated recorded quality of care and mortality.
Results All-cause and cause-specific mortality rates declined over the study period. Higher mortality was associated with greater area deprivation, urban location, and higher proportion of a non-white population. In general, there was no significant relationship between practice performance on quality indicators included in the QOF and all-cause or cause-specific mortality rates in the practice locality.
Conclusions Higher reported achievement of activities incentivised under a major, nationwide pay-for-performance programme did not seem to result in reduced incidence of premature death in the population.
Introduction
Primary care has enormous potential to improve population health outcomes—including mortality from common chronic conditions—through early intervention in the disease process1 2 and coordinated provision of care. Effective primary care is associated with reduced morbidity, increased longevity, and more equitable health outcomes,3 4 but quality of primary care varies widely between providers.5 6 Traditional physician payment systems have facilitated this variation, with fee-for-service systems potentially incentivising over-investigation and over-treatment, and capitation systems potentially incentivising under-utilisation. Neither approach directly rewards high quality care or investment in quality improvement.7 8 9
In order to improve patient outcomes, policymakers worldwide have attempted to link remuneration for providers to quality of care through pay-for-performance programmes. Multiple programmes have been implemented across a range of settings, but clear evidence for improved patient outcomes is yet to emerge.10 11 12 In the United Kingdom a national primary care incentive scheme was introduced in 2004. The Quality and Outcomes Framework (QOF), one of the largest pay-for-performance programmes in the world, links up to 25% of family practitioners’ income to performance on over 100 publicly reported quality indicators. Several indicators relate to organisation of care and patient experience, but most relate to management of common chronic diseases,13 including the leading causes of death in the UK: coronary heart disease, cancer, chronic obstructive pulmonary disease, hypertension, stroke, and diabetes. The breadth of the scheme should not be underestimated: in 2011-12 the total register of the 19 conditions incentivised under the QOF represented data collected from approximately 62% of the total number of registered patients.
The QOF was associated with an initial improvement in incentivised processes of care and some intermediate outcomes, but there was little further improvement after the third year of the scheme (2006-7).14 15 16 Evidence for improved patient outcomes, however, is contradictory. For example, emergency hospital admission rates seem to have decreased for diabetes17 but not for some other incentivised conditions.18 19 Evidence on the benefits of several processes incentivised in the QOF suggests that improvements in performance of these processes should reduce population mortality.20 However, no studies to date have examined the relationships between recorded practice performance under the QOF and premature death rates for conditions included in the programme.
In this study, we assessed these relationships for the first time. More specifically, we applied spatial analysis techniques to quantify the association between practices’ performance on the QOF indicators, including improvement on these indicators, and all-cause premature mortality in the areas the practices serve, controlling for area and population characteristics. We also examined the relationship between performance on specific intermediate outcomes indicators that are more closely associated with patient outcomes (for example, blood pressure control) and premature mortality for closely related conditions (diabetes, heart failure, hypertension, ischaemic heart disease, stroke, and chronic kidney disease, the total register for which was approximately 27% of the total number of registered patients in 2011-12).
Methods
Data sources
We accessed various data sources to obtain information on population estimates, mortality, socioeconomic status, ethnicity, health status, urbanicity versus rurality, quality of healthcare, and spatial coordinates. Data were collected at the lowest available geographic level, the “lower layer super output area” (LSOA), which has a mean (and median) population size of around 1500 (first centile 1084, 99th centile 2931 residents), since higher levels would involve aggregation across heterogeneous areas and populations and obscure inferential analyses.21 An average primary care practice covers a population approximately equivalent to five LSOAs. Mortality counts at the 2001 LSOA by age, sex, and cause were obtained from the Office for National Statistics. Datasets for practice-level QOF achievement, prevalence rates, and list sizes were downloaded from the Health & Social Care Information Centre.22 Population counts by gender and age group, as well as information on disability, ethnicity, and health status were obtained from the 2011 census, at the 2011 LSOA.23 24 Revised population estimates for prior years (based on 2001 and 2011 census information) were also downloaded.25 Digital vector boundaries for the 2001 LSOAs, generalised to 20 metres and clipped to the coastline to reduce size and improve visualisation, were obtained from the Office for National Statistics open geography portal.26 Area deprivation, as measured by the 2010 Index of Multiple Deprivation,27 and urban classification28 were available at the 2001 LSOA level.29 30 31
By definition, LSOAs are geographical areas mainly determined by population size and were first developed using data from the 2001 census. After the 2011 census some changes were introduced to reflect large population increases or reductions in localities, although 97.5% of LSOAs in England and Wales remained unchanged.21 Since most of the collected information was reported at the 2001 LSOA level, we proceeded to merge and analyse data using these geographies. Census data for 2011 were reported only for 2011 LSOAs, and we had to convert to the 2001 geographies. For the small percentage of areas that had merged or split, we used population weights to estimate data, and for some of these (approximately 1% of all LSOAs) population estimates were outside the 1000-3000 range, as would be expected.
Standardised mortality and final dataset
To allow for more meaningful comparisons across areas in terms of mortality, standardised mortality rates were calculated, using the 2011 census population data and all-cause and condition-specific mortality counts.32 Rather than standardising by the 1976 European Standard Population, we used 2011 census data to estimate the true age and sex population breakdown for England, which allowed us to produce yearly age and sex standardised mortality rates, from 2005 to 2012, per 100 000 people within each gender. Death among some age-sex subgroups can be a rare outcome at the LSOA level, so we also calculated two-year rates over the same time period.
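The direct standardisation described above can be illustrated with a minimal sketch. The stratum labels, counts, and reference population below are hypothetical and are not taken from the study; the sketch covers one sex only, and skipping strata with no residents is an assumption of the sketch.

```python
def directly_standardised_rate(deaths, population, reference, per=100_000):
    """Directly standardised mortality rate for one area and one sex.

    deaths, population: dicts mapping an age band to the area's counts.
    reference: dict mapping the same age bands to reference population counts
    (the study used the 2011 census breakdown for England).
    """
    ref_total = sum(reference.values())
    rate = 0.0
    for stratum, ref_count in reference.items():
        pop = population.get(stratum, 0)
        if pop == 0:
            continue  # assumption: a stratum with no residents contributes nothing
        stratum_rate = deaths.get(stratum, 0) / pop
        # weight each stratum-specific rate by its share of the reference population
        rate += stratum_rate * ref_count / ref_total
    return rate * per


# Hypothetical figures, for illustration only
reference = {"0-49": 700, "50-74": 250, "75+": 50}
area_population = {"0-49": 1000, "50-74": 400, "75+": 100}
area_deaths = {"0-49": 1, "50-74": 4, "75+": 10}

example_rate = directly_standardised_rate(area_deaths, area_population, reference)
print(round(example_rate, 1))
```

Because the weights come from a fixed reference population, two areas with different age structures can be compared on the same scale.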
The final dataset contained the following variables used in the regression analyses, complete for the 32 482 English LSOAs of 2001: all-cause one-year deaths and standardised mortality rates (yearly, 2005–12), all-cause two-year deaths and standardised mortality rates (yearly, 2005–11), condition-specific (diabetes, heart failure, hypertension, ischaemic heart disease, stroke, and chronic kidney disease; see online appendix 1 table A1 for details) one-year deaths and standardised mortality rates (yearly, 2005–12), condition-specific two-year deaths and standardised mortality rates (yearly, 2005–11), total population (yearly, 2005–12), percentage of whites (2011), non-urban (2004), index of multiple deprivation (2010). The dataset also included variables that were not used in the analyses, mainly due to collinearity, but were explored descriptively: percentage of population with day-to-day activities limited a lot or a little (2011), and percentage reporting bad or very bad health (2011). Using the Stata shp2dta command and the vector boundaries, we calculated the centroid for each LSOA in the British National Grid format.33 These were converted from British National Grid easting and northing to longitude and latitude in degrees.34 The manual conversion process was double checked for correctness using R, where a relevant command was available.35
Practice-level data were aggregated at the 2001 LSOA level, to account for multiple practices in a locality, and the number of these primary care practice “hubs” (a LSOA with one or more practices) varied between 6569 in the third year of the QOF (2006-07) and 6413 in the eighth (2011-12). Our decision to exclude the first two years of the scheme was driven by practicality: first, not all required information was released for 2004-05 (for example, the number of exceptions (patients for whom indicators were deemed inappropriate by practices)); second, the scheme underwent a major revision after 2005-06 with many disease domains added and others considerably modified. Available information from 2006-07 to 2011-12 included number of practices, total list size, and prevalence rates for 19 QOF clinical domains (atrial fibrillation, asthma, hypertension, cancer, coronary heart disease, chronic kidney disease, chronic obstructive pulmonary disease, dementia, depression, diabetes, epilepsy, heart failure, learning disabilities, left ventricular dysfunction, mental health, obesity, palliative care, stroke, and hypothyroidism).
To better capture true levels of QOF quality of care we calculated population achievement,36 37 defined as PA=(∑Ni)/(∑(Di+Ei)), where Ni, Di, and Ei are the numerator, denominator, and exceptions for QOF indicator i respectively. Three population achievement measures were calculated, within each QOF year: (a) overall, across all indicators (PAoval); (b) across all intermediate outcome indicators (PAoutc); and (c) across a subset of nine intermediate outcome indicators in the clinical domains linked with condition-specific mortality (PAoutx; blood pressure control in hypertension, coronary heart disease, chronic kidney disease, diabetes, and stroke; cholesterol control in coronary heart disease, diabetes, and stroke; HbA1c control in diabetes). Details on all the indicators used, over time, are provided in online appendix 1 table A2. To quantify the morbidity “burden” in each practice hub, we defined morbidity load as the sum of QOF clinical domain denominators over the total list size for: (a) all indicators in all 19 domains (MLtot); and (b) the nine outcome indicators in the five clinical domains linked with condition-specific mortality (ML9).
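As a rough illustration of the two definitions above, the following sketch computes population achievement and morbidity load for one practice hub. The function names and all counts are hypothetical.

```python
def population_achievement(indicators):
    """PA = sum(N_i) / sum(D_i + E_i) over QOF indicators i.

    indicators: iterable of (numerator, denominator, exceptions) tuples.
    """
    indicators = list(indicators)  # allow generators; we iterate twice
    total_n = sum(n for n, _, _ in indicators)
    total_d_plus_e = sum(d + e for _, d, e in indicators)
    return total_n / total_d_plus_e


def morbidity_load(indicator_denominators, list_size):
    """Sum of indicator denominators divided by the total list size."""
    return sum(indicator_denominators) / list_size


# Hypothetical hub with two indicators: (N, D, E)
hub_indicators = [(80, 90, 10), (150, 180, 20)]
pa = population_achievement(hub_indicators)                        # 230 / 300
ml = morbidity_load([d for _, d, _ in hub_indicators], list_size=1000)
print(round(pa, 3), round(ml, 3))
```

Adding exceptions to the denominator means that excluding patients cannot raise a hub's score, which is why population achievement is lower than achievement computed on the denominator alone.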
Spatial weighted estimates datasets
Approximately 80% of LSOAs do not contain a general practice, so we used spatial analysis methods to estimate healthcare data at the LSOA level. Current UK policy only allows patients to register with primary care services in the locality they reside, and general practice choice is therefore relatively limited. However, that limitation allows us to use practice-level information and attribute it to geographical areas close to it. The LSOA centroid coordinates (longitude and latitude) were inputted to the spmat command in Stata and a 32 482×32 482 matrix of inverse distance (in miles) was obtained.38 The matrix, which holds a detailed distance mapping of each LSOA with all other LSOAs, allowed us to quantify geographical connectivity and proximity, with closer areas having larger values (or weights). This was then used to generate the prevalence, quality of care, and morbidity load measures reported previously, for all LSOAs from 2006-07 to 2011-12. Two general approaches, under different assumptions, were used for this purpose.
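A small-scale sketch of such an inverse distance matrix follows. The authors used the Stata spmat command; this stand-in uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles, both of which are assumptions of the sketch, and the three centroids are approximate illustrative coordinates rather than study data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumption: mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def inverse_distance_matrix(centroids):
    """Symmetric matrix of 1/distance weights with zeros on the diagonal.

    centroids: list of (longitude, latitude) pairs; assumes no two are identical.
    """
    n = len(centroids)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_miles(*centroids[i], *centroids[j])
            weights[i][j] = weights[j][i] = 1.0 / d
    return weights


# Illustrative centroids: two neighbouring areas and one distant area
centroids = [(-2.24, 53.48), (-2.29, 53.49), (-0.13, 51.51)]
w = inverse_distance_matrix(centroids)
print(w[0][1] > w[0][2])  # the closer pair gets the larger weight
```

For the full set of 32 482 areas the matrix has roughly a billion entries, so a dense list-of-lists like this is only suitable for illustration.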
Complete local attendance
Under this approach, for practice-hub LSOAs (approximately 20% containing one or more practices), we assumed that the population in each attended the local practice or practices and therefore practice-level aggregates were directly applied to the population level. For each of the remaining LSOAs not containing a general practice (approximately 80%), weighted estimation followed these steps: (a) the product PLD of total list size and (centroid) inverse distance from the area was calculated and ranked for all practice-hub LSOAs; (b) the five “closest” practice-hubs (in terms of size and proximity) were selected; (c) each measure was calculated as the mean of the respective measures in the five selected practice-hubs weighted by PLD. In the few cases (<50 in each year) where the area population NA in a practice-hub was smaller than the total list size NL, we used the same weighted estimation process to account for NL–NA while NA was accounted for by the local practice(s).
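Steps (a) to (c) can be sketched as below for a single LSOA without a practice. The dictionary keys and the example hubs are hypothetical; the limit of five hubs is taken from the text and exposed as a parameter because the sensitivity analyses vary it.

```python
def weighted_estimate(hubs, limit=5):
    """Estimate a measure for an LSOA with no practice.

    hubs: list of dicts with keys 'list_size', 'inv_distance' (inverse distance
    from the target LSOA centroid), and 'measure' (for example, population achievement).
    """
    # (a) PLD = total list size x inverse distance, for every practice hub
    scored = [(h["list_size"] * h["inv_distance"], h["measure"]) for h in hubs]
    # (b) keep the hubs with the largest PLD (large and close)
    selected = sorted(scored, key=lambda s: s[0], reverse=True)[:limit]
    # (c) mean of the measure weighted by PLD
    total_pld = sum(pld for pld, _ in selected)
    return sum(pld * measure for pld, measure in selected) / total_pld


# Hypothetical hubs around one LSOA
hubs = [
    {"list_size": 5000, "inv_distance": 0.5, "measure": 0.80},     # PLD 2500
    {"list_size": 10000, "inv_distance": 0.2, "measure": 0.70},    # PLD 2000
    {"list_size": 2000, "inv_distance": 1.0, "measure": 0.90},     # PLD 2000
    {"list_size": 8000, "inv_distance": 0.125, "measure": 0.60},   # PLD 1000
    {"list_size": 4000, "inv_distance": 0.125, "measure": 0.75},   # PLD 500
    {"list_size": 3000, "inv_distance": 0.1, "measure": 0.10},     # PLD 300, dropped when limit=5
]
estimate_five = weighted_estimate(hubs)
estimate_three = weighted_estimate(hubs, limit=3)
print(round(estimate_five, 4), round(estimate_three, 4))
```

Passing `limit=len(hubs)` corresponds to the "no limit" variant mentioned among the sensitivity analyses.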
2014 attribution dataset
The complete local attendance assumption can be difficult to justify for all the patients in all areas (especially in high density urban areas where practice options are numerous), and we used an alternative approach to generate the weighted estimates. In 2014, the Health & Social Care Information Centre released information on the attribution of general practice populations to LSOAs and vice versa, completely linking practice registers and LSOAs for the first time.39 Although the dataset covered only 2014, and some changes in registration practice occurred in 2012,40 we used it as a blueprint to generate annual attribution datasets starting from 2011-12 and moving back to 2006-07. Of the 8122 practices in our analyses, 7932 were identified in the attribution dataset (97.7%), 190 had closed down or merged by 2014, while 77 new practices had emerged. The algorithm used to quantify quality of care and morbidity load proceeded along these steps, for each LSOA:
1) If two or more practices were linked to the LSOA population in 2014, Poisson and negative binomial regression models were fitted to the data, with list size and distance to the practice as predictors. The best modelling strategy between the two was identified by comparing the standard errors of the predictions.
2) For practices present both in our analyses and the attribution dataset, attributed population over time was adjusted for the practice’s list size in the respective year (that is, assumed a constant attribution rate over time).
3) For practices present in our analyses but not in the attribution dataset, we generated estimates using the model selected in step 1 (that is, predictions based on list size and distance) across all years.
Some LSOAs were served completely by a single practice, and we assumed that this remained the case in previous years. It should be noted that the 77 newer practices are not included in the attribution estimates for 2006-07 to 2011-12 and are only used to model the 2014 attribution in the area. The patients they served in 2014 were effectively re-distributed to the active practices within each year according to the selected regression model. Similarly, for practices that ceased to exist or merged by 2014, patients were re-distributed to them in their years of activity, according to their characteristics. Finally, the estimated attribution counts across practices and within each year were used to generate the weighted mean estimates for quality of care and morbidity load (and the potential differences between total estimated attribution size and total LSOA population, due to the appearance or disappearance of practices, become irrelevant).
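The following sketch illustrates only step 2 (the constant attribution rate) and the final weighted mean for one LSOA; the Poisson and negative binomial modelling in steps 1 and 3 is not reproduced. Function names and all numbers are hypothetical.

```python
def back_cast_attribution(attributed_2014, list_size_2014, list_size_year):
    """Attributed population in an earlier year, assuming the share of the
    practice's list that lives in this LSOA is the same as in 2014."""
    attribution_rate = attributed_2014 / list_size_2014
    return attribution_rate * list_size_year


def attribution_weighted_mean(practices):
    """Mean of a practice-level measure weighted by attributed counts.

    practices: list of (attributed_count, measure) pairs for one LSOA and year.
    """
    total = sum(count for count, _ in practices)
    return sum(count * measure for count, measure in practices) / total


# Hypothetical LSOA served by two practices
p1 = back_cast_attribution(attributed_2014=300, list_size_2014=6000, list_size_year=5000)
p2 = back_cast_attribution(attributed_2014=100, list_size_2014=2000, list_size_year=3000)
lsoa_achievement = attribution_weighted_mean([(p1, 0.80), (p2, 0.60)])
print(round(p1), round(p2), round(lsoa_achievement, 3))
```

Because only the relative sizes of the attributed counts enter the weighted mean, a mismatch between their total and the LSOA population does not affect the estimate.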
Statistical analyses
Three sets of multiple linear regression models were used to investigate the relationship between QOF quality of care and mortality. Following spatial weighted estimation, data were complete for all 32 482 English 2001 LSOAs. Each analysis set was applied to both spatial weighted estimation approaches.
The first set of models examined the relationship between QOF quality of care and 2011-12 standardised mortality rates, all-cause and condition-specific. As explained previously, the two-year estimates were considered more reliable because of the low incidence of deaths for younger subgroups. We chose 2011 as our population baseline from which to calculate mortality rates, since that was the census year and population counts were more reliable. Different models were used for each of the three population achievement measures (PAoval, PAoutc, PAoutx) and for each QOF year from 2008-09 to 2011-12. All-cause standardised mortality rates were regressed on PAoval and PAoutc, condition-specific standardised mortality rates on PAoutx. In all models we controlled for 2010 deprivation, urbanicity versus rurality, ethnicity, and morbidity load (MLtot for PAoval, PAoutc; ML9 for PAoutx).
A second set of models investigated the relationship between changes in QOF quality of care over a three or five year period and 2011-12 standardised mortality rates. Changes in the three population achievement measures were used as predictors in separate regression models. For each measure we constructed and analysed change between: 2011-12 and 2008-09, 2010-11 and 2007-08, and 2009-10 and 2006-07 (three years); and between 2011-12 and 2006-07 (five years). Models were controlled for the same covariates as in the first set, and all-cause standardised mortality rates were regressed on PAoval and PAoutc changes, while condition-specific standardised mortality rates were regressed on PAoutx changes.
A third set of models were used as sensitivity analyses, to assess the relationship between QOF quality of care and mortality over time (longitudinally). The outcome was one-year standardised mortality rates from 2007 to 2012, and we investigated the relationship for each of the three population achievement measures, in the same or future years (up to a lag of three years). The zero lag model linked population achievement to mortality in the same year (such as QOF performance in 2011-12 to standardised mortality rate in 2012) and included data for all six years, while at the other end the three-year lag model necessarily used a subset of the data and standardised mortality rates in years 2010 to 2012 (linked to population achievement from 2006-07 to 2008-09). The same covariates were used as before, with the addition of year. These analyses addressed the same question as the first set, longitudinally rather than cross sectionally and with more data, but possibly a less reliable outcome (standardised mortality rates based on between-census population predictions) and were used as sensitivity analyses.
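The pairing of QOF years with mortality years under each lag can be made concrete with a short sketch. Labelling each QOF year by the calendar year in which it ends is an assumption of the sketch, chosen to match the example in the text (2011-12 linked to 2012 at zero lag).

```python
QOF_END_YEARS = range(2007, 2013)    # 2006-07 to 2011-12, labelled by ending year
MORTALITY_YEARS = range(2007, 2013)  # one-year rates, 2007 to 2012


def qof_label(end_year):
    """Format a QOF financial year, for example 2007 -> '2006-07'."""
    return f"{end_year - 1}-{end_year % 100:02d}"


def lagged_pairs(lag):
    """(QOF year, mortality year) pairs available for a given lag in years."""
    return [
        (qof_label(q), q + lag)
        for q in QOF_END_YEARS
        if q + lag in MORTALITY_YEARS
    ]


for lag in range(4):
    print(lag, lagged_pairs(lag))
```

Each extra year of lag removes one QOF year from the analysis, so the three-year lag model rests on half the years available to the zero lag model.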
Stata v13.141 was used for data management and all analyses, with an alpha value of 1% to better protect against type I error inflation because of the numerous tests we performed. However, we focused on effect sizes rather than P values since statistical significance is more likely and can be less meaningful in large datasets such as the one we analysed.42 Analysis sets 1 and 2 were run with the regress command since their cross sectional nature did not require longitudinal random-effects modelling. For analysis set 3, however, we used the xtreg command, to account for the correlated nature of observations within each LSOA, over time (that is, by fitting a random effect for LSOAs). Although we collected more potential covariates (such as reported health) they tended to be strongly correlated with deprivation and were excluded from analyses because of collinearity. The distributions of 1-year and 2-year standardised mortality rates are linked to a Poisson process,43 and we observed skew-normal distributions, although not extreme. Linear regressions are known to be robust in such scenarios of small or moderate skew, especially for large sample sizes,44 but we performed additional sensitivity analyses with the log transformed outcomes.
Moreover, we repeated the three sets of analyses across each of the two spatial datasets using maximum likelihood spatial regressions with the spreg command,45 to account for potential spatial auto-correlation (that is, mortality in one LSOA affecting mortality in neighbouring LSOAs). An additional sensitivity analysis focussed on standardised mortality rates for those aged 50 or over, to assess whether any potential relationships, which we would expect mainly for this group, were attenuated in the main analyses due to “noise” from younger, healthier people.
More sensitivity analyses assessed the sensitivity of the findings to the spatial weighted estimation assumptions. Further regressions with untransformed standardised mortality rates were run, using data from practice-hub LSOAs only. Finally, three more sets of sensitivity analyses were executed for the complete local attendance approach, with a three practice-hub estimation limit, a 10 practice-hub estimation limit, and no limit at all, to assess how sensitive results were to the five practice-hub weighted estimation limit.
Results
We present results from the complete local attendance approach in the main paper and results from the attribution approach in online appendix 2. We mainly discuss the former but also highlight different results between the two methods and the other sensitivity analyses.
Descriptive statistics on mortality, demographics, reported health, deprivation, prevalence rates, QOF morbidity load, and quality of care (population achievement) are provided in table 1 and online appendix 1 table A3, over geographical areas and time respectively. The spatial weighted estimation method is summarily described, in parallel with an example, in figure 1. Spatial maps for morbidity load and population achievement on outcome indicators are provided in figures 2, 3, and 4 for the largest population centres (Greater London, West Midlands (Birmingham), and Greater Manchester).46 Morbidity load represents the proportion of the population with the relevant chronic diseases, and therefore reflects demographic characteristics of resident populations (for example, age distribution) in addition to premature morbidity.
The North East and North West regions had the highest median all-cause death and standardised mortality rates (SMRs). Crude condition-specific death rates were more uniform across England, with the exception of London and its younger population, where rates were 33% below the national average in 2010-11. Condition-specific standardised mortality rates were highest for the North West and Yorkshire & Humber regions. Overall, median population QOF achievement varied from 83.6% (South East region) to 85.2% (North East). Population achievement for the subgroup of nine outcome indicators was lower and varied from 71.7% (London) to 74.8% (North East). Median morbidity load was highest in the North East and lowest in London, both for all indicators and for the nine outcome indicator subgroup.
Over time, all mortality metrics seemed to decrease, especially condition-specific standardised mortality rates, but late registration of deaths in the Office for National Statistics data contributed to this (the probability of a death being “missing” is not time-invariant). Nevertheless, decreases in condition-specific deaths and standardised mortality rates and all-cause standardised mortality rates can be observed in the 2007-09 period, for which we would expect few unregistered deaths. The size of populations served by a practice-hub increased over time. Median recorded prevalence for most conditions and overall morbidity load (MLtot) increased over time, but the subgroup morbidity load (ML9) remained stable. Prevalence and performance rates estimated under the attribution approach were similar (online appendix 2, tables B1 and B2 and figs B1, B2, and B3).
Regression analyses
Levels of QOF quality of care and 2011-12 standardised mortality rates
All regression analyses in the first set of models failed to identify a significant relationship between quality of care (population achievement) and all-cause or condition-specific standardised mortality rates (table 2). Area deprivation in 2010, as measured by the Index of Multiple Deprivation, was by far the strongest predictor of mortality across all models, with higher deprivation linked to higher standardised mortality rates in 2011-12. To put the effects into context, using QOF year 8 models, a change in the Index of Multiple Deprivation from the median (17.25) to the 90th centile (44.88) would correspond to an all-cause standardised mortality rate increase from the median to the 82nd centile. For the condition-specific standardised mortality rate models the effect was weaker, and a change in Index of Multiple Deprivation from the median to the 90th centile would correspond to a condition-specific standardised mortality rate increase from the median to the 73rd centile. In non-urban areas, all-cause standardised mortality rates were estimated to be lower, but not condition-specific standardised mortality rates. A non-urban LSOA (18.6%) would correspond to an all-cause standardised mortality rate change of −44.5, compared with an urban LSOA, which would take the median all-cause standardised mortality rate from 513.4 to 468.9 (40th centile). Ethnicity was also a significant but less strong predictor in all models, with larger standardised mortality rates observed in areas with larger proportions of non-white residents.
In the models where morbidity load across all 19 QOF domains (MLtot) was used as a predictor, it was found to be negatively associated with standardised mortality rates, while a positive association with standardised mortality rates was observed for the subgroup morbidity load (ML9) in the condition-specific mortality models. However, these relationships were relatively weak, with a change in MLtot from the 50th centile (1.94) to the 90th (2.30) corresponding to an all-cause standardised mortality rate move from the median to the 44th centile, and a change in ML9 from the 50th centile (0.41) to the 90th (0.50) corresponding to a condition-specific standardised mortality rate move from the median to the 52nd centile. The models explained approximately 29% of the variance in all-cause standardised mortality rates and around 8% of the variance in condition-specific standardised mortality rates.
Changes in levels of QOF quality of care and 2011-12 standardised mortality rates
Findings were similar in the second set of analyses, where changes in population achievement (over three or five years) were not found to be significantly or modestly associated with all-cause or condition-specific standardised mortality rates (table 3). The only exception was the relationship between three year change in overall achievement (from year 5 to 8) and all-cause mortality, where a small but statistically significant effect was observed. Estimates were close to the first set of analyses, and area deprivation was again the strongest predictor. Urbanicity versus rurality and ethnicity were weaker predictors. Similar relationships to the first set of analyses were observed between morbidity load and standardised mortality rate: negative for MLtot and positive for ML9.
Attribution dataset analyses
Results from the two spatial approaches generally agreed across all analyses. Across each attribution model, the rate of explained variance (R2) was very close to the explained variance in the respective complete local attendance model. Deprivation was again the strongest predictor by far, across all models and the relationship between QOF population achievement and outcomes was generally negligible (online appendix 2, tables B3, B4, and B5). There were a few analyses where the relationship seemed to be statistically significant (high performance linked to reduction in mortality, but the reverse in some cases). However, these relationships were not verified in the statistically more robust longitudinal random-effects models (third set), where all relationships between population achievement and outcomes were negligible.
Sensitivity analyses
In all sensitivity analyses (third set of analyses: longitudinal model with yearly standardised mortality rates from 2007 to 2012; log-transformed standardised mortality rates; practice-hub only and varying assumptions for the practice-hub estimation limit; spatial regressions and standardised mortality rates for those aged ≥50 years) findings were similar (third set of analyses reported in table 4 and table B5 in online appendix 2, for the two spatial approaches; other results available from the authors). In almost all models and analyses the relationship between QOF population achievement and all-cause or condition-specific standardised mortality rates was negligible or, in a few cases, small but statistically significant (and often in the opposite direction to that expected). This small number of statistically significant findings is not unusual given the number of models executed and the size of the dataset, and we feel they can be discarded as spurious. Area deprivation was by far the strongest predictor in all sensitivity analyses. Spatial parameters estimated with spreg were often statistically significant but negligible when compared with the error terms.
Discussion
The Quality and Outcomes Framework (QOF) was introduced in 2004 across all family practices in the UK to reward high quality primary care, and over £10bn has since been invested in the programme. Although the incentive scheme does not fully capture quality in primary care, it does cover many important aspects, especially within its clinical domain. Between the third and eighth years of the QOF programme mortality rates in England decreased by 14%, but we could not identify a relationship between practice performance on the clinical aspect of the QOF and mortality outcomes in the practice locality.
Strengths and limitations of the study
The main strength of this analysis is that it was conducted at the population level and used two different novel spatial estimation techniques to analyse data for the whole of the primary care population in England.
However, there are several limitations. First, standardised mortality rates are imprecise at the level of small geographical areas such as lower layer super output areas (LSOAs). However, we felt that analysis at this level was required to meaningfully examine the relationship between recorded quality of care and patient outcomes. Aggregating data at higher levels introduces heterogeneity and potential confounding that would be very difficult to account for. In addition, use of standardised mortality rates should not pose a threat to the validity of our findings: even with an imprecise measure, we should have identified any existing modest link between performance and mortality. Second, because of late registration of some deaths by the Office for National Statistics, we expect the rates reported in the later years to be slightly underestimated. However, we observed a reduction in standardised mortality rates between 2007 and 2010 (although less pronounced) and we would not expect a relationship between late death registration and QOF quality of care that would bias the regression results. Third, the index of multiple deprivation was collinear with various measures of poor health and deprivation collected in the 2011 census; this is unsurprising, since the index is an aggregate of income, employment, health, education, housing, crime, and environmental deprivation in the locality. We excluded census measures from the analyses because the coefficients were uninterpretable, although they explained some additional variability in the outcomes. Fourth, not everyone resident in a locality at time of death is registered with a participating general practice.
However, over 95% of the UK population is estimated to be registered with a practice, and over 99% of registered patients attend practices participating in the QOF.47 Fifth, we were unable to attribute practice performance to LSOAs for the first two years of the QOF (2004/5 and 2005/6), the period when practices made the greatest improvements in performance.48 Sixth, the effect of the scheme might be delayed and the three year lag between care and outcome could be too narrow a time window. However, analyses with larger lag periods are prone to methodological problems, such as in accounting for migration.
In addition, we made numerous spatial weighted estimation assumptions. Under our first approach, we modelled patients as attending practices in their locality, which we considered a realistic assumption given existing restrictions on registration. Although patient dissatisfaction with access to primary care during work hours was identified as early as 2010,49 policy has not yet changed to allow registration with a practice close to work rather than close to home (a trial scheme was launched in April 2012,40 after the end of our study period). However, we had to assume that patients will always register with the practice(s) in their residing LSOA, provided one (or more) exists, an assumption that might be difficult to justify in urban areas where there can be a few options within walking distance. According to the 2014 attribution dataset, that was the case for around 60% of patients and not all, although we would expect that rate to be higher before the introduction of the 2012 trial scheme. If we assume that the 60% figure applies to all the years we analysed, a considerable amount of noise would have been introduced in the spatial estimates, implying that we would be unable to detect weak relationships between them and the outcome. Other characteristics of this approach included an estimation limit of the five “closest” practice-hubs, which, although arbitrary, we argue is reasonable; it might be too limiting to have fewer than five in areas surrounded by practice-hubs, and it is unrealistic for practices in one city to affect levels of care in another. In assigning weights to practice-hubs, we decided that ranking on the product of list size and proximity was appropriate as this limits the effect of the few very small practices. We repeated the analyses using a proximity ranking strategy and obtained almost identical results. 
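The practice-hub weighting described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes proximity is the inverse of distance from the LSOA, that weights are proportional to the product of list size and proximity, and that weights are normalised to sum to one. The names `PracticeHub`, `hub_weights`, and `weighted_achievement`, the distance floor, and all data values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PracticeHub:
    name: str
    list_size: int      # number of registered patients
    distance_km: float  # distance from the LSOA (hypothetical measure)


def hub_weights(hubs, limit=5, min_distance_km=0.1):
    """Keep the `limit` highest-scoring hubs and normalise their scores to weights.

    Score = list size x proximity, with proximity assumed to be 1 / distance.
    The distance floor (an assumption) avoids division by zero.
    """
    scored = [(h.name, h.list_size / max(h.distance_km, min_distance_km)) for h in hubs]
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]
    total = sum(score for _, score in top)
    return {name: score / total for name, score in top}


def weighted_achievement(weights, achievement):
    """Weighted mean of practice-level achievement for one LSOA."""
    return sum(w * achievement[name] for name, w in weights.items())


# Hypothetical hubs around a single LSOA.
hubs = [
    PracticeHub("A", 8000, 1.0),
    PracticeHub("B", 4000, 2.0),
    PracticeHub("C", 1000, 0.5),
    PracticeHub("D", 6000, 3.0),
    PracticeHub("E", 500, 1.0),
    PracticeHub("F", 9000, 10.0),
]
# Hypothetical achievement (%) per practice.
achievement = {"A": 92.0, "B": 88.0, "C": 95.0, "D": 90.0, "E": 70.0, "F": 85.0}

weights = hub_weights(hubs)
print(weights)
print(weighted_achievement(weights, achievement))
```

In this example the very small practice "E" falls outside the five retained hubs, which reflects the stated aim of limiting the influence of the few very small practices.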
Under our second approach, we used the 2014 attribution dataset to calculate the contribution of primary care practices and estimated the quality of care provided to the population of each locality. We assumed these contribution rates remained constant over time, and we used regression models to estimate the contribution of practices that had closed down or merged by 2014—assumptions and estimates that are not infallible. However, we arrived at similar results and conclusions from two very different starting points.
Under both spatial approaches, there is uncertainty in the estimates which we could not include in the models because of methodological and software limitations. However, we conducted numerous sensitivity analyses under different assumptions to address this issue. Overall, these limitations and assumptions might have attenuated the relationship between recorded quality of care and mortality, but we would expect a relationship of reasonable strength to have been detectable, as was the case for estimated morbidity burden.
The spatial weighted estimation approach is necessary when access to the Primary Care Mortality Database is not available,50 and it allows the analyses to be accurately controlled for the two strongest predictors of mortality (urbanicity v rurality and area deprivation in the patient locality) and other census-measured covariates, while accounting for spatial auto-correlation. An analysis at the practice level would have to use practice location proxies for deprivation and urbanicity v rurality, and census-measured covariates would be unavailable, but it would not have to make any spatial estimation assumptions for quality of care and the morbidity burden. We aim to verify our findings with practice-level analyses if and when the data become available to researchers.
Although, from a statistical point of view, spatial regressions are more suitable for our data, we decided to use outputs from standard regression models as our main results for several reasons: (1) not all spatial regression models converged; (2) the auto-correlation estimates, although statistically significant in most cases, were extremely low compared with the error term (which is reasonable: deaths in an LSOA are generally not “spatially” linked to deaths in neighbouring LSOAs); (3) each model takes days or weeks to run on a dedicated modern server; (4) findings across all analyses, including spatial regressions, were almost identical; and (5) with the standard regression models, we could report easily interpretable model-fit measures (such as R²) and avoid further complicating the paper with spatial parameters and their interpretation. We also considered Poisson regression analyses with death counts as the outcome, but that approach was more problematic than linear regressions with standardised mortality rates, since collinearity issues did not allow us to accurately control the analyses for the distribution of the population within each LSOA.
Finally, this is an observational study and unmeasured confounding is always a possibility. For example, secondary care could be consistently very good where primary care is very poor and vice versa; thus the aggregate population effect from the two levels would be relatively constant, and we would fail to observe an effect for primary care. Although such a scenario is unlikely, the effect of high quality primary care on mortality will be attenuated by the (unknown) quality of secondary care in the LSOA.
Findings
We found that overall quality of care provided by practices—as measured by achievement across all clinical QOF indicators—was not associated with mortality rates in their localities for conditions covered by the QOF. There remained no association when potential effects were lagged for up to three years. This finding seems to contradict previous evidence that processes incentivised in the QOF increase longevity,20 but there are several possible explanations for this null effect. For most QOF indicators, levels of performance improved at the fastest rate over the first two years of the programme and were generally high from the third year of the scheme onwards, with relatively little variation between practices. It is therefore possible that we failed to detect a global mortality dividend predominantly gained in the first two years of the programme. It is also possible that performance has been exaggerated by practices responding to financial incentives and that actual levels of achievement are lower than reported. However, when we examined the associations between performance on intermediate outcomes indicators for specific diseases and related mortality we also found no effect. This applied both to indicators measured by practices and to indicators measured by third parties (for example HbA1c levels). Variation in practice performance is also substantial for most intermediate outcomes indicators (for example, the interquartile range for HbA1c≤7.5% was 57.3% to 68.2% in 2011-12), and we would therefore expect to detect associations between performance on these indicators and mortality.
It might be the case that the indicators of the scheme need to be reconsidered and better aligned with existing evidence. For example, clinical trial findings indicate that intensive glucose control is associated with increased mortality,51 especially risk of cardiovascular death in younger patients,52 while observational studies have generally demonstrated U-shaped relationships between levels of HbA1c in diabetic patients and death.53 54 55 Similar U-shaped relationships have been observed for other biometric measurements, including blood pressure and total cholesterol levels.55 56 These non-linear patterns might suggest that target values (such as ≤7.5% for HbA1c in 2011-12) are suboptimal measures of high quality of care and that target ranges might be more suitable.
Our regression models suggest that area characteristics, such as material deprivation and urbanicity versus rurality, or unmeasured factors associated with these characteristics have a greater impact on mortality than variations in quality of care provided by general practices. However, our models explained relatively little variation in mortality rates. Given that mortality rates fell substantially for several QOF conditions during the course of the programme, in particular cerebrovascular disease and coronary heart disease, this suggests that improvements were due to factors outside primary care, or at least not incentivised under the QOF. For coronary heart disease, previous studies suggest that the main drivers behind mortality reductions have been population improvements in risk factors, mainly declining rates of smoking.57 While the QOF included indicators relating to several risk factors, many of these incentivised processes (for example, recording smoking status and offering cessation support) rather than improvements in outcomes (such as reducing smoking rates).
Conclusions
We found that perceived improvements in performance incentivised under a nationwide pay-for-performance programme were not associated with a subsequent reduction in premature mortality rates. This suggests that the impact of the incentive scheme has fallen far short of previous estimates,20 although it is possible that there have been significant population benefits in terms of reduced morbidity incidence or improved quality of life, and that longer term mortality reductions will ultimately accrue. The apparent lack of a large effect on mortality over the medium term suggests that the QOF may not have been an optimal investment of health service resources, but mortality rates need to be further investigated within primary care practices, accounting for the quality of local secondary care services.58 If incentive schemes continue to be used in primary care with the intention of improving population outcomes, indicators will need to be reconsidered and better aligned with evidence on which activities contribute to reduction of premature mortality.
What is already known on this topic
The Quality and Outcomes Framework (QOF) is a UK pay-for-performance programme that has cost approximately £1bn per annum since 2004
Under the scheme, performance on numerous clinical indicators is incentivised, and achievement rates have increased—though partially due to pre-2004 trends
Evidence for improved patient outcomes seems contradictory, with decreased emergency hospital admission rates for diabetes but not for other incentivised conditions
No studies to date have examined the relationships between recorded practice performance under the QOF and death rates for conditions included in the programme
What this study adds
Material deprivation was by far the strongest predictor of mortality, and, when controlling for this and other covariates, we failed to observe a relationship between QOF achievement and mortality
This suggests that the impact of the incentive scheme has fallen short of previous estimates and indicators will need to be reconsidered and better aligned with evidence
Notes
Cite this as: BMJ 2015;350:h904
Footnotes
We thank the Health and Social Care Information Centre and the Office for National Statistics for the wealth of information they have collected and systematically organised, which made this study possible.
Contributors: EK and TD designed the study. EK extracted the data from all sources and performed the analyses, while some sensitivity analyses were performed by DAS. EK and TD wrote the manuscript. DAS, MA, RTW, and IEB critically edited the manuscript. EK is guarantor of this work and had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: EK was partly supported by a NIHR School for Primary Care Research fellowship in primary healthcare; TD was supported by a NIHR Career Development Fellowship. The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health. No other relationships or activities could appear to have influenced the submitted work.
Transparency: EK affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
Data sharing: Most of the data used in this study are freely available, and the authors are happy to share an organised and cleaned final dataset, except for the mortality data, which were obtained under a sharing agreement from the Health and Social Care Information Centre.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.