Impact of Financial Incentives on Early and Late Adopters among US Hospitals: observational study
BMJ 2018; 360 doi: https://doi.org/10.1136/bmj.j5622 (Published 04 January 2018). Cite this as: BMJ 2018;360:j5622
- Igna Bonfrer, assistant professor1 2,
- Jose F Figueroa, instructor of medicine1 3 4,
- Jie Zheng, senior research statistician1,
- E John Orav, associate professor3 4 5,
- Ashish K Jha, professor1 3 6
- 1Department of Health Policy and Management, Harvard T H Chan School of Public Health, 42 Church St, Cambridge, MA 02138, USA
- 2Erasmus School of Health Policy & Management, Erasmus University Rotterdam, Rotterdam, Netherlands
- 3Department of Medicine, Harvard Medical School, Cambridge, MA, USA
- 4Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- 5Department of Biostatistics, Harvard T H Chan School of Public Health, Cambridge, MA, USA
- 6Department of General Internal Medicine, VA Boston Healthcare System, Boston, MA, USA
- Correspondence to: A Jha ajha@hsph.harvard.edu
- Accepted 17 November 2017
Abstract
Objective To examine how hospitals that volunteered to operate under financial incentives for more than a decade as part of the Premier Hospital Quality Incentive Demonstration (early adopters) fared compared with similar hospitals where these incentives were implemented later under the Hospital Value-Based Purchasing program (late adopters).
Design Observational study.
Setting 1189 hospitals in the USA (214 early adopters and 975 matched late adopters), using Hospital Compare data from 2003 through 2013.
Participants 1 371 364 patients aged 65 years and older, using 100% Medicare claims.
Main outcome measures Clinical process scores and 30 day mortality.
Results Early adopters started from a slightly higher baseline of clinical process scores (92) than late adopters (90); both groups reached a ceiling (98) a decade later. Starting from a similar baseline, just below 13%, early and late adopters did not have significantly different mortality trends (P=0.25) for conditions targeted by the program (difference of 0.05 percentage points per quarter) or for conditions not targeted by the program (−0.02 percentage points per quarter).
Conclusions We found no evidence that hospitals that have been operating under pay for performance programs for more than a decade had better process scores or lower mortality than other hospitals. These findings suggest that even among hospitals that volunteered to participate in pay for performance programs, additional time is unlikely to turn these programs into a success.
Introduction
OECD (Organisation for Economic Co-operation and Development) countries use financial incentives to improve the quality of healthcare.1 These incentives are increasingly being used in low and middle income countries.2 The USA and the UK are probably the furthest ahead, and other countries are monitoring their experiences to determine what they might do. Providing incentives to improve the performance of hospitals has become common in the USA over the last decade.3 4 5 The Premier Hospital Quality Incentive Demonstration (HQID), a voluntary program run by the Centers for Medicare and Medicaid Services,6 7 ran from 2003 to 2009. It became the model for the national Hospital Value-Based Purchasing (HVBP) program, adopted after the passage of the Affordable Care Act,8 9 which has run since 2011 (details on both programs are shown in web appendices 1 to 3). The hospitals that were invited by the Centers for Medicare and Medicaid Services and then participated in the HQID (early adopters) have been under financial incentives to improve quality for more than a decade and are likely to have a comparative advantage, having had more time to refine their delivery of care and strategically focus on improving patient outcomes. Previous evaluations of the HQID and the HVBP program showed limited impact on process measures and no improvements in patient outcomes or in cost reduction.10 11 12 13 14 15 16 17 18 19 20 However, advocates argue that it takes time for hospitals to make meaningful improvements and that patience is needed to better understand how pay for performance programs change the delivery of care.
Improving outcomes is difficult; it can require changes to workflows, restructuring the way providers are paid, and alignment of information technology systems.21 22 The national HVBP program, which attaches incentives directly to outcomes of Medicare beneficiaries, has been in effect for a few years, and early evidence suggests that it has had little impact on patient outcomes.18 One might expect that most hospitals have not yet been able to equip themselves adequately to make meaningful improvements to patient care. However, the early adopters of pay for performance programs, the Premier HQID hospitals, have been under financial incentives since 2003 (except between 2009 and 2010, after the HQID ended and before the HVBP program was implemented).23 24 These early adopters have likely had enough time to make the difficult structural changes that are necessary to improve outcomes. Given that early adopters volunteered to be in the HQID, they likely represent the best case scenario of how much improvement we might expect over a longer period. Whether the early adopters have outpaced late adopters, or whether the HQID had spillover effects on the late adopters, is unclear; empirical evidence would help healthcare policymakers as the new administration makes changes to, or replaces, the Affordable Care Act.
As we continue to shift to value based payments in healthcare,25 it is crucial to understand the long term effects of financial incentives on quality of care and effectiveness of HVBP programs. Therefore, we used national Medicare claims data from 2003 to 2013 to answer three questions. First, how do early adopters that had participated in the HQID perform under the HVBP program compared with similar hospitals that did not participate in the HQID (late adopters) in terms of clinical process scores? Second, how do early adopters compare with late adopters on 30 day mortality outcomes for the three target conditions: acute myocardial infarction, congestive heart failure, and pneumonia? And third, are there any spillover effects for 30 day mortality on non-target conditions?
Methods
Data and matching
We identified all 2702 American acute care hospitals participating in the Hospital Value-Based Purchasing (HVBP) program in 2013 using publicly available Hospital Compare data. Of these hospitals, 233 had also voluntarily participated in the Hospital Quality Incentive Demonstration (HQID) from 2003 through 2009, and we defined these hospitals as early adopters (for details see web appendix 4). We obtained baseline hospital characteristics from the 2003 American Hospital Association Annual Survey.26
Previous work has shown that different hospital types are important to consider when examining patient outcomes under pay for performance programs.27 To limit the potential bias arising from differences in observable hospital characteristics between early and late adopters, including volunteering to participate in the HQID, we applied coarsened exact matching.28 This matching method, like other matching methods, prunes observations from the data. The remaining data have a better balance between the treated and control groups for the baseline of the outcome measure and for the following observed hospital characteristics: size, region, ownership, teaching status, location, presence of an intensive care unit, and safety net status (defined as the top 25% of hospitals with the largest disproportionate share index). The advantage of coarsened exact matching over other matching methods is that the resulting data are exactly balanced and do not need to be controlled further because both groups have exactly the same observed hospital characteristics (for details see web appendix 4).28
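For illustration, the sketch below shows the core of coarsened exact matching in Python: coarsen continuous covariates into bins, form strata from the coarsened values, and prune any stratum that lacks both an early and a late adopter. The data, bin boundaries, and column names are hypothetical assumptions for this sketch, not the study's actual variables or cutoffs.

```python
import pandas as pd

# Hypothetical hospital-level data frame; column names are illustrative,
# not the study's actual variable names.
hospitals = pd.DataFrame({
    "hospital_id": range(8),
    "early_adopter": [1, 1, 0, 0, 0, 1, 0, 0],
    "beds":          [90, 420, 95, 400, 410, 260, 255, 80],
    "region":        ["south", "west", "south", "west", "west",
                      "midwest", "midwest", "west"],
    "teaching":      [0, 1, 0, 1, 1, 0, 0, 0],
})

# Coarsen the continuous covariate into bins; categorical covariates
# are already "coarse" and enter the stratum key as-is.
hospitals["size_bin"] = pd.cut(hospitals["beds"],
                               bins=[0, 100, 300, 10_000],
                               labels=["small", "medium", "large"])

strata = ["size_bin", "region", "teaching"]

# Keep only strata containing at least one early and one late adopter;
# everything else is pruned, as coarsened exact matching requires.
def has_both_groups(g):
    return g["early_adopter"].nunique() == 2

matched = (hospitals
           .groupby(strata, observed=True)
           .filter(has_both_groups)
           .assign(stratum=lambda d:
                   d[strata].astype(str).agg("|".join, axis=1)))
print(matched[["hospital_id", "early_adopter", "stratum"]])
```

Because treated and control hospitals within a surviving stratum share identical coarsened characteristics, downstream models only need a stratum fixed effect rather than further covariate adjustment, which is the property the paper relies on.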
Data on clinical process scores (range 0 to 100) for each hospital were obtained from Hospital Compare.29 30 Clinical process scores were available for the three HVBP program target conditions (acute myocardial infarction, congestive heart failure, and pneumonia) from 2004 through 2014 (latest available year) (see web appendices 1 and 4). The clinical process score used in this study is the same measure that the Centers for Medicare and Medicaid Services uses to score hospitals. These clinical process scores assess whether “what is known to be ‘good’ medical care has been applied.”31 32 These process measures indicate whether a healthcare provider gives the recommended care to patients with a particular condition.33
Using the matched dataset, we identified the subset of Medicare beneficiaries admitted to the hospital from the 100% Medicare inpatient fee-for-service claims for 2003 through 2013 (latest available year at time of analysis). We included 1 371 364 patients with any of the three HVBP program target conditions or a selected non-target condition (stroke, gastroenteritis and esophagitis, gastrointestinal bleed, urinary tract infection, metabolic disorder, arrhythmia, and renal failure) (see web appendix 4).
Statistical analysis
Using the dataset of matched hospitals and their associated patients, we performed a segmented linear regression analysis to estimate differences in trends over time in outcome measures between early and late adopters. Using these segmented linear models, we assessed the trends in outcome measures over three periods: the HQID period (fourth quarter of 2003 to fourth quarter of 2009), the pre-HVBP period (first quarter of 2010 to second quarter of 2011), and the HVBP period (third quarter of 2011 to fourth quarter of 2013); for the clinical process scores, which are based on annual rather than quarterly data, we used the corresponding years. We expected differences in trends to grow as the duration of exposure to the incentives increased.
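As a concrete illustration of the segmentation, here is one way the period classification and the three within-period time trends could be constructed; the quarter indexing and variable names are assumptions of this sketch, not the study's code.

```python
import numpy as np
import pandas as pd

# Illustrative quarterly index: 2003Q4 = 0 ... 2013Q4 = 40.
df = pd.DataFrame({"quarter": np.arange(41)})

# Period boundaries (in quarters since 2003Q4), per the study design:
# HQID 2003Q4-2009Q4, pre-HVBP 2010Q1-2011Q2, HVBP 2011Q3-2013Q4.
HQID_END, PREHVBP_END = 24, 30

df["period"] = pd.cut(df["quarter"],
                      bins=[-1, HQID_END, PREHVBP_END, df["quarter"].max()],
                      labels=["hqid", "pre_hvbp", "hvbp"])

# Segmented time variables: each one advances only *within* its own
# period and is flat outside it, so its regression coefficient is that
# period's slope.
df["t_hqid"] = df["quarter"].clip(upper=HQID_END)
df["t_pre"]  = (df["quarter"] - HQID_END).clip(0, PREHVBP_END - HQID_END)
df["t_hvbp"] = (df["quarter"] - PREHVBP_END).clip(lower=0)
```

Interacting each of these time variables with an early adopter indicator then yields a separate early-late slope difference per period.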
Clinical process scores
Using the annual hospital level data, we estimated the relation between early adopter status and clinical process scores. The linear segmented regression model (see web appendix 4 for a formal description of all models used) included a binary variable indicating whether a hospital was an early adopter and further corrected for the underlying annual time trend in the HQID, pre-HVBP, and HVBP periods, as well as the interactions between the time and early adopter variables. We included a stratum fixed effect representing the strata of comparable hospitals from the coarsened exact matching. Following the Centers for Medicare and Medicaid Services methods, we also included a hospital random effect to account for correlation between patients within each hospital and for correlation over time. We weighted our analyses by hospital volume. We confirmed the robustness of our findings using a hospital fixed effect, controlling for unobserved hospital characteristics that were time invariant (not shown).
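The sketch below shows a rough analogue of this model under stated assumptions: the data are synthetic, all column names are hypothetical, and hospital-clustered standard errors are substituted for the paper's hospital random effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400  # hypothetical hospital-year rows

# Synthetic stand-in for the matched hospital-year panel; every column
# here is an assumption for the sketch, not the study's data.
df = pd.DataFrame({
    "hospital_id": rng.integers(0, 50, n),
    "stratum": rng.integers(0, 10, n),
    "early": rng.integers(0, 2, n),
    "t_hqid": rng.integers(0, 7, n),
    "t_pre": rng.integers(0, 2, n),
    "t_hvbp": rng.integers(0, 3, n),
    "volume": rng.integers(100, 2000, n),
})
df["process_score"] = 90 + 0.5 * df["t_hqid"] + rng.normal(0, 2, n)

# Early adopter indicator interacted with each period's time trend;
# C(stratum) gives the matching-stratum fixed effects, and weighting
# by volume mirrors the paper's volume weighting.
model = smf.wls(
    "process_score ~ early * (t_hqid + t_pre + t_hvbp) + C(stratum)",
    data=df, weights=df["volume"],
)
# Hospital-clustered errors stand in for the paper's hospital random
# effect, absorbing within-hospital correlation over time.
res = model.fit(cov_type="cluster", cov_kwds={"groups": df["hospital_id"]})
print(res.params.filter(like="early"))
```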
30 Day mortality
Using quarterly patient level data, we estimated a similar model for mortality within 30 days after admission, standardized for age, sex, and comorbidities. We examined 30 day mortality data for each quarter, for the combination of target conditions, for the three target conditions separately, and for the non-target conditions, over the fourth quarter of 2003 to the fourth quarter of 2013. In addition, the segmented regression models included a correction for seasonality; the target condition for which a patient was admitted; the underlying time trend in the HQID, pre-HVBP, and HVBP periods by quarter; and patient characteristics (age, sex, race, and comorbidities) (see web appendix 4).
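A similarly hedged sketch of the patient-level mortality model, read here as a linear probability model, could look like the following; the data are synthetic and every column name is an assumption of the sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000  # hypothetical patient admissions

# Synthetic patient-level rows; all names are illustrative assumptions.
patients = pd.DataFrame({
    "hospital_id": rng.integers(0, 50, n),
    "early": rng.integers(0, 2, n),
    "t_hqid": rng.integers(0, 25, n),
    "t_pre": rng.integers(0, 7, n),
    "t_hvbp": rng.integers(0, 10, n),
    "qtr_of_year": rng.integers(1, 5, n),           # seasonality term
    "condition": rng.choice(["ami", "chf", "pneumonia"], n),
    "age": rng.integers(65, 95, n),
    "sex": rng.choice(["f", "m"], n),
    "race": rng.choice(["white", "black", "other"], n),
    "comorbidities": rng.integers(0, 6, n),
})
patients["death30"] = (rng.random(n) < 0.13).astype(int)  # 30 day death flag

# Period-specific trends interacted with early adopter status, plus
# seasonality, admitting condition, and patient characteristics.
res = smf.ols(
    "death30 ~ early * (t_hqid + t_pre + t_hvbp) + C(qtr_of_year)"
    " + C(condition) + age + C(sex) + C(race) + comorbidities",
    data=patients,
).fit(cov_type="cluster", cov_kwds={"groups": patients["hospital_id"]})
```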
Differences in trends
Based on the estimates from the different linear segmented regressions, we compared differences between early and late adopters for each of the three periods. We then compared these differences for the HVBP period with those in the pre-HVBP period to determine whether the outcomes improved differently over time for early and late adopters.
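Continuing from the fitted model `res` in the previous sketch, this HVBP-versus-pre-HVBP comparison of the early-late slope differences reduces to a single linear contrast between two interaction coefficients:

```python
import numpy as np

# Build the contrast vector by coefficient position; the two interaction
# terms carry the early-late slope gap in each period.
names = list(res.params.index)
L = np.zeros(len(names))
L[names.index("early:t_hvbp")] = 1.0   # gap in the HVBP period
L[names.index("early:t_pre")] = -1.0   # minus the gap in the pre-HVBP period
print(res.t_test(L))                    # estimate, confidence interval, P value
```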
All analyses were performed using SAS version 9.4 or Stata version 14.0.
Patient involvement
No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.
Results
Late adopters were frequently dropped during the matching process for being dissimilar to the early adopters: they were small; located in the north east or west; for profit or public; non-teaching; located in small or large rural areas; or had safety net hospital status. This resulted in a matched dataset of 1189 hospitals that were mostly medium or large, private not for profit, and based in urban areas. Table 1 shows the baseline hospital characteristics before and after matching. The associated sample consists of 263 088 patients admitted to a hospital classed as an early adopter and 1 108 276 admitted to a late adopter. Table 2 shows the patient characteristics before and after matching. As we would expect from matching on hospital characteristics, the patient characteristics across early and late adopters still differed after matching, mostly in terms of race, with early adopters caring for a larger share of white patients (87.4% v 84.9%, P<0.001). All observed patient characteristics were controlled for in the regression models.
Clinical process scores
Clinical process scores were incentivized during both the Hospital Quality Incentive Demonstration (HQID) and the current Hospital Value-Based Purchasing (HVBP) periods. Figure 1 shows that early adopters started from a slightly higher baseline clinical process score in 2004. Table 3 shows that early adopters had an average score of 91.5 versus 89.9 for late adopters in the HQID period for the combined target conditions. Improvements among the early adopters were smaller during the HQID period (difference −0.21, 95% confidence interval −0.31 to −0.11), although early adopters continued to perform at a slightly higher level than the late adopters (−0.55, −1.01 to −0.10) during the pre-HVBP period. Over the HVBP period, early and late adopters no longer differed in their clinical process scores. In the HVBP period, the increase in the clinical process score reached a ceiling, with early and late adopters approaching the same level: 98.5 versus 98.2, a difference of −0.27 (95% confidence interval −0.77 to 0.22). Comparing the difference in the trend differences for the HVBP period with the pre-HVBP period, we found no significant difference (P=0.19) across early and late adopters (last two columns of table 3). Estimates for the individual target conditions confirmed these patterns (web appendix 5).
30 Day mortality
Figure 2 shows that mortality fell for both early and late adopters during the study period. Both groups started from a similar baseline (14.9% and 14.8% for the early and late adopters in the fourth quarter of 2003) and ended at the same rate of 9.9% in the fourth quarter of 2013. Table 4 shows that average mortality was slightly higher among late adopters in each period. This pattern was not confirmed by the underlying mortality from individual target conditions (web appendix 6). For the non-targeted conditions, we found no significant differences (P=0.48) in average mortality by period. In formal testing with the linear segmented regression models, we found that the reduction in mortality was comparable for early and late adopters during the HQID period (difference 0.00 percentage points, 95% confidence interval −0.01 to 0.01). The reduction in mortality (−0.04 percentage points for early adopters and −0.02 percentage points for late adopters) slowed after the abolition of the financial incentives. There continued to be no noticeable differences in mortality reductions between early and late adopters over the pre-HVBP (difference −0.02 percentage points, 95% confidence interval −0.06 to 0.02) and HVBP (0.02, −0.02 to 0.07) periods. Comparing the difference in the trend differences for the HVBP versus pre-HVBP period, we found no noticeable effect (0.05, −0.03 to 0.13). This suggests that the HQID did not have a meaningful effect on mortality through 2013, even though the hospitals had a decade of experience and volunteered to participate in the demonstration.

Consistent with these results, we found no spillover effects on mortality for the non-targeted conditions. The reduction in mortality from non-targeted conditions during the HQID period was comparable across early and late adopters (0.00, 0.00 to 0.01). After the abolition of the financial incentives, there continued to be no noticeable differences in mortality reductions between early and late adopters (0.00, −0.03 to 0.03), nor were these differences present over the HVBP period (−0.02, −0.05 to 0.01). Comparing the difference in the trend differences for the HVBP with the pre-HVBP period, we found no noticeable effect (−0.02, −0.07 to 0.03). Finally, when examining the individual target conditions, we found qualitatively similar effects (web appendix 7).
Discussion
While hospitals should be rewarded for better outcomes, not just for the number of inputs, the impact of pay for performance programs has been limited and disappointing. However, there has been a longstanding view that it may take time to make meaningful improvements to care under financial incentive programs. We examined how a group of hospitals, having effectively been under a pay for performance program for more than a decade, fared under the current Hospital Value-Based Purchasing (HVBP) program compared with control hospitals with far less experience in a pay for performance program. We found that despite substantial time, the early adopter Premier Hospital Quality Incentive Demonstration (HQID) hospitals did not have better outcomes and had only marginally better adherence to clinical process measures compared with a group of controls over time. Clinical process scores reached a ceiling for both groups, suggesting that the comparative advantage of the early adopters diminished over time, allowing the late adopters to catch up.
We found, contrary to our hypothesis, that the improvements in clinical process scores were smaller for the early adopters during the HQID period, even though they did perform at a slightly higher level. We further found no noticeable difference in the trends across early and late adopters for the HVBP versus pre-HVBP period. The definition and communication of specific, measurable processes for good quality care as part of the highly visible HQID might have led non-participating hospitals to improve their processes as well, out of a more intrinsic motivation to provide good quality care. The inevitable interaction between early and late adopters might have further led to spillover effects, with healthcare personnel teaching each other about standards and approaches to improve quality.
Our findings have important implications for the way policymakers should approach pay for performance programs. These findings provide evidence that having additional time is not likely to turn these programs into a success, at least as far as patient outcomes are concerned. Even for clinical processes, while HQID hospitals began with better performance at baseline, by the end of the study period the gap between early and late adopters was gone, presumably because of a ceiling effect. This suggests a need to change the measures used for clinical process scores.
The limited effects after more than a decade of financial incentives might be explained by several factors. First, the incentives are very small (see web appendix 1 for the size and timing of performance bonuses); given that they cover only a small set of conditions and that hospitals receive revenue from a multitude of payers, this modest incentive is diluted, limiting its impact.34 However, hospital margins tend to be small, usually only a few percentage points, so one might assume that these Medicare payments could still motivate many hospitals.7 Second, the program was extremely complex (in terms of its structure and how hospitals were incentivized), making it more difficult for hospitals to engage meaningfully. Previous work has shown that when the incentivized measure is clear and simple (eg, hospital readmissions in the Hospital Readmission Reduction Program),35 it is possible to observe the impact of financial incentives on performance even within a year after implementation. Finally, because people tend to discount future gains as the delay in receiving them increases, having to wait until the end of the year for bonuses or penalties might have reduced the impact.36
Weaknesses of this study
There are limitations to this study. First, the observable hospital characteristics used for coarsened exact matching and the observable patient characteristics used as covariates in the linear segmented regression models might not sufficiently capture unobserved differences between the early adopters, who voluntarily joined the demonstration, and the matched late adopters. This could lead to an overestimation of the effect of having more time to adapt to financial incentives, because part of this effect might be driven by unobserved characteristics that make those who voluntarily joined different from those who did not (ie, primed to be successful). However, we found virtually no effects, so an upwards bias seems unlikely; we also have a relatively good comparison group, so there is no obvious reason to expect large unobserved differences between the two groups of acute care hospitals and their patients. Second, generalizability is limited because this study includes only patients aged 65 and older. However, Medicare covers more than 55 million people and accounts for more than 20% of total American health spending.37 Third, given the other interventions implemented simultaneously by the Centers for Medicare and Medicaid Services, especially public reporting under the Hospital Compare initiative of process measures (since 2005) and of 30 day mortality (since 2008), it is difficult to isolate the pure effect of pay for performance financial incentives; this might have led to an underestimation of the effects studied here. However, because these interventions were implemented across all hospitals studied, both early and late adopters, we would expect an effect on the outcome levels but not on the differences in the trends across the two groups. In other words, the average clinical process scores of early and late adopters might have changed as a result of public reporting, but we expect these changes to be similar across both groups, therefore not affecting the differences in trends. Relatedly, because the indicators for the HQID were published, late adopters might already have been aware of the quality measures used for the targeted conditions, and so already improving their clinical process scores. Fourth, because clinical process measures are top coded, the convergence in clinical process trends between HQID and non-HQID hospitals may partly be an artifact of top coding, as most hospitals approach their asymptote.14 Finally, we could not study the cost effectiveness of either of the two pay for performance programs. So far there have been mixed findings on the effectiveness of HVBP programs to simultaneously improve quality and control costs.38
Conclusion
We found that hospitals that have been under financial incentives for more than a decade have not been able to reduce patient mortality more than the late adopters, which had been under financial incentives for less than three years. The HVBP program, as currently structured, is not living up to the promise advocates originally envisioned. Given its cost, policymakers in the USA should consider one of two options: revising the current program or ending it. Given major efforts to shift more and more payments toward a value based framework, the program is more likely to be revised than stopped completely. There are a few ways this may be done. First, policymakers should consider increasing the incentives; currently, the small amount of money at stake is arguably not enough to change the way hospitals do business. Second, policymakers should focus on a few measures that matter most to patients (mortality, patient experience, and functional status). The current structure includes numerous measures that are difficult for hospital leaders to track and measure, and therefore hard to improve. Given the growing worldwide interest in pay for performance programs and the unclear American health policy agenda,39 policymakers should consider these findings before assuming that programs like these simply need more time to have a meaningful effect.
What is already known on this topic
Previous studies have found that pay for performance programs have had limited impact on process measures and no impact on patient outcomes
No study has examined whether early adopters of pay for performance programs (ie, the Premier Hospital Quality Incentive Demonstration) outperformed late adopters of pay for performance programs (ie, the Hospital Value-Based Purchasing program)
What this study adds
Hospitals that have operated under pay for performance programs for more than a decade were not found to have better clinical process scores or 30 day mortality for Medicare beneficiaries
Pay for performance programs as currently implemented are unlikely to be successful in the future, even if their timeframes are extended
Acknowledgments
We thank the discussant and participants of the ASHEcon biennial conference in Philadelphia, participants of the Annual Research Meeting of AcademyHealth in Boston, and participants of the Health Economics seminar at Erasmus University Rotterdam for their valuable insights.
Footnotes
Contributors: All authors contributed to the design and conduct of the study; data collection and management; interpretation of the data; and preparation, review, or approval of the manuscript. Data analyses were performed by IB and JZ. IB, JFF, and AKJ are the guarantors.
Funding: IB received funding from a Rubicon Fellowship provided by the Netherlands Organization for Scientific Research. The funders had no role in the study design; collection, analysis, and interpretation of data; writing of the report; or decision to submit the article for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data are available.
Transparency: The lead author (IB) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.