Prediction of rates of thromboembolic and major bleeding outcomes with dabigatran or warfarin among patients with atrial fibrillation: new initiator cohort study
BMJ 2016; 353 doi: https://doi.org/10.1136/bmj.i2607 (Published 24 May 2016) Cite this as: BMJ 2016;353:i2607
- Shirley V Wang, assistant professor, associate epidemiologist1,
- Jessica M Franklin, assistant professor, biostatistician1,
- Robert J Glynn, professor1,
- Sebastian Schneeweiss, professor1,
- Wesley Eddings, research specialist1,
- Joshua J Gagne, assistant professor1
- 1Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, 1620 Tremont St, Boston, MA 02120, USA
- Correspondence to: S V Wang
- Accepted 1 May 2016
Objectives To compare stratified event rates from randomized controlled trials with predicted event rates from models developed in observational data, and assess their ability to accurately capture observed rates of thromboembolism and major bleeding for patients treated with dabigatran or warfarin as part of routine care.
Design New initiator cohort study.
Setting Data from United Health (October 2009 to June 2013), a commercial healthcare claims database in the United States.
Participants 21 934 adults with atrial fibrillation initiating dabigatran (150 mg dose only) or warfarin treatment as part of routine care.
Main outcome measures Predicted annual rates of thromboembolism or major bleeding, based on estimates from randomized controlled trials, models developed in routine care patients, and baseline risk scores (CHADS2, CHA2DS2-VASc, and HAS-BLED). Thromboembolism was a composite outcome, including primary inpatient diagnosis codes for ischemic or ill defined stroke, transient ischemic attack, pulmonary embolism, deep vein thrombosis, and systemic embolism. Major bleeding was a composite outcome including codes occurring in an inpatient setting for hemorrhagic stroke; major upper, lower, or unspecified gastrointestinal bleed; and major urogenital or other bleed.
Results 6516 (30%) and 15 418 (70%) of patients initiated dabigatran and warfarin, respectively. Annual event rates per 100 patients were 1.7 for thromboembolism and 4.6 for major bleeding. For thromboembolism, calibration of estimates from randomized controlled trials was similar to calibration for model based predictions; however, trial estimates for major bleeding consistently underestimated the rate of bleeding among patients in routine care. Underestimation of bleeding rates was particularly pronounced in warfarin initiators with high HAS-BLED scores, where event rates were underestimated by up to 4.0 per 100 patient years. Harrell’s c indices for discrimination for thromboembolism or major bleeding in dabigatran and warfarin initiators ranged between 0.59 and 0.66 for randomized controlled trial predictions, and between 0.52 and 0.70 for cross validated model based predictions.
Conclusion Estimated rates of thromboembolism under dabigatran or warfarin treatment in randomized controlled trials were close to observed rates in routine care patients. However, rates of major bleeding were underestimated. Models developed in routine care patients can provide accurate, tailored estimates of risk and benefit under alternative treatment to enhance patient centered care.
Widely recognized risk scores such as the CHADS2 score or CHA2DS2-VASc score for stroke and thromboembolism and HAS-BLED score for major bleeding are used to estimate individual risk of these outcomes among patients with atrial fibrillation.1 2 3 4 The CHADS2 score accounts for congestive heart failure, hypertension, age (≥75 years), diabetes mellitus, and previous stroke/transient ischemic attack/thromboembolism (doubled risk weight), whereas the CHA2DS2-VASc score also accounts for vascular disease, age 65-74 years, and sex category (age≥75 years and previous stroke carry doubled risk weight). The HAS-BLED score accounts for age>65 years, hypertension, abnormal renal and liver function, prior stroke, bleeding history (or predisposition), drugs predisposing to bleed, alcohol use disorders, and labile international normalized ratio.
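The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function and argument names are hypothetical, and the HAS-BLED function assumes one point per listed factor (renal and liver function, and drugs and alcohol, each counted separately), as in the published score.

```python
def chads2(chf, hypertension, age, diabetes, prior_stroke_tia_te):
    """CHADS2: 1 point each for CHF, hypertension, age >=75, diabetes;
    2 points for previous stroke/TIA/thromboembolism."""
    return (int(chf) + int(hypertension) + int(age >= 75)
            + int(diabetes) + 2 * int(prior_stroke_tia_te))


def cha2ds2_vasc(chf, hypertension, age, diabetes, prior_stroke_tia_te,
                 vascular_disease, female):
    """CHA2DS2-VASc: adds vascular disease, age 65-74, and sex category;
    age >=75 and previous stroke carry doubled weight."""
    age_points = 2 if age >= 75 else (1 if age >= 65 else 0)
    return (int(chf) + int(hypertension) + age_points + int(diabetes)
            + 2 * int(prior_stroke_tia_te) + int(vascular_disease) + int(female))


def has_bled(hypertension, abnormal_renal, abnormal_liver, prior_stroke,
             bleeding_history, labile_inr, age, drugs, alcohol):
    """HAS-BLED: assumed here to be 1 point per factor (maximum 9)."""
    return (int(hypertension) + int(abnormal_renal) + int(abnormal_liver)
            + int(prior_stroke) + int(bleeding_history) + int(labile_inr)
            + int(age > 65) + int(drugs) + int(alcohol))
```

For example, an 80 year old patient with heart failure, hypertension, and a previous stroke would score 5 on CHADS2 under these rules.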
While these risk scores provide baseline estimates of risk to guide whether to treat with an oral anticoagulant, they do not provide insight into optimal treatment selection when treatment is clearly indicated. The efficacy of oral anticoagulants in preventing thromboembolism in patients with atrial fibrillation is well known, with many studies reporting a reduction of over 60% in stroke for patients with non-valvular atrial fibrillation.5 6 7 8 Also known is the potential for anticoagulants to increase the risk of major bleeding.6 7 8
The updated 2012 European Society of Cardiology guidelines8 and the National Institute for Health and Care Excellence guidelines9 recommend that patients with at least one risk factor included in the CHA2DS2-VASc score be considered for anticoagulation therapy (excluding female patients under age 65 years with lone atrial fibrillation), and that anticoagulation should be offered to patients with atrial fibrillation and a CHA2DS2-VASc score of 2 or more.8 9 However, which anticoagulant to offer remains debated.8 9 10 When a clinician and patient decide to initiate anticoagulation therapy, accurate predictions of benefit (prevention of thromboembolism) or harm (increased risk of major bleeding) with different oral anticoagulation agents can help guide treatment choice.
The Randomized Evaluation of Long-Term Anticoagulation Therapy (RE-LY) trial compared the effectiveness of dabigatran, a novel oral anticoagulation agent, to warfarin. The trial found that, on average, the risk of stroke was 34% lower in participants randomized to receive the 150 mg twice daily dose of dabigatran than in those randomized to receive warfarin.3 Reductions in risk between 20-40% were found in observational studies in large healthcare databases comparing the effectiveness of dabigatran to warfarin.11 12 However, effectiveness of oral anticoagulation therapy could vary with baseline risk of thromboembolic or major bleeding outcomes. To provide evidence based, patient centered care that is tailored to patients with different risk factors, additional evidence beyond an average treatment effect for compared therapeutic options is necessary.
In recent trials comparing novel oral anticoagulant agents with warfarin, rates of thromboembolism or major bleeding under treatment with alternative oral anticoagulation agents were reported according to patients' baseline risk of these events. Baseline risk of these events was categorized as low, medium, or high, based on the CHADS2 and HAS-BLED risk scores (low=0-1, medium=2, high risk≥3).1 2 3 4 However, estimates based on the highly selected populations eligible to participate in trials might not reflect the baseline risk among populations actually treated in practice.
The objective of the study was to compare stratified event rates from randomized controlled trials with predicted event rates from models developed in observational data, and assess their ability to accurately capture observed rates of thromboembolism and major bleeding under treatment with dabigatran or warfarin as part of routine care. The purpose of obtaining accurate predicted event rates in patients at varying baseline outcome risk was to answer the clinical question: what are the expected rates of thromboembolism and major bleeding for a given patient if that patient were to initiate treatment with dabigatran or warfarin?
Patients treated as part of routine care
We identified a population of patients initiating dabigatran or warfarin between October 2009 and June 2013 from Optum Life Sciences’ longitudinal database of commercial healthcare claims. This database comprises claims data on over 14 million enrollees in United Health, a large private insurer in the United States.13 The database includes demographic and enrollment information, diagnoses and procedures from inpatient and outpatient healthcare claims, as well as dispensed outpatient medications.
The study cohort comprised patients initiating either warfarin or dabigatran treatment (150 mg dose only). The index date for new initiation was defined as the first claim for dabigatran or warfarin occurring after at least 365 days of enrollment with United Health without a filled claim for an oral anticoagulation agent (dabigatran, warfarin, rivaroxaban, or apixaban). We restricted the cohort to include patients over the age of 18 years with at least one recorded diagnosis of atrial fibrillation according to ICD-9 (international classification of diseases, 9th revision). We further restricted the cohort to the overlapping region of the propensity score.14 A flow diagram15 for construction of the cohort and additional details on propensity score estimation and trimming are available in the web appendix.
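Restriction to the overlapping region of the propensity score can be illustrated as follows. The exact trimming rule used in the study is described in the web appendix and is not reproduced here; this sketch assumes one common rule, which keeps patients whose estimated propensity score lies between the larger of the two group minimums and the smaller of the two group maximums. The function name and inputs are hypothetical.

```python
def trim_to_overlap(treated, propensity):
    """Return indices of patients inside the region of propensity score overlap.

    treated:    list of 0/1 flags (1 = dabigatran, 0 = warfarin, by assumption)
    propensity: list of estimated probabilities of receiving treatment
    """
    ps_treated = [p for t, p in zip(treated, propensity) if t]
    ps_control = [p for t, p in zip(treated, propensity) if not t]
    if not ps_treated or not ps_control:
        raise ValueError("both treatment groups must be non-empty")
    # Overlap region: scores observed in BOTH groups' ranges.
    lower = max(min(ps_treated), min(ps_control))
    upper = min(max(ps_treated), max(ps_control))
    return [i for i, p in enumerate(propensity) if lower <= p <= upper]
```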
Outcomes, follow-up, and covariates
For each patient, follow-up for outcomes began the day after treatment initiation. Patients were censored at occurrence of the outcome of interest, treatment discontinuation (14 day grace period from end of days’ supply), death, disenrollment, prescription dispensation of another anticoagulant, or end of available data. Thromboembolism was a composite outcome, including primary inpatient diagnosis codes for ischemic or ill defined stroke, transient ischemic attack, pulmonary embolism, deep vein thrombosis, and systemic embolism. Major bleeding was a composite outcome including codes occurring in an inpatient setting for hemorrhagic stroke; major upper, lower, or unspecified gastrointestinal bleed; and major urogenital or other bleed. Outcomes and risk factors were defined based on ICD-9 diagnosis, ICD-9 procedure, or Current Procedural Terminology codes. Code lists and citations to validation studies are available in the web appendix. As a supplementary analysis, we evaluated a thromboembolism outcome that did not include transient ischemic attack, pulmonary embolism, or deep vein thrombosis.
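The censoring rules above amount to taking the earliest of several candidate dates for each patient. The sketch below shows one way to express this; the function and argument names are hypothetical, and it assumes that an outcome falling on the same date as a censoring event is counted as an event.

```python
from datetime import date, timedelta

GRACE_DAYS = 14  # grace period after end of days' supply, as stated in the text


def follow_up(index_date, supply_end, outcome=None, death=None,
              disenrollment=None, switch=None, data_end=None):
    """Return (days_at_risk, had_event, reason) for one patient."""
    start = index_date + timedelta(days=1)  # follow-up begins the day after initiation
    # Outcome is listed first so that a tie on the same date counts as an event
    # (an assumption of this sketch).
    candidates = [
        ("outcome", outcome),
        ("discontinuation", supply_end + timedelta(days=GRACE_DAYS)),
        ("death", death),
        ("disenrollment", disenrollment),
        ("switch", switch),
        ("end_of_data", data_end),
    ]
    dated = [(d, rank, name)
             for rank, (name, d) in enumerate(candidates) if d is not None]
    end, _, reason = min(dated)  # earliest date wins; rank breaks ties
    days_at_risk = max(0, (end - start).days + 1)
    return days_at_risk, reason == "outcome", reason
```

Person time summed over patients in this way gives the denominator for rates per 100 patient years.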
We assessed the presence or absence of known risk factors included in the CHADS2 and HAS-BLED scores using claims occurring on the index date or during the year before. Although recent randomized controlled trials used the CHADS2 score to categorize participants into low, medium, and high risk categories, the CHADS2 score was updated after the trial protocols were conceived. The updated risk score, CHA2DS2-VASc, includes all risk factors in CHADS2 as well as additional risk factors for predicting thromboembolism and is regarded as a better score for stratifying risk of thromboembolism in patients with atrial fibrillation.1 2 Because we were able to measure the updated CHA2DS2-VASc score risk factors within observational data, we also created and used indicators for those risk factors in secondary analyses.
Predicting rates of thromboembolic events and major bleeding on dabigatran versus warfarin
Figure 1⇓ shows the steps for obtaining predicted rates of thromboembolism and major bleeding from either randomized controlled trials or cross validated models fit to routine care data.
Predictions from randomized controlled trials
Estimated rates from randomized controlled trials of thromboembolic and major bleeding events within low, medium, and high risk categories were obtained from publications of the RE-LY trial comparing dabigatran to warfarin and the Apixaban for Reduction in Stroke and Other Thromboembolic Events in Atrial Fibrillation (ARISTOTLE) trial comparing apixaban and warfarin (additional details in the web appendix).4
The RE-LY trial published rates of stroke or systemic embolism for participants within three categories of CHADS2 (0-1, 2, ≥3) for participants randomized to receive either warfarin or a 150 mg dose of dabigatran.16 We applied these rates to the population treated as part of routine care by calculating the CHADS2 score in the routine care population, categorizing initiators of dabigatran or warfarin into the three risk categories, and then assigning each patient the rate reported from the trial according to exposure and risk stratum.
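Assigning trial rates to routine care patients is a table lookup keyed on treatment and risk stratum. The sketch below illustrates the idea; the function names are hypothetical, and the rates in the example table are placeholders chosen only to show the mechanics, not values from the RE-LY publication.

```python
def risk_category(score):
    """Map a CHADS2 (or HAS-BLED) score to the three strata used in the trials."""
    if score <= 1:
        return "low"
    if score == 2:
        return "medium"
    return "high"


def assign_trial_rate(score, treatment, trial_rates):
    """Look up the published annual rate (per 100 patient years) for a patient's
    treatment and risk stratum."""
    return trial_rates[treatment][risk_category(score)]


# PLACEHOLDER rates per 100 patient years: illustrative only, NOT trial values.
example_rates = {
    "dabigatran": {"low": 1.0, "medium": 2.0, "high": 3.0},
    "warfarin": {"low": 1.5, "medium": 2.5, "high": 3.5},
}

predicted = assign_trial_rate(4, "warfarin", example_rates)  # high risk stratum
```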
Although annual rates of major bleeding in the RE-LY trial were published for the overall study population, estimates were not provided for risk categories of the HAS-BLED score.3 Owing to the lack of published evidence on bleeding event rates stratified by baseline risk from the RE-LY trial, we obtained bleeding event rates within three categories of the HAS-BLED score (0-1, 2, ≥3) in patients randomized to the warfarin arm from publications for a trial comparing a different novel oral anticoagulant agent, apixaban, with warfarin (ARISTOTLE).17
We applied the annual rates of major bleeding within low, medium, and high risk categories of HAS-BLED from patients receiving warfarin in the ARISTOTLE trial to initiators of warfarin in routine care, using estimates derived from each of the three definitions of major bleeding used in the trial. Major bleeding in the ARISTOTLE trial was defined using three different criteria: ISTH (International Society on Thrombosis and Haemostasis), TIMI (thrombolysis in myocardial infarction), and GUSTO (global use of strategies to open occluded arteries; details in web appendix).17 18 19 To obtain the annual risk of major bleeding for patients receiving dabigatran in our routine care cohort, we multiplied the predicted annual rate of major bleeding within each HAS-BLED risk group for warfarin initiators by the hazard ratio from the RE-LY trial comparing dabigatran with warfarin on risk of major bleeding.17
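Written as a formula, the calculation for dabigatran initiators described above is the following, where the symbols are notation introduced here for illustration: k indexes the HAS-BLED risk group, the warfarin rate is the ARISTOTLE warfarin arm estimate for that group under a given bleeding definition, and HR is the RE-LY hazard ratio for major bleeding with dabigatran versus warfarin.

```latex
\[
  \hat{r}^{\,\text{dabigatran}}_{k}
  \;=\;
  \mathrm{HR}^{\,\text{RE-LY}}_{\text{dabigatran v warfarin}}
  \times
  \hat{r}^{\,\text{warfarin}}_{k},
  \qquad
  k \in \{\text{low } (0\text{-}1),\ \text{medium } (2),\ \text{high } (\geq 3)\}
\]
```

This applies a single hazard ratio to every risk group, so it assumes that the relative effect of dabigatran on major bleeding does not vary with HAS-BLED category.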
Predictions from models developed in routine care patients
We included known thromboembolism and major bleeding risk factors contributing to the CHADS2, CHA2DS2-VASc, or HAS-BLED scores as predictors in Cox proportional hazards models fitted separately in initiators of dabigatran and warfarin. The models for thromboembolism included sex, age indicators (65-74, age≥75 years), prior stroke/transient ischemic attack/thromboembolism, congestive heart failure, hypertension, diabetes, and vascular disease. The models for major bleeding included indicators for age older than 65 years, hypertension, abnormal renal function, abnormal liver function, prior stroke, bleeding history (or predisposition), drugs predisposing to bleed, alcohol use disorders, and labile international normalized ratio.
Because the performance of models that are evaluated in the same data in which they are developed can be overly optimistic, we conducted repeated 10-fold cross validation for each model (fig 1⇑) and evaluated performance based on the averaged prediction when patients were in validation folds.20
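The repeated cross validation step can be sketched generically as below. This is not the study's R code: `fit` and `predict` are hypothetical callables standing in for the Cox model fitting and prediction steps, and the number of repeats (5 here) is an assumption, because the text does not state it.

```python
import random


def repeated_kfold_predictions(rows, fit, predict, k=10, repeats=5, seed=0):
    """Average each patient's prediction over the folds in which that patient
    was held out, across several random 10-fold partitions."""
    rng = random.Random(seed)
    n = len(rows)
    totals = [0.0] * n
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k roughly equal folds
        for held_out in folds:
            held = set(held_out)
            training = [rows[i] for i in range(n) if i not in held]
            model = fit(training)  # model never sees the held-out patients
            for i in held_out:
                totals[i] += predict(model, rows[i])
    # Each patient is held out exactly once per repeat.
    return [total / repeats for total in totals]
```

Because every prediction comes from a model that did not see that patient, performance measures computed from the averaged predictions are less prone to optimism than in-sample estimates.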
Predictions from baseline risk scores
In addition to use of randomized controlled trials and models developed in routine care patients to predict rates of thromboembolism and major bleeding under treatment with dabigatran or warfarin, we also predicted baseline event rates using published event rates from CHADS2, CHA2DS2-VASc, and HAS-BLED risk scores. In the same manner as used for trial predictions, we applied published annual event rates for different levels of each score to the routine care patients with those scores.1 2 21 For example, if a published event rate for patients with a CHADS2 score of 2 was 4.5 per 100 patient years, then patients in routine care were assigned a predicted event rate of 4.5 per 100 patient years.
These baseline risk scores were originally developed in patients not on oral anticoagulation therapy1 21 or in a mix of treated and untreated patients.2 Baseline risk scores do not take into account the effect of therapy on event rates. The purpose of including predictions from baseline risk scores in our comparison was to highlight that event rates from baseline risk scores should be supplemented with risk stratified event rates while on therapy, in order to facilitate evidence based treatment choices (eg, no treatment v treatment A v treatment B).
We calculated Harrell’s c index as a measure of discrimination for survival models. The c index reflects the ability to distinguish patients with the event from those without the event of interest.22 When a model has discriminative ability no better than chance, the c index will be 0.5, whereas a perfectly discriminating model will have a c index of 1.0.
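For readers unfamiliar with the c index, the following is a minimal, unoptimized sketch of Harrell's definition for right censored data. It is an illustration under simplifying assumptions, not the implementation used in the study: pairs with tied follow-up times are ignored, and tied predictions count as half concordant.

```python
def harrell_c(times, events, risks):
    """Harrell's c index.

    times:  follow-up time for each patient
    events: 1 if the patient had the event, 0 if censored
    risks:  predicted risk (higher means the event is expected sooner)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time ends in an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

When predictions take only three distinct values, as with rates assigned by low, medium, and high risk category, many pairs are tied on predicted risk and contribute 0.5 each, which pulls the c index toward 0.5.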
We assessed the accuracy of our predictions by looking at the calibration of predicted to observed event rates within patients at low, medium, and high baseline risk (eg, 0-1, 2, ≥3) of thromboembolism or major bleeding using categories defined by CHADS2 and HAS-BLED risk scores, respectively. These three categories were used because they were the most granular categories for which we obtained predicted event rates from the randomized controlled trials. We obtained goodness of fit test statistics for calibration of predictions in survival data.23
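The descriptive part of this calibration assessment, comparing predicted with observed rates within each risk category, can be sketched as follows. The record layout is hypothetical, the predicted rate per category is taken as a simple mean over patients (an assumption), and the formal goodness of fit test for survival data is not implemented here.

```python
from collections import defaultdict


def calibration_by_category(patients):
    """Compare predicted and observed rates per 100 patient years by category.

    Each patient is a dict with keys: category, event (0/1), years (person
    time at risk), and predicted_rate (per 100 patient years).
    """
    totals = defaultdict(lambda: {"events": 0, "years": 0.0, "pred": 0.0, "n": 0})
    for p in patients:
        t = totals[p["category"]]
        t["events"] += p["event"]
        t["years"] += p["years"]
        t["pred"] += p["predicted_rate"]
        t["n"] += 1
    result = {}
    for category, t in totals.items():
        observed = 100.0 * t["events"] / t["years"]
        predicted = t["pred"] / t["n"]
        # Negative difference: the prediction underestimates the observed rate.
        result[category] = {"observed": observed, "predicted": predicted,
                            "difference": predicted - observed}
    return result
```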
Predicted rates of thromboembolism and major bleeding within low, medium, and high risk categories
We standardized the distribution of baseline characteristics for each exposure group to match that of the entire population with stabilized inverse probability of treatment weights24 (details in web appendix). We also reported the predicted rate of events for initiators of dabigatran and warfarin using estimates from randomized controlled trials, models developed in routine care patients, and baseline risk scores.
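Stabilized inverse probability of treatment weights have a simple closed form once a propensity score has been estimated: the marginal probability of the treatment actually received divided by its conditional probability given covariates. The sketch below shows the calculation; the function name and inputs are hypothetical, and the propensity model itself (described in the web appendix) is assumed to have been fitted already.

```python
def stabilized_iptw(treated, propensity):
    """Stabilized inverse probability of treatment weights.

    treated:    list of 0/1 flags (1 = dabigatran, 0 = warfarin, by assumption)
    propensity: estimated P(treated = 1 | covariates), strictly between 0 and 1
    """
    if any(not 0.0 < p < 1.0 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    p_treated = sum(treated) / len(treated)  # marginal probability of treatment
    weights = []
    for t, ps in zip(treated, propensity):
        if t:
            weights.append(p_treated / ps)
        else:
            weights.append((1.0 - p_treated) / (1.0 - ps))
    return weights
```

Patients who received a treatment that was unlikely given their characteristics receive larger weights, so that each weighted treatment group resembles the full cohort.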
This study was reviewed by the institutional review board at Brigham and Women’s Hospital. The cohort was created using SAS version 9.3. Analyses were conducted using R version 3.0.2.
No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.
In the cohort of 21 934 patients with atrial fibrillation initiating anticoagulation therapy as part of routine care, 30% (n=6516) initiated dabigatran. Annual event rates per 100 patients were 1.7 for thromboembolism and 4.6 for major bleeding. The prevalence of known risk factors for thromboembolism was relatively high (table 1⇓). Over 93% of patients had hypertension and over 25% had type 2 diabetes. Patients initiating warfarin treatment were more likely to be in higher categories of risk for both thromboembolism and major bleeding. These patients were older than dabigatran initiators, more likely to be women, have had prior stroke or transient ischemic attack (20% v 13%), congestive heart failure (27% v 16%), vascular disease (22% v 14%), and more likely to have been admitted to hospital within 30 days before initiation (42% v 32%). Nearly a third (29%) of warfarin initiators had a history of bleeding compared with 16% of dabigatran initiators, and abnormal renal function was more common in warfarin initiators (19% v 8%).
Randomized controlled trial participants were older than patients initiating dabigatran or warfarin as part of routine care, and a similar proportion were female. Trial participants were more likely to have had heart failure and less likely to have diabetes or hypertension than patients initiating warfarin therapy as part of routine care. Both groups had similar proportions of patients with prior stroke or transient ischemic attack. We were not able to compare some of the known characteristics that influence risk of bleeding because these were not reported in the main trial publication.3 Furthermore, the RE-LY trial excluded patients at higher risk of bleeding.25 Table 1⇑ presents prevalence of bleeding risk factors from patients randomized to the warfarin arm in the ARISTOTLE trial,4 because this is the population from whom we obtained stratified estimated rates of bleeding while on warfarin. Prevalence of major bleeding risk factors in participants randomized to warfarin was lower than observed in routine care.
Roughly half of the RE-LY trial participants were on long term treatment with vitamin K antagonists before study entry and randomization, whereas we restricted the routine care population to patients with no oral anticoagulant use in the previous 365 days. The trial was conducted in patients with non-valvular atrial fibrillation, whereas in routine care, 14% of dabigatran initiators and 23% of warfarin initiators had valvular heart disease.
The c indices for discrimination based on the randomized controlled trial estimates were 0.59 and 0.66 for thromboembolism in dabigatran and warfarin initiators, respectively, and 0.60 for major bleeding outcomes irrespective of the criteria used to define major bleeding (table 2⇓). The discrimination for model based predictions was higher for warfarin initiators than dabigatran initiators (thromboembolism: 0.70 v 0.52 using CHADS2 risk factors, 0.72 v 0.54 using CHA2DS2-VASc risk factors; major bleeding: 0.64 v 0.58). The lower performance in dabigatran initiators can be attributed in part to the smaller sample size and fewer events available for model development and validation. CHADS2, CHA2DS2-VASc, and HAS-BLED baseline risk scores were not designed to distinguish risk for patients on different treatments, and had discrimination ranging between 0.58 and 0.67 for thromboembolism and 0.60 and 0.62 for major bleeding.
Figure 2⇓ depicts the difference between predicted and observed rates per 100 patient years within low, medium, and high risk categories of the CHADS2 score for the thromboembolism outcome and the HAS-BLED score for the major bleeding outcome. Here, differences closer to zero reflect more accurate prediction of event rates within risk categories. Overall, the predicted rates of thromboembolism per 100 patient years from randomized controlled trials and models fit the observed rates in low, medium, and high baseline risk categories similarly (P>0.10; table 2⇑). As would be expected, estimated rates per 100 patient years from baseline risk scores (CHADS2 and CHA2DS2-VASc) predicting the rate of events had the patients not been on oral anticoagulant therapy fit the observed rates poorly, overestimating the rate of thromboembolism in dabigatran and warfarin initiators (P<0.001).
Predictions from randomized controlled trials consistently underestimated the rate of major bleeding observed in warfarin initiators (P<0.001); in the most extreme case, by up to 4.0 major bleeds per 100 patient years in patients with HAS-BLED scores of 3 or higher. The trial predictions also fit poorly in dabigatran initiators using each of the different major bleeding criteria (P values ranging from 0.05 to 0.09). Predicted baseline rates per 100 patient years from the HAS-BLED score also underestimated the observed rate of major bleeding in patients who initiated warfarin (P<0.001). Model based predictions had the best fit for the observed rate of major bleeding per 100 patient years in both dabigatran and warfarin initiators (table 2⇑).
Tables 3 and 4⇓ show the predicted rates of thromboembolism and major bleeding per 100 patient years after adjustment for confounders. The confounders used for adjustment are listed in the web appendix. Table 3⇓ shows how much higher the anticipated thromboembolism event rates would be if patients remained untreated (baseline risk scores CHADS2 and CHA2DS2-VASc) than if they were to initiate either dabigatran or warfarin. The model based and trial based predictions for rate of thromboembolism per 100 patient years in dabigatran and warfarin treated patients were similar across low, medium, and high risk categories. Although the predicted rates of major bleeding per 100 patient years varied depending on the criteria for defining major bleeding events used in the randomized controlled trial, each criterion predicted similar event rates for dabigatran and warfarin treated patients. By contrast, model based predicted rates of major bleeding were higher in warfarin initiators than in dabigatran initiators; this difference in rate of bleeding was greatest in patients at high baseline risk.
Results for supplementary analyses using a combined thromboembolism outcome that excluded transient ischemic attack, pulmonary embolism, and deep vein thrombosis are available in the web appendix.
In this study of patients initiating warfarin or dabigatran as part of routine care, use of estimates from the RE-LY randomized controlled trial3 performed as well as models developed within a routine care population at predicting the rate of thromboembolism per 100 patient years under dabigatran or warfarin treatment. Trial estimates for major bleeding consistently underestimated bleeding risk for patients treated as part of routine care. The poor calibration of the trial estimated rate of bleeding to the observed rate in the routine care population is probably related to several factors, including enrollment of a high proportion of participants with demonstrated tolerance to warfarin and systematic exclusion of patients with conditions that put them at higher risk of bleeding. These factors resulted in distributions of patient characteristics and range in baseline risks quite different from what might be observed in routine care. That the RE-LY trial included “persistent” users further speaks to the difficulty in trying to use trial data to inform real world practice decisions. In addition, predicted rates of major bleeding within low, medium, and high risk categories were not available from a single head-to-head trial and had to be interpolated using published information from two trials. This interpolation highlights the practical difficulty for healthcare providers using trial results to inform treatment decisions.
Randomized controlled trials are an essential method for determining the efficacy of medications, and healthcare professionals regularly rely on the results of trials to make important treatment decisions for individual patients. However, trials often are conducted in restricted populations with characteristics that differ from those of patients treated in routine practice. In addition, an average estimate found in a trial might mask considerable heterogeneity in treatment effect.26 Our results suggest that, once trials have determined efficacy of a drug, analyses of observational data of patients treated in routine practice can be of great value for providing tailored estimates of risk for relevant benefit or safety outcomes under alternative treatment strategies.
Our results are most useful for a clinician and patient dyad trying to choose among no treatment, dabigatran, or warfarin when they know that the patient has certain risk factors included in the established CHADS2 and HAS-BLED risk scores. From these two scores, they have a sense of expected event rates if untreated, but ideally, to inform their decision, they would like to know the expected event rates if the patient were to initiate dabigatran or warfarin. Two potential sources of this information are randomized controlled trials, which is the status quo, and observational data, which are becoming increasingly available. Informed by the most accurate evidence available, physician-patient dyads can make patient centered shared decisions regarding treatment.
There are some situations in which randomized controlled trials are the only source of evidence to inform decision making. For example, in the case of new drug treatments entering the market, there will be a period during which there are insufficient data from clinical practice to be able to develop and validate prediction models for relevant outcomes, especially if the outcomes are also rare. However, there are many situations in which it is difficult to obtain estimates of risk from trials for important benefit or safety outcomes given clinically relevant treatment alternatives. Examples include situations in which there is no risk score available on which to stratify baseline outcome probability, there are no head-to-head trials comparing relevant treatment alternatives on outcomes of interest, estimates of absolute risk differences are available only for the entire trial population and not within a specific risk score range, or trial inclusion and exclusion criteria result in major differences between trial and clinical practice populations.
Although other studies have previously found that randomized controlled trials underestimate bleeding risk in patients with atrial fibrillation,27 our study highlights the potential use of secondary, electronically captured patient data to develop risk models to guide treatment decision making. Our results suggested that if physicians and patients relied on predicted bleeding rates based on data from the trial, they might substantially underestimate bleeding risk. In situations where model based predictions can be demonstrated to be more accurate for patients treated as part of routine care, they may prefer to use these predicted event rates as one factor in the decision making process. Using large databases to develop and validate models, which can be iteratively updated over time, can provide stratified estimates of absolute risk to guide decision making and enhance patient care. The performance of such models can be internally cross validated, validated within data source over time, or validated across data sources. Estimates from models based on populations treated in routine care settings are naturally calibrated to a clinical population in which predicted individual risks could be used to guide clinical care. Risk models developed within large observational data sources also have the benefit of being able to be adaptively updated as practice patterns change, indications for use expand, or new treatments emerge for which there are no head-to-head trials.
Our study was limited by the focus on dabigatran and warfarin, without evaluation of the other novel oral anticoagulation agents that have recently entered the market. The relatively small number of patients in our observational data who initiated dabigatran in routine care and had thromboembolic outcomes during the study limited the reliability of the model based predictions for patients exposed to dabigatran. However, with larger data sources, and accrual of greater experience with dabigatran, one would expect that model based predictions would become increasingly more reliable while the trial based predictions would not change.
Our approach was also limited by the inability to capture important clinical variables that are not typically measured in electronic claims data (eg, smoking, body mass index). Under-recording of HAS-BLED risk factors in observational data could result in higher observed rates of bleeding within HAS-BLED risk categories owing to patients who are misclassified into lower risk categories. However, the greatest differences between the randomized controlled trial data and observational data predictions were in the high risk HAS-BLED category (score ≥3). These patients would not have been assigned a higher risk category had more HAS-BLED risk factors been captured in observational data. Furthermore, in spite of potential under-recording of risk factors in observational data, it is likely that the average HAS-BLED score for routine care patients at high baseline risk of bleeding was still higher than the average score in trial participants with a HAS-BLED score of 3 or more. However, the mean score within HAS-BLED risk categories was not reported from the trials, so this cannot be evaluated.
The median duration of follow-up while on treatment was shorter in routine care (median <6 months) than in the randomized controlled trials (median 24 months), and half of trial participants were not naive to oral anticoagulation at the time of randomization. The higher rates of major bleeding observed in routine care patients could be partly attributed to exclusive contribution of person time from the first months following initiation of oral anticoagulation therapy. By contrast, rates from the trials included person time from participants with demonstrated tolerance for anticoagulants as well as person time further from initiation of treatment than was observed in the routine care patients.
Another limitation was how outcomes were defined in the randomized controlled trials versus the observational data. Trial outcomes were based on prespecified and adjudicated criteria whereas outcomes in the observational data were defined on the basis of previously validated claims algorithms. These differences in outcome assessment could have contributed to the differences in rates. Furthermore, there were concerns raised by the US Food and Drug Administration and others regarding how major bleeding events were ascertained and counted in the RE-LY trial.28 Although targeted reviews did not result in changes to the distribution of major bleeding across study arms that altered the main findings, misclassified events could contribute to the poor calibration of trial based predictions to bleeding in routine care patients.
There was also potential for residual confounding in the reported adjusted rates of thromboembolism or major bleeding for dabigatran and warfarin within the risk categories of the CHADS2 and HAS-BLED scores. The stabilized inverse probability weights we used only adjust for the covariates that were included in the estimation of the weights, and there could be residual confounding from confounders that were not included.
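As a reminder of why the weights adjust only for measured covariates, a stabilized inverse probability weight can be written as the marginal probability of the treatment received divided by its probability conditional on the covariates in the weight model. The following is a minimal sketch under simplifying assumptions: a single categorical covariate (a hypothetical risk stratum) stands in for the full covariate set, the conditional probability is estimated by within-stratum proportions rather than a regression model, and all names and values are illustrative rather than taken from the study.

```python
from collections import Counter


def stabilized_weights(treated, stratum):
    """Stabilized weights: P(A = a) / P(A = a | stratum) for each patient.

    treated: list of 0/1 indicators (1 = hypothetical treatment of interest)
    stratum: list of covariate levels, one per patient (hypothetical)
    """
    n = len(treated)
    p_treated = sum(treated) / n  # marginal probability of treatment
    n_by_stratum = Counter(stratum)
    treated_by_stratum = Counter(s for a, s in zip(treated, stratum) if a)

    weights = []
    for a, s in zip(treated, stratum):
        # Propensity estimated as the treated proportion within the stratum.
        ps = treated_by_stratum[s] / n_by_stratum[s]
        numerator = p_treated if a else 1 - p_treated
        denominator = ps if a else 1 - ps
        weights.append(numerator / denominator)
    return weights


# Hypothetical toy cohort: treatment is more common in the "high" stratum.
treated = [1, 1, 1, 0, 1, 0, 0, 0]
stratum = ["high"] * 4 + ["low"] * 4
weights = stabilized_weights(treated, stratum)
print([round(w, 3) for w in weights])
```

In this toy cohort the weights sum to the number of patients, and the weighted treatment groups are balanced on the stratum variable. Any confounder that is absent from `stratum` is left unbalanced, which is the source of the residual confounding noted above.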
Finally, in spite of our repeated cross validation, it is possible that the model based estimates were still overfitted to the data, making the apparent performance of the model based estimates overly optimistic. The performance of model based predictions could be quite different if applied to a different large healthcare database that covers a different patient mix. Although model based predictions might be less useful for patients who differ from those found in a specific large healthcare data source, they can still be quite accurate and of considerable benefit for patients who are captured within that data source. Furthermore, the process we have described can be readily applied to regularly update estimates over time and obtain tailored estimates within other large healthcare data sources.
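For readers unfamiliar with the technique, repeated cross validation can be sketched as follows. This is an illustrative outline rather than the procedure used in the study: the number of folds and repeats, the trivial "mean event rate" model, and the Brier score as the performance measure are all assumptions made for the example.

```python
import random


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def cv_brier_of_mean_model(events, k=5, repeats=10, seed=0):
    """Average held-out Brier score of a model that predicts the training event rate."""
    scores = []
    for train, test in repeated_kfold(len(events), k, repeats, seed):
        predicted = sum(events[j] for j in train) / len(train)
        scores.append(sum((predicted - events[j]) ** 2 for j in test) / len(test))
    return sum(scores) / len(scores)


# Hypothetical binary outcomes (1 = event), not study data.
events = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(cv_brier_of_mean_model(events, k=5, repeats=10, seed=1))
```

Because every patient is held out once per repeat, the averaged score reflects performance on data not used for fitting. It remains an internal estimate from a single data source, so it does not address transportability to a database with a different patient mix.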
Currently, risk scores are used to guide care for patients with atrial fibrillation, aiding with the decision of whether to initiate oral anticoagulation therapy, but not which oral anticoagulant to select. Risk stratified event rates while on alternative treatments can be used to enhance informed decision making. Although no risk score is perfect, more accurate estimates of event rates can help prescribers and patients make better informed decisions and improve patient centered care. In our example, models developed in routine care patients provided accurate, tailored estimates of risk and benefit under alternative oral anticoagulation therapies.
What is already known on this topic
CHADS2 and HAS-BLED are validated risk scores that estimate baseline risk of thromboembolism or major bleeding, respectively, in patients with atrial fibrillation to guide whether to initiate anticoagulation therapy; but these scores do not indicate which anticoagulant drug to use
Recent randomized controlled trials provide estimates of risk for thromboembolism and major bleeding under treatment with dabigatran or warfarin within low, medium, and high baseline risk categories
Trials have strict inclusion and exclusion criteria that can result in distributions of patient characteristics that are different from those observed in patients treated as part of routine care
What this study adds
Models developed and validated using observational data performed as well as randomized controlled trials at predicting thromboembolism under treatment with dabigatran or warfarin in patients treated as part of routine care
These models performed better than trials at predicting major bleeding under treatment with dabigatran or warfarin in patients treated as part of routine care
Trials could underestimate the rate of major bleeding in routine care patients at the highest baseline risk for bleeding by up to 4.0 per 100 patient years
Contributors: SVW drafted the protocol and statistical analysis plan as well as drafted and revised the paper. SVW is the guarantor. JMF, RJG, SS, WE, and JJG revised the statistical analysis plan, reviewed the results, and drafted and revised the paper. SVW and WE analysed the data.
Funding: SVW was supported by grant number R00HS022193 from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from the Agency for Healthcare Research and Quality for the submitted work; SVW, JMF, and JJG are paid consultants to Aetion, a software company; SS is principal investigator of the Harvard-Brigham Drug Safety and Risk Management Research Center funded by the US Food and Drug Administration (FDA), and his work is partly funded by grants or contracts from the Patient-Centered Outcomes Research Institute, FDA, and National Heart, Lung, and Blood Institute; SS is consultant to WHISCON (World Health Information Science Consultants) and to Aetion, a software manufacturer of which he also owns equity; SS is principal investigator of investigator initiated grants to the Brigham and Women’s Hospital from Novartis and Boehringer Ingelheim unrelated to this study; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study was reviewed by the institutional review board at Brigham and Women’s Hospital.
Data sharing: Source data are available through appropriate licensing and data use agreements with Optum Life Sciences.
The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.