Review Article
Discordance between reported intention-to-treat and per protocol analyses

https://doi.org/10.1016/j.jclinepi.2006.09.013

Abstract

Objective

To quantify the degree of disagreement between the two most popular methods for dealing with missing data: intention to treat (ITT) and per protocol (PP).

Study Design and Setting

We performed a systematic review of randomized two-armed clinical trials (CTs) published between 2001 and 2003, abstracted in PubMed and reporting both the ITT and PP analyses on a primary binary endpoint, from which 74 papers were finally selected. The treatment effect of each CT was measured by the odds ratio (OR), and the disagreement between the two analyses was quantified by the Bland–Altman method.

Results

On average, the PP estimator provides greater values, log_e(OR_PP) = 1.25 · log_e(OR_ITT) (95% CI: 1.15, 1.35), than the corresponding ITT estimator, although the limits of concordance showed that the ratio between the two estimators varies widely, from 0.39 up to 2.53.

Conclusion

These results confirm that missing values may cause both systematic and unpredictable bias in CTs. Further efforts should be made to minimize protocol deviations and to use better statistical methods to highlight the drawbacks of missing information. In the presence of protocol deviations, the conclusion of a CT cannot rest on the single reporting of either the ITT or the PP approach alone.

Introduction

The inherent point of a controlled clinical trial (CT), which defines it as experimental and makes it distinct from an observational study, is to assess the consequences of the assignment of an intervention to a patient [1]. Though some studies [2], [3] conclude that observational designs provide estimates of treatment effects not significantly different from those given by CTs, it is accepted [4] that in observational settings it is more hazardous to establish causality, because a higher degree of uncertainty is introduced by the unknown assignment procedure, which may be related to uncontrolled covariates whose effects may be confounded with the intervention effect. In a CT, the main source of uncertainty is attributed to chance, or sampling variation, and thus measured by standard errors. But when deviations occur, the overall uncertainty is affected. As the units of a CT are human beings with legal and ethical rights [5], they may make decisions that overlap with, and are confounded with, the clinician's decisions. Other deviations may occur in the course of treatment, resulting in dropout, missing data, or protocol violations.

To manage those deviations, two strategies are commonly used. The intention-to-treat (ITT) principle states that every subject should be analyzed as randomized, as if he or she had completely followed the scheduled design; the per protocol (PP) approach proposes including only those volunteers who adhered to the assigned intervention and completed the prespecified follow-up without any major protocol deviation. Given that the ITT estimate includes patients who, in fact, did not receive the experimental treatment, one would expect it to provide attenuated values [6]. As ITT tries to preserve the experimental design, it has usually been recommended [7], [8], [9], [10] for nonequivalence trials, despite the need to impute outcome values for noncompliers with missing data. The dilemma with missing data is to distinguish between random and nonrandom missingness: because the randomness assumption rests on independence from nonobservable variables, it cannot be empirically tested, and missing data therefore increase the uncertainty of the trial conclusions.
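The contrast between the two strategies can be made concrete with a small numerical sketch. All counts below are hypothetical and chosen only to illustrate how PP exclusions can shift the odds ratio away from the ITT value:

```python
# Toy illustration of ITT vs. PP odds ratios on a binary endpoint.
# ITT analyzes every randomized patient as assigned; PP keeps only
# completers without major protocol deviations. Numbers are made up.

def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds ratio of treatment vs. control for a binary endpoint."""
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    return odds_t / odds_c

# Hypothetical two-arm trial: 100 patients randomized per arm.
or_itt = odds_ratio(events_t=60, n_t=100, events_c=50, n_c=100)

# PP: drop 15 treatment-arm and 5 control-arm deviators; responders
# happen to be underrepresented among the treatment-arm dropouts.
or_pp = odds_ratio(events_t=58, n_t=85, events_c=47, n_c=95)

print(round(or_itt, 2))  # 1.5
print(round(or_pp, 2))   # larger than the ITT estimate
```

In this sketch the PP estimate moves further from 1 than the ITT estimate, matching the attenuation pattern described above; with a different dropout pattern the shift could go the other way, which is the point of quantifying the disagreement empirically.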

When studying adherence to the assigned intervention, we can distinguish between use effectiveness, which estimates the outcome under habitual conditions of administration (“proof of practice”), and method effectiveness (“efficacy”), which assesses the method's potential under ideal conditions with no protocol deviations (“proof of principle”). Shih and Quan [11] suggested that use effectiveness should be considered for management decisions involving a whole population; however, for clinicians involved with individual patients, method effectiveness among completers, together with the probability of completion, may be more relevant. If the dropouts in a trial are similar to future dropouts, and given a good definition of the studied, treated, and sick populations [12], it can be argued that a valid study-based ITT analysis will adequately address use effectiveness. On the other hand, as dropout may be related to outcome [13] and may have different causes in each treatment arm [14], the PP estimate that excludes protocol deviations will be biased [6], [15], [16], especially when the percentage of dropout is large [17]. Furthermore, compliance can interact with treatment, yielding better outcomes for compliers in the active group but the opposite (better outcomes for noncompliers) in the control group [10]. Thus, because the PP estimate is not acceptable in cases of substantial dropout, it has been argued that a valid estimate of method effectiveness can be derived from the ITT estimate by taking into account the degree of noncompliance [16].
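One common way to adjust an ITT estimate for noncompliance, in the spirit of the final remark above, is the instrumental-variable-style "complier average causal effect" correction, which divides the ITT risk difference by the between-arm difference in compliance rates. This is a sketch under strong assumptions (no defiers; assignment affects outcome only through treatment received), with made-up numbers:

```python
# Compliance-adjusted ("CACE"-style) estimate: the ITT risk difference
# divided by the difference in compliance rates between arms.
# All numbers are hypothetical.

def cace_risk_difference(rd_itt, compliance_t, compliance_c):
    """Complier average causal effect from the ITT risk difference."""
    return rd_itt / (compliance_t - compliance_c)

rd_itt = 0.08        # ITT risk difference (treatment - control)
compliance_t = 0.80  # share actually treated in the treatment arm
compliance_c = 0.00  # no access to treatment in the control arm

print(round(cace_risk_difference(rd_itt, compliance_t, compliance_c), 3))  # 0.1
```

Unlike the PP analysis, this adjustment keeps every randomized patient in the denominator, so it preserves the comparability that randomization provides while still targeting method effectiveness.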

To summarize, our rationale is that point estimates and their standard errors are derived assuming random allocation together with complete and identical follow-up in the treatment arms. Our hypotheses are therefore that any deviation from the protocol design may generate two sources of error: bias in the estimates of the effect (systematic bias), and, as pointed out by Deeks et al. [4], an underestimation of the real variability present (unpredictability bias), because standard errors account solely for random variation.

Our main objective is to study empirically the relationship between the ITT and the PP estimators as reported by researchers in indexed medical journals and to quantify their degree of concordance.

Some authors [15], [18] have hypothesized a loss of power for the PP estimate due to its reduced sample size, although others have questioned whether this loss can be offset by its expectedly higher estimate [7]. Our second objective is to compare the statistical efficiency of the two approaches.
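The efficiency question can be made concrete with the standard large-sample variance of a log odds ratio, which is the sum of the reciprocals of the four 2x2 cell counts: shrinking the cells through PP exclusions inflates the standard error unless the larger PP point estimate compensates. A sketch with hypothetical counts:

```python
import math

# Large-sample standard error of log(OR) for a 2x2 table:
# sqrt(1/a + 1/b + 1/c + 1/d), where a,b are treatment-arm events
# and non-events, and c,d the same for the control arm.

def se_log_or(a, b, c, d):
    """Standard error of the log odds ratio from 2x2 cell counts."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Hypothetical ITT table vs. the same trial after PP exclusions.
se_itt = se_log_or(60, 40, 50, 50)
se_pp = se_log_or(58, 27, 47, 48)

print(round(se_itt, 3), round(se_pp, 3))  # the PP standard error is larger
```

Whether PP gains or loses efficiency overall then depends on whether its larger point estimate outweighs this larger standard error, which is exactly the trade-off the second objective examines.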

Section snippets

Data sources

We performed a systematic review of papers abstracted in PubMed, restricted to publication years 2001–2003, using the keywords “clinical trial,” “intention to treat” (or ITT), and “per protocol” (or PP). The papers were manually checked to make sure they performed both the ITT and PP analyses on the primary endpoint. Finally, seeking homogeneity, we restricted the study to CTs comparing only two groups of treatment and with a binary response.
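The Bland–Altman agreement analysis applied to the extracted odds ratios (described in the abstract) can be sketched as follows. The data here are synthetic stand-ins for the 74 published trials, with the PP/ITT relation deliberately seeded to resemble the reported pattern; the actual limits in the paper came from the real extracted estimates:

```python
import math
import random

# Bland-Altman-style agreement between paired log(OR) estimates.
# Synthetic trial estimates stand in for the 74 reviewed papers.
random.seed(0)
log_or_itt = [random.gauss(0.4, 0.5) for _ in range(74)]
# PP estimates: systematically larger on the log scale, plus noise.
log_or_pp = [1.25 * x + random.gauss(0.0, 0.15) for x in log_or_itt]

# Differences of paired log ORs equal the log of the ratio OR_PP/OR_ITT.
diffs = [pp - itt for pp, itt in zip(log_or_pp, log_or_itt)]
mean_diff = sum(diffs) / len(diffs)
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))

# 95% limits of agreement on the log scale; exponentiating gives the
# range of ratios OR_PP / OR_ITT expected to cover ~95% of trials.
lower, upper = mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff
print(round(math.exp(lower), 2), round(math.exp(upper), 2))
```

Wide limits of agreement on this ratio scale are what the paper labels unpredictability: even when the mean PP/ITT relation is known, an individual trial's two estimates may diverge substantially in either direction.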

Data extraction

We recorded sample size and number of positive

Data selection

The initial search identified 162 papers, but only 127 were true randomized CTs analyzed by both ITT and PP. Of these, 53 were excluded, mainly because they had more than two groups or did not analyze a binary response (see Fig. 1 for details). The final sample comprised 74 papers.

Sample description

There was a large heterogeneity in sample size: the number of patients included in the ITT (PP) analyses ranged from 26 (21) to 5,792 (4,755) with a median of 155 (133). The percentage of losses ranged from 1.74 to

PP provides higher estimates

Our first conclusion is that, as expected, the PP analysis tends to provide, on average, higher estimates of effect than the ITT analysis [6], [16]. This result accords with the idea that losses do not retain the treatment effect, and that missing data in CTs produce systematic differences between the approaches used to deal with them [16].

Unpredictability

Our second conclusion is about poor agreement: though the Lin reproducibility index was large, the discrepancy limits showed that both the ITT and PP

Conclusion

To conclude, we recommend first optimizing clinical plans to mitigate sample attrition [35], trying to avoid nonrandom errors, because “the best way of dealing with missing data is not to have them in the first place” [36]. Second, if nonrandom mechanisms are involved, the dropout mechanism should be carefully monitored [37], to allow statistical analysis reflecting the uncertainty introduced by protocol deviations: on modeling nonresponse, clinicians and statisticians should work together to

Acknowledgments

While taking full responsibility for possible errors, we gratefully acknowledge helpful reviews of earlier versions of this work by Drs. Mike Campbell, Francesc Cardellach, Josep Lluis Carrasco, Guadalupe Gómez, and Ian White, as well as two anonymous reviewers. We also appreciate Donald Rubin's suggestions for future work and Alan Pounds for English editing. E.C. was partially supported by grant FIS PI041945 from the “Instituto de Salud Carlos III.”

Contributors: N.P., C.B., and E.C. designed

References (37)

  • D. Moher et al. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med (2001)
  • Food and Drug Administration. International conference on harmonization: statistical principles for clinical trials. Fed Regist (1998)
  • W.J. Shih et al. Testing for treatment differences with dropouts present in clinical trials—a composite approach. Stat Med (1997)
  • J.P. Collet et al. Sick population—treated population: the need for a better definition. Eur J Clin Pharmacol (1991)
  • S.M. Snapinn et al. Informative noncompliance in endpoint trials. Curr Control Trials Cardiovasc Med (2004)
  • W.R. Myers. Handling missing data in clinical trials: an overview. Drug Inf J (2000)
  • L.B. Sheiner et al. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther (1995)
  • S.C. Choi et al. Effect of non-random missing data mechanisms in clinical trials. Stat Med (1995)