Inverse probability weighting
BMJ 2016; 352 doi: https://doi.org/10.1136/bmj.i189 (Published 15 January 2016) Cite this as: BMJ 2016;352:i189
- Mohammad Ali Mansournia, assistant professor of epidemiology1,
- Douglas G Altman, professor of statistics in medicine2
- 1Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- 2Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Correspondence to: M A Mansournia mansournia_ma{at}yahoo.com
- Accepted 5 January 2016
Statistical analysis usually treats all observations as equally important. In some circumstances, however, it is appropriate to vary the weight given to different observations. Well known examples are in meta-analysis, where the inverse variance (precision) weight given to each contributing study varies, and in the analysis of clustered data.1
Differential weighting is also used when different parts of the population are sampled with unequal probabilities of selection. Two examples of intentional unbalanced sampling are:
1. Surveys with unequal probabilities of selection: In a national survey of hypertension prevalence, certain groups with relatively rare characteristics (such as people aged ≥65 years) were oversampled to improve the precision of estimates for those groups.2
2. Two-phase prevalence studies: In the first phase of a two-phase prevalence study of mental health status, the sampled patients completed a short screening questionnaire. In the second phase, a subsample was selected for a definitive diagnostic test with oversampling of the screen-positive cases to ensure precise estimates for diagnostic prevalence.3
In such cases the ordinary unweighted sample quantities, such as means or proportions, are likely to be biased estimates of their corresponding population quantities. This “selection bias” can be eliminated by weighted estimation, giving each individual’s data a weight inversely proportional to their probability of selection. Intuitively, the weighting reduces the influence of individuals who were oversampled and increases that of individuals who were undersampled. The weighted analysis can be thought of as creating a study with no differential selection.
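As a minimal sketch of weighting by the inverse probability of selection, the Python below uses invented numbers loosely modelled on example 1. It assumes people aged ≥65 make up 10% of the population but half of the sample. The function name `ipw_mean`, the counts, and the selection probabilities are all hypothetical and are not taken from the cited survey.

```python
def ipw_mean(values, selection_probs):
    """Weighted mean with weights 1 / P(selection) (a ratio-type estimator)."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# Hypothetical sample: (hypertensive 0/1, probability of selection).
# 900 people aged <65 sampled with probability 0.01 (180 hypertensive),
# 900 people aged >=65 sampled with probability 0.09 (540 hypertensive).
sample = (
    [(1, 0.01)] * 180 + [(0, 0.01)] * 720
    + [(1, 0.09)] * 540 + [(0, 0.09)] * 360
)
outcomes = [y for y, _ in sample]
probs = [p for _, p in sample]

unweighted = sum(outcomes) / len(outcomes)   # treats everyone equally
weighted = ipw_mean(outcomes, probs)         # down-weights the oversampled group
print(f"unweighted prevalence: {unweighted:.3f}")
print(f"weighted prevalence:   {weighted:.3f}")
```

With these invented figures the unweighted prevalence is 0.40, whereas the weighted prevalence is 0.24. The latter matches the population value implied by the assumed stratum sizes (0.9 × 0.20 + 0.1 × 0.60).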
Inverse probability weighting can also be used when individuals vary in their probability of having missing information. Two contexts where there may be unintentional unbalanced selection are:
3. Studies with missing outcome data: In surveys such as that mentioned in example 1, the response rates will be affected by availability or willingness to participate. Likewise in a cohort study of the effect of obesity on hypertension, some individuals are censored due to loss to follow-up (such as emigration) or competing risks (such as death from other causes).4 In each case the amount of missing information will vary across subgroups.
4. Randomised trials with crossing over from one arm to the other: In a randomised trial 8010 postmenopausal women with early breast cancer were assigned to tamoxifen (n=2459) or letrozole (n=2463) for five years or to sequential treatment with two years of one of these agents followed by three years of the other. There was a selective crossover to letrozole of 619 patients in the tamoxifen arm after significant benefit was reported for letrozole compared with tamoxifen during the study. In the analysis, these 619 women may be artificially censored at the time they crossed over.5
In these situations, missing outcomes are unlikely to happen at random so that estimates will be biased. While the selection probabilities in examples 1 and 2 are known, the response or non-censoring probabilities in examples 3 and 4 are unknown. Inverse probability weighting can be used with weights estimated from a logistic regression model for predicting non-response or censoring. As in the first scenario, this application of the method aims to remove bias, but it is more controversial. Its validity relies on a correctly specified model including all prognostic variables associated with non-response or censoring, which cannot be assured.
In the breast cancer trial (example 4), although the intention-to-treat hazard ratio for overall survival (which ignores selective crossover) was 0.87 (95% confidence interval 0.77 to 1.00) in favour of letrozole, the adjusted hazard ratio using inverse probability of selection weights was 0.79 (0.69 to 0.90), suggesting that the true effect is greater than the intention-to-treat estimate.5
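The weighting for missing outcomes described above can be sketched in two steps: estimate each person's probability of responding, then weight responders by its inverse. The sketch below is only illustrative. It estimates the response probability as the observed response proportion within each level of a single categorical covariate, which is what a saturated logistic model on that covariate would fit. A real analysis would usually need a regression model with several predictors. The data, the covariate (smoking), and all variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (group, responded, hypertensive or None if missing).
records = (
    [("smoker", True, 1)] * 80 + [("smoker", True, 0)] * 120
    + [("smoker", False, None)] * 200
    + [("non-smoker", True, 1)] * 108 + [("non-smoker", True, 0)] * 432
    + [("non-smoker", False, None)] * 60
)

# Step 1: estimate P(response | group) as the observed proportion responding
# in each group (assumes response depends only on this covariate).
counts = defaultdict(lambda: [0, 0])  # group -> [responders, total]
for group, responded, _ in records:
    counts[group][0] += responded
    counts[group][1] += 1
response_prob = {g: r / n for g, (r, n) in counts.items()}

# Step 2: weight each responder by 1 / estimated response probability.
responders = [(g, y) for g, responded, y in records if responded]
weights = [1.0 / response_prob[g] for g, _ in responders]
outcomes = [y for _, y in responders]

complete_case = sum(outcomes) / len(outcomes)
weighted = sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
print(f"complete-case prevalence: {complete_case:.3f}")
print(f"weighted prevalence:      {weighted:.3f}")
```

In this invented example smokers respond less often (50% v 90%) and have a higher prevalence, so the complete-case estimate (about 0.25) is too low. The weighted estimate (0.28) recovers the full-sample value only because response was constructed to depend on smoking alone. This is the untestable modelling assumption that makes the approach controversial.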
In observational studies, the probability of exposure can depend on external factors (called confounders) that also affect the outcome. The causal effect of interest is then confused with the effects of confounders. Such confounding can be thought of as a type of selection bias, because confounding essentially means that some causes of the outcome also influence selection for the exposure. A particularly important context is:
5. Non-randomised studies comparing different treatments: In a cohort study of 12 552 warfarin-naive patients with atrial fibrillation admitted to hospital for ischaemic stroke, those treated with warfarin were compared with those who received no oral anticoagulant at discharge.6
Outside randomised trials the choice of treatment is likely to be influenced by predictors of outcome, so-called “confounding by indication”.7 Various strategies are used to try to remove the bias in non-randomised treatment comparisons. The conventional approach is to use multivariable regression, but a recent alternative is inverse probability of treatment weighting. Here the weights are based on each individual’s probability of receiving a specific treatment given the confounders, which is known as the propensity score (PS). The weights are 1/PS for the treated participants and 1/(1−PS) for the untreated participants.8 The weights can be estimated from a logistic regression model for predicting treatment. Key assumptions are that all confounders have been measured and that they are properly modelled in this treatment model. In the warfarin study (example 5) the unadjusted hazard ratio for cardiac events was 0.73 (99% confidence interval 0.67 to 0.80) in favour of warfarin, whereas the adjusted estimate using inverse probability of treatment weighting was 0.87 (0.78 to 0.98), about half the effect size.6 If the cohort is also affected by censoring (see example 3 above), one can adjust simultaneously for confounding and selection bias due to censoring.4 8
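The weights 1/PS and 1/(1−PS) can be illustrated with a small sketch. It assumes a single binary confounder (disease severity) and estimates the propensity score as the observed proportion treated within each severity level. For simplicity it compares weighted risks rather than the hazard ratios reported in the warfarin study. The cohort, the helper names `iptw_weight` and `weighted_risk`, and all counts are hypothetical.

```python
def iptw_weight(treated, ps):
    """1/PS for treated participants, 1/(1 - PS) for untreated participants."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)

def weighted_risk(rows):
    """Weighted proportion with the event among (weight, event) pairs."""
    return sum(w * y for w, y in rows) / sum(w for w, _ in rows)

# Hypothetical cohort: (severity, treated, event), repeated by count.
cohort = (
    [("low", 1, 1)] * 48 + [("low", 1, 0)] * 432
    + [("low", 0, 1)] * 24 + [("low", 0, 0)] * 96
    + [("high", 1, 1)] * 48 + [("high", 1, 0)] * 112
    + [("high", 0, 1)] * 96 + [("high", 0, 0)] * 144
)

# Propensity score: P(treated | severity), here the observed proportion treated
# in each severity stratum (assumes severity is the only confounder).
ps = {}
for level in ("low", "high"):
    stratum = [t for s, t, _ in cohort if s == level]
    ps[level] = sum(stratum) / len(stratum)

treated_rows = [(iptw_weight(1, ps[s]), y) for s, t, y in cohort if t == 1]
untreated_rows = [(iptw_weight(0, ps[s]), y) for s, t, y in cohort if t == 0]

n_treated = sum(t for _, t, _ in cohort)
n_untreated = len(cohort) - n_treated
crude_rd = (sum(y for _, t, y in cohort if t == 1) / n_treated
            - sum(y for _, t, y in cohort if t == 0) / n_untreated)
iptw_rd = weighted_risk(treated_rows) - weighted_risk(untreated_rows)
print(f"crude risk difference: {crude_rd:.3f}")
print(f"IPTW risk difference:  {iptw_rd:.3f}")
```

In this invented cohort treatment is given more often to low-severity patients, so the crude risk difference (about −0.18) exaggerates the benefit. The weighted difference (−0.10) equals the difference built into each severity stratum.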
Although helpful for bias reduction, estimates weighted by design weights (examples 1 and 2) tend to be less precise than the unweighted estimates; this is not necessarily true for examples 3-5. The ordinary 95% confidence interval for inverse probability weighted estimates may not provide the correct coverage and should be avoided. Instead, robust “sandwich” variance estimators or non-parametric bootstrapping should be used to provide valid confidence intervals.8 Deeper discussion of inverse probability weighting methods can be found elsewhere.8 9
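One of the two options mentioned, non-parametric bootstrapping, might be sketched as follows for a weighted prevalence with known selection probabilities. This is a simplified percentile bootstrap that resamples individuals and ignores any stratification in the design. When the weights are themselves estimated (examples 3-5), the weight model would need to be refitted within every bootstrap sample. The data, function names, seed, and number of replicates are hypothetical choices.

```python
import random

def ipw_mean(pairs):
    """Weighted mean of y with weights 1/p over (y, p) pairs."""
    return sum(y / p for y, p in pairs) / sum(1.0 / p for _, p in pairs)

def bootstrap_ci(pairs, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval: resample individuals with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        ipw_mean(rng.choices(pairs, k=len(pairs))) for _ in range(n_boot)
    )
    lower_index = int((1 - level) / 2 * n_boot)
    upper_index = int((1 + level) / 2 * n_boot) - 1
    return estimates[lower_index], estimates[upper_index]

# Hypothetical sample: (hypertensive 0/1, known probability of selection).
sample = (
    [(1, 0.01)] * 180 + [(0, 0.01)] * 720
    + [(1, 0.09)] * 540 + [(0, 0.09)] * 360
)
estimate = ipw_mean(sample)
low, high = bootstrap_ci(sample)
print(f"weighted prevalence {estimate:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

The interval limits are the 2.5th and 97.5th centiles of the bootstrap estimates. More replicates give more stable limits at the cost of computing time.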
Footnotes
- Contributors: MAM and DGA jointly wrote and agreed the text.
- Competing interests: We have read and understood the BMJ Group policy on declaration of interests and have no relevant interests to declare.
- Provenance and peer review: Not commissioned; not externally peer reviewed.