Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research
BMJ 2013; 347 doi: https://doi.org/10.1136/bmj.f6409 (Published 11 November 2013) Cite this as: BMJ 2013;347:f6409
All rapid responses
Sir,
We read with particular interest the paper by Freemantle et al. about propensity score-based methods (1), and we fully agree that users need to be aware of their strengths and limitations. The authors made a good point about the importance of stratified analyses using propensity scores and noted that “if there are different effects for different propensity score values this should ring alarm bells”. Indeed, if some confounding factors are unmeasured, both classical covariate adjustment and propensity score analyses give biased estimates of the treatment effect.
However, the authors' conclusion, which advocates randomized controlled trials as a requirement in all situations, is probably too strong. Many fields of research would benefit from propensity score-based methods in the absence of randomized trials, even though practitioners need unbiased estimates of treatment effects. In occupational health and in public health, many interventions are in effect treatments that need to be evaluated, and in these situations randomized controlled trials are not always appropriate (2,3). Randomized trials may be feasible but undesirable for ethical reasons, or they may not be feasible at all, whether for financial reasons (for example, studying the long term effect of a treatment) or because of the nature of the question (for example, studying the indication of a treatment in the "real" world). Where observational studies are necessary, propensity scores are an adequate method for estimating the treatment effect.
In conclusion, we feel that while the authors are right to urge caution about propensity score-based methods, in particular the need to have measured all confounding factors (i.e. the treatment assignment is ignorable), they may also have opened up a range of applications in which the conclusion might be different.
References
1 Freemantle N, Marston L, Walters K, et al. Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research. BMJ 2013;347:f6409.
2 Hernán MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008;19:766–79.
3 Descatha A, Leclerc A, Herquelot E. Use of Propensity Scores in Occupational Health? J Occup Environ Med 2013;55:477–8.
Competing interests: No competing interests
Re: Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research
The article by Freemantle et al is excellent and timely, considering the growing popularity of propensity score adjustment. The key assumption is that if two patients have identical scores, but one was treated and the other was not, any difference in their outcomes is due to treatment and to nothing else; in other words, that the decision to treat or not treat such matched patients was random.
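For readers who prefer symbols, the assumption can be written out as follows. This is the standard textbook formulation rather than anything taken from the article, and the notation is introduced here only for illustration: T is the treatment indicator, X the measured covariates, Y(1) and Y(0) the outcomes a patient would have with and without treatment, and e(X) the propensity score.

```latex
% Propensity score: probability of treatment given measured covariates
e(X) = \Pr(T = 1 \mid X)

% Assumption 1 (no unmeasured confounding): given X, treatment is as good as random
\bigl(Y(1),\, Y(0)\bigr) \perp\!\!\!\perp T \mid X

% Assumption 2 (overlap): every patient could have gone either way
0 < e(X) < 1

% Consequence: among patients with the same score e, the observed
% difference in mean outcomes equals the average treatment effect
\mathbb{E}\bigl[Y(1) - Y(0) \mid e(X) = e\bigr]
  = \mathbb{E}\bigl[Y \mid T = 1,\, e(X) = e\bigr]
  - \mathbb{E}\bigl[Y \mid T = 0,\, e(X) = e\bigr]
```

The first assumption cannot be checked from the data. The second fails whenever the score is exactly 0 or 1 for some patients.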
To be able to find such a matched pair, one must have a model that omits factors associated with treatment. If one knows all the factors associated with treatment, then the modeled probability of treatment is 1 in the treated, 0 in the untreated patient group. No propensity matching can be done. End of analysis.
To the extent one does not know why a patient is treated or not, propensity analysis is possible and appropriate, if there is an inherently random element in who got treated. That is the key, untestable assumption behind propensity adjustment. If a model fails entirely to distinguish the treated from the untreated, perhaps one might have some faith that the decision to treat was random.
Propensity adjustment seems a paradox. The more it is needed because of treatment group differences, the fewer patients can be matched and analyzed to answer the research question (and it must still be assumed that the decision to treat the remaining matched patients was random). The less it is needed because groups seem balanced already, the more patients can be matched and analyzed. The “reductio ad absurdum” is that a perfect treatment model leaves no information with which to answer the research question.
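The shrinking pool of matchable patients can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not an analysis of any real data: a single hypothetical covariate drives treatment through a logistic model, the true propensity score is used in place of an estimated one, and patients are paired by greedy 1:1 nearest-neighbour matching within a caliper of 0.05. The function names, the `beta` parameter and the caliper width are all invented for the illustration.

```python
import math
import random


def simulate(beta, n=400, seed=0):
    """Return the true propensity scores of treated and untreated patients.

    Assumption: one standard-normal covariate x, with
    P(treated | x) = 1 / (1 + exp(-beta * x)). A larger beta means the
    covariate predicts treatment more strongly.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = 1.0 / (1.0 + math.exp(-beta * x))
        (treated if rng.random() < e else control).append(e)
    return treated, control


def count_matched_pairs(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    available = sorted(control)
    pairs = 0
    for e in sorted(treated):
        if not available:
            break
        # Index of the closest still-unmatched untreated patient.
        j = min(range(len(available)), key=lambda k: abs(available[k] - e))
        if abs(available[j] - e) <= caliper:
            available.pop(j)
            pairs += 1
    return pairs


def matched_pairs_by_strength(betas=(0.0, 1.0, 3.0, 8.0)):
    """Map each beta to (matched pairs, number treated, number untreated)."""
    results = {}
    for beta in betas:
        treated, control = simulate(beta)
        results[beta] = (
            count_matched_pairs(treated, control),
            len(treated),
            len(control),
        )
    return results


if __name__ == "__main__":
    for beta, (pairs, n_t, n_c) in matched_pairs_by_strength().items():
        print(f"beta={beta}: {pairs} pairs from {n_t} treated, {n_c} untreated")
```

With `beta = 0` the model cannot tell the groups apart and every patient in the smaller group finds a partner. As `beta` grows the scores move toward 0 and 1 and far fewer pairs can be formed.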
Competing interests: No competing interests