
Rapid response to:


Where there’s smoke . . .

BMJ 2014; 348 doi: (Published 21 January 2014) Cite this as: BMJ 2014;348:g40

Rapid Response:

Where there is smoke, there are often mirrors.

An editorial and two papers in BMJ opine that where there is smoke there is fire: air pollution is killing millions of people across the world [1-3]. Indeed, there are countless papers declaring air pollution a grim reaper. A Google Scholar search for (air pollution, mortality) returns about 201,000 hits. There is certainly a lot of smoke, but do we have fire, or mirrors? Let's start with the two cited BMJ papers.

Regression-type modeling has three types of variables: outcome(s), predictors of interest, and confounding variables. Cesaroni has one outcome, acute coronary events (myocardial infarction and unstable angina), whereas Guo has two, mortality and years of life lost. Like the proverbial iceberg, there could be hidden aspects that are important. Both groups of researchers were free to choose their outcome of interest from the other cardiovascular outcomes that could have been selected on the basis of the literature; in [4], for example, 11 are examined. Cesaroni has six predictor variables of interest and Guo has four. Both have a large number of covariates. I count 16 variables with a total of about 40 levels in Cesaroni [2]. Guo [3] uses a time-series analysis, a study design that by its nature controls for many covariates. Still, that study has four main confounders (long-term trends, seasonality, day of the week, and weather conditions), and there is the choice of lag effects, since any effect of air pollution may be delayed by one or more days. It is of interest to estimate the number of questions at issue for the Cesaroni and Guo studies. One simple count can be computed as (number of outcomes) x (number of predictors at issue) x 2 raised to the power of the number of possible covariates. For Cesaroni and Guo, I compute 393 216 and 1 024 question/modeling choices, respectively. Indeed, in 2000 Clyde [5] made the point, “Because of the large number of potential variables, model selection is often used to find a parsimonious model. Different model selection strategies may lead to very different models and conclusions for the same set of data. As variable selection may involve numerous tests of hypotheses, the resulting significance levels may be called into question, and there is the concern that the positive associations are a result of multiple testing.” More recent papers point to the same problem [6,7].
There is an argument that the p-value threshold for statistical significance should be dramatically reduced [8] from the common p<0.05.
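
The simple count described above can be reproduced in a few lines. The sketch below is illustrative only. The function name is hypothetical, and the inputs are assumptions read off the text: one outcome, six predictors, and 16 covariates for Cesaroni; two outcomes and four predictors for Guo. The exponent of 7 for Guo is not stated in the letter. It is inferred here because it is the value that reproduces the reported total of 1 024.

```python
def count_modeling_choices(n_outcomes: int, n_predictors: int, n_covariates: int) -> int:
    """Return (outcomes) x (predictors) x 2**(covariates).

    Each covariate can be either in or out of the model, which gives
    2**n_covariates possible covariate sets for every outcome/predictor pair.
    """
    if min(n_outcomes, n_predictors, n_covariates) < 0:
        raise ValueError("counts must be non-negative")
    return n_outcomes * n_predictors * 2 ** n_covariates


# Cesaroni: 1 outcome, 6 predictors of interest, 16 covariates (as counted in the text).
cesaroni = count_modeling_choices(1, 6, 16)

# Guo: 2 outcomes, 4 predictors; the exponent 7 is an ASSUMPTION inferred
# from the reported total of 1 024, not a figure given in the letter.
guo = count_modeling_choices(2, 4, 7)

print(cesaroni, guo)
```

The count grows exponentially in the number of optional covariates, so each additional covariate doubles the number of candidate analyses.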

There are additional considerations. A large number of decisions are made in moving from raw data to the file used for analysis, and these decisions can profoundly affect the claims made [9]. Normal science works by independent examination of data and methods and by independent replication of studies. Normal science is slowed when data sets are not available for examination. It seems clear that Cesaroni did not have access to individual data from the eleven cohorts they report on. Arguably, a more powerful analysis could have been done, for example, by clustering similar individuals across all the cohorts and looking for effects within clusters of similar individuals [10]. The clustering could have been done by a trusted third party. As the clusters are microaggregated, cluster statistics could be made public while protecting personal identity. At that point normal science could proceed.

Do we have smoke with fire or smoke with mirrors? We all should agree there is a lot of smoke, as many studies report positive effects. Note, however, that there are large, well-conducted studies that find no association of air pollution with mortality [4, 11-14]. I think we should also agree that there are enough data-set-building, multiple-testing, and modeling decisions that there are plenty of mirrors. Until data sets are publicly available [15] and thoroughly vetted [16], thoughtful people should withhold judgment.

1. Brauer M, Mancini GBJ. Where there’s smoke . . . Poor air quality is an important contributor to cardiovascular risk. BMJ 2014;348:g40 doi:10.1136/bmj.g40.
2. Cesaroni G, Forastiere F, Stafoggia M, Andersen ZJ, Badaloni C, Beelen R, et al. Long-term exposure to ambient air pollution and incidence of acute coronary events - Analysis of eleven European cohorts from the ESCAPE Project. BMJ 2014;348:f7421.
3. Guo Y, Li S, Tian Z, Pan X, Zhang J, Williams G. The burden of air pollution on years of life lost in Beijing, China, 2004-08: retrospective regression analysis of daily deaths. BMJ 2013;347:f7139.
4. Milojevic A, Wilkinson P, Armstrong B, Bhaskaran K, Smeeth L, Hajat S. Short-term effects of air pollution on a range of cardiovascular events in England and Wales: case-crossover analysis of the MINAP database, hospital admissions and mortality. Heart 2014;100:1093-1098.
5. Clyde M. Model uncertainty and health effect studies for particulate matter. Environmetrics 2000;11:745-763.
6. Young SS, Karr A. Deming, data and observational studies: A process out of control and needing fixing. Significance 2011;September:122–126.
7. Gelman A, Loken E. The statistical crisis in science. The American Scientist 2014;102:460-465.
8. Johnson VE. Revised standards for statistical evidence. PNAS 2013;110:19313-19317.
9. Madigan D, Ryan PB, Schuemie M. Does design matter? Systematic evaluation of the impact of analytical choices on effect estimates in observational studies. Therapeutic Advances in Drug Safety. 2013;4:53-62.
10. Obenchain RL, Young SS. Advancing statistical thinking in health care research. Journal of Statistical Theory and Practice 2013;7:456-469.
11. Chay K, Dobkin C, Greenstone M. The Clean Air Act of 1970 and adult mortality. J Risk Uncertainty 2003;27:279-300.
12. Enstrom JE. Fine particulate air pollution and total mortality among elderly Californians, 1973–2002. Inhalation Toxicology 2005;17:803–816.
13. Greven S, Dominici F, Zeger S. An approach to the estimation of chronic air pollution effects using spatio-temporal information. J Amer Stat Assoc. 2011;106:396-406.
14. Young SS, Fogel P. Air pollution and daily deaths in California. Proceedings, 2014 Discovery Summit.
15. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. American Journal of Epidemiology 2006;163:783-789.
16. Young SS, Miller HI. Are medical articles true on health, disease? Sadly, not as often as you might think. Genetic Engineering & Biotechnology News May 1, 2014;34(9).

Competing interests: No competing interests

03 February 2015
S. Stanley Young
Raleigh, NC 27607, USA