Breast screening: the facts—or maybe not

BMJ 2009; 338 doi: (Published 28 January 2009) Cite this as: BMJ 2009;338:b86

Stephen Duffy's claims on the benefits and harms of breast screening are seriously wrong

Stephen Duffy, a professor of statistics, disagrees with our estimates of the benefits and harms of screening mammography. To judge who is right, readers first need to know the data sources and methods used to generate our estimates and Duffy's:


| | Duffy's approach | Our approach | Advantage of our approach |
|---|---|---|---|
| Research design | Unsystematic reviews of randomised trials and observational studies | Systematic reviews of randomised trials and observational studies | Reliable, and therefore the recommended method |
| Statistical methods (A) | Involves models and assumptions | Does not involve models and assumptions | Transparent, easy to understand, and data massage is not possible |
| Statistical methods (B) | Extrapolation far beyond what the data support | No extrapolations | Reliable, and therefore the recommended method |
| Statistical methods (C) | Subgroup analysis of attenders (similar to "per protocol analysis" in drug trials) | No subgroup analysis (similar to "intention-to-treat analysis" in drug trials) | Reliable, and therefore the recommended method |

Next, we explain in more detail what is wrong with Duffy's methods and results, in relation to the reduction in breast cancer mortality and the amount of overdiagnosis.

Relative reduction in breast cancer mortality

When Duffy tries to support the calculations by Wald and Law (1), which are misleading (2), he quotes a paper that is not transparent (3). It refers to the number of breast cancer deaths in the Two-County trial, but does not state which publication they come from. Given the many papers on this trial and the varying results they present, it would require arduous detective work for most people to find the missing reference, but we found it (4), as we have tabulated the data from all the papers previously. The missing reference reports a 24% reduction in the Östergötland part of the Two-County trial (4). The investigators' assessment of cause of death was not blinded (5), and an overview of the Swedish trials that used data from the Swedish cause-of-death register found only a 10% reduction in Östergötland (6). Even worse, although the follow-up was slightly longer, the unblinded trial authors reported 10 fewer deaths from breast cancer in the study group than the overview did, and 23 more deaths in the control group (6). Thus, the discrepancy of 33 deaths was entirely in favour of screening, a difference that cannot be ascribed to chance (p<0.001) and that is large enough to greatly influence the estimated effect.

When confronted with this huge discrepancy, Duffy and the trial's main investigator, Laszlo Tabar, have responded evasively, e.g. that they "prefer to use the original primary research material where available, rather than figures reported in secondary research" (7) (a pretty weird description of data emanating from a cause-of-death register), or "It is asserted in the overview report that the endpoint committees in the Two-County trial were aware of patients' study groups. No evidence is presented for this assertion" (8). The lack of blinding has been confirmed by one of the trialists who participated in the Two-County trial (5), and there are many other reasons why the data from the Two-County trial are unreliable (5). By using official registries, we found that many cancers and deaths appear to be missing from the reports of the trial (9).

The other reference Duffy provides to support his claim that our reply to Wald and Law contains "inaccuracies" is an unsystematic review (10) by himself, Tabar and Robert Smith from the American Cancer Society. Robert Smith also co-authored the curious statements disputing the well-known lack of blinding (7,8). Even more curious, this review states about the Two-County trial: "Cause of death was determined on blind review" (10). Duffy has published many papers on the Swedish Two-County trial with Tabar (a PubMed search combining the two names gives 60 hits), but this is the only time we have seen a claim that this process was blinded, and there is convincing evidence that it was unblinded.

Duffy states that our estimate of a 15% reduction in breast cancer mortality has "no basis in empirical data" (11). This estimate is based on the most thorough review that exists, our Cochrane review (5), and agrees with the systematic review performed by the US Preventive Services Task Force in response to that review, which found a 16% reduction (12). Our estimate of 15% is lower than the pooled effect of all trials in our Cochrane review, which was 20% (5), but it is not the result of a "subjective opinion of the trials reviewed", as Duffy suggests. It results from a formal assessment of the risk of bias in the trials, which is used for all Cochrane reviews, and the Task Force identified the same biases as we did (12). The pooled effect of all trials is clearly biased, e.g. it includes the seriously biased estimates from the Two-County trial noted above. Our estimate of 15% is likely optimistic, as the trials with the least risk of bias showed a non-significant reduction of only 7% (5). The meta-analyses of the Swedish trials, which Duffy seems to prefer, overestimate the effect of screening, as they are not systematic reviews and have not paid attention to the important biases that have been documented (5,12).

Absolute reduction in breast cancer mortality

Based on our Cochrane review of the randomised trials, we calculated that if 2000 women are screened regularly for 10 years, one will avoid dying from breast cancer. Wald and Law claimed that the effect is 6 times bigger, namely that 6 per 2000 will benefit (1). Duffy believes that the calculation by Wald and Law (1) is "reasonable and simple" and "involves fewer assumptions" [than ours] (11). This is wrong (2). Our estimate is founded on the most reliable evidence and requires no assumptions. We have already noted that Wald and Law derive their grossly misleading estimate from an overly optimistic expectation of the screening effect and a curious and unnecessary detour that involves erroneous use of observational data (2).

It is easy to see why our estimate of 1 in 2000 is correct. We look first at the meta-analyses of the Swedish trials, as these are preferred by Duffy, and next at the Cochrane review. The first meta-analysis of the Swedish trials reports that after 9 years there were 2.6 and 3.3 deaths per 1000 women in the invited and control groups, respectively, and after 12 years, the numbers were 3.9 and 5.1 (13). These differences correspond to 1.4 and 2.4 deaths avoided per 2000 women. Thus about 2 breast cancer deaths per 2000 women were avoided after 10 years of screening, and not 6 as Duffy claims. There were 425 breast cancer deaths in the control groups of 125,866 women after a mean of 9 years of follow-up (13). If the effect of screening were a 90% reduction in breast cancer mortality, there would be 43 breast cancer deaths in a study group of the same size. The risk difference then becomes 0.00338-0.00034 = 0.00304, which gives a number needed to invite to screening of 328 to avoid one breast cancer death. This is very close to the 333 that Wald and Law reported (6 deaths avoided per 2000 in 10 years gives 333). Thus, if we were to try to generate the numbers provided by Wald and Law, but now based on the randomised trials, we would have to postulate that the effect were a 90% reduction in breast cancer mortality.
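The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch: the function and variable names are illustrative and not taken from any cited source, and the inputs are the figures quoted from the Swedish meta-analysis (13) together with the hypothetical 43 deaths under a 90% reduction. With unrounded risks the number needed to invite comes out between 329 and 330 rather than 328; the small difference appears to come from rounding the two risks before subtracting them.

```python
def avoided_per_2000(invited_rate_per_1000: float, control_rate_per_1000: float) -> float:
    """Breast cancer deaths avoided per 2000 women, given death rates per 1000 women."""
    return (control_rate_per_1000 - invited_rate_per_1000) * 2


def number_needed_to_invite(control_deaths: int, study_deaths: int, women_per_group: int) -> float:
    """Reciprocal of the risk difference between two groups of equal size."""
    risk_difference = (control_deaths - study_deaths) / women_per_group
    return 1 / risk_difference


# Rates per 1000 women quoted from the first Swedish meta-analysis (13).
after_9_years = avoided_per_2000(2.6, 3.3)
after_12_years = avoided_per_2000(3.9, 5.1)

# 425 control-group deaths among 125,866 women; 43 deaths is the
# hypothetical count if screening reduced mortality by 90%.
nnt_if_90_percent = number_needed_to_invite(425, 43, 125_866)

# Wald and Law: 6 deaths avoided per 2000 women.
nnt_wald_law = 2000 / 6

print(f"Avoided per 2000 after 9 years:  {after_9_years:.1f}")
print(f"Avoided per 2000 after 12 years: {after_12_years:.1f}")
print(f"Number needed to invite at a 90% reduction: {nnt_if_90_percent:.1f}")
print(f"Number needed to invite per Wald and Law:   {nnt_wald_law:.1f}")
```

The same two functions can be applied to any other pair of death rates or death counts from groups of equal size.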

However, the effect of screening is much smaller than stated in the Swedish meta-analysis, and our estimate of 1 in 2000 can be derived easily from our Cochrane review. After 7 years, there were 133 deaths from breast cancer in the control groups of the adequately randomised trials out of 66,105 women (5), and a 15% effect therefore corresponds to 113 deaths in a study group of the same size. The difference of 20 deaths per 66,105 women gives 0.6 women per 2000. In an update of the review we have recently submitted for publication, we have included the recent UK Age Trial and now have 384 deaths in the control groups of the adequately randomised trials. The updated calculation gives 0.7 women per 2000.

Overdiagnosis

Duffy's estimates of overdiagnosis are based on two randomised trials from Sweden (Two-County and Göteborg) and on observational studies from the UK, the Netherlands, Denmark, Italy, Australia and the USA. They range from zero to 5% (10, 14-17) and Duffy arrives at these estimates by using complicated models with doubtful assumptions, e.g. using a multistate model and Markov Chain Monte Carlo methods (16). In contrast, in our systematic reviews, we found 30% overdiagnosis in the randomised trials (5) and 52% in observational studies in countries with organised screening programmes (18). Duffy calls our systematic approach a "highly selective interpretation of the published results on overdiagnosis", but it is not selective to exclude Duffy's flawed papers and use transparent methods that everybody can understand; it is sound science. Those who open their eyes and look at incidence rates of breast cancer before and after screening was introduced cannot avoid seeing, without any statistical help, that overdiagnosis is substantial (18). Using the same simple algebra as for mortality, it is easy to see that 30% overdiagnosis means that 10 women are overdiagnosed per 2000.

Will the exaggerations ever stop?

Since randomised trials provide the most reliable evidence, we fail to understand why many epidemiologists and statisticians repeatedly use observational data and complicated statistical models and detours to prove their point, unless we assume that the very idea behind these manoeuvres is to produce wrong results in favour of screening. Even Duffy has interesting reservations about models: "...when there is disagreement between direct results from empirical data and modelled estimates derived by combining information from disparate sources, it would be wise to trust the former" (19). But he doesn't follow his own advice.

The exaggerations by Wald and Law didn't stop with their 6 per 2000 estimate. They extrapolated this 10-year estimate to 20 years simply by multiplying 6 by 2, getting 12 per 2000 (1). As no-one knows what would happen after 20 years of screening, this extrapolation is pretty bold. It is correct that a longer period of screening than in the trials could lead to more benefit, but the harms will also increase. The benefit-harm ratio between avoided breast cancer deaths and overdiagnosed cases of 1 to 10 that we found is therefore not likely to change much. This ratio may seem surprising to many people, but for prostate cancer, for example, it is much worse, as the recently published European trial showed a ratio of 1 to 48, also using a straightforward method (20).

Duffy supports an approach of calculating the effect among those who actually attend screening, based on the effect among those invited and the attendance rate. However, because of rounding, it doesn't matter whether we use invited women or women actually screened as the denominator. But we would never use women actually screened, as it is well known that this approach is flawed, just as per-protocol analyses are flawed in contrast to intention-to-treat analyses in drug trials. Part of the mortality reduction observed among attendees is due to them being healthier than non-attendees, and as it is not possible to identify and exclude a similar poor-prognosis group among the controls, this approach should not be used. Furthermore, even if it were used, it would not matter in terms of the benefit-harm ratio, as those who attend are not only those who benefit but also those who are harmed.

The recent Annual Report from the NHS Breast Screening Programme celebrates 20 years of breast screening and is edited by programme director Julietta Patnick (21). It features an interview with Duffy where he is quoted as saying that "The 10-year fatality of screening-detected tumours is 50% lower than that of symptomatic tumours" (21). This statement is prominently highlighted and comes with no caveats. In contrast, the Forrest report, which was the basis for introducing screening in the UK, clearly states that such comparisons should not be used and are so misleading that "various biases ... may appear to enhance survival even if screening did not have an effect" (22).

In a recent telephone interview with a journalist from the New York Times, Patnick "dismissed the Cochrane figures as inaccurate. British studies, she said, show that the ratio of lives saved to lives unnecessarily disrupted is more like one to one" (23). This comment was recently made also by others affiliated with the UK screening programme, but it is a tenfold exaggeration. On the other hand, we are pleased to see that there is now at least a recognition of overdiagnosis, which is in stark contrast to previous claims from screening proponents, including Stephen Duffy, that overdiagnosis is non-existent or only a minor problem (17), and to the fact that overdiagnosis is not mentioned at all in the Annual Review 2008 of the NHS Breast Screening Programme edited by Julietta Patnick (21), or in invitations to screening (24).

Screening proponents should be concerned about the main message of our article, that women are encouraged to participate in screening without being told about the serious and frequently occurring harms, rather than trying to defend indefensible estimates of the benefits and harms of mammography screening. Duffy's letter and reports and statements from NHS spokespersons support, albeit unintentionally, our recommendation that information on the benefits and harms of screening should be distrusted when it comes from people with conflicts of interest (24,25).

References

1. Wald NJ, Law MR. Response to Gøtzsche. BMJ 2009.

2. Gøtzsche P, Hartling OJ, Nielsen M, Brodersen J, Jørgensen KJ. Estimate of breast screening benefit was 6 times too large. BMJ 2009.

3. Tabar L, Vitak B, Yen MFA, Chen HHT, Smith RA, Duffy SW. Number needed to screen: lives saved over 20 years of follow-up in mammographic screening. J Med Screen 2004; 11:126-9.

4. Tabár L, Vitak B, Chen HH, Duffy SW, Yen MF, Chiang CF, Krusemo UB, Tot T, Smith RA. The Swedish two-county trial twenty years later. Updated mortality results and new insights from long term follow-up. Radiol Clin N Am 2000; 38:625-51.

5. Gøtzsche PC, Nielsen M. Screening for breast cancer with mammography. Cochrane Database Syst Rev 2006;(4):CD001877.

6. Nyström L, Andersson I, Bjurstam N, Frisell J, Nordenskjöld B, Rutqvist L. Long-term effects of mammography screening: updated overview of the Swedish randomised trials. Lancet 2002;359: 909-19.

7. Duffy SW, Tabár L, Smith RA. The mammographic screening trials: commentary on the recent work by Olsen and Gøtzsche (authors' reply). J Surg Oncol 2002; 81:164-6.

8. Tabár L, Smith RA, Duffy SW. Update on effects of screening mammography. Lancet 2002; 360:337.

9. Zahl P-H, Gøtzsche PC, Andersen JM, Mæhlen J. Results of the Two-County trial of mammography screening are not compatible with contemporaneous official Swedish breast cancer statistics. Dan Med Bull 2006; 53:438-40.

10. Smith RA, Duffy SW, Gabe R, Tabar L, Yen AM, Chen TH. The randomized trials of breast cancer screening: what have we learned? Radiol Clin North Am 2004; 42:793-806.

11. Duffy S. Re: Estimate of breast screening benefit was 6 times too large. BMJ 2009.

12. Humphrey LL, Helfand M, Chan BK, Woolf SH. Breast cancer screening: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2002; 137(5 Part 1):347-60.

13. Nyström L, Rutqvist LE, Wall S, Lindgren A, Lindqvist M, Ryden S, et al. Breast cancer screening with mammography: overview of Swedish randomised trials. Lancet 1993; 341:973-8.

14. Yen MF, Tabar L, Vitak B, Smith RA, Chen HH, Duffy SW. Quantifying the potential problem of overdiagnosis of ductal carcinoma in situ in breast cancer screening. Eur J Cancer 2003; 39:1746-54.

15. Paci E, Warwick J, Falini P, Duffy SW. Overdiagnosis in screening: is the increase in breast cancer incidence rates a cause for concern? J Med Screen 2004; 11:23-7.

16. Duffy SW, Agbaje O, Tabar L, Vitak B, Bjurstam N, Bjorneld L, Myles JP, Warwick J. Overdiagnosis and overtreatment of breast cancer: estimates of overdiagnosis from two trials of mammographic screening for breast cancer. Breast Cancer Res 2005; 7:258-65.

17. Olsen AH, Agbaje OF, Myles JP, Lynge E, Duffy SW. Overdiagnosis, sojourn time, and sensitivity in the Copenhagen mammography screening program. Breast J 2006; 12:338-42.

18. Jørgensen KJ, Gøtzsche PC. Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. BMJ 2009; in press.

19. Duffy SW. Commentary on 'What is the point: will screening mammography save my life?' by Keen and Keen. BMC Medical Informatics and Decision Making 2009, 9:19.

20. Schröder FH, Hugosson J, Roobol MJ, Tammela TLJ, Ciatto S, Nelen V, et al. Screening and Prostate-Cancer Mortality in a Randomized European Study. N Engl J Med 2009; 360:1320-8.

21. Patnick J (ed). Saving lives through screening. NHS Breast Screening Programme Annual Review 2008.

22. Forrest P (ed). Breast Cancer Screening. Report to the Health Ministers of England, Wales, Scotland & Northern Ireland. Department of Health and Social Security, 1986.

23. Rabin RC. Benefits of mammogram under debate in Britain. NY Times, 30 March 2009.

24. Gøtzsche P, Hartling OJ, Nielsen M, Brodersen J, Jørgensen KJ. Breast screening: the facts - or maybe not. BMJ 2009; 338:446-8.

25. Jørgensen KJ, Klahn A, Gøtzsche PC. Are benefits and harms given equal attention in scientific articles on mammography screening? A cross-sectional study. BMC Medicine 2007;5:12.

Competing interests:
None declared


01 May 2009
Peter C Gøtzsche
Karsten Juhl Jørgensen
Nordic Cochrane Centre, Rigshospitalet, DK-2100 Copenhagen, Denmark