Overdiagnosis in screening mammography in Denmark: population based cohort study
BMJ 2013; 346 doi: http://dx.doi.org/10.1136/bmj.f1064 (Published 26 February 2013) Cite this as: BMJ 2013;346:f1064
- Sisse Helle Njor, statistician (1),
- Anne Helene Olsen, statistician (2),
- Mogens Blichert-Toft, professor emeritus (3),
- Walter Schwartz, chief physician (4),
- Ilse Vejborg, chief physician (5),
- Elsebeth Lynge, professor (1)
- (1) Department of Public Health, University of Copenhagen, Østre Farimagsgade 5, DK 1014 Copenhagen K, Denmark
- (2) Institute of Community Medicine, University of Tromsø, Tromsø, Norway
- (3) Danish Breast Cancer Cooperative Group, 2100 Copenhagen Ø, Denmark
- (4) Mammography Screening Clinic, University Hospital Odense, 5000 Odense, Denmark
- (5) Diagnostic Centre, University Hospital Copenhagen, Blegdamsvej, 2100 Copenhagen Ø, Denmark
- Correspondence to: S H Njor
- Accepted 5 February 2013
Objective: To use data from two longstanding, population based screening programmes to study overdiagnosis in screening mammography.
Design: Population based cohort study.
Setting: Copenhagen municipality (from 1991) and Funen County (from 1993), Denmark.
Participants: 57 763 women targeted by organised screening, aged 56-69 when the screening programmes started, and followed up to 2009.
Main outcome measures: Overdiagnosis of breast cancer in women targeted by screening, assessed by relative risks compared with historical control groups from screening regions, national control groups from non-screening regions, and historical national control groups.
Results: In total, 3279 invasive breast carcinomas and ductal carcinomas in situ occurred. The start of screening led to prevalence peaks in breast cancer incidence: relative risk 2.06 (95% confidence interval 1.64 to 2.59) for Copenhagen and 1.84 (1.46 to 2.32) for Funen. During subsequent screening rounds, relative risks were slightly above unity: 1.04 (0.85 to 1.27) for Copenhagen and 1.14 (0.98 to 1.32) for Funen. A compensatory dip was seen after the end of invitation to screening: relative risk 0.80 (0.65 to 0.98) for Copenhagen and 0.67 (0.55 to 0.81) for Funen during the first four years. The relative risk of breast cancer accumulated over the entire follow-up period was 1.06 (0.90 to 1.25) for Copenhagen and 1.01 (0.93 to 1.10) for Funen. Relative risks for participants corrected for selection bias were estimated to be 1.08 for Copenhagen and 1.02 for Funen; for participants followed for at least eight years after the end of screening, they were 1.05 and 1.01. A pooled estimate gave 1.040 (0.99 to 1.09) for all targeted women and 1.023 (0.97 to 1.08) for targeted women followed for at least eight years after the end of screening.
Conclusions: On the basis of combined data from the two screening programmes, this study indicated that overdiagnosis most likely amounted to 2.3% (95% confidence interval −3% to 8%) in targeted women. Among participants, it was most likely 1-5%. Follow-up of at least eight years after the end of screening was needed to compensate for the excess incidence during screening.
The purpose of screening mammography is to reduce mortality from breast cancer without increasing mortality from other diseases. Preventive measures in healthcare might, however, also have unintended negative side effects, and the occurrence of these should be closely monitored. In screening mammography, the most serious concern is the risk of overdiagnosis, that is, diagnosis of breast cancer that in the absence of screening would not have led to clinically manifest disease in the woman’s lifetime.1 Overdiagnosis cannot be identified biologically, as current diagnostic tools cannot distinguish progressive cancers from non-progressive or slowly progressive ones. Overdiagnosis can therefore be investigated only epidemiologically.
Screening affects the incidence rate. Assuming that screening advances the time of diagnosis by three years (the lead time) and that all women are screened during a two year period, the incidence rate is expected to double during the first round of screening.2 As screening continues, the incidence rate should return to the level before screening, apart from an increase caused by artificial aging: a breast cancer that in the absence of screening would be diagnosed at age 55 will during screening be diagnosed at, for example, age 52. A compensatory dip in the incidence rate is expected after women leave the screening programme.3 4 Overdiagnosis occurs if the cumulative incidence some years after the end of screening is higher than the cumulative incidence expected in the absence of screening.5 This pattern may be further complicated by temporal changes in the underlying incidence rate.
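The expected doubling during the first round can be checked with a deliberately idealised lead-time model. The sketch below assumes a constant underlying incidence, that every cancer is screen-detectable exactly `lead_time` years before it would surface clinically, perfect sensitivity and attendance, and that each woman's first-round screen falls on a uniformly distributed date within the round. The function name `first_round_ratio` is hypothetical. When the lead time is at least as long as the round, the average works out to 1/2 + lead/round, so a three year lead time with a two year round gives a factor of 2.

```python
def first_round_ratio(lead_time: float, round_length: float, steps: int = 10_000) -> float:
    """Incidence during the first screening round relative to no screening.

    Idealised assumptions: constant underlying incidence, a fixed lead time,
    perfect sensitivity and attendance, and screening dates spread uniformly
    over the round. Counts are in units of "one year of incidence".
    """
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * round_length / steps  # a woman's screening date (midpoint rule)
        before = s                             # clinical cases arising before her screen
        at_screen = lead_time                  # cases brought forward by the screen
        # Clinical cases can reappear only after the screened-out lead time has elapsed
        after = max(0.0, round_length - s - lead_time)
        total += before + at_screen + after
    mean_cases = total / steps
    return mean_cases / round_length           # expected without screening: round_length years
```

With no lead time the ratio stays at 1, since screening then only relabels cancers that would have been found anyway.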
In most randomised controlled trials, screening was offered to the control groups after the end of the trials, which makes it difficult to estimate the amount of overdiagnosis. The Malmö trial in Sweden left the control group unscreened. For women screened at age 45-79 and followed past screening age, this randomised controlled trial showed a relative cumulative incidence rate of 1.10 (95% confidence interval 0.99 to 1.22) for the intervention group compared with the control group.6 The control groups in the Canadian randomised controlled trials were not offered screening either,7 8 but as these trials targeted only women aged 40-49 and 50-59, participants probably continued screening elsewhere. Given the limited data from randomised controlled trials, data on overdiagnosis from service screening programmes are needed. The challenge in observational studies is to estimate the expected cumulative incidence rate of breast cancer in the absence of screening. By using both current and historical data from the screening regions, as well as current and historical data from non-screening regions, we can measure the change in incidence of breast cancer from before to during screening in the screening regions, controlled for the underlying temporal trend in breast cancer incidence. Following this method, we present here estimates of overdiagnosis from Denmark, where population based service screening mammography was implemented in two regions in the early 1990s, while the nationwide roll-out of screening took place in the late 2000s.
Material and methods
The population based screening mammography programme in Copenhagen started on 1 April 1991, inviting women aged 50-69 at the start of each biennial invitation round. The participation rate was 71% in the first round. The programme in Funen started on 1 November 1993 and invited women aged 50-69 at the date of invitation. The participation rate was 84% in the first round. See web appendix for details.
Use of opportunistic screening has always been limited in Denmark. In 2000 only 3% of women aged 50-69 in non-screening regions had a mammogram taken; this included diagnostic as well as opportunistic screening mammograms.9 The nationwide roll-out of screening mammography started only in 2007.
The Danish central population register holds information on current and historical addresses for all citizens. From this register, we identified women targeted by screening in Copenhagen from 1 April 1991 to 31 March 2005 and in Funen from 1 November 1993 to 31 October 2004. To detect a possible compensatory dip after women had passed the upper age limit for invitation to screening, the analysis had to include a sufficiently long follow-up time. From Copenhagen, we therefore included only women born 1 April 1921 to 31 March 1935, that is, women aged 56-70 years on 1 April 1991 (start of screening in Copenhagen); from Funen, we included women born 1 November 1923 to 31 October 1934, that is, women aged 59-70 on 1 November 1993 (start of screening in Funen) (fig 1). For each of the two screening regions, we constructed three control groups: a historical control group, a national control group from non-screening regions, and a historical national control group (table 1).
The historical control groups included birth cohorts of women from the two respective screening regions from before the screening started. We followed these birth cohorts in the same age range as the birth cohorts invited to screening (table 1). The national control groups included the same birth cohorts as the respective study groups but from non-screening regions. The historical national control groups were constructed in the same way as the historical control groups, but from non-screening regions. The nationwide roll-out of screening targeting women aged 50-69 started in 2007, so this nationwide programme did not affect women born before 1937.
For the study group, we defined the entry date as the first date the woman was present in the target age group and screening region. We used similar definitions for the control groups. Person years at risk were accumulated from the woman’s entry into the respective study or control group until diagnosis of invasive breast carcinoma or ductal carcinoma in situ, emigration, death, or end of follow-up on 31 December 2009, whichever came first. We identified incident cases of invasive breast carcinoma and ductal carcinoma in situ by linkage to the Danish cancer register and the clinical database of the Danish Breast Cancer Cooperative Group. Linkage was based on personal identification numbers. We defined participants as targeted women who attended screening at least once. Data on participation came from the mammography screening registers in Funen and Copenhagen. The mammography screening registers hold individual information on all participants.
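The censoring rule for person years at risk can be sketched as follows. This is an illustration only: the function name, the use of Python `datetime.date`, and the conversion of days to years by dividing by 365.25 are assumptions, not details taken from the study. A diagnosis on the same day as another exit event is counted as a case.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2009, 12, 31)

def person_years_at_risk(
    entry: date,
    diagnosis: Optional[date] = None,
    emigration: Optional[date] = None,
    death: Optional[date] = None,
    end: date = END_OF_FOLLOW_UP,
) -> Tuple[float, bool]:
    """Return (person years from entry to the first exit event, whether the exit was a case).

    The exit is the earliest of diagnosis of invasive breast carcinoma or ductal
    carcinoma in situ, emigration, death, or end of follow-up.
    """
    candidates = [d for d in (diagnosis, emigration, death, end) if d is not None]
    exit_date = min(candidates)
    years = max(0.0, (exit_date - entry).days / 365.25)  # assumed day-count convention
    is_case = diagnosis is not None and diagnosis == exit_date
    return years, is_case
```

Summing the returned person years and case flags over all women in a group gives the numerator and denominator of that group's crude incidence rate.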
We estimated overdiagnosis as the cumulative incidence of breast cancer in a screening region compared with the expected cumulative incidence in the absence of screening. If the incidence of breast cancer in the absence of screening had developed equally in screening and non-screening regions, the expected incidence in the absence of screening could be estimated from the incidence in the historical control group controlled for the change in incidence from historical to present time in the national control group. Our study therefore has three control groups: the historical control group, the national control group, and the historical national control group.
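The logic of the three control groups can be written as a ratio of ratios: the expected rate without screening is the historical rate in the screening region multiplied by the historical-to-present change seen in the non-screening regions. The sketch below uses crude rates (cases divided by person years) and ignores age and birth cohort, which the study handled by Poisson regression; the `Group` class, the function name, and the numbers in the test are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Group:
    cases: int
    person_years: float

    @property
    def rate(self) -> float:
        return self.cases / self.person_years

def relative_risk(study: Group, historical: Group,
                  national: Group, historical_national: Group) -> float:
    """Observed over expected incidence in the study group.

    Assumes the change in incidence from historical to present time is the same
    in screening and non-screening regions (no region-by-period interaction).
    """
    trend = national.rate / historical_national.rate  # temporal change in non-screening regions
    expected = historical.rate * trend                 # expected rate in the absence of screening
    return study.rate / expected
```

For example, if incidence rose by 10% in the non-screening regions while the screening region went from 100 to 120 cases per 10 000 person years, the relative risk is 1.2 / 1.1, about 1.09.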
However, as we included data on incidence of breast cancer for a period of almost 33 years, from 1977 to 2009 in Copenhagen, the background incidence in the absence of screening might have developed differently in the screening region and in the non-screening regions (that is, an interaction between region and period). We therefore started by testing the assumption of no interaction between region and period. Because such an interaction cannot be separated from the effect of screening during the study period, we tested it with data from the five year period before screening. Studying the interaction in the same cohorts as those included in the study and control groups is important, as the interactions might vary by cohort. We therefore used the same cohorts as in the study and control groups but looked at the five year period before the study and control periods. This means that we used data on incidence of breast cancer in the screening cohorts for a pre-study period of five years before the first invitation to screening: 1 April 1986 to 31 March 1991 for Copenhagen and 1 November 1988 to 31 October 1993 for Funen. We used equivalent pre-study periods for the three control groups (table 1). We used Poisson regression for this analysis (model A in appendix).
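For a single cohort, with the screening region and the non-screening regions each observed for current and historical cohorts in the pre-study period, the test of no interaction between region and period reduces to a Wald test on a ratio of rate ratios. In a saturated Poisson model on this 2x2 layout, the interaction coefficient equals the log ratio of rate ratios, and its standard error is the square root of the sum of the reciprocal case counts. This is a simplification of the study's model A, which combined cohorts; the function name and inputs are hypothetical.

```python
import math

def region_period_interaction(screen_current, screen_historical,
                              other_current, other_historical):
    """Each argument is a (cases, person_years) tuple for one region-by-period cell.

    Returns (ratio of rate ratios, two-sided P value for no interaction).
    """
    cells = (screen_current, screen_historical, other_current, other_historical)
    if any(d <= 0 for d, _ in cells):
        raise ValueError("every cell needs at least one case")
    rate = [d / py for d, py in cells]
    log_rrr = math.log((rate[0] / rate[1]) / (rate[2] / rate[3]))
    se = math.sqrt(sum(1 / d for d, _ in cells))  # Poisson variance of each log count
    z = log_rrr / se
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided standard normal P value
    return math.exp(log_rrr), p
```

A ratio of rate ratios close to 1 with a large P value supports pooling the control groups without an interaction term, as was done for Funen.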
For Funen, we found no statistically significant interaction between region and period for any cohort (P=0.22-0.93), and we saw no trends. We consequently analysed the Funen data for the study period by using a Poisson model without interaction between region and period (model B in appendix). For Copenhagen, we found a statistically significant interaction between region and period for the four oldest cohorts (P=0.046), as illustrated in the appendix figure; the incidence of breast cancer in the Copenhagen study group in the pre-study period developed differently from the incidence in the control groups. We consequently took this interaction into account in the analysis of the Copenhagen data, assuming that for each two year cohort the interaction in the study period equalled the interaction found in the pre-study period, and using data from both the pre-study and the study periods (model C in appendix).
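Under the Copenhagen assumption that the region-by-period interaction in the study period equals the one in the pre-study period, a simple way to remove it is to divide the study-period ratio of rate ratios by the pre-study one and add their variances on the log scale. The sketch below does this for a single cohort. Treating the two log estimates as independent (they come from disjoint calendar periods) is an assumption, and the study's model C instead fitted both periods jointly; names and inputs are hypothetical.

```python
import math

def adjusted_relative_risk(rrr_study: float, se_study: float,
                           rrr_pre: float, se_pre: float):
    """Remove a pre-study region-by-period interaction from a study-period estimate.

    se_study and se_pre are standard errors on the log scale.
    Returns (relative risk, lower 95% limit, upper 95% limit).
    """
    log_rr = math.log(rrr_study) - math.log(rrr_pre)
    se = math.sqrt(se_study ** 2 + se_pre ** 2)  # assumes independent estimates
    return (math.exp(log_rr),
            math.exp(log_rr - 1.96 * se),
            math.exp(log_rr + 1.96 * se))
```

Because the pre-study variance is added, the adjusted interval is wider than the unadjusted one, in line with the broader Copenhagen confidence intervals noted in the discussion.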
We analysed data separately for five time periods in the study group: programme prevalence screening round, programme incidence screening rounds, 0-3 years after end of screening, 4-7 years after end of screening, and at least 8 years after end of screening. As the numbers of women followed for more than 12 years were relatively small, we did not divide the analysis into the time periods 8-11 years, 12-15 years, and 16 years and above. We present the results as relative risks with two sided 95% confidence intervals. To get an estimate of overdiagnosis based on data from both Copenhagen and Funen, we made a pooled estimate by using a fixed effects weighted average of the relative risks on a logarithmic scale.10
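A fixed effects pooled estimate of this kind weights each region's log relative risk by the inverse of its variance. The sketch below recovers each standard error from a reported 95% confidence interval, which assumes the interval is symmetric on the log scale; the function name and test inputs are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def pool_fixed_effect(estimates):
    """Inverse-variance fixed effects pooling of relative risks on the log scale.

    `estimates` is a list of (rr, lower, upper) tuples with 95% limits.
    Returns (pooled rr, lower 95% limit, upper 95% limit).
    """
    weights, weighted_logs = [], []
    for rr, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        w = 1 / se ** 2
        weights.append(w)
        weighted_logs.append(w * math.log(rr))
    log_pooled = sum(weighted_logs) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return (math.exp(log_pooled),
            math.exp(log_pooled - Z95 * se_pooled),
            math.exp(log_pooled + Z95 * se_pooled))
```

Pooling two identical estimates leaves the point estimate unchanged and narrows the interval by a factor of the square root of 2 on the log scale.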
In the Copenhagen study group, we included 32 931 women and followed them for an average of 13.9 years, giving 456 499 person years; 1892 invasive breast carcinomas and 110 cases of ductal carcinoma in situ occurred (table 2). The population of Copenhagen was considerably larger during the historical control period, including 63 097 women followed for an average of 14.4 years, giving 909 875 person years; 2598 invasive breast carcinomas and 41 cases of ductal carcinoma in situ occurred. The national control group and the historical national control group for Copenhagen were fairly equal in size, including 281 311 women followed for an average of 14.8 years and 266 860 women followed for an average of 15.0 years, giving 4 173 549 and 3 999 172 person years. The numbers of invasive breast carcinomas differed, being 14 410 in the national control group and 10 323 in the historical national control group, a difference reflecting the underlying increase in incidence of breast cancer.11
In the Funen study group, we included 24 832 women and followed them for an average of 13.0 years, giving 323 363 person years; 1203 invasive breast carcinomas and 74 cases of ductal carcinoma in situ occurred (table 3). The size of the Funen population was fairly constant over time, with 27 143 women followed for an average of 13.2 years in the historical control period, giving 359 426 person years; 1040 invasive breast carcinomas and 45 cases of ductal carcinoma in situ occurred. The national control group and the historical national control group for Funen were also of fairly equal size, including 213 380 and 209 443 women, both followed for an average of 13.0 years, giving 2 768 352 and 2 731 477 person years and 9898 and 7635 invasive breast carcinomas.
The cumulative incidence of invasive breast carcinoma in the Copenhagen study group was 5% higher than expected in the absence of screening (relative risk 1.05, 95% confidence interval 0.88 to 1.24) (fig 2). Inclusion of ductal carcinoma in situ changed the estimate only slightly, giving a relative risk of 1.06 (0.90 to 1.25). If 1000 targeted women were followed over 20 years, the relative risk of 1.06 would translate into 87 invasive breast carcinomas plus ductal carcinomas in situ compared with 82 cases expected in the absence of screening. We observed a doubling of the incidence of breast cancer during the programme prevalence peak (relative risk 2.06, 1.64 to 2.59; invasive breast carcinoma plus ductal carcinoma in situ) (fig 2). The incidence was close to the expected incidence in the absence of screening during the programme incidence screening rounds (relative risk 1.04, 0.85 to 1.27). We saw a clear deficit during the first 0-3 years after the end of screening (relative risk 0.80, 0.65 to 0.98), after which the incidence gradually approached the level expected in the absence of screening. We found no significant difference in relative risk for cumulative incidence by age at entry (heterogeneity test P=0.47).
The cumulative incidence of invasive breast carcinoma in Funen was 1% higher than expected (relative risk 1.01, 0.92 to 1.10) (fig 2). Inclusion of ductal carcinoma in situ had only a marginal effect on this estimate (relative risk 1.01, 0.93 to 1.10). If 1000 targeted women were followed over 20 years, the relative risk of 1.01 would translate into 78 invasive breast carcinomas plus ductal carcinomas in situ compared with 77 cases expected in the absence of screening. The incidence of breast cancer almost doubled during the programme prevalence peak (relative risk 1.84, 1.46 to 2.32; invasive breast carcinoma plus ductal carcinoma in situ) (fig 2) and was non-significantly increased during the programme incidence screening rounds (1.14, 0.98 to 1.32). We saw a clear deficit during the first 0-3 years after the end of screening (relative risk 0.67, 0.55 to 0.81), after which the incidence gradually approached the level expected in the absence of screening. We found no significant difference in relative risk for cumulative incidence by age at entry (heterogeneity test P=0.97).
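The conversion from a relative risk to cases per 1000 women is simple arithmetic: the observed count is the expected count multiplied by the relative risk, and the excess is their difference. The helper below is hypothetical; its expected counts must come from the study's own estimates (82 for Copenhagen and 77 for Funen in the text).

```python
def translate_relative_risk(expected_cases: float, relative_risk: float):
    """Return (observed cases, excess cases, excess as a percentage of expected)."""
    observed = expected_cases * relative_risk
    excess = observed - expected_cases
    return observed, excess, 100 * (relative_risk - 1)
```

For Copenhagen, `translate_relative_risk(82, 1.06)` gives about 86.9 observed cases, which rounds to the 87 quoted above; for Funen, `translate_relative_risk(77, 1.01)` gives about 77.8, which rounds to 78.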
At least eight years of follow-up
Including only birth cohorts with at least eight years of follow-up after the end of screening (Copenhagen 1921-31; Funen 1923-31), and including both invasive breast carcinoma and ductal carcinoma in situ, gave a relative risk for cumulative incidence of 1.034 (0.86 to 1.25) for Copenhagen and 1.007 (0.91 to 1.12) for Funen.
The pooled estimate for all targeted women was 1.040 (0.99 to 1.09). The pooled estimate for women with at least eight years of follow-up was 1.023 (0.97 to 1.08).
Estimates for participants
From the mammography screening registers, we know which women ever participated in one of the screening programmes. In Copenhagen, 32% of the studied birth cohorts had never participated in screening. Compared with the expected level in the absence of the screening programme, these women had a relative cumulative incidence (invasive breast carcinoma plus ductal carcinoma in situ) of 0.88. When we used these results to control for selection bias among screening participants, the relative cumulative incidence in participants became 1.08 (see appendix for details on methods). In Funen, 29% of the studied birth cohorts had never participated in screening, and the relative cumulative incidence for these women was 0.96, giving a relative cumulative incidence for participants of 1.02. In the birth cohorts followed for eight years or more after the end of screening, the relative cumulative incidence for participants became 1.05 in Copenhagen and 1.01 in Funen.
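One way such a selection-bias correction can be set up is sketched below; it is a hypothetical reconstruction, not the method in the authors' appendix. It assumes that never-participants are unaffected by screening, so their ratio to the population-expected incidence reflects only their different baseline risk, and that the population-expected incidence is a participation-weighted mix of the baseline risks of participants and never-participants. With the rounded Copenhagen inputs from the text (relative risk 1.06, 68% participation, 0.88 among never-participants) it returns about 1.08; the rounded Funen inputs give about 1.01 rather than 1.02, so the appendix method may differ in detail.

```python
def participant_relative_risk(rr_targeted: float,
                              participation: float,
                              rr_never: float) -> float:
    """Relative cumulative incidence among participants, corrected for selection.

    Hypothetical reconstruction: all quantities are relative to the incidence
    expected in the whole targeted population without screening.
    """
    p = participation
    non = 1 - p
    observed_part = (rr_targeted - non * rr_never) / p  # participants' observed incidence
    expected_part = (1 - non * rr_never) / p            # participants' baseline incidence
    return observed_part / expected_part
```

When never-participants have the same baseline risk as everyone else (`rr_never = 1`), the correction simply attributes the whole excess to the participants.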
Our data indicate that among women targeted for screening, the excess lifetime risk of invasive breast carcinoma plus ductal carcinoma in situ amounted to 6% (95% confidence interval −10% to 25%) in the Copenhagen screening programme and 1% (−7% to 10%) in the Funen programme. Among targeted women followed for at least eight years after the end of screening, the excess risk amounted to 3% (−14% to 25%) in the Copenhagen programme and 0.7% (−9% to 12%) in the Funen programme. Among participants followed for at least eight years after the end of screening, the excess risk amounted to 1-5%.
In these Danish screening mammography programmes, the incidence of breast cancer over time developed in accordance with the introduction of a lead time. The incidence increased with the start of screening, was somewhat above the level without screening in the subsequent screening rounds, and decreased significantly below this level once invitation to screening had stopped. The relative risk for incidence of invasive breast carcinoma plus ductal carcinoma in situ was still below unity 4-7 years after the end of screening (0.91, 0.73 to 1.14, for Copenhagen and 0.78, 0.64 to 0.94, for Funen), suggesting that the compensatory drop after the end of screening took at least eight years. Therefore, the most reliable estimate of overdiagnosis should be based on women followed for at least eight years after the end of screening and should include both invasive breast carcinoma and ductal carcinoma in situ.
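The reasoning that follow-up must extend well beyond the end of screening can be illustrated with an idealised deterministic model in which screening causes no overdiagnosis at all. The sketch below assumes constant underlying incidence, a fixed lead time, perfect sensitivity, and full attendance at biennial screens; all names and parameter values are hypothetical. With a three year lead time and a last screen at year 12, the cumulative ratio is 1.25 at the last screen, still above 1 at year 13.5, and returns to exactly 1 three years after the last screen. In real programmes lead times vary between tumours, so the compensation is spread over a longer period.

```python
import math

def detection_time(clinical: float, screens: list, lead: float) -> float:
    """Time at which a cancer that would surface clinically at `clinical` is diagnosed.

    Idealised assumption: the cancer is detectable at any screen held no more
    than `lead` years before its clinical date, and every woman attends.
    """
    for t in screens:  # screens must be sorted in ascending order
        if clinical - lead <= t <= clinical:
            return t   # found at the earliest screen at which it is detectable
    return clinical    # otherwise diagnosed clinically

def cumulative_rr(t_end: float, screens: list, lead: float, step: float = 0.001) -> float:
    """Cumulative diagnoses up to t_end with screening, divided by those without.

    Clinical dates are spread evenly from time 0 (constant incidence), and no
    cancer is overdiagnosed, so any excess is due to lead time alone.
    """
    n = math.ceil((t_end + lead) / step)  # later clinical dates cannot be diagnosed by t_end
    observed = expected = 0
    for k in range(n):
        c = (k + 0.5) * step  # clinical date at the midpoint of each small interval
        if c <= t_end:
            expected += 1
        if detection_time(c, screens, lead) <= t_end:
            observed += 1
    return observed / expected

SCREENS = [0, 2, 4, 6, 8, 10, 12]  # seven biennial rounds (hypothetical)
```

Stopping follow-up inside the post-screening dip, as in the year 13.5 case, would make lead time look like overdiagnosis.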
If 1000 women targeted by screening in Copenhagen were followed for a total of 20 years, of which at least eight years were after end of screening, the relative risk of 1.034 would translate into 86.9 cases of invasive breast carcinoma and ductal carcinoma in situ in the screening group compared with 84.0 cases expected in the absence of screening. For Funen, a similar calculation would give 78.3 observed cases compared with 77.7 cases expected in the absence of screening.
In the Danish screening programmes, in which ductal carcinoma in situ in the first three invitation rounds constituted 13-14% of screen detected cases,12 13 the excess breast cancer risk in targeted women changed only slightly after inclusion of ductal carcinoma in situ. This was because cases of ductal carcinoma in situ constituted only a minor part of all the cases in the study group. This finding stresses the seriousness of any overdiagnosis, as invasive cancer is treated more intensively than ductal carcinoma in situ.
The relatively modest estimates of overdiagnosis found in this study make good sense, considering that screening is carried out in middle aged women in developed countries and that these women have an excellent life expectancy, leaving considerable time for preclinical disease to progress to symptoms in the absence of screening. Although occasional documented cases of regression of untreated tumours occur, untreated breast cancer is generally characterised by very low survival.14
Strengths and weaknesses of study
To our knowledge, this is the first analysis of overdiagnosis in screening mammography based on individual follow-up of all women targeted by screening as well as of women in population based control groups. Follow-up of women targeted by screening has also been reported from Florence, Italy.15 16 Our study was based on register data, which were all linkable owing to the use of the personal identification numbers as the common identifier in population and health registers. Our analysis took the underlying increase in incidence of breast cancer into account. Our study was entirely observational without assumptions on transition probabilities or length of lead time. As pointed out by de Gelder et al,17 the choice of denominator plays an important role in the calculation of overdiagnosis. To be able to inform women about the long term consequences of participation in screening, we calculated the cumulative incidence of invasive breast carcinoma plus ductal carcinoma in situ after a first invitation to screening divided by the cumulative incidence of invasive breast carcinoma plus ductal carcinoma in situ expected during the same lifespan in the absence of screening.
A limitation in the “three control group” design was the lack of control of possible interaction between region and period. We compensated for this by analysing our study group and three control groups in a pre-study period. The interaction between region and period was statistically significant for Copenhagen and was included in the analysis, whereas this was not the case for Funen. The difference between the two regions was most likely explained by the decreasing size and possibly changed composition of the Copenhagen population over time and the inclusion of historical control birth cohorts back to 1907 in Copenhagen but only to 1912 in Funen. Earlier uptake of hormone therapy in Copenhagen, if it occurred, could also have contributed to the interaction. The Copenhagen study population was larger than the study population from Funen; however, the inclusion of the interaction term in the analysis for Copenhagen resulted in broader confidence intervals than for Funen. For Copenhagen, we estimated the interaction in the study period by assuming that it equalled the interaction found in the pre-study period. In view of the increasing incidence during the pre-study period for the Copenhagen cohorts born in 1921-25 (see appendix figure), we might have overestimated the overdiagnosis in Copenhagen. Inclusion of an interaction term in the analysis for Funen only marginally changed the results.
Another limitation was inclusion only of women first targeted at ages 56-69 for Copenhagen and 59-69 for Funen, to ensure follow-up time after end of screening. Results from this study therefore pertain to women aged 56 and above at first invitation. These women had, however, considerable screening experience as they had been invited up to seven times in Copenhagen and six times in Funen. Puliti et al found the excess incidence of breast cancer in the last year of screening to be quite similar for women aged 50-54, 55-59, and 60-64 years at the start of service screening.16 We therefore expect our estimate of overdiagnosis to be fairly representative also for women aged 50-69 at the start of service screening. Finally, the overdiagnosis in screening is expected to be affected by the detection rates for ductal carcinoma in situ. On the basis of crude proportion data, detection of ductal carcinoma in situ varies across screening settings, constituting 13-14% of screen detected cases in the two Danish programmes compared with 24% in US screening data.18 Overdiagnosis therefore has to be considered in view of the actual screening context.
Comparison with other studies
A wide range of estimates of overdiagnosis have been published. Here, we focus on estimates based on data from Denmark and neighbouring Nordic countries. In our earlier analysis of breast cancer incidence rates for women aged 50-69 in Copenhagen and Funen in the first years after the introduction of screening,19 we found marked prevalence peaks (consistent with the “red areas” in this study) and incidence rates within the expected time trend during subsequent screening rounds (consistent with the “yellow areas”). Using Markov modelling, Olsen et al estimated overdiagnosis based on data from the first two rounds of the Copenhagen programme to be 7.8% (0.3% to 26.5%) for the first round and 0.5% (0.02% to 2.1%) for the second round.20 Given the broad confidence intervals, these results were in agreement with those of our study. Jørgensen et al analysed the time trends in incidence of invasive breast carcinoma plus ductal carcinoma in situ for the period 1971-2003.21 This study merged data from Copenhagen, Funen, and Frederiksberg and used age groups instead of birth cohorts, population data instead of individual data, and a short follow-up period. The authors concluded that screening led to a 33% overdiagnosis. This crude estimate was not compatible with the outcome from the long term birth cohort data presented here.
Zahl et al compared the cumulative incidence of breast cancer in cohorts of Norwegian women offered screening three times with cohorts offered screening once and found a relative risk of 1.22 (1.16 to 1.30).22 Using the same study design, Zahl et al found a relative risk of 1.14 (1.10 to 1.18) for cohorts of Swedish women offered screening three times compared with cohorts offered screening once.23 These relative risks are to some extent comparable to our results from the subsequent screening rounds (“yellow areas”), and the estimated relative risk of 1.14 for Sweden was in fact the same as we found for Funen. The higher relative risk found in Norway has been linked to changes in use of hormone therapy.24
Dutch simulation data illustrated the importance of length of follow-up in studies of overdiagnosis.17 The same was seen in our data, where the relative risk was below unity not only 0-3 years after the end of screening but also 4-7 years after. If the follow-up had ended four years after the end of screening, our study would have indicated an overdiagnosis of 9-10% instead of 1-6%. The importance of length of follow-up has been shown previously with data from the Malmö randomised trial, in which Moss found an overdiagnosis of 31% when the trial was still ongoing,25 whereas Zackrisson et al found an overdiagnosis of 8-10% when they included follow-up to 15 years after the trial.6
Conclusions and policy implications
This cohort study based on data from two longstanding population based screening mammography programmes indicated that overdiagnosis of breast cancer including ductal carcinoma in situ most likely amounted to 2.3% (95% confidence interval −3% to 8%) in women aged 56-69 years when first targeted by screening. Among participating women, overdiagnosis of breast cancer including ductal carcinoma in situ most likely amounted to 1% in Funen and 5% in Copenhagen. Our study furthermore illustrated that follow-up of at least eight years after the end of screening was needed to compensate for the excess incidence during screening.
What is already known on this topic
Overdiagnosis in screening mammography is a widely debated topic
Most studies have major methodological limitations
Cohort studies of women followed past their screening age are warranted
What this study adds
Population based data on overdiagnosis were controlled for the underlying trend in incidence of breast cancer
Overdiagnosis amounted to 2.3% in screen targeted women and to 1-5% in participants
At least eight years of follow-up after the end of screening were needed to compensate for the excess incidence during screening
Cite this as: BMJ 2013;346:f1064
Contributors: SHN participated in the design of the study, did the statistical analysis, participated in the analysis and interpretation of data, and drafted the manuscript. AHO participated in the design of the study, retrieved and checked data, and revised the manuscript critically. MB-T, WS, and IV contributed data on screening mammography and revised the manuscript critically. EL conceived the study, participated in the design of the study, and finalised the manuscript in collaboration with SHN. All authors have read and approved the final manuscript. All authors had full access to all of the data in the study. SHN is the guarantor.
Funding: This study was financially supported by the Esper and Olga Boel Foundation. The funding source had no role in the writing of the manuscript or the decision to submit it for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: SHN and EL had financial support from Esper and Olga Boel Foundation; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: In accordance with Danish legislation, the study was notified to the Danish Data Inspection Agency (J No 2008-41-2191). No further ethical approval is needed for register based studies in Denmark.
Data sharing: No additional data available.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.