Effect of improved data collection on breast cancer incidence and survival: reconciliation of a registry with a clinical database

BMJ 2000;321:214 (Published 22 July 2000)
  1. Anne Stotter, consultant surgeona (anne.stotter{at},
  2. Nicola Bright, medical statisticianb,
  3. Paul B Silcocks, assistant director (research and intelligence)b,
  4. Johannes L Botha, directorb
  1. a Glenfield Hospital NHS Trust, Leicester LE3 9QP,
  2. b Trent Cancer Registry, Weston Park Hospital NHS Trust, Sheffield S10 2SJ
  1. Correspondence to: A Stotter
  • Accepted 25 April 2000

In 1998 a project was undertaken to improve cancer registration in the Trent region by establishing a direct link between Trent Cancer Registry and a breast cancer clinical database for Leicestershire patients at Glenfield Hospital. This provided an opportunity to study registrations for 1997 to determine how many patients were missed and who they were. We then estimated the effect of improved ascertainment on incidence and survival.

Methods and results

Throughout 1998 Trent Cancer Registry sent registrations for 1997 to Glenfield Hospital for comparison. We investigated three main issues.

  • Differences in recorded date of diagnosis, including identification of an earlier date of diagnosis for patients registered as diagnosed at the time of death (that is, where diagnosis is based solely on death certificate information)

  • Characteristics of those who were missed

  • Effects of these factors on the apparent incidence and survival of breast cancer patients.
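The comparison described above amounts to reconciling two sets of records. The following is a minimal sketch of that idea, not the procedure the authors used: the patient identifiers, the dates, and the `reconcile` helper are all hypothetical and serve only to show how missed cases, cases belonging to an earlier year, and date differences could be separated.

```python
from datetime import date

# Hypothetical records: made-up patient identifiers mapped to diagnosis dates.
registry = {
    "P001": date(1997, 1, 20),   # registered for 1997
    "P002": date(1997, 6, 2),
    "P003": date(1997, 11, 15),
}
clinical = {
    "P001": date(1996, 12, 18),  # clinical date falls in the previous year
    "P002": date(1997, 5, 6),
    "P004": date(1997, 3, 9),    # never reached the registry
}


def reconcile(registry, clinical, year):
    """Compare registry registrations for `year` with the clinical database."""
    registered = {pid for pid, d in registry.items() if d.year == year}
    clinical_year = {pid for pid, d in clinical.items() if d.year == year}

    # Diagnosed in `year` according to the clinical database, absent from the registry.
    missed = sorted(clinical_year - set(registry))

    # Registered for `year`, but the clinical diagnosis date is in an earlier year.
    reassigned = sorted(
        pid for pid in registered
        if pid in clinical and clinical[pid].year < year
    )

    # Registered for `year` with no clinical record to compare against.
    unmatched = sorted(registered - set(clinical))

    # Days by which the clinical date precedes the registry date (matched records only).
    lags = {
        pid: (registry[pid] - clinical[pid]).days
        for pid in registered if pid in clinical
    }
    return missed, reassigned, unmatched, lags


missed, reassigned, unmatched, lags = reconcile(registry, clinical, 1997)
print(missed, reassigned, unmatched, lags)
```

A real linkage would need to match on several identifying fields and handle near matches; exact matching on a single key is a simplification made for this sketch.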

Using Leicestershire breast cancer registrations for 1993 on the registry database, we fitted a Cox model1 to the age specific survival and used it to predict survival for the 1997 cohort.
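For readers unfamiliar with the method, the general form of a Cox proportional hazards model can be written as below. The paper does not state how age was coded or how predictions were combined across patients, so the use of a single covariate vector $x$ containing age, and the averaging of individual predicted curves over the 1997 cohort, are assumptions made for illustration only.

```latex
% Hazard for a patient with covariates x (here assumed to describe age)
h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x)

% Corresponding survival function, with S_0 the baseline survival
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}

% One possible cohort-level prediction for n patients diagnosed in 1997,
% using estimates \hat\beta and \hat S_0 obtained from the 1993 cohort
\hat S_{1997}(t) = \frac{1}{n} \sum_{i=1}^{n} \hat S_0(t)^{\exp(\hat\beta^{\top} x_i)}
```

Under this form, a cohort containing more older patients receives a lower predicted survival curve whenever the estimated coefficient for age is positive.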

At the cancer registry 535 breast cancers were registered for 1997. On the Glenfield database the date of diagnosis was a median of 26 days earlier than the date of registration; 70 registrations were assigned to a previous year. By the end of 1998, 134 patients listed on the Glenfield database were still not registered. Thus, 599 diagnoses were finally identified for 1997, a 12% increase.
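The counts in this paragraph are consistent with a simple reconciliation: registrations assigned to a previous year are subtracted and missed cases are added. The short check below reproduces the reported total and percentage; the variable names are ours, and the way the figures combine is inferred from the numbers given rather than stated explicitly in the text.

```python
# Counts reported in the text for Leicestershire, 1997.
registered_1997 = 535        # registrations held at the cancer registry
moved_to_earlier_year = 70   # diagnosis date on the clinical database falls before 1997
missed_by_registry = 134     # on the clinical database, still unregistered at end of 1998

# Inferred reconciliation: remove reassigned cases, add missed cases.
final_1997 = registered_1997 - moved_to_earlier_year + missed_by_registry

# Relative change in the apparent number of diagnoses for 1997.
relative_increase = (final_1997 - registered_1997) / registered_1997

print(final_1997)                       # 599
print(round(100 * relative_increase))   # 12
```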

The median age of those registered was 58 years, compared with 74 for those missed (the median age of those who underwent surgery was 56 compared with 77 for those who did not). All 12 patients receiving private care were missed by the registry. Of 62 registrations based solely on death certificate information, 25 had an earlier date of diagnosis on the Glenfield database; they had a median survival of 11 months.

Using the model, we predicted five year survival to be 62% in the 535 registered cases and 59% in all 599 cases (figure).


Figure: Predicted survival of Leicestershire patients with breast cancer diagnosed in 1997


Improved ascertainment of breast cancer registrations apparently increased the incidence of cancer and, counterintuitively, seemed to reduce survival. This was because the missed cases were not typical of the cohort as a whole but comprised a subset with a lower life expectancy.

The main source of notifications for the registry was the hospital patient administration system, which covers inpatients and day cases only. Patients managed as outpatients (such as those treated with tamoxifen only) were likely to be missed. The addition of the missing cases reduced the overall survival because they were predominantly older women, with shorter survival.
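The mechanism can be illustrated with a weighted average: adding a group with poorer survival lowers the pooled figure even though no individual patient's outlook has changed. The group sizes and survival proportions below are hypothetical, chosen only to show the direction of the effect, and are not estimates from this study. The `pooled_survival` helper is likewise ours.

```python
def pooled_survival(groups):
    """Pooled survival proportion for a list of (number_of_patients, survival) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * s for n, s in groups) / total


# Hypothetical groups: (number of patients, five year survival proportion).
originally_registered = (400, 0.60)
newly_ascertained = (100, 0.40)   # older patients with shorter survival

before = pooled_survival([originally_registered])
after = pooled_survival([originally_registered, newly_ascertained])

print(round(before, 2), round(after, 2))
```

With these made-up inputs the pooled proportion falls from 0.60 to 0.56, purely because the composition of the cohort has changed.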

Additionally, the registry is notified of the death of all registered patients and of those for whom cancer is mentioned as a cause of death. Registrations based solely on death certificate information are conventionally excluded from survival statistics, but the Glenfield database now allows a search for an earlier diagnosis date, so that such patients can be included. The short median survival of 11 months for the 25 patients traced in this way contributes to a reduction in overall survival.

Standardisation of the definition of the diagnosis date in the registry to that used in the Glenfield database will result in nearly one month's improvement in the apparent survival of patients with breast cancer. This small increase will not, however, outweigh the other factors.
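The effect of the date definition is a matter of simple date arithmetic: moving the diagnosis date earlier lengthens measured survival by the same interval, without any change in the date of death. The sketch below uses the median difference of 26 days reported in the results; the specific calendar dates and the `survival_days` helper are hypothetical.

```python
from datetime import date, timedelta


def survival_days(diagnosis, death):
    """Measured survival in days from diagnosis to death."""
    return (death - diagnosis).days


# Hypothetical patient; only the 26 day shift comes from the text.
registry_diagnosis = date(1997, 3, 27)
median_shift = timedelta(days=26)
clinical_diagnosis = registry_diagnosis - median_shift
death = date(1999, 3, 27)

# Apparent gain in survival from using the earlier clinical date.
gain = survival_days(clinical_diagnosis, death) - survival_days(registry_diagnosis, death)
print(gain)  # 26
```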

National cancer statistics are used as evidence of “black spots” in cancer incidence and survival.2 3 It is crucial that those who use them recognise that statistics are influenced by many factors besides cancer epidemiology and treatment, in particular the details of how the data are collected. This becomes increasingly important when attempts are made to compare different centres and even countries.4


Contributors: The idea started with AS, who, with NB, collected and analysed the data on the 1997 patients. NB and PBS performed the survival analysis of the 1993 patients, and PBS applied the Cox model to the 1997 patients. AS and JLB wrote the paper and are the guarantors for the study.


  • Funding: None.

  • Competing interests: None declared.

