Development and validation of risk prediction equations to estimate survival in patients with colorectal cancer: cohort study

BMJ 2017; 357 doi: (Published 15 June 2017) Cite this as: BMJ 2017;357:j2497
  1. Julia Hippisley-Cox, professor (1)
  2. Carol Coupland, professor (1)

  (1) Division of Primary Care, University Park, Nottingham NG2 7RD, UK

  Correspondence to: J Hippisley-Cox julia.hippisley-cox{at}
  • Accepted 17 May 2017


Objective To develop and externally validate risk prediction equations to estimate absolute and conditional survival in patients with colorectal cancer.

Design Cohort study.

Setting General practices in England providing data for the QResearch database linked to the national cancer registry.

Participants 44 145 patients aged 15-99 with colorectal cancer from 947 practices to derive the equations. The equations were validated in 15 214 patients with colorectal cancer from 305 different QResearch practices and 437 821 patients with colorectal cancer from the national cancer registry.

Main outcome measures The primary outcome was all cause mortality and secondary outcome was colorectal cancer mortality.

Methods Cause specific hazards models were used to predict risks of colorectal cancer mortality and other cause mortality accounting for competing risks, and these risk estimates were combined to obtain risks of all cause mortality. Separate equations were derived for men and women. Several variables were tested: age, ethnicity, deprivation score, cancer stage, cancer grade, surgery, chemotherapy, radiotherapy, smoking status, alcohol consumption, body mass index, family history of bowel cancer, anaemia, liver function test result, comorbidities, use of statins, use of aspirin, clinical values for anaemia, and platelet count. Measures of calibration and discrimination were determined in both validation cohorts at 1, 5, and 10 years.
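The way two cause specific hazards combine into an all cause risk can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `cumulative_incidence`, the step size, and the constant hazards in the usage note are assumptions chosen for illustration only. The sketch integrates a colorectal cancer hazard and an other cause hazard over time, so that each cause's risk is reduced by deaths from the competing cause, and then adds the two risks to give all cause mortality.

```python
import math


def cumulative_incidence(hazard_cancer, hazard_other, years, steps_per_year=1000):
    """Turn two cause specific hazards (per year) into cumulative risks.

    hazard_cancer and hazard_other are functions of time in years that return
    the cause specific hazard at that time (hypothetical inputs).
    Returns (risk_cancer, risk_other, risk_all_cause).
    """
    dt = 1.0 / steps_per_year
    survival = 1.0      # probability of being alive at the start of the step
    risk_cancer = 0.0
    risk_other = 0.0
    for i in range(int(round(years * steps_per_year))):
        t = (i + 0.5) * dt                      # midpoint of the step
        h1 = hazard_cancer(t)
        h2 = hazard_other(t)
        total = h1 + h2
        # Deaths from any cause during this step, among those still alive.
        died = survival * (1.0 - math.exp(-total * dt))
        if total > 0:
            # Split the deaths between causes in proportion to their hazards.
            risk_cancer += died * h1 / total
            risk_other += died * h2 / total
        survival -= died
    # All cause mortality is the sum of the two cause specific risks.
    return risk_cancer, risk_other, risk_cancer + risk_other
```

With made-up constant hazards of 0.10 and 0.02 per year, `cumulative_incidence(lambda t: 0.10, lambda t: 0.02, 5)` gives a five year colorectal cancer mortality risk of roughly 0.38 and an all cause risk of roughly 0.45. These numbers are illustrative and are not estimates from the study.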

Results The final models included the following variables in men and women: age, deprivation score, cancer stage, cancer grade, smoking status, colorectal surgery, chemotherapy, family history of bowel cancer, raised platelet count, abnormal liver function, cardiovascular disease, diabetes, chronic renal disease, chronic obstructive pulmonary disease, prescribed aspirin at diagnosis, and prescribed statins at diagnosis. Improved survival in women was associated with younger age, earlier stage of cancer, well or moderately differentiated cancer grade, colorectal cancer surgery (adjusted hazard ratio 0.50), family history of bowel cancer (0.62), and prescriptions for statins (0.77) and aspirin (0.83) at diagnosis, with comparable results for men. The risk equations were well calibrated, with predicted risks closely matching observed risks. Discrimination was good in men and women in both validation cohorts. For example, the five year survival equations on the QResearch validation cohort explained 45.3% of the variation in time to colorectal cancer death for women, the D statistic was 1.86, and Harrell’s C statistic was 0.80 (both measures of discrimination, indicating that the scores are able to distinguish between people with different levels of risk). The corresponding results for all cause mortality were 42.6%, 1.77, and 0.79.
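The discrimination measures quoted above can be made concrete with a short Python sketch. It rests on two assumptions that the abstract does not state: that the explained variation is the R²_D measure of Royston and Sauerbrei, which is derived from the D statistic, and that a simple pairwise definition of Harrell's C (ignoring tied survival times) is adequate for illustration. The function names and the toy data in the usage note are hypothetical.

```python
import math


def harrell_c(times, events, risk_scores):
    """Simplified Harrell's C: share of usable pairs ranked correctly.

    A pair is usable when the patient with the shorter follow-up time died.
    It is concordant when that patient also has the higher predicted risk.
    Tied risk scores count as half; pairs with tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


def explained_variation_from_d(d):
    """R²_D from the D statistic, assuming the Royston and Sauerbrei formula."""
    kappa_sq = 8.0 / math.pi          # scaling constant used in the D statistic
    sigma_sq = math.pi ** 2 / 6.0     # error variance assumed for a Cox model
    scaled = d * d / kappa_sq
    return scaled / (sigma_sq + scaled)
```

Under these assumptions, `explained_variation_from_d(1.86)` is about 0.45 and `explained_variation_from_d(1.77)` is about 0.43, close to the reported 45.3% and 42.6%; small gaps would be expected if the published D values are rounded. For the toy data `harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1])`, four of the five usable pairs are ranked correctly.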

Conclusions Risk prediction equations were developed and validated to estimate overall and conditional survival of patients with colorectal cancer accounting for an individual’s clinical and demographic characteristics. These equations can provide more individualised accurate information for patients with colorectal cancer to inform decision making and follow-up.
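Conditional survival, as the term is generally used, is the probability of surviving to a later time given that the patient is still alive now, which is the ratio of two points on the predicted survival curve. The Python sketch below illustrates that arithmetic only. The function name and the survival probabilities are hypothetical and are not taken from the paper's equations.

```python
def conditional_survival(survival_curve, years_survived, years_ahead):
    """Probability of surviving a further `years_ahead` years,
    given survival to `years_survived` years after diagnosis.

    survival_curve maps years since diagnosis to predicted survival probability.
    """
    s_now = survival_curve[years_survived]
    s_later = survival_curve[years_survived + years_ahead]
    if s_now <= 0:
        raise ValueError("conditional survival is undefined once survival reaches zero")
    return s_later / s_now


# Hypothetical predicted survival by years since diagnosis (not from the paper).
example_curve = {0: 1.00, 1: 0.80, 2: 0.72, 5: 0.60}
```

In this made-up example, a patient with a 60% predicted survival at five years from diagnosis who has already survived the first year has a conditional probability of about 0.60 / 0.80 = 0.75 of surviving the next four years, which is why conditional estimates can look more favourable than the estimate given at diagnosis.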


  • A web calculator that estimates absolute survival is available online. Open source software is also available for download.

    We thank the EMIS practices that contribute to QResearch; EMIS for expertise in establishing, developing, and supporting the database; Public Health England (PHE) for supplying the cancer registration data; and the Office for National Statistics (ONS) for providing the mortality data. PHE and ONS bear no responsibility for the analysis or interpretation of the data.

  • Contributors: JHC initiated the study, undertook the literature review, data extraction, data manipulation, and primary data analysis, and wrote the first draft of the paper. CC contributed to the design, analysis, interpretation, and drafting of the paper. JHC is the guarantor.

  • Funding: No external funding.

  • Competing interests: Both authors have completed the ICMJE uniform disclosure form (available on request from the corresponding author) and declare: JHC is professor of clinical epidemiology at the University of Nottingham and codirector of QResearch, a not-for-profit organisation, which is a joint partnership between the University of Nottingham and Egton Medical Information Systems (leading commercial supplier of IT for 60% of general practices in the UK). JHC is also a paid director of ClinRisk, which produces open and closed source software to ensure the reliable and updatable implementation of clinical risk equations within clinical computer systems to help improve patient care. JHC is a trustee of the EMIS National User Group (education and research charity). CC is professor of medical statistics at the University of Nottingham and a paid consultant statistician for ClinRisk. This work and any views expressed within it are solely those of the coauthors and not of any affiliated bodies or organisations.

  • Ethical approval: This study was reviewed in accordance with the QResearch agreement with National Research Ethics Service East Midlands, Derby (reference 03/4/021).

  • Data sharing: The equations presented in this paper will be released as open source software under the GNU Lesser General Public License v3, which allows use without charge. Closed source software can be licensed at a fee.

  • Transparency: The manuscript’s guarantor (JHC) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited.