- Patrick Royston, professor of statistics¹,
- Karel G M Moons, professor of clinical epidemiology²,
- Douglas G Altman, professor of statistics in medicine³,
- Yvonne Vergouwe, assistant professor of clinical epidemiology²
- ¹MRC Clinical Trials Unit, London NW1 2DA
- ²Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, Netherlands
- ³Centre for Statistics in Medicine, University of Oxford, Oxford OX2 6UD
- Correspondence to: P Royston pr{at}ctu.mrc.ac.uk
- Accepted 6 October 2008
The first article in this series reviewed why prognosis is important and how it is practised in different medical settings.¹ We also highlighted the difference between multivariable models used in aetiological research and those used in prognostic research and outlined the design characteristics for studies developing a prognostic model. In this article we focus on developing a multivariable prognostic model. We illustrate the statistical issues using a logistic regression model to predict the risk of a specific event. The principles largely apply to all multivariable regression methods, including models for continuous outcomes and for time to event outcomes.
Summary points
Models with multiple variables can be developed to give accurate and discriminating predictions
In clinical practice simpler models are more practicable
There is no consensus on the ideal method for developing a model
Methods to develop simple, interpretable models are described and compared
The goal is to construct an accurate and discriminating prediction model from multiple variables. Models may be a complicated function of the predictors, as in weather forecasting, but in clinical applications considerations of practicality and face validity usually suggest a simple, interpretable model (as in box 1).
Box 1 Example of a prognostic model
Risk score from a logistic regression model to predict the risk of postoperative nausea or vomiting (PONV) within the first 24 hours after surgery²:
Risk score = −2.28 + (1.27 × female sex) + (0.65 × history of PONV or motion sickness) + (0.72 × non-smoking) + (0.78 × postoperative opioid use)
where all variables are coded 0 for no or 1 for yes.
The value −2.28 is called the intercept and the other numbers are the estimated regression coefficients for the predictors, which indicate their mutually adjusted relative contribution to the outcome risk. The regression coefficients are log(odds ratios) for a change of 1 unit in …
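As a rough illustration of how the risk score in box 1 might be used, the sketch below computes the score for a patient and converts it to a predicted probability. The intercept and coefficients are taken from box 1; the variable names and function names are our own and purely illustrative. The conversion assumes the score is the linear predictor (log odds) of the logistic model, so the standard inverse logit, 1/(1 + exp(−score)), applies.

```python
import math

# Intercept and regression coefficients as given in box 1 (PONV model).
INTERCEPT = -2.28
COEFFICIENTS = {
    # Key names are illustrative labels, not taken from the original study.
    "female_sex": 1.27,
    "history_ponv_or_motion_sickness": 0.65,
    "non_smoking": 0.72,
    "postoperative_opioid_use": 0.78,
}


def risk_score(patient):
    """Intercept plus the sum of coefficient x predictor (each coded 0 or 1)."""
    return INTERCEPT + sum(
        coef * patient[name] for name, coef in COEFFICIENTS.items()
    )


def predicted_risk(patient):
    """Convert the score (assumed to be log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-risk_score(patient)))


def odds_ratios():
    """Exponentiate each coefficient to obtain the adjusted odds ratio."""
    return {name: math.exp(coef) for name, coef in COEFFICIENTS.items()}


# Hypothetical patient: female non-smoker, no relevant history, no opioids.
example_patient = {
    "female_sex": 1,
    "history_ponv_or_motion_sickness": 0,
    "non_smoking": 1,
    "postoperative_opioid_use": 0,
}

if __name__ == "__main__":
    print(f"score = {risk_score(example_patient):.2f}")
    print(f"risk  = {predicted_risk(example_patient):.3f}")
```

Because every predictor is binary, a patient with none of the risk factors has a score equal to the intercept, and each factor present adds its coefficient to the log odds.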