A systematic review of physicians' survival predictions in terminally ill cancer patients
BMJ 2003; 327 doi: http://dx.doi.org/10.1136/bmj.327.7408.195 (Published 24 July 2003) Cite this as: BMJ 2003;327:195
- Paul Glare, head of department1,
- Kiran Virik, research fellow1,
- Mark Jones, biostatistician2,
- Malcolm Hudson, professor3,
- Steffen Eychmuller, medical director4,
- John Simes, director2,
- Nicholas Christakis, professor5
- 1 Department of Palliative Care, Royal Prince Alfred Hospital, Camperdown, NSW 2050, Australia
- 2 NHMRC Clinical Trials Centre, University of Sydney, Sydney, Australia
- 3 Department of Statistics, Macquarie University, Sydney
- 4 Department of Palliative Care, Kantonsspital, St Gallen, Switzerland
- 5 Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Correspondence to: P Glare
- Accepted 12 June 2003
Objective To systematically review the accuracy of physicians' clinical predictions of survival in terminally ill cancer patients.
Data sources Cochrane Library, Medline (1966-2000), Embase, Current Contents, and Cancerlit databases as well as hand searching.
Study selection Studies were included if a physician's temporal clinical prediction of survival (CPS) and the actual survival (AS) for terminally ill cancer patients were available for statistical analysis. Study quality was assessed by using a critical appraisal tool produced by the local health authority.
Data synthesis Raw data were pooled and analysed with regression and other multivariate techniques.
Results 17 published studies were identified; 12 met the inclusion criteria, and 8 were evaluable, providing 1563 individual prediction-survival dyads. CPS was generally overoptimistic (median CPS 42 days, median AS 29 days); it was correct to within one week in 25% of cases and overestimated survival by at least four weeks in 27%. The longer the CPS the greater the variability in AS. Although agreement between CPS and AS was poor (weighted κ 0.36), the two were highly significantly associated after log transformation (Spearman rank correlation 0.60, P < 0.001). Consideration of performance status, symptoms, and use of steroids improved the accuracy of the CPS, although the additional value was small. Heterogeneity of the studies' results precluded a comprehensive meta-analysis.
Conclusions Although clinicians consistently overestimate survival, their predictions are highly correlated with actual survival; the predictions have discriminatory ability even if they are miscalibrated. Clinicians caring for patients with terminal cancer need to be aware of their tendency to overestimate survival, as it may affect patients' prospects for achieving a good death. Accurate prognostication models incorporating clinical prediction of survival are needed.
Diagnosis, treatment, and prognosis are the core clinical skills fundamental to the good practice of medicine, but the first half of the 20th century saw treatment displace prognosis as the core skill accompanying diagnosis.1 The inception over the past 40 years of palliative medicine as the study of the specialised care for patients with incurable illnesses has led to a renewed interest in prognostication.
“How long do I have, doctor?” is a central question for patients with far advanced, incurable illnesses.2 Accurate prognoses are important so that patients can set appropriate goals and maximise their chances for having the kind of death that most people say they want. Accuracy of predicting survival is also a technical prerequisite for good decision making by clinicians, for study design and analysis by researchers, and for health service planning by administrators concerned with optimal end of life care. Because an accurate communicated or formulated prediction of survival is so relevant to the decisions that patients and doctors make, knowing how well clinicians can predict survival and whether modelling prognostic factors can add value to clinicians' predictions is important.
Several studies have suggested that contemporary doctors are inaccurate and overly optimistic when predicting the survival of patients with terminal cancer.3–5 The aim of this systematic review was to answer the following clinical questions related to clinical predictions of survival. Do doctors overestimate or underestimate the survival of terminally ill cancer patients on average? How reliable are doctors in estimating survival? Do doctors' estimates of survival provide information above and beyond prognostic or risk factor models for outcome? We obtained individual patient data from studies identified by a systematic search strategy and did a meta-analysis to answer these questions.
The systematic review followed the process described in the Method for Evaluating Research and Guideline Evidence (MERGE) document developed by the local health authority.6
We searched Ovid Premedline (Jan 2001) and Medline (1966-2000), Embase, Current Contents, Cochrane Library, and Cancerlit databases on 19 January 2001. The search strategy used the exploded MeSH terms neoplasms AND prognosis AND terminally ill, limited to human studies, English language, and cohort studies publication type. We also did a free text search using the text words predict$, survival-analysis, incidence, cohort-studies combined with prognos$, terminally ill, and neoplas$. Next, we hand searched the references sections of the electronically identified articles and a book on prognostication at the end of life7 in an attempt to identify any articles missed by the electronic search. Finally, we attempted to identify unpublished studies, and a medical librarian advised on accessing theses and trial registries.
We obtained potential papers and screened them to see if they met the following preset selection criteria: (a) the study involved patients with far advanced cancer (that is, patients deemed “terminal” by the authors or who were referred for hospice admission); (b) the results section included a temporal survival prediction, given in days or weeks, made prospectively for each patient by a doctor; (c) the results section provided the patients' individual survival durations; and (d) the methods section provided an explanation of how the date of death was determined. If the raw data for clinical prediction of survival (CPS) and actual survival (AS) were not retrievable from the publication directly we contacted the authors to obtain them. We excluded papers if these data were neither retrievable from the publication nor obtainable from the authors.
Appraising the quality of the articles
Three of us (PG, KV, and SE) then re-read studies selected for inclusion in the review and independently evaluated them for their quality by using the MERGE guide for critical appraisal. As MERGE does not have a specific checklist for studies of prognosis, we decided that studies of accuracy of CPS resembled evaluation of a diagnostic test in many ways, so we used the checklist provided for that purpose. MERGE incorporates a four point coding system for appraising the quality of a study. These range from “a” (criterion entirely fulfilled) through “b1” and “b2” to “c” (criterion not at all fulfilled). The subcodes for each aspect of the study are then summarised into a single code for the overall assessment of quality, scoring the risk of bias from “A” (low) through “B1” and “B2” to “C” (high).
The individual patient data for CPS and AS could be abstracted directly from papers if they were presented as a table or scatter plot. When CPS and AS were presented in summarised form we sought the individual patient data from the authors. We then entered all available data into a spreadsheet for statistical analysis.
Quantitative data synthesis
We used descriptive statistics to summarise the data (medians and ranges) by individual study and when pooled. We determined the accuracy of predictions by examining the absolute difference between CPS and AS and the extent of agreement between CPS and AS, measured by the weighted κ statistic. Owing to the skewed nature of survival data we decided to examine the differences between CPS and AS after log transformation. We calculated Spearman's rank correlation between log(AS) and log(CPS). To simplify calculations, we rounded AS to the nearest month (categorised together if more than six months). We used one way analysis of variance to test for heterogeneity among the individual studies. We identified groups of homogeneous studies by applying multiple comparisons with Tukey's method.
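The agreement measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a 30 day month, linear disagreement weights, and that both CPS and AS are categorised in the same way (the text specifies rounding only for AS and does not state the weighting scheme). The function names and the example durations are hypothetical.

```python
def to_month_category(days, days_per_month=30, top=7):
    """Round a duration in days to the nearest month (assumed 30 days);
    everything above six months is pooled into one top category."""
    months = int(days / days_per_month + 0.5)  # round half up
    return min(months, top)


def weighted_kappa(pred, actual, n_categories):
    """Cohen's weighted kappa with linear disagreement weights (an assumption)."""
    n = len(pred)
    # Observed joint proportions of (predicted category, actual category).
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for p, a in zip(pred, actual):
        observed[p][a] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) / (n_categories - 1)  # 0 on the diagonal, 1 at the extremes
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j]
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical prediction-survival dyads in days (illustrative values only).
cps_days = [14, 30, 60, 90, 180, 400]
as_days = [10, 20, 65, 30, 100, 200]

cps_cat = [to_month_category(d) for d in cps_days]
as_cat = [to_month_category(d) for d in as_days]
kappa = weighted_kappa(cps_cat, as_cat, n_categories=8)  # categories 0-6 plus ">6"
print(f"weighted kappa = {kappa:.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the weights mean that a prediction one month out is penalised less than one that is five months out.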
As well as CPS and AS, data for 15 patient based prognostic factors were also available from two of the studies. We analysed these additional data together in a combined data subset. We used multiple linear regression to determine three predictive models for log(AS): (a) study and log(CPS) alone; (b) the best model allowing for study and prognostic factors only; (c) the best model allowing for study, log(CPS), and prognostic factors. We included the factor “study” in the models because the patients differed significantly in status (as measured by Karnofsky performance status) between the two studies. We determined the best model by backward elimination in each case, retaining only statistically significant factors. To evaluate the effect of the health status of patients on accuracy of CPS, we regressed log(AS) against the variables identified in each of the three models according to three subgroups of patients defined by Karnofsky performance status (< 40, 40-50, or ≥ 60).
Figure 1 shows the trial flow of the systematic review. The electronic search produced 22 citations, yielding six papers of apparent relevance.4 8–12 The hand search identified 11 other studies.3 13–22 We identified one unpublished study but were unable to contact the author. No registered trials or theses were identified.
We obtained all 17 relevant, published papers. After reading them, we excluded five that did not meet the inclusion criteria (CPS not temporal or population not limited to end stage cancer).10 11 19 20 22 The remaining 12 studies document the CPS for 1983 patients. Data on 1594 patients (80.3% of total) were available for analysis from eight of the studies, and we entered these into the meta-analysis.3 4 8 13–16 21 Because some individual CPS or AS data were missing, 1563 complete CPS-AS dyads were available for analysis. We extracted individual patient data from tables or figures of four studies (n=296),3 15 16 21 and the authors generously provided us with their original data for the other four (n=1280). The four studies that were excluded at this stage either had no data available (n=3)9 17 18 or involved duplicate data (n=1).12 Table 1 summarises the characteristics of the eight studies included in the review. Four were from the United Kingdom,3 15 16 21 three were from Italy,8 13 14 and one was from the United States.4 Two studies involved referring doctors,13 16 and the rest involved “receiving” doctors (palliative care specialists). Three studies involved patients in hospital,3 16 21 and the rest involved patients being cared for at home. All studies involved patient populations that were heterogeneous for the primary cancer site.
Quantitative data synthesis
All eight studies were assessed as being biased (selection biases and misclassification biases), with half being at a high risk. Selection biases included a narrow spectrum of patients and failure to use an inception cohort. Misclassification biases included the timing of the prediction in relation to recruitment, variations in clinical experience of the doctor making the prediction, access to other clinical information when making the prediction, and involvement of the predicting doctor in providing ongoing care to the patient.
Simple summary results
When all 1563 evaluable CPS-AS dyads were pooled, the median CPS was 42 days and the median AS was 29 days, a difference of 13 days. The overall range of CPS was from zero days to more than two years, while AS had a range from zero to almost 500 days. Table 1 summarises individual study values. Overall, CPS was correct to within one week in 25% of cases, correct to within two weeks in 43%, and correct to within four weeks in 61%. CPS overestimated AS by at least four weeks in 27% of cases and underestimated it by at least four weeks in 12% of cases. Although the level of agreement between CPS and AS was only fair (weighted κ 0.36), the log transformation of CPS was significantly correlated with the log transformation of AS (Spearman rank correlation 0.60, t(1540) = 32.3, P < 0.001). The rank correlations between CPS and AS for the individual studies ranged from 0.26 to 0.73, and were ≥ 0.49 in all but one (study 3).
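The accuracy bands reported above can be computed from paired predictions and outcomes along the following lines. This is a sketch under stated assumptions: "within k weeks" is taken as an absolute error strictly less than 7k days, so that the four week band and the two "at least four weeks" groups partition the sample (consistent with 61% + 27% + 12% = 100%, although the paper does not spell out how boundaries were handled). The function name and the data are hypothetical.

```python
def accuracy_summary(cps_days, as_days):
    """Share of predictions falling in each accuracy band.

    Error is CPS minus AS in days, so a positive error is an overestimate.
    """
    n = len(cps_days)
    errors = [c - a for c, a in zip(cps_days, as_days)]

    def share(condition):
        return sum(1 for e in errors if condition(e)) / n

    return {
        "within_1_week": share(lambda e: abs(e) < 7),
        "within_2_weeks": share(lambda e: abs(e) < 14),
        "within_4_weeks": share(lambda e: abs(e) < 28),
        "over_by_4_weeks_or_more": share(lambda e: e >= 28),
        "under_by_4_weeks_or_more": share(lambda e: e <= -28),
    }


# Hypothetical dyads in days (illustrative values, not study data).
cps = [42, 30, 14, 60, 90, 7, 21, 120]
actual = [29, 28, 20, 10, 200, 6, 50, 60]

summary = accuracy_summary(cps, actual)
for band, value in summary.items():
    print(f"{band}: {value:.0%}")
```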
Figure 2 shows the range of AS, expressed in months, for various categories of CPS. This box and whisker plot shows the skewed distribution of AS, given CPS and the increasing variability in AS as CPS increases. When CPS exceeds six months it has no predictive value. Figure 2 also shows that for predictions of up to six months the median survival increases in an approximately linear fashion, even if it is inaccurate. The positioning of the interquartile ranges confirms physicians' tendency to overestimate the survival of terminally ill cancer patients; no more than one in four outlived their prognosis when CPS was six months or less.
The patients in study 2 survived much longer than the patients in the other seven studies, and studies 3 and 6 had the shortest survivals. With the exception of study 2, CPS consistently overestimated AS. Figure 3 shows the median difference between AS and CPS, and its associated 95% confidence interval, for each of the eight studies; a lack of uniformity in the results is apparent.
A statistical test for heterogeneity with a one way analysis of variance confirmed significant heterogeneity between the studies (F(7, 1530) = 10.57, P < 0.001). Tukey's method of multiple comparisons suggested four different groupings of homogeneous studies; the discrepancy between CPS and AS was particularly marked in study 3. Because of the strong indication of heterogeneity, combining the data of the eight studies for extensive statistical analysis was not appropriate, which limited the aim of doing a comprehensive meta-analysis.
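For readers unfamiliar with the test, a one way analysis of variance compares the spread of the study means with the spread of values inside each study. The sketch below assumes the quantity compared across studies is the per patient difference log(CPS) − log(AS); the paper does not state this explicitly, and the study labels and data are hypothetical. With eight studies the first degree of freedom is 8 − 1 = 7, which matches the reported statistic.

```python
import math


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical (CPS, AS) pairs in days for three illustrative studies.
studies = {
    "study A": [(42, 29), (30, 25), (60, 33), (14, 12)],
    "study B": [(90, 20), (60, 15), (120, 40), (45, 10)],
    "study C": [(21, 25), (30, 28), (14, 18), (60, 55)],
}
log_errors = [
    [math.log(c) - math.log(a) for c, a in pairs] for pairs in studies.values()
]
f_stat, df_between, df_within = one_way_anova_f(log_errors)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F relative to its reference distribution suggests that the studies differ systematically in how far predictions depart from outcomes, which is the situation the authors describe.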
Modelling CPS and other prognostic factors
In the subset of 981 patients with data for multiple prognostic variables, log(CPS) was statistically significantly correlated with log(AS) (t(961) = 31.93, P < 0.001). The R square value of 0.51 indicates that greater than 50% of the variation in log(AS) was explained by log(CPS). Next, we generated a model based on 15 patient based prognostic factors. Using backwards elimination we found that anorexia (t(974) = 7.50, P < 0.001), dyspnoea (t(974) = 3.00, P=0.003), blood transfusion (t(974) = 1.95, P=0.054), use of palliative steroids (t(974) = 5.12, P < 0.001), and the log transformation of the Karnofsky performance status (KPS) score (t(974) = 16.62, P < 0.001) were all statistically significantly correlated with log(AS). A combination of all these prognostic factors accounted for only 35% of the variation in log(AS). Finally, when we added log(CPS) to this model, blood transfusion was no longer statistically significant. Thus palliative steroid use, anorexia, dyspnoea, and log(KPS) all contributed additional value to log(CPS) when predicting log(AS), but the additional value was small (R square 0.54).
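The R square values quoted above measure the share of variation in log(AS) that a model accounts for. The sketch below shows the calculation for the simplest case, a single predictor log(CPS); it deliberately omits the "study" factor and the other prognostic variables that the authors' models included, and it requires strictly positive durations (how zero day values were handled is not described in the text). The function name and the data are hypothetical.

```python
import math


def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation left over after the fitted line is removed.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    return slope, intercept, r_squared


# Hypothetical dyads in days (illustrative values, all strictly positive).
cps_days = [7, 14, 21, 30, 42, 60, 90, 180]
as_days = [5, 12, 10, 28, 29, 35, 80, 90]

log_cps = [math.log(d) for d in cps_days]
log_as = [math.log(d) for d in as_days]
slope, intercept, r_squared = fit_simple_regression(log_cps, log_as)
print(f"slope = {slope:.2f}, R square = {r_squared:.2f}")
```

Adding further predictors can only raise R square or leave it unchanged, so the step from 0.51 to 0.54 reported above is the relevant measure of how much the prognostic factors add to the clinician's estimate.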
Prediction of AS according to health status
We repeated the models described above with the patients divided into three subgroups based on Karnofsky performance status scores: < 40 (n=330), 40-50 (n=457), and ≥ 60 (n=194). Table 2 shows R square values obtained for each model, which indicate that more accurate predictions are made for sicker patients than for healthier ones. For each model, log(CPS) explains more of the variation in log(AS) as the patient becomes sicker. The additional value provided by the other prognostic factors (anorexia, dyspnoea, steroid use) changes little, irrespective of how poor the patient's performance is.
Statement of principal findings
This systematic review of eight published studies that cover more than 1500 predictions of survival by doctors in three different countries over 30 years has gone some of the way towards answering the three questions of interest concerning clinical prediction of survival (CPS). Doctors' predictions for terminally ill cancer patients (a population very close to death with a median survival of approximately four weeks) were inaccurate: they were correct to within a week in only 25% of cases and out by more than four weeks in a similar number. Doctors consistently overestimated the duration of survival in seven of the eight studies. Despite being inaccurate, clinical predictions are clinically useful; CPS and actual survival (AS) were strongly correlated. Furthermore, our independent modelling of supplementary data from two large Italian studies included in the review indicated that CPS seems to be better than conventional prognostic factors used in this population, such as performance status and symptoms, although CPS was more accurate in patients with worse performance status. These factors may help to refine the clinician's prediction to a limited extent. Our finding of the predominance of the CPS over patient related measures such as performance status and symptoms in predicting survival is also consistent with that of investigators whose data were not included in this review.19 20 These results reaffirm the importance of physicians' judgment in an era of expanding technology and dependence on test results.
The findings of this systematic review were also able to clarify whether or not CPS is more accurate closer to the event (referred to as the “horizon effect” by meteorologists).23 Because prognosis has a dynamic quality, it may become more or less certain as time passes. The horizon effect has only been studied to a limited extent for CPS. One study (which did not meet the inclusion criteria for this review) found evidence of the horizon effect.20 Another study (included in the review) found that the extent of prognostic error varies with both the CPS and the AS.4 The results of the systematic review support the concept of the horizon effect for CPS. Data were not available to answer some of the other questions relating to CPS, such as whether the demographics, training, or experience of the doctor makes a difference, whether the nature of the doctor-patient relationship does, or whether follow up predictions are superior to initial ones.
Strengths and weaknesses: comparison with previous studies
One previous qualitative systematic review on this topic broadly addressed all known prognostic factors in terminal cancer.5 The authors also concluded that CPS is one of the best predictors of survival and is correlated with AS. Our review extends those conclusions by focusing on several questions relating to the characteristics of the CPS and providing numerical answers to better understand its clinical usefulness as well as its limitations.
Attempting a quantitative analysis of this literature presented several methodological challenges. Because this is not a review of a healthcare intervention or conventional diagnostic test, it falls outside the domain of the Cochrane Collaboration. This was why we used the local health authority's document for planning and undertaking a review. Identifying the studies to be included was also problematic.
Our electronic search strategy lacked sensitivity; only one in three relevant studies was located electronically. One study was published in a journal that was not indexed at the time of publication, and others that were missed were indexed but were coded under MeSH terms that we did not include in our search strategy, such as forecasting, hospices, palliative care, life expectancy, and time factors. Re-running the electronic search with these new terms identified no publication that was missed by our initial search methods but improved the sensitivity of the electronic search. Although an established search strategy exists for identifying studies of prognostic factors, the terms needed to maximise the sensitivity of searches for studies of the accuracy of survival estimation in terminally ill patients need further consideration. Furthermore, some palliative care journals, especially non-English language ones, are not registered on electronic databases and their articles can only be located by hand searching.
For appraising the quality of the studies two over-riding issues arose. The first was deciding what criteria to use, and the second was deciding how to apply them. Although predicting survival has to do with prognosis, studies to compare the accuracy of CPS with AS are closer in concept to the evaluation of a diagnostic test than to studies of prognosis. However, unlike other test evaluations, no reference standard exists with which CPS can be compared, other than the outcome itself. This makes blinding and verification bias irrelevant, but it reduces the usefulness of applying quality criteria when appraising studies of CPS. As a form of a diagnostic test, CPS predicts for a future health state and so is similar to screening in its evaluation. Therefore, the study population needs to be a well defined inception cohort, and spectrum bias and loss to follow up are important validity concerns. Information about the experience, specialty, and training of the clinician making the predictions may also be relevant and needs to be available. As associated decisions about the application or withholding of life sustaining treatments such as fluids or antibiotics will also affect survival, the physician or investigator making the prediction should not be responsible for the patient's clinical care. These are the types of problems with the quality of the studies in the review, and, in the absence of established criteria, our quality ratings may not be valid.
Although the heterogeneity of the studies prevented us from doing a comprehensive meta-analysis, some pooling of the data was still possible and we believe our principal findings are valid. The multiple comparisons indicated that the results of some of the studies were sufficiently homogeneous to permit this in a limited way, but such an approach was not consistent with our original objective. Approximately three quarters of the known data on the accuracy of CPS were available to be analysed. If the data from the other four studies had been available for inclusion, the problem of heterogeneity may have been lessened.
Possible explanations and implications
That doctors cannot predict the timing of death in terminal cancer with much accuracy is not surprising, but the fact that their predictions are highly correlated with survival indicates that they are able to sense when things are starting to go wrong. Epidemiologists and other people who study the accuracy of predictions, such as meteorologists, decompose prediction accuracy into discrimination (ability to separate classes) and calibration (the ability to assign meaningful probabilities to outcomes).14 Stated in this way, the results of this review indicate that doctors' predictions of survival have discriminatory ability even if they are poorly calibrated. The fact that physicians are able to distinguish which patients are dying may be explained by the fact that patterns such as lack of response to treatment, the rate of disease progression, the onset of the anorexia-cachexia syndrome, or the loss of will to live may be recognised. That prognostic factors such as performance status, symptoms, or use of corticosteroids, although independently significant, provided little additional information is not surprising, as physicians are likely to integrate this kind of information and much else besides in their development of the CPS.
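The distinction between discrimination and calibration can be made concrete with a small constructed example. In the sketch below every prediction overestimates survival by roughly the same proportion, so the ordering of patients is preserved perfectly (rank correlation of 1) while the predicted durations themselves are systematically too long. The data are invented for illustration, and the helper functions are a plain implementation of Spearman's rank correlation rather than anything taken from the paper.

```python
import statistics


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example: each prediction is about 1/0.7 times the actual survival.
actual_days = [5, 12, 20, 29, 45, 70, 110]
predicted_days = [7, 17, 29, 41, 64, 100, 157]

discrimination = spearman(predicted_days, actual_days)
calibration_ratio = statistics.median(
    a / p for a, p in zip(actual_days, predicted_days)
)
print(f"rank correlation = {discrimination:.2f}")
print(f"median actual/predicted = {calibration_ratio:.2f}")
```

Because ranks are unchanged by any increasing transformation, the rank correlation would be the same whether it is computed on raw or log transformed durations.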
The key issue with CPS is not so much whether or how to improve physicians' discriminatory ability; rather it is how to supplement or support them in their formulation of prognosis and, in particular, how to enhance their calibration. Doctors need to be aware of their tendency to overestimate prognosis in cancer patients who are approaching death. This optimism may have serious implications for the patient in terms of inappropriate application of disease controlling treatment and delays in referral to a hospice or palliative care. The results of the meta-analysis suggest that survival of patients is typically 30% shorter than predicted. Doctors need to consider adjusting their predictions to take this into account, but arbitrarily assigning a “correction factor” of 0.7 to their CPS cannot be recommended.
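As a rough check on where a figure of about 30% might come from, the pooled medians reported in the results (42 days predicted, 29 days actual) can be compared directly. The text does not say that the 30% figure was derived in this way, so the calculation below is only an illustrative reading of the published numbers.

```python
# Pooled medians reported in the results section of the review.
median_cps_days = 42
median_as_days = 29

ratio = median_as_days / median_cps_days  # actual survival as a share of predicted
shortfall = 1 - ratio                     # how much shorter actual survival was

print(f"actual / predicted = {ratio:.2f}")
print(f"shortfall = {shortfall:.0%}")
```

A ratio of medians describes the group as a whole; it says nothing about how far any individual prediction is out, which is one reason a fixed correction factor applied to each patient's estimate is not advised.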
Broad variation exists in the importance that patients, bereaved family members, physicians, and other care providers place on knowing the timing of the terminally ill patients' death,2 but this ambivalence should not contribute to a lack of calibration of clinicians' predictions. Accurate prognoses are important not only for diagnostic and therapeutic decision making but also for the selection and stratification of the participants in clinical trials. McKillop has proposed that prognostication has two aspects: the “general prognosis” based on the tissue diagnosis and its associated prognostic markers is incorporated with the patient's own physical and psychosocial attributes to provide an “individual prognosis,” specific to the person before the physician.23 Such a model provides the basis for understanding and improving the calibration of physicians' predictions. The identification of novel prognostic factors such as C reactive protein, cytokines, and patient rated quality of life and the development of clinical prediction tools may help to recalibrate physicians' predictions.
Unanswered questions and future research
Because CPS seems to be related to AS, further studies that merely look at the accuracy of predictions or document the miscalibration are not warranted. Further research is needed on whether the demographics, training, or experience of the doctor makes a difference; whether the nature of the doctor-patient relationship is important; whether predictions made at follow up are superior to initial ones; and ways to enhance the CPS. On the basis of our findings, CPS could now be used as the reference standard for evaluating other methods for predicting survival, and it has been used for this purpose.24 Understanding how doctors formulate their predictions and developing interventions that train inexperienced doctors to make better predictions are also worthy of consideration.
What is already known on this topic
Accurate prediction of the timing of death is important for good clinical decision making in the care of patients with a terminal illness
Doctors' survival predictions are not very accurate and often overestimate survival
Though inaccurate, doctors' predictions correlate with survival
What this study adds
Doctors' survival predictions become more accurate closer to the date of death
Though inaccurate, predictions of up to six months in length are nevertheless reliable, as they are highly correlated with actual survival
Traditional prognostic indicators such as performance status, anorexia, and breathlessness add little information to that contained in the physician's prediction
If doctors are better able to anticipate death, they are likely to be better able to make judicious use of medical treatments and optimise the use of palliative care, avoiding unnecessary treatments near the end of life. They will also help patients to achieve a good death if for no other reason than that they help to fulfil patients' own expectations about the kind of information they want. Although not all patients want all the prognostic information all of the time, most patients want most of the information most of the time.4 Doctors face two challenges in prognosticating near the end of life: formulating accurate predictions and communicating them. The former act, which has been the subject of this review, is a predicate for the latter, but we believe that both are necessary for patients to achieve a good death.
Contributors PG and JS participated in designing the review. PG, KV, and SE decided on trial inclusion or exclusion, extracted data, and assessed study quality. NC, KV, and SE checked the data and revised the manuscript, which was drafted by PG. MJ and MH did the statistical analyses. JS and NC were the principal advisers, guiding and interpreting the review. PG is the guarantor for the paper.
Competing interests None declared.
Ethical approval Not needed.