BMJ 2003;327:195 (26 July), doi:10.1136/bmj.327.7408.195
Paul Glare, head of department1, Kiran Virik, research fellow1, Mark Jones, biostatistician2, Malcolm Hudson, professor3, Steffen Eychmuller, medical director4, John Simes, director2, Nicholas Christakis, professor5
1 Department of Palliative Care, Royal Prince Alfred Hospital, Camperdown, NSW 2050, Australia, 2 NHMRC Clinical Trials Centre, University of Sydney, Sydney, Australia, 3 Department of Statistics, Macquarie University, Sydney, 4 Department of Palliative Care, Kantonsspital, St Gallen, Switzerland, 5 Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
Correspondence to: P Glare paul{at}email.cs.nsw.gov.au
Data sources Cochrane Library, Medline (1966-2000), Embase, Current Contents, and Cancerlit databases as well as hand searching.
Study selection Studies were included if a physician's temporal clinical prediction of survival (CPS) and the actual survival (AS) for terminally ill cancer patients were available for statistical analysis. Study quality was assessed by using a critical appraisal tool produced by the local health authority.
Data synthesis Raw data were pooled and analysed with regression and other multivariate techniques.
Results 17 published studies were identified; 12 met the inclusion
criteria, and 8 were evaluable, providing 1563 individual prediction-survival
dyads. CPS was generally overoptimistic (median CPS 42 days, median AS 29
days); it was correct to within one week in 25% of cases and overestimated
survival by at least four weeks in 27%. The longer the CPS the greater the
variability in AS. Although agreement between CPS and AS was poor (weighted
κ 0.36), the two were highly significantly associated after log
transformation (Spearman rank correlation 0.60, P < 0.001). Consideration
of performance status, symptoms, and use of steroids improved the accuracy of
the CPS, although the additional value was small. Heterogeneity of the
studies' results precluded a comprehensive meta-analysis.
Conclusions Although clinicians consistently overestimate survival, their predictions are highly correlated with actual survival; the predictions have discriminatory ability even if they are miscalibrated. Clinicians caring for patients with terminal cancer need to be aware of their tendency to overestimate survival, as it may affect patients' prospects for achieving a good death. Accurate prognostication models incorporating clinical prediction of survival are needed.
"How long do I have, doctor?" is a central question for patients with far advanced, incurable illnesses.2 Accurate prognoses are important so that patients can set appropriate goals and maximise their chances for having the kind of death that most people say they want. Accuracy of predicting survival is also a technical prerequisite for good decision making by clinicians, for study design and analysis by researchers, and for health service planning by administrators concerned with optimal end of life care. Because an accurate communicated or formulated prediction of survival is so relevant to the decisions that patients and doctors make, knowing how well clinicians can predict survival and whether modelling prognostic factors can add value to clinicians' predictions is important.
Several studies have suggested that contemporary doctors are inaccurate and overly optimistic when predicting the survival of patients with terminal cancer.3-5 The aim of this systematic review was to answer the following clinical questions related to clinical predictions of survival. Do doctors overestimate or underestimate the survival of terminally ill cancer patients on average? How reliable are doctors in estimating survival? Do doctors' estimates of survival provide information above and beyond prognostic or risk factor models for outcome? We obtained individual patient data from studies identified by a systematic search strategy and did a meta-analysis to answer these questions.
Search strategy
We searched Ovid Premedline (Jan 2001) and Medline (1966-2000), Embase,
Current Contents, Cochrane Library, and Cancerlit databases on 19 January
2001. The search strategy used the exploded MeSH terms neoplasms AND prognosis
AND terminally ill, limited to human studies, English language, and cohort
studies publication type. We also did a free text search using the text words
predict$, survival-analysis, incidence, cohort-studies combined with prognos$,
terminally ill, and neoplas$. Next, we hand searched the references sections
of the electronically identified articles and a book on prognostication at the
end of life7 in an
attempt to identify any articles missed by the electronic search. Finally, we
attempted to identify unpublished studies, and a medical librarian advised on
accessing theses and trial registries.
We obtained potential papers and screened them to see if they met the following preset selection criteria: (a) the study involved patients with far advanced cancer (that is, patients deemed "terminal" by the authors or who were referred for hospice admission); (b) the results section included a temporal survival prediction, given in days or weeks, made prospectively for each patient by a doctor; (c) the results section provided the patients' individual survival durations; and (d) the methods section provided an explanation of how the date of death was determined. If the raw data for clinical prediction of survival (CPS) and actual survival (AS) were not retrievable from the publication directly we contacted the authors to obtain them. We excluded papers if these data were neither retrievable from the publication nor obtainable from the authors.
Appraising the quality of the articles
Three of us (PG, KV, and SE) then re-read studies selected for inclusion in
the review and independently evaluated them for their quality by using the
MERGE guide for critical appraisal. As MERGE does not have a specific
checklist for studies of prognosis, we decided that studies of accuracy of CPS
resembled evaluation of a diagnostic test in many ways, so we used the
checklist provided for that purpose. MERGE incorporates a four point coding
system for appraising the quality of a study. These range from "a"
(criterion entirely fulfilled) through "b1" and "b2"
to "c" (criterion not at all fulfilled). The subcodes for each
aspect of the study are then summarised into a single code for the overall
assessment of quality, scoring the risk of bias from "A" (low)
through "B1" and "B2" to "C" (high).
Data abstraction
The individual patient data for CPS and AS could be abstracted directly
from papers if they were presented as a table or scatter plot. When CPS and AS
were presented in summarised form we sought the individual patient data from
the authors. We then entered all available data into a spreadsheet for
statistical analysis.
Quantitative data synthesis
We used descriptive statistics to summarise the data (medians and ranges)
by individual study and when pooled. We determined the accuracy of predictions
by examining the absolute difference between CPS and AS and the extent of
agreement between CPS and AS, measured by the weighted κ
statistic.
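A minimal sketch of a weighted κ calculation follows. It is an illustration only: the review does not state the weighting scheme or the categories used, so the linear weights, the 30 day month, and the function names `month_category` and `weighted_kappa` are all assumptions.

```python
def month_category(days, days_per_month=30, cap=7):
    """Round a duration in days to the nearest month (assumed 30 days).

    Durations that round to more than six months are pooled into one
    top category (cap), so categories run from 0 to 7.
    """
    months = int(days / days_per_month + 0.5)  # round half up
    return min(months, cap)


def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for integer categories 0..n_categories-1.

    Linear disagreement weights |i - j| / (k - 1) are an assumption;
    quadratic weights are an equally common choice.
    """
    k = n_categories
    n = len(rater_a)
    # Observed joint proportions
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(rater_a, rater_b):
        observed[i][j] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected disagreement
    obs_dis = sum(abs(i - j) / (k - 1) * observed[i][j]
                  for i in range(k) for j in range(k))
    exp_dis = sum(abs(i - j) / (k - 1) * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - obs_dis / exp_dis
```

In use, CPS and AS in days would each be passed through `month_category` and the two resulting lists given to `weighted_kappa` with `n_categories=8`.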
Owing to the skewed nature of survival data we decided to examine the
differences between CPS and AS after log transformation. We calculated
Spearman's rank correlation between log(AS) and log(CPS). To simplify
calculations, we rounded AS to the nearest month (categorised together if more
than six months). We used one way analysis of variance to test for
heterogeneity among the individual studies. We identified groups of
homogeneous studies by applying multiple comparisons with Tukey's method.
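The rank correlation step can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the offset of one day added before taking logs is an assumption made here because survival times of zero days occur and the review does not say how they were handled.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_log(cps_days, as_days, offset=1.0):
    """Spearman rank correlation between log(CPS) and log(AS).

    offset is an assumed constant added so that zero-day values can be logged.
    """
    log_cps = [math.log(d + offset) for d in cps_days]
    log_as = [math.log(d + offset) for d in as_days]
    return pearson(average_ranks(log_cps), average_ranks(log_as))
```

Because ranks are preserved by any strictly increasing transformation, the log step should leave the Spearman coefficient itself unchanged; it matters for the regression models rather than for the rank correlation.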
As well as CPS and AS, data for 15 patient based prognostic factors were
also available from two of the studies. We analysed these additional data
together in a combined data subset. We used multiple linear regression to
determine three predictive models for log(AS): (a) study and log(CPS)
alone; (b) the best model allowing for study and prognostic factors
only; (c) the best model allowing for study, log(CPS), and prognostic
factors. We included the factor "study" in the models because the
patients differed significantly in status (as measured by Karnofsky
performance status) between the two studies. We determined the best model by
backward elimination in each case, retaining only statistically significant
factors. To evaluate the effect of the health status of patients on accuracy
of CPS, we regressed log(AS) against the variables identified in each of the
three models according to three subgroups of patients defined by Karnofsky
performance status (< 40, 40-50, or ≥ 60).
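As a simplified illustration of how an R square value for log(AS) can be obtained, the sketch below fits a single-predictor least squares line of log(AS) on log(CPS). It deliberately omits the "study" factor, the other prognostic factors, and the backward elimination step described above, so it is not the authors' model; the function name and the one-day offset for zero values are assumptions.

```python
import math


def r_squared_log_fit(cps_days, as_days, offset=1.0):
    """Fit log(AS) = intercept + slope * log(CPS) by least squares.

    Returns (slope, intercept, r_squared). offset is an assumed constant
    added before taking logs so that zero-day values are defined.
    """
    x = [math.log(d + offset) for d in cps_days]
    y = [math.log(d + offset) for d in as_days]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R square = 1 - residual sum of squares / total sum of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

An R square of 0.5 from such a fit would mean that half of the variation in log(AS) is accounted for by log(CPS).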
We obtained all 17 relevant, published papers. After reading them, we excluded five that did not meet the inclusion criteria (CPS not temporal or population not limited to end stage cancer).10 11 19 20 22 The remaining 12 studies document the CPS for 1983 patients. Data on 1594 patients (80.3% of total) were available for analysis from eight of the studies, and we entered these into the meta-analysis.3 4 8 13-16 21 Because some individual CPS or AS data were missing, 1563 complete CPS-AS dyads were available for analysis. We extracted individual patient data from tables or figures of four studies (n=296),3 15 16 21 and the authors generously provided us with their original data for the other four (n=1280). The four studies that were excluded at this stage either had no data available (n=3)9 17 18 or involved duplicate data (n=1).12 Table 1 summarises the characteristics of the eight studies included in the review. Four were from the United Kingdom,3 15 16 21 three were from Italy,8 13 14 and one was from the United States.4 Two studies involved referring doctors,13 16 and the rest involved "receiving" doctors (palliative care specialists). Three studies involved patients in hospital,3 16 21 and the rest involved patients being cared for at home. All studies involved patient populations that were heterogeneous for the primary cancer site.
Quantitative data synthesis
Validity assessment
All eight studies were assessed as being biased (selection biases and
misclassification biases), with half being at a high risk. Selection biases
included a narrow spectrum of patients and failure to use an inception cohort.
Misclassification biases included the timing of the prediction in relation to
recruitment, variations in clinical experience of the doctor making the
prediction, access to other clinical information when making the prediction,
and involvement of the predicting doctor in providing ongoing care to the
patient.
Simple summary results
When all 1563 evaluable CPS-AS dyads were pooled, the median CPS was 42
days and the median AS was 29 days, a difference of 13 days. The overall range
of CPS was from zero days to more than two years, while AS had a range from
zero to almost 500 days. Table
1 summarises individual study values. Overall, CPS was correct to
within one week in 25% of cases, correct to within two weeks in 43%, and
correct to within four weeks in 61%. CPS overestimated AS by at least four
weeks in 27% of cases and underestimated it by at least four weeks in 12% of
cases. Although the level of agreement between CPS and AS was only fair
(weighted κ 0.36), the log transformation of CPS was significantly
correlated with the log transformation of AS (Spearman rank correlation 0.60,
t(1540) = 32.3, P < 0.001). The rank correlations between
CPS and AS for the individual studies ranged from 0.26 to 0.73, and were
≥ 0.49 in all but one (study 3).
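The accuracy categories reported above can be sketched as a simple tabulation of prediction error (CPS minus AS, in days). The function name is hypothetical, and the boundary convention is an assumption: "within" is taken as strictly less than the stated number of weeks, so that the three four-week categories do not overlap.

```python
def summarise_accuracy(dyads):
    """Tabulate prediction accuracy for (cps_days, as_days) pairs.

    Error is CPS minus AS, so positive values are overestimates.
    Returns the proportion of dyads in each category.
    """
    n = len(dyads)
    errors = [cps - actual for cps, actual in dyads]

    def share(condition):
        return sum(1 for e in errors if condition(e)) / n

    return {
        "within_1_week": share(lambda e: abs(e) < 7),
        "within_2_weeks": share(lambda e: abs(e) < 14),
        "within_4_weeks": share(lambda e: abs(e) < 28),
        "over_by_4_weeks_or_more": share(lambda e: e >= 28),
        "under_by_4_weeks_or_more": share(lambda e: e <= -28),
    }
```

Under this convention the last three proportions sum to one, as do the corresponding percentages reported above (61%, 27%, and 12%).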
Figure 2 shows the range of AS, expressed in months, for various categories of CPS. This box and whisker plot shows the skewed distribution of AS for a given CPS and the increasing variability in AS as CPS increases. When CPS exceeds six months it has no predictive value. Figure 2 also shows that for predictions of up to six months the median survival increases in an approximately linear fashion, even if the prediction is inaccurate. The positioning of the interquartile ranges confirms physicians' tendency to overestimate the survival of terminally ill cancer patients; no more than one in four outlived their prognosis when CPS was six months or less.
Statistical aggregation
The patients in study 2 survived much longer than the patients in the other
seven studies, and studies 3 and 6 had the shortest survivals. With the
exception of study 2, CPS consistently overestimated AS.
Figure 3 shows the median
difference between AS and CPS, and its associated 95% confidence interval, for
each of the eight studies; a lack of uniformity in the results is
apparent.
A statistical test for heterogeneity with a one way analysis of variance confirmed significant heterogeneity between the studies (F(7, 1530) = 10.57, P < 0.001). Tukey's method of multiple comparisons suggested four different groupings of homogeneous studies; the discrepancy between CPS and AS was particularly marked in study 3. Because of the strong indication of heterogeneity, combining the data of the eight studies for extensive statistical analysis was not appropriate, which limited the aim of doing a comprehensive meta-analysis.
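A one way analysis of variance of this kind can be sketched as below, with one group of per-patient values for each study. Which per-patient quantity the authors compared across studies is not stated here; a discrepancy such as log(CPS) minus log(AS) is one plausible choice and is an assumption, as is the function name.

```python
def one_way_anova_f(groups):
    """One way ANOVA F statistic for a list of groups of numeric values.

    Returns (f_statistic, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within
```

With eight studies the between-group degrees of freedom are 8 - 1 = 7, which matches the first value in the reported F statistic.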
Modelling CPS and other prognostic factors
In the subset of 981 patients with data for multiple prognostic variables,
log(CPS) was statistically significantly correlated with log(AS)
(t(961) = 31.93, P < 0.001). The R square value of 0.51
indicates that greater than 50% of the variation in log(AS) was explained by
log(CPS). Next, we generated a model based on 15 patient based prognostic
factors. Using backwards elimination we found that anorexia
(t(974) = 7.50, P < 0.001), dyspnoea
(t(974) = 3.00, P = 0.003), blood transfusion
(t(974) = 1.95, P = 0.054), use of palliative steroids
(t(974) = 5.12, P < 0.001), and the log transformation of
the Karnofsky performance status (KPS) score (t(974) =
16.62, P < 0.001) were all statistically significantly correlated with
log(AS). A combination of all these prognostic factors accounted for only 35%
of the variation in log(AS). Finally, when we added log(CPS) to this model,
blood transfusion was no longer statistically significant. Thus palliative
steroid use, anorexia, dyspnoea, and log(KPS) all contributed additional value
to log(CPS) when predicting log(AS), but the additional value was small (R
square 0.54).
Prediction of AS according to health status
We repeated the models described above with the patients divided into three
subgroups based on Karnofsky performance status scores: < 40 (n=330), 40-50
(n=457), and
≥ 60 (n=194). Table
2 shows R square values obtained for each model, which indicate
that more accurate predictions are made for sicker patients than for healthier
ones. For each model, log(CPS) explains more of the variation in log(AS) as
the patient becomes sicker. The additional value provided by the other
prognostic factors (anorexia, dyspnoea, steroid use) changes little,
irrespective of how poor the patient's performance is.
The findings of this systematic review were also able to clarify whether or not CPS is more accurate closer to the event (referred to as the "horizon effect" by meteorologists).23 Because prognosis has a dynamic quality, it may become more or less certain as time passes. The horizon effect has only been studied to a limited extent for CPS. One study (which did not meet the inclusion criteria for this review) found evidence of the horizon effect.20 Another study (included in the review) found that the extent of prognostic error varies with both the CPS and the AS.4 The results of the systematic review support the concept of the horizon effect for CPS. Data were not available to answer some of the other questions relating to CPS, such as whether the demographics, training, or experience of the doctor makes a difference, whether the nature of the doctor-patient relationship does, or whether follow up predictions are superior to initial ones.
Strengths and weaknesses: comparison with previous studies
One previous qualitative systematic review on this topic broadly addressed
all known prognostic factors in terminal
cancer.5 The authors
also concluded that CPS is one of the best predictors of survival and is
correlated with AS. Our review extends those conclusions by focusing on
several questions relating to the characteristics of the CPS and providing
numerical answers to better understand its clinical usefulness as well as its
limitations.
Attempting a quantitative analysis of this literature presented several methodological challenges. Because this is not a review of a healthcare intervention or conventional diagnostic test, it falls outside the domain of the Cochrane Collaboration. This was why we used the local health authority's document for planning and undertaking a review. Identifying the studies to be included was also problematic.
Our electronic search strategy lacked sensitivity; only one in three relevant studies was located electronically. One study was published in a journal that was not indexed at the time of publication, and others that were missed were indexed but were coded under MeSH terms that we did not include in our search strategy, such as forecasting, hospices, palliative care, life expectancy, and time factors. Re-running the electronic search with these new terms identified no publication that was missed by our initial search methods but improved the sensitivity of the electronic search. Although an established search strategy exists for identifying studies of prognostic factors, the terms needed to maximise the sensitivity of searches for studies of the accuracy of survival estimation in terminally ill patients need further consideration. Furthermore, some palliative care journals, especially non-English language ones, are not registered on electronic databases and their articles can only be located by hand searching.
For appraising the quality of the studies two over-riding issues arose. The first was deciding what criteria to use, and the second was deciding how to apply them. Although predicting survival has to do with prognosis, studies to compare the accuracy of CPS with AS are closer in concept to the evaluation of a diagnostic test than to studies of prognosis. However, unlike other test evaluations, no reference standard exists with which CPS can be compared, other than the outcome itself. This makes blinding and verification bias irrelevant, but it reduces the usefulness of applying quality criteria when appraising studies of CPS. As a form of a diagnostic test, CPS predicts for a future health state and so is similar to screening in its evaluation. Therefore, the study population needs to be a well defined inception cohort, and spectrum bias and loss to follow up are important validity concerns. Information about the experience, specialty, and training of the clinician making the predictions may also be relevant and needs to be available. As associated decisions about the application or withholding of life sustaining treatments such as fluids or antibiotics will also affect survival, the physician or investigator making the prediction should not be responsible for the patient's clinical care. These are the types of problems with the quality of the studies in the review, and, in the absence of established criteria, our quality ratings may not be valid.
Although the heterogeneity of the studies prevented us from doing a comprehensive meta-analysis, some pooling of the data was still possible and we believe our principal findings are valid. The multiple comparisons indicated that the results of some of the studies were sufficiently homogeneous to permit this in a limited way, but such an approach was not consistent with our original objective. Approximately three quarters of the known data on the accuracy of CPS were available to be analysed. If the data from the other four studies had been available for inclusion, the problem of heterogeneity may have been lessened.
Possible explanations and implications
That doctors cannot predict the timing of death in terminal cancer with
much accuracy is not surprising, but the fact that their predictions are
highly correlated with survival indicates that they are able to sense when
things are starting to go wrong. Epidemiologists and other people who study
the accuracy of predictions, such as meteorologists, decompose prediction
accuracy into discrimination (ability to separate classes) and calibration
(the ability to assign meaningful probabilities to
outcomes).14 Stated
in this way, the results of this review indicate that doctors' predictions of
survival have discriminatory ability even if they are poorly calibrated. The
fact that physicians are able to distinguish which patients are dying may be
explained by the fact that patterns such as lack of response to treatment, the
rate of disease progression, the onset of the anorexia-cachexia syndrome, or
the loss of will to live may be recognised. That prognostic factors such as
performance status, symptoms, or use of corticosteroids, although
independently significant, provided little additional information is not
surprising, as physicians are likely to integrate this kind of information and
much else besides in their development of the CPS.
The key issue with CPS is not so much whether or how to improve physicians' discriminatory ability; rather it is how to supplement or support them in their formulation of prognosis and, in particular, how to enhance their calibration. Doctors need to be aware of their tendency to overestimate prognosis in cancer patients who are approaching death. This optimism may have serious implications for the patient in terms of inappropriate application of disease controlling treatment and delays in referral to a hospice or palliative care. The results of the meta-analysis suggest that survival of patients is typically 30% shorter than predicted. Doctors need to consider adjusting their predictions to take this into account, but arbitrarily assigning a "correction factor" of 0.7 to their CPS cannot be recommended.
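As a worked check, and assuming that the figure of roughly 30% reflects the pooled medians reported in the results (median CPS 42 days, median AS 29 days), which the text does not state explicitly:

```latex
\[
\frac{\text{median AS}}{\text{median CPS}} = \frac{29}{42} \approx 0.69,
\qquad
1 - \frac{29}{42} \approx 0.31 .
\]
```

A ratio of pooled medians describes the sample as a whole and not any individual patient, which is consistent with the caution against applying a fixed correction factor.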
Broad variation exists in the importance that patients, bereaved family members, physicians, and other care providers place on knowing the timing of the terminally ill patients' death,2 but this ambivalence should not contribute to a lack of calibration of clinicians' predictions. Accurate prognoses are important not only for diagnostic and therapeutic decision making but also for the selection and stratification of the participants in clinical trials. McKillop has proposed that prognostication has two aspects: the "general prognosis" based on the tissue diagnosis and its associated prognostic markers is incorporated with the patient's own physical and psychosocial attributes to provide an "individual prognosis," specific to the person before the physician.23 Such a model provides the basis for understanding and improving the calibration of physicians' predictions. The identification of novel prognostic factors such as C reactive protein, cytokines, and patient rated quality of life and the development of clinical prediction tools may help to recalibrate physicians' predictions.
Unanswered questions and future research
Because CPS seems to be related to AS, further studies that merely look at
the accuracy of predictions or document the miscalibration are not warranted.
Further research is needed on whether the demographics, training, or
experience of the doctor makes a difference; whether the nature of the
doctor-patient relationship is important; whether predictions made at follow
up are superior to initial ones; and ways to enhance the CPS. On the basis of
our findings, CPS could now be used as the reference standard for evaluating
other methods for predicting survival, and it has been used for this
purpose.24
Understanding how doctors formulate their predictions, and interventions that
train inexperienced doctors to make better predictions are also worthy of
consideration.
If doctors are better able to anticipate death, they are likely to be better able to make judicious use of medical treatments and optimise the use of palliative care, avoiding unnecessary treatments near the end of life. They will also help patients to achieve a good death if for no other reason than that they help to fulfil patients' own expectations about the kind of information they want. Although not all patients want all the prognostic information all of the time, most patients want most of the information most of the time.4 Doctors face two challenges in prognosticating near the end of life: formulating accurate predictions and communicating them. The former act, which has been the subject of this review, is a predicate for the latter, but we believe that both are necessary for patients to achieve a good death.
Competing interests: None declared.