Systematic reviews of evaluations of prognostic variables
BMJ 2001; 323 doi: https://doi.org/10.1136/bmj.323.7306.224 (Published 28 July 2001) Cite this as: BMJ 2001;323:224
- Douglas G Altman, director
- Imperial Cancer Research Fund Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF
This is the last in a series of four articles
Prognostic studies include clinical studies of variables predictive of future events as well as epidemiological studies of aetiological risk factors. As multiple similar studies accumulate it becomes increasingly important to identify and evaluate all of the relevant studies to develop a more reliable overall assessment. For prognostic studies this is not straightforward.
Box 1 summarises the clinical importance of information on prognostic factors. Many of the issues discussed are also relevant to aetiological studies, especially cohort ones. Some features of prognostic studies lead to particular difficulties for the systematic reviewer. Firstly, in most clinical prognostic studies the outcome of primary interest is the time to an event, often death. Meta-analysis of such studies is rather more difficult than that for binary data or continuous measurements. Secondly, the prognostic variable of interest is often one of several prognostic variables. When examining a variable of interest researchers should consider other prognostic variables with which it might be correlated. Thirdly, many prognostic factors are continuous variables, for which researchers use a wide variety of methods of analysis.
Systematic reviews are applicable to all types of research design, and studies of prognostic variables are an important additional area where appropriate methodology should be applied
Prognostic variables should be evaluated in a representative sample of patients assembled at a common point in the course of their disease—ideally they should all have received the same medical treatment or been in a randomised trial
When examined critically, a high proportion of prognostic studies are found to be methodologically poor
Meta-analysis of published data is hampered by difficulties in extraction of data and variation in the characteristics of the study and patients
The poor quality of the published literature is a strong argument in favour of systematic reviews but also an argument against formal meta-analysis
Meta-analysis of prognostic studies using individual data from patients may overcome many of these difficulties
The emphasis in this paper is on clinical studies to examine the variation in prognosis in relation to a single putative prognostic variable of interest (also called a prognostic marker or factor). A more detailed discussion can be found elsewhere.2
Identifying relevant publications
It is probably more difficult to identify all prognostic studies by searching the literature than it is for randomised trials, which itself is problematic. As yet there is no widely acknowledged optimal strategy for searching the literature for prognostic studies, but search strategies have been developed with either good sensitivity or good specificity (box 2).
Epidemiological studies are more prone to publication bias than randomised trials.4 It is probable that studies showing a strong (often statistically significant) prognostic ability are more likely to be published. Publication bias has recently been shown in studies of Barrett's oesophagus as a risk factor for cancer.5
Assessing methodological quality—design
There are no widely agreed quality criteria for assessing prognostic studies. As yet there is little empirical evidence to support the importance of particular study features affecting the reliability of findings, including the avoidance of bias. As a consequence, systematic reviewers tend either to ignore the issue or to devise their own criteria. Unfortunately the number of different criteria and scales is likely to continue to increase and cause confusion, as has happened for randomised trials and systematic reviews.6–8 Nevertheless, theoretical considerations and common sense point to several methodological aspects that are likely to be important. The table shows a list of those relating to internal validity, which draws on previous suggestions.9–13
Box 1 —Purpose of studies of prognostic factors (adapted from Altman and Lyman1)
To guide clinical decision making, including treatment selection and patient counselling
To improve understanding of the disease process
To improve the design and analysis of clinical trials (for example, risk stratification)
To assist in comparing outcome between treatment groups in non-randomised studies by allowing adjustment for case mix
To define risk groups based on prognosis
To predict disease outcome more accurately or parsimoniously
Box 2 —Effective strategies for searching Medline for prognostic studies3
Best single term
“Explode cohort studies” (MeSH)
Best complex search strategy with the highest sensitivity
or “explode mortality” (MeSH)
or “follow-up studies” (MeSH)
or “mortality” (subheading)
or “prognos*” (text word)
or “predict*” (text word)
or “course” (text word)
A reliable prognostic study requires a well defined cohort of patients at the same stage of their disease. Some authors suggest that the sample should be an “inception” cohort of patients early in the course of the disease (perhaps at diagnosis).9 Whereas homogeneity is often desirable, heterogeneous cohorts can be stratified in the analysis. Also, not all prognostic studies relate to patients with overt disease. An example is a study of prognostic factors in a cohort of asymptomatic patients infected with HIV.14
Both case-control and cross sectional studies may be used to examine risk factors, but these designs are much weaker. Case-control designs have been shown to yield optimistic results for evaluations of diagnostic tests, a result that is likely to be relevant to prognostic studies.15 In cross sectional studies it may be difficult to determine whether the exposure or outcome came first—for example, in studies examining the association between the use of oral contraceptives and HIV infection.
Windeler observed that summaries of prognosis are not meaningful unless associated with a particular strategy for treatment and suggested that the greatest importance of prognostic studies is to aid decisions about treatment.16 Most published checklists do not, however, consider the issue of subsequent treatment. If the treatment received varies in relation to prognostic variables then the study cannot deliver an unbiased and meaningful assessment of prognostic ability unless the different treatments are equally effective (in which case why vary the treatment?). Such variation in treatment may be quite common once there is evidence (usually non-systematic) that a variable is prognostic. Ideally, therefore, prognostic variables should be evaluated either in a cohort of patients treated the same way or in a randomised trial. 12 17
Criteria specific to studies
The inclusion of context specific as well as generic aspects of methodological quality is sometimes sensible. For example, a review of prognosis of idiopathic membranous nephropathy included two questions on the nature of the end points, reflecting particular problems in a discipline where many studies used ill defined surrogate end points.13
In addition to internal validity some checklists consider aspects of external validity and clinical usefulness of studies. Notably, Laupacis et al included five questions relating to the clinical usefulness of a study.11 Furthermore, some checklists reasonably include items relating to the clinical area of the review. For example, in their review of the association between maternal HIV infection and perinatal outcome, Brocklehurst and French considered whether there was an adequate description of the maternal stage of disease.18
Assessing methodological quality—analysis
The table includes two items relating to difficult aspects of data analysis. It is important to adjust for other prognostic variables because patients with different values of the covariate of primary interest are likely to differ with respect to other prognostic variables. This procedure is often referred to as control of confounding. In contexts where much is known about prognosis, such as many cancers, it is important to know whether the variable of primary interest (such as a new tumour marker) offers prognostic value over and above that which can be achieved with previously identified prognostic variables. It follows that prognostic studies generally require some sort of multiple regression analysis. Comparison of models with and without the variable of interest provides an estimate of its independent effect and a test of whether it contains additional prognostic information.
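The comparison of models with and without the variable of interest can be illustrated with a likelihood ratio test. The article does not name a specific test, so this is a minimal sketch under stated assumptions: the two models are nested, they differ by a single parameter, and their maximised log likelihoods are already available from whatever regression software was used. The function name and the log likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_without, loglik_with):
    """Likelihood ratio test for nested models differing by ONE parameter.

    loglik_without: maximised log likelihood of the model that omits
                    the variable of interest
    loglik_with:    maximised log likelihood of the model that adds it
    Returns the test statistic and its P value on 1 degree of freedom.
    """
    lr_stat = 2.0 * (loglik_with - loglik_without)
    # Upper tail of a chi-squared distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)). Guard against tiny negative values
    # caused by rounding in the fitted log likelihoods.
    p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))
    return lr_stat, p_value


# Hypothetical log likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_without=-412.6, loglik_with=-409.1)
print(f"LR statistic = {stat:.2f}, P = {p:.4f}")
```

A small P value here would suggest that the new variable carries prognostic information beyond the variables already in the model; it says nothing about whether the adjustment set itself was well chosen.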
Two problems for the systematic reviewer are that different researchers use different statistical approaches to adjustment and adjust for different selections of variables. One way around the second of these problems is to use unadjusted analyses. This approach is sensible in systematic reviews of randomised controlled trials, but in prognostic studies it replaces one problem with a worse one. Whereas the least adjusted estimate “provides the maximum opportunity for comparison of consistent estimates across studies,”19 unadjusted analyses will generally be biased.
Many prognostic variables are continuous measurements, including many biochemical and physiological measurements, tumour markers, and levels of environmental exposure. If such a variable were prognostic the risk of an event would usually be expected to increase or decrease systematically as the level increases. Keeping variables continuous can greatly simplify any subsequent meta-analysis, but most researchers prefer to categorise patients into high risk and low risk groups based on some cut-off point. This type of analysis discards potentially important quantitative information and considerably reduces the power to detect a real association with outcome. 20 21 If a cut-off point is used it should not be determined by a data dependent process (such as exploring all cut-off points to find the one that minimises the P value).22
The extraction of data is an additional problem. Some authors do not present a numerical summary of the prognostic strength of a variable, such as a hazard ratio, unless the analysis showed that the effect of that variable was significant. Also, when numerical results are given they may vary in format—for example, survival proportions may be given for different time points.
Box 3 —Problems with systematic reviews of prognostic studies from publications
Difficulty of identifying all studies
Negative (non-significant) results may not be reported (publication bias)
Inadequate reporting of methods
Variation in study design
Most studies are retrospective
Variation in inclusion criteria
Lack of recognised criteria for quality assessment
Different assays or measurement techniques
Variation in methods of analysis
Differing methods of handling of continuous variables (some dependent on data)
Different statistical methods of adjustment
Adjustment for different sets of variables
Inadequate reporting of quantitative information on outcome
Variation in presentation of results (for example, survival at different time points)
Box 3 summarises the particular difficulties for the systematic reviewer of prognostic studies. Two major concerns are the quality of the primary studies and the possibility of publication bias. Because of the likelihood of serious methodological difficulties, in general it is difficult to carry out a sensible meta-analysis without access to the data of individual patients. 2 23 Many authors have concluded that a set of studies was too diverse or too poor (or both) to allow a meaningful meta-analysis. Box 4 summarises a systematic review of prognosis in elbow disorders, which reached such a conclusion. In a systematic review of studies of the possible relation between hormonal contraception and risk of transmission of HIV, Stephenson concluded that a meta-analysis was unwise.24 By contrast, Wang et al performed such a meta-analysis on a similar set of studies, arguing that this enabled the quantitative investigation of the impact of various features of the study.25
Box 4 —Case study: prognosis in elbow disorders
Hudak et al carried out a systematic review of the evidence regarding prognostic factors that affect the duration of elbow pain and outcomes.12 Selected papers were subjected to a detailed quality assessment by using a scheme adapted from other publications. Each paper was assessed on six dimensions: case definition, patient selection, follow up (completeness and duration), outcome, information about prognostic factors, and analysis. Each dimension was scored from 0 to 2 or 3. The authors' prespecified minimum requirements for studies providing strong evidence (number of studies out of 40 meeting the criteria in brackets) were:
Provided an operational definition of cases (15)
Included an inception cohort (defined in relation to onset of symptoms) or a survival cohort that included a subset of patients in whom duration of symptoms was less than 4 months (5)
Showed follow up of more than 80% of cases for at least 1 year (8)
Used a blinded and potentially replicable outcome measure appropriate to the research question (20)
Used adequate measurement and reporting of potential prognostic factors (36)
Provided crude proportions for at least one of response, recovery, and recurrence (34)
Papers were identified from a comprehensive literature search of multiple databases. The authors included the search strategy they used.
Of the 40 eligible studies assessed using the above criteria, none provided “strong evidence” and just four provided “moderate evidence,” none of which followed patients for more than one year. The authors note that several studies with excellent follow up were not based on inception cohorts. Only three of the 40 studies had used a statistical method to derive results adjusted for other factors.
Among the four providing evidence of a “moderate level” there was variation in study design (one case series, three randomised trials), patient selection, interventions, and length of follow up. As a consequence meta-analysis was not attempted. The authors made several suggestions for the methodological requirements for future studies.
Even when a set of published studies is of high quality there are many potential barriers to a successful meta-analysis. In essence it is desirable to compare the outcome for groups with different values of the prognostic variable. In principle it should be relatively easy to combine data from studies that have produced compatible estimates of effect with standard errors. In practice, the lack of comparable information from all studies is likely. In particular, the prognostic variable is likely to have been handled in various ways. In the simplest case, researchers may all have dichotomised but used different cut-off points. A meta-analysis is possible comparing “high” and “low” values, using whatever definition was used in the primary studies. Interpretation is difficult, because patients with the same values would be high in some studies and low in others. (This analysis will be biased if any studies used a cut-off point derived by the minimum P value method.) Studies may use different numbers of categories,26 and some may have categorised whereas others did not. Estimates derived from categorised and ungrouped analyses are not comparable.
Outcome is occurrence of event, regardless of time
As noted, in general it is necessary to allow for other potentially confounding variables in a meta-analysis. When time to an event is not relevant, logistic regression for both binary and continuous prognostic variables is used to derive an odds ratio after adjustment for other prognostic or potentially confounding variables. The adjusted odds ratio and confidence interval can be obtained from an estimated log odds ratio with its standard error. For a binary prognostic variable this odds ratio gives the ratio of the odds of the event in those with and without that feature. For continuous predictors it relates to the increase in odds associated with an increase of one unit in the value of the variable. Estimated log odds ratios from several studies can be combined by using the inverse variance method.27
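The inverse variance method described above can be sketched as follows. This is a minimal fixed effect illustration, assuming each study supplies an adjusted log odds ratio with its standard error and that the studies estimate a common underlying effect; it makes no allowance for heterogeneity between studies. The function name and the study values are hypothetical.

```python
import math


def pool_inverse_variance(log_odds_ratios, standard_errors, z=1.96):
    """Fixed effect inverse variance pooling of log odds ratios.

    Each study is weighted by 1 / SE^2, so more precise studies
    contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_log_or = sum(
        w * b for w, b in zip(weights, log_odds_ratios)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return {
        "log_or": pooled_log_or,
        "se": pooled_se,
        # Back-transform to the odds ratio scale with a 95% interval.
        "or": math.exp(pooled_log_or),
        "ci": (
            math.exp(pooled_log_or - z * pooled_se),
            math.exp(pooled_log_or + z * pooled_se),
        ),
    }


# Hypothetical adjusted log odds ratios from three studies.
result = pool_inverse_variance([0.40, 0.70, 0.55], [0.20, 0.35, 0.25])
print(f"Pooled OR = {result['or']:.2f}, "
      f"95% CI {result['ci'][0]:.2f} to {result['ci'][1]:.2f}")
```

For a continuous predictor the pooled value is the odds ratio per one unit increase, so the studies must have used the same units and kept the variable continuous for the combination to be meaningful.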
Outcome is time to event
When the time to event is explicitly considered for each individual in a study, the data are analysed with “survival analysis” methods—most often the log rank test for simple comparisons or Cox regression for analyses of multiple predictor variables or where one or more variables is continuous. By analogy with logistic regression discussed above, these analyses yield hazard ratios, which are similar to relative risks. Log rank statistics and log hazard ratios can be combined using the Peto method or the inverse variance method, respectively.27
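Combining log rank statistics by the Peto method can be sketched as below. The sketch assumes that each study reports (or allows derivation of) the log rank observed minus expected number of events, O − E, for the group of interest together with its variance V; the pooled log hazard ratio is then the sum of the O − E values divided by the sum of the variances. The function name and the study values are hypothetical.

```python
import math


def pool_peto(o_minus_e, variances, z=1.96):
    """Peto (one step) pooling of log rank statistics across studies.

    o_minus_e: per study observed minus expected events in the
               group of interest (for example, marker "high")
    variances: per study variance of the corresponding O - E
    """
    sum_oe = sum(o_minus_e)
    sum_v = sum(variances)
    log_hr = sum_oe / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return {
        "log_hr": log_hr,
        "se": se,
        "hr": math.exp(log_hr),
        "ci": (math.exp(log_hr - z * se), math.exp(log_hr + z * se)),
    }


# Hypothetical log rank statistics from two studies.
result = pool_peto(o_minus_e=[-5.0, -3.0], variances=[10.0, 6.0])
print(f"Pooled HR = {result['hr']:.2f}")
```

When studies report log hazard ratios with standard errors instead (for example from Cox regression), the inverse variance method applies in the same way as for log odds ratios.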
Practical difficulties are likely to make meta-analysis more difficult than the preceding explanation suggests. Most obviously the hazard ratio is not always explicitly presented for each study. Parmar et al described several methods of deriving estimates of the necessary statistics in a variety of situations.28 For example, an estimate can be derived from the P value of the log rank test. They also explain how to estimate the standard errors of these estimates.
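As an illustration of deriving an estimate from the P value of the log rank test, the sketch below uses a commonly used approximation. The formulas are not given in this article and are stated here as assumptions: the variance of O − E is approximated by the total number of events multiplied by n1 × n2 / (n1 + n2)², and the size of O − E is recovered from the normal deviate corresponding to the two sided P value. The P value carries no direction, so the caller must supply it from the text of the paper. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def log_hr_from_logrank_p(p_two_sided, total_events, n_group1, n_group2,
                          group1_does_better=True):
    """Approximate a log hazard ratio and SE from a log rank P value.

    Assumed approximation (not taken from this article):
      V       ~ total_events * n1 * n2 / (n1 + n2)^2
      |O - E| ~ sqrt(V) * z, with z the normal deviate for p / 2
      log HR  = (O - E) / V,  SE = 1 / sqrt(V)
    """
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("P value must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    variance = total_events * n_group1 * n_group2 / (n_group1 + n_group2) ** 2
    o_minus_e = z * math.sqrt(variance)
    if group1_does_better:
        # Fewer events than expected in group 1 gives a negative O - E.
        o_minus_e = -o_minus_e
    log_hr = o_minus_e / variance
    se = 1.0 / math.sqrt(variance)
    return log_hr, se


# Hypothetical study: P = 0.05, 100 events, 50 patients in each group.
log_hr, se = log_hr_from_logrank_p(0.05, 100, 50, 50)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, HR = {math.exp(log_hr):.2f}")
```

Estimates recovered in this way are only as good as the reported P value; a result given as "P < 0.05" or "not significant" provides a bound at best and should be flagged as such in the review.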
Several authors have proposed more complex methods for combining data from several studies of survival. 29 30 All can be applied in this context if it is possible to extract suitable data, but some require even more data than the basic items just discussed. The use of sophisticated statistical techniques may be inappropriate when several more basic weaknesses exist in the data. Indeed some reviewers have had to summarise the findings of the primary studies as P values as it is difficult to extract useful and usable quantitative information from many papers.31
The principles of the systematic review should be extended to studies of prognosis, but doing so is by no means straightforward. The literature on prognosis features studies of poor quality and variable methodology, and the difficulties are exacerbated by inadequate reporting of methodology. The poor quality of the published literature is a strong argument in favour of systematic reviews but also an argument against formal meta-analysis. To this end it is valuable if a systematic review includes details of the methodology of each study and its principal numerical results.32
Although meta-analyses of published information may sometimes be useful, especially when the study characteristics do not vary too much and only the best studies are included, the findings are rarely convincing. The main outcome from such systematic reviews may be the realisation that there is little good quality information in the literature. Even an apparently clear result may best be seen as providing the justification for a well designed prospective study.33
By contrast, meta-analysis based on individual patient data is highly desirable. Among several advantages of such data it is possible to analyse all the data in a consistent manner. It may also be possible to include data from unpublished studies. Meta-analysis of the raw data from all (or almost all) relevant studies is a worthy goal, and there have been some notable examples, especially in an epidemiological setting.34 Apart from the considerable resources needed to carry out such a review, in most cases it is likely that many of the data sets are unobtainable. However, a careful collaborative reanalysis of the raw data from several good studies may be more valuable than a more superficial review that mixes good and poor studies. Two examples of such collaborative meta-analyses of raw data are a study of the relation between alcohol consumption and the development of breast cancer and a study of the relation between a vegetarian diet and mortality. 35 36
Poor quality studies may distort the results of a subsequent meta-analysis. When examined critically a high proportion of prognostic studies are found to be methodologically poor. 9 37 Prognostic studies are generally too small and too poorly designed and analysed to provide reliable evidence. Although some suggested guidelines have appeared, 2 13 progress may depend on developing a consensus regarding main methodological requirements for reliable studies of prognostic factors, as has happened for randomised trials. 38 39
As a consequence of the poor quality of research, prognostic markers may remain under investigation for many years after initial studies without any resolution of the uncertainty. Multiple separate and uncoordinated studies may actually delay the process of defining the role of prognostic markers. Systematic reviews can draw attention to the paucity of good quality evidence and, it is hoped, improve the quality of future research.
Series editor: Matthias Egger
Competing interests: None declared.
Systematic Reviews in Health Care: Meta-analysis in Context can be purchased through the BMJ Bookshop (http://www.bmjbookshop.com/); further information and updates for the book are available on http://www.systematicreviews.com/