Published 12 January 2009, doi:10.1136/bmj.a3006
Cite this as: BMJ 2009;338:a3006
Despina G Contopoulos-Ioannidis, assistant professor1,2, Anastasia Karvouni, research fellow3, Ioanna Kouri, research fellow3, John P A Ioannidis, professor3,4
1 Department of Paediatrics, University of Ioannina School of Medicine, Ioannina, Greece, 2 Department of Paediatrics, George Washington University, School of Medicine and Health Sciences, Washington, DC, USA, 3 Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Greece, 4 Institute for Clinical Research and Health Policy Studies, Tufts University School of Medicine, Boston, MA, USA
Correspondence to: J P A Ioannidis, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece jioannid{at}cc.uoi.gr
Design Systematic review.
Data sources PubMed, contact with authors for missing information, and author survey for unpublished SF-36 data.
Study selection Randomised trials with SF-36 outcomes (the most extensively validated and used health survey instrument for appraising quality of life) that were published in 2005 in 22 journals with a high impact factor.
Data extraction Analyses on the two composite and eight subdomain SF-36 scores that corresponded to the time and mode of analysis of the primary efficacy outcome.
Results Of 1057 screened trials, 52 were identified as randomised trials with SF-36 results (66 separate comparisons). Only eight trials reported all 10 SF-36 scores in the published articles. For 21 of the 66 comparisons, SF-36 results were discordant for statistical significance compared with the results for primary efficacy outcomes. Of 17 statistically significant SF-36 scores where primary outcomes were not also statistically significant in the same direction, the magnitude of effect was small in six, moderate in six, large in three, and not reported in two. Authors modified the interpretation of study findings based on SF-36 results in only two of the 21 discordant cases. Among 100 additional randomly selected trials not reporting any SF-36 information, at least five had collected SF-36 data but only one had analysed it.
Conclusions SF-36 measurements sometimes produce different results from those of the primary efficacy outcomes but rarely modify the overall interpretation of randomised trials. Quality of life and health related survey information should be utilised more systematically in randomised trials.
We considered eligible any randomised trial that reported on any of the two composite (physical, mental) or eight subdomain SF-36 scores (physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, mental health). When an article referred to separate publications reporting primary efficacy or SF-36 outcomes, we retrieved these as well. We also considered trials using SF-12, a shorter version of SF-36 (for composite scores). We set no restrictions on disease or on the interventions compared. Whenever information was not reported on all 10 scores, we asked authors for the missing information.
We searched the 22 target journals through PubMed using limits for randomised clinical trial (type of study) and 2005 (year of publication). Identified articles were downloaded in PDF format and screened electronically using Acrobat Reader "Find" tool for keywords: quality of life, SF36, SF 36, SF-36, short form 36, short form-36, SF-12, SF12, mental composite score, physical composite score, medical outcome study, MOS 36, MOS-36, and Ware. Articles passing electronic screening were further evaluated by two independent investigators (AK and IK). Disagreements were resolved by consensus. Remaining disagreements were resolved by DGC-I.
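The electronic keyword screen described above can be approximated in code. The following is a minimal sketch only: the original screening was done with the Acrobat Reader "Find" tool, and the function name `passes_screen`, the case-insensitive matching, and the whole-word treatment of "Ware" are assumptions made for illustration.

```python
import re

# Keywords listed in the search strategy above.
KEYWORDS = [
    "quality of life", "SF36", "SF 36", "SF-36", "short form 36",
    "short form-36", "SF-12", "SF12", "mental composite score",
    "physical composite score", "medical outcome study", "MOS 36",
    "MOS-36", "Ware",
]


def _pattern(keyword):
    # Assumption: "Ware" is matched as a whole word so that words such as
    # "software" do not trigger the screen; other keywords match anywhere.
    escaped = re.escape(keyword)
    return rf"\b{escaped}\b" if keyword == "Ware" else escaped


# Assumption: matching is case-insensitive.
SCREEN = re.compile("|".join(_pattern(k) for k in KEYWORDS), re.IGNORECASE)


def passes_screen(text):
    """Return True if the article text contains any screening keyword."""
    return SCREEN.search(text) is not None
```

Articles passing such a screen would still need the manual evaluation by two independent investigators that the text describes; the sketch covers only the first, automated step.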
To probe whether SF-36 data may have remained unpublished we communicated (three emails, each sent three weeks apart) with the corresponding authors of 100 trials randomly selected among those not reporting SF-36 data. Selection was based on a list of 100 numbers generated randomly and applied to the 1057 retrieved articles, ordered serially per journal, after excluding the 52 eligible articles.
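The random selection step can be sketched as follows. This is an illustration under stated assumptions: the article identifiers, the function name `select_for_survey`, and the fixed seed are hypothetical, and the original list of random numbers may have been generated differently.

```python
import random


def select_for_survey(article_ids, eligible_ids, n=100, seed=2009):
    """Randomly pick n articles from those not already eligible.

    article_ids: all retrieved articles, ordered serially per journal.
    eligible_ids: articles already included as reporting SF-36 results.
    seed: hypothetical fixed seed so the draw can be reproduced.
    """
    excluded = set(eligible_ids)
    pool = [a for a in article_ids if a not in excluded]
    rng = random.Random(seed)
    # Sampling without replacement, so no article is selected twice.
    return sorted(rng.sample(pool, n))


# Illustrative use with placeholder identifiers: 1057 retrieved articles,
# of which 52 (here arbitrarily the first 52) were eligible.
all_articles = list(range(1, 1058))
eligible = set(range(1, 53))
survey_sample = select_for_survey(all_articles, eligible)
```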
Data extraction
Data were extracted by three independent investigators (IK, AK, and DGC-I). Discrepancies were resolved by consensus. Remaining disagreements were resolved by JPAI.
From each eligible article we extracted information on authors, journal, design (superiority or non-inferiority), condition, interventions compared, sample size (randomised, analysed for SF-36), definition of primary efficacy outcome (as reported; if not clarified, we selected the outcome used for sample size calculations), time points and statistical analysis for the primary outcome and SF-36 assessments, whether SF-36 was a co-primary outcome, and whether any other quality of life and health related survey scales were used. We also recorded which SF-36 scores were reported and for which we could obtain missing information from authors.
Discordant results
For the primary efficacy outcome and for each of the presented SF-36 assessments we recorded whether the difference between compared arms was statistically significant (P<0.05) favouring the experimental arm, non-statistically significant, or statistically significant favouring the control arm. For trials with more than two arms we considered the comparison of each experimental intervention against control separately. We considered all comparisons and also present results separately for superiority and non-inferiority trials.15
Data on SF-36 outcomes were extracted for the reported analyses that corresponded as closely as possible to the same time points as for primary outcome data. Specifically, when measurements for primary or SF-36 outcomes were carried out at several time points, for primary efficacy outcomes we preferred analyses accounting for multiple measurements (for example, repeated measurement analysis) over analyses of single time points. If the primary outcome was a time to event analysis or incorporated serial longitudinal measurements, we preferred the analysis of serial longitudinal SF-36 measurements; if this was unavailable, we recorded whether there was a formally statistically significant difference at any time point when SF-36 had been appraised. When the primary outcome was appraised at a single time point, we recorded the SF-36 outcomes at the same (or closest) single time point. In two comparisons where co-primary outcomes existed and could not be prioritised, we based the evaluation of statistical significance on the authors' overall interpretation.
We considered SF-36 results as statistically significant when at least one of the composite or subdomain scores showed a statistically significant result in favour of or against the experimental intervention. There were no situations where some of the specific SF-36 scores were significant in favour of the experimental intervention and others were significant against it.
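The classification rules above can be written out as a short sketch. The coding of results as +1 (significant, favouring the experimental arm), 0 (not significant), and -1 (significant, favouring control), and all function names, are assumptions introduced for illustration. The sketch treats any difference between the two codes as discordance and does not model the separate handling of non-inferiority trials.

```python
def classify(p_value, favours_experimental, alpha=0.05):
    """Code one result as +1, 0, or -1 using the P<0.05 threshold."""
    if p_value >= alpha:
        return 0
    return 1 if favours_experimental else -1


def sf36_overall(score_codes):
    """Summarise SF-36 as significant if at least one score is significant.

    Raises ValueError if scores are significant in opposite directions,
    a situation the review reports never occurred.
    """
    directions = {code for code in score_codes if code != 0}
    if len(directions) > 1:
        raise ValueError("SF-36 scores significant in opposite directions")
    return directions.pop() if directions else 0


def is_discordant(primary_code, score_codes):
    """True when primary outcome and SF-36 differ in coded significance."""
    return primary_code != sf36_overall(score_codes)
```

For example, a comparison with a non-significant primary outcome and one significantly improved subdomain score would be flagged as discordant under this coding.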
For statistically significant SF-36 effects when the respective primary efficacy outcome was discordant, we extracted information on the effect size of SF-36. Roughly, standardised mean differences of less than 0.30 standard deviations are small effects, 0.30-0.80 are moderate, and more than 0.80 are large.16 17 18 19 20 The corresponding cut-offs for raw scores are less than 4, 4-10, and more than 10 points.
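The cut-offs above translate directly into a small helper. This is a sketch, not the authors' code: the function name `effect_magnitude` is hypothetical, and assigning the boundary values (0.30 and 0.80, or 4 and 10) to the moderate band follows the wording of the ranges given above.

```python
def effect_magnitude(value, scale="standardised"):
    """Label an SF-36 effect as small, moderate, or large.

    scale="standardised": standardised mean difference, cut-offs 0.30 and 0.80.
    scale="raw": raw SF-36 points, cut-offs 4 and 10.
    The sign is ignored, so harms and benefits are graded alike.
    """
    cutoffs = {"standardised": (0.30, 0.80), "raw": (4.0, 10.0)}
    low, high = cutoffs[scale]
    size = abs(value)
    if size < low:
        return "small"
    if size <= high:
        return "moderate"
    return "large"
```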
For comparisons with discordant statistical significance on SF-36 and primary outcome results, we recorded whether the authors had discussed the SF-36 results at all, whether they commented on the discrepancy and if so with what arguments, and if SF-36 findings changed the interpretation of the trial results.
Concordance of results
Of the 66 comparisons, 21 (32%) had discordant statistical significance for primary efficacy and SF-36 results (table 1). Moreover, of the 56 comparisons of superiority trials 19 had discordant primary efficacy and SF-36 results (see web extra fig 2).
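As a quick arithmetic check of the proportions just reported, the counts can be converted to percentages. The helper name `percent` is hypothetical; the counts are those given in the text.

```python
def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Discordant comparisons among all comparisons: 21 of 66.
all_comparisons = percent(21, 66)

# Discordant comparisons among superiority trial comparisons: 19 of 56.
superiority_comparisons = percent(19, 56)
```

The two rates are similar, which suggests that discordance was not confined to one trial design.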
In the 13 discordant comparisons with only SF-36 significant results (nine comparisons in favour and four against the experimental intervention; in seven trialsw7 w14 w21 w31 w44 w46 w47 and three trials,w15 w43 w51 respectively) there were 17 statistically significant specific scores (five normalised, 10 raw, two reporting only statistical significance without effect size); effect sizes were small in six, moderate in six, and large in three.
Interpretation of trial findings in discordant settings
Improved primary outcome only—SF-36 results did not modify the interpretation of the trials in these 11 comparisons (eight trials, table 2).w4 w12 w16 w18 w41 w42 w43 w51 In five comparisons (four trials), SF-36 outcomes were only tabulated or alluded to in the results, without further discussion.w12 w16 w18 w42 In the other four trials the authors focused on other non-primary outcomes,w4 claimed that SF-36 was not sensitive enough to detect improvements,w41 adopted a non-intention to treat analysis for SF-36 with significant results,w43 or dismissed the importance of the negative effects on SF-36 in the face of benefits in disease-free survival.w51
Improved SF-36, non-inferiority on primary outcome—SF-36 did not modify the interpretation of these two trials.w46 w47 Both trials already concluded favourably for the experimental intervention that achieved the desired non-inferiority, and in one of themw47 the observed benefit in SF-36 was considered possibly due to chance.
Only SF-36 worsened—In one trialw15 where SF-36 worsened with the experimental intervention, the investigators interpreted the results as showing no consistent differences in quality of life, because an additional instrument (EQ5D) showed no significant differences.
Probing unpublished data
Authors of 69 of the 100 additional randomly selected trials responded. SF-36 data had been collected in five of these trials. The data had been analysed for only one trial and did not show any statistically significant differences for SF-36 or the primary efficacy outcome.
In most trials for chronic conditions, quality of life and surveys of health status are useful to consider. SF-36 was reported in fewer than 5% of the trials we screened, and our author survey suggested that some additional trials (at least five of 100) had collected information on SF-36 but without analysing or publishing it two or three years after the publication of the main trial results. Quality of life seems to remain undervalued in clinical research: few trials collect quality of life related data, fewer report on them, data are only partially presented, and quality of life rarely affects the trial interpretation.
We should acknowledge some caveats. Firstly, by selecting high impact journals we identified trials with high visibility and probably also high quality.21 It is unlikely that this strategy would have selected for discordant results between outcomes. Secondly, selective analysis and reporting bias may affect primary outcomes and not just SF-36,22 23 24 25 but this should not have increased the perceived rate of discrepancies between outcomes. Thirdly, discordance at the level of statistical significance does not necessarily mean that results for different outcomes differ beyond chance. Among statistically significant results, chance findings and non-clinically important differences are possible, and primary outcomes should be given more weight in the discussion than secondary outcomes. Given that trials are typically powered to address the primary outcome, a significant result in the primary outcome with a non-significant result in quality of life or health survey assessments may sometimes simply reflect lack of power for the quality of life or health survey outcome. Therefore we also examined the SF-36 effect sizes and the circumstances and discussion of discordant results. Fourthly, we did not carry out the same in-depth evaluation for trials where efficacy and SF-36 outcomes were concordant. It is unlikely that authors would then have modified their inferences, but SF-36 may have strengthened the conclusions. Finally, we did not examine trials using only other quality of life or health survey instruments beyond SF-36. However, SF-36 is the most robustly standardised and widely used one, and we wanted to maximise comparability. Although other scales may also be used, one study found that only 4.2% of trials reported any quality of life outcome.2
Although quality of life and health survey scales have been used in clinical trials for over 25 years, several issues remain debated.26 Besides problems of fragmented, selectively reported information, it is sometimes impossible to say whether and which analyses are based on a priori analytical plans.11 27 Proper attention to the importance of these outcomes should be given in clinical trials. Otherwise, with the growth in administrative paperwork for clinical trials,28 outcomes such as SF-36 may become routine compulsory assessments without any genuine interest in learning from them.
Overall, quality of life and health survey assessments provide a different window into patient outcomes and deserve to be included in more trials with complete reporting of results, and standardised interpretation. Unbiased data on these outcomes may enhance our ability to improve clinical decision making.
Contributors: JPAI conceived the study and is guarantor. All authors designed the protocol, analysed and interpreted the data, and approved the final manuscript. DGC-I, IK, and AK collected the data. DGC-I and JPAI drafted the manuscript. IK and AK critically revised the manuscript for important intellectual content.
Competing interests: None declared.
Ethical approval: Not required.
Provenance and peer review: Not commissioned; externally peer reviewed.
K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ 2004;171:735-40.