BMJ 2008;336:1287-1290 (7 June), doi:10.1136/bmj.39560.759572.BE (published 21 May 2008)
Evangelos Evangelou, research associate1, Georgios Tsianos, research associate1, John P A Ioannidis, professor1
1 Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece
Correspondence to: J P A Ioannidis jioannid{at}cc.uoi.gr
Design Survey of trials included in systematic reviews of treatments for diverse conditions.
Data sources Cochrane database of systematic reviews.
Data extracted Data on patients' global assessments and on doctors' global assessments for the same treatment against the same comparator.
Main outcome measures Relative odds ratio (ratio of odds ratios of global improvement with the experimental intervention versus control according to doctors compared with patients), and improvement rates according to doctors and patients.
Results Doctors' global assessments were compared with patients' global assessments for 63 different treatment comparisons (240 trials) in 18 conditions. The summary relative odds ratio across the comparisons was not significant (0.98, 95% confidence interval 0.88 to 1.08; I2=0%, 95% confidence interval 0% to 30%). In 62 of the 63 comparisons the effects of treatment rated by patients and by doctors did not differ beyond chance, but for single comparisons the confidence intervals were large. Rates of improvement on average did not differ between doctors' assessments and patients' assessments (summary relative odds ratio 0.98, 0.88 to 1.06; I2=0%, 0% to 24%).
Conclusion Doctors' global assessments of the effects of treatments are on average similar to those of patients.
An important question is whether patients and doctors agree in their assessment of treatment outcomes. Self assessment by patients may avoid bias by an external assessor, whereas doctors may be more objective than their patients. Doctors may consider additional aspects of conditions that are not assessable by patients and may have insight into whether patients tend to amplify or minimise symptoms.1 In theory, biases may be more likely when a study does not use blinding of doctors or patients, such as when blinding is impossible or compromised. Moreover, in different circumstances and for different diseases biases may operate differently between patients and doctors—some patients with mental or neurological diseases, for example, may be biased or inaccurate in the appraisal of their condition. Similarly, doctors may be inaccurate when they have few or no objective signs and tests on which to base their observations and have to use primarily patient reported information.
Several studies have evaluated whether global assessment in specific conditions and settings is more appropriately done by patients than by doctors. Some studies suggest that patients' opinions do not agree with those of doctors even though they are measuring the same outcome.2 3 4 Other studies, however, showed little difference between self reported assessment and doctors' assessment.5 6 Evidence is lacking as to whether differences in appraisals also result in systematic differences in the estimates of treatment effects in clinical trials. For example, a meta-analysis of trials on the interleukin 1 receptor antagonist in rheumatoid arthritis suggested that patient reported outcomes provided more favourable estimates of treatment effects than outcomes reported by doctors.7
We obtained empirical information on the possible extent of discordance between doctors' and patients' global assessments of treatment effects in clinical trials for various diseases and treatments. We evaluated a sample of systematic reviews of clinical trials where both patients' and doctors' impressions of global improvement had been used as outcomes to evaluate the same treatment.
We searched the Cochrane Library database using the term "global". We also searched a random sample of 200 Cochrane reviews using the terms "patient assessment" or "clinician assessment" to check that we had not missed possible eligible reviews that did not use the term "global". The retrieved reviews were screened for eligibility, first by examining the tables and figures and, if in doubt, by examining the full text. Eligible reviews could contain more than one comparison with different treatments or comparators. For example, within a review we might assess the global effectiveness of a treatment compared with standard treatment and assess the global effectiveness of the same treatment compared with placebo. We counted and evaluated eligible comparisons. Finally, we searched all Cochrane systematic reviews on diseases where at least three eligible comparisons had already been identified through the search strategy.
In each eligible comparison we recorded the studies that had data on doctors' global assessments and those that had data on patients' global assessments and noted any overlap. For each of these studies we recorded the year of publication, first author, outcome definition for global change, and the 2×2 tables or the mean difference and standard deviation per arm for global change according to both the doctors and the patients.
Binary and continuous outcomes
We calculated the odds ratios of both doctors' and patients' assessments and the variances of their natural logarithms. We consistently coded the comparisons to reflect the contrast of the experimental treatment with the comparator (placebo, no treatment, other treatment) and to reflect improvement rather than deterioration. This means that when the data reflected the number of patients who deteriorated (for example, 12/30), we took the complementary counts (that is, 18/30); whenever the experimental treatment was better, this was coded consistently as an odds ratio greater than 1.
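As an illustration of the step above, the following is a minimal sketch of how an odds ratio for improvement and the variance of its natural logarithm can be obtained from a 2×2 table, including the switch from deterioration counts to improvement counts. The function names and the example trial are hypothetical, and the variance uses the standard large-sample formula (sum of reciprocal cell counts); how zero cells were handled is not stated in the text, so the sketch simply rejects them.

```python
import math


def improved_from_deteriorated(deteriorated, total):
    """Complementary count: patients who did not deteriorate are counted as improved."""
    return total - deteriorated


def log_odds_ratio(improved_exp, n_exp, improved_ctl, n_ctl):
    """Return (ln OR, variance of ln OR) for improvement, experimental v comparator.

    An odds ratio above 1 favours the experimental treatment.
    """
    a = improved_exp            # improved, experimental arm
    b = n_exp - improved_exp    # not improved, experimental arm
    c = improved_ctl            # improved, comparator arm
    d = n_ctl - improved_ctl    # not improved, comparator arm
    if min(a, b, c, d) <= 0:
        # Assumption: zero cells are out of scope for this sketch.
        raise ValueError("zero cell: a continuity correction would be needed")
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


# Hypothetical trial reporting deterioration: 12/30 on treatment, 18/30 on comparator.
improved_exp = improved_from_deteriorated(12, 30)   # 18 improved
improved_ctl = improved_from_deteriorated(18, 30)   # 12 improved
log_or, var_log_or = log_odds_ratio(improved_exp, 30, improved_ctl, 30)
print(math.exp(log_or), var_log_or)
```

In this made-up example the experimental arm does better, so the odds ratio for improvement is greater than 1, matching the coding convention described above.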
We calculated the weighted standardised mean differences of the continuous outcomes and transformed them to odds ratios8 using a formula that incorporates the Hedges g, a measure that quantifies continuous outcomes using standardised mean differences.9 All comparisons were coded in the same way as for the binary outcomes.
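The exact transformation is given only by citation, so the following is a sketch under an explicit assumption: that the conversion is the widely used logistic approximation, in which the natural logarithm of the odds ratio equals the standardised mean difference multiplied by π/√3. The Hedges g calculation uses the usual small-sample correction factor. All function names and the example numbers are hypothetical.

```python
import math


def hedges_g(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Return (g, variance of g): bias-corrected standardised mean difference."""
    df = n_exp + n_ctl - 2
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (mean_exp - mean_ctl) / pooled_sd
    var_d = (n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl))
    j = 1 - 3 / (4 * df - 1)   # small-sample correction factor
    return j * d, j ** 2 * var_d


def smd_to_log_or(g, var_g):
    """Assumed conversion: ln OR = g * pi / sqrt(3); the variance scales by pi^2 / 3."""
    factor = math.pi / math.sqrt(3)
    return factor * g, factor ** 2 * var_g


# Hypothetical arms: mean improvement 10 v 8, SD 2 in both, 20 patients per arm.
g, var_g = hedges_g(10, 2, 20, 8, 2, 20)
log_or, var_log_or = smd_to_log_or(g, var_g)
print(math.exp(log_or), var_log_or)
```

The sign convention has to match the binary outcomes: the mean difference is taken so that a positive value means greater improvement with the experimental treatment, which gives an odds ratio above 1.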
Analyses
For each comparison we combined the natural logarithms of the odds ratios of both doctors' and patients' assessments across each of the eligible studies to obtain the summary odds ratio of assessments for doctors and for patients. Then we took the ratio of the summary odds ratio of doctors' assessments to the summary odds ratio of patients' assessments to obtain the relative odds ratio for each comparison. A relative odds ratio exceeding 1 equates to the doctors' assessments giving a more favourable response for the experimental treatment than the patients' assessments. A relative odds ratio less than 1 equates to the doctors' assessments giving a less favourable response for the experimental treatment than the patients' assessments. The variance of the natural logarithm of the relative odds ratio is the sum of the variances of the natural logarithms of the odds ratio of the doctors' assessments and the odds ratio of the patients' assessments.
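The relative odds ratio calculation described above can be sketched as follows. The function name, the input values, and the use of a normal approximation (z=1.96) for the 95% confidence interval are assumptions for illustration; summing the two variances treats the doctors' and patients' estimates as independent, as in the text.

```python
import math


def relative_odds_ratio(log_or_doctors, var_doctors, log_or_patients, var_patients):
    """Relative odds ratio (doctors v patients) with an approximate 95% confidence interval."""
    log_ror = log_or_doctors - log_or_patients
    variance = var_doctors + var_patients   # sum of the two ln OR variances
    se = math.sqrt(variance)
    z = 1.96                                # normal approximation (assumption)
    return {
        "ror": math.exp(log_ror),
        "ci_low": math.exp(log_ror - z * se),
        "ci_high": math.exp(log_ror + z * se),
        "var_log_ror": variance,
    }


# Hypothetical summary odds ratios: 2.0 according to doctors, 2.5 according to patients.
result = relative_odds_ratio(math.log(2.0), 0.04, math.log(2.5), 0.05)
print(result)
```

Here the relative odds ratio is below 1, meaning that in this invented comparison the doctors' assessments are less favourable to the experimental treatment than the patients' assessments.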
We combined the estimates of the natural logarithm of the relative odds ratio across all comparisons to obtain the summary natural logarithm of relative odds ratio,10 11 using fixed effects and random effects.12 13 We used Cochran's Q statistic (considered statistically significant for P<0.10) and the I2 metric to quantify heterogeneity between comparisons in the estimates of the natural logarithm of the relative odds ratio.14 I2 is independent of the number of comparisons and a value of 50% or more reflects sizeable heterogeneity. We also provide 95% confidence intervals for I2 in the main analyses.14 15 In the absence of heterogeneity (I2=0), random and fixed effects coincide.
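A minimal sketch of this pooling step follows. It assumes inverse variance weighting for the fixed effects summary and the DerSimonian and Laird estimator for the between comparison variance in the random effects summary; the text cites its methods by reference only, so these specific estimators are assumptions. The function name and example inputs are hypothetical, and confidence intervals for I2 are not computed.

```python
def pool_log_estimates(estimates, variances):
    """Pool ln(relative odds ratio) estimates and quantify heterogeneity."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effects summary.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # I2: percentage of variation beyond chance, truncated at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian and Laird between comparison variance (assumed estimator).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return {"fixed": fixed, "random": random, "q": q, "df": df, "i2": i2, "tau2": tau2}


# Hypothetical homogeneous comparisons: the two summaries coincide and I2 is 0.
homogeneous = pool_log_estimates([0.1, 0.1, 0.1], [0.04, 0.09, 0.01])

# Hypothetical heterogeneous comparisons.
heterogeneous = pool_log_estimates([0.0, 1.0], [0.01, 0.01])
print(homogeneous, heterogeneous)
```

When Q does not exceed its degrees of freedom, the estimated between comparison variance is 0 and the random effects weights reduce to the fixed effects weights, which is why the two summaries coincide when I2=0.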
For the main analysis we considered all eligible comparisons. We also carried out sensitivity analyses, limited to comparisons in which all studies had both doctors' and patients' assessments or to trials that had both doctors' and patients' assessments. In these situations outcomes are directly paired, so we estimated a natural logarithm of the relative odds ratio for each study before combining these to obtain a summary value.
Furthermore, we carried out subgroup analyses according to condition, with the conditions merged into three categories: musculoskeletal, neuropsychiatric and psychosomatic, and other. Additional subgroup analyses were done according to type of assessment outcome (binary or continuous); whether both doctors and patients were blinded, only doctors were blinded, only patients were blinded, or neither were blinded; and whether the comparison referred to treatment compared with no treatment or placebo or to two active treatments.
Finally, doctors' and patients' assessments may agree at the level of the relative treatment effect (odds ratio) but may disagree on the absolute proportion of patients who improve in both arms. Therefore we also examined whether the overall proportions showing improvement differed between doctors and patients. We limited these analyses to the set of studies where data on both doctors' and patients' assessments were available for the same study. For these evaluations we combined both arms (experimental and control) for each type of outcome. For binary outcomes we estimated the total number of patients who had improved among the total of patients in the experimental and control arms combined. For continuous outcomes we estimated a common mean effect and variance, combining the respective measures of the experimental and control arms by fixed effects. Then we estimated the odds ratio of global improvement according to doctors and according to patients. For continuous outcomes we used the Hedges g transformation. We combined the estimates for the natural logarithm of the odds ratio for improvement across studies for each comparison. These summary estimates were then combined across comparisons. This was done in a similar fashion to the natural logarithm of the relative odds ratio.
All analyses were done in Intercooled STATA 8.2. P values are two tailed.
Data synthesis
The summary results across the 63 comparisons showed overall agreement for the global estimate of treatment effectiveness between doctors and patients. The summary relative odds ratio was not significant (0.98, 95% confidence interval 0.88 to 1.08) and no significant heterogeneity was observed across the comparisons (I2=0%, 95% confidence interval 0% to 30%; Cochran's Q P=0.99). Treatment effects according to patients and doctors did not differ beyond chance for 62 of the 63 comparisons, whereas for long acting β2 agonists in asthma doctors gave a significantly more favourable appraisal of effectiveness than did patients (relative odds ratio 2.86, 1.48 to 5.55). Most point estimates of relative odds ratios for specific comparisons were close to 1. On the basis of point estimates, the most unfavourable relative perception of doctors' global assessment was in the use of methotrexate to treat psoriatic arthritis (relative odds ratio 0.21, 0.02 to 2.44)w16 whereas the most favourable was for the implementation of stress management therapy for post-traumatic stress disorder (relative odds ratio 14, 0.78 to 270).w19
When the analysis was restricted to the 44 comparisons (n=118 studies) with perfect overlap of studies the results were practically identical. The summary relative odds ratio showed no difference between doctors and patients (0.97, 0.87 to 1.09; I2=0%, P for heterogeneity 1.00). For the 17 comparisons with partial overlap (115 studies), data from doctors and patients were available in only some of the trials (n=76). When the analysis concerned the 194 trials that had data from doctors and patients (61 comparisons), the summary relative odds ratio was not significant (0.96, 0.86 to 1.07; I2=0%, P for heterogeneity 0.99).
Subgroup analyses
Despite some trends for more favourable appraisal by patients of effectiveness in musculoskeletal conditions (fig 2) and neuropsychiatric or psychosomatic conditions (fig 3) and by doctors in other conditions (fig 4), the observed differences were not beyond chance (table). The estimated treatment effects did not differ depending on type of outcome (continuous v binary) or type of comparator.
Rates of improvement
Rates of improvement did not differ between doctors' and patients' assessments (summary relative odds ratio 0.98, 95% confidence interval 0.88 to 1.06; I2=0%, 0% to 24%). This meant that for an improvement rate of 10% according to patients the expected average improvement rate according to doctors would be 9.8% (8.9% to 10.5%) and that for an improvement rate of 40% according to patients the expected average improvement rate according to doctors would be 39.5% (37.0% to 41.4%).
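The expected improvement rates quoted above follow from applying the relative odds ratio to the odds of improvement according to patients and converting back to a proportion. A short sketch of that arithmetic, with a hypothetical function name:

```python
def expected_doctor_rate(patient_rate, relative_odds_ratio):
    """Improvement rate according to doctors implied by the patients' rate and a relative odds ratio."""
    patient_odds = patient_rate / (1 - patient_rate)
    doctor_odds = patient_odds * relative_odds_ratio
    return doctor_odds / (1 + doctor_odds)


# Summary relative odds ratio and its 95% confidence limits, as reported in the text.
for patient_rate in (0.10, 0.40):
    point, low, high = (
        round(100 * expected_doctor_rate(patient_rate, ror), 1)
        for ror in (0.98, 0.88, 1.06)
    )
    print(f"patients {patient_rate:.0%}: doctors {point}% ({low}% to {high}%)")
```

For example, a 10% rate corresponds to odds of 0.111; multiplying by 0.98 gives 0.109, which converts back to 9.8%.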
The random effects summary relative odds ratio for improvement for musculoskeletal conditions was 0.95 (0.84 to 1.06, I2=0%), for neuropsychiatric or psychosomatic conditions was 0.91 (0.60 to 1.33, I2=0%), and for other conditions was 1.06 (0.89 to 1.22, I2=3%).
Most clinical questions have limited evidence from clinical trials, and thus the uncertainty in the estimated treatment effects is often large when only one topic is examined. Examining a large number of comparisons yields a more precise average.
The previous literature on patients' and doctors' appraisals of outcome has dealt mostly with musculoskeletal diseases, along with other conditions such as cancer and asthma.1 2 3 4 5 6 16 17 18 19 20 21 Several studies have focused on the considerable discrepancies between these assessments. For example, patients with cancer rate their health status differently from their doctors, and different doctors can give different ratings for the same patient.20 Doctors may underestimate the needs of patients21 or fail to recognise functional disability.18 Surveys in musculoskeletal diseases have shown that patients and doctors often focus on different aspects of the disease: doctors prefer objective clinical signs or tests whereas patients focus more on their psychological wellbeing.3 4 17 It is impossible to say in each study and case how much patients and doctors focused on wellbeing or on disease activity. Different patients and doctors may have different perspectives. Differences may average out on large samples and the estimated treatment effects may remain unaffected. Nevertheless, differences between patients' and doctors' assessments may still be important for the management of individual patients or for making a correct diagnosis (for example, patients with rheumatoid arthritis v patients with fibromyalgia).22
Most of the comparisons we analysed were in trials where all assessors of outcome were blinded. In theory, if blinding is not violated then patients and doctors should not be biased in appraising the effectiveness of a treatment. Our results are consistent with this interpretation. The more limited data on circumstances in which blinding was not achieved show non-significant deviations between patients' assessments and those of doctors. Nevertheless, for trials where only patients were unblinded we observed mostly trends for less favourable estimates of effectiveness by patients (table). Thus bias due to lack of blinding was unlikely to lead to more optimistic results.
For many comparisons we found no full overlap of the studies. Therefore we carried out sensitivity analyses restricted to studies that were fully matched. The results were almost identical. We did not, however, have individual level data to examine whether the same or different patients were thought to improve according to patients and doctors.
Finally, concordance between patients' and doctors' assessments may be better in clinical trials than in everyday practice. The experimental nature of clinical trials may compel doctors to be more careful, meticulous, and comprehensive in assessing patient outcomes, and patients enrolled in clinical trials may be self selected. In all, the average agreement between patients and doctors in our empirical evaluation should not necessarily be interpreted as evidence that one of the two is redundant. For some conditions, such as rheumatoid arthritis, both patients' and doctors' global assessments are typically used already.16 23 24 In other diseases and trials when only one of the two types of assessment is used, consideration should be given to evaluating both and studying their relative performance in measuring treatment effects. The views of both patients and doctors may offer complementary information in clinical trials and in everyday practice.
Contributors: JPAI had the original idea for this project and proposed the design. All authors worked on the protocol. EE and GT extracted the data and JPAI oversaw the collected data and arbitrated on discrepancies. EE did the statistical analysis with help from JPAI. All authors interpreted the data. JPAI and EE wrote the manuscript and all authors revised drafts and approved the final version. JPAI is guarantor.
Competing interests: None declared.
Ethical approval: Not required.
Provenance and peer review: Not commissioned; externally peer reviewed.