# Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study

BMJ 2011;343 doi: https://doi.org/10.1136/bmj.d4909 (Published 16 August 2011). Cite this as: BMJ 2011;343:d4909

## All rapid responses

*The BMJ* reserves the right to remove responses which are being wilfully misrepresented as published articles.

We disagree with Sabrina Trippoli's suggestion that findings from our study [1] could be more optimistically interpreted regarding consistency between direct and indirect comparisons [2]. The fact is that the prevalence of statistically significant inconsistency (14%) was much higher than expected by chance or previously observed [1]. Although the appropriate use of indirect comparison methods may provide useful evidence on the comparative effectiveness of different interventions, it is important to avoid misleading results from inappropriate use of these methods.

In studies of the effectiveness of interventions (including individual trials, pair-wise and network meta-analyses), outcomes are usually measured using the relative risk, odds ratio, or risk difference. As shown in the example provided by Messori, Fadda, Maratea and Trippoli, the number needed to treat may also be used to measure the relative effect of competing interventions [3].

We thank Ades, Dias and Welton for raising an interesting issue about the between-study variance in pair-wise and multiple treatment meta-analyses [4]. We used a random-effects model to combine results of multiple individual studies [1]. However, the between-study variance cannot be estimated when there is only a single study in an analysis. Although taking between-study variance into account when there are only singleton trials may be theoretically sound, this seems to have rarely been done in practice.

Following Ades et al's helpful suggestion, we re-analysed data from the 16 trial networks with statistically significant inconsistency, assigning an assumed between-study variance to singleton trials. The between-trial variance for singleton trials was assumed to be equal to the average between-trial variance for the other treatment contrasts in the trial network. In a trial network with three singleton trials (CD005149), the average between-trial variance was estimated by assuming I²=70%. The results of this sensitivity analysis are shown in the Table below. Of the 16 trial networks with significant inconsistency, three became statistically non-significant. The overall proportion of inconsistency is therefore 13/112 (12%, 95% CI: 7% to 19%), which remains "more prevalent than previously observed".
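The letter does not spell out the arithmetic of this adjustment. A minimal sketch of the kind of calculation described, assuming the standard relation I² = τ²/(τ² + σ²) and using purely hypothetical input numbers, might look like this:

```python
import math

def tau2_from_i2(i2, sigma2):
    """Back out a between-trial variance tau^2 from an assumed I^2,
    using the standard relation I^2 = tau^2 / (tau^2 + sigma2),
    where sigma2 is a typical within-trial variance."""
    return sigma2 * i2 / (1.0 - i2)

def inflated_se(se_within, tau2):
    """Total standard error of a singleton trial once an assumed
    between-trial variance is added to its within-trial variance."""
    return math.sqrt(se_within ** 2 + tau2)

# Illustrative numbers only (not taken from the paper):
sigma2 = 0.04                        # typical within-trial variance (log OR)
tau2 = tau2_from_i2(0.70, sigma2)    # I^2 = 70%, as assumed for CD005149
se_total = inflated_se(0.20, tau2)   # widens the singleton trial's SE
```

Widening the singleton trial's standard error in this way makes the subsequent inconsistency test less likely to reach statistical significance, which is the direction of change the sensitivity analysis reports.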

In the original analysis, the within-study variance for singleton trials was often greater than the total variance (within-study variance plus between-study variance) for multiple trials in trial networks. In most cases, therefore, significant inconsistency in networks with singleton trials cannot be explained by "artificially lowered" standard errors. While "false-positive" inconsistency may be a concern, it is also likely that the prevalence of significant inconsistency between direct and indirect estimates has been under-estimated because of inadequate statistical power (see Figure 2 in Song et al [1]).

We agree with Ades et al that "the details of the inclusion criteria and interventions" should be carefully checked before any research synthesis [4]. However, there is considerable subjectivity involved in making judgements about clinical similarity amongst trials included in systematic reviews. We have proposed a framework to delineate the basic assumptions underlying indirect and mixed treatment comparisons, which consists of the homogeneity assumption for conventional meta-analysis, the trial similarity assumption for adjusted indirect comparison, and the evidence consistency assumption for mixed treatment comparison [5]. The fulfilment of only one or two assumptions may not guarantee a valid indirect or mixed treatment comparison. For example, the results of separate pair-wise meta-analyses may be interpretable while an indirect comparison based on them is not. Similarly, the result of an indirect comparison may be interpretable but not necessarily consistent with the result of head-to-head comparison trials [5]. In addition, it is even possible for the results of biased indirect comparisons to be consistent with the results of similarly biased direct comparisons.

There is a clear need to estimate the comparative effectiveness of competing interventions when evidence from direct comparison trials is insufficient, and indirect and mixed treatment comparisons have therefore been increasingly used. However, inappropriate use of indirect and mixed treatment comparison may provide misleading results with important implications for patients and population health. Further methodological research is still required to promote more appropriate use of indirect comparison and network meta-analysis [6].

Therefore, we believe it is appropriate for us to conclude that inconsistency between direct and indirect comparisons may be more prevalent than previously observed [1]. The validity of all statistical models for network meta-analysis depends on certain basic assumptions [5, 7]. To correctly interpret the results of indirect and mixed treatment comparisons, it is crucial for researchers, clinicians, and other decision makers to understand and carefully check these basic assumptions.

References:

1. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011;343:d4909.

2. Trippoli S. Is the title consistent with the results? Rapid Response to "Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study, BMJ 2011;343:d4909", 2011.

3. Messori A, Fadda V, Maratea D, Trippoli S. Simplified figure to present the results of indirect comparisons: re-visitation based on the number needed to treat. Rapid Response to "Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study, BMJ 2011;343:d4909", 2011.

4. Ades AE, Dias S, Welton NJ. Song et al have not demonstrated inconsistency between direct and indirect comparisons. Rapid Response to "Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study, BMJ 2011;343:d4909", 2011.

5. Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: a survey of published systematic reviews. BMJ 2009;338:b1147. doi:10.1136/bmj.b1147.

6. Li T, Puhan MA, Vedula SS, Singh S, Dickersin K. Network meta-analysis-highly attractive but more methodological research is needed. BMC Med 2011;9:79.

7. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, et al. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011;14(4):417-28.

**Competing interests:** We are the authors of the paper discussed

**26 October 2011**

Song et al carry out an empirical assessment of consistency between direct and indirect evidence in 112 comparison networks [1], using the Bucher method [2]. They report that direct and indirect estimates were significantly different at p<0.05 in 16 cases (14%), rather than the 5 or 6 (5%) that would be expected by chance. They conclude that "inconsistency between direct and indirect comparisons may be more prevalent than previously observed". However, their analysis is seriously flawed.
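For readers unfamiliar with it, the Bucher method contrasts a direct estimate with an indirect estimate formed through a common comparator, and tests their difference with a z-statistic. A minimal sketch, using hypothetical log odds ratios and standard errors rather than any values from the paper, is:

```python
import math

def indirect_estimate(d_ab, se_ab, d_ac, se_ac):
    """Adjusted indirect estimate of B vs C through common comparator A
    (Bucher method): d_BC = d_AC - d_AB on the log odds-ratio scale,
    with variances adding."""
    d = d_ac - d_ab
    se = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d, se

def inconsistency_test(d_direct, se_direct, d_indirect, se_indirect):
    """z-test comparing the direct and indirect estimates of the same
    contrast; returns the difference, its SE, and a two-sided p-value."""
    diff = d_direct - d_indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, se, p

# Hypothetical log odds ratios (with SEs) for the sides of one triad:
d_bc_ind, se_bc_ind = indirect_estimate(-0.30, 0.15, -0.70, 0.20)
diff, se, p = inconsistency_test(-0.10, 0.18, d_bc_ind, se_bc_ind)
```

The criticism that follows concerns where the standard errors in this calculation come from: if a singleton contrast contributes only its within-trial SE, the denominator of the z-statistic is too small.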

Each of the 112 comparison networks consists of a "triad" of A vs B, B vs C, and A vs C trials. Among the 112 triads, 33 contained one contrast consisting of a single trial, 21 contained two contrasts with singleton trials, and in 9 all three contrasts had singleton trials. When there was more than one trial on a contrast, Song et al use the random effects (RE) method of DerSimonian and Laird [3], but with singleton trials they effectively assume a fixed effect model.
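The DerSimonian and Laird estimator referred to here is a method-of-moments calculation. The sketch below (with hypothetical inputs) also makes concrete why it is undefined for a singleton contrast, which is what forces the fixed-effect treatment being criticised:

```python
def dl_tau2(effects, variances):
    """DerSimonian-Laird (method-of-moments) estimate of the
    between-trial variance tau^2, from per-trial effect estimates
    and their within-trial variances."""
    k = len(effects)
    if k < 2:
        # With one trial, Q = 0 and the denominator S1 - S2/S1 = 0,
        # so tau^2 is simply not estimable from the data -- exactly
        # the situation of a singleton contrast in a triad.
        raise ValueError("tau^2 cannot be estimated from a single trial")
    w = [1.0 / v for v in variances]                         # inverse-variance weights
    s1, s2 = sum(w), sum(wi * wi for wi in w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / s1   # pooled mean
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    return max(0.0, (q - (k - 1)) / (s1 - s2 / s1))

# Two hypothetical trials with very different effects:
tau2 = dl_tau2([0.0, 1.0], [0.04, 0.04])
```

Treating a singleton contrast as fixed-effect amounts to silently setting this inestimable tau² to zero, rather than borrowing a plausible value from elsewhere in the network.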

This is not a reasonable view of the data. If we are to believe that the true effects of the AB and AC trials are drawn from a RE distribution, why should we believe that the true effects in BC trials would all be identical across studies, just because only one trial has been carried out? The effect of this is to under-estimate the variance of the "inconsistency", and hence to increase the "false-positive" detection of inconsistency, so that it ends up being greater than expected by chance. The risk of such false positive inconsistencies will be greater (a) when the degree of heterogeneity is highest and (b) when the number of trials in the triad is low (and so more likely to include singletons), exactly as they report. Matching up the information provided in Appendix 3 with Figure 2, it appears that, of the 16 significant inconsistencies detected, 11 involved at least one contrast with a singleton trial.

It would be interesting to see a re-analysis of the data in which the variances were not artificially lowered. For example, the authors might assume that the true effects of singleton trials are drawn from a distribution with a standard deviation equal to the average of the between-trial standard deviations of the other treatment contrasts in the network. For triads consisting of 3 singleton trials, they might just use an average variance from the entire dataset for each contrast.

Estimating the between-studies variance in RE meta-analyses is recognised to be a difficult problem in both classical and Bayesian statistics; estimation of variances in networks is particularly complex, because under the null hypothesis (H0) of "consistency" that Song et al have set out to test, the true variances have to conform to special relationships known as "triangle inequalities" [4]. For example, in a triad of comparisons, under this H0, it is not possible to have zero variances for two of the comparisons but a non-zero variance for the third. Similarly, if one comparison has zero variance, then the variances of the other two must be equal. Indeed, even in the absence of singleton trials, there is always a risk of breaking the triangle inequalities if each contrast is analysed independently of the others, especially when there is a zero variance estimate. A strictly fair test of the consistency hypothesis cannot be constructed unless the triangle inequalities are respected.

But even if the technical difficulties were absent, Song et al's findings would tell us more about the reliability of the systematic reviews included in their study than about the reliability of indirect comparisons. It is not difficult to find systematic reviews where trials on completely different patient populations, or even treatments, are pooled together. This is a common practice, which we do not recommend [5, 6] - not just because it is likely to lead to inconsistency in network synthesis, but primarily because it can only produce uninterpretable pooled effects! Without checking the details of the inclusion criteria and interventions, it is hard to say whether the trials included in the reviews form a reasonable basis for any kind of synthesis.

Network meta-analysis is widely used to draw coherent conclusions about relative efficacy when there are more than 2 treatments for a given group of patients. The method has occasionally been criticised, but as far as we know, no alternatives have been proposed.

References

1. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011;343:d4909.

2. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. Journal of Clinical Epidemiology 1997;50:683-691.

3. DerSimonian R, Laird N. Meta-analysis of clinical trials. Controlled Clinical Trials 1986;7:177-188.

4. Lu G, Ades AE. Modelling between-trial variance structure in mixed treatment comparisons. Biostatistics 2009;10:792-805.

5. Caldwell DM, Gibb DM, Ades AE. Validity of indirect comparisons in meta-analysis [Letter]. Lancet 2007;369:270.

6. Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005;331:897-900.

**Competing interests:** The authors all work on methodology for indirect comparisons and network meta-analysis and run courses and workshops on the subject.


The title of the paper by Song et al. [1] could be much more optimistic; for example: "*86% agreement between direct and indirect comparisons of competing healthcare interventions*".

**References**

1. Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011;343:d4909

**Competing interests: **
No competing interests

**17 August 2011**

## Re: Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study

NOTE: After the BMJ website was re-designed and subjected to technical changes, our Rapid Response published on 31 August 2011 lost its link to Figure 1 and became difficult to read. For this reason, we re-submit our response in the format of the new website.

In past issues of the BMJ, there has been a lively debate on evidence-based methods, focused on the need to prefer absolute risks (ARs) over relative risks (RRs) [1,2] and on the evaluation of the strengths and limits of indirect comparisons [3]. These two topics have one thing in common: indirect comparisons (as well as all types of meta-analysis in general) are typically based on RRs (or odds ratios) and not on ARs or values of the number needed to treat (NNT).

In March 2011, we observed that the results of network meta-analysis (or, better, of adjusted indirect comparisons according to the definition by Song et al. [3]) could be summarised in a simple figure to increase their communicative value [4]. Since then, this method based on our "simplified figure" has found positive acceptance both in our national journals (e.g. [5]) and in international ones [6-10].

We present a revisited version of our simplified figure in which the values of RR are replaced by the corresponding values of NNT. This revisited figure is described using a previously published example about oral anticoagulants for atrial fibrillation [5]. The revisited analysis is presented in Figure 1; Panel A shows the results based on RRs, while Panel B shows the same results based on the values of NNT. The end-point for these analyses is the occurrence of pulmonary or systemic embolism or stroke. In Panel B, we used AR_{control group} = 505/13112, which is the overall crude event rate observed in the two control groups of the two trials evaluating dabigatran and rivaroxaban.

From a computational point of view, it is well known that

NNT = 1 / (AR_{control group} - AR_{treatment group})

and

RR_{TvC} = AR_{treatment group} / AR_{control group}

where TvC indicates the comparison of treatment group vs control group.

Consequently

AR_{treatment group} = RR_{TvC} x AR_{control group}

and, finally,

NNT = 1 / (AR_{control group} - RR_{TvC} x AR_{control group}) = 1 / [AR_{control group} x (1 - RR_{TvC})]

As shown by this latter equation, a RR can be converted into a NNT provided that the AR is available for at least one of the two groups being compared, typically the control group. In a network meta-analysis, if the value of AR for the control group is known, all calculations proceed in chain based on this value and no other values of AR are needed.
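The conversion derived above is straightforward to script. The sketch below uses the control-group event rate quoted in the text (505/13112), but the RR value is purely illustrative, not one of the estimates behind Figure 1:

```python
def nnt_from_rr(rr, ar_control):
    """Number needed to treat from a relative risk and the
    control-group absolute risk: NNT = 1 / [AR_control x (1 - RR)]."""
    return 1.0 / (ar_control * (1.0 - rr))

ar_control = 505 / 13112             # control-group event rate quoted above
nnt = nnt_from_rr(0.65, ar_control)  # RR = 0.65 is illustrative only
```

Applying the same function along each edge of the network, with the single control-group AR as anchor, is the "calculations proceed in chain" step described above.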

The example concerning oral anticoagulants for atrial fibrillation is interesting because Panel B of Figure 1 stresses that all values of NNT are greater than 70, while the upper limit of the 95% CIs in some cases reaches extremely high values (e.g. nearly 1000 in the comparison between dabigatran 150 mg/day and rivaroxaban 20 mg/day). Clearly, the results of this analysis shown in Panel B are much less impressive than those suggested by Panel A. While it has recently been observed that a new era has started for anticoagulation in atrial fibrillation [11], our Panel B instead stresses that the clinical relevance of the incremental benefits in this area is not as large as one might think. The apixaban trial published a few days ago [12] also generates an NNT of 163 (95% CI: 101 to 685; same primary end-point, i.e. pulmonary or systemic embolism or stroke, as that assessed in Figure 1).

In conclusion, one lesson that can be learnt from these examples is that network meta-analysis, too, can benefit from an increased use of the NNT. Some computational questions still need to be fully addressed in applying the NNT to network meta-analysis; for example, in our analysis of oral anticoagulants two controversial points are whether the values of NNT require adjustment for the different durations of follow-up and, more importantly, which event rate for the control group is the most appropriate.

**References**

1. McCartney M. Observations: Medicine and the media: The press release, relative risks, and the polypill. BMJ 2011;343: doi:10.1136/bmj.d4720.

2. Penston J. Relative risks and the polypill: Might BMJ lead crusade against reporting only relative risks? BMJ 2011;343:doi:10.1136/bmj.d5210.

3. Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011;343:d4909.

4. Fadda V, Maratea D, Trippoli S, Messori A. Network meta-analysis. Results can be summarised in a simple figure. BMJ 2011;342:d1555.

5. Fadda V, Maratea D, Passaro D, Trippoli S. Metanalisi a rete o network meta-analysis, un nuovo strumento di analisi delle evidenze. Esempi di applicazione. Boll. SIFO 2011;57:8-12.

6. Passaro D, Fadda V, Maratea D, Messori A. Anti-platelet treatments in acute coronary syndrome: simplified network meta-analysis. Int J Cardiol 2011, Jun 6.

7. Messori A, Del Santo F, Maratea D. First-line treatments for hepatitis C. Aliment Pharmacol Ther. 2011 Jun;33(12):1383-5.

8. Fadda V, Maratea D, Trippoli S, Messori A. Treatments for macular degeneration: summarising evidence using network meta-analysis. British Journal of Ophthalmology June 16, 2011;10.1136/bjophthalmol-2011-300316.

9. Messori A, Del Santo F, Maratea D, Trippoli S. Single-drug treatments for chronic hepatitis B: summarising current information on effectiveness by network meta-analysis. Aliment Pharmacol Ther. 2011 Jul;34(1):105-7. doi: 10.1111/j.1365-2036.2011.

10. Maratea D, Fadda V, Messori A, Trippoli S. Thromboprophylaxis after hip or knee arthroplasty: indirect comparison between three new oral anticoagulants. J Thromb Haemost 2011 Jun 28. doi: 10.1111/j.1538-7836.2011.

11. Mega JL. A new era for anticoagulation in atrial fibrillation. N Engl J Med online first, August 28, 2011 (10.1056/NEJMe1109748).

12. Granger CB, Alexander JH, McMurray JJV, et al. Apixaban versus warfarin in patients with atrial fibrillation. N Engl J Med online first, August 28, 2011 (10.1056/NEJMoa1107039).

Figure 1. Original and revisited versions (Panels A and B, respectively) of the "simplified" figure for presenting the results of direct and adjusted indirect comparisons. The clinical material is derived from two randomized clinical trials (ClinicalTrials.gov codes: NCT00403767 and NCT00262600); all analyses are based on the occurrence of pulmonary or systemic embolism or stroke in patients with atrial fibrillation. Both graphs show three direct comparisons (solid lines) and two indirect comparisons (dotted lines). Superiority is found in the direct comparison of dabigatran 150 mg/d vs. standard treatment as well as in the head-to-head comparison of dabigatran 150 mg/d vs rivaroxaban. Symbols: +, more effective at a statistical level of p<0.05; -, less effective at a statistical level of p<0.05; =, no difference; t, indicates which treatment is favoured by a trend in cases of no difference. Panel A has been reproduced, with modifications, from reference [5]; in Panel B, each comparison is associated with its respective value of NNT (with 95% CIs). Cases where the upper limit of the 95% CI for NNT extends to infinity did not reach the statistical level of p<0.05, while the others showed a significant difference at this level.

**Competing interests:** No competing interests

**02 July 2012**