Commentary: Outcome measures were flawed

BMJ 2010; 340 doi: (Published 03 June 2010) Cite this as: BMJ 2010;340:c2693
G C Ebers, Action Research professor of clinical neurology
University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU
George.Ebers{at}

    Interferons were introduced for multiple sclerosis in the early 1990s, after US-Canadian trials showed effects on clinical relapse rate and magnetic resonance imaging (MRI) spots, which were taken as surrogate outcomes for disability.1 The drug companies who marketed the interferons, and later glatiramer acetate, were given extended patent protection under the Orphan Drug Act. Under the terms of this act surrogate markers of response to treatment can be relied on if experts certify their validity. The lack of data on hard outcomes of disability, such as the need to use a stick or wheelchair, was accepted because multiple sclerosis is a 30-40 year disease, with only half of those affected becoming moderately disabled in a decade, and keeping trials intact beyond a few years proved difficult.

    Many specialists thought the visually obvious spots on MRI “were the disease.” As a result MRI scanning soon became indispensable for multiple sclerosis trials and individual high profile MRI centres capitalised on lucrative contracts with industry. Over the next two decades, little effort was made to validate the suppression of MRI spots against hard disease outcomes. Amid the enthusiasm for short term MRI monitoring of the impact of interferons, their lack of impact on long term disability (despite suppression of MRI spots) in patients with secondary progressive disease was ignored.2

    The National Institute for Health and Clinical Excellence (NICE) rejected interferons and glatiramer acetate for NHS use on the basis of a pharmacoeconomic analysis that showed, even with best case scenarios for surrogate outcomes, the price of the drugs would exceed guidelines for efficacy in terms of price per quality adjusted life years.1 3 4 This largely circumvented any debate about whether the drugs were actually effective. Nevertheless, interested parties, including patient groups and charities, were up in arms because there seemed to be a consensus that the drugs worked but were too expensive. The government then made a political decision to make the drugs available within the risk sharing scheme.

    Validity of outcome measures

    At around the time the scheme was launched, my group’s independent analysis of data from the placebo arms of 31 large clinical trials found that the pivotal outcomes to be used in the scheme, including short term disability scores and relapse rates, were not valid surrogates.5 With no improvement in the treated arms within these original trials, efficacy hinged on preventing the worsening seen in those receiving placebo. The trials had defined disability progression as increases of 0.5 to 1 points on a standardised disability scale (Kurtzke) confirmed at 3-6 months, a measure that is clearly subject to—and jaw droppingly within—inter-rater variability.6 We found that patients in the placebo group improved just as often as they worsened, by amounts equivalent to the clinical criteria for treatment failure. It was thus evident that what was being measured was random variation and measurement error in imperfectly blinded studies.5
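The point about measurement error can be illustrated with a small simulation. This is only a sketch and not the analysis reported in reference 5: the Gaussian rater error, its standard deviation (`noise_sd`), the fixed true score of 4.0, and the three-visit design are all assumptions chosen for illustration. Each simulated patient has a true disability score that never changes, yet some still meet a "confirmed" 1 point change in either direction.

```python
import random


def simulate_stable_patients(n_patients=20000, noise_sd=0.75, threshold=1.0, seed=0):
    """Count 'confirmed' changes in patients whose true disability never changes.

    All parameters are hypothetical; none is taken from the trials discussed.
    """
    rng = random.Random(seed)

    def observed_score(true_score):
        # Assumed rater error: Gaussian noise, rounded to 0.5 point steps.
        return round((true_score + rng.gauss(0, noise_sd)) * 2) / 2

    worsened = improved = 0
    for _ in range(n_patients):
        true_score = 4.0  # assumed constant: no real progression or recovery
        baseline = observed_score(true_score)
        follow_up = observed_score(true_score)
        confirmation = observed_score(true_score)
        # "Confirmed" change: the threshold is met at follow-up and again
        # at the later confirmation visit, relative to the baseline score.
        if min(follow_up, confirmation) - baseline >= threshold:
            worsened += 1
        elif baseline - max(follow_up, confirmation) >= threshold:
            improved += 1
    return worsened / n_patients, improved / n_patients


worsened_rate, improved_rate = simulate_stable_patients()
print(f"confirmed worsening: {worsened_rate:.1%}, "
      f"confirmed improvement: {improved_rate:.1%}")
```

Because the assumed error is symmetric, apparent worsening and apparent improvement occur at similar rates in this sketch, which is the pattern one would expect if an outcome mostly reflects noise. How large those rates are depends entirely on the assumed `noise_sd`.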

    So if the disability measures were not measuring disability, what about the MRI changes? Multivariate analysis of data from the placebo arms found that changes in the MRI spots made no independent contribution to end of trial outcome; the effect of the changes was accounted for by clinical features such as duration of disease—something that can be measured at no cost.7

    Thus the only outcome measure that remained was relapse frequency—and this was unambiguously reduced in patients undergoing treatment in the risk sharing scheme. However, total relapse numbers do not predict the time to disability or death. Although relapse frequency in the first two years after onset has some association with long term outcome,8 participants in the pivotal trials of interferon and glatiramer acetate had disease durations of several years.1 3

    Long term data

    Although the drugs have been licensed for 20 years, we still have no clear evidence on long term outcome. This is because the FDA failed to tie approval of interferon to mandatory follow-up for hard outcomes in the original trial patients, as had been suggested, and did not enforce its requirement for validation of the original surrogates.9 The agency, beset by aggressive criticism from Congress, shifted the onus for gauging effectiveness onto practising physicians. Many US and EU physicians failed to perceive the added responsibility. A Cochrane review had conceded very short term efficacy only.10 Evidence from more recent long term studies is not definitive and ascertainment is suboptimal.11 12 13

    The sobering interim findings from the risk sharing scheme spotlight broad issues about the importance of determining efficacy in chronic diseases and, in particular, the methods required to do this. The scheme also emphasises the “fragility” of adopting surrogate outcomes of impact. Shortcuts are wanted and needed, but measures with face validity and formal validation remain essential.

    The risk sharing scheme lacks randomisation, parallel control groups, and blinding of patient or examiner. It is completely reliant on data on natural course from a previous generation. That said, it may be possible to evaluate hard outcomes over time, and Boggild and colleagues are right that it is too early to draw conclusions about efficacy.14 Who could have predicted that trials exposing patients to substantial risks would continue for two decades using the same unvalidated surrogates?

    More generally, the scheme’s findings raise questions about industrial-academic relationships and their governance. The scheme may have been well intentioned, but perhaps the public interest would be served by an independent inquiry. As McCabe and colleagues emphasise, expenditures of £0.5bn entail opportunity costs that must now be weighed against any benefit of these therapies.




    • doi:10.1136/bmj.c1786
    • doi:10.1136/bmj.c1672
    • doi:10.1136/bmj.c2707
    • Competing interests: The author has completed the unified competing interest form at (available on request from the corresponding author) and declares (1) no financial support for the submitted work from anyone other than their employer; (2) research support from Bayer Schering; (3) no spouses, partners, or children with relationships with commercial entities that might have an interest in the submitted work; and (4) that he was involved in collecting data for the London Ontario database, which the risk sharing scheme used as a control, and was an unpaid adviser to the scheme and to the ScHARR team that did the pharmacoeconomic analysis.