Rapid response to:


Updated NICE guidance on chronic fatigue syndrome

BMJ 2020; 371 doi: (Published 16 December 2020) Cite this as: BMJ 2020;371:m4774

Rapid Response:

Did the NICE guideline committee follow GRADE methodology?

Dear Editor

I would like to respond to the comment by Jason Busse and colleagues as it includes some remarkable statements. The authors criticize the NICE guideline committee on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) for what they call “a disastrous misapplication of GRADE methodology.” In a draft document, the committee rated the quality of evidence for graded exercise therapy (GET) as low to very low.

As an example of an “appropriate” application of GRADE, Busse and colleagues refer to a contested Cochrane review on GET for ME/CFS. [1] This review, however, also rated the quality of evidence in support of GET as low to very low, with the sole exception of post-treatment fatigue, where the quality of evidence was rated as moderate. At follow-up assessments, the Cochrane review likewise rated the evidence that GET reduces fatigue as very low quality. This suggests that the difference between the two assessments is rather small.

The committee gave several arguments for downgrading the quality of evidence of GET, which are all in accord with GRADE methodology.

1) Indirectness. Following a 2015 report by the National Academy of Medicine, the NICE committee regarded post-exertional malaise (PEM) - a worsening of symptoms following exertion - as a characteristic feature of ME/CFS. Trials on GET used case definitions, such as the Oxford and Fukuda criteria, that were created in the 1990s and that do not require patients to experience PEM. The committee decided that a population diagnosed with such criteria may not accurately represent the ME/CFS population and that people experiencing PEM may respond differently to treatment than those who do not experience it. It was therefore agreed to downgrade the evidence for population indirectness. This is in agreement with other systematic reviews, which also differentiate between case definitions that require PEM and those that do not. [2] Busse and colleagues refer to the Cochrane review which performed a subgroup analysis showing “little or no difference between subgroups based on different diagnostic criteria.” All included studies in this review, however, used the Oxford or Fukuda criteria which do not require PEM.

2) Imprecision. The NICE committee downgraded for imprecision when a confidence interval crossed the minimally important difference (MID). This is in agreement with the GRADE handbook which suggests downgrading for imprecision “if a recommendation would be altered if the lower versus the upper boundary of the CI represented the true underlying effect.” Busse and colleagues, however, argue that researchers should not rate down for imprecision if the lower boundary of the confidence interval excludes a difference of 0.2 standard deviations. The authors do not clarify why such a small effect should be regarded as the clinical decision threshold. Independent estimates of the MID are a more appropriate choice and these are often larger than 0.2 standard deviations. [3]

3) Heterogeneity. The Cochrane review compared different forms of exercise therapy. In the trial by Wallman et al., for example, patients could reduce their activity level if exercise made them feel unwell, while other forms of GET were strictly time-contingent. The trial by Jason et al. used anaerobic exercise, while the others focused on aerobic exercise. The FINE trial did not prescribe GET but ‘pragmatic rehabilitation’, an intervention that was delivered by nurses at home. The trial by Powell et al. tested exercise therapy combined with patient education based on cognitive-behavioral principles. By combining these different interventions in one meta-analysis, the Cochrane review produced pooled estimates with high heterogeneity. The NICE committee therefore decided to differentiate more clearly between these forms of exercise therapy by performing multiple meta-analyses.

4) Risk of bias. The committee noted that GET trials focused on subjective outcomes even though neither patients nor therapists could be blinded to treatment allocation. This combination was considered an important limitation when interpreting the evidence. The figures cited by Busse and colleagues compare GET to a passive control condition where patients received less time and attention from healthcare providers. Patients in the GET group also received instructions to interpret their symptoms as less threatening and more benign. According to one therapist manual on GET, “participants are encouraged to see symptoms as temporary and reversible, as a result of their current physical weakness, and not as signs of progressive pathology.” Treatment manuals also included strong assertions designed to strengthen patients’ expectations of GET. One patient booklet stated: “You will experience a snowballing effect as increasing fitness leads to increasing confidence in your ability. You will have conquered CFS by your own effort and you will be back in control of your body again.” Patients in the control group received no such instructions. There is therefore a reasonable concern that the reduction in fatigue questionnaire scores in the GET group reflects response bias rather than a genuine reduction in fatigue. Other reviews have previously come to a similar conclusion. [4, 5]
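The disagreement about imprecision under point 2 can be made concrete with a small sketch. It is a simplified reading of the rule "rate down when the confidence interval crosses the decision threshold", not the committee's actual procedure. The function name and the confidence interval are hypothetical and chosen only for illustration; the 0.2 standard deviation threshold is the one proposed by Busse and colleagues, and the 0.5 standard deviation value is the half-standard-deviation estimate discussed in reference 3.

```python
def downgrade_for_imprecision(ci_lower: float, ci_upper: float, threshold: float) -> bool:
    """Return True when the confidence interval straddles the decision threshold.

    All values are effect sizes in standard deviation units, expressed so that
    larger numbers mean a larger benefit. This is an illustrative simplification.
    """
    return ci_lower < threshold < ci_upper


# Hypothetical confidence interval for a standardized mean difference (not trial data).
ci = (0.25, 0.75)

small_effect_threshold = 0.2  # threshold proposed by Busse and colleagues
half_sd_mid = 0.5             # half a standard deviation, as discussed in reference 3

rated_down_at_small_threshold = downgrade_for_imprecision(*ci, small_effect_threshold)
rated_down_at_half_sd_mid = downgrade_for_imprecision(*ci, half_sd_mid)

print("Rated down with 0.2 SD threshold:", rated_down_at_small_threshold)
print("Rated down with 0.5 SD MID:", rated_down_at_half_sd_mid)
```

With this hypothetical interval, the lower boundary excludes 0.2 standard deviations, so the evidence would not be rated down under the threshold proposed by Busse and colleagues, whereas the same interval crosses a 0.5 standard deviation MID and would be rated down. The choice of threshold, not the data, drives the difference.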

The recommendation from Busse and colleagues that lack of blinding should not result in downgrading the quality of evidence, even if subjective questionnaires are used as the primary outcome, is at odds with current understanding [6] and has far-reaching implications. It would mean either that drug trialists should no longer attempt to blind patients and therapists (because this wouldn’t affect the quality of evidence) or that behavioral interventions should be treated as an exception where the risk of response bias can freely be ignored because it is not practically feasible to blind patients and therapists. Additionally, if the GRADE system were used as Busse and colleagues recommend, there would be a high risk that quack treatments and various forms of pseudoscience would also appear to provide reliable evidence of effectiveness in randomized trials. All that is needed is an intervention where therapists actively manipulate how patients interpret and report their symptoms.

In conclusion, the NICE guideline committee followed GRADE methodology sensibly while the recommendations by Busse and colleagues are highly problematic.

1. Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2019;10:CD003200.

2. Wormgoor MEA, Rodenburg SC. The evidence base for physiotherapy in myalgic encephalomyelitis/chronic fatigue syndrome when considering post-exertional malaise: a systematic review and narrative synthesis. J Transl Med. 2021;19:1.

3. Norman GR, Sloan JA, Wyrwich KW. The truly remarkable universality of half a standard deviation: confirmation through another look. Expert Review of Pharmacoeconomics & Outcomes Research. 2004;4:581–5.

4. Vink M, Vink-Niese A. Graded exercise therapy for myalgic encephalomyelitis/chronic fatigue syndrome is not effective and unsafe. Re-analysis of a Cochrane review. Health Psychol Open. 2018;5:2055102918805187.

5. Tack M, Tuller DM, Struthers C. Bias caused by reliance on patient-reported outcome measures in non-blinded randomized trials: an in-depth look at exercise therapy for chronic fatigue syndrome. Fatigue: Biomedicine, Health & Behavior. 2020;8:181–92.

6. Hróbjartsson A, Emanuelsson F, Skou Thomsen AS, Hilden J, Brorson S. Bias due to lack of patient blinding in clinical trials. A systematic review of trials randomizing patients to blind and nonblind sub-studies. Int J Epidemiol. 2014;43:1272–83.

Competing interests: No competing interests

01 March 2021
Michiel Tack