Views & Reviews Personal View

Guidelines should reflect all knowledge, not just clinical trials

BMJ 2012; 345 doi: (Published 05 October 2012) Cite this as: BMJ 2012;345:e6702
  1. Teun Zuiderent-Jerak, associate professor of science and technology studies (1)
  2. Frode Forland, public health specialist, project leader, Collaboration for Evidence Based Healthcare in Africa (2)
  3. Fergus Macbeth, consultant oncologist (3)

  (1) Institute of Health Policy and Management, Erasmus University Rotterdam, PO Box 1738, Rotterdam 3000 DR, The Netherlands
  (2) KIT Biomedical Research, Royal Tropical Institute, Amsterdam, The Netherlands
  (3) Velindre NHS Trust, Cardiff, UK

Correspondence to: T Zuiderent-Jerak T zuiderent{at}

Over the past 20 years, evidence based medicine has had a substantial influence on clinical decision making throughout the developed world. It now underpins healthcare policy and the burgeoning industry of clinical guideline development. Two problems have resulted. Firstly, so called high level evidence is increasingly equated with strong recommendations; and secondly, evidence other than that derived from randomised controlled trials (RCTs) is seen as intrinsically less valuable or reliable.

The concept of a hierarchy of evidence with RCTs at the top is deeply ingrained, despite Sackett and colleagues’ warning that evidence based medicine “is not restricted to randomised trials and meta-analyses. It involves tracking down the best external evidence with which to answer our clinical questions.”1 They pointed out that RCTs and systematic reviews of RCTs will provide the most reliable evidence that a therapy will do more good than harm but acknowledged that “some questions about therapy do not require randomised trials (successful interventions for otherwise fatal conditions) or cannot wait for the trials to be conducted.”

They also made clear that other questions, such as those about diagnostic tests and prognosis, can only be answered using other forms of evidence. In his 2008 Harveian oration, Michael Rawlins, chairman of the National Institute for Health and Clinical Excellence (NICE), elegantly analysed the strengths and weaknesses of RCTs. He emphasised the need for other forms of evidence in decision making and criticised the concept of hierarchies of evidence, pointing out their great number and inconsistency. He added, “Hierarchies attempt to replace judgment with an oversimplistic, pseudoquantitative assessment of the quality of the available evidence.”2

Despite recent attempts to add nuance to hierarchies of evidence,3 the belief that RCTs and systematic reviews provide the best evidence to answer questions about therapy may pose problems for those trying to answer other clinical or public health questions. For example, NICE’s guidance on metastatic spinal cord compression includes 114 recommendations, 40% of which are about treatment. Only 7% of all its recommendations were based on RCTs; 40% were based on observational studies, including clinical audit.4 From the perspective of a hierarchy of evidence this guideline could easily be criticised for making recommendations that are based on “the lowest levels of published evidence . . . [or] little more than anecdote . . . and opinion,”5 rather than appreciated for the attempt to base its recommendations on the most robust knowledge available.

Nevertheless, hierarchies are widespread and influential in the business of guideline development. For example, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group attempts to tackle the shortcomings of grading systems in healthcare. GRADE has wisely separated the concepts of level of evidence and strength of recommendations. Yet the system it has proposed, which is increasingly used by guideline developers worldwide, includes an evidence hierarchy in which an RCT is initially labelled as being of “high quality” and an observational study “low quality,” without considering how relevant that form of study might be to the particular clinical question.6 Consequently, although GRADE is trying to include the assessment of other types of knowledge, the hierarchy of evidence, which Sackett and colleagues proposed for most recommendations about therapy, risks being applied universally, though this has been challenged.7

We lack methods to assess the value of other forms of knowledge relevant to the development of clinical guidelines. So called “considered judgment” has always been important when formulating recommendations from an assessment of the evidence, and consensus methods, once widely used, are now less common. In contrast to the fine grained assessment tools of GRADE for evidence that is high in the hierarchy, few (if any) methods have been agreed on to grade the evidence derived from, for instance, outbreak investigations, laboratory research, mathematical modelling, qualitative research, or quality improvement processes and clinical audit. This absence of grading methods leads to under-representation of evidence from such research in guidelines and guideline based health policy. The problem of implementation from which guidelines are said to suffer may arise because their recommendations do not cover the issues that clinicians struggle with, rather than because clinicians are resisting the evidence.8

The question is not whether a clinical guideline can make reliable recommendations in the absence of evidence from RCTs, but how it can. The Appraisal of Guidelines for Research and Evaluation (AGREE) instrument is a method for assessing the quality of clinical guidelines.9 A recent Dutch study that analysed 62 guidelines with the AGREE instrument found that availability of high level evidence resulted in a high AGREE score and vice versa.10 The absence of evidence from clinical trials was generally not compensated for by the inclusion of other types of studies or expert opinion. One reason for this correlation may be the tendency to frame starting questions in the light of available evidence from RCTs rather than to start with a clinical problem and collect the best external evidence accordingly. Another reason may be the absence of good instruments for weighing and including different types of knowledge.

To bring guideline development more in line with what the pioneers of evidence based medicine envisioned, and to mitigate the increasing risk of confusing clinical practice guidelines with evidence summaries, experience must be gathered about the application and integration of different types of knowledge into guideline development worldwide. The Guidelines International Network, a global self-funding network of organisations and individuals, undertook to collect such experience. At its conference in Berlin in August 2012, a panel session and workshop started a discussion about the processes needed for reaching international consensus about integrating other types of evidence into guidelines for public health and clinical practice. More than 50 guideline developers from several countries discussed appraising study designs for complex, organisational interventions, ways of including qualitative studies, and the use of registries in developing recommendations. There was wide interest in a dedicated working group for further developing methods for weighing and including different types of knowledge in guidelines. These efforts should not be seen as criticism of the current production and use of clinical evidence, but as an attempt to renew our understanding of what counts as the best evidence for different clinical problems.




  • Competing interests: FF and FM are members of the board of the Guidelines International Network. The workshop in August 2012 was financially supported by the Guidelines International Network and the Dutch Council for Quality of Healthcare.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

