Education And Debate

North of England evidence based guidelines development project: methods of developing guidelines for efficient drug use in primary care

BMJ 1998; 316 doi: (Published 18 April 1998) Cite this as: BMJ 1998;316:1232
  1. Martin Eccles (Martin.Eccles{at}), professor a,
  2. Nick Freemantle, senior research fellow b,
  3. James Mason, research fellow b
  1. a Centre for Health Services Research, University of Newcastle upon Tyne, Newcastle upon Tyne NE2 4AA
  2. b Centre for Health Economics, University of York, York YO1 5DD
  1. Correspondence to: Professor Eccles
  • Accepted 11 December 1997

Practice guidelines are valid if “they lead to the health gains and costs predicted for them.”1 When implemented, valid guidelines lead to changes in clinical practice and improvements in outcomes for patients.2-5 Invalid guidelines, however, may lead to the use of ineffective interventions that waste resources, or even to harm.

Guidelines must offer recommendations for both effective and efficient care, and these have not previously been available in the United Kingdom. We have reported the development and content of guidelines for primary care in the United Kingdom based explicitly on evidence of effectiveness.6-9 Here, we present the methods used to develop evidence based guidelines on the use in primary care of four important groups of drugs—angiotensin converting enzyme inhibitors in patients with heart failure, choice of antidepressants, non-steroidal anti-inflammatory drugs in patients with osteoarthritis, and aspirin as an antithrombotic agent.10-13 Abridged versions of the guidelines on angiotensin converting enzyme inhibitors, aspirin, and non-steroidal anti-inflammatory drugs will be published in subsequent articles.14-16

Summary points

Guideline development groups defined important clinical questions, produced search criteria, and drew up protocols for systematic review and, where appropriate, meta-analysis

Medline and Embase were searched for systematic reviews and meta-analyses, randomised trials, quality of life studies, and economic studies

Meta-analysis was used extensively by the group to answer specific clinical questions

Statements on evidence were categorised in relation to study design, reflecting their susceptibility to bias

Strength of recommendations was graded according to the category of evidence and its applicability, economic issues, values of the guideline group and society, and the groups' awareness of practical issues

Recommendations cease to apply in December 1999, by which time relevant results that may affect recommendations may be known

Guideline development groups

Guideline development groups comprised three broad classes of members—relevant healthcare professionals (up to five general practitioners (all with an interest and postgraduate training in primary care therapeutics), up to two hospital consultants, a health authority medical or pharmaceutical adviser, and a pharmacist); specialist resources (an epidemiologist (NF) and a health economist (JM)); and a specialist in guideline methodology and in leading small groups (ME). All group members were offered reimbursement of their travelling expenses and general practitioners could also claim for any expenses incurred in employing a locum.

Evidence: identification and overview

As a first step, the guideline development groups defined a set of clinical questions within the area of the guideline. This ensured that the guideline development work outside the meeting focused on issues that practitioners considered important and produced criteria for the search and the protocol for systematic review and, where appropriate, meta-analysis.

Search strategy

Searches were undertaken using Medline and, where appropriate, Embase. Using a combination of subject heading and free text terms, the search strategies located systematic reviews and meta-analyses, randomised trials, quality of life studies, and economic studies. Further details of the specific search strategies are provided in the full versions of the guidelines.10-13 Recent, high quality review articles and bibliographies and contacts with experts were used extensively. New searches were concentrated on areas where existing systematic reviews were unable to provide valid or up to date answers. The search strategy was backed up by the expert knowledge and experience of group members.

Synthesising published reports

We assessed the quality of relevant studies retrieved and their ability to provide valid answers to the questions posed. Assessment of the quality of studies considered issues of internal, external, and construct validity.17 The criteria used are shown in the box. Once individual papers had been assessed for methodological rigour and clinical importance, the information was synthesised.

Criteria for assessing quality of randomised trials

  • Appropriateness of inclusion and exclusion criteria

  • Concealment of allocation

  • Blinding of patients

  • Blinding of health professionals

  • Objective or blind method of data collection

  • Valid or blind method of data analysis

  • Completeness and length of follow up

  • Appropriateness of outcome measures

  • Statistical power of results

Describing evidence

We used meta-analysis to summarise and describe the results of studies, conducting analyses to answer specific questions raised by the guideline development groups. Our primary aim was to provide valid estimates of treatment effects using approaches that provided results in a form that could best inform treatment recommendations.

Meta-analyses combine statistically the results from similar studies and provide a weighted average of study estimates of effect. The most important criterion for combining studies is that their combination makes practical sense and, therefore, the results are interpretable. Statistical analysis procedures for meta-analysis using different outcomes are essentially analogous; all involve large sample theory and differ mainly in the details of calculations of standard errors and bias correction.18 Fixed effects models assume a common underlying effect and weight each study by the inverse of the variance. Random effects models assume a distribution of effects and incorporate this heterogeneity into the overall estimate of effect and its precision. Decisions on the appropriateness of fixed or random effects models were based primarily on a priori assumptions about the construct being tested in each case. Where heterogeneity between studies was identified, we also routinely reported random effects results.
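The difference between the two models can be illustrated with a minimal sketch. It assumes each study supplies an effect estimate (for example, a log odds ratio) and its variance. The random effects step uses the DerSimonian and Laird moment estimator of between study variance, which is an assumption made here for illustration; the text above does not say which estimator was used. The function names and the three sets of trial figures are hypothetical.

```python
import math


def pool_fixed(estimates, variances):
    """Inverse variance weighted average and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def pool_random(estimates, variances):
    """Random effects pooling (DerSimonian-Laird estimator, assumed here)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(estimates, variances)
    # Cochran's Q: variation between studies relative to their precision.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Adding tau2 to every study variance makes the weights more equal.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2


# Hypothetical log odds ratios and variances from three trials.
log_ors = [-0.20, -0.35, 0.05]
variances = [0.010, 0.040, 0.020]

fixed_est, fixed_se = pool_fixed(log_ors, variances)
random_est, random_se, tau2 = pool_random(log_ors, variances)
print(f"fixed:  {fixed_est:.3f} (SE {fixed_se:.3f})")
print(f"random: {random_est:.3f} (SE {random_se:.3f}), tau^2 = {tau2:.4f}")
```

When the estimated between study variance is zero the two models give the same answer; otherwise the random effects estimate has the wider standard error.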

Meta-analysis of binary data

Worked example: In the Studies of Left Ventricular Dysfunction (SOLVD) treatment trial of enalapril in patients with heart failure, there were 452 deaths among 1285 patients randomised to receive enalapril and 510 deaths among 1284 patients allocated to placebo at the end of the four years' follow up. In a two by two table these data provide an odds ratio of 0.82, a risk ratio of 0.89, and a risk difference of −0.045 (a 4.5 percentage point reduction in the absolute risk of death).
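The three measures in the worked example can be reproduced from the counts given. The following is a minimal sketch; the function name `two_by_two` and its argument names are illustrative only.

```python
def two_by_two(events_treat, n_treat, events_ctrl, n_ctrl):
    """Odds ratio, risk ratio and risk difference from a two by two table."""
    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl
    # Odds are events divided by non-events in each arm.
    odds_treat = events_treat / (n_treat - events_treat)
    odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)
    return {
        "odds_ratio": odds_treat / odds_ctrl,
        "risk_ratio": risk_treat / risk_ctrl,
        "risk_difference": risk_treat - risk_ctrl,
    }


# Counts from the worked example: enalapril arm, then placebo arm.
result = two_by_two(452, 1285, 510, 1284)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```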

Publication bias

Publication bias and missing data can undermine substantially the validity of meta-analyses.21 Besides using sensitive search strategies, we went to considerable lengths to obtain missing data from the trials identified. We wrote to investigators and the companies sponsoring them, and followed up non-respondents with further letters and, where appropriate, other forms of communication.

Binary outcomes

Meta-analysis of binary data, such as the number of deaths in a randomised trial, enables the results of a group of trials to be expressed in several ways (box). The pooled odds ratio is a statistically robust measure but is hard to interpret clinically; risk ratios are easier to interpret. Both are inadequate for exploring the practical implications of interventions in primary care. Risk differences are less helpful for exploring underlying effects, but are useful for describing the importance of the effects of an intervention in practice. Pooled risk differences can be adjusted for time of exposure when reviews include trials of varying lengths. This provides estimates of annual risk that can also be expressed as numbers needed to treat.22
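As a rough illustration of how a risk difference translates into a number needed to treat, the sketch below takes the risk difference of −0.045 over four years from the worked example. Dividing by the length of follow up assumes a constant absolute effect in each year; this is a deliberately crude simplification introduced here, and it is not the adjustment method described in the cited reference. The function names are hypothetical.

```python
def number_needed_to_treat(risk_difference):
    """Reciprocal of the absolute risk difference."""
    if risk_difference == 0:
        raise ValueError("no difference in risk, so the NNT is undefined")
    return 1.0 / abs(risk_difference)


def annualised_risk_difference(risk_difference, years):
    """Crude annual risk difference, assuming a constant effect per year."""
    return risk_difference / years


risk_difference = -0.045  # over four years of follow up
nnt_four_years = number_needed_to_treat(risk_difference)
nnt_one_year = number_needed_to_treat(
    annualised_risk_difference(risk_difference, 4)
)
print(f"NNT over four years: {nnt_four_years:.1f}")
print(f"NNT for one year (crude): {nnt_one_year:.1f}")
```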

Continuous outcomes

Where continuous outcomes are measured similarly in different studies, meta-analysis can be used to calculate a weighted mean difference. If measurement between studies is not undertaken using a common metric—because different instruments are used or poor reliability between those undertaking rating is likely—standardised scores based on variance within the study may be calculated for each trial. This approach, for example, enables the statistical pooling of outcomes expressed in different versions of the Hamilton depression rating scale, in which the 17 and 21 item forms are commonly used. We used the straightforward approach proposed by Hedges and Olkin,25 in which the variance estimate is based upon the intervention and control group and the effect size is corrected for bias due to small sample size.
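A standardised effect size of this kind can be sketched as follows, assuming each trial reports a mean, standard deviation, and sample size for both groups. The pooled standard deviation, the small sample correction factor 1 − 3/(4df − 1), and the large sample variance shown are the forms commonly attributed to Hedges and Olkin; the trial figures and the function name are hypothetical.

```python
import math


def hedges_g(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Bias corrected standardised mean difference and its variance."""
    df = n_treat + n_ctrl - 2
    # Standard deviation pooled across intervention and control groups.
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (mean_treat - mean_ctrl) / pooled_sd
    # Approximate correction for the upward bias of d in small samples.
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = correction * d
    # Large sample approximation to the variance of g.
    n_total = n_treat + n_ctrl
    variance = n_total / (n_treat * n_ctrl) + g ** 2 / (2.0 * n_total)
    return g, variance


# Hypothetical end of trial depression scores (lower is better).
g, var_g = hedges_g(12.0, 6.0, 50, 15.0, 6.0, 50)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

Because the result is expressed in standard deviation units, effect sizes from trials using different versions of a rating scale can then be pooled by inverse variance weighting.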

Economic analysis

The guidelines include systematic appraisals of effectiveness, compliance, safety, health service resource use, and costs of medical interventions in British general practice. The economic analysis is presented in a straightforward manner, showing the possible bounds of cost effectiveness that may result from treatment. Lower and higher estimates of cost effectiveness reflect the available evidence and the concerns of the guideline development group. Economic analyses are susceptible to bias through the methods used; we avoided making strong statements where uncertainty existed. However, the simplicity of presentation permits simple reworking with different values from the ones used by the group. This practice reflects the desire of group members for understandable and robust information upon which to base recommendations.

Presenting a review of previous economic analyses which have adopted a variety of differing perspectives, analytic techniques, and baseline data was not considered helpful. However, economic reports were reviewed to compare findings of the guideline project with representative published economic analyses and to interpret differences when these occurred.

Categorising evidence

Summarised evidence was categorised according to study design, reflecting its susceptibility to bias. The box shows the categories in descending order of importance. Categories of evidence were adapted from the classification of the United States Agency for Health Care Policy and Research.26 Questions were answered using the best evidence available. If, for example, a question on the effect of an intervention could be answered by category I evidence, then studies of weaker design (controlled studies without randomisation) were not reviewed. This categorisation is most appropriate to questions of causal relations. Similar taxonomies for other types of research question do not yet exist.

Categories of evidence

Ia—Evidence from meta-analysis of randomised controlled trials

Ib—Evidence from at least one randomised controlled trial

IIa—Evidence from at least one controlled study without randomisation

IIb—Evidence from at least one other type of quasi-experimental study

III—Evidence from descriptive studies, such as comparative studies, correlation studies and case-control studies

IV—Evidence from expert committee reports or opinions or clinical experience of respected authorities, or both

Strength of recommendation

Informal consensus methods were used to derive recommendations, which reflect the certainty with which the effectiveness and cost effectiveness of a medical intervention can be recommended. Recommendations are based upon consideration of the following: the strength of evidence, the applicability of the evidence to the population of interest, economic considerations, values of the guideline developers and society, and guideline developers' awareness of practical issues. While the process of interpreting evidence inevitably involves value judgments, we clarified the basis of these judgments as far as possible by making this process explicit. The relation between the strength of a recommendation and the category of evidence is shown in the box.

Strength of recommendation

A—Directly based on category I evidence

B—Directly based on category II evidence or extrapolated recommendation from category I evidence

C—Directly based on category III evidence or extrapolated recommendation from category I or II evidence

D—Directly based on category IV evidence or extrapolated recommendation from category I, II or III evidence
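The relation in the box can be written as a simple lookup. This sketch encodes only what the box states: a recommendation directly based on a category of evidence takes the matching grade, and an extrapolated recommendation may take any weaker grade. It does not capture applicability, economic considerations, or the values and practical judgments applied through informal consensus. The function name and the handling of extrapolation from category IV (which the box does not mention) are assumptions.

```python
DIRECT_GRADE = {"I": "A", "II": "B", "III": "C", "IV": "D"}
GRADES = ["A", "B", "C", "D"]


def permitted_grades(category, extrapolated=False):
    """Grades the box allows for a given category of evidence."""
    # Subcategories such as Ia, Ib, IIa and IIb share their parent's grade.
    parent = category.rstrip("ab")
    direct = DIRECT_GRADE[parent]
    if not extrapolated:
        return [direct]
    # An extrapolated recommendation can only take a weaker grade.
    return GRADES[GRADES.index(direct) + 1:]


print(permitted_grades("Ib"))                      # directly based
print(permitted_grades("Ia", extrapolated=True))   # extrapolated
print(permitted_grades("IIa", extrapolated=True))
```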

Areas without evidence

Informal consensus methods were used to develop recommendations in areas where there was no evidence. This process sometimes identified important unanswered research questions. These are recorded at the end of the relevant section of the guideline.

Review of the guideline


External reviewers were chosen to reflect three groups: potential users of the guidelines, experts in the subject area, and guideline methodologists. Although the reviewers' comments influenced the style and content of the guidelines, these remained the responsibility of the development group.

Scheduled review

The recommendations of these guidelines cease to apply at the end of 1999, by which time new, relevant results that may affect recommendations are likely to be available.


We thank the following for their contribution to the functioning of the guidelines development group and the development of the practice guideline: Janette Boynton, Anne Burton, Julie Glanville, Susan Mottram.

Funding: The development of the guideline was funded by the Prescribing Research Initiative of the Department of Health.

Conflict of interest: None.

