
Education And Debate Improving the quality of health care

Research methods used in developing and applying quality indicators in primary care

BMJ 2003; 326 doi: (Published 12 April 2003) Cite this as: BMJ 2003;326:816
  1. S M Campbell, research fellow (stephen.campbell{at},
  2. J Braspenning, senior researcher (b),
  3. A Hutchinson, professor in public health (c),
  4. M N Marshall, professor of general practice (a)
  1. (a) National Primary Care Research and Development Centre, University of Manchester, Manchester M13 9PL
  2. (b) UMC St Radboud, WOK, Centre for Quality of Care Research, (229), Postbus 9101, 6500 HB Nijmegen, Netherlands
  3. (c) University of Sheffield, Section of Public Health, Sheffield, S1 4DA
  1. Correspondence to: S Campbell

    Before we can take steps to improve the quality of health care, we need to define what quality care means. This article describes how to make best use of available evidence and reach a consensus on quality indicators

    Quality improvement is part of the daily routine for healthcare professionals and a statutory obligation in many countries. Quality can be improved without measuring it—for example, by guiding care prospectively in the consultation using clinical guidelines.1 It is also possible to assess quality without quantitative measures, by using approaches such as peer review, videoing consultations, and patient interviews. Measurement, however, plays an important part in improvement.2 We discuss the methods available for developing and applying quality indicators in primary care.

    Summary points

    Most quality indicators are used in hospital practice but they are increasingly being developed for primary care

    The information required to develop quality indicators can be derived by systematic or non-systematic methods

    Non-systematic methods are quick and simple but the resulting indicators may be less credible than those developed by using systematic methods

    Systematic methods can be based directly on scientific evidence or clinical guidelines or combine evidence and professional opinion

    All measures should be tested for acceptability, feasibility, reliability, sensitivity to change, and validity

    What are quality indicators?

    Indicators are explicitly defined and measurable items referring to the structures, processes, or outcomes of care.3 Indicators are operationalised by using review criteria and standards, but they are not the same thing; indicators are also different from guidelines (box 1). Care rarely meets absolute standards,5 and standards have to be set according to local context and patient circumstances. 6 7

    Activity indicators measure how frequently an event happens, such as the rate of influenza immunisation. In contrast, quality indicators infer a judgment about the quality of care provided,6 and performance indicators8 are statistical devices for monitoring performance (such as use of resources) without any necessary inference about quality. Indicators do not provide definitive answers but indicate potential problems or good quality of care. Most indicators have been developed for use in hospitals but they are increasingly being developed for use in primary care.

    Principles of development

    Three preliminary issues require consideration when developing indicators. The first is which aspects of care to assessw1 w2: structures (staff, equipment, appointment systems, etc),w3 processes (such as prescribing, investigations, interactions between professionals and patients),9 or outcomes (such as mortality, morbidity, or patient satisfaction).w4 Our focus is on process indicators, which have been the primary object of quality assessment and improvement. 2 10 The second issue is that stakeholders have different perspectives about quality of care.2 w5 For example, patients often emphasise good communication skills, whereas managers' views are often influenced by data on efficiency. It is important to be clear which stakeholder views are being represented when developing indicators. Finally, development of indicators requires supporting information or evidence. This can be derived by systematic or non-systematic methods.

    Box 1: Definitions and examples of guidelines, indicators, review criteria, and standards


    Non-systematic research methods

    Non-systematic approaches are not evidence based, but indicators developed in this way can still be useful, not least because they are quick and easy to create. One example is a quality improvement project based on a single case study, such as a termination of pregnancy in a 13 year old girl. 11 12 Examination of her medical records showed two occasions when contraception could have been discussed, and this led to the development of a quality indicator relating to contraceptive counselling.

    Systematic, evidence based methods

    Whenever possible, indicators should be based solely on scientific evidence such as rigorously conducted (trial based) empirical studies. 13 14 The better the evidence, the stronger the benefits of applying the indicators in terms of reduced morbidity and mortality. An example of an evidence based indicator is that patients with confirmed coronary artery disease should receive low dose (75 mg) aspirin unless contraindicated, as aspirin is associated with health benefits in such patients.
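
    As a minimal sketch of how an indicator like the aspirin example could be measured in an audit, the Python below counts eligible patients (the denominator) and those who meet the criterion (the numerator). The record fields `has_cad`, `aspirin_contraindicated`, and `on_aspirin`, and the sample records, are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical audit fields; a real audit would define these from the medical record.
    has_cad: bool                  # confirmed coronary artery disease
    aspirin_contraindicated: bool
    on_aspirin: bool               # receiving low dose aspirin


def aspirin_indicator(records):
    """Return (numerator, denominator, proportion) for the aspirin criterion.

    Denominator: patients with confirmed disease and no contraindication.
    Numerator: those in the denominator who receive aspirin.
    """
    eligible = [r for r in records if r.has_cad and not r.aspirin_contraindicated]
    met = sum(1 for r in eligible if r.on_aspirin)
    proportion = met / len(eligible) if eligible else None
    return met, len(eligible), proportion


# Illustrative records only.
records = [
    PatientRecord(True, False, True),
    PatientRecord(True, False, False),
    PatientRecord(True, True, False),    # contraindicated, so excluded
    PatientRecord(False, False, False),  # no coronary artery disease, so excluded
    PatientRecord(True, False, True),
]

met, eligible, proportion = aspirin_indicator(records)
print(f"{met} of {eligible} eligible patients receive aspirin")
```

    Excluding contraindicated patients from the denominator mirrors the "unless contraindicated" wording of the indicator; the resulting proportion can then be compared with whatever standard is set locally.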

    Systematic methods combining evidence and expert opinion

    Many areas of health care have a limited or methodologically weak evidence base, 2 6 15 especially within primary care. Quality indicators therefore have to be developed using other evidence alongside expert opinion. However, because experts often disagree on the interpretation of evidence, rigorous methods are needed to incorporate their opinion.

    Consensus methods are structured facilitation techniques that explore consensus among a group of experts by synthesising opinions. Group judgments are preferable to individual judgments, which are prone to personal bias. Several consensus techniques exist,16-19 including consensus development conferences,17 w6 the Delphi technique,w7 w8 the nominal group technique,w9 the RAND appropriateness method,20 w10 and iterated consensus rating procedures (table).21

    Characteristics of informal and formal methods for developing consensus*


    Consensus development conferences

    In this technique, a selected group of about 10 people are presented with evidence by interested individuals or organisations that are not part of the decision making group. The selected group discusses this evidence and produces a consensus statement.w11 However, unlike the other techniques, these conferences use implicit methods for aggregating the judgments of individuals (such as majority voting). Explicit techniques use aggregation methods in which panellists' judgments are combined using predetermined mathematical rules, such as the median of individual judgments.17 Moreover, although these conferences provide a public forum for debate, they are expensive16 and there is little evidence of their effect on clinical practice or patient outcomes.w12
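
    To illustrate what an explicit aggregation rule looks like in practice, the following Python sketch combines panellists' ratings with the median, as mentioned above. The statements, the 1-9 rating scale, and the ratings themselves are hypothetical.

```python
from statistics import median


def aggregate_by_median(ratings_by_statement):
    """Combine each statement's individual ratings with a predetermined rule (the median)."""
    return {statement: median(ratings)
            for statement, ratings in ratings_by_statement.items()}


# Hypothetical ratings from five panellists on an assumed 1-9 scale.
panel_ratings = {
    "statement A": [8, 9, 7, 8, 9],
    "statement B": [3, 5, 2, 8, 4],
}

group_judgments = aggregate_by_median(panel_ratings)
for statement, value in group_judgments.items():
    print(statement, value)
```

    Because the rule is fixed in advance, the group judgment does not depend on who argues most persuasively; in "statement B" the single high rating of 8 does not pull the median upwards as it would a mean.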

    Indicators derived from guidelines by iterated consensus rating procedure

    Indicators can be based on clinical guidelines.w13 w14 Review criteria derived directly from clinical guidelines are now part of NHS policy in England and Wales through the work of the National Institute for Clinical Excellence. One example is the management of type 2 diabetes.w15 Iterated consensus rating is the most commonly used method in the Netherlands,w13 w16 where indicators are based on the effect of guidelines on outcomes of care rated by expert panels and lay professionals.w17

    Delphi technique

    The Delphi technique is a postal method involving two or more rounds of questionnaires. Researchers clarify a problem, develop questionnaire statements to rate, select panellists to rate them, conduct anonymous postal questionnaires, and feed back results (statistical, qualitative, or both) between rounds. It has been used to develop prescribing indicators.w18 A large group can be consulted from a geographically dispersed population, although different viewpoints cannot be debated face to face. Delphi procedures have also been used to develop quality indicators with users or patients.w19
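
    The statistical feedback between Delphi rounds could be prepared along the following lines. This is a sketch only: the choice of median and quartiles as the summary, the anonymous panellist identifiers, and the ratings are all assumptions rather than details given in the article.

```python
from statistics import median, quantiles


def round_feedback(ratings):
    """Summarise one statement's ratings and build per-panellist feedback.

    ratings: {panellist_id: rating}. Each panellist is shown the group
    summary next to their own rating, without seeing who gave which rating.
    """
    values = sorted(ratings.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    summary = {"median": median(values), "q1": q1, "q3": q3}
    feedback = {pid: {"own_rating": rating, **summary}
                for pid, rating in ratings.items()}
    return summary, feedback


# Hypothetical first-round ratings for a single questionnaire statement.
round_one = {"p1": 6, "p2": 7, "p3": 9, "p4": 7, "p5": 8}

summary, feedback = round_feedback(round_one)
print(summary)
print(feedback["p3"])
```

    In the next round each panellist can keep or revise their rating in light of the group summary, which is how opinion can converge without face to face debate.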

    Nominal group technique

    The nominal group technique aims to structure interaction within a group of experts. 16 17 w9 The group members meet and are asked to suggest, rate, or prioritise a series of questions, discuss the questions, and then re-rate and prioritise them. The technique has been used to assess the appropriateness of clinical interventionsw20 and to develop clinical guidelines.w21 This technique has not been used to develop quality indicators with patients, although it has been used to determine patients' views of, for example, diabetes.w22

    RAND appropriateness method

    The RAND method requires a systematic literature review for the condition to be assessed, generation of indicators based on this literature review, and the selection of expert panels. This is followed by a postal survey, in which panellists are asked to read the evidence and rate the preliminary indicators, and a face to face panel meeting, in which panellists discuss and re-rate each indicator.w10 The method therefore combines characteristics of both the Delphi and nominal group techniques. It has been described as the only systematic method of combining expert opinion and evidence.w23 It also incorporates a rating of the feasibility of collecting data.

    The method has been used mostly to develop review criteria for clinical interventions in the United Statesw24 and the United Kingdom.7 w25 As with the nominal group technique, panellists meet and discuss the criteria, but because panellists have access to a systematic literature review, they can ground their ratings in the scientific evidence. Agreement between similar panels rating the same indicators has been found to have greater reliability than the reading of mammograms.w10 However, users or patients are rarely included, and the cost implications are not considered.


    Maximising effectiveness

    Several factors affect the outputs derived using consensus techniques.19 These include:

    • Selection of participants (number, level of homogeneity, etc)

    • How the information is presented (for example, level of evidence)

    • How the interaction is structured (for example, number of rounds)

    • Method of synthesising individual judgments (for example, definition of agreement)

    • Task set (for example, questions to be rated).
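
    The influence of the definition of agreement can be shown with a small Python sketch. The rule used here (agreement when a given share of ratings falls in the same third of a 1-9 scale) and both thresholds are assumptions chosen for illustration, not a rule prescribed by the article or by any particular consensus method.

```python
from collections import Counter


def band(rating):
    """Map a rating on an assumed 1-9 scale to a band: 0 (1-3), 1 (4-6), or 2 (7-9)."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    return (rating - 1) // 3


def agreement(ratings, threshold):
    """Return (share, agreed): the share of ratings in the most common band,
    and whether that share reaches the chosen threshold."""
    counts = Counter(band(r) for r in ratings)
    largest = counts.most_common(1)[0][1]
    share = largest / len(ratings)
    return share, share >= threshold


# Hypothetical ratings from a nine-member panel for one indicator.
ratings = [7, 8, 9, 8, 7, 9, 8, 6, 5]

share_strict, agreed_strict = agreement(ratings, threshold=0.8)
share_lenient, agreed_lenient = agreement(ratings, threshold=0.75)
print(f"share in most common band: {share_strict:.2f}")
print("agreement at 0.80:", agreed_strict)
print("agreement at 0.75:", agreed_lenient)
```

    The same set of ratings counts as agreement under one threshold and not under the other, which is why the definition should be fixed before the panel rates anything.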

    The composition of the group is particularly important. For example, group members who are familiar with a procedure are more likely to rate it higher.w26 The feedback provided to panellists is also important.w27

    Group meetings rely on skilled moderators and on the willingness of the group to work together in a structured meeting. Unlike postal surveys, group meetings can inhibit some members if they feel uncomfortable sharing their ideas, although panellists' ratings carry equal weight, however much they have contributed to the debate. Panels for group meetings are smaller than Delphi panels for practical reasons.

    Research methods for applying indicators

    Measures developed by consensus techniques have face validity and those based on rigorous evidence possess content validity. This is a minimum prerequisite for any quality measure. All measures have to be tested for acceptability, feasibility, reliability, sensitivity to change, and validity. 3 22 This can be done by assessing the measures' psychometric properties (including factor analyses) and by using surveys (of patients, practitioners, or both), clinical or organisational audits, interviews, or focus groups. Box 2 gives an example of the development and testing of review criteria for angina, asthma, and diabetes. 9 23

    Box 2: Developing and applying review criteria for angina, asthma, and type 2 diabetes

    Aim—Quality assessment of angina, asthma, and type 2 diabetes 9 23

    Sample—60 general practices in England

    Patient sample—1000 patients with angina, 1000 with asthma, 1000 with diabetes

    Method—Clinical audit; semistructured interviews with nurses and doctors

    Acceptability—Used only review criteria that were rated acceptable and valid by the nurses and doctors working in the practices

    Reliability—Excluded criteria with an inter-rater reliability κ coefficient <0.6

    Feasibility— Excluded criteria relating to <1% of the population sample


    Acceptability —The acceptability of the data collected depends on whether the findings are acceptable to both those being assessed and their assessors. For example, doctors and nurses can be asked about the acceptability of review criteria being used to assess their quality of care.

    Feasibility— Information about quality of care is often driven by availability of data.w28 Quality is difficult to measure without accurate and consistent information,w1 which is often unavailable at both the macro (health organisations) and micro (individual medical records) level.w29 Quality indicators must also relate to enough patients to make comparing data feasible—for example, by excluding those aspects of care that occur in less than 1% of clinical audit samples.
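
    The 1% feasibility rule could be applied to audit data roughly as follows. This is a sketch under stated assumptions: the criterion names and counts are hypothetical, and the sample size of 1000 is borrowed from the per-condition samples in box 2.

```python
def feasible_criteria(applicable_counts, sample_size, min_share=0.01):
    """Keep criteria that apply to at least min_share of the audit sample.

    applicable_counts: {criterion: number of sampled patients it applies to}.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return {criterion: count
            for criterion, count in applicable_counts.items()
            if count / sample_size >= min_share}


# Hypothetical counts of patients to whom each criterion applied.
counts = {
    "criterion A": 250,
    "criterion B": 10,  # exactly 1% of the sample, so retained
    "criterion C": 9,   # below 1% of the sample, so excluded
}

retained = feasible_criteria(counts, sample_size=1000)
print(sorted(retained))
```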

    Reliability —Reliability refers to the extent to which a measurement with an indicator is reproducible. This depends on several factors relating to both the indicator itself and how it is used. For example, indicators should be used to compare organisations or practitioners with similar organisations or practitioners. The inter-rater reliability refers to the extent to which two independent raters agree on their measurement of an item of care.22 In one study, five diabetes criteria out of 31 developed using an expert panel9 were found to have poor agreement between raters when used in an audit.23
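
    Inter-rater agreement of the kind described above is often summarised with a κ coefficient. The sketch below implements Cohen's κ for two raters; the article reports only a "κ coefficient", so the choice of Cohen's formula is an assumption, and the ratings are hypothetical. The 0.6 cut-off is the one reported in box 2.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical judgments by two raters on whether a criterion was met in ten records.
rater_a = ["yes"] * 6 + ["no"] * 4
rater_b = ["yes"] * 5 + ["no"] * 4 + ["yes"]

kappa = cohen_kappa(rater_a, rater_b)
exclude = kappa < 0.6  # cut-off taken from box 2
print(f"kappa = {kappa:.2f}; exclude criterion: {exclude}")
```

    In this example the raters agree on 8 of 10 records, yet the chance-corrected κ falls below 0.6, which shows why raw percentage agreement can overstate reliability.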

    Sensitivity to change —Quality measures need to detect changes in quality of care in order to discriminate between and within subjects.22 This is an important and often forgotten dimension of a quality indicator.6 Little research is available on sensitivity to change of quality indicators using time series or longitudinal analyses.

    Validity —Content validity in this context refers to whether any criteria were rated valid by panels contrary to known results from randomised controlled trials.w30 The validity of indicators has received more attention recently.3 w2 w31 Although little evidence exists of the content validity of the Delphi and nominal group techniques in developing quality indicators,16 there is some evidence of validity for indicators developed with the RAND method.w30 There is also evidence of the predictive validity of indicators developed with the RAND method.w32


    Although it may never be possible to produce an error-free measure of quality, measures should be tested during their development and application for acceptability, feasibility, reliability, sensitivity to change, and validity. This will optimise their effectiveness in quality improvement strategies. Indicators are more likely to be effective if they are derived from rigorous scientific evidence. Because evidence in health care is often unavailable, consensus techniques facilitate quality improvement by allowing a broader range of aspects of care to be assessed and improved.7 However, simply measuring something will not automatically improve it, and indicators must be used within quality improvement approaches that focus on whole healthcare systems.24


    • This is the second of three articles on research to improve the quality of health care

    • Competing interests None declared.

    • Further references are available online. These are denoted in the text by the prefix w

