Qualitative Research: Consensus methods for medical and health services research
BMJ 1995; 311 doi: https://doi.org/10.1136/bmj.311.7001.376 (Published 05 August 1995) Cite this as: BMJ 1995;311:376
- a Nuffield Community Care Studies Unit, Department of Epidemiology and Public Health, University of Leicester, Leicester LE1 7RH
- b Health Services Research Unit, Department of Public Health and Policy, London School of Hygiene and Tropical Medicine, London WC1E 7HT
- Correspondence to: Dr Jones.
Health providers face the problem of trying to make decisions in situations where there is insufficient information and also where there is an overload of (often contradictory) information. Statistical methods such as meta-analysis have been developed to summarise and to resolve inconsistencies in study findings—where information is available in an appropriate form. Consensus methods provide another means of synthesising information, but are liable to use a wider range of information than is common in statistical methods, and where published information is inadequate or non-existent these methods provide a means of harnessing the insights of appropriate experts to enable decisions to be made. Two consensus methods commonly adopted in medical, nursing, and health services research—the Delphi process and the nominal group technique (also known as the expert panel)—are described, together with the most appropriate situations for using them; an outline of the process involved in undertaking a study using each method is supplemented by illustrations of the authors' work. Key methodological issues in using the methods are discussed, along with the distinct contribution of consensus methods as aids to decision making, both in clinical practice and in health service development.
Defining consensus and consensus methods
Quantitative methods such as meta-analysis have been developed to provide statistical overviews of the results of clinical trials and to resolve inconsistencies in the results of published studies. Consensus methods are another means of dealing with conflicting scientific evidence. They allow a wider range of study types to be considered than is usual in statistical reviews. In addition they allow a greater role for the qualitative assessment of evidence (box 1). These methods, unlike those described in the other papers in this series, are primarily concerned with deriving quantitative estimates through qualitative approaches.
The aim of consensus methods is to determine the extent to which experts or lay people agree about a given issue. They seek to overcome some of the disadvantages normally found with decision making in groups or committees, which are commonly dominated by one individual or by coalitions representing vested interests. In open committees individuals are often not ready to retract long held and publicly stated opinions, even when these have been proved to be false.
The term “agreement” takes two forms, which need to be distinguished: firstly, the extent to which each respondent agrees with the issue under consideration (typically rated on a numerical or categorical scale) and, secondly, the extent to which respondents agree with each other, the consensus element of these studies (typically assessed by statistical measures of average and dispersion).
Application
The focus of consensus methods lies where unanimity of opinion does not exist owing to a lack of scientific evidence or where there is contradictory evidence on an issue. The methods attempt to assess the extent of agreement (consensus measurement) and to resolve disagreement (consensus development).
The three best known consensus methods are the Delphi process, the nominal group technique (also known as the expert panel), and the consensus development conference. Each of these methods involves measuring consensus, and the last two methods are also concerned with developing consensus. The consensus development conference will not be covered in this paper because it requires resources beyond those at the disposal of most researchers (unlike the other two methods), is commonly organised within defined programmes (for example, by the King's Fund in Britain and the National Institutes of Health in the United States), and has been discussed at length elsewhere.3 4 5 6
The methods described
THE DELPHI PROCESS
The Delphi process takes its name from the Delphic oracle's skills of interpretation and foresight and proceeds in a series of rounds as follows:
Round 1: Either the relevant individuals are invited to provide opinions on a specific matter, based on their knowledge and experience, or the team undertaking the Delphi expresses opinions on a specific matter and selects suitable experts to participate in subsequent questionnaire rounds;
These opinions are grouped together under a limited number of headings and statements drafted for circulation to all participants on a questionnaire;
Round 2: Participants rank their agreement with each statement in the questionnaire;
The rankings are summarised and included in a repeat version of the questionnaire;
Round 3: Participants rerank their agreement with each statement in the questionnaire, with the opportunity to change their score in view of the group's response;
The rerankings are summarised and assessed for degree of consensus: if an acceptable degree of consensus is obtained the process may cease, with final results fed back to participants; if not, the third round is repeated.
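The stopping decision at the end of each round can be sketched in code. This is a minimal illustration, not the procedure used in the authors' studies: the function names, the ratings, and the threshold (an interquartile range of 2 points or less on every statement) are all assumptions chosen for the example, and each study must define its own "acceptable" degree of consensus.

```python
from statistics import median, quantiles


def summarise(ratings):
    """Return (median, interquartile range) for one statement's ratings."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1


def consensus_reached(round_ratings, max_iqr=2.0):
    """True when every statement's IQR is at or below max_iqr.

    max_iqr is an assumed, study-specific threshold, not a standard value.
    """
    return all(summarise(r)[1] <= max_iqr for r in round_ratings.values())


# Hypothetical 0-9 agreement ratings from seven participants on two statements.
round_2 = {"A": [2, 5, 7, 8, 9, 3, 6], "B": [7, 8, 8, 9, 7, 8, 6]}
round_3 = {"A": [5, 6, 7, 7, 8, 6, 6], "B": [7, 8, 8, 8, 7, 8, 7]}

for name, ratings in (("round 2", round_2), ("round 3", round_3)):
    verdict = "stop" if consensus_reached(ratings) else "repeat the round"
    print(name, verdict)
```

In this invented example statement A is still widely dispersed after round 2, so the questionnaire would be recirculated; by round 3 both statements fall within the assumed threshold.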
Figure 1 shows an example of this process for a Delphi study undertaken by one of the authors (JJ). In addition to scoring agreement with statements, respondents are commonly asked to rate the confidence or certainty with which they express their opinions.
The Delphi technique has been used widely in health research within the fields of technology assessment,7 8 9 10 education and training,11 12 13 14 and priorities and information,15 16 17 and in developing nursing and clinical practice.19 20 21 It enables a large group of experts to be contacted cheaply, usually by mail with a self administered questionnaire (though computer communications have also been used), with few geographical limitations on the sample. Some studies have included a round in which the participants meet to discuss the process and resolve uncertainty or any ambiguities in the wording of the questionnaire.
THE NOMINAL GROUP TECHNIQUE
The nominal group technique uses a highly structured meeting to gather information from relevant experts (usually 9-12 in number) about a given issue. It consists of two rounds in which panellists rate, discuss, and then rerate a series of items or questions. The method was developed in the United States in the 1960s and has been applied to problems in social services, education, government, and industry.22 In the context of health care the method has most commonly been used to examine the appropriateness of clinical interventions23 24 25 26 27 but has also been applied in education and training,28 29 30 in practice development,31 32 33 and for identifying measures for clinical trials.34 35 36
A nominal group meeting is facilitated either by an expert on the topic37 or a credible non-expert38 and is structured as follows:
Participants spend several minutes writing down their views about the topic in question;
Each participant, in turn, contributes one idea to the facilitator, who records it on a flip chart;
Similar suggestions are grouped together, where appropriate. There is a group discussion to clarify and evaluate each idea;
Each participant privately ranks each idea (round 1);
The ranking is tabulated and presented;
The overall ranking is discussed and reranked (round 2);
The final rankings are tabulated and the results fed back to participants.
Figure 2 shows an example of a modified nominal group undertaken by one of the authors (DH).
The method can be adapted and has been conducted as a single meeting or with the first stage conducted by post followed by a discussion and rerating at a face to face meeting. Some nominal group meetings have incorporated a detailed review of literature as background material for the topic under discussion.
Alongside the consensus process there may be a nonparticipant observer collecting qualitative data on the nominal group. This approach has some features in common with focus groups (see article by Kitzinger39). However, the nominal group technique focuses on a single goal (for example, the definition of criteria to assess the appropriateness of a surgical intervention) and is less concerned with eliciting a range of ideas or the qualitative analysis of the group process per se than is the case in focus groups.
Methodological issues
WHO TO INCLUDE AS PARTICIPANTS
There can be few hard and fast rules about who to include as participants, except that each must be justifiable as in some way “expert” on the matter under discussion. Clearly, for studies concerned with defining criteria for clinical intervention, the most appropriate experts will be clinicians practising in the field under consideration. However, the inclusion of other clinicians such as general practitioners may be appropriate to provide an alternative clinical view, particularly when the study is expected to have an impact beyond a particular specialist field. When the discussion concerns matters of general interest, such as health service priorities, participants should include non-clinical health professionals and the expression of lay opinions should also be allowed for.
There is clearly a potential for bias in the selection of participants. Although it has been shown that doctors who are willing to participate in expert panels are representative of their colleagues,40 the exact composition of the panel can affect the results obtained.24 The results will also be affected by any “random” variation in panel behaviour. These problems can be overcome by using a different mixture of participants in further panels.
HOW TO MEASURE THE ACCURACY OF THE ANSWER OBTAINED
The existence of a consensus does not mean that the “correct” answer has been found—there is the danger of deriving collective ignorance rather than wisdom. The nominal group is not a replacement for rigorous scientific reviews of published reports or for original research, but rather a means of identifying current medical opinion and areas of disagreement. For Delphi surveys, Pill recommends that the results should, when possible, be matched to observable events.1 Observers of the accuracy of opinion polls before the 1992 general election in Britain might well agree with this conclusion.
HOW TO FEED BACK THE RESULTS OF EACH ROUND
Agreement with statements is usually summarised by using the median, and consensus is assessed by using interquartile ranges for continuous numerical scales. These summary statistics may be fed back to participants at each round along with fuller indications of the distribution of responses to each statement in the form of tables of the proportions ranking at each point on the scale (see box 2), histograms, or other graphical representations of the range (see box 3). Feeding back the group's response enables participants to consider their initial ranking in relation to their colleagues' assessments. It should be made clear to each participant that they need not conform to the group view, though in the nominal group technique those with atypical opinions (compared with the rest of the group) may face critical questioning of their view from other panel members. In a Delphi exercise, the researcher undertaking the study may ask participants whom they have defined as outliers (for example, those in the lower and upper quartiles) to provide written justification for their responses.
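The summary fed back for a single statement can be sketched as follows. The function name, the ratings, and the choice of the "inclusive" quartile method are assumptions made for illustration; the source does not specify how quartiles were calculated, and different conventions give slightly different values on small panels.

```python
from collections import Counter
from statistics import median, quantiles


def feedback(ratings, scale=range(0, 10)):
    """Summarise one statement for feedback to participants.

    Returns the median, the lower and upper quartiles, and the
    percentage of respondents choosing each point on the scale.
    """
    counts = Counter(ratings)
    n = len(ratings)
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return {
        "median": median(ratings),
        "quartiles": (q1, q3),
        "percent": {s: round(100 * counts[s] / n) for s in scale},
    }


# Hypothetical ratings from ten respondents (0 = total disagreement, 9 = total agreement).
ratings = [0, 2, 5, 5, 6, 6, 7, 8, 8, 9]
summary = feedback(ratings)
print(summary["median"], summary["quartiles"])
print(summary["percent"])
```

Reporting the full percentage distribution alongside the median and quartiles lets a participant see not only where the group's centre lies but also whether opinion is split into clusters, which a single dispersion figure would hide.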
Box 2—Example of feedback of second round results in a Delphi40
The following are possible adverse effects of lowering the number of junior medical staff in general medicine and its associated specialties. The star indicates the number you selected to indicate the extent to which you agreed or disagreed with each statement in response to the previous questionnaire. Each of the numbers below the scale represents the percentage of those responding to the questionnaire who selected that particular value. We would be grateful if you would read through the questionnaire and consider whether, in the light of your colleagues' assessments, you would like to alter your response. Please indicate the extent to which you agree or disagree with each statement by circling the appropriate number (0 indicates total disagreement and 9 total agreement): if your choice remains unchanged please circle the same number you selected on the previous questionnaire.
(i) Mortality rates in hospital will rise
disagree  0  1  2  3  4  5  6  7  8  9  agree
percent   5  3  9  4  3 18 18 12 16 12
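As a worked illustration, the median and quartiles can be read off the percentages printed for statement (i) above. The sketch assumes one common convention, taking each statistic as the lowest score at which the cumulative percentage reaches 25, 50, or 75; the function name is hypothetical, and the original study may have used a different rule.

```python
from itertools import accumulate

scale = list(range(10))
percent = [5, 3, 9, 4, 3, 18, 18, 12, 16, 12]  # statement (i), box 2


def score_at(p, scale, percent):
    """Lowest score whose cumulative percentage reaches p (assumed convention)."""
    for score, cumulative in zip(scale, accumulate(percent)):
        if cumulative >= p:
            return score
    raise ValueError("percentages do not reach the requested level")


lower_quartile = score_at(25, scale, percent)
median_score = score_at(50, scale, percent)
upper_quartile = score_at(75, scale, percent)
print(lower_quartile, median_score, upper_quartile)
```

Under this convention the statement has a median of 6 with quartiles at 5 and 8, so the group leans towards agreement but with considerable spread.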
For nominal groups, rules have been developed to assess agreement when statements have been ranked on a 9 point scale (see box 3). In this example, the scale can be broken down so that scores 1-3 represent a region where participants feel intervention is not indicated; 4-6, a region where participants are equivocal; and 7-9, a region where participants feel intervention is indicated. The first rule is based on where the scores fall on the ranking scale (box 4): if all ratings fall within one of these predefined regions there is said to be strict agreement (in the example, all participants agreed that transurethral resection of the prostate was not indicated). An alternative relaxed definition for agreement is that all ratings fall within any 3 point region. This may be treated as agreement, in that all ratings are within an acceptable range, but the group opinion is ambiguous as to whether intervention is indicated or not.
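The strict and relaxed definitions translate directly into two small checks. This is a sketch under the definitions given in the paragraph above; the function names and example ratings are invented for illustration.

```python
# Predefined regions of the 9 point scale: not indicated, equivocal, indicated.
REGIONS = ((1, 3), (4, 6), (7, 9))


def strict_agreement(ratings):
    """All ratings fall within a single predefined region."""
    return any(all(lo <= r <= hi for r in ratings) for lo, hi in REGIONS)


def relaxed_agreement(ratings):
    """All ratings fall within any 3 point span, such as 3-5."""
    return max(ratings) - min(ratings) <= 2


# Hypothetical panels of four raters.
print(strict_agreement([1, 2, 3, 1]), relaxed_agreement([1, 2, 3, 1]))
print(strict_agreement([3, 4, 5, 4]), relaxed_agreement([3, 4, 5, 4]))
print(strict_agreement([2, 5, 8, 5]), relaxed_agreement([2, 5, 8, 5]))
```

The second panel shows the ambiguous case described in the text: the ratings span only three points, so the relaxed rule is met, but they straddle the "not indicated" and "equivocal" regions, so the strict rule is not.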
The second rule tests whether extreme rankings are having an undue influence on the final results and consists of assessing the strict and relaxed definitions by including all ratings for each statement and then by excluding one extreme high and one extreme low rating for each statement. The ranges indicated in box 3 include all ratings, and it is noticeable that several of these ranges are from 1 to 9. It may be that these ranges exaggerate the dispersion of the group's response.
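The effect of the second rule can be seen by comparing the range of ratings before and after dropping one extreme high and one extreme low rating. The ratings and function names below are hypothetical, and the sketch assumes a panel of at least three raters.

```python
# Predefined regions of the 9 point scale: not indicated, equivocal, indicated.
REGIONS = ((1, 3), (4, 6), (7, 9))


def trim_extremes(ratings):
    """Drop the single lowest and single highest rating."""
    if len(ratings) < 3:
        raise ValueError("need at least three ratings to trim both extremes")
    return sorted(ratings)[1:-1]


def in_one_region(ratings):
    """Strict agreement: all ratings fall within one predefined region."""
    return any(all(lo <= r <= hi for r in ratings) for lo, hi in REGIONS)


# Hypothetical panel of nine raters with one dissenting low rating.
ratings = [1, 7, 7, 8, 8, 9, 9, 8, 7]
trimmed = trim_extremes(ratings)

print((min(ratings), max(ratings)), in_one_region(ratings))
print((min(trimmed), max(trimmed)), in_one_region(trimmed))
```

With all ratings included the range runs from 1 to 9 and there is no agreement; once the two extremes are excluded the remaining ratings all lie in the 7-9 region, suggesting that the full range overstated the dispersion of the group's view.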
Validity and applicability
There has been an active debate on the validity of the Delphi method. For example, Harold Sackman argued that the Delphi method fails to meet the standards normally set for scientific methods.41 Many of his criticisms were aimed at past studies of poor quality rather than fundamental critiques of the method itself; he particularly criticised poor questionnaire design, inadequate testing of reliability and validity of methods, and the methods of defining and selecting experts. He also argued that the method forces consensus and is weakened by not allowing participants to discuss issues.
Reviews by Pill1 and by Gerth and Smith (personal communication) showed no clear evidence in favour of meeting based methods over Delphi. Rowe et al, though, concluded that the Delphi technique is generally inferior to the nominal group technique but stated that the degree of inferiority is small, arising more from practical than from theoretical difficulties; they argued for further research aiming to improve the practice of Delphi studies, particularly a careful consideration of what constitutes expertise.2
Consensus methods, in particular Delphi, have been described as methods of “last resort,”42 with defenders warning against “overselling” the methods43 and suggesting that they should be regarded more as methods for structuring group communication than as a means for providing answers. There is clearly a danger that since these approaches have a prescribed method and are often used to generate quantitative estimates, they may lead the casual observer to place greater reliance on their results than might be warranted. As we stated earlier, unless the findings can be tested against observed data, we can never be sure that the methods have produced the “correct” answer. This should be made clear in reporting study results.
The structures of Delphi and nominal groups (shown in box 1) aim to maximise the benefits from having informed panels consider a problem (often termed “process gain”) while minimising the disadvantages associated with collective decision making (“process loss”), particularly domination by individuals or professional interests. The extent to which these are realised depends on the ability of those running the studies to use the advantages of the methods. An important role of the facilitator in the nominal group is to ensure that all participants are able to express their views and to keep particular personal or professional views from dominating the discussion; participants in both Delphi and nominal group panels should be selected so as to ensure that no particular interest or preconceived opinion is likely to dominate.
Uses
Consensus methods provide a useful way of identifying and measuring uncertainty in medical and health services research. Delphi and nominal group techniques have been used to clarify particular issues in health service organisation: to define professional roles, to aid design of educational programmes, to enable long term projections of need for care for particular client groups where there has been considerable uncertainty (for example, for cases of HIV and AIDS9), and to develop criteria for appropriateness of interventions as part of technology assessment. In addition to forming studies in their own right, these techniques have been widely used as component parts of larger projects.8 31 The two pieces of research from which materials have been presented in this paper each formed part of larger projects: the Delphi exercise44 was concerned with defining possible adverse effects of reducing junior doctor staffing levels as part of a study of the adequacy of hospital medical staffing levels; the nominal group23 was concerned with defining appropriate indications for surgical intervention as part of a population based assessment of need for prostate surgery within an NHS region.
Conclusions
The emphasis, when the findings of Delphi and nominal group studies are presented, should be on the justification for using such methods, the use of sound methodology (including selection of experts and the clear definition of target “acceptable” levels of consensus), appropriate presentation of findings (where proposed standards for presentation, as for clinical practice guidelines,45 should be considered), and the relevance and systematic use of the results. The output from consensus approaches (including consensus development conferences) is rarely an end in itself. Dissemination and implementation of such findings are the ultimate aim of consensus activities: for example, the publication of consensus statements intended to guide health policy, clinical practice, and research, such as the consensus statement on cancer of the colon and rectum.46