Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?
BMJ 2001;322:1115. doi: http://dx.doi.org/10.1136/bmj.322.7294.1115 (Published 05 May 2001)
- Rosaline S Barbour, senior lecturer in primary care R&D
- Accepted 9 November 2000
Qualitative research methods are enjoying unprecedented popularity. Although checklists have undoubtedly contributed to the wider acceptance of such methods, these can be counterproductive if used prescriptively. The uncritical adoption of a range of “technical fixes” (such as purposive sampling, grounded theory, multiple coding, triangulation, and respondent validation) does not, in itself, confer rigour.
In this article I discuss the limitations of these procedures and argue that there is no substitute for systematic and thorough application of the principles of qualitative research. Technical fixes will achieve little unless they are embedded in a broader understanding of the rationale and assumptions behind qualitative research.
Summary points
- Checklists can be useful for improving qualitative research methods, but overzealous and uncritical use can be counterproductive
- Reducing qualitative research to a list of technical procedures (such as purposive sampling, grounded theory, multiple coding, triangulation, and respondent validation) is overly prescriptive and results in “the tail wagging the dog”
- None of these “technical fixes” in itself confers rigour; they can strengthen the rigour of qualitative research only if embedded in a broader understanding of qualitative research design and data analysis
- Otherwise we risk compromising the unique contribution that systematic qualitative research can make to health services research
Checklists in qualitative research
In medical research the question is no longer whether qualitative methods are valuable but how rigour can be ensured or enhanced. Checklists have played an important role in conferring respectability on qualitative research and in convincing potential sceptics of its thoroughness.1–3 They have equipped those unfamiliar with this approach to evaluate or review qualitative work (by providing guidance on crucial questions that need to be asked) and have reminded qualitative researchers of the need for a systematic approach (by providing an aide-mémoire of the various stages involved in research design and data analysis4).
Qualitative researchers stress the importance of context but sometimes forget that research itself is carried out against an ever-changing backdrop. Now that it has secured a place in the methodological mainstream, qualitative research is increasingly being influenced by funding and editorial policies. Despite disclaimers by authors that their checklists should be viewed as being “reflective rather than constitutive of good research,”5 there is evidence that checklists are sometimes being used prescriptively.
Over the past two years, several researchers have informed me that they must comply with various procedures (such as respondent validation, multiple coding, etc) in order to satisfy the requirements of specific journals where they hope to publish their work. (I am not concerned here with the accuracy of such claims, although my own experience suggests these are exaggerated.) While we all attempt to tailor our writing to match the style and format of the journal in question, the strategic adoption of such technical fixes has wider repercussions. The complex dilemmas in research design that qualitative researchers face with regard to sampling, choice of methods, and approaches to analysis cannot be solved by formulaic responses. If we succumb to the lure of “one size fits all” solutions we risk being in a situation where the tail (the checklist) is wagging the dog (the qualitative research).
From reading recent journals and my experience of reviewing journal articles and grant submissions, I find that the five technical fixes currently enjoying the greatest popularity are purposive sampling, grounded theory, multiple coding, triangulation, and respondent validation (table). The rest of this article outlines their limitations and provides a more realistic appraisal of their potential.
Purposive sampling
Rather than aspiring to statistical generalisability or representativeness,6 qualitative research usually aims to reflect the diversity within a given population.7 In the past qualitative research often relied on convenience samples, particularly when the group of interest was difficult to access. Purposive (or theoretical8) sampling, however, gives researchers a degree of control, so that they are not at the mercy of any selection bias inherent in pre-existing groups (such as clinic populations). With purposive sampling, researchers deliberately seek to include “outliers” conventionally discounted in quantitative approaches.9 This allows such deviant cases to illuminate, by juxtaposition, those processes and relations that routinely come into play, thereby enabling “the exception to prove the rule.”9 10
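One common way of operationalising purposive sampling is a sampling grid: the researcher lists the characteristics expected to shape respondents' views and checks that every combination is represented. The sketch below assumes a hypothetical sampling frame (`CANDIDATES`) and a hypothetical grid of sex by age band; both are illustrative inventions, not drawn from this article. It also reports the empty cells, which are the kinds of respondent that would have to be sought out deliberately. Theoretical sampling proper is iterative, with later selections guided by emerging analysis, and this static sketch does not capture that.

```python
from itertools import product

# Hypothetical sampling frame: each candidate is described by the
# characteristics the researcher expects may shape their views.
CANDIDATES = [
    {"id": "P01", "sex": "female", "age_band": "under 40"},
    {"id": "P02", "sex": "female", "age_band": "under 40"},
    {"id": "P03", "sex": "female", "age_band": "40 and over"},
    {"id": "P04", "sex": "male", "age_band": "under 40"},
]

# Hypothetical sampling grid: every combination of these values should be represented.
GRID = {"sex": ["female", "male"], "age_band": ["under 40", "40 and over"]}


def purposive_sample(frame, grid, per_cell=1):
    """Pick up to per_cell candidates for each cell of the grid.

    Returns (selected, gaps), where gaps lists the cells with no candidate,
    i.e. the kinds of respondent the researcher must go out and seek.
    """
    names = list(grid)
    selected, gaps = [], []
    for cell in product(*(grid[name] for name in names)):
        matches = [p for p in frame if tuple(p[name] for name in names) == cell]
        if matches:
            # Taking the first match is arbitrary; in practice the researcher chooses.
            selected.extend(matches[:per_cell])
        else:
            gaps.append(dict(zip(names, cell)))
    return selected, gaps


selected, gaps = purposive_sample(CANDIDATES, GRID)
print([p["id"] for p in selected])  # ['P01', 'P03', 'P04']
print(gaps)  # [{'sex': 'male', 'age_band': '40 and over'}]
```

Reporting the gaps, rather than silently filling the sample from whoever is available, is what distinguishes this from the hybrid convenience strategies criticised in the next paragraph.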
Some strategies claimed as examples of purposive sampling are in effect hybrids, which retain elements of random or convenience sampling and are unlikely to yield the spread of respondents required. When they are provided at all, details of sampling are often confined to the methods section of papers and disregarded in the analysis section, which frequently consists of little more than a description of undifferentiated themes that emerged during data analysis. For example, we are likely to be told in the methods section that a third of the sample were men, but the analysis section does not discuss how their perspectives differed from those of female respondents.
Such approaches do not use qualitative datasets to full advantage. That would involve applying the constant comparative method11 to continuously compare the views and experiences of respondents who have been selected precisely—indeed, purposively—in order to illuminate subtle but potentially important differences. In other words, samples may have been selected purposively, but they are not being used purposefully to interrogate the data collected.
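Using a sample purposefully means returning to the characteristics it was selected on during analysis. As a minimal sketch, assuming coded segments are available as hypothetical (respondent, theme) pairs and respondent characteristics as a hypothetical lookup table, the code below tabulates which respondents in each group raised each theme and flags themes not raised across all groups. Such a tally is only a prompt for the constant comparative work described above; it does not replace interpreting how and why perspectives differ.

```python
from collections import defaultdict

# Hypothetical respondent characteristics recorded at sampling.
ATTRIBUTES = {
    "R1": {"sex": "male"},
    "R2": {"sex": "female"},
    "R3": {"sex": "female"},
}

# Hypothetical coded segments: (respondent, theme) pairs from the analysis.
SEGMENTS = [
    ("R1", "cost of prescriptions"),
    ("R2", "cost of prescriptions"),
    ("R2", "fear of side effects"),
    ("R3", "fear of side effects"),
    ("R3", "fear of side effects"),
]


def themes_by_group(segments, attributes, attribute):
    """Map each theme to the respondents in each group who raised it."""
    table = defaultdict(lambda: defaultdict(set))
    for respondent, theme in segments:
        group = attributes[respondent][attribute]
        table[theme][group].add(respondent)
    # Convert to plain dicts with sorted lists so the result is easy to read.
    return {
        theme: {group: sorted(ids) for group, ids in groups.items()}
        for theme, groups in table.items()
    }


def one_sided_themes(table, groups):
    """Themes not raised by respondents from every group: prompts for comparison."""
    return sorted(theme for theme, found in table.items() if set(found) != set(groups))


table = themes_by_group(SEGMENTS, ATTRIBUTES, "sex")
print(table)
print(one_sided_themes(table, ["male", "female"]))  # ['fear of side effects']
```

Counting distinct respondents rather than segments avoids letting one talkative participant dominate the picture; a theme flagged here is a question for the analysis, not a finding in itself.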
Grounded theory
In its purest form the grounded theory approach to data analysis alleges that all explanations or theories are derived from the dataset itself rather than from a researcher's prior theoretical viewpoint.12 In practice, however, you are unlikely to obtain research funding without having carried out a thorough literature review or having formulated some idea of the content of the data you are likely to collect.
According to many researchers who invoke the concept of grounded theory, coding categories reflect the content of data collected rather than the questions on the interview schedule or focus group topic guide and often use concepts or vocabulary borrowed from respondents. However, few published papers yield the surprises likely to be a feature of analyses driven entirely by respondents' concerns, and the terminology and theories to which papers appeal generally bear an uncanny resemblance to current disciplinary concerns and debates.
Bryman and Burgess have criticised the use of grounded theory as “an approving bumper sticker” invoked to confer academic respectability rather than as a helpful description of the strategy used in analysis.13 Melia claims that most researchers use a pragmatic variant, whereby they can achieve added value by identifying new themes from the data alongside those that could have been anticipated from the outset.14 All too often, however, the tension between these two different sorts of insight—and its potential to illuminate the topic being studied—is not explored in the presentation of findings.
In the absence of any attempt to analyse systematically the commonalities and contradictions reflected in the data, many researchers produce an artificially neat and tidy account that is descriptive rather than analytical and that militates against in-depth analysis. Uncritical adoption of grounded theory can result in explanations tinged with the “near mysticism” that Melia derides in the original text on grounded theory.12 A sleight of hand produces a list of “themes,” and we are invited to take it on trust that theory somehow emerges from the data, without being offered a step by step explanation of how theoretical insights have been built up.
Multiple coding
Multiple coding addresses the same issue as its quantitative equivalent, “inter-rater reliability,” and is a response to the charge of subjectivity sometimes levelled at the process of qualitative data analysis. Although multiple coding does not usually demand complete replication of results, it does involve the cross checking of coding strategies and interpretation of data by independent researchers. While I would caution against multiple coding of entire datasets (on the grounds of economy in both cost and effort), some element of multiple coding can be a valuable strategy. It can be useful to have another person cast an eye over segments of data or emergent coding frameworks, and this is a core activity of supervision sessions and research team meetings.15
Although six experienced researchers who independently coded one focus group transcript showed substantial agreement, Armstrong et al found considerable variation in the ways that they packaged coding frameworks (including the language used).16 This is not surprising, given the complexity of qualitative data and the range of disciplinary backgrounds and interests of qualitative researchers. Indeed, Mauthner et al have shown how researchers' original interpretations may shift when they revisit previously collected data.17
However, the degree of concordance between researchers is not really important; what is ultimately of value is the content of disagreements and the insights that discussion can provide for refining coding frames. The greatest potential of multiple coding lies in its capacity to furnish alternative interpretations and thereby to act as the “devil's advocate” implied in many of the checklists1–3 in alerting researchers to all potentially competing explanations. Such exercises encourage thoroughness, both in interrogating the data at hand and in providing an account of how an analysis was developed. Whether this is carried out by a conscientious lone researcher, by a team, or by involving independent experts is immaterial: what matters is that a systematic process is followed and that this is rendered transparent in the written research report.
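To make the contrast concrete, the sketch below computes Cohen's kappa, one widely used inter-rater statistic for two coders, defined as (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's code frequencies. The article does not prescribe kappa or any other statistic. The sketch assumes, for simplicity, that each coder assigns exactly one code per segment (real qualitative coding often applies several), and the codes themselves are hypothetical. Alongside the single agreement figure it lists the segments on which the coders differ, which, on the argument above, is the more useful output.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def disagreements(coder_a, coder_b):
    """Segments (index, code from A, code from B) on which the coders differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]


# Hypothetical codes assigned by two researchers to the same six segments.
CODER_A = ["access", "access", "cost", "trust", "cost", "access"]
CODER_B = ["access", "cost", "cost", "trust", "cost", "trust"]

print(round(cohens_kappa(CODER_A, CODER_B), 2))  # 0.52
print(disagreements(CODER_A, CODER_B))  # [(1, 'access', 'cost'), (5, 'access', 'trust')]
```

Here p_o = 4/6 and p_e = 11/36, giving a kappa of 13/25. The number says little on its own; the two listed segments are where discussion between coders could refine the coding frame.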
Triangulation
The current heavy reliance on triangulation in grant applications testifies both to the respect accorded to this concept and to its perceived value in demonstrating rigour. Triangulation addresses the issue of internal validity by using more than one method of data collection to answer a research question. In principle, it sounds eminently feasible to combine, say, observational fieldwork and interviews or focus groups in order to get a broader view. However, triangulation is difficult to perform properly: data collected using different methods come in different forms and defy direct comparison. This is true for different types of qualitative data, such as interview and focus group transcripts, as well as for the more obvious differences between qualitative and quantitative data.
The production of similar findings from different methods merely provides corroboration or reassurance; the absence of similar findings does not, however, provide grounds for refutation. This is because different methods used in qualitative research furnish parallel datasets, each affording only a partial view of the whole picture.
Triangulation relies on the notion of a fixed point, or superior explanation, against which other interpretations can be measured. Qualitative research, however, is usually carried out from a relativist perspective, which acknowledges the existence of multiple views of equal validity.1 Therefore, it does not readily lend itself to the production or observance of such a hierarchy of evidence.18
Richardson suggests that it is more helpful to conceive of complementary rather than competing perspectives and offers the term “crystallisation” as an alternative to triangulation.19 Qualitative research, with its distinctive approach to harnessing the analytical potential of exceptions, allows a research question to be examined from various angles. As Mays and Pope conclude, comprehensiveness may be a more realistic goal for qualitative research than is internal validity.20 According to this approach, apparent contradictions (or exceptions) do not pose a threat to researchers' explanations; they merely provide further scope for refining theories.
Respondent validation
Given the current focus on consumerism, respondent validation, which involves cross checking interim research findings with respondents, has a ready appeal. Respondents' reactions to emerging findings, like those of key informants, can certainly help refine explanations, but several commentators have questioned whether respondent validation is always appropriate.20 21 As Mays and Pope point out, researchers seek to provide an overview whereas respondents have individual concerns, and this can result in apparently discrepant accounts.20
Sometimes researchers choose to disregard their own interpretations and to accept those of respondents at face value. This can be cosy but may lead to collusion: Atkinson has warned of the dangers of “romanticising” respondents' accounts.22 Respondent validation exercises, such as asking participants to read drafts, make considerable demands on participants' time and, depending on the research topic and content of transcripts, can even be exploitative or distressing.23
Respondent validation can be particularly valuable in action research projects, where researchers work with participants on an ongoing basis to facilitate change. Most health services research, however, involves a one-off data collection exercise, in which respondent validation may be more trouble than it is worth.
Conclusion
Although some of the technical fixes discussed here may seem appealing in the face of the dual imperatives of securing grant funding and publication, each has limitations. Reducing qualitative research to a list of technical procedures, however extensive, is overly prescriptive and results in “the tail wagging the dog.” None of these technical fixes, in itself, confers rigour. They can strengthen the rigour of qualitative research only if they are embedded in a broad understanding of qualitative research design and data analysis. Otherwise we run the risk of compromising the unique contribution that systematic and thoughtfully carried out qualitative research can make to health services research.
This article is based on a presentation to the British Sociological Association's Regional Medical Sociology Group in London in March, 2000. I am grateful to those who attended for their constructive feedback; also to Helen Richards and Graham Watt for helpful comments on an earlier draft.
Competing interests None declared.