Commentary: Validity of assessing change through audit

BMJ 1994; 309 doi: (Published 02 July 1994) Cite this as: BMJ 1994;309:18
  1. Ken MacRae

    The difference between audit and research was neatly encapsulated by the subheading to a leading article by Smith: “Research is concerned with discovering the right thing to do; audit with ensuring that it is done right.”

    Penney et al have reported an audit of the management of induced abortion in hospitals in Scotland, comparing the results obtained in a first round of six months in 1992 with those obtained in a second round of two months in 1993.2 The authors conclude that four statistically significant improvements were detectable when all 10 hospitals were taken together and that no overall deteriorations occurred in any elements of care. A rather different picture emerged when individual hospitals were considered. Considering 15 of the 16 criteria of care, giving 150 comparisons of the two rounds of audit, it was concluded that major improvements occurred in 31 comparisons and significant deteriorations occurred in 11. Despite acknowledging that some of the improvements recorded might have been unrelated to the audit feedback exercise, Penney et al believe that audit was instrumental in producing the dramatic improvement in the frequency of mifepristone priming before midtrimester abortion.

    The problem with drawing any such conclusion from such evidence is not whether it is true but whether it is valid. Methodologically, a comparison of two audit rounds is like a clinical trial using historical controls. In 1977 Pocock addressed the temptation to draw conclusions from non-randomised comparisons in clinical trials when no sources of bias can be identified.3 He reviewed a number of cancer chemotherapy trials in the United States and found 19 pairs carried out by cooperative groups in which the same control treatment had been used in consecutive randomised trials. Most of the trials had over 100 patients per treatment, and no obvious sources of bias suggesting differences between the trials in each pair were identifiable. Despite this, the death rates with the same treatment in the two successive trials showed changes ranging from a decrease of 46% to an increase of 24%. Four of the 19 pairs showed changes significant at the 5% level. Pocock concluded: “Such marked evidence of differences between trials indicates that any comparison of treatments not within an RCT [randomised controlled trial] must be deemed highly suspect.”
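    To see why four significant changes among 19 pairs is striking, it may help to ask how often chance alone would produce that many. The Python sketch below is an illustration only and is not Pocock's own analysis. It assumes that the 19 pairs are independent and that, if the control treatment truly performed identically in both trials, each pair would have a 5% chance of showing a change significant at the 5% level. The function name and its parameters are hypothetical.

```python
from math import comb


def prob_at_least_k_significant(n_pairs, k, alpha=0.05):
    """Binomial probability of k or more 'significant' results among
    n_pairs independent comparisons when no real difference exists.

    Assumption (not from the article): the comparisons are independent
    and each has probability alpha of reaching significance by chance.
    """
    # Sum the probabilities of 0 .. k-1 significant results, then take
    # the complement to obtain the upper tail.
    prob_fewer = sum(
        comb(n_pairs, i) * alpha**i * (1 - alpha) ** (n_pairs - i)
        for i in range(k)
    )
    return 1.0 - prob_fewer


if __name__ == "__main__":
    # Expected number of chance findings among 19 pairs at the 5% level.
    print("expected by chance:", 19 * 0.05)
    # Probability of seeing four or more by chance alone.
    print("P(4 or more of 19):", prob_at_least_k_significant(19, 4))
```

    Under these assumptions fewer than one significant pair would be expected, and the probability of four or more is a little over 1%. This is consistent with Pocock's view that something other than chance, and other than treatment, was changing between successive trials.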

    Logic of a clinical trial

    It is worth reviewing the logic on which drawing a causal inference is based in a clinical trial. An apparent difference between two treatments has four possible explanations, but these are not mutually exclusive. Firstly, the difference could be “real,” indicating a difference due to the difference in treatment. Secondly, it may simply have occurred by chance even though the two treatments are equally effective (or ineffective). Thirdly, the patients in the two treatment groups might be different — that is, there might be an allocation or group membership bias. Fourthly, there could be a difference in the way that the outcome of treatment was judged — that is, an assessment or measurement bias.

    The logic of a clinical trial is that we can conclude that a difference is real only if we can eliminate the other three explanations. The role of chance is considered by performing a statistical test to give a P value. The usual convention is that if P is less than 0.05 chance may be rejected as a reasonable explanation of the difference. A particular problem arises when several P values have been calculated, as it becomes ever more likely that low values will occur even if the true differences in all the comparisons made are zero. Though a low P value is necessary in order to draw a cause and effect conclusion, it is not enough. Allocation and assessment bias also need to be considered. Allocation bias is addressed usually by allocating patients randomly to the treatment groups. Even if the comparison is of two groups treated concurrently, trials using inferior methods of allocation are not acceptable to the BMJ.4 Assessment bias is prevented either by having an objective outcome measure or by blinding the assessor to the treatment identity (hence when both the patient and the clinician are so blinded the trial is “double blind”).
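    The multiple comparisons problem described above can be made concrete with a small calculation. The Python sketch below takes the 150 hospital-by-criterion comparisons mentioned earlier and asks what the 5% convention would produce if every true difference were zero. It assumes that the comparisons are independent, which audit comparisons within the same hospitals are unlikely to be, so the figures are a rough guide rather than an exact result. The function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of P values below alpha when every true
    difference is zero."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests P values falls below
    alpha by chance alone.

    Assumption (not from the article): the tests are independent.
    """
    # The chance that no test is significant is (1 - alpha) ** n_tests;
    # the complement is the chance of one or more false positives.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n = 150  # comparisons of the two audit rounds
    print("expected chance findings:", expected_false_positives(n))
    print("P(at least one):", prob_at_least_one_false_positive(n))
```

    With 150 comparisons, about seven or eight low P values would be expected by chance, and at least one is a near certainty. A handful of significant results in a set this large is therefore weak evidence of real change unless the analysis allows for the number of comparisons made.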

    Because allocation and assessment bias have not been addressed methodologically in non-randomised, non-blind comparisons, an attempt is usually made to address them logically. That is, a rational argument is advanced that there is no reason for any such biases to exist or to be large enough to explain the differences seen. Low P values are cited in support of the causal conclusion.

    Is the BMJ operating a double standard in requiring much more rigorous methodology for treatment comparisons which are explicitly labelled as “research” but not applying such rigorous criteria for studies not so labelled but which attempt to draw causal conclusions? I think it is.

