On “concordance” and “discordance”
10 January 2007
In their valuable paper Perel et al.  present an outline of a systematic review  concerning the degree of agreement between animal experimentation (or “vivisection” [6, p. 30, n. 1]) and clinical trials in six fields of medical inquiry:
(a) thrombolysis in acute ischaemic stroke (AIS),
(b) bisphosphonates to prevent and treat osteoporosis,
(c) antenatal corticosteroids to prevent neonatal respiratory distress syndrome,
(d) corticosteroids in traumatic head injury,
(e) tirilazad in acute ischaemic stroke,
(f) antifibrinolytics in haemorrhage.
The authors conclude that in cases (a), (b), (c) there was «concordance» or «similar outcomes», while in (d), (e), (f) there was not. Most disquietingly -- but not unexpectedly for whoever knew  and , or bothered to examine with some care a sample of the vivisectionist literature -- in all six cases the quality of the animal experiments examined turned out to be «poor». This was indicated by violation of such basic requirements as randomization, adequate allocation concealment, and blinded assessment of outcome (cf. table 1 in ).
Although this dire balance sheet is in itself a sufficient refutation of the standard case for animal experiments as a scientific (rather than psychological) foundation for clinical research, a closer examination of the meaning of “concordance” and “discordance” is rewarding.
In case (a) the authors found evidence of publication bias (i.e. articles with positive results were more likely to have been published), which implies that the available data are statistically hard to make sense of. Indeed, a general difficulty facing systematic reviewers is that «The proportion of the work that gets published in a form that is available to the public (rather than just being available to industry and the regulatory authorities) is unknown» ; a good idea would be to contact the regulatory agencies but, to cite the authors’ own example, the Home Office in the United Kingdom has no accessible register of animal experiments licensed under the Animals (Scientific Procedures) Act 1986 .
In case (c) there was concordance as to reduction of respiratory distress syndrome in animals and in neonates, but not as to mortality, which was reduced in neonates and yet was increased in animal models, and significantly so in ewe models (the pooled odds ratio was 12.5, with 95% confidence interval 1.9 to 79.2, «with no evidence of significant heterogeneity [...]»). Clearly, mortality is so crucial a parameter that, for those who take animal experiments seriously, results in even one animal model indicating that a treatment is associated with increased mortality should lead to immediate withdrawal of that treatment from clinical use and experimentation. For antenatal corticosteroids we know, from the clinical trials, that this decision would have resulted in the loss of human lives.
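The figures quoted for the ewe models can be checked for internal consistency. The following is only a minimal sketch: it assumes that the reported interval is of the usual Wald type, symmetric on the logarithmic scale, which the source does not state; the function name and the critical value 1.96 are illustrative choices, not taken from the review.

```python
import math

def log_scale_summary(odds_ratio, lower, upper, z_crit=1.96):
    """Summarise a reported odds ratio and its 95% confidence interval.

    Assumption (not stated in the source): the interval is symmetric
    on the log scale, as for a Wald-type interval on log(OR).
    """
    # Under that assumption the geometric mean of the limits should
    # lie close to the point estimate.
    geometric_midpoint = math.sqrt(lower * upper)
    # Implied standard error of log(OR), recovered from the interval width.
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Distance of log(OR) from the "no effect" value log(1) = 0, in SE units.
    z_score = math.log(odds_ratio) / se_log_or
    return {
        "geometric_midpoint": geometric_midpoint,
        "se_log_or": se_log_or,
        "z_score": z_score,
        "excludes_no_effect": lower > 1.0 or upper < 1.0,
    }

# Figures quoted above for mortality in ewe models.
summary = log_scale_summary(12.5, 1.9, 79.2)
for name, value in summary.items():
    print(name, value)
```

On these figures the interval excludes an odds ratio of 1, which is what “significantly so” amounts to, while its width shows how imprecisely the size of the increase was estimated.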
The authors in their full report cite, as evidence of «lack of communication» between the animal experimenters and clinical researchers, the «conclusion of one animal head injury study that “timely high-dose dexamethasone treatment may help clinicians to manage head injury cases”», which was published six months after the publication of the CRASH trial. The CRASH trial had shown that this class of treatments actually increased mortality in head injury patients [2, p. 45].
While this example is enlightening as further evidence that animal experimentation, unaided by previous knowledge of what is the correct result that “must” be found, is no better -- and no less dangerous -- than divination, one should keep track, nonetheless, of the extent to which animal experimenters who “got it right” had read in their (perhaps suitably “elaborated”) data what they knew clinical investigators had already shown. For example, in the “concordant” cases (b) and (c), as the authors inform us in the full report, «The animal research continued after effectiveness in humans was established» . This raises further doubts about the true degree and quality of the purported «concordance».
Let us clear the ground of a possible misunderstanding, which often lingers in the debate between supporters and critics of animal experimentation. That some degree of “concordance” between animal experiments and clinical trials will be found is no surprise at all. To say that animal experiments are unreliable, as indeed  confirms they are, does not mean that they constantly give the wrong indication. Were it so, they would be exceedingly useful, since by systematically denying what they indicate we should always get a correct answer to our medical questions. Unfortunately, life is not so easy, and here lies precisely the danger of all pseudoscientific practices .
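The logical point can be illustrated with a toy sketch. The data below are entirely hypothetical and are not drawn from the review; they serve only to show that a predictor which is always wrong becomes perfect once its answers are negated, whereas one that agrees with the outcomes half of the time gains nothing from negation.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that agree with the observed outcomes."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

def invert(predictions):
    """Systematically deny whatever the predictor indicates."""
    return [not p for p in predictions]

# Hypothetical clinical outcomes (True = treatment beneficial).
outcomes = [True, False, True, True, False, False, True, False]

# A predictor that is wrong every single time.
always_wrong = invert(outcomes)

# A predictor that agrees with the outcomes in 4 cases out of 8.
coin_like = [True, True, False, True, False, True, False, False]

for name, predictor in [("always wrong", always_wrong), ("coin-like", coin_like)]:
    print(name, accuracy(predictor, outcomes), accuracy(invert(predictor), outcomes))
```

The unreliable predictor is the second one: neither its indications nor their denial can be trusted.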
The discordance in cases (d) and (e) is particularly worrying, as animal experiments failed to show the increase in mortality in case (d) and the risk of death and dependency in case (e). This means that by relying on the “indications” provided by animal experiments lives have been lost.
In case (e), however, the authors try to show that the discordance can be explained by using the results of “concordant” case (a). If successful, this attempt might vindicate, at least in one example, the image of animal experimentation as a potentially reasonable, self-correcting endeavor. But is it so? Let us see.
According to the authors, the discordance in the tirilazad case between animal experimentation on AIS and clinical trials may have arisen from the difference in the time delays, between onset of AIS and treatment, in humans (median 5 hours) and in animal models (median 10 minutes). This suggestion (also advanced in ) needs to be carefully assessed.
At a first level, the remark that excessive delay to treatment, particularly in the case of AIS, might reduce the efficacy of a treatment seems quite plausible. However, can the thrombolysis animal experiments, where there was “concordance”, be used as a sufficient ground to think that “similar” benefits (or lack of them) in animals and, respectively, humans are obtained with quantitatively “similar” delays to treatment? In fact, «there is some evidence that it might take many hours for damage to develop in human brains» (). Moreover, as can be read in the complete report [2, p. 26], in the tirilazad experiments «there was no significant relationship between delay to treatment and efficacy», although «maximum efficacy was seen when treatment was given before the onset of ischaemia, with efficacy appearing to fall thereafter with time». Thus even after the systematic review (, ), the evidence for prescribing similar delays to treatment in animal experiments and in human trials in order to get clinically relevant animal data is far from unambiguous. It is a prudent bet that whatever lessons are learned from the tirilazad failure -- and from the failure of more than 37 supposedly neuroprotective drugs tested in more than 114 clinical trials after encouraging results on animals  -- they will come from careful study of the human, rather than animal, data.
In general, if one wishes to proceed in a truly scientific way, one cannot but refer to the mechanism of the pathological process and to the mode of action of the drug – if known. More precisely, if one can assume that
(1) in both animal models and humans an essential component of the pathological process acts by the same mechanism and
(2) with the same time scale, or otherwise with a known scale factor; and that
(3) the mode of action of the drug in both animal models and humans is to interfere in the same way with that component of the pathological process, and to affect in the same way any other organs involved;
then one might correctly draw from animal experiments likely conclusions about the efficacy and adverse reactions of the drug on humans.
Once the argument is made explicit it is easy to see what the trouble is. The trouble is that even if we knew that (1) and (2) hold – and unfortunately this is hardly ever the case, also because of the well-known sensitivity of living organisms to small differences, both intraspecific and interspecific ([6, pp. 15-32]) – still (3) can be assessed only after both the animal experiments and the human trials have been performed. To put it briefly, we can judge the correctness of the inference from animals to humans only with hindsight: “similarity” between animal data and clinical results is normally the effect of an a posteriori and largely arbitrary reconstruction. This means that animal experiments cannot be used to (scientifically) predict results on humans. And yet the assumption of predictive power is about the only justification of animal experiments which can boast some degree of public acceptance.
3 Combining data from different species
A fundamental question which all systematic reviewers must face is how to combine results from different animal species in a way that is relevant to still another species. Two of the authors stated in 2002: «Consistent results across species and models would provide some reassurance that human beings might respond in the same way» (; see also ). What is the meaning of «across species»? In how many, or in which, species and strains should one get consistent results in order to be rationally (not merely psychologically) reassured? It is useful to remember that, among mammals alone, about 4,237 different species are known (I take the figure from a book published in 1994), and as already mentioned there are many cases (discussed for instance in  and ) where inconsistency occurs even within one species, with outcomes depending on strain, sex of the animals, food, cages, laboratory environment etc. Taking comorbidities into account, although prima facie reasonable, would only magnify exponentially the intractability of the problem.
I am sure this point need not be belabored, at a time when huge resources are being invested in the development of drugs tailored on the genetic profile of the individual (human) patients. The main reason for such gigantic research effort is, of course, the recognition that most drugs are effective only on a minority -- sometimes a very small one -- of patients (, ).
It is on this very basic issue that science and vivisection part ways. There is simply no scientifically validated procedure for combining data from experiments on different non-human species so as to obtain information relevant to humans. The burden of proof is on those who claim that such a validation has ever been provided. For example, as ECVAM (European Centre for the Validation of Alternative Methods) director Thomas Hartung recently stated, «The [animal] toxicity tests that have been used for decades are “simply bad science”» . There is no reason to think that systematic reviews, in themselves, can remedy this situation. For all we know, to combine results from several species might be like the procedure used to solve the legendary problem of the length of the nose of the Emperor of China, as described by physicist Richard P. Feynman [11, pp. 295-6]: one asked Chinese people what they thought this length was, and then averaged the answers. All very well -- except that no one in China had ever been permitted to see the Emperor.
In conclusion, it is not surprising that animal experimentation does not inform human health care : there is no known scientific way in which it could do that. The usefulness of systematic reviews of animal studies lies in bringing home this important truth, while the suggestion to adopt «an iterative approach to improving the relevance of animal models to clinical trial design»  seems, in the light of all the available evidence, unwarranted.
 Perel P., Roberts I., Sena E., Wheble P., Briscoe C., Sandercock P., Macleod M., Mignini L. E., Jayaram P., Khan K. S., “Comparison of treatment effects between animal experiments and clinical trials: systematic review”, BMJ, 15 December 2006
 --, “Testing treatment on animals: relevance on humans”,
 Roberts I., Kwan I., Evans P., Haig S. “Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation”, BMJ, vol. 324, 23 February 2002, pp. 474-6.
 Pound P., Ebrahim S., Sandercock P., Bracken M. B., Roberts I., “Where is the evidence that animal research benefits humans?”, BMJ, vol. 328, 28 February 2004, pp. 514-7.
 Green A. R., Odergren T., Ashwood T., “Animal models of stroke: do they have value for discovering neuroprotective agents?”, TRENDS in Pharmacological Sciences, vol. 24 (8), August 2003, pp. 402-8
 Croce P. Vivisection or science? An investigation into testing drugs and safeguarding health, London: Zed Books, 1999.
 Sandercock P., Roberts I., “Systematic reviews of animal experiments”, The Lancet, vol. 360, August 24, 2002, p. 586.
 Mamone-Capria M., “Pseudoscienza nella scienza biomedica contemporanea: il caso della vivisezione”, Biologi Italiani, June 2003, 33(6), pp. 10-27
 Connor S., “Glaxo chief: Our drugs do not work on most patients”, The Independent, 8 December 2003.
 Smith R., “The drugs don’t work”, BMJ, 13 December 2003, vol. 327
 Abbott A., “More than a cosmetic change”, Nature, 10 November 2005, vol. 438, pp. 144-6.
 Feynman R.P., “Surely You’re Joking, Mr. Feynman!”, Unwin Paperbacks 1986.
 Mignini L. E., Khan K. S., “Methodological quality of systematic reviews of animal studies: a survey of reviews of basic research”, BMC Medical Research Methodology, 13 March 2006, 6:10.
Competing interests: None declared
University of Perugia, 06123 Perugia