Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ 2007;334 doi: https://doi.org/10.1136/bmj.39048.407928.BE (Published 25 January 2007). Cite this as: BMJ 2007;334:197
Perel et al report that randomisation and blinding were rarely reported in animal experiments. This was our conclusion from a comparison of clinical and laboratory studies in anaesthetic journals (1). Sample size was also smaller (median 6 vs 19), and failure in some aspect of the study protocol was less likely to be reported in laboratory studies than in clinical studies (2 vs 43).
It would have been nice to have been cited.
1 Watters MPR, Goodman NW. Comparison of basic methods in clinical studies and in vitro tissue and cell culture studies in three anaesthesia journals. Br J Anaesth 1999;82:295-8.
Author of a paper on a similar subject that was not cited.
Competing interests: No competing interests
We read with great interest the article by Perel et al. This article is timely as it highlights how unsatisfactory a surrogate for human disease many animal model systems are. One particular area of medicine where human research has needed models is Barrett's oesophagus, which is perhaps the Western world's commonest premalignant lesion.1 Unfortunately, the clinical evidence to assess the value of therapeutic interventions like acid suppression takes many years, 10 or more, in large multi-million trials like the ASPirin Esomeprazole Chemoprevention Trial (AspECT).2 Several animal models, especially rodents,3,4 and even the dog,5,6 are frequently used. The trouble with these models is that the rodent foregut is different from that in man. Furthermore, some of these models give rise to lesions that do not recapitulate human precursors or histologically malignant lesions. These lesions often do not even exhibit the same genetic alterations as in man, and even between sub-species they show great variability. In addition, the surgical rerouting of acid and bile causes simultaneous alterations in hypoxic injury, inflammation and bacterial colonisation, not to mention unphysiological biochemistry. Moreover, most models also rely on the use of potent carcinogens that further perturb stem cell adaptive responses.
It is therefore important that animal models of human disease should be characterised and graded according to their relevance across a spectrum of parameters such as physiological, pathophysiological and genetic determinants. In conclusion, therefore, while some surrogate in vivo models may inform on the mechanisms of human as well as animal disease, many others are potentially a menace and may actually slow our progress.
1. Devesa SS, Blot WJ, Fraumeni JF Jr. Changing patterns in the incidence of esophageal and gastric carcinoma in the United States. Cancer
2. Jankowski JA, Moayyedi P. Aspirin as chemoprevention for Barrett's esophagus: a large RCT underway in the UK (original research correspondence). J Natl Cancer Inst 2004;96:885-7.
3. Pera M, Brito MJ, Poulsom R, Riera E, Grande L, Hanby A, Wright NA. Duodenal-content reflux esophagitis induces the development of glandular metaplasia and adenosquamous carcinoma in rats. Carcinogenesis 2000;21:1587-91.
4. Su Y, Chen X, Klein M, Fang M, Wang S, Yang CS, Goyal RK. Phenotype of columnar-lined esophagus in rats with esophagogastroduodenal anastomosis: similarity to human Barrett's esophagus. Lab Invest 2004;84:753-65.
5. Bremner CG, Lynch VP, Ellis FH Jr. Barrett's esophagus: congenital or acquired? An experimental study of esophageal mucosal regeneration in the dog. Surgery 1970;68:209-16.
6. Kawaura Y, Tatsuzawa Y, Wakabayashi T, Ikeda N, Matsuda M, Nishihara S. Immunohistochemical study of p53, c-erbB-2, and PCNA in Barrett's esophagus with dysplasia and adenocarcinoma arising from experimental acid or alkaline reflux model. J Gastroenterol 2001;36:595-
Competing interests: No competing interests
Editor—The Perel et al. study, “Comparison of treatment effects between animal experiments and clinical trials: systematic review,” is narrow in both size and scope.1 It examines only six examples of one particular aspect of animal research. The authors themselves noted in the conclusion of the original version of this paper published on the University of Birmingham website that their sample size “was far too small to give precise statistical estimates of the extent of concordance” between human and animal studies.2
Unfortunately, small sample size was not the only constraining factor of this study. The authors were also limited by their narrow view of animal research. This paper only examined immediate preclinical testing of new drug therapies, but animal research aids medical science in many more ways. In addition to these tests, animal studies play a role in the initial development of candidate drugs, and in the development and testing of medical devices (e.g. pacemakers) and surgical procedures (e.g. heart surgery). Even more vitally, animal research informs clinical research by building the foundation of biological knowledge. Basic research that expands our understanding of how life systems function indicates to clinicians not only what direction to pursue but what directions are possible.
Although animal research informs clinical research, its circumstances and experimental goals differ from those of clinical research. Thus their protocols and experimental designs necessarily differ. Animal studies generally seek a mechanism of action for a treatment, rather than treatment efficacy. They are usually conducted on defined, genetically homogeneous subjects with near perfect compliance, as opposed to the large-scale diversity of genetics and behavior of a clinical population. Some clinically necessary procedures, such as double-blinding, serve little purpose in an animal study, since rats are not susceptible to the placebo effect. Furthermore, accepted standards for animal welfare as well as many national and institutional protocols insist that sample sizes of animal studies be small. Despite these differences, the protocol used by Perel et al. to determine that the animal studies were of "poor" quality was based, for the most part, on standards meant for large clinical trials.1
The authors also claimed in their ‘Methods’ section that they “were unaware of the results of the animal studies when selecting the six interventions.”1 However, the first Basic Principle of the World Medical Association Declaration of Helsinki states that clinical research “should be based on adequately performed laboratory and animal experimentation.”3 This means that for any clinical study, one may reasonably assume that animal studies showed a positive result. Otherwise, the clinicians would be ethically remiss for placing humans at risk. By only examining animal studies that advanced to human trials, the authors ignore the many other animal studies that stopped known hazards from being tested in humans.
Animal research may not be a perfect predictor of clinical results, but it is much better than going directly to human trials without any preliminary screening. Just as computer simulations and cell cultures reduce the number of animal studies that are necessary, animal studies hone the list of therapeutic possibilities further to a selection of reasonably safe expectations for clinical research. Will the authors next examine via systematic review the concordance of cell culture studies to clinical trials?
Timothy I. Musch, PhD
Kansas State University
College of Veterinary Medicine
Chair, Animal Care and Experimentation Committee, American Physiological Society
Robert G. Carroll, PhD
East Carolina University
Brody School of Medicine
Armin Just, MD PhD
University of North Carolina at Chapel Hill
Department of Cell & Molecular Physiology
Pascale H. Lane, MD
University of Nebraska Medical Center
Department of Pediatrics
William T. Talman, MD
University of Iowa College of Medicine
Department of Veterans Affairs Medical Center
Chair of the FASEB Animal Issues Committee
1. Perel P, Roberts I, Sena E, Wheble P, Briscoe C, Sandercock P, Macleod M, et al. Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ, 2006; doi: 10.1136/bmj.39048.407928.BE (published 15 December 2006) ‹http://www.bmj.com/cgi/rapidpdf/bmj.39048.407928.BEv1›
2. Perel P, Roberts I, Sena E, Wheble P, Sandercock P, Macleod M, Mignini L et al. Testing treatment on animals: relevance to humans. University of Birmingham Department of Public Health and Epidemiology Website. 2006 ‹http://www.pcpoh.bham.ac.uk/publichealth/nccrm/PDFs%20and%20documents/Publications/JH18_Final_Report_May_2006.pdf›
3. World Medical Association Declaration of Helsinki: Recommendations Guiding Medical Doctors in Biomedical Research Involving Human Subjects. U.S. Food and Drug Administration Website. 1989 ‹http://www.fda.gov/oc/health/helsinki89.html›
Competing interests: No competing interests
Marco Mamone-Capria is right to question the relevance of animal experiments to human health, and so is K. Archibald in saying that it is clinical research in humans that advances medical progress. Lawrence Altman MD, medical correspondent to the New York Times, spent 30 years researching medical discoveries, and in his book 'Who Goes First?' he writes: "Whenever a doctor discovers a new drug, treatment or therapy, or tries to unravel a mystery of physiology or disease, human experimentation is necessary. Even after thousands of animal experiments our biologic uniqueness requires human experimentation." Many other researchers are aware of the fallacies of animal experiments but are reluctant to denounce them for fear of jeopardising their careers or funding sources. The bulk of research is funded by the pharmaceutical industry, which must produce a continual flow of new drugs in order to survive. 'Alternative' methods of research are ignored, as this would limit the number of products reaching the market, thereby affecting profits, the ultimate goal of research today.
Considering the increasing use of animals in research, shouldn't we be seeing an improvement in health? In fact the opposite is true. Years of experiments on billions of animals have not produced a single cure, nor stemmed the huge increase in diseases. In fact we have more sickness and disease today than at any time in history, and drug related diseases have reached epidemic proportions. The statistics speak for themselves.
Competing interests: No competing interests
Marco Mamone-Capria's observations are clear and irrefutable - though proponents of animal experimentation will undoubtedly refute them!
The waters of this debate have become muddied by the sheer volume of publications making the implicit assumption that the animal experiments involved in any particular field of research were predictive and were therefore responsible for guiding the direction of that research. But it is probably more often the case that clinical observation drives the direction of research, for which 'validatory' animal experimentation is often then awarded the credit.
A good illustration of this is provided by the BMJ's Medical Milestones supplement, celebrating key advances since the 1840s. All 15 stories were an enthralling read, with many important take-home messages. One lasting impression left on me was how they were driven by the ability of the human mind to harness both serendipitous and diligently sought discoveries to the purpose of solving human health problems, as perceived through careful observation of patterns of disease and individual responses to disease.
The key observations and discoveries were necessarily clinical in nature. Yet we are constantly told that 'virtually every medical breakthrough has relied upon animal experimentation, without which medical progress would cease.' In light of these 15 stories, that claim rings very hollow.
Of course, animal experimentation has been involved in much medical progress - but has it been the driver of that progress, and would future progress be impossible without it? Of course not.
250 MPs and 83% of GPs in a nationwide survey (see www.curedisease.net) want to see an independent scientific evaluation of the clinical relevance of animal experimentation. Yet the Government prefers the advice of those campaigning to prevent an evaluation which would be in all our interests. As the champions of evidence based medicine as a Medical Milestone point out: the results of evidence based medicine often clash with the agenda of special interest groups.
Director of Europeans for Medical Progress: an independent organisation devoted to rigorous scientific analysis of animal experimentation to assess the balance of help or harm to human health: www.curedisease.net
Competing interests: No competing interests
In their valuable paper Perel et al. present an outline of a systematic review concerning the degree of agreement between animal experimentation (or "vivisection" [6, p. 30, n. 1]) and clinical trials in six fields of medical inquiry:
(a) thrombolysis in acute ischaemic stroke (AIS),
(b) bisphosphonates to prevent and treat osteoporosis,
(c) antenatal corticosteroids to prevent neonatal respiratory distress syndrome,
(d) corticosteroids in traumatic head injury,
(e) tirilazad in acute ischaemic stroke,
(f) antifibrinolytics in haemorrhage.
The authors conclude that in cases (a), (b), (c) there was «concordance» or «similar outcomes», while in (d), (e), (f) there was not. Most disquietingly -- but not unexpectedly for whoever knew  and , or bothered to examine with some care a sample of the vivisectionist literature -- in all six cases the quality of the animal experiments examined turned out to be «poor». This was indicated by violation of such basic requirements as randomization, adequate allocation concealment, and blinded assessment of outcome (cf. table 1 in ).
Although this dire balance sheet is in itself a sufficient refutation of the standard case for animal experiments as a scientific (rather than psychological) foundation for clinical research, a closer examination of the meaning of "concordance" and "discordance" is rewarding.
In case (a) the authors found evidence of publication bias (i.e. articles with positive results were more likely to have been published), which implies that the available data are statistically hard to make sense of. Indeed, a general difficulty facing systematic reviewers is that «The proportion of the work that gets published in a form that is available to the public (rather than just being available to industry and the regulatory authorities) is unknown»; a good idea would be to contact the regulatory agencies but, to cite the authors' own example, the Home Office in the United Kingdom has no accessible register of animal experiments licensed under the Animals Act 1986.
In case (c) there was concordance as to reduction of respiratory distress syndrome in animals and in neonates, but not as to mortality, which was reduced in neonates and yet was increased in animal models, and significantly so in ewe models (the pooled odds ratio was 12.5, with 95% confidence interval 1.9 to 79.2, «with no evidence of significant heterogeneity [...]»). Clearly mortality is such a crucial parameter that, for those who take animal experiments seriously, results in even one animal model indicating correlation of a treatment with an increase in mortality should lead to immediate withdrawal of that treatment from clinical use and experimentation. For antenatal corticosteroids we know, from the clinical trials, that this decision would have resulted in the loss of human lives.
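The «significantly so» claim can be checked against the figures quoted above. A back-of-envelope reconstruction (not taken from the paper itself) uses the fact that a pooled odds ratio's confidence interval is approximately symmetric on the log scale:

```latex
\ln(\mathrm{OR}) = \ln 12.5 \approx 2.53, \qquad
\mathrm{SE} \approx \frac{\ln 79.2 - \ln 1.9}{2 \times 1.96} \approx 0.95,
```
```latex
95\%\ \mathrm{CI} \approx \exp(2.53 \pm 1.96 \times 0.95) \approx (1.9,\ 79.2).
```

Because the lower bound (1.9) exceeds 1, the increase in mortality in the ewe models is statistically significant; the great width of the interval reflects the small sample sizes typical of the animal studies under review.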
The authors in their full report cite, as evidence of «lack of communication» between the animal experimenters and clinical researchers, the «conclusion of one animal head injury study that "timely high-dose dexamethasone treatment may help clinicians to manage head injury cases"», which was published six months after the publication of the CRASH trial. The CRASH trial had shown that this class of treatments actually increased mortality in head injury patients [2, p. 45].
While this example is enlightening as further evidence that animal experimentation, unaided by previous knowledge of what is the correct result that "must" be found, is no better -- and no less dangerous -- than divination, one should keep track, nonetheless, of the extent to which animal experimenters who "got it right" had read in their (perhaps suitably "elaborated") data what they knew clinical investigators had already shown. For example, in the "concordant" cases (b) and (c), as the authors inform us in the full report, «The animal research continued after effectiveness in humans was established». This raises further doubts about the true degree and quality of the purported «concordance».
Let us clear the ground of a possible misunderstanding, often lingering in the debate between supporters and critics of animal experimentation. That some degree of "concordance" between animal experiments and clinical trials will be found is no surprise at all. To say that animal experiments are unreliable, as indeed  confirms they are, does not mean that they constantly give the wrong indication. Were it so, they would be exceedingly useful, since by systematically denying what they indicate we should always get a correct answer to our medical questions. Unfortunately, life is not so easy, and here lies precisely the danger of all pseudoscientific practices.
The discordance in cases (d) and (e) is particularly worrying, as animal experiments failed to show the increase in mortality in case (d) and the risk of death and dependency in case (e). This means that by relying on the "indications" provided by animal experiments lives have been lost.
In case (e), however, the authors try to show that the discordance can be explained by using the results of "concordant" case (a). If successful, this attempt might vindicate, at least in one example, the image of animal experimentation as a potentially reasonable, self-correcting endeavor. But is it so? Let us see.
According to the authors, the discordance in the tirilazad case between animal experimentation on AIS and clinical trials may have arisen from the difference in the time delays, between onset of AIS and treatment, in humans (median 5 hours) and in animal models (median 10 minutes). This suggestion (also advanced in ) needs to be examined carefully.
At a first level, the remark that excessive delay to treatment, particularly in the case of AIS, might reduce the efficacy of a treatment seems quite plausible. However, can the thrombolysis animal experiments, where there was "concordance", be used as a sufficient ground to think that "similar" benefits (or lack of them) in animals and, respectively, humans are obtained with quantitatively "similar" delays to treatment? In fact, «there is some evidence that it might take many hours for damage to develop in human brains» (). Moreover, as can be read in the complete report [2, p. 26], in the tirilazad experiments «there was no significant relationship between delay to treatment and efficacy», although «maximum efficacy was seen when treatment was given before the onset of ischaemia, with efficacy appearing to fall thereafter with time». Thus even after the systematic review (, ), the evidence for prescribing similar delays to treatment in animal experiments and in human trials in order to get clinically relevant animal data is far from unambiguous. It is a prudent bet that whatever lessons will be learned from the tirilazad failure -- and from the failure of more than 37 supposedly neuroprotective drugs tested in more than 114 clinical trials after encouraging results in animals -- they will come from careful study of the human, rather than animal, data.
In general, if one wishes to proceed in a truly scientific way, one cannot but refer to the mechanism of the pathological process and to the mode of action of the drug -- if known. More precisely, if one can assume that
(1) in both animal models and humans an essential component of the pathological process acts by the same mechanism, and
(2) with the same time scale, or otherwise with a known scale factor; and that
(3) the mode of action of the drug in both animal models and humans is to interfere in the same way with that component of the pathological process, and to affect in the same way any other organs involved;
then one might correctly draw from animal experiments likely conclusions about the efficacy and adverse reactions of the drug in humans.
Once the argument is made explicit it is easy to see what the trouble is. The trouble is that even if we knew that (1) and (2) hold -- and unfortunately this is hardly ever the case, also because of the well-known sensitivity of living organisms to small differences, both intraspecific and interspecific ([6, pp. 15-32]) -- still (3) can be assessed only after both the animal experiments and the human trials have been performed. To put it shortly, we can judge the correctness of the inference from animals to humans only with hindsight: "similarity" between animal data and clinical results is normally the effect of a posterior and largely arbitrary reconstruction. This means that animal experiments cannot be used to (scientifically) predict results on humans. And yet the assumption of predictive power is about the only justification of animal experiments which can boast some degree of public acceptance.
3 Combining data from different species
A fundamental question which all systematic reviewers must face is how to combine results from different animal species in a way that is relevant to still another species. Two of the authors stated in 2002: «Consistent results across species and models would provide some reassurance that human beings might respond in the same way» (; see also ). What is the meaning of «across species»? In how many, or in which, species and strains should one get consistent results in order to be rationally (not merely psychologically) reassured? It is useful to remember that, among mammals alone, about 4,237 different species are known (I take the figure from a book published in 1994), and as already mentioned there are many cases (discussed for instance in  and ) where inconsistency occurs even within one species, with outcomes depending on strain, sex of animals, food, cages, laboratory environment etc. Taking comorbidities into account, although prima facie reasonable, would only exponentially magnify the intractability of the problem.
I am sure this point need not be belabored at a time when huge resources are being invested in the development of drugs tailored to the genetic profile of individual (human) patients. The main reason for such a gigantic research effort is, of course, the recognition that most drugs are effective only in a minority -- sometimes a very small one -- of patients (, ).
It is on this very basic issue that science and vivisection part ways. There is simply no scientifically validated procedure for combining data from experiments on different non-human species to get information relevant to humans. The burden of proof is on those who claim that such a validation has ever been provided. For example, as ECVAM (European Centre for the Validation of Alternative Methods) director Thomas Hartung recently stated, «The [animal] toxicity tests that have been used for decades are "simply bad science"». There is no reason to think that systematic reviews, in themselves, can remedy this situation. For all we know, to combine results from several species might be like the procedure used to solve the legendary problem of the length of the nose of the Emperor of China, as described by physicist Richard P. Feynman [11, pp. 295-6]: one asked Chinese people what they thought this length was, and then averaged the answers. All very well -- except that no one in China had ever been permitted to see the Emperor.
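Feynman's point, that pooling many uninformed observations yields spurious precision rather than validity, can be made concrete with a small simulation. This is a hypothetical illustration (the respondents, counts, and the 5 cm folk belief are all invented for the sketch), not anything from the letters themselves:

```python
import random
import statistics

random.seed(0)

# 10,000 "respondents" guess the length of the Emperor's nose (in cm).
# None has ever seen the Emperor, so each guess is mere noise around a
# shared folk belief of ~5 cm.
guesses = [random.gauss(5.0, 2.0) for _ in range(10_000)]

mean = statistics.mean(guesses)
# Standard error of the mean shrinks with the square root of the sample size.
sem = statistics.stdev(guesses) / len(guesses) ** 0.5

print(f"pooled estimate: {mean:.2f} cm +/- {sem:.3f} cm")
```

The pooled estimate comes out with a standard error of a few hundredths of a centimetre, i.e. it looks extremely precise, yet it carries no information whatsoever about the real nose. Averaging across species that do not individually predict the human response has the same character: the pooled number gains statistical precision while gaining nothing in validity.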
In conclusion, it is not surprising that animal experimentation does not inform human health care: there is no known scientific way in which it could do that. The usefulness of systematic reviews of animal studies lies in bringing home this important truth, while the suggestion to adopt «an iterative approach to improving the relevance of animal models to clinical trial design» seems, in the light of all the available evidence, unwarranted.
Perel P, Roberts I, Sena E, Wheble P, Briscoe C, Sandercock P, Macleod M, Mignini LE, Jayaram P, Khan KS. Comparison of treatment effects between animal experiments and clinical trials: systematic review. BMJ, 15 December 2006.
--. Testing treatment on animals: relevance to humans.
Roberts I, Kwan I, Evans P, Haig S. Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation. BMJ 2002;324:474-6.
Pound P, Ebrahim S, Sandercock P, Bracken MB, Roberts I. Where is the evidence that animal research benefits humans? BMJ 2004;328:514-7.
Green AR, Odergren T, Ashwood T. Animal models of stroke: do they have value for discovering neuroprotective agents? Trends in Pharmacological Sciences 2003;24(8):402-8.
Croce P. Vivisection or science? An investigation into testing drugs and safeguarding health. London: Zed Books, 1999.
Sandercock P, Roberts I. Systematic reviews of animal experiments. The Lancet 2002;360:586.
Mamone-Capria M. Pseudoscienza nella scienza biomedica contemporanea: il caso della vivisezione [Pseudoscience in contemporary biomedical science: the case of vivisection]. Biologi Italiani 2003;33(6):10-27.
Connor S. Glaxo chief: Our drugs do not work on most patients. The Independent, 8 December 2003.
Smith R. The drugs don't work. BMJ, 13 December 2003, vol.
Abbott A. More than a cosmetic change. Nature 2005;438:144-6.
Feynman RP. Surely You're Joking, Mr. Feynman! Unwin
Mignini LE, Khan KS. Methodological quality of systematic reviews of animal studies: a survey of reviews of basic research. BMC Medical Research Methodology 2006;6:10.
Competing interests: No competing interests