Rethinking credible evidence synthesis

BMJ 2012; 344 doi: (Published 17 January 2012)
Cite this as: BMJ 2012;344:d7898
  • Peter Doshi, postdoctoral fellow, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  • Mark Jones, statistician, University of Queensland School of Population Health, Brisbane, Australia
  • Tom Jefferson, researcher, The Cochrane Collaboration, Rome, Italy
  • Correspondence to: P Doshi pnd{at}
  • Accepted 1 December 2011

After publication of a Cochrane review into the effectiveness of oseltamivir in 2009, the reviewers got access to thousands of pages of previously unavailable data. Peter Doshi and colleagues describe how it shook their faith in published reports and changed their approach to systematic reviews

Government regulators and systematic reviewers both aim to generate an accurate understanding of the effects of interventions. However, the methods they use, and the evidence they consider, are different. Although both focus on randomised clinical trials for determining safety and efficacy, the mostly academic community of systematic reviewers generally gets its data from published reports of clinical trials. By contrast, regulators such as the US Food and Drug Administration (FDA) evaluate a far more diverse array of data, including large, computerised datasets for each clinical trial as well as a trial’s protocol and clinical study report. The clinical study report is probably the most complete single report of a trial. (Regulators can also conduct clinical site inspections, request confidential and private details such as case report forms, and inspect manufacturing facilities.) This means that while regulators and systematic reviewers may assess the same clinical trials, the data they look at differ substantially. In our latest Cochrane systematic review of the neuraminidase inhibitor class of influenza antivirals, we used a new method based on data previously seen only by regulators: over 22 000 pages of clinical study reports and over 2700 pages of regulatory comments.1 The results changed our understanding of oseltamivir (Tamiflu) and our feelings about the viability of reliable evidence synthesis in general.

Obtaining clinical study reports

Our methodological focus on clinical study reports was enabled by an important historical precedent in response to our previous Cochrane update of neuraminidase inhibitors in 2009.2 After we concluded that the published evidence did not support the effectiveness of oseltamivir in reducing the complications of influenza such as pneumonia, the manufacturer, Roche, announced that full study reports would be made available on a password protected site to “physicians and scientists undertaking legitimate analyses.”3 Like many systematic reviewers, we had never heard of or seen a clinical study report, but a few weeks later we were in possession of 3195 pages of study reports for 10 treatment trials of oseltamivir.

Although Roche had promised full study reports, it provided only the first “module” of each report, whereas the table of contents indicated that the complete reports comprised 4-5 modules (fig 1). We obtained further details of the reports from the European Medicines Agency (EMA), which sent us 25 453 pages in response to a Freedom of Information request (table 1). The study reports that the EMA provided included module 2 (but not modules 3-5), which gives details of the trial protocol and amendments.

Fig 1 Table of contents for clinical study report (trial ID: WV15799)

Table 1 Data from clinical study reports on oseltamivir obtained from European Medicines Agency

Are published papers trustworthy?

Inclusion of data from unpublished trials is important in evidence synthesis in order to reduce problems such as publication bias. Initially, we hoped to use the clinical study reports to supplement evidence from the published literature, since we knew that about 60% of patient data from phase III oseltamivir treatment trials are not published. However, discrepancies between the reporting of trials in clinical study reports sent to regulators versus published papers led us to lose confidence in the journal reports (table 2).4 In one case, the published version of a trial unambiguously states that “there were no drug-related serious adverse events,” while the clinical study report lists three that were possibly related to oseltamivir. (Roche has suggested that the non-reporting was justified.5)

Table 2 Discrepancies in abstracting trial knowledge

Also, many details found in clinical study reports can simply never make it into journals because of space limitations. The degree of synthesis is evident from the differing length of reports for the same clinical trial. For example, the published version of one cardiac safety trial of 400 patients is seven pages long6 compared with 8545 pages for the full clinical study report.

We decided that to derive credible results, we needed more detail, not less. We therefore decided to review only those trials for which we could read the corresponding clinical study report.4

Clinical study reports: a new world of evidence

Entering this new world of evidence reminded us of a somewhat obvious truth: a trial is not a glossy journal reprint. It is a real life experiment done by humans on humans, full of rough edges and unexpected events that with varying degrees of precision may get planned, executed, recorded, classified, transcribed, inputted into computers, summarised, and analysed. Reporting of the trial does not produce one document but myriad outputs, such as conference posters, abstracts, promotional materials, journal articles, and clinical study reports, and the outputs are designed for multiple audiences (clinicians, regulators, the public, etc) (fig 2). Of central importance to evidence synthesis are the questions of what level of detail for trials is sufficient to draw robust conclusions about the effect of interventions and what resources are necessary to do so successfully. Gaining access to more than 22 000 pages of oseltamivir study reports and other regulatory documents allowed us a first attempt at answering these questions.

Fig 2 Raw data are transformed through relatively opaque processes of filtering, distillation, and synthesis into multiple reports of vastly different length and level of detail

In reading the reports, we realised how careful—even forensic—our approach would have to be in order to evaluate trials accurately. The fragility of basic assumptions surprised us. We lost faith in the idea that the many trials reported as “placebo controlled” were in fact controlled by inert, visually indistinguishable placebos. Clinical study reports showed that in many trials, the placebo capsules contained two chemicals (dehydrocholic acid and dibasic calcium phosphate dihydrate) that the oseltamivir capsules did not (fig 3). We could find no explanation for why these ingredients were only in the placebo, and Roche did not answer our request for more information on the placebo content.

Fig 3 Details of the formulation of study interventions (oseltamivir and placebo capsules) in clinical study report for trial WV15799 (*Ro 64-0796 is oseltamivir)

Members of the review team also identified an imbalance in the numbers of patients in each treatment arm who were classified as infected with influenza. Although participants had been randomised in the expected proportions (usually 1:1), those receiving oseltamivir had 21% lower odds of showing a fourfold increase in influenza antibody titre, suggesting that oseltamivir may inhibit the body’s ability to mount a normal immune response to influenza. This raises questions regarding oseltamivir’s true mode of action and probably makes all analyses of the population thought to be infected with influenza (the primary efficacy analysis population) invalid, because the groups are no longer comparable. If confirmed, the product information sheet’s statement that “treatment with Tamiflu did not impair normal humoral antibody response to infection”7 is wrong, with possible implications for vaccine administration. Not only were these details absent from every published paper we could find, but they were also not mentioned in the over 2700 pages of FDA and EMA regulatory reviews and comments available to us.
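To make the arithmetic behind a "21% reduction in odds" concrete, the following minimal sketch computes an odds ratio from a two-arm table. The counts are hypothetical, chosen only so that the result lands near 0.79; they are not taken from the oseltamivir trials or from the Cochrane review, and the function names are illustrative.

```python
def odds(events: int, total: int) -> float:
    """Odds of an event: number with the event divided by number without it."""
    return events / (total - events)


def odds_ratio(events_treated: int, n_treated: int,
               events_control: int, n_control: int) -> float:
    """Odds in the treated arm divided by odds in the control arm."""
    return odds(events_treated, n_treated) / odds(events_control, n_control)


# HYPOTHETICAL counts for illustration only (not trial data):
# participants randomised 1:1, then classified as "influenza infected"
# on the basis of a fourfold rise in antibody titre.
n_oseltamivir, n_placebo = 1000, 1000
infected_oseltamivir, infected_placebo = 648, 700

ratio = odds_ratio(infected_oseltamivir, n_oseltamivir,
                   infected_placebo, n_placebo)
reduction_pct = (1 - ratio) * 100

print(f"odds ratio: {ratio:.2f}")
print(f"reduction in odds: {reduction_pct:.0f}%")
```

In this illustration the arms are equal at randomisation, yet the "infected" subgroups differ in size (648 versus 700). This is why comparing outcomes only within the infected population no longer compares like with like.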

Post-protocol hypotheses versus data mining

Did the extra chemicals in the placebo capsules matter? Was the seeming effect of oseltamivir on antibody response real? These questions emerged from our findings during the review process, and to answer these questions would entail data analyses that we had not declared at the outset of our review.8 This presented a conundrum. One of the methodological strengths of the Cochrane approach is the prospective registration of a review protocol: a peer reviewed statement of intention detailing exactly how reviewers plan to conduct their review. It helps to keep reviewers honest and accountable. But the reality is that the process of reading lengthy and complex clinical study reports led to questions that were unforeseeable beforehand. And the only way to answer them was to conduct analyses we did not declare in our original protocol.

Wary of potential similarity between such post-hoc analyses and simple data mining in a data rich clinical study report, we carefully defined our post-protocol hypotheses before their formal analysis and reported all agreed analyses regardless of outcome. Furthermore, we clearly defined them as “post-protocol hypotheses” and treated them as amendments to our protocol in the same way clinical trial protocols are changed with formal amendments.

Some people may still find fault with this method, but the need for an iterative methodology seems clear. We suspect that an optimal solution to this “catch 22” may take some time to establish as access to data increases amid worries of irresponsible data mining.

Human resources problem

The EMA has announced that it will make clinical study reports publicly available after it has made a regulatory review decision “over the next few years.”9 10 This raises the possibility that systematic reviews like ours could soon become routine. But even if all clinical trial data of sufficient detail were available to reviewers, the problem remains of having the time and knowledge to review them. Systematic review methods are built on assumptions about the nature of data that are driven by what is available in journal publications; the universe of unpublished and regulatory data is by comparison uncharted, since few investigators outside regulatory agencies are even aware that these data exist. Even fewer are aware of their form, and we are only beginning to develop robust review methods to assess these documents.

Also, because systematic review teams often have tight deadlines, short cuts may be tempting. Our new Cochrane review update of oseltamivir engaged the equivalent of two whole time researchers (a junior and a senior) for 14 months. Although this was partly because of our unfamiliarity with reading clinical study reports, the main reason was the sheer volume of reading involved. When trial data are already available in tabular form—for example, in a computerised database of individual patient data—it is easier to run calculations than it is to verify the soundness of the underlying data. Recently, two independent investigators published estimates of oseltamivir’s effect on complications after obtaining “full access to efficacy and safety data” from Roche.11 (The study was requested by Roche after publication of our December 2009 systematic review.) But they apparently did so without obtaining a complete list of oseltamivir clinical trials or the trial protocols for the trials they did analyse, and they did not notice that complications like pneumonia lacked a standardised definition, making their calculations—apparently based on an analysis of electronic datasets12—questionable.13


A difficult but equally important hurdle relates to the fundamental purpose of evidence synthesis: to inform clinical, policy, and consumer decision making. Routine, in-depth access to clinical study reports raises hopes that outside researchers will be able to make trustworthy, unbiased assessments of the effects of medical interventions. But of what benefit is such research if it does not inform policy and practice? In the oseltamivir case, the gap between evidence and policy seems especially large.

In both the United States and United Kingdom, official pandemic plans assumed oseltamivir could halve complications and hospital admissions.14 15 16 This claimed effect, however, was based on a Roche supported meta-analysis by Kaiser et al.17 Although meta-analysis is generally thought to produce high quality evidence, eight of the 10 trials in the Kaiser analysis were unpublished.

In December 2009, we expressed serious doubts about the credibility of the evidence for oseltamivir because of the inaccessibility of these unpublished trials. Nevertheless, influential organisations such as the US Centers for Disease Control and Prevention (CDC) and European Centre for Disease Prevention and Control continued to cite the Kaiser et al meta-analysis.18 19 Neither agency seems to have done an independent analysis of all available evidence, even after Roche’s public offer to provide full clinical study reports. Their stance is more worrying given that another US agency unambiguously holds the opposite opinion. The FDA, which has reviewed the oseltamivir trial programme in perhaps more detail than anyone outside of Roche, states that “Tamiflu has not been shown to prevent such complications [serious bacterial infections].”7 The FDA even sent Roche a warning letter in 2000 instructing it to “immediately cease dissemination of promotional materials” containing “false or misleading” claims, including statements about a reduced risk of influenza complications.20 The FDA has, however, not challenged the CDC’s claims.

Although we are confident about the conclusions we have drawn from our review of thousands of pages of regulatory documents, many of our results are necessarily tentative. We cannot be certain whether the trial evidence supports the conclusion that oseltamivir reduces complications. Nor do we understand if and how oseltamivir changes immune response, and important questions about its mode of action remain. But we think a critical analysis of the full clinical study reports, comprising tens of thousands of pages we have not yet seen, may provide the answers. The EMA has confirmed that it does not hold these additional study reports (email from Xavier Luria, 24 May 2011). However, we believe the FDA does hold these documents, and our pending Freedom of Information request of 21 January 2011 may finally provide them.

Open access to all relevant trial data is a necessity to make ethical decisions in healthcare. Defining the terms of “open access” and which data are “relevant” requires broad based discussion among all stakeholders. We think that full, unabridged, and properly anonymised clinical study reports would be a good start. But while we are debating these questions, signs abound that the data are slowly becoming more accessible: efforts to record the existence of all trials through central registries seem to be succeeding,21 and we are not the only independent investigators who have obtained detailed trial data.22 23 24 With the possible rush of detail, we will need new methods of reliable evidence synthesis to distinguish the really important information from that which is merely there. To this end we will continue to develop our Cochrane review.


  • 1997 First documented human cases of avian influenza H5N1 occur in Hong Kongw1

  • 1999 WHO publishes its first pandemic influenza plan,w2 written in collaboration with the European Scientific Working Group on Influenza (ESWI), a group “funded entirely by Roche and other influenza drug manufacturers”w3

  • 1999-2000 Oseltamivir approved by FDA for treatment and pre-exposure prophylaxis of influenza

  • 2000 FDA sends Roche a warning letter about “misleading efficacy claims” in Roche promotion material, including claims about a reduction in complications.20 Subsequent product labels state Tamiflu has not been shown to prevent serious bacterial infections.7 w4

  • 2002 WHO calls for nations to stockpile influenza antiviralsw5

  • 2002 EMA issues oseltamivir market authorisation throughout European Union

  • 2003 Kaiser et al meta-analysis concludes that oseltamivir reduces complications17

  • 2003 US adds oseltamivir to its strategic national stockpile15

  • 2003-04 Human cases of avian influenza H5N1 resurface in Asiaw6

  • 2004 US releases first major draft pandemic plan, citing Kaiser paper in support of claim that oseltamivir reduces complications15

  • 2004 WHO releases comprehensive guidelines on antivirals, citing Kaiser paper in support of claim that oseltamivir reduces lower respiratory tract complicationsw7

  • 2005 UK announces it will stockpile 14 million doses of oseltamivir

  • 2005 Media pressure on governments to prepare for a likely avian influenza pandemic increases after the ESWI influenza conference in Malta in September

  • 2005 US analysis finds evidence of personal stockpiling of oseltamivir consistent with spike in media references to avian influenzaw8

  • 2006 Cochrane systematic review claims oseltamivir reduces complications after including data from Kaiser et al meta-analysisw9

  • 2009 WHO declares new H1N1 influenza virus a “pandemic”w10

  • 2009 Paediatrician Keiji Hayashi contacts Cochrane Collaboration, raising questions about Kaiser meta-analysis2

  • 2009 Roche makes offer of data on condition that we sign a legal agreement promising not only confidentiality of the data but of the agreement itself. (We decline)w11

  • 2009 Cochrane review update finds insufficient evidence to conclude that oseltamivir reduces complications, after being unable to obtain data to verify Kaiser et al meta-analysis2

  • 2009 Roche says we were “wrong to exclude the data” that we could not obtain without signing a confidentiality agreement,w12 and promises to release “full study reports”3

  • 2009 Our team applies for and downloads 3195 pages of study reports, but each report’s table of contents shows that all reports provided are incomplete

  • 2010 Roche approaches Harvard School of Public Health professors Miguel Hernán and Marc Lipsitch to perform an independent analysis of the Kaiser et al dataset. Results are presented in December 2010 and published in June 201111 w13

  • 2010 EMA changes its policy on access to documents9 10

  • 2011 We request oseltamivir and Relenza (zanamivir) clinical study reports from EMA (see

  • 2011 EMA sends the Cochrane group 25 453 pages of clinical study reports for 19 trials

  • 2012 Publication of updated Cochrane review based on clinical study reports and other regulatory documents1

Glossary: the scope of clinical trial data

Raw data comprise any records created in preparing for and carrying out clinical trials—trial methods with protocol, investigator notes, individual patient data, ethics committee reports, clinical case notes, management committee’s minutes, transcripts/videos of meetings, contracts, book keeping, etc

Clinical study reports are a distillation and summary of the raw data from a trial, but, importantly, are unabridged reports that can be thousands of pages in length. They should report a trial’s background and rationale, methods, results, and discussion and also include important study documents such as the analysis plan, randomisation schedule, study protocol (with a list of deviations and amendment history), detailed case histories for patients who have adverse events, example case report forms, and list of ethics committees who approved the research. The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use has attempted to standardise the layout of clinical study reportsw14

Published trials are those for which a primary publication appears in the scientific press, typically written in the structure of Introduction, Methods, Results, and Discussion (IMRAD). Published trials usually fit on a few sheets of paper and as such are an extreme synthesis of the clinical study report

Unpublished trials are those trials for which no primary publication appears in the scientific press. Information may, however, appear as secondary publications in the so called grey literature, scientific congresses, conference posters, or other abstracted form

Abstracts are the most extreme syntheses of a trial, usually not longer than 200 words, and are comparable in content and level of detail to conference posters

Individual patient data refer to datasets in which data are presented for each participant rather than in aggregate form. They are commonly stored in computerised databases

Regulatory data contextualise other types of data about a trial. They include, for example, correspondence between a regulator and drug manufacturer. This and other regulatory data such as regulators’ medical and statistical reviews can be essential to understanding individual trials within the context of a trial programme and can be explored through thematic analysis



  • We thank our coauthors of the Cochrane review of neuraminidase inhibitors. We also thank Yuko Hara for helpful comments on the draft manuscript. PD is funded by an institutional training grant from the Agency for Healthcare Research and Quality #T32HS019488.

  • Contributors and sources: TJ has 18 years’ experience as a Cochrane reviewer. MJ has 20 years’ experience as a statistician and has been a Cochrane reviewer for the past 6 years. PD joined the Cochrane review team in 2009. The idea for this article arose during our efforts to review documents obtained from European, US, and Japanese regulators.

  • Competing interests: All authors have completed the ICMJE unified declaration form at (available on request from the corresponding author) and declare no support from any organisation for the submitted work. TJ was an ad hoc consultant for Hoffmann-La Roche in 1998-99. He is occasionally interviewed by market research companies for anonymous interviews about phase 1 or 2 products unrelated to neuraminidase inhibitors.

  • Provenance and peer review: Not commissioned; externally peer reviewed.