Searching for unpublished data for Cochrane reviews: cross sectional study
BMJ 2013;346 doi: http://dx.doi.org/10.1136/bmj.f2231 (Published 23 April 2013)
Cite this as: BMJ 2013;346:f2231
- 1 The Nordic Cochrane Centre, Rigshospitalet, Dept 7810, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark
- 2 Clinical Pharmacy and Health Policy Studies, University of California San Francisco, San Francisco, CA, USA
- Correspondence to: J B Schroll
- Accepted 26 March 2013
Objective To describe the experiences of authors of Cochrane reviews in searching for, getting access to, and using unpublished data.
Design Cross sectional study.
Setting Cochrane reviews.
Participants 2184 corresponding authors of Cochrane reviews as of May 2012.
Main outcome measure Frequencies of responses to open ended and closed questions in an online survey.
Results Of 5915 authors contacted by email, 2184 replied (36.9% response rate). Of those, 1656 (75.8%) had searched for unpublished data. In 913 cases (55.1% of 1656), new data were obtained and we received details about these data for 794 data sources. The most common data source was “trialists/investigators,” accounting for 73.9% (n=587) of the 794 data sources. Most of the data were used in the review (82.0%, 651/794) and in 53.4% (424/794) of cases data were provided in less than a month. Summary data were most common, provided by 50.8% (403/794) of the data sources, whereas 20.5% (163/794) provided individual patient data. In only 6.3% (50/794) of cases were data reported to have been obtained from the manufacturers, and this group waited longer and had to make more contacts to get the data. The data from manufacturers were less likely to be for individual patients and less likely to be used in the review. Data from regulatory agencies accounted for 3.0% (24/794) of the obtained data.
Conclusions Most authors of Cochrane reviews who searched for unpublished data received useful information, primarily from trialists. Our response rate was low and the authors who did not respond were probably less likely to have searched for unpublished data. Manufacturers and regulatory agencies were uncommon sources of unpublished data.
Selective reporting of trials is common.1 Thus despite the existence of hundreds of thousands of published randomised trials and thousands of updated Cochrane reviews, the true benefits and harms of many interventions are still unknown.
Recent studies have reported successes in obtaining details, including results, of unpublished clinical trials from licensing authorities and health technology agencies.2 3 4 These sources have the potential to reduce reporting biases in reviews of drug interventions. The inclusion of unpublished or inadequately reported data in meta-analyses generally leads to more reliable effect estimates.5 However, only a little over 10% of the Cochrane reviews from 2000-06 included unpublished trials.6
Unpublished data include complete trials that have never been published as well as specific outcomes that are not reported in published trials. For this study, we considered data to be published even if they appeared only in conference abstracts, research reports, or dissertations.
The Cochrane handbook suggests searching for unpublished data from the following sources: local experts, pharmaceutical companies, national and international trial registers (for example, clinicaltrials.gov), company trial registers, subject specific trial registers, and trial results registers.7 Regulatory agencies are not mentioned in the handbook and no guidance is given on how to obtain data or protocols from such agencies. It is also unclear how the different sources should be prioritised—that is, which sources are most likely to supply useful data.
Many review authors have obtained unpublished trial protocols, reports, additional summary data, or individual patient data from a variety of sources. Here we provide an overview of the experiences of Cochrane review authors in searching for, getting access to, and using unpublished information from trials.
We conducted an online survey of corresponding authors of Cochrane reviews and protocols. The survey contained closed and open ended questions.
We gathered information on how trial characteristics and data were obtained, types of data (for example, whole trials, missing outcomes, and additional analyses), difficulties encountered, and how the data were used. Our previous experience suggested that unpublished data are obtained in a wide variety of—and sometimes unexpected—ways, indicating that open ended and qualitative questions would provide useful information that we could not collect using only structured questions.
We retrieved a list of all corresponding authors of Cochrane reviews and protocols through the Cochrane Collaboration Information Management System (Archie). This information was imported into an online survey application (SurveyMonkey), and we invited all authors by email to participate. If the invitees did not respond within 10 days, we sent a reminder. A second reminder was sent after 20 days, and a final one after 30 days. Respondents who only partially filled in the survey also received a reminder. We collected data from 21 May to 8 August 2012.
We reported the frequency of each response option for each question, including partial responses. During data collection, but before we analysed the data, we hypothesised that drug manufacturers might differ from the other data sources. We used a χ² test to compare the proportions of the recorded characteristics between manufacturers and non-manufacturers, dichotomising scales with more than two categories.
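The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes an uncorrected Pearson χ² test on a 2×2 table with one degree of freedom (the text does not say whether a continuity correction was applied), and the function names and all counts in the example are hypothetical, not the survey data. A multi-category answer (for example, number of contacts) is first collapsed into two groups, and the resulting manufacturer versus non-manufacturer table is then tested.

```python
import math


def dichotomise(category_counts, positive_categories):
    """Collapse a multi-category response scale into two groups.

    category_counts: dict mapping category label -> number of responses
    positive_categories: set of labels assigned to the first group
    Returns (count in the first group, count in all other categories).
    """
    pos = sum(n for cat, n in category_counts.items() if cat in positive_categories)
    neg = sum(category_counts.values()) - pos
    return pos, neg


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for the table

        [[a, b],   <- group 1 (e.g. manufacturers)
         [c, d]]   <- group 2 (e.g. non-manufacturers)

    Columns are the two dichotomised response categories.
    Returns (statistic, upper-tail p-value with 1 degree of freedom).
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts per "number of contacts" category (NOT study data).
    manufacturers = {"1-3": 12, "4-10": 8, ">10": 10}
    non_manufacturers = {"1-3": 5, "4-10": 5, ">10": 20}
    a, b = dichotomise(manufacturers, {">10"})
    c, d = dichotomise(non_manufacturers, {">10"})
    stat, p = chi_square_2x2(a, b, c, d)
    print(f"chi2 = {stat:.2f}, P = {p:.4f}")
```

Run as a script, this prints a statistic of 6.67 for the hypothetical table. When expected counts are small (a common rule of thumb is below 5), Fisher's exact test is often preferred to the χ² approximation.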
The survey was piloted by 10 testers, and their comments were incorporated into the final version.
The respondents were asked to answer the questions in relation to a review in which they had been directly involved. If the respondents had been involved in several reviews, they were encouraged to choose one that included searching for unpublished data and that had resulted in experiences that could possibly benefit other review authors. Respondents who did not search for or obtain unpublished data were asked to give a reason. Respondents who did search for and obtain data were asked to provide a citation for their work and to state their primary source of unpublished data. They could choose between manufacturers, regulatory agencies, investigators, commercial and non-commercial trial registers, funders, ethics committees, and others. For their primary source they were asked to provide a name, year of query, number of attempts at getting the data, delay until the data were obtained, method of communication, reasons for thinking that data might be available, details on the data obtained, and whether the data were used in their review. Respondents could also provide information on secondary sources of unpublished data. Finally, they were asked whether they investigated a drug intervention, what the biggest difficulties were in obtaining unpublished data, and if they had any additional comments. The survey contained 82 questions but took less than five minutes to complete, as not all questions were relevant for each specific case (see supplementary file).
We sent the questionnaire to 5915 corresponding authors of Cochrane reviews and protocols; 2184 replied (response rate 36.9%), 1889 of whom completed all questions in the survey (figure). Most of the dropout occurred when the respondents were asked to provide a citation to the work they authored (n=194).
Percentages can add to more than 100%, as several of the response options were not mutually exclusive. Of the 2184 respondents, 528 (24.2%) did not search for unpublished data. The reasons “not expecting success,” “not expecting reliable data,” and “too time consuming” each accounted for around 20% of the replies (table 1). The most common reason given was “other” (n=265, 52.4%), of which the majority specified that the review was still in an early phase and that the search had yet to be performed (n=177). Box 1 lists other reasons, which can be categorised into the following groups: only wanted to include published data and therefore deliberately chose not to search for unpublished data (n=11), found published data and therefore did not think it was necessary to search for unpublished data (n=5), did not know how to search for unpublished data (n=26), thought searching for unpublished data was the responsibility of the trial search coordinator in the Cochrane review group in question (n=5), tried to search but failed (n=12), and simply stated it was not relevant (n=14).
Box 1: Quotations to highlight reasons for not searching for unpublished data
One of our inclusion criteria is that the data must be published
We are still in the process of data extraction of papers, if we do get very little publications we might think about unpublished data
Was not aware there was any unpublished data in my topic area
Did not know it was possible to get
Haven’t tried yet. Assumed this was done as part of the searching process done by the Cochrane group searcher
Where I have previously asked for unpublished data, authorship has been requested
Among the 1656 authors who did search for unpublished data, 730 (44.1%) never obtained any, 913 obtained data, and 13 did not reply to this question. The most common reason for not obtaining data was never receiving a response (66.2% of 717; an additional 13 did not specify a reason, table 1). The second most common reason was that the contacted person did not have the data (39.3% of 717). By analysing the comments in the “other” category, we found additional common reasons for not obtaining data: no unpublished studies were found, investigators were reluctant to release data until the study was published, commercial confidentiality, data were promised but never delivered, and the authors’ contact information could not be found. In some instances, authors were only willing to deliver data if they became coauthors of the review. See box 2 for more examples.
Box 2: Quotations from “other” category for reasons why data were never obtained
Got a response to say they would look for data, but then no further response
Drug company responded that data were confidential
Drug company stated it could not be used for research, only for formulary decision making
Said they were preparing for future publication
Respondent said they did not think the information I requested was relevant/helpful to the review question (this was a drug sponsored trial for which I requested subgroup data)
Respondent said it was unnecessary for my clinical question
Big pharma said they didn’t regard the question of sufficient clinical value to warrant disclosing the data
It was too long since the original studies were published. Some authors were uncontactable and we had answers from authors who had thrown away the data we needed
A total of 676 respondents gave details on 794 sources that provided data. The most common data source was trialists, accounting for 73.9% of the 794 (table 2). Only 6.3% of the data came from manufacturers, 3.0% from regulatory agencies, and 6.3% from non-commercial trial registers. The “other” category accounted for 8.3% (n=66), where the most common sources were dissertations and conferences (which these authors regarded as unpublished, contrary to our definition of published data). Journal editors, Cochrane review groups, the World Health Organization, librarians, consumer support groups, and Google searches also contributed. The respondents did not contact any sources not already listed in the Cochrane handbook.7 The regulatory agencies that most commonly provided data were the Food and Drug Administration (FDA, n=11) and the European Medicines Agency (EMA, n=4).
The most common way to approach sources of information was by email (table 1). Among respondents who chose the “other” category, using websites was the most common approach. In 75.2% of the 794 cases, 1-3 contacts were enough, but in 6.4% of the cases (n=51) more than 10 contacts were necessary to get the data (table 1).
Unpublished data were provided in less than a month in 53.4% (n=424) of the cases (table 1), but in 9.1% (n=72) of the cases the authors had to wait more than six months. The most common reason why authors contacted a specific source was that they knew a trial had been conducted (61.1% of the 794, table 1). The idea to contact a specific data source came from the Cochrane handbook in only 4.2% of the 794 cases. Authors quite often specified that they learnt about unpublished data at conferences, either through personal contacts or abstracts. Trial registers were also used to identify unpublished studies or missing outcome data, and poorly reported published papers prompted authors to contact the authors of those papers for missing outcome data. Other sources were whistleblowers, peer reviewers who drew attention to unpublished trials, and meta-analyses of unpublished trials, sometimes done by the manufacturers (for example, pooling premarket studies to increase power). One respondent routinely contacted all manufacturers of a drug, and another always approached corresponding authors to confirm the validity of data extraction and to query unpublished data.
In 44.3% of the cases where data were obtained (295/666), the authors investigated a drug intervention. The time involved in searching for unpublished data was the most challenging element (41.0% of 666, table 1). Poor organisation and poor readability of the data were challenging for 20.9% and 9.8% of the 666 respondents who obtained data, respectively. Thirty seven per cent had no problems, and 16.4% specified challenges not covered by the standard answers; the most common of these were that the authors did not receive data or did not receive a reply (see examples in box 3).
Box 3: Quotations about main challenges in incorporating unpublished data in reviews
Data for one trial were provided in an old database format that was very difficult to access and navigate
Delineating what was useable from what was not, especially as we were not replying on study design as a filter—this made it a nightmare
We just weren’t sure what we had been sent was right there were discrepancies between published report and data provided. When I asked the author for clarification they did not respond
In the one case that I received IPD [individual patient data], I didn’t use this because the amount of data was overwhelming and would have taken too much of my time to decipher
The most common type of data obtained was unpublished summary data from already published trials, supplied by 50.8% of the 794 data sources that provided data (table 3). Missing outcome data (28.5%) and individual patient data (20.5%) were also common. Data on harms were rare (8.4%). In 17.5% of the 794 cases, authors obtained “other” data, mostly on methodological quality (randomisation, blinding, etc). Respondents also acquired subgroup analyses, theses, information about ongoing trials, and reports of protocol modifications that had not been published. However, some data were partial, redacted, or subject to confidentiality agreements. Most respondents used the acquired data in their review (82.0% of 794, table 1). The most common reason for not using the data was that they were in an unusable form (6.3%). Eight per cent chose “other,” and the majority of these specified that they had not used the data because their review was ongoing.
In around a third (267 of 794) of the cases the authors got information from previously unpublished trials, and in around two thirds (562 of 794, not mutually exclusive) they got additional data from already published trials. We suspected that the strategy needed to access unpublished trials might differ from that needed to obtain unpublished data from published trials. However, a post hoc subgroup analysis found no difference in the number of contacts needed before the authors received data.
Drug and device manufacturers
The authors who obtained data from drug and device manufacturers were more likely to have had to contact them 10 or more times than authors obtaining data from other sources (24% v 5%, P<0.001, table 4). They also more often waited more than one month (74% v 45%, P<0.001) and more often made contact in person or by telephone (36% v 13%, P<0.001). Manufacturers supplied individual patient data less often than other sources (12% v 26%, P=0.02). Data from non-manufacturers were more often used, and respondents more often reported no difficulties with them, than with data from manufacturers; however, these differences were not significant (P=0.07 in both instances).
A large proportion (around three quarters) of the responding Cochrane review authors searched for unpublished data. A large fraction of those who did not search did so because their work was still ongoing, but another large fraction abstained because they did not expect success. Searching for unpublished data from already published trials is problematic because authors may be difficult to locate and rarely respond.8 Around 20% of those who did not search refrained because they did not expect the data to be reliable.
In our survey, 55.1% of those who searched for data obtained them, and most of the data obtained (82.0% of the 794 data sources) were used in the review. This suggests that the methodological rigour (or quality) of the data is generally adequate, although some of the data were of a kind for which a risk of bias assessment was not meaningful (additional point estimates, standard deviations, etc). Other studies have also evaluated the reliability of published and unpublished trials without finding differences.9 10
The respondents’ third common concern was that searching for unpublished data is time consuming. This is not necessarily the case. One study8 found that when the source was contacted by email, the reply arrived within a median of one day. In our survey, more than half of the authors received their data within a month. But even if it is time consuming, completely omitting a search for unpublished or inadequately reported data is a risky strategy, as such data will generally be less positive than published data.1 2 4 5 11 Several respondents refrained from searching for unpublished data because they had found published studies, but this strategy cannot be recommended as, on average, it leads to biased reviews.
Some respondents thought that the trial search coordinator in Cochrane review groups searched for unpublished data, which may not be the case. Trial registers should always be consulted, and this could be done by the search team. However, querying trial authors about missing outcomes, missing data, and additional studies can only be done by the authors of the review, who have in-depth knowledge of the literature. Lastly, some respondents abstained from searching because of previous demands for authorship. This is, hopefully, rare and should not discourage authors from searching for unpublished trials.
Almost half of the respondents who sought unpublished data obtained none. The most common reasons were that they never received a reply or were told that no data were available. Another 54 were told it was too much trouble to deliver the data. In a few cases, confidentiality and lack of interest in helping were the obstacles. In our own experience, one drug company was willing to deliver unpublished data for a Cochrane review only if it could see the draft manuscript. This was obviously unacceptable, as the delivery of data should not depend on what the drug company, or any other data source, thinks about the preliminary results.
It was surprising that only 6.3% of the sources that provided data were drug and device manufacturers. When our respondents tried to obtain data from manufacturers, they experienced longer waits, received fewer individual patient data, needed to make more requests, encountered more difficulties, and were less likely to be able to use the data (although the last two differences were not significant). Owing to the low response rate in this study, these associations should be interpreted with caution. It is nevertheless of concern that manufacturers so rarely provided data, as a large proportion of research funded by drug manufacturers remains unpublished.12 The respondents who were successful more often contacted manufacturers in person or by telephone (36%) than they did non-manufacturers (13%). From respondents’ comments, we learnt that authors often knew that manufacturers had data because one of their coauthors had been involved in the trials. On at least two occasions, respondents were told by drug manufacturers that their clinical question was not sufficiently relevant for the data to be released (see box 2). It has been well documented that manufacturers often refuse to share data.13
We had expected that research ethics committees and funders would rarely be a source of information, but it was unexpected that company owned trial registers and non-commercial trial registers in particular were also rarely a source of information (0.9% and 6.3%, respectively). The company owned trial registers might not contain relevant information, and non-commercial trial registers should be used more.
The authors often became aware of unpublished data through colleagues and websites. Only 4.2% got the idea from the Cochrane handbook to ask a specific source for data, despite the handbook containing a detailed section about searching for unpublished data.
Regulatory agencies are uncommon sources of data even though the FDA website has contained a great deal of valuable data for decades and the EMA opened up its archives in 2010.14 Among the respondents who searched for data in 2011 and 2012, only 5% got data from regulatory agencies, compared with 3% in our entire population.
Among the 24 authors who obtained data from regulatory agencies, only seven got full reports and only one distinct review incorporated the data. Some authors might not be aware of the amount of data accessible at regulatory agencies. We therefore suggest that the Cochrane handbook should mention regulatory agencies as a source of unpublished data and provide specific guidance on how to search the websites of the FDA and EMA, as these are difficult to navigate.
Almost 21% of authors got individual patient data, primarily from trialists. Authors should be encouraged to request this type of data, and it can probably be done without compromising the response rate.15
How to obtain data
The respondents in our survey most commonly sent emails to corresponding authors, agencies, and companies, and email has also yielded the best response rates in previous research.15 A combined approach using both email and letter might be even better.15 Asking specific rather than open ended questions improves response rates,15 which was also the experience of several of our respondents.
Guidance on how vigorously authors should search for unpublished data is needed. Our results suggest that authors should routinely ask trialists for more data when conducting a review. Searching a trial register (clinicaltrials.gov or similar) is also a good idea and is not time consuming. For data on drugs and devices, we suggest that the authors contact the regulatory agencies first, as it is time consuming and generally disappointing to go to the manufacturers.
Limitations of this study
The response rate in our study was low. A substantial part of our emails may have reached inactive email boxes or been caught by spam filters. We sent our survey to busy authors, and previous research has shown that time is a barrier and some people routinely bin surveys.16 Our respondents were probably more likely to search for unpublished data than the authors who did not reply. Our sample might therefore not be representative of all authors of Cochrane reviews.
Most authors who searched for unpublished data received useful data, primarily from trialists. Manufacturers and regulatory agencies were seldom sources of unpublished data.
What is already known on this topic
Unpublished data are less positive than published data
Omitting unpublished data in meta-analyses can bias the results
What this study adds
Authors of Cochrane reviews often search for unpublished data (75.8% in our sample) and around half of the authors who did, succeeded
Drug and device manufacturers infrequently provide data
Drug regulatory agencies should be used more
Contributors: PCG and LB conceived the study. The protocol was drafted by JBS; LB and PCG contributed. JBS created the online survey, sent out the invitations, and tabulated the data. All authors analysed the data. JBS drafted the manuscript; PCG and LB contributed. All authors had full access to all the data in the study. JBS is guarantor and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors have approved the final manuscript.
Funding: This study was funded by the Cochrane Collaboration Methods Innovation Fund. The funder had no influence on the study design, interpretation of data, or the decision to publish the results.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study was certified as exempt from human subjects review by the University of California human research protection programme (reference No 037504).
Data sharing: Anonymised datasets are available on request from the corresponding author.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.