Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey

BMJ 2012; 344 doi: http://dx.doi.org/10.1136/bmj.d7762 (Published 3 January 2012)
Cite this as: BMJ 2012;344:d7762
  1. Ikhlaaq Ahmed, postgraduate student (1)
  2. Alexander J Sutton, professor of medical statistics (2)
  3. Richard D Riley, senior lecturer in medical statistics (3)
  (1) MRC Midlands Hub for Trials Methodology Research, School of Health and Population Sciences, University of Birmingham, Birmingham B15 2TT, UK
  (2) Department of Health Sciences, University of Leicester, Leicester LE1 7RH, UK
  (3) School of Health and Population Sciences, University of Birmingham
  Correspondence to: R D Riley r.d.riley{at}bham.ac.uk
  • Accepted 7 November 2011

Abstract

Objective To examine the potential for publication bias, data availability bias, and reviewer selection bias in recently published meta-analyses that use individual participant data and to investigate whether authors of such meta-analyses seemed aware of these issues.

Design In a database of 383 meta-analyses of individual participant data that were published between 1991 and March 2009, we surveyed the 31 most recent meta-analyses of randomised trials that examined whether an intervention was effective. Identification of relevant articles and data extraction were undertaken by one author and checked by another.

Results Only nine (29%) of the 31 meta-analyses included individual participant data from “grey literature” (such as unpublished studies) in their primary meta-analysis, and the potential for publication bias was discussed or investigated in just 10 (32%). Sixteen (52%) of the 31 meta-analyses did not obtain all the individual participant data requested, yet five of these (31%) did not mention this as a potential limitation, and only six (38%) examined how trials without individual participant data might affect the conclusions. In nine (29%) of the meta-analyses reviewer selection bias was a potential issue, as the identification of relevant trials was either not stated or based on a more selective, non-systematic approach. Investigation of four meta-analyses containing data from ≥10 trials revealed one with an asymmetric funnel plot consistent with publication bias, and the inclusion of studies without individual participant data revealed additional heterogeneity between trials.

Conclusions Publication, availability, and selection biases are a potential concern for meta-analyses of individual participant data, but many reviewers neglect to examine or discuss them. These issues warn against uncritically viewing any meta-analysis that uses individual participant data as the most reliable. Reviewers should seek individual participant data from all studies identified by a systematic review; include, where possible, aggregate data from any studies lacking individual participant data to consider their potential impact; and investigate funnel plot asymmetry in line with recent guidelines.

Introduction

Meta-analysis combines the quantitative evidence from related studies to summarise a whole body of research on a particular clinical question, such as whether a treatment is effective. A known threat to the validity of meta-analysis is publication bias, which occurs when studies with statistically significant or clinically favourable results are more likely to be published than studies with non-significant or unfavourable results.1 2 3 4 Other related biases exist on the continuum towards publication,5 such as time lag bias6 7 (where studies with unfavourable findings take longer to be published), language bias8 (where non-English language articles are more likely to be rewritten in English if they report significant results), and selective outcome reporting9 (where non-significant study outcomes are entirely excluded on publication). All these biases lead to meta-analyses which synthesise an incomplete set of the evidence and produce summary results potentially biased towards favourable treatment effects.10 11

Methods to detect publication related biases and assess their potential impact have been well documented for meta-analyses that use extracted aggregated study results (such as treatment effect estimates).2 4 12 13 14 However, there are relatively few articles that consider biases for meta-analyses that use individual participant data,15 16 17 18 where the raw, individual level data are obtained for each study and used for synthesis. Individual participant data can be considered the original source material, and—as it allows trial results to be derived directly and independent of study reporting—it (theoretically at least) has potential to reduce publication bias in meta-analysis, especially when it is obtained for unpublished trials.18 For this, and many other reasons documented previously in the BMJ,16 meta-analyses using individual participant data are generally considered the most reliable approach to evidence synthesis,15 19 20 21 22 but this does not guarantee they are bias-free.

When reviewers identify and seek individual participant data from only published trials, publication related biases can affect the subsequent analysis. Burdett et al23 found that meta-analyses of individual participant data tended to give more favourable treatment effects when excluding data from trials in the “grey literature” (that is, unpublished trials, trials published in non-English language journals, and trials reported as meeting abstracts, book chapters, and letters). But publication related biases are not the only mechanism that may cause an incomplete and potentially biased set of evidence within meta-analyses of individual participant data; two further concerns are data availability bias and reviewer selection bias.

Data availability bias may occur if individual participant data are unavailable for some studies and their unavailability is related to the study results.24 As with publication bias, this situation leads to a set of available studies that do not reflect the entire evidence base. The impact of availability bias is hard to predict. If researchers of studies with non-significant or clinically unimportant results are more likely to have destroyed or lost their individual participant data, this will bias meta-analyses toward a favourable treatment effect. Conversely, if researchers of studies with favourable findings do not provide their individual participant data because they want to use them further—for example, to examine subgroup effects or an extended follow-up—this may lead to meta-analyses being biased towards a lower treatment effect.

Reviewer selection bias can occur if reviewers deliberately seek only individual participant data from a subset of existing studies and this subset does not reflect the entire evidence base.25 This is a particular concern when relevant studies are not identified by a systematic review but rather through contacts or friends in their research field, and when the selection takes place with knowledge of individual study results. The impact of selection bias on a given meta-analysis could vary, and may (directly or indirectly) be affected by the selectors’ knowledge of the subject, their research contacts and existing collaborations, and their informed opinion about the research question of interest. Note that agreement to pool individual participant data before knowing the results of studies is less of a concern, and collaborations towards meta-analysis beginning at the onset of individual studies have been encouraged under the term “prospective meta-analysis.”26

The aim of this article was to survey recently published meta-analyses of individual participant data to empirically examine the potential for publication bias, data availability bias, and reviewer selection bias. We then investigated whether the authors of the meta-analyses seemed aware of these issues. We have used two case studies from our survey to show how such biases may affect clinical conclusions.

Methods

Identification and classification of relevant articles

We used an existing database of 383 meta-analyses of individual participant data published between 1991 and March 2009. This database was established using a systematic review of published articles in Medline, Embase, and the Cochrane Library as described elsewhere,16 that aimed to identify all published meta-analyses of individual participant data. We searched the database to identify recent meta-analyses of randomised controlled trials. We focused on meta-analyses published between 2007 and March 2009 that aimed to establish whether an intervention was effective. Articles synthesising observational studies or a mixture of randomised trials and observational studies were excluded, as were those synthesising randomised trials but not evaluating an intervention effect (such as those investigating development of a prognostic model).

We decided a priori that a sample of about 30 articles would be suitable for uncovering whether the aforementioned biases are a concern and whether authors raise awareness of them. Using the article abstracts, IA classified all articles as a “meta-analysis of randomised trials,” “unclear,” or “not relevant.” IA took all the meta-analysis articles published in 2008 and 2009 and kept randomly sampling additional articles from 2007 until we had a total of 30 articles classed as a meta-analysis of randomised trials. RDR then checked all these classifications; any discrepancies between IA and RDR were resolved by discussion between all authors. Any articles classed as “unclear” by IA were discussed by all authors and a final classification decision made.

IA then obtained the full text of those articles classed as a meta-analysis of randomised trials and further classed each as “evaluating an intervention,” “unclear,” or “not evaluating an intervention.” As before, these classifications were checked by RDR and discrepancies resolved via discussion between all authors. This resulted in a final set of relevant articles.

Data extraction

For each relevant article IA read the full publication and extracted information to answer the following questions:

  • How did the reviewers identify the trials for which individual participant data were sought?

  • Did the authors seek to obtain grey literature trials, and how many (if any) were included in their primary meta-analysis?

  • What proportion of requested trials actually gave their individual participant data?

  • If relevant, what reasons were given as to why trials did not provide individual participant data?

  • If relevant, was the potential impact of trials not providing individual participant data considered in the primary meta-analysis and, if so, how and what was concluded? If not, was data availability bias raised as a potential concern?

  • Was the potential for publication bias considered in the primary meta-analysis and, if so, how and what was concluded? If not, was publication bias even discussed as a potential concern?

All extracted information was checked by either RDR or AJS.

Statistical assessment of publication bias

For each meta-analysis article containing at least 10 trials, we aimed to examine the potential for publication bias by using a contour enhanced funnel plot and a statistical test for asymmetry. A contour enhanced funnel plot displays trial treatment effect estimates (x axis) against some measure of their precision such as standard error (y axis). When no publication bias is present the plot should show a funnel-like shape, with estimates spanning down from the larger trials symmetrically in both directions with increasing variability. Asymmetry in a funnel plot (also known as small study effects27) is potentially indicative of publication biases,13 but other sources of heterogeneity may also induce asymmetry in a funnel plot.13 If there is asymmetry and studies are perceived to be missing in those contour regions of non-statistical significance, there is greater likelihood that the asymmetry is due to publication bias. For each funnel plot, we chose a test for asymmetry in accordance with recent recommendations,13 and a P value <0.10 was taken to indicate statistical evidence of asymmetry.
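As a rough illustration of the kind of asymmetry test described above, the following Python sketch implements Egger's regression, which regresses each trial's standardised effect on its precision and tests whether the intercept differs from zero. It is only one of several tests covered by the recommendations, and it is not necessarily the test chosen for any given funnel plot in this survey. The function names and the example data are hypothetical, and the P value is obtained by simple numerical integration of the t density rather than from a statistics library.

```python
import math


def t_two_sided_p(t_stat, df, steps=20000):
    """Two-sided P value for a t statistic, via Simpson's rule on the t density."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def egger_test(effects, std_errors):
    """Egger's regression: standardised effect (y/se) on precision (1/se).

    Returns the intercept, its t statistic, and a two-sided P value.
    Assumes at least three trials and effects that do not fit the line exactly.
    """
    n = len(effects)
    x = [1.0 / s for s in std_errors]                  # precision
    z = [y / s for y, s in zip(effects, std_errors)]   # standardised effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    resid_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = n - 2
    se_intercept = math.sqrt(resid_ss / df * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept
    return intercept, t_stat, t_two_sided_p(t_stat, df)


# Hypothetical log effect estimates in which smaller trials (larger standard
# errors) report larger effects, the pattern a funnel plot shows as asymmetry.
example_se = [0.1 * k for k in range(1, 11)]
example_effects = [0.5 + 2 * s + (0.01 * s if i % 2 == 0 else -0.01 * s)
                   for i, s in enumerate(example_se)]
intercept, t_stat, p_value = egger_test(example_effects, example_se)
print(f"intercept={intercept:.3f}, P={p_value:.4f}, asymmetry flagged: {p_value < 0.10}")
```

The final comparison applies the P<0.10 threshold stated in the text. A flagged result indicates small study effects, which may or may not be caused by publication bias.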

Results

The first author classified 73 articles until 30 were deemed a meta-analysis of individual participant data from randomised trials. The second and third authors checked all these 73 article classifications, and subsequently articles with a primary objective to evaluate an intervention were also identified. This produced a final set of 31 articles that were deemed relevant and included in our in depth assessment below.28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 The flow chart for the selection and classification of articles is shown in fig 1.

Fig 1 Flow chart for identification of relevant articles describing meta-analyses using individual participant data from randomised trials that evaluated an intervention

Publication bias

In our survey, nine of the 31 articles mentioned seeking individual participant data from trials in the grey literature; all nine reported success and included grey literature data in their primary meta-analysis. For the remaining 22 articles (71%) publication bias is more of a concern, as 20 sought individual participant data only from fully published trials and in the other two it was unclear whether individual participant data from grey literature was included.

Despite the threat of publication bias, only 10 of the 31 articles discussed (9 articles) or examined statistically (1 article) the potential for publication bias in their meta-analysis. Seven of these 10 articles inferred that the threat of publication bias was low. For example, De Backer et al reported using “a range of techniques to find unpublished trials: searches of trials registers, contacts with other researchers and contact with the manufacturing company. None of these approaches revealed evidence of unpublished trials … this is in contrast with documented publication bias for other products in this field.”51

Data availability bias

In our survey, 30 of the 31 articles stated the number of trials providing individual participant data out of the total number requested. Fourteen of the 30 reported obtaining data from all the trials requested; the mean percentage of trial data obtained was 87% and the median was 91% (range was 60–100%). Fourteen of the 30 reported obtaining individual participant data for fewer than 90% of the trials requested, and 10 (33%) of the 30 articles reported obtaining less than 80%. The reasons for unavailability of individual participant data included trial data being lost or destroyed and trial authors not being contactable, unwilling to collaborate, or unable to send their data.

Availability bias is thus a potential concern in the 16 meta-analyses that did not have all the individual participant data requested. Twelve of these reported the percentage of total patients across all trials covered by the individual participant data obtained. This ranged from 66% to 98%, with five (42%) of the 12 analyses obtaining individual participant data for less than 80% of the total patients. The proportion of the total events covered by the available individual participant data was rarely reported.

Five of the 16 articles with unavailable individual participant data (31%) did not mention availability bias as a potential limitation, and only six (38%) examined statistically how trials lacking individual participant data might affect the conclusions of the meta-analyses presented. All six concluded that including the trials lacking individual participant data did not change the statistical or clinical conclusions. For example, Vale et al obtained aggregate results for three of their 10 trials with missing individual data and concluded: “incorporating them into the meta-analysis did not materially change the results.”34

Reviewer selection bias

In our survey, 22 of the 31 articles performed a systematic review to identify all relevant trials, for which individual participant data were then requested; selection bias is thus not a concern in these articles. However, in the other nine articles (29%) selection bias is a potential issue, as the identification of relevant trials was either not stated (six articles) or based on a selective, non-systematic approach (three articles). For example, Sakamoto et al state that they used a “meticulous search” to identify the five trials in their meta-analysis,54 but this search is not described. In contrast, Papakostas et al clearly include only eligible studies sponsored by GlaxoSmithKline, but note the potential for other trials in their Methods (“To our knowledge, only two other studies comparing bupropion with an SSRI were not included”) and their Discussion (“it is quite possible that studies sponsored by other sources have been conducted but have not been yet published or presented at major scientific meetings”).55

Detailed investigation of biases

There were eight meta-analyses that contained 10 or more trials, and in four of these we could extract suitable information to investigate funnel plot asymmetry (potential publication bias). A test for asymmetry was significant (P<0.1) in one,50 and non-significant in the other three.34 47 52 We now take two of these (one without asymmetry47 and the one with asymmetry50) to show our funnel plot assessments in detail and to demonstrate a possible approach for dealing with trials lacking individual participant data.

High dose chemotherapy for treatment of non-Hodgkin’s lymphoma

Greb et al reviewed whether high dose chemotherapy with autologous stem cell transplantation as part of first line treatment improves survival in adults with aggressive non-Hodgkin’s lymphoma.47 By a systematic review, they identified 15 randomised trials comparing high dose versus conventional chemotherapy. They sought individual participant data from all 15 trials, so selection bias is not a concern. However, publication and availability biases are a threat, as all the trials were fully published and individual participant data were unavailable for five of them (33%). Greb et al examined both these issues,47 and we now summarise their work and extend it by examining contour enhanced funnel plots.

A fixed effect meta-analysis of the 10 trials with individual participant data gives a summary hazard ratio of 1.14 (95% confidence interval 0.98 to 1.34; I2=4%), providing weak evidence that high dose chemotherapy is associated with a modest increase in the hazard of death over time (top part of fig 2). To investigate availability bias, Greb et al managed to extract hazard ratio estimates for four of the five trials lacking individual participant data.47 An updated, fixed effect meta-analysis of the 14 trials (10 with individual participant data, four without) now gives a summary hazard ratio of 1.05 (0.92 to 1.19; I2=30%), slightly closer to the null value of 1 since the trials without individual participant data have treatment effect estimates more favourable towards high dose chemotherapy than the trials with individual participant data (bottom part of fig 2), though nearly all trial confidence intervals overlap 1 (the value of no treatment effect). An alternative random effects analysis gives the same conclusion.
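The comparison above (pooling the trials with individual participant data, then re-pooling after adding aggregate results) can be sketched with inverse variance weighting on the log hazard ratio scale. The following Python code is a minimal illustration under stated assumptions: the function names are hypothetical, standard errors are back-calculated from reported 95% confidence limits, and the trial values are invented for demonstration and are not the data of Greb et al.

```python
import math

Z_95 = 1.96  # normal quantile used for 95% confidence intervals


def se_from_ci(lower, upper):
    """Approximate standard error of a log ratio from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def fixed_effect(log_effects, std_errors):
    """Inverse variance fixed effect pooling with Cochran's Q and I2 (%)."""
    weights = [1.0 / s ** 2 for s in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "log_effect": pooled,
        "ratio": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * pooled_se),
               math.exp(pooled + Z_95 * pooled_se)),
        "q": q,
        "i2": i2,
    }


# Hypothetical trials: (hazard ratio, lower 95% limit, upper 95% limit).
trials_with_ipd = [(1.20, 0.90, 1.60), (1.10, 0.80, 1.51), (1.15, 0.85, 1.56)]
trials_aggregate_only = [(0.80, 0.55, 1.16), (0.90, 0.60, 1.35)]


def pool(trials):
    logs = [math.log(hr) for hr, _, _ in trials]
    ses = [se_from_ci(lo, hi) for _, lo, hi in trials]
    return fixed_effect(logs, ses)


ipd_only = pool(trials_with_ipd)
all_trials = pool(trials_with_ipd + trials_aggregate_only)
print(f"IPD only:   HR {ipd_only['ratio']:.2f}, I2 {ipd_only['i2']:.0f}%")
print(f"All trials: HR {all_trials['ratio']:.2f}, I2 {all_trials['i2']:.0f}%")
```

Running both analyses side by side makes any shift in the summary estimate or in heterogeneity visible, which is the purpose of the sensitivity analysis described in the text.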

Fig 2 Fixed effect meta-analysis by Greb et al,47 which compared high dose with conventional chemotherapy for survival of patients with aggressive non-Hodgkin’s lymphoma: of the 14 trials included in the analysis, 10 provided individual participant data, and four provided only aggregate results

To investigate potential publication bias, we consider the contour enhanced funnel plot of the 10 trials with individual participant data plus the four trials lacking individual participant data (fig 3). Visually, this shows only minor asymmetry (with or without inclusion of the studies lacking individual participant data), and Egger’s test for asymmetry is not significant (P=0.14). Thus a publication bias mechanism is not a major cause for concern here.13

Fig 3 Contour enhanced funnel plot of the 14 trials included in the meta-analysis of Greb et al47: 10 trials provided individual participant data, four provided only aggregate results

In summary, considering aggregate data from studies that did not provide individual participant data and investigating publication bias have strengthened the original clinical conclusion, drawn from the analysis of individual participant data only, that high dose chemotherapy does not affect overall survival. Publication bias does not seem to pose a threat to this meta-analysis, and the pooled effect estimate moves slightly closer to 1 when the studies for which individual participant data were unavailable are considered.

Early glycoprotein IIb/IIIa inhibitors in primary angioplasty

De Luca et al performed a meta-analysis of individual participant data from randomised trials to evaluate the benefits of early versus late use of glycoprotein IIb/IIIa inhibitors in patients undergoing primary angioplasty for ST segment elevation myocardial infarction.50 A primary angiographic end point was whether patients achieved a preprocedural Thrombolysis in Myocardial Infarction Study (TIMI) grade 3 flow distal embolisation. A systematic review identified 14 relevant trials, and individual participant data were sought from them all, so selection bias is not a concern. However, availability and publication biases are a threat, as individual participant data were unavailable for three trials (21%), and all 11 trials providing individual participant data were fully published.50 De Luca et al did not consider statistically the potential impact of studies lacking individual participant data and did not investigate publication bias. We now extend their work accordingly.

A random effects meta-analysis of the 11 trials with individual participant data gives an odds ratio of 2.06 (1.48 to 2.86), with a 95% prediction interval for the odds ratio in an individual clinical setting from 1.03 to 4.89 (fig 4); this indicates that early use of glycoprotein IIb/IIIa inhibitors was associated with a significantly improved TIMI grade 3 flow. To investigate availability bias, we managed to extract odds ratios for two of the three trials not providing individual participant data (fig 5).56 57 Including them alongside the 11 studies with individual participant data in an updated random effects meta-analysis (fig 4) has a minimal impact on the summary odds ratio estimate (2.02 (1.45 to 2.81)) but increases the extent of between-trial heterogeneity (I2=40%), leading to a 95% prediction interval which now includes 1 (0.85 to 4.81), implying early use may not be superior in every clinical setting.58
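To show how a prediction interval can be wider than the confidence interval for the summary effect, the following Python sketch uses the DerSimonian and Laird estimate of between-trial variance and a commonly used approximate prediction interval, summary ± t × sqrt(tau² + SE²), with the t quantile on k−2 degrees of freedom. The text does not state which estimation method produced the intervals reported above, so this is an assumption and the sketch is not expected to reproduce them. The function name is hypothetical, the caller supplies the t quantile (for example 2.262 for 11 trials), and the example data are invented.

```python
import math

Z_95 = 1.96  # normal quantile for the 95% confidence interval


def random_effects(log_effects, std_errors, t_crit):
    """DerSimonian-Laird random effects pooling with a 95% prediction interval.

    t_crit is the 97.5% quantile of the t distribution on k-2 degrees of
    freedom, supplied by the caller (k = number of trials, k >= 3).
    """
    w = [1.0 / s ** 2 for s in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-trial variance
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    mu = sum(wi * y for wi, y in zip(w_star, log_effects)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    half_pi = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return {
        "mu": mu,
        "tau2": tau2,
        "ci": (mu - Z_95 * se_mu, mu + Z_95 * se_mu),
        "prediction_interval": (mu - half_pi, mu + half_pi),
    }


# Hypothetical log odds ratios and standard errors for 11 trials.
log_or = [0.9, 0.4, 1.2, 0.6, 0.8, 0.3, 1.0, 0.7, 0.5, 1.1, 0.2]
se = [0.40, 0.25, 0.50, 0.30, 0.35, 0.20, 0.45, 0.30, 0.25, 0.50, 0.15]
result = random_effects(log_or, se, t_crit=2.262)  # t quantile for 9 df
ci = tuple(round(math.exp(v), 2) for v in result["ci"])
pi = tuple(round(math.exp(v), 2) for v in result["prediction_interval"])
print("odds ratio:", round(math.exp(result["mu"]), 2), "CI:", ci, "PI:", pi)
```

The confidence interval describes uncertainty in the average effect, whereas the prediction interval also carries the between-trial variance, so it can include 1 even when the confidence interval does not.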

Fig 4 Random effects meta-analysis of the 11 studies with individual participant data considered by De Luca et al50 (evaluating the effects of early or late use of glycoprotein IIb/IIIa inhibitors for patients achieving TIMI grade 3 flow after primary angioplasty) with investigation of the impact of two additional studies lacking individual participant data

Fig 5 Contour enhanced funnel plot for the 11 trials considered in the meta-analysis of individual participant data by De Luca et al50 plus two trials lacking individual participant data. The solid line indicates the summary result from a meta-analysis of just individual participant data trials (odds ratio 2.06); the dotted line indicates the summary result from a meta-analysis of individual participant data combined with aggregate data from two studies lacking individual participant data (odds ratio 2.02)

To investigate potential publication bias, we examined the contour enhanced funnel plot of the 11 trials with individual participant data plus two trials lacking individual participant data (fig 5). This shows asymmetry, with small studies systematically having larger effect sizes than the larger studies (Peters’ test for asymmetry, P=0.016). This potentially suggests missing studies on the (bottom) left hand side of the plot. Since such studies would predominantly be in the region of statistical non-significance close to an odds ratio of 1 (that is, no difference between early and late use of glycoprotein IIb/IIIa inhibitors) or less than 1 (that is, early use is not beneficial), this adds strength to the notion that publication bias mechanisms may be operating here, biasing the meta-analysis result in favour of early use. Indeed, when we use a regression method to adjust for this asymmetry,14 59 the adjusted summary odds ratio is 1.18 (0.79 to 1.76) and non-significant. The asymmetry remains (P=0.045) even when the FINESSE-ANGIO trial57 is removed, which De Luca et al suggested was of lower quality than the other trials in the meta-analysis.50

In conclusion, although De Luca et al performed a thorough systematic review (that included searching conference abstracts) and clearly raised awareness that trials lacking individual participant data were excluded, our investigations reveal additional heterogeneity when studies lacking individual participant data are included and an asymmetric funnel plot consistent with publication bias. These issues were not identified in the original publication by De Luca et al.50 In light of this, we recommend further research to identify the causes of heterogeneity (perhaps factors such as study quality and study definitions of “early”) and to establish whether they contribute to the asymmetric nature of the plot.

Discussion

Though they can be time consuming and expensive, meta-analyses of individual participant data have considerable potential advantages over a traditional meta-analysis of extracted aggregate data.16 These include the ability to use consistent inclusion-exclusion criteria and statistical methods in each trial; to use up to date follow-up information, which is potentially longer than that used in the original trial publications; to obtain results for unpublished or poorly reported outcomes; and to increase power to detect differential treatment effects (that is, subgroup effects, treatment-covariate interactions). For these reasons, meta-analysis of individual participant data is increasingly popular, with an average of 49 published a year between 2005 and 2009.16

However, our survey of existing meta-analyses of individual participant data from randomised trials shows that individual participant data from the grey literature are often not included, individual participant data are commonly unavailable, and a selective, non-systematic approach is sometimes used to identify relevant trials. These problems raise the threat of publication, availability, and selection biases, respectively, but many reviewers neglect to examine or discuss them. Such shortcomings warn against uncritically accepting all meta-analyses of individual participant data as optimal without due thought as to how the data were chosen, whether data from unpublished studies were obtained, and whether data were obtained from all studies requested.

Strengths and limitations of study

We recognise that our survey contained only a modest sample of 31 meta-analyses of individual participant data and that, as we did not question review authors directly, methodological deficiencies identified in the meta-analyses are impossible to disentangle from their reporting standards (for example, some reviewers may have investigated publication bias but not reported this). However, we consider our findings sufficient to show that there needs to be greater recognition and investigation of potential biases in meta-analysis of individual participant data.

Recommendations for avoiding and assessing bias in meta-analyses

In a text box we make recommendations for dealing with biases in meta-analyses of individual participant data. All such endeavours should be clearly reported in the publication describing the meta-analysis according to recent reporting guidelines.16 Though we have focused on meta-analyses of randomised trials, such guidance is also relevant to syntheses of individual participant data from observational studies.17 For example, funnel plot asymmetry has been shown in a meta-analysis by the Emerging Risk Factors Collaboration (ERFC), which included individual participant data from 31 studies of cardiovascular disease.17 Further, in a meta-analysis of individual participant data from studies of prognostic factors in lung cancer by Trivella et al,60 10 of the 38 research groups contacted did not provide the individual participant data requested.

Recommendations for avoiding and assessing publication related biases, data availability bias, and reviewer selection bias in individual participant data meta-analyses

  • Meta-analyses of individual participant data should ideally be informed by a rigorous systematic review that searches for both published and unpublished studies

  • Researchers should seek individual participant data for all relevant studies identified (or at least those of highest quality)

  • When some individual participant data cannot be obtained, the impact of this on meta-analysis conclusions should be investigated by means of including the aggregate data from the studies lacking individual participant data,24 65 66 though this may not always be possible (for example, if suitable aggregate data are not available or if individual participant data are required for complex statistical modelling)

    • This is especially important when the number of studies with individual participant data is small or the proportion of individual participant data missing is large (for example, when individual participant data for >10% of trials or >10% of patients or events in all the trials are unavailable)

    • Where the inclusion of studies lacking individual participant data seems to have an important statistical or clinical impact, it may be helpful to compare the characteristics of the studies with individual participant data and of those without and to see if there are any key differences (such as in their quality, follow-up length, statistical methods, etc)

  • The potential for publication bias should be considered, with assessment of funnel plot asymmetry (with and without studies lacking individual participant data) adhering to the guidelines published recently in the BMJ13
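The example threshold in the recommendations (individual participant data unavailable for more than 10% of trials) can be expressed as a simple check. The following Python sketch is illustrative only: the function names are hypothetical, the 10% default is the example value given in the box rather than a fixed rule, and the trial counts are those reported for the two case studies in this article.

```python
def unavailable_share(n_with_ipd, n_total):
    """Proportion of trials (or patients, or events) lacking individual participant data."""
    return 1.0 - n_with_ipd / n_total


def needs_availability_check(n_with_ipd, n_total, threshold=0.10):
    """True when the unavailable share exceeds the example 10% threshold."""
    return unavailable_share(n_with_ipd, n_total) > threshold


# Trial counts from the two case studies: (trials with data, trials sought).
case_studies = {"Greb et al": (10, 15), "De Luca et al": (11, 14)}
for name, (with_ipd, total) in case_studies.items():
    share = unavailable_share(with_ipd, total)
    flagged = needs_availability_check(with_ipd, total)
    print(f"{name}: {share:.0%} of trials unavailable, investigate: {flagged}")
```

The same function can be applied to patient or event counts where these are reported.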

Our survey found that most (71%) articles do not include individual participant data from the grey literature, emphasising why obtaining individual participant data does not automatically remove the potential for publication bias in meta-analysis. It was disappointing to find that grey literature was sought in only 29% of the meta-analyses. In reviews that use extracted aggregated study results there is a similar problem: Song et al found that grey literature was explicitly sought in only 50% of treatment reviews, 30% of diagnostic reviews, 32% of risk-factor reviews, and 8% of genetic reviews, and furthermore, although 34% of 300 reviews explicitly searched for grey literature, only 13% included them.11

The potential for publication bias should thus be examined wherever possible in meta-analyses of individual participant data (box). In particular, assessment of funnel plot asymmetry, and thus potential publication bias, should be routinely used in meta-analyses synthesising 10 or more trials, and we refer readers to more detailed guidelines in the BMJ about this.13 Our survey shows that funnel plot investigations are currently rare in meta-analyses of individual participant data and publication bias is often not even discussed. Publication bias is also often neglected in standard meta-analyses of aggregate data: for instance, a recent review found that only 7 of 75 Cochrane reviews investigated publication bias or explained why not,61 and the wider review by Song et al found that potential publication bias was discussed more often in genetic reviews (70%) than in treatment reviews (32%), diagnostic reviews (48%), and risk factor reviews (42%).11

For any meta-analysis, the aim should be to obtain individual participant data or suitable aggregate data for all trials rather than selecting a potentially biased subset.62 Meta-analyses need to be inclusive rather than exclusive; for example, a meta-analysis of individual participant data by the Early Breast Cancer Trialists’ Collaborative Group involved over 400 named collaborators,63 who commendably provided individual participant data for 42 000 women from 78 randomised treatment comparisons. To avoid reviewer selection bias, meta-analyses should ideally be informed by rigorous systematic reviews that search for published and unpublished studies, and we encourage researchers to seek individual participant data for all relevant studies identified (or at least those of highest quality). The possible exception to this is for trials where suitable aggregate data can already be extracted from trial publications, as, other things being equal (such as length of follow-up, number of included patients, etc), such aggregate data will be sufficient64 and so individual participant data are not needed.16 However, because of the advantages of having individual participant data, reviewers aiming to use individual participant data will usually prefer to obtain individual participant data for as many trials as possible.

Our survey found that 33% of the meta-analyses published between 2007 and 2009 obtained less than 80% of the individual participant data requested. This builds on earlier reviews of availability of individual participant data,20 24 which found that 24% of 175 meta-analyses published up to 2005 obtained less than 80% of the individual participant data requested.24 Thus, there is no indication that availability of individual participant data is improving over time, though we note that the UK MRC Clinical Trials Unit seems more consistently successful.18 When reviewers are unsuccessful in obtaining individual participant data for some trials, it does not necessarily follow that a meta-analysis of the subset of trials with individual participant data is more desirable than a meta-analysis using suitable aggregate data from all trials. Indeed, the reviewers face a conundrum: the meta-analysis of individual participant data may be prone to data availability bias, but the meta-analysis of aggregate data from all trials may be limited by, for example, shorter follow-up time and inconsistent inclusion criteria and statistical methods in each study (the very reasons why the individual participant data were originally sought).

In such situations we recommend that, ideally, all synthesis options are reported and each of their limitations noted: that is, the meta-analysis of individual participant data from a subset of trials, the meta-analysis of aggregate data from all trials, and a meta-analysis that combines the individual participant data with the aggregate data from the trials lacking individual participant data. The last approach has been recommended to allow reviewers to investigate the potential impact of trials lacking individual participant data on the conclusions from the meta-analysis of individual participant data,24 and this was illustrated in our two detailed examples (figs 2-5), where we obtained suitable aggregate data from trials lacking individual participant data and added them to the meta-analyses and funnel plot assessments. Statistical approaches that synthesise both individual participant data and aggregate data are potentially valuable,24 65 66 though we recognise the extraction and inclusion of aggregate data become more difficult when going beyond the overall treatment effect, such as the assessment of differential treatment effects across individuals,65 and may only serve to amplify why individual participant data were desired.
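The three synthesis options can be sketched as a simple two-stage analysis. The example below is a minimal illustration, assuming that each trial contributes an effect estimate and standard error on an additive scale (for example, a log hazard ratio) and that a fixed-effect inverse-variance model is acceptable; a random-effects model may be more appropriate in practice. The names `Trial`, `pool_fixed`, and `synthesis_options` are hypothetical and introduced only for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Estimate = Tuple[float, float]  # (effect estimate, standard error)


@dataclass
class Trial:
    name: str
    ipd: Optional[Estimate]        # estimate derived from individual participant data
    aggregate: Optional[Estimate]  # estimate extracted from the trial publication


def pool_fixed(estimates: List[Estimate]) -> Optional[Estimate]:
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    if not estimates:
        return None
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def synthesis_options(trials: List[Trial]) -> Dict[str, Optional[Estimate]]:
    """Pooled results for the three synthesis options described in the text."""
    ipd_subset = [t.ipd for t in trials if t.ipd is not None]
    aggregate_all = [t.aggregate for t in trials if t.aggregate is not None]
    # Prefer the IPD estimate; fall back to aggregate data where IPD are lacking.
    combined = [
        t.ipd if t.ipd is not None else t.aggregate
        for t in trials
        if t.ipd is not None or t.aggregate is not None
    ]
    return {
        "ipd_subset": pool_fixed(ipd_subset),
        "aggregate_all": pool_fixed(aggregate_all),
        "ipd_plus_aggregate": pool_fixed(combined),
    }
```

Reporting the three pooled results side by side makes it easier to see whether trials lacking individual participant data change the conclusions.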

It may also be worth comparing the characteristics (such as quality) of studies lacking individual participant data and those with individual participant data. For example, McCormack et al compared hernia trials that provided individual participant data with those not providing such data and concluded: “Other than the availability of unpublished data, there were no clear differences in trial characteristics between those with or without individual participant data.”67 We also did not identify any clear differences in the characteristics of studies with or without individual participant data in our two detailed examples (figs 2 and 4), but a broader investigation of any differences in a wide range of fields would be informative. In situations where there are differences (such as the studies lacking individual participant data being of poorer quality or having different inclusion criteria, statistical methods, etc) this may lead to different summary results and increased between-trial heterogeneity in a meta-analysis combining studies with and without individual participant data compared with an analysis of individual participant data only. In a sensitivity analysis reviewers could investigate whether any indication of bias (such as different sizes of estimates from studies with individual participant data and from those without, or evidence of funnel plot asymmetry, etc) remains when studies with individual participant data are standardised to match those lacking individual participant data as far as possible (such as in terms of length of follow-up, statistical analysis methods, inclusion criteria, etc).
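One way to quantify whether adding studies without individual participant data increases between-trial heterogeneity is to compute Cochran's Q and the I² statistic for each set of trials. The sketch below assumes estimates on an additive scale with known standard errors; the function name `heterogeneity` and the numbers in the usage lines are hypothetical and are not taken from the survey.

```python
def heterogeneity(estimates):
    """Cochran's Q and I-squared (%) for a list of (estimate, standard error) pairs."""
    if len(estimates) < 2:
        raise ValueError("at least two trials are needed to assess heterogeneity")
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "Q": q, "df": df, "I2": i2}


# Hypothetical log hazard ratios and standard errors, for illustration only.
with_ipd = [(-0.20, 0.10), (-0.25, 0.12), (-0.15, 0.15)]
without_ipd = [(0.10, 0.20), (-0.60, 0.25)]

ipd_only = heterogeneity(with_ipd)
all_trials = heterogeneity(with_ipd + without_ipd)
print("I2, IPD trials only:", round(ipd_only["I2"], 1))
print("I2, all trials:", round(all_trials["I2"], 1))
```

A marked rise in I² after adding the trials without individual participant data would prompt the comparison of trial characteristics suggested above.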

Finally, we recognise it is clearly best to prevent biases occurring in the first place, so we strongly support calls for data sharing68 and transparency of research through study protocols, study registers,1 69 and complete reporting.70

What is already known on this topic

  • Publication related biases hide relevant trials and their results, and potentially lead to meta-analyses being biased toward favourable treatment effects

  • This problem has received little attention in meta-analyses that use individual participant data

What this study adds

  • A survey of 31 meta-analyses of individual participant data from randomised trials published between 2007 and 2009 reveals that only 29% included trials from “grey literature” (such as unpublished trials or trials published only as conference abstracts); publication bias therefore remains a concern for many meta-analyses, but authors often did not discuss it

  • A third of the meta-analyses obtained less than 80% of the individual participant data requested, making them susceptible to data availability bias, but this was often not considered by authors

  • In 29% of the meta-analyses identification of relevant trials was either not stated or based on a selective, non-systematic approach, raising the possibility of reviewer selection bias

Notes

Cite this as: BMJ 2012;344:d7762

Footnotes

  • Contributorship: RDR conceived and supervised the study, alongside AJS. IA identified relevant articles for the survey, checked by RDR and AJS. IA performed data extractions and initial meta-analyses, checked by RDR and AJS. RDR obtained the aggregate data from the two non-individual participant data trials in example 2 and extended the meta-analysis accordingly. IA drafted the first version of the article, and this was revised by RDR and AJS.

  • Funding: IA is funded by the MRC Midlands Hub for Trials Methodology Research, of which RDR is deputy director.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: Not required.

  • Data sharing: No additional data available.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.

References
