
Discrepancies in autologous bone marrow stem cell trials and enhancement of ejection fraction (DAMASCENE): weighted regression and meta-analysis

BMJ 2014; 348 doi: http://dx.doi.org/10.1136/bmj.g2688 (Published 28 April 2014) Cite this as: BMJ 2014;348:g2688
  1. Alexandra N Nowbar, cardiovascular scientist,
  2. Michael Mielewczik, postdoctoral biologist,
  3. Maria Karavassilis, clinical medical student,
  4. Hakim-Moulay Dehbi, statistician,
  5. Matthew J Shun-Shin, academic clinical fellow in cardiology,
  6. Siana Jones, cardiovascular physiologist,
  7. James P Howard, physician,
  8. Graham D Cole, BHF clinical research training fellow,
  9. Darrel P Francis, professor of cardiology
  on behalf of the DAMASCENE writing group
  1. International Centre for Circulatory Health, National Heart and Lung Institute, Imperial College London, London W2 1LA, UK
  Correspondence to: A N Nowbar alexandra.nowbar09{at}imperial.ac.uk
  • Accepted 24 March 2014

Abstract

Objective To investigate whether discrepancies in trials of use of bone marrow stem cells in patients with heart disease account for the variation in reported effect size in improvement of left ventricular function.

Design Identification and counting of factual discrepancies in trial reports, and sample size weighted regression against therapeutic effect size. Meta-analysis of trials that provided sufficient information.

Data sources PubMed and Embase from inception to April 2013.

Eligibility for selecting studies Randomised controlled trials evaluating the effect of autologous bone marrow stem cells for heart disease on mean left ventricular ejection fraction.

Results There were over 600 discrepancies in 133 reports from 49 trials. There was a significant association between the number of discrepancies and the reported increment in EF with bone marrow stem cell therapy (Spearman’s r=0.4, P=0.005). Trials with no discrepancies were a small minority (five trials) and showed a mean EF effect size of −0.4%. The 24 trials with 1-10 discrepancies showed a mean effect size of 2.1%. The 12 with 11-20 discrepancies showed a mean effect size of 3.0%. The three with 21-30 discrepancies showed a mean effect size of 5.7%. The high discrepancy group, comprising five trials with over 30 discrepancies each, showed a mean effect size of 7.7%.

Conclusions Avoiding discrepancies is difficult but is important because discrepancy count is related to effect size. The mechanism is unknown but should be explored in the design of future trials because in the five trials without discrepancies the effect of bone marrow stem cell therapy on ejection fraction is zero.

Introduction

Autologous bone marrow stem cells offer an exciting opportunity for improvement of left ventricular function, reverse remodelling, and scar size reduction1 in patients with ischaemic heart disease.2 Results, however, have been conflicting. The reason for the differences between the various trials of effect on left ventricular function has so far not been identified. Meta-analyses have confirmed a significant positive effect on average but have found no clear explanation for the conflicts between individual trials.1 3 4

It has recently been discovered that some pioneering trials of autologous bone marrow stem cells have unexplained discrepancies that cast doubt on their validity.5 It was not possible to report this directly in the journals that published the trials.5 Discrepancies in reports have never been systematically explored as a possible explanatory variable for the effect size of autologous bone marrow stem cells on ejection fraction.

We examined reports of the randomised controlled trials of bone marrow stem cell therapy for discrepancies of design, methods, or results and examined the relation between the number of discrepancies and the effect sizes reported. We defined a discrepancy as two (or more) reported facts that cannot both be true because they are logically or mathematically incompatible.

Methods

Search strategy and eligibility criteria

We searched Embase and PubMed (1966 to April 2013), using the following search strategy: (“bone marrow cell” OR “bone marrow cells” OR “stem cells” OR “stem cell” OR “progenitor cell” OR “progenitor cells”) AND (“myocardial infarction” OR “coronary artery disease” OR cardiomyopathy OR “heart failure”) AND random*.

We also manually searched citation lists1 5 and PubMed links to related citations. We included trials that met the following criteria:

  • Trial reporting the effect on mean ejection fraction of infusion of autologous stem cells derived from bone marrow in patients with acute or established cardiac disease

  • At least one publication by the authors described it as randomised

  • Available through our institution and in a language understood well by at least one investigator.

For each trial identified by this method, we used the international standard randomised controlled trial number registry (isrctn.org), ClinicalTrials.gov registry, PubMed, Google, and manual evaluation of references to search for other reports from that trial published until end of April 2013.

Data extraction

Two authors (ANN and SJ) extracted data from each trial, with disputes resolved by a third author (MJS). When the ejection fraction was measured by more than one imaging technique (magnetic resonance imaging (MRI), echocardiography, radionuclide imaging, left ventriculography), we used the data from the technique specified as the primary endpoint. If this was not defined, we used the technique that was highlighted in the abstract or (if the abstract was not specific) given priority in the conclusion or (if not mentioned in either) given priority in the results. When the ejection fraction effect size was reported at multiple time points, we used that of the longest follow-up.

We defined the ejection fraction effect size6 as the change in ejection fraction in the active arm minus the change in ejection fraction in control arm. We used this if it was stated directly in the trial. If it was not directly stated, we calculated it from the changes provided in each arm or, when these were not provided, from the baseline and follow-up values in each arm.
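The definition above is a difference of differences. A minimal Python sketch follows; the function and argument names are hypothetical (they do not come from the paper), and all values are assumed to be mean ejection fractions in percentage units.

```python
def ef_change(baseline, follow_up):
    """Change in mean ejection fraction within one arm (percentage units)."""
    return follow_up - baseline


def ef_effect_size(active_baseline, active_follow_up,
                   control_baseline, control_follow_up):
    """Change in the active arm minus change in the control arm."""
    return (ef_change(active_baseline, active_follow_up)
            - ef_change(control_baseline, control_follow_up))


# Made-up values for illustration only, not taken from any trial
example = ef_effect_size(40.0, 45.0, 41.0, 43.0)
```

Here the active arm improves by 5 units and the control arm by 2, so the effect size attributed to treatment is 3 percentage units.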

We used the standard error of the effect size if it was stated explicitly in the trial. If the confidence interval of the effect size was given instead, we extracted the standard error. If only the standard errors or standard deviations or confidence intervals of the changes in each arm were provided, we then extracted the two standard deviations and sample sizes and used them to calculate the standard error of the estimate.
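The paper does not spell out the conversion formulas, so the sketch below uses standard textbook ones as an assumption: a normal approximation for a 95% confidence interval, and independent arms when combining two standard deviations. All function names are hypothetical.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2.0 * z)


def sd_from_se(se, n):
    """Standard deviation of the changes in one arm, from its standard error."""
    return se * math.sqrt(n)


def se_of_difference(sd_active, n_active, sd_control, n_control):
    """Standard error of the difference between two independent arm means."""
    return math.sqrt(sd_active ** 2 / n_active + sd_control ** 2 / n_control)
```

A trial reporting intervals based on a t distribution with few patients would need a larger multiplier than 1.96, so these conversions are approximate for small arms.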

Detection of discrepancies

The trials were then examined for discrepancies, which were categorised into the following three types5:

  1. Discrepancies in the design, for example, conflicting statements as to whether the study was randomised (tabulated in appendix 1)

  2. Discrepancies in methods and baseline characteristics, for example, sample or subgroup sizes that could not be an integer number of patients (listed in appendix 2)

  3. Discrepancies in results, for example, conflicts between tables and figures or impossible values (listed in appendix 3).
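One of the checks named above, whether a reported percentage can correspond to a whole number of patients, is easy to automate. The sketch below is an illustration under stated assumptions, not the procedure the authors followed: it assumes the percentage was rounded conventionally to a known number of decimal places, and the function name is hypothetical.

```python
def percentage_is_possible(reported_pct, n, decimals=0):
    """True if some whole number k of n patients rounds to reported_pct."""
    if n <= 0:
        raise ValueError("n must be a positive number of patients")
    half_step = 0.5 * 10 ** (-decimals)  # largest error that rounding can cause
    for k in range(n + 1):
        exact = 100.0 * k / n
        # small epsilon so boundary cases are not lost to floating-point error
        if abs(exact - reported_pct) <= half_step + 1e-9:
            return True
    return False
```

For example, 35% cannot arise from a group of 10 patients, because the only achievable values are multiples of 10%, whereas 45% of 11 patients is compatible with 5 patients (45.45%).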

Eight authors (ANN, DPF, GDC, HD, JPH, MM, MJS, SJ) read all the reports, except those of the four trials for which the discrepancies had already been found and published.5 Proposed discrepancies were discussed. A discrepancy was declared valid for inclusion in the study only if no member of the group could find a valid explanation.

Contradictions in numerical values were considered as discrepancies but errors in spelling or grammar were not. If the same conflicting statements appeared more than once (for example, a trial repeatedly described as randomised in one publication, and repeatedly described as accepter-rejecter in another), this was considered a single discrepancy.

Trials, and their reports, were coded with a “t” number or “r” number, respectively. Appendix 4 provides a decoded list of trials and reports with web links to the sources. Each discrepancy was numbered with a three digit code after the two digit “t” number. The first digit of the discrepancy code was allocated according to the type of discrepancy—that is, 1, 2, or 3, as listed above.

Four of the trials (t07, t08, t21, and t49) had already undergone this process of identification and checking of discrepancies; for these we used the discrepancies as previously published by our group.5 Because the present publication focuses on counting discrepancies, where our previous report5 listed more than one discrepancy on a single row, we now separated these on to individual rows.

Assessing risk of bias

Risk of bias in the included trials was assessed with the Cochrane Collaboration’s risk of bias assessment tool (see appendix 5). Each trial was assessed by two independent observers and any differences resolved by a third observer.

Data analysis

We visualised the relation between the ejection fraction effect size and the discrepancy count with a scatter plot and quantified it with Spearman’s rank correlation coefficient. It was further visualised with a histogram with trials grouped by the number of discrepancies in intervals of 10. Means were weighted by sample size. We constructed a funnel plot of the data and used Egger’s test to assess asymmetry.7
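The two summaries described above, a rank correlation and sample size weighted means within bands of discrepancy count, can be sketched in plain Python as follows. The analysis in the paper was done in R; this is only an illustrative reimplementation with hypothetical names, it does not compute the P value, and the trial tuples are made up.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def discrepancy_bin(count):
    """Bands used in the text: 0, 1-10, 11-20, 21-30, >30."""
    if count == 0:
        return "0"
    if count > 30:
        return ">30"
    low = ((count - 1) // 10) * 10 + 1
    return f"{low}-{low + 9}"


def weighted_means_by_bin(trials):
    """trials: iterable of (discrepancy_count, effect_size, sample_size)."""
    sums = {}
    for count, effect, n in trials:
        label = discrepancy_bin(count)
        total, weight = sums.get(label, (0.0, 0))
        sums[label] = (total + effect * n, weight + n)
    return {label: total / weight for label, (total, weight) in sums.items()}


# Made-up trials for illustration only: (discrepancies, effect size, n)
toy_trials = [(0, -0.5, 100), (0, 0.5, 100), (5, 2.0, 40),
              (12, 3.0, 60), (45, 8.0, 20)]
```

Weighting by sample size needs only the number of patients, which is why it can be applied to every trial, unlike inverse-variance weighting.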

A meta-analysis was conducted for trials that provided sufficient data to weight the effect size estimates by a function of the reciprocal of the square of the standard error of the effect size estimate. We pooled the data on ejection fraction effect size for this subset of trials using a random effects model and present them as weighted mean differences with 95% confidence intervals.
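To make the weighting concrete, the following sketch pools effect sizes under a random effects model. The paper does not say which estimator of between-trial variance was used, so the sketch swaps in the DerSimonian-Laird estimator because it has a closed form; the function name is hypothetical and the interval assumes a normal approximation.

```python
import math


def random_effects_pool(effects, std_errors, z=1.96):
    """Pooled mean difference, 95% CI limits, and tau^2 (DerSimonian-Laird)."""
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures scatter of trial effects around the fixed estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # random effects weights add the between-trial variance to each trial
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se, tau2
```

Calling `random_effects_pool(effects, std_errors)` separately on the trials in each discrepancy band would give one pooled estimate and interval per band, which is the shape of the results reported for the subset of trials with usable standard errors.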

As part of an exploratory analysis we performed univariate and multivariate linear regression analyses including the discrepancy count, sample size, and the five specific domains of the Cochrane risk of bias tool as predictor variables for the effect size. Any aspect of bias that was agreed to be “unclear” was treated as a “no.” The multivariate model was built by stepwise backwards selection based on the Akaike information criterion.8

Data analysis was carried out with R (version 3.0.2, R Foundation),9 the graphics package ggplot2 (version 0.9.0),10 and the meta-analysis package metafor (version 1.9).

Results

Figure 1 shows how we identified trials. We identified 49 randomised trials reporting the effect of bone marrow stem cells on ejection fraction for cardiac disease (appendix 6). Length of follow-up ranged from three months to 65 months, with modal duration of six months. Each trial had between one and 13 reports. There were 133 reports in total (see appendix 4 for the list of trials and reports). We identified one study during the search in which the focus was on safety alone (appendix 7).

Fig 1 Identification of randomised controlled trials of autologous bone marrow stem cells for heart disease (EF=ejection fraction)

Discrepancies in design, methods, or results

We identified 604 instances of discrepancy (appendices 1-3) within a trial report or between reports of that trial. We identified 44 discrepancies in the reports of the study on safety. There were many types of discrepancy, as shown by the examples in table 1. Table 2 shows examples from the trials of phenomena that are unusual but were not counted as discrepancies.

Table 1 Spectrum of discrepancies in published reports of autologous bone marrow stem cell trials and enhancement of ejection fraction

Table 2 Unusual phenomena not listed as discrepancies in published reports of autologous bone marrow stem cell trials and enhancement of ejection fraction

Many aspects of the reports contained discrepancies. Even the primary endpoint was not spared (t09/308, t14/301, t19/305). Sometimes the discrepancies seemed to affect whether the difference between trial arms was significant (t10/301). Effect size, defined as the increment in ejection fraction from bone marrow stem cell therapy, ranged from −3.9 to 14 percentage units. Numbers of discrepancies in individual trials across all their reports ranged from 0 to 89.

There was a significant correlation between the number of discrepancies and the reported ejection fraction effect size (Spearman’s r=0.4, P=0.005, fig 2). There were only five studies with no discrepancies, and these showed a mean effect size of −0.4%, with this average weighted by sample size. The 24 trials with one to 10 discrepancies showed mean effect size of 2.1%; the 12 with 11 to 20 discrepancies showed mean effect size of 3.0%; the three with 21-30 discrepancies showed mean effect size of 5.7%; and five high discrepancy trials, with over 30 discrepancies each, showed a mean effect size of 7.7% (fig 3).

Fig 2 Correlation between number of discrepancies in trial’s reports and ejection fraction (EF) effect size. Dot area is proportional to trial’s sample size (Spearman’s r=0.4, P=0.005)

Fig 3 Mean ejection fraction (EF) effect size by number of discrepancies in trial’s reports. Error bars here show only SE of mean effect size weighted for sample size across trials in each category. Formal meta-analytic confidence intervals, which fully integrate sample size and uncertainty within each trial, are available only for subset of trials (see appendix 10)

Publication bias

The funnel plot (appendix 8) did not show significant asymmetry (Egger’s test P=0.4) that would suggest publication bias.

Discrepancies and risk of bias

The results of the exploratory univariate and multivariate analyses are presented in appendix 9. Only the number of discrepancies (P<0.001) and sequence generation (P=0.03) remained significant contributors to the effect size (adjusted R²=0.38, P<0.001).

Meta-analysis of studies providing information on uncertainty of effect size

We could adequately extract the standard error of the effect size estimate in only 31 trials to allow a formal meta-analysis to be conducted using this information (appendix 10). The weighted mean effect size was 0.0 (95% confidence interval −4.67 to 4.65) for trials with no discrepancies; 1.9 (0.30 to 3.57) for trials with 1-10 discrepancies; 4.6 (1.64 to 7.61) for trials with 11-20 discrepancies; 4.4 (−0.97 to 9.75) for trials with 21-30 discrepancies; and 10.4 (8.44 to 12.36) for trials with more than 30 discrepancies.

Discussion

Whenever we present scientific information we risk introducing conflicting statements that form discrepancies. Our study shows that scientists who achieve progressively better consistency of reporting find progressively smaller effects on ejection fraction of treatment with bone marrow stem cells. In trials with a discrepancy count of zero, the ejection fraction effect seems to be zero.

Study limitations

We were unable to blind ourselves to effect size because this was embedded within the report itself. There might be additional unidentified discrepancies. Our work involved developing newly derived mathematical limits on what is possible (appendix 11). There could be other such limits that have not yet been established. We invite readers to contribute either new discrepancies or new general methods for identifying the impossible.

Our method of counting discrepancies is imperfect because there is no universally accepted convention. We have tried to be consistent (appendices 1-3) but are open to suggestions from readers. Some readers might consider what we list as a single discrepancy to be several (for example, multiple repetitions of the same contradiction) or what we consider several to be just one (arguing, for example, that multiple discrepancies in a table might have been values from a different trial pasted into a manuscript accidentally).

We have taken a simple approach of including all trials, including four that we have previously identified as containing discrepancies and that showed a large effect of stem cells.5 With exclusion of these four trials, Spearman’s rank correlation coefficient in figure 2 is slightly lower, at 0.3 (P=0.03). We were able to examine only studies whose results have been reported. We examined clinical trial registries but many trials seem to have sped through to publication, with the registration step skipped.

In some cases it was not clear whether a trial was randomised. Previous meta-analysts have handled the confusion in contradictory ways, with some trials being classified as randomised in one meta-analysis and non-randomised in others.1 15 16 Our policy considered a trial eligible if an author of the primary report stated at any stage that the trial was randomised. We recognise that other conventions for inclusion would also have been possible. This will remain a challenge as long as primary authors find the distinction puzzling.

Our main analysis is weighted simply by sample size because this was available in every trial. In formal meta-analysis it is ideal to weight by a function of the reciprocal of the square of the standard error of the effect size estimate, but more than a third of the trials did not provide this information. For those trials that did provide sufficient data to weight the effect size estimates in this way, we conducted a formal meta-analysis (shown in appendix 10).

We have not attempted to control for the fact that some trials issued more reports than others, and different reports gave information to different levels of depth. It is difficult to satisfactorily control for this because sometimes the multiple reports are pure duplication, sometimes they cover non-overlapping information, and sometimes they are contradictory.

We excluded five reports because they were in Chinese language journals to which we had no access.17 18 19 20 21 22 23 We do not know to what extent this access limitation might have biased our results. Although it might have been possible to use additional routes to obtain the full text, and then arrange a translation, the translation process could always be suspected as a source of imperfection.

Results in context of other similar work

There seem to be no other studies exploring the relation between effect size of bone marrow stem cell therapy and the number of discrepancies in the reports. Several meta-analyses have covered bone marrow stem cell therapy for increasing ejection fraction but have not discussed the discrepancies in the reports. They all concluded that the average effect was a significant increase in ejection fraction.1 3 This was reiterated in a Cochrane review4 and the recent MESS (meta-analysis of cardiac stem cell studies) meta-analysis.24

Our findings expand on these meta-analyses. Our study concurs that viewing all the studies together as a single entity, there is on average a positive effect on ejection fraction. However, we found that the positivity was not consistent across the spectrum of discrepancy count. The studies with the most discrepancies seem to be contributing most to the positivity, while the studies with no discrepancies show a zero effect. Averaging effect size across all studies might therefore not be wise because it does not reflect their varying factual accuracy.

Standard meta-analyses include quality assessment, but this does not seem to involve identifying or quantifying factual discrepancies.5 Well conducted meta-analyses have somehow classed many studies with numerous discrepancies as high quality.1 3 25 26 27

Discrepancy count seems additive to a traditional assessment of risk of bias (appendix 9). If our findings are verified by other workers in other specialties, then addition of discrepancy checking, and ideally cross checking with raw data, might make meta-analysis more illuminating.

Possible explanations

We do not know the cause of the discrepancies. We have asked for resolution of over 150 discrepancies through journals.5 None were resolved, although we found it triggered correspondence from lawyers.

One possibility is that authors might feel pressure for results to match expectations. One signal of a misguided desire to please is the phenomenon of directed editing of rounded percentages to force them to add up to 100%. In reality, correctly rounded percentages should often not add up to 100% when there are many categories.28 The effect of even a little bias can be surprisingly dramatic.29 30
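The point about rounding can be checked with a few lines of Python. The counts below are made up for illustration, and the helper name is hypothetical.

```python
def rounded_percentages(counts, decimals=0):
    """Percentage of the total in each category, rounded independently."""
    total = sum(counts)
    return [round(100.0 * c / total, decimals) for c in counts]


thirds = rounded_percentages([1, 1, 1])  # each category is 33.33...%
sixths = rounded_percentages([1] * 6)    # each category is 16.66...%

sum_thirds = sum(thirds)  # 33 + 33 + 33
sum_sixths = sum(sixths)  # 17 * 6
```

Three equal categories sum to 99% and six equal categories sum to 102%, so a table of many rounded percentages that always totals exactly 100% is not what correct rounding would be expected to produce.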

Secondly, exciting new treatments might be reported before full checking. One sign of this, in the neighbouring specialty of cardiomyocyte-derived stem cell therapy, is the insertion of the word “randomised” into the title of the journal publication31 that was not present in the manuscript finalised by the authors32 on PubMed Central. There were seven controls in total, but after subtraction of the four who were not randomised and one who was randomised to stem cells but refused treatment, the number of randomised controls was only two. For the Lancet this was a new low.

Thirdly, bone marrow stem cell therapy might be less effective when it is carried out in a rigidly standardised way. Centres with less attention to detail might incorporate an unnoticed contaminant that enhances the effect of treatment. These centres might produce reports with more discrepancies. In support of this, just over a fifth of the trials (t07, t08, t09, t11, t21, t27, t33, t35, t40, t43, t44, t49) showed ejection fraction effects of 7% or more, but these trials accounted for more than half of all the discrepancies.

The final possibility is that in the reports with the fewest discrepancies, the ejection fraction effect might also have been measured with least error. If so, the true effect of bone marrow stem cells on ejection fraction is zero.

Implications for correctness of values reported in trials

When trials provide full data, serious errors in reporting can come to light, such as omitted patients,33 reclassification of causes of death,34 35 or studies based on fictitious data.36 37 As full data disclosure is rare, readers currently cannot estimate how many trial reports are incorrect. It is essential that there is open access to data.38

In our study, the reported standard deviation of the NYHA score (New York Heart Association classification for chronic heart failure) offers a unique window into correctness of reporting, which does not require raw data. If NYHA data for individual patients are fabricated, then the means and standard deviations will remain mathematically possible.

It is only when standard deviations are not correctly calculated from real NYHA values that mathematically impossible values can arise. Of 11 trials reporting a standard deviation of NYHA, the values in five (45%) are mathematically impossible.

The NYHA score is simple to measure, and the standard deviation is simple to calculate. Ejection fraction effect is more complex to measure and its statistical significance is more complex to calculate. We are concerned that, if simple calculations on simple variables are definitely incorrect in almost half of trials, then the more subtle statistical statements regarding more subtle variables might in most cases also be incorrect.
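A rough illustration of why some standard deviations of NYHA class are mathematically impossible is sketched below. It is not the derivation in appendix 11, which is not reproduced here. It assumes NYHA class is a whole number from 1 to 4 and that the reported value is the sample standard deviation (n−1 denominator). For a given mean, the spread is smallest when all patients sit in the two adjacent classes around the mean and largest when they sit only in classes 1 and 4. Function names and the tolerance are hypothetical.

```python
import math


def sd_bounds(mean, n, lowest=1, highest=4):
    """Smallest and largest sample SD compatible with n integer scores."""
    if n < 2 or not lowest <= mean <= highest:
        raise ValueError("need n >= 2 and a mean inside the scale")
    frac = mean - math.floor(mean)
    min_pop_var = frac * (1.0 - frac)                 # two adjacent classes only
    max_pop_var = (mean - lowest) * (highest - mean)  # two extreme classes only
    scale = n / (n - 1.0)                             # population to sample variance
    return math.sqrt(min_pop_var * scale), math.sqrt(max_pop_var * scale)


def sd_is_possible(mean, sd, n, tolerance=0.05):
    """False only when the SD lies outside the bounds by more than tolerance."""
    low, high = sd_bounds(mean, n)
    return low - tolerance <= sd <= high + tolerance
```

These bounds are necessary conditions, not sufficient ones: a value outside them cannot come from real scores, but a value inside them is not thereby proved correct. The fixed tolerance is a crude allowance for rounding of the reported mean and standard deviation; a stricter check would evaluate the bounds across the whole rounding interval of the mean.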

Implications for interpreting trial design

Readers need to know whether a trial is randomised or not, but the reports were sometimes vague or even contradictory. Some trials were initially reported as accepter-rejecter (non-randomised) and later as randomised5 (t21, t41, t49). In one, a later publication recalled the existence of a placebo control group (t07r3). In this specialty, patients’ voluntary choice is sometimes considered a form of randomisation, a policy accepted by some journals (t21/102).5 39 Identical tables and identical figures have inexplicably been presented as results of different studies,5 with different names, different designs (randomised versus accepter-rejecter), and different sample sizes (t07r1, t07r10).

Journals could resolve such discrepancies but currently do not consider this a priority5 40 (t07r1, t07r10, t21r5, t21r11, t49r1, t49r3).

Implications for safety of bone marrow stem cell therapy

The safety of bone marrow stem cell therapy is underlined by a large report focusing on this.41 42 43 44 Unfortunately it too contains many discrepancies (appendix 7), including impossible percentages and conflicts between tables and figures, perhaps because the reassuring findings had to be made available with urgency.

Implications for clinicians and researchers

If patients ask for advice on which bone marrow stem cell trial to enter, we want to maximise the benefit to them, while maximising their contribution to reliable evidence for future patients. Unfortunately, these seem to be in conflict (fig 3).

Sometimes researchers feel that only findings of positive effects indicate scientific success. But meticulously reported studies reporting neutral effects are vital contributions to science. It is more valuable to have a reliable report of a small improvement than an unreliable report of a large improvement. Error-free reporting is difficult to achieve, as we have found in our own experience.45 Only 10% of these trials were reported without introducing discrepancies. We consider these to be the greatest scientific successes, even though the effect size was unfortunately zero.

Several lessons can be drawn for the design of future trials of bone marrow stem cells. Prior registration on a public clinical trial registry was not universal and would have been helpful in distinguishing unambiguously between trials that were multiply published or merely identical by coincidence.

We recommend that reports include a spreadsheet of all the data used for construction of the tables, so that incorrect values could be more easily identified. Readers should accept that authors cannot avoid errors; in turn authors should correct errors promptly and indicate clearly when later reports incorporate corrections. It should be remembered that the 604 discrepancies listed in appendices 1-3 are unlikely to be all the errors. They are only those detectable by us without any information beyond the published reports. Disclosing the individual patient data could help to correct more errors.

It is important for studies using change in ejection fraction as an endpoint to be properly designed to resist error and to have adequate sample size to combat the effects of biological variability. Left ventricular ejection fraction is a mutable variable, which in some modalities is easily manipulated innocently by clinicians who have prior beliefs on what a realistic value should be for a particular patient. Sample size planning can sometimes be erroneously omitted when clinicians are enthusiastic to “demonstrate the effectiveness” of a treatment seen as exciting.

Conclusions

It is difficult to avoid discrepancies in clinical trial reports. Trials with progressively fewer discrepancies tend to find progressively smaller effects on ejection fraction of bone marrow stem cell therapy. The reason for this association is unknown. The few trials for which the discrepancy count was zero had a stem cell effect size that was also zero.

What is already known on this topic

  • Autologous bone marrow stem cell therapy has been reported to substantially increase cardiac function

  • The trials have differed in the effect sizes they reported, for reasons that are not clear

What this study adds

  • Many reports of trials of bone marrow stem cell therapy contain factual discrepancies

  • The number of discrepancies in a trial was significantly associated with the reported effect size

  • Trials with over 30 discrepancies report a large effect size. Classes of trials with fewer discrepancies have found progressively smaller effect sizes, culminating in discrepancy-free trials reporting an effect size of zero

Footnotes

  • Notes added at proof stage: The institution of t07, t08, t21, and t49 is recently reported to have identified evidence of misconduct46 and has notified the city prosecutor. The institution of the SCIPIO trial31 is recently reported to have requested that the publication be retracted.47

  • Contributors: DPF and GDC designed the study, examined the trials, and drafted and revised the paper. MM designed the study, performed the search, examined the trials, and drafted and revised the paper. ANN performed the search, examined the trials, analysed the data, and drafted and revised the paper. MJS and HD examined the trials, analysed the data, and revised the paper. JPH and SJ examined the trials and revised the paper. MK performed the search and revised the paper. DPF is guarantor.

  • Funding: This study was not funded by any external organisation. All authors are associated with Imperial College London. For full disclosure, we report that DPF is supported by a British Heart Foundation senior clinical research fellowship FS/10/038 and GDC is a British Heart Foundation clinical research training fellow (FS/12/12/29294). Neither the institution nor any funder had any role in devising, conducting, analysing or reporting this study.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: Not required.

  • Data sharing: The complete list of identified discrepancies is shown in appendices 1-3. These are freely available from the corresponding author in editable form on request. We welcome and will make public any corrections or updates from readers.

  • Transparency declaration: The guarantor affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.

References