The rules of retraction

BMJ 2010; 341 doi: https://doi.org/10.1136/bmj.c6985 (Published 07 December 2010) Cite this as: BMJ 2010;341:c6985
- Melanie Newman, reporter, the Bureau of Investigative Journalism
Years after regulators banned GlaxoSmithKline’s antidepressant paroxetine (Seroxat) for under-18s, two academics are fighting for a paper claiming the drug is safe and effective for adolescents to be withdrawn.
The 2001 paper in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP)1 concludes that paroxetine is “generally well tolerated and effective” for treatment of major depression in adolescents.
The paper gives a misleading impression of the trial’s results and the journal should retract it, say Jon Jureidini, associate professor of psychiatry at the University of Adelaide, and Leemon McHenry, lecturer in philosophy at California State University.
The efficacy claim was based on just 15% of the trial’s outcomes, they argue.
The academics’ stance is supported by internal GSK documents released during personal injury lawsuits against the company. The documents show that company employees and public relations advisers also saw the trial data as having failed to prove that the drug worked in adolescents.
Despite this, JAACAP’s editors maintain that as the paper contains no inaccuracies and negative findings are included in a results table, there are no grounds for its withdrawal.
“Evidence about medicine will be reliable only if the sponsor company and investigators design, conduct and report the results of clinical trials with integrity,” say Jureidini and McHenry.
The pair believe that journal editors are too reluctant to retract papers when the extent of this influence is revealed. Editors are “jeopardising their scientific standing and moral responsibility to prescribers and patients,” by failing to retract, they argue.
A recent Thomson Reuters analysis5 of journals covered by its Science Citation Index Expanded showed that retractions increased 10-fold between 1990 and 2008, but remained rare events against the numbers of articles produced each year.
In 1990 five out of 690 000 journal articles produced were retracted, compared with 95 retractions out of 1.4 million papers published in 2008.
The International Committee of Medical Journal Editors (ICMJE) advises retraction in cases of scientific fraud or where an error is “so serious as to vitiate the entire body of work,”6 implying that this approach should not be used in cases of debate as to whether data have been interpreted correctly.
The National Library of Medicine, perhaps the most authoritative source of advice on retraction, also defines the circumstances when retraction may occur fairly narrowly: it advises that articles may be withdrawn because of “pervasive error or unsubstantiated or irreproducible data” due to either misconduct or honest error.
Last year the Committee on Publication Ethics (COPE) widened the scope for retraction, advising editors to retract if “they have clear evidence that the findings are unreliable.” The COPE guidelines stressed the main purpose of retraction: to “correct the literature and ensure its integrity” rather than to punish miscreant authors.
Despite this, many academics continue to equate withdrawal of a paper with misconduct, so editors are reluctant to retract without extremely strong evidence of wrongdoing. The threat of legal action weighs heavily, especially on smaller journals. Liu’s research7 associating higher retraction rates at some high impact journals with lower standards of pre-publication review may also mean editors see retraction as reflecting badly on their stewardship.
Three journals were warned about convicted fraudster Erich Poehlman by his university, which had uncovered evidence of data fabrication. Two of the journals refused to retract Poehlman’s articles.8
Jureidini says: “I’ve been surprised how hard it’s been to get editors to take action to improve the quality of their journals. They prefer to turn a blind eye.”
Study 329, a study of 275 adolescents, was one of three clinical trials conducted by SmithKline Beecham (as GSK was then known) in the mid 1990s.
Study 329’s results showed that paroxetine was no more effective than the placebo according to measurements of eight outcomes specified by Martin Keller, professor of psychiatry at Brown University, when he first drew up the trial.
Two of these were primary outcomes: the change in total Hamilton Rating Scale for Depression (HAM-D) score, and the proportion of “responders” at the end of the eight week acute treatment phase (those with a ≥50% reduction in HAM-D score, or a HAM-D score ≤8). The drug also showed no significant effect on the initial six secondary outcome measures.
The drug produced a positive result only when four new secondary outcome measures, introduced after the initial data analysis, were used instead. Fifteen other new secondary outcome measures failed to produce positive results.
An internal SmithKline Beecham document discussing these results and those of another trial that had failed to show paroxetine’s effectiveness noted that it would be “commercially unacceptable to include a statement that efficacy had not been demonstrated.” The document also referred to a target to “effectively manage the dissemination of these data in order to minimize any potential negative commercial impact.”
SmithKline Beecham commissioned medical communications company Scientific Therapeutics Information to produce a manuscript. An employee of Scientific Therapeutics Information, Sally Laden, drew up a first draft.
The manuscript was then sent to the Journal of the American Medical Association, which rejected it after peer reviewers highlighted methodological and other problems. One peer reviewer noted that “the main finding of the study was the high placebo response rate,” and said more attention could have been given to “discussion of the fact that the bulk of the effect in this study was the result of good clinical management and not the medication.” Another peer reviewer raised concerns about the study’s authorship and suggested the authors should state that they were “all granted full access to the data set to verify the accuracy of the report” and that all were in full agreement with the manuscript as submitted.
The paper was rewritten and sent on to JAACAP. The journal’s peer reviewers noted that the results did not “clearly demonstrate efficacy for paroxetine” and asked whether, given that 50% of placebo treated teenagers improved, selective serotonin reuptake inhibitors were “an acceptable first-line treatment.”
JAACAP nevertheless accepted the paper.
The final published version reports that Study 329 found “significant efficacy in one of the two primary endpoints,” although on both outcome measures initially defined as primary, no significant result had been found. Jureidini and McHenry suggest that this statement was arrived at by conflating one of the secondary outcomes (remission) that had been found to be positive with a primary outcome.
The paper also listed 11 serious adverse events, including five cases of “emotional lability (eg suicidal ideation/gestures)” in the paroxetine group, compared with just two serious adverse events among the patients on placebo. Jureidini and McHenry argue that as most of the five “emotionally labile” paroxetine patients had self harmed or reported emergent suicidal ideas, the claim in the JAACAP abstract that paroxetine was safe was misleading. GSK maintains that the small number of patients involved meant the emotional lability findings were not significant and that the drug’s association with suicidality did not become clear until data from several trials were later pooled in a meta-analysis. The authors stated in the paper that they believed only one of the adverse events suffered by the paroxetine group was related to the treatment.
In 2002, on examination of all GSK’s adolescent trial data, including Study 329, UK regulators identified a “signal” of increased risk of suicidal thoughts and behaviours and issued a warning not to prescribe paroxetine to anyone under 18.9
By June 2010 the JAACAP paper had been cited in well over 200 articles, many of which cited Study 329 as evidence that paroxetine was effective in treating adolescent depression. The paper had previously been used to support GSK’s marketing campaign, before regulators banned the drug for under-18s. Reprints were attached to a memo for drug reps selling Paxil (Seroxat), encouraging them to use the paper to promote its “remarkable efficacy and safety in the treatment of adolescent depression.”
Jureidini wrote to JAACAP in 2002 highlighting the paper’s selective reporting and questioning the editor’s decision to publish it. The journal published his letter but did not respond to his criticisms. McHenry also contacted JAACAP in 2005, pointing out that conflict of interest and authorship policies had been violated: Professor Keller and some of the other 22 listed authors had worked for GSK, but this had not been declared, while Sally Laden, who drew up the first draft, was listed as providing “editorial assistance.”
Both academics called for the article’s retraction in December 2009, arguing that the conflation of primary and secondary outcomes represented falsification of data and accusing GSK of intending to deceive by concealing negative data.
GSK denies this, saying: “GSK remains firm in the belief that we acted properly and responsibly in the conduct of our clinical trials programme, documentation and submission of results from studies of paroxetine to regulators, and in communicating important safety information.” A spokeswoman added that the JAACAP paper was submitted “prior to any association of suicidality.”
In his response to the academics’ call, JAACAP editor-in-chief Andres Martin said the former editor’s decision to publish the paper despite the reviewers’ misgivings “conformed to best publication practices prevailing at the time.” He added that he had given “serious consideration and due diligence” to the request that the paper be withdrawn but had found no evidence of scientific errors “nor any justification for retraction according to current editorial standards and scientific publication guidelines.”
Jureidini and McHenry say that JAACAP’s editorial decision in this case is at odds with the ideals of scientific rigour and ethical integrity10 it promulgates. “JAACAP was the most important instrument through which the results of Study 329 were misrepresented to physicians,” they say.
But in the eyes of many, exaggerated claims and publication bias are not sufficient to justify retraction.
Liz Wager, chair of COPE, declines to comment on the paper but warns that cases should be judged on the transparency standards of the day. “Things have changed in the last few years.”
The US requirement, in place since 2008, for all trials to be registered, including their pre-specified outcome measures, will make cherry picking harder, she says. Some journal editors are also asking contributors to register their trials and make primary data available for scrutiny.
Wager adds: “If you look at the early hormone replacement studies, all sorts of claims were made. It wasn’t until the big randomised trials that we began to realise the true picture, but nobody is suggesting the original papers should be withdrawn.” A “conspiracy of hope,” in which doctors, pharmaceutical companies, and patients allow themselves to give a drug the benefit of the doubt because they want it to work so much, tends to skew results, she suggests.
Her predecessor at COPE, Harvey Marcovitch, suggests little research is published that is entirely free from bias, or “honest error.” “There are very many papers where, if you looked at the data, you could argue that the conclusions are not justified,” he says. “If you used retraction whenever that happened you’d be continuously retracting.”
Jureidini argues the Seroxat case goes beyond overenthusiastic endorsement. “They conflated two different measures in a way which was misleading,” Jureidini says.
For the editor who is trying to decide whether, in hindsight, acceptable highlighting of positive results tipped over into unacceptable misrepresentation, there is no authoritative guidance at hand.
“Neither the COPE guidelines nor the National Library of Medicine advice covers the situation where authors haven’t been scrupulously transparent in the conclusions they derive from their data,” says Marcovitch. “Arguably, they are a bit feeble. There’s nothing there to deal with widespread manipulation of the publishing process.”
Editors of the major journals also sit at the end of a chain of research production that has a huge amount of money invested in it; they are the final gatekeepers of information between industry and the public. And their resources are minimal.
For Jureidini, that excuse isn’t good enough. “Do we need so many journals if they can’t do their job properly?” he asks. “Maybe we need fewer of them.”
If thousands of retractions would result from their request, McHenry adds, then “this is precisely what is in order to clean up academic medicine.”
Jureidini and McHenry endorse former BMJ editor Richard Smith’s recent proposal to banish industry trials from journals and oblige pharmaceutical companies to post trial results on their websites, leaving journals free to examine the raw data and conclusions independently. If that were done, the pair argue, “journals would no longer be subject to the complaint that they have become little more than the marketing arm of the pharmaceutical industry.”
Competing interests: The author has completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the author) and declares: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Provenance and peer review: Commissioned, externally peer reviewed.