Burden of proof: combating inaccurate citation in biomedical literature
BMJ 2023;383 doi: https://doi.org/10.1136/bmj-2023-076441 (Published 06 November 2023)
Cite this as: BMJ 2023;383:e076441
- Nicholas Peoples, MD student1,
- Truls Østbye, vice chair (research) and professor2,
- Lijing L Yan, professor and head of non-communicable disease research3
- 1Baylor College of Medicine, Houston, TX, USA
- 2Family Medicine and Community Health, Duke University, Durham, NC, USA
- 3Global Health Research Center, Duke Kunshan University, Kunshan, Jiangsu Province, China
- Correspondence to: firstname.lastname@example.org
Up to 25% of all citations in the general scientific literature are inaccurate and mislead physicians, academics, and policy makers
Artificial intelligence (AI) powered large language models such as ChatGPT have the potential to both enable and mitigate inaccurate citation on a scale not previously possible
Researchers need new strategies to ensure that scientific references function as an accurate web of knowledge
We make the case that peer reviewed journals should consider adopting a required statement on the integrity of cited literature, using the adoption of required conflict of interest statements as a proof of concept
Even without a name, it is a devil we all know: an article cites a source that does not support the statement in question, or, more commonly, the initial reference sends the reader down a rabbit hole of references, the bottom of which is difficult to find and interpret. This causes two problems. Firstly, it may propagate data that are false, misinterpreted, or both, spurring “academic urban legends” that become circulated as truth.1 This delays true results from reaching the literature and allows incorrect ideas to masquerade as facts. Secondly, it undermines respect for the process of literature review, reducing the foundation of good scientific inquiry to a mere box ticking exercise. This cheapens the value of background and discussion sections in scholarly articles and encourages trainees and young investigators to practise sloppy research.
These errors might be especially problematic for doctors and the general public, “who are not focused on the scientific study of a narrow research topic and thus are less prone to identify rhetorically misleading statements or outright factual errors.”2 Leung and colleagues, for example, document clear patterns of inaccurate citation that misrepresent the conclusions of a single paragraph statement in the New England Journal of Medicine on the safety of opiate use.34 They argue that these misrepresentations might have contributed to the North American opioid crisis “by helping to shape a narrative that allayed prescribers’ concerns about the risk of addiction associated with long term opioid therapy.”3
Recent estimates indicate citation error rates of 11-15% in the biomedical literature25 and up to 25% in the general science literature.6 In a review of 4912 citations, 38.4% of these errors were citing non-existent findings, 15.4% were incorrect interpretations of findings, and 20% were chains of inaccurate citations copied forward from paper to paper.5 This indicates that mis-citation is widespread. A surgical study7 was found to be misquoted by 40% of the articles that cited it,8 creating an unsupported but widely accepted guideline for how an orthopaedic procedure should be performed. This shows that mis-citation might also deeply mischaracterise individual scientific works. Finally, to understand how an entire scientific belief system might evolve, a 2009 study systematically mapped out the full citation chain for a particular scientific claim related to amyloid β. Among its findings was the “marked expansion of the belief system by papers presenting no data addressing it; and forms of invention such as the conversion of hypothesis into fact through citation alone.”9 So, improper citation can even credibly distort the scientific consensus.
Rekdal offers granular insight into this process in his excellent analysis of the “iron content in spinach” myth (fig 1).1 In a 1981 article entitled “Fake!”,10 Terry Hamblin believed he was debunking an erroneous claim about the iron content of spinach but was unknowingly using incorrect information himself. Others then cited and transformed his ideas into even greater inaccuracies.111 Rekdal convincingly makes the case that even later authors, such as Larsson in 1995,12 borrowed the conclusions of articles that cited Hamblin without actually consulting the 1981 paper directly, further distorting the truth. In the end, Hamblin’s accidental rumour was only debunked in 2010 (some 30 years later),13 and he tried in vain to extinguish it up until his death in 2012.114
This all makes for a poor report card, especially given that erroneous citation has been rampant since at least 1931 (and, judging by Shull’s exasperated remarks, with little progress in the time since).15 But modern figures risk becoming underestimates: emerging artificial intelligence (AI) powered large language models now enable inaccurate citation with an efficiency and scale not previously possible. On a broad scale, academia is clearly grappling with how to reconcile technological progress and traditional research ethics with concerns that AI may “hallucinate” sources or fabricate data.1617 So why do we turn a blind eye if such fabrications are human, so long as they are neatly packaged into scholarly citations that look and sound the part?18
Types of inadequate citation
Just as there is no universally accepted term—misquotation8 versus quotation errors,2 erroneous citation15 versus inadequate citation,12 and so on—there is no universally accepted classification scheme. Steven Greenberg developed a “vocabulary” for the “citation distortions” he encountered in his study of amyloid β, which offers one useful framework, but it primarily describes mechanisms and consequences of select forms of poor citation without exploring underlying causes, which also matter.9 Others discuss “common citation errors” but do not report the methodology for how they arrived at these categories.19 We propose that rigorous, systematic categorisation, a taxonomy and nomenclature of inadequate citation, or both, are important next steps. Box 1 provides a concise overview of some common documented types of mis-citation.
Selected examples of inadequate citation practices
Biased—preferentially citing certain sources, perhaps those of colleagues, because the author is simply more familiar with them, regardless of whether they are the best supporting works for the claim at hand19
Coercion: pre-submission—senior or principal investigator incorrectly adds a reference to a work created by a trainee (such as a PhD student), who does not feel empowered to decline or challenge this move20
Coercion: within submission—during peer review, a reviewer or editor instructs the author to cite publications co-authored by the reviewer or editor20
Editing—a work is initially cited correctly, but as the draft is edited and sentences are eliminated or consolidated, perhaps by multiple authors, a correctly cited study inadvertently becomes misquoted19
Missing—making a statement without citing the source19
Plagiarism—representing others’ work as one’s own by omitting citations
Self-citation—inappropriately citing one’s own work without sufficient justification
Temporal—citing a work that was read, but some time ago, such that the author inaccurately recollects the findings25
Strategies to mitigate mis-citation
There are few checks and balances on erroneous citation. The first line of defence (pre-submission quality control) is a standard expectation of academic integrity. In the age of “publish or perish,” however, competing interests, such as enormous pressure on investigators to exhibit constant productivity, can make it tempting to cut corners in pursuit of expediency.62124 Market forces are attuned to this. Consensus, for example, is a new, AI powered search engine designed specifically for academics (https://consensus.app/search/). Given a search query, it will produce a list of 5-10 peer reviewed papers with a short synopsis of each. The more widely used ChatGPT has been found to fabricate both facts and legitimate sounding “scholarly sources” to provide a convincing—rather than factual—answer to a prompt.1617 Although these tools are multi-purpose, powerful, and certainly promising adjuvants for literature review, they also greatly increase the ease and scale with which inaccurate citations can be produced.
The second line of defence is peer review, which currently serves as the major safety net to catch mis-citations after submission. History has shown, however, that this can also be inefficient, inconsistent, and insufficient.6262728 (And, as others point out, peer review may even encourage mis-citation.)20 Moreover, the ultimate responsibility for auditing cited literature should not rest with peer reviewers, but with those who selected the literature to support their claims.
The third line of defence is post-publication review, which is often either inordinately difficult2930 or even directly opposed.31 Although non-replicable primary results have been caught and overturned after publication, there is little drive to do the same for inaccurate citations. This effectively allows them to live on into eternity. Although we are not the first to highlight these problems,1532 we argue that prevailing strategies are insufficient.
New tools in our toolkit
Just as AI might exacerbate erroneous citation, we suggest it could also be part of the solution. It might, for example, eventually be possible for AI programmes to be integrated into manuscript submission portals to confirm that all cited works actually exist. Even more useful, however, will be AI modalities developed to assess citations for accuracy and to flag potential discrepancies for review. This would not replace human review, but something with reasonable sensitivity and high specificity could provide an initial screen that alerts reviewers to instances of potential mis-citation without creating additional work.
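The simplest layer of such a screen need not involve AI at all. As a minimal illustration (our own sketch, not an existing submission-portal feature), the snippet below extracts DOI-like strings from a free-text reference list and builds lookup URLs for the Crossref REST API, which returns a record for a registered DOI and an error for a non-existent one; resolving each URL (not done here, to keep the example self-contained) would confirm that each cited work at least exists. The function names are hypothetical.

```python
import re

# Matches DOI-like strings, e.g. 10.1136/bmj-2023-076441.
# Illustrative pattern only; real DOIs allow a wider character set.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")

def extract_dois(reference_text: str) -> list[str]:
    """Return every DOI-like string found in free-text references."""
    # Strip trailing punctuation that sentence context often attaches.
    return [m.rstrip(".,;") for m in DOI_PATTERN.findall(reference_text)]

def crossref_url(doi: str) -> str:
    """Build the Crossref works endpoint for an existence check."""
    return f"https://api.crossref.org/works/{doi}"

if __name__ == "__main__":
    refs = """
    1. Rekdal OB. Academic urban legends. doi:10.1177/0306312714535679
    2. Peoples N, et al. BMJ 2023;383:e076441. doi:10.1136/bmj-2023-076441
    """
    for doi in extract_dois(refs):
        print(doi, "->", crossref_url(doi))
```

Checking that a reference exists is, of course, far easier than checking that it supports the claim it is attached to; the latter is where the AI modalities described above would need to do the real work.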
Another idea is for journals to ask authors to attest that an article contains no inaccurate citations (box 2), much in the same way that they are expected to make a statement about conflicts of interest.
Example works cited statement
The author(s) certify that all works cited were read in full by at least one author at the time of writing this manuscript; are necessary to support the intellectual foundation of the work; were added without undue coercion; and do not reflect inappropriate self-citation. We affirm that this document cites primary sources whenever possible, transparently discloses the source of all secondary information presented, and accurately represents both the objective findings and earnest spirit of all works cited.
These ideals are implicit in any submission to a peer reviewed journal, so making them explicit presents no additional burden. The key question, then, is whether such declarations are useful. To answer this, we can compare older to more recent literature as journals have progressively adopted more rigorous standards. Specifically, conflict of interest (COI) statements are a useful analogy to our proposed works cited statement, because when there are no competing interests to declare, they function as an attestation. These statements became increasingly common as concerns grew about the influence of money and corporate interests in research. Estimates for the proportion of journals requiring a COI statement vary by discipline but consistently show a positive trend. Some commonly cited examples include 16% (220 of 1367 highly ranked scientific and biomedical journals) in 1997,33 33% (28 of 84 journals from 12 scientific disciplines) in 2007,34 89.7% (358 of 399 “high impact biomedical journals”) in 2011,35 and 96% (224 of 227 public and occupational health journals) in 2016.36 It comes as no surprise, then, that none of 47 trials on febrile neutropenia published from 1981 to 2000 contained a COI statement.37 (Additionally, only 29 reported that informed consent was obtained and only 22 reported approval of a research ethics committee.) To see how this compares with more recent literature, we reviewed a random sample of 100 research papers published in 2022 in the New England Journal of Medicine, the Journal of the American Medical Association, and The BMJ, finding that 100% of articles included COI statements (and likewise, statements on institutional review board approval and informed consent, when applicable) (see supplementary file on bmj.com).
A COI statement encourages transparency because inadequate disclosure carries consequences. Similarly, a “works cited statement” (box 2) might encourage diligence as it implies that references will be scrutinised, and mis-citation will be penalised. Assuming that most authors operate on good intent (striving for their work to be accurate) or self-interest (striving to avoid delays in publication), or both, the act of formal attestation might inspire greater diligence in crafting a reference list or double checking it for accuracy before submission. Even with inadequate disclosure, however, statements still offer important functionality. A 2014 phase 3 trial, for example, was publicly retracted when it was discovered that the authors lied in their COI statement.38 Here, the COI statement created a clear mismatch between the authors’ words and deeds. This alerted a vigilant reader, provided the journal with irrefutable justification for corrective action, and, ultimately, halted the dissemination of untrustworthy information. Similarly, a works cited statement found to have inadequate or inaccurate disclosure might raise a red flag and lead reviewers or readers to scrutinise a work more closely, potentially to catch other important flaws. If the author’s formal assurances for something as basic as a reference list cannot be taken at face value, what else might they have been mistaken or misleading about?
COI statements enable new inquiries into research bias. Landmark studies have shown, for example, a strong association between pharmaceutical industry funding and likelihood of reporting significant results.3940 Another study analysed 767 clinical trials and found a strong association between failure to disclose informed consent and poor methodological quality.41 A works cited statement, then, would enable researchers to ask similar questions about inaccurate citations, such as whether studies with an inaccurate works cited statement are associated with poor methodological quality or inflated results. Fact checking citation accuracy can be labour intensive, but there is a critical mass of people who do this sort of work already, reflected in the growing literature on mis-citation.1235689192021222324253032
So, declarations do seem to be beneficial. They are not a perfect, systemic failsafe, but they do provide authors, editors, peer reviewers, and journal readers with additional opportunities to promote essential quality control. Similar cases can be made for the required declaration of institutional review board approval, author contributions, informed consent, and emerging declarations of the role of AI. As thousands of erroneous citations now raise legitimate questions about inaccuracies in published research, there is a clear case for adopting similar safeguards in this arena as well. To make such a declaration meaningful, however, it must be enforced.42
As a final strategy, we propose that journals add two internal questions for reviewers: were any improper citations noted during the review of this paper? And did you recommend that the authors cite any studies in which you are a coauthor or otherwise have a vested interest? If the answer to either question is yes, reviewers must provide specific details, which are sent to the editor. We propose that a submitted study with erroneous citations should be withdrawn from consideration if the errors are pervasive, mischaracterise the background or methods, inappropriately shape interpretation of the results, or otherwise betray a serious lack of expected due diligence. For manuscripts with less egregious mis-citation, we recommend that reviewers and editors still adopt a low threshold to mandate revisions, as “there is no good reason to allow . . . inexact and non-verifiable referencing to pervade scientific literature.”6 We further propose that an editor should closely scrutinise situations in which a reviewer has recommended citation of their own works.20 If already published when the errors are caught, authors should be asked to amend the work (without additional fees29), which is eminently possible in the digital era. Critically, we extend investigators the benefit of the doubt: many inaccurate citations are simply the result of honest mistakes. Thus, this higher standard is primarily meant to deter carelessness and promote good practice. As is the case broadly in medicine, primary prevention is often the best policy.
Mis-citation has heretofore been inadequately tackled. By acknowledging the fault lines in current practice, prioritising the development of a rigorous and standardised classification scheme for inadequate citation, and codifying accurate literature review into a routine pre-submission declaration, we can strive to better enshrine integrity into medical scholarship. We encourage journals to consider adopting a pre-submission declaration. It has the potential to deter inappropriate manuscript submissions and facilitate correction after publication, with little added cost or inconvenience. Over time, the higher standard might also instil greater respect among the next generation of young investigators for this fundamental pillar of scientific inquiry. Most importantly, if it yields improvement in overall citation quality, this will help the scientific literature better function as an accurate web of knowledge.
We thank Vianna Quach, Dianne Wade, and Alexandra Alvarez for expert editorial feedback on early drafts of this work.
Works cited statement: The authors certify that all works cited were read in full by at least one author at the time of writing this manuscript; are necessary to support the intellectual foundation of the work; were added without undue coercion; and do not reflect inappropriate self-citation. We affirm that this document cites primary sources whenever possible, transparently discloses the source of all secondary information presented, and accurately represents both the objective findings and earnest spirit of all works cited.
Funding: This work received no funding or financial support of any kind.
Contributors and sources: This work was conceived by NP, who wrote the first draft and acts as the guarantor for this work. LLY and TO provided critical review that helped shape the key intellectual output of this work. NP holds an MSc from Duke University in the US and is currently a 4th year MD student at Baylor College of Medicine and a Schwarzman Scholar at Tsinghua University. LLY is head of non-communicable disease research and tenured faculty at Duke Kunshan University in China. She has published over 150 studies in global health, many involving complex multi-country randomised controlled trials. TO is a physician and epidemiologist. He has published over 690 peer reviewed studies and holds professorships at Duke University, Duke Kunshan University, and Duke-NUS University in Singapore. This work is the product of three experienced, professionally and culturally diverse researchers who disdain corner cutting in research and want to advocate for integrity and quality in the biomedical sciences.
Patient involvement: No patients were involved in the creation of this manuscript.
Competing interests: We have read and understood BMJ policy on declaration of interests and have no interests to declare.