References that anyone can edit: review of Wikipedia citations in peer reviewed health science literature
BMJ 2014; 348 doi: http://dx.doi.org/10.1136/bmj.g1585 (Published 06 March 2014) Cite this as: BMJ 2014;348:g1585
- M Dylan Bould, staff anesthesiologist¹,
- Emily S Hladkowicz, research assistant²,
- Ashlee-Ann E Pigford, research assistant²,
- Lee-Anne Ufholz, director³,
- Tatyana Postonogova, research associate⁴,
- Eunkyung Shin, research assistant⁵,
- Sylvain Boet, staff anesthesiologist⁶
- ¹ Department of Anesthesiology, Children’s Hospital of Eastern Ontario, University of Ottawa, 401 Smyth Road, Ottawa, ON, Canada, K1H 8L1
- ² Department of Anesthesiology, Ottawa Hospital Research Institute, Ottawa
- ³ Health Sciences Library, University of Ottawa, Ottawa
- ⁴ Allan Waters Family Simulation Centre, Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, ON, Canada
- ⁵ Department of Surgery, St Michael’s Hospital, University of Toronto, Toronto
- ⁶ Department of Anesthesiology, The Ottawa Hospital, University of Ottawa, Ottawa
- Correspondence to: M D Bould
- Accepted 10 February 2014
Objectives To examine indexed health science journals to evaluate the prevalence of Wikipedia citations, identify the journals that publish articles with Wikipedia citations, and determine how Wikipedia is being cited.
Design Bibliometric analysis.
Data sources Publications in the English language that included citations to Wikipedia were retrieved using the online databases Scopus and Web of Science.
Study selection To identify health science journals, results were refined using Ulrich’s database, selecting for citations from journals indexed in Medline, PubMed, or Embase. Using Thomson Reuters Journal Citation Reports, 2011 impact factors were collected for all journals included in the search.
Data extraction Resulting citations were thematically coded, and descriptive statistics were calculated.
Results 1433 full text articles from 1008 journals indexed in Medline, PubMed, or Embase with 2049 Wikipedia citations were accessed. The frequency of Wikipedia citations has increased over time; most citations occurred after December 2010. More than half of the citations were coded as definitions (n=648; 31.6%) or descriptions (n=482; 23.5%). Citations were not limited to journals with a low or no impact factor; the search found Wikipedia citations in many journals with high impact factors.
Conclusions Many publications are citing information from a tertiary source that can be edited by anyone, although permanent, evidence based sources are available. We encourage journal editors and reviewers to use caution when publishing articles that cite Wikipedia.
Launched on 15 January 2001, Wikipedia is self described as “a free, collaboratively edited, and multilingual internet encyclopedia supported by the non-profit Wikimedia Foundation.”1 As of 2012, Wikipedia was the largest online reference site,2 3 and it has been reported to be the most used online healthcare resource globally.4 However, the credibility of Wikipedia as a source of information has been debated since its origin.5 The impermanent nature of Wikipedia entries and concerns about their quality have been raised as important issues.6
The literature on Wikipedia has concentrated on evaluating content and ensuring that users have access to appropriate information.5 When assessed by academics, only 13% of Wikipedia articles had identifiable errors.7 Giles and colleagues found that the number of factual errors, omissions, or misleading statements in Wikipedia articles was comparable to that in the Encyclopaedia Britannica.8 In general, Wikipedia articles reference academic literature,9 10 and medical articles fall under WikiProject Medicine, which provides editorial oversight.2 4 Several studies have confirmed the accuracy of health specific Wikipedia articles and discussed their potential value in patient education.2 3 8 11 12 13 14 15 Wikipedia has offered an innovative way to provide free access to information for people around the world.2 4 5 To further improve the quality of these articles, medical professionals are being encouraged to contribute to Wikipedia.2 16 17 18 Studies have also examined the groups that use Wikipedia, noting that, in addition to patients and nursing students,2 3 10 19 medical students and residents commonly access Wikipedia to acquire health information.20 21 22 23 Although physicians have been discouraged from relying on Wikipedia,16 one study found that as many as 70% of junior physicians use it in practice.22 More generally, consulting Wikipedia is a growing trend among academics: in a survey of more than 1000 Nature authors, 17% reported consulting Wikipedia on a weekly basis.8 As the public and healthcare professionals increasingly accept Wikipedia as a source of information, new questions emerge, such as whether it is appropriate to cite Wikipedia in academic publications.
The controversy over citing Wikipedia as a reference source for academic information is threefold. Firstly, there is a theoretical concern that anyone with access to the internet can alter Wikipedia. This raises questions about the spread of unintentional misinformation, a concern fuelled by acts of intentional vandalism that have gained media attention and challenged the public’s perception of Wikipedia.24 25 26 However, the wiki model is designed to address the fear that the information on Wikipedia is not guaranteed to be correct. The concept of a wiki is based on the assumption that the majority is correct: if someone writes an erroneous statement on Wikipedia, the probability is considered high that another contributor, drawn from that presumably correct majority, will identify the error. This model has many advantages and allows Wikipedia to be a free source of a vast amount of collaboratively produced information.2 4 5 However, the scientific community still has concerns about the academic integrity of a source that, in theory, anyone can edit.
Secondly, the changing nature of Wikipedia makes permanent versions difficult to access. Although Wikipedia keeps a detailed history of previous versions of each page, current academic citation systems rarely require a citation to include access information time stamped to the second. With a paper encyclopaedia, readers can confidently find the permanent cited source; because Wikipedia pages change constantly, readers may find it difficult to confirm that the page they see is the exact version cited, particularly if no detailed time stamp is included.27
Thirdly, citing tertiary sources such as Wikipedia (resources that compile or provide digests of secondary sources) raises literary problems.28 Secondary sources are books, articles, or unpublished literature that interpret primary sources, which include original data, manuscripts, records, or documents. International guidelines state that authors should provide direct references to original research sources,29 so citing Wikipedia or any other tertiary source in the academic literature runs counter to accepted literary practice. To date, only one study has examined the frequency of citation of Wikipedia in the academic literature.30 To our knowledge, no one has examined the frequency of Wikipedia citations in the health science literature or the ways in which these citations are used. We aimed to evaluate the prevalence of Wikipedia citations in indexed health science journals, identify the health science journals that publish articles with Wikipedia citations, and determine how Wikipedia is being cited.
In February 2012 one investigator (LU) searched the ISI Web of Science and Scopus online databases to identify articles in the peer reviewed health science literature that had directly cited Wikipedia since its inception in 2001. On the same day, she searched the references of all articles in both databases for the word “Wikipedia” or any possible variant, to account for spelling errors. Because the word could appear anywhere in a reference, not all retrieved articles necessarily cited Wikipedia itself; some cited articles about Wikipedia. To focus on indexed health science literature, we refined the results by using Ulrich’s Database to identify articles from journals indexed in at least one of Medline, PubMed, or Embase. We retrieved full text for the articles identified in the search (including reviews, original research, editorials, letters to the editor, and case reports) through the University of Ottawa library and associated Canadian inter-library loan services. We excluded citations when articles were not written in English or when full text was unavailable. Although the citations originated in the health science literature, the topic of the cited page could extend beyond health, so we included all Wikipedia pages that were cited in the academic literature. When an article cited Wikipedia more than once, we included every citation, creating a separate entry for each.
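The paper does not list the exact search string used to catch misspelled forms of “Wikipedia”. As a hypothetical illustration only, a tolerant match over reference strings could flag any word within one character edit (Levenshtein distance) of “wikipedia”. The function names and the one-edit threshold below are assumptions, not the authors’ procedure; like the search described above, such a match would also flag references to articles about Wikipedia.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def mentions_wikipedia(reference: str, max_edits: int = 1) -> bool:
    """True if any word in the reference is within max_edits of 'wikipedia'.

    Hyphens are dropped first so that a split form such as 'Wiki-pedia' still matches.
    The one-edit threshold is an illustrative assumption.
    """
    words = re.findall(r"[a-z]+", reference.lower().replace("-", ""))
    return any(edit_distance(w, "wikipedia") <= max_edits for w in words)


# Hypothetical reference strings
refs = [
    "Wikipedia. Vitamin D. http://en.wikipedia.org/wiki/Vitamin_D",
    "Wikepedia contributors. Asthma.",
    "Smith J. Asthma in children. BMJ 2010.",
]
print([mentions_wikipedia(r) for r in refs])  # [True, True, False]
```

A fuzzy match of this kind trades precision for recall, which is why a manual check of full text, as the authors describe, would still be needed afterwards.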
We used Thomson Reuters Journal Citation Reports (JCR) to collect journals’ 2011 impact factors (a measure of how frequently the average article in a journal has been cited in a particular period).31 32 JCR impact factors provide quantitative evidence about a journal’s position relative to its competitors and offer an approximation of its prestige. Because the scores are relative, impact factors should be compared only among journals in the same subject category.32 JCR covers the world’s leading journals and offers a systematic, quantifiable means to evaluate them critically by measuring the influence and impact of research at the journal and category levels, as well as showing the relation between citing and cited journals. We did not collect other journal metrics because of the unique nature of JCR and its pre-eminence as a proxy for a journal’s importance within a field.
Data extraction/coding system
Using an iterative process, we developed a descriptive coding strategy, with three investigators (MDB, SB, ESH) reaching consensus on the distinctions between categories. After independently reviewing 15% of citations, the three investigators met as a team to discuss their coding strategies. They developed an initial coding guide and recoded citations with it. The team repeated this cycle of coding, meeting, and refining the guide until the three authors reached agreement and the coding strategy was finalised. Emergent sub-codes were integrated into the coding system as they arose. Using the final coding system, one investigator (ESH) systematically coded all Wikipedia citations. To assess coding reliability, another investigator (AEP) coded a random sample of 100 citations with the same coding system.
We used SPSS 17.0 for statistical analysis. We calculated descriptive statistics for the frequency of Wikipedia citations by thematic code and for the impact factors of journals citing Wikipedia (median, range), and tabulated these statistics by year. We used the intra-class correlation coefficient to assess inter-rater reliability.
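The paper does not state which form of intra-class correlation coefficient was computed in SPSS. As a sketch under that uncertainty, the code below assumes the one-way random effects, single rater form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), applied to numerically coded categories from two coders, alongside the median and range used for impact factors. The function names and toy values are hypothetical and are not study data.

```python
from statistics import median


def median_and_range(values):
    """Median and (min, max) range, as reported for journal impact factors."""
    return median(values), (min(values), max(values))


def icc_oneway(ratings):
    """One-way random effects, single rater ICC(1,1).

    ratings: one row per citation; each row holds the numeric code given by
    each of k raters. This model choice is an assumption; the study does not
    report which ICC form it used.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-citation mean square: spread of citation means around the grand mean
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-citation mean square: disagreement between raters on the same citation
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical example: three citations coded by two raters
print(icc_oneway([[1, 2], [3, 3], [5, 4]]))  # 25/29, about 0.862
print(median_and_range([0.5, 1.2, 2.0, 3.1, 14.0]))  # (2.0, (0.5, 14.0))
```

With perfect agreement the within-citation mean square is zero and the coefficient equals 1; a two-way model would instead separate systematic differences between raters.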
Our search recovered 2359 publications (2307 from Scopus and an additional 52 from Web of Science). After excluding duplicates, non-English journals, and articles that did not directly cite Wikipedia, we retrieved full text for 1433 articles from 1008 indexed journals, containing 2049 Wikipedia citations. Impact factors for 2011 were available in the JCR database for 1420 citations in 980 articles from 650 indexed journals.
Table 1 shows the frequency of Wikipedia citations since 2001, the nature of citations (thematic code), and related impact factor information (median and range). The intra-class correlation coefficient for the two coders was 0.91 (P<0.001), indicating a high degree of inter-rater reliability. More than half of the citations were coded as definitions (n=648; 31.6%) or descriptions (n=482; 23.5%). Of the 13 categories, we considered only two (citations about Wikipedia, and Wikipedia used in methods) to be the most appropriate uses of Wikipedia, as in these cases Wikipedia was the original source of information; these two categories accounted for 82 (4.0%) citations. Furthermore, we recovered 97 (4.8%) citations in which Wikipedia was cited in place of an original research study.
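The percentages above use all 2049 retrieved Wikipedia citations as the denominator. The short sketch below reproduces the rounded figures for the definition, description, and most appropriate categories from their counts; the dictionary labels are shorthand chosen for this illustration, not the study’s code names.

```python
TOTAL_CITATIONS = 2049  # all Wikipedia citations retrieved in the study

counts = {
    "definitions": 648,
    "descriptions": 482,
    "about Wikipedia or used in methods": 82,
}


def percent(n, total=TOTAL_CITATIONS):
    """Share of all citations, rounded to one decimal place as in the text."""
    return round(100 * n / total, 1)


for label, n in counts.items():
    print(f"{label}: n={n}; {percent(n)}%")

# "More than half" refers to definitions and descriptions combined
combined = counts["definitions"] + counts["descriptions"]
print(f"definitions + descriptions: {percent(combined)}%")
```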
The median impact factor of journals citing Wikipedia was 2.0 and has remained fairly consistent over time, whereas the total number of Wikipedia citations has increased each year since 2004, except between 2009 and 2010. The figure shows the number of articles that cite Wikipedia at least once by year up to 21 November 2013. Since the inception of Wikipedia, journals with high impact factors have consistently published references to Wikipedia. Table 2 lists the 25 journals with the highest impact factors that emerged in our study. These journals accounted for 2.2% (n=3) of cases in which Wikipedia was the most appropriate citation (as previously defined). Table 3 specifies the titles and types of articles published in these journals, including the number of Wikipedia citations in each article. Journals with available impact factors that cited Wikipedia more than 10 times included Explore: The Journal of Science and Healing (n=53), Child’s Nervous System (n=26), BMC Bioinformatics (n=21), Theoretical Biology and Medical Modelling (n=20), Journal of Medical Internet Research (n=17), Health Information and Libraries Journal (n=13), BMJ (n=13), JALA (Journal of the Association for Laboratory Automation) (n=12), and American Journal of Forensic Medicine and Pathology (n=11).
Our findings illustrate a relatively small but increasing frequency of citations of Wikipedia in indexed health sciences literature. We retrieved 2049 Wikipedia citations, in 1433 articles from 1008 journals indexed in Medline, PubMed, or Embase. Most of these citations occurred after December 2010 and were consistently found in journals with low impact factors and journals without impact factors, as well as in journals with high impact factors, including Nature, Science, and the BMJ. Using the descriptive coding strategy, we found a wide variety of uses for Wikipedia citations in the literature, the most common being definitions and descriptive statements.
Strengths and limitations
As the first study to describe the citation of Wikipedia (how, where, when, and so on) in indexed health science literature, our study adds insight into the role that Wikipedia may play in academic literature. We also provide information about the quality of the journals that cite Wikipedia, measured through impact factors from the Journal Citation Reports. Because we searched two databases (Scopus and Web of Science) and included only English language journals available in full text through Canadian university libraries, some citations may have been missed. However, these two comprehensive databases probably captured most citations from journals with far reaching influence and high impact factors. Although we report an increasing number of Wikipedia citations, the health science literature is vast; papers that cite Wikipedia therefore remain a small minority of all published papers, albeit a growing one.
Comparison with other studies
Active research on Wikipedia has examined the role of academic citations in Wikipedia, content validity, and interactions between Wikipedia and user groups.5 The only other study to examine Wikipedia in scholarly references was completed by Park in 2011.30 In this bibliometric analysis, Park also used the ISI Web of Science and Scopus databases to identify the total number of studies (n=1746), as well as leading authors, their institutional affiliations, most frequent publication sources, main academic fields, and other statistics on how often scholarly articles cite Wikipedia. Our study takes Park’s analysis further. Firstly, we used Ulrich’s database to restrict the findings to health sciences literature. Secondly, we examined how Wikipedia is being cited. Both studies show a general increase in the frequency of citations over time, suggesting that if Park’s data were extended beyond 2010 they would continue to show increasing Wikipedia citations. Whereas Park identified the publications that cited Wikipedia most often, our study also provides impact factors, adding information about the journals’ influence.
Contextualisation and policy implications
That more researchers are citing Wikipedia does not in itself make it a valid source of information for citation. Wikipedia itself has cautioned against being used as a source,33 and some universities have gone so far as to ban students from citing it.34 Recognising that learning modalities are changing and evolving increasingly towards online and e-resources,21 23 35 36 we believe that it is still important to ensure that peer reviewed academic literature aligns with the International Committee of Medical Journal Editors (ICMJE) guidelines. Relevant to this study, the ICMJE guidelines state that “Readers should therefore be provided with direct references to original research sources whenever possible.”29 Although some have argued that Wikipedia is comparable to an encyclopedia in terms of accuracy,8 37 this argument overlooks the literary problems associated with citing a tertiary source intended to direct researchers to an appropriate reference.34 We echo other researchers who believe that Wikipedia should not be cited when a more authoritative (that is, primary, permanent, peer reviewed, evidence based) source exists6; however, Wikipedia may be the most appropriate source to cite for a definition of Wikipedia and in situations in which Wikipedia is used as part of the scientific methods (for example, a search strategy). Outside of those rare instances, it is difficult to argue that citing Wikipedia in the academic literature is appropriate. Our “original research” category further illustrates how Wikipedia was often cited when the authors could have cited an original research study. For example, one article stated that “Some researchers propose that vitamin D supplementation may be beneficial in the treatment and prevention of some types of cancer,”38 and rather than citing the original sources provided on the Wikipedia page, the author cited Wikipedia instead.
Evidence based medicine requires that clinical decisions be made using professional expertise in conjunction with replicable evidence from systematic research.39 Because Wikipedia entries change constantly, by the time the work is published an entry may be difficult to access in the form originally cited, reducing the reader’s ability to reproduce the author’s process.27 In a study by Peoples,6 most citations to Wikipedia did not include the date and time at which the entry was visited. Although Wikipedia has a history function that allows users to access previous versions of a page, this function may not be obvious to most users, and for citations that do not give the date and time of the visit, determining which version was accessed can be difficult. If Wikipedia is to be cited in future, both the date and the time of access should be included so that later researchers can access the entry as it was originally viewed. More broadly, peer reviewers should pay close attention to the references in an article before recommending it for publication, to avoid approving information that contradicts evidence based research.
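One way to act on this recommendation is to cite a specific revision rather than the live page. The sketch below assumes the English Wikipedia permanent link form https://en.wikipedia.org/w/index.php?title=<page>&oldid=<revision>, where the revision number is taken from the page history or the “Permanent link” tool. The reference layout, the function name wikipedia_citation, and the revision number in the example are hypothetical and do not represent a format required by any journal.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def wikipedia_citation(title: str, revision_id: int, accessed: datetime) -> str:
    """Build a reference pinned to one Wikipedia revision, with access time to the second in UTC.

    The layout of the reference string is a hypothetical example, not a journal style.
    """
    if accessed.tzinfo is None:
        raise ValueError("accessed must be timezone-aware so the UTC time is unambiguous")
    # Page titles use underscores in URLs; oldid selects one fixed revision of the page
    query = urlencode({"title": title.replace(" ", "_"), "oldid": revision_id})
    url = f"https://en.wikipedia.org/w/index.php?{query}"
    stamp = accessed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"Wikipedia contributors. {title}. Wikipedia, the free encyclopedia. {url} (accessed {stamp})."


# Hypothetical revision number and access time, for illustration only
ref = wikipedia_citation("Vitamin D", 123456789,
                         datetime(2013, 11, 21, 14, 5, 9, tzinfo=timezone.utc))
print(ref)
```

Pinning the revision number makes the cited text retrievable even if the page is later edited, while the time stamp records when the author viewed it.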
Although the proportion of retrieved Wikipedia citations in the indexed health science literature was relatively small, the increasing trend is still important to note. The study of how Wikipedia is used in the academic literature is relatively young, and beginning an open discussion about the appropriateness of non-permanent Wikipedia citations in the academic literature seems timely. The relationship between academic publication and Wikipedia remains largely understudied, and international guidelines such as those of the ICMJE, World Association of Medical Editors, and Council of Science Editors lack editorial guidance on the subject. Although our study does not provide evidence of harm from these citations, even those that substitute for a permanent primary research source, we emphasise the need for a consistent voice on how Wikipedia should be used in the academic literature. This study raises the broader question of “what is an appropriate reference source?” Is it something that is generated by the scientific method, is replicable, and undergoes rigorous peer review, or is it something that is collaboratively generated, readily open to editing, and broadly accepted? We call attention to the need to work with the Wikipedia community to establish guidelines not only for reviewers and editors but also for academics drafting publications.
An increasing number of peer reviewed academic papers in the health sciences are citing Wikipedia. This apparent increase may suggest a lack of understanding by authors, reviewers, or editors of the mechanisms by which Wikipedia evolves. Although only a very small proportion of citations are to Wikipedia pages, the potential for misinformation to spread from an unverified source is at odds with the principles of robust scientific methodology and could affect patient care. We caution against this trend and suggest that editors and reviewers insist on citing primary sources of information where possible.
What is already known on this topic
The use of Wikipedia as a source of academic information has been debated since its origin, but it is increasingly cited in peer reviewed health science literature
Although studies have examined the content and validity of Wikipedia, no evidence has been published about how this resource is being used in health science literature
What this paper adds
Although a few instances exist that may warrant using Wikipedia as a reference, Wikipedia is often cited when permanent, evidence based sources are available
Authors, reviewers, and editors should use caution when publishing articles that include Wikipedia citations
Contributors: MDB and SB were involved in the conception and design of the study, the analysis of the data, and critical review of the manuscript. ESH was involved in the conception and design of the study, the analysis and interpretation of data, and critical review of the manuscript. AEP was involved in the analysis and interpretation of data and drafted the manuscript. LU was involved in the conception and design of the study, interpretation of the data, and critical review of the manuscript. TP and ES were involved in collection and analysis of data and critically revised the manuscript. All authors had full access to all data and reviewed the final version to be published. MDB and SB are the guarantors.
Funding: The study was funded by the Children’s Hospital of Eastern Ontario Department of Anesthesiology, University of Ottawa, and the Ottawa Hospital Department of Anesthesiology, University of Ottawa. The funding sources had no role in the study design; the collection, analysis, and interpretation of data; or the writing of the article and the decision to submit it for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not needed. All the data presented are in the public domain. Citation information and full text articles were acquired through the ISI Web of Science and Scopus databases. As the accepted practice is not to seek consent to report on information in the public domain, we did not do so for this study, nor did we seek approval from our research ethics board.
Transparency: The lead author (MDB) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Data sharing: No additional data available.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.