Nominal ISOMERs (Incorrect Spellings Of Medicines Eluding Researchers)—variants in the spellings of drug names in PubMed: a database review
BMJ 2016; 355 doi: https://doi.org/10.1136/bmj.i4854 (Published 14 December 2016) Cite this as: BMJ 2016;355:i4854
- Robin E Ferner, director and honorary professor of clinical pharmacology1 2,
- Jeffrey K Aronson, honorary consultant physician and clinical pharmacologist3
- 1West Midlands Centre for Adverse Drug Reactions, City Hospital, Birmingham B18 7QH, UK
- 2Institute of Clinical Science, University of Birmingham, Birmingham, UK
- 3Centre for Evidence Based Medicine, Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Correspondence to: R E Ferner
- Accepted 18 July 2016
Objective To examine how misspellings of drug names could impede searches for published literature.
Design Database review.
Data source PubMed.
Review methods The study included 30 drug names that are commonly misspelt on prescription charts in hospitals in Birmingham, UK (test set), and 30 control names randomly chosen from a hospital formulary (control set). The following definitions were used: standard names—the international non-proprietary names, variant names—deviations in spelling from standard names that are not themselves standard names in English language nomenclature, and hidden reference variants—variant spellings that identified publications in textword (tw) searches of PubMed or other databases, and which were not identified by textword searches for the standard names. Variant names were generated from standard names by applying letter substitutions, omissions, additions, transpositions, duplications, deduplications, and combinations of these. Searches were carried out in PubMed (30 June 2016) for “standard name[tw]” and “variant name[tw] NOT standard name[tw].”
Results The 30 standard names of drugs in the test set gave 325 979 hits in total, and 160 hidden reference variants gave 3872 hits (1.19%). The standard names of the control set gave 470 064 hits, and 79 hidden reference variants gave 766 hits (0.16%). Letter substitutions (particularly i to y and vice versa) and omissions together accounted for 2924 (74%) of the variants. Amitriptyline (8530 hits) yielded 18 hidden reference variants (179 (2.1%) hits). Names ending in “in,” “ine,” or “micin” were commonly misspelt. Failing to search for hidden reference variants of “gentamicin,” “amitriptyline,” “mirtazapine,” and “trazodone” would miss at least 19 systematic reviews. A hidden reference variant related to Christmas, “No-el,” was rare; variants of “X-miss” were rarer.
Conclusion When performing searches, researchers should include misspellings of drug names among their search terms.
Variant spellings of drug names can cause confusion, which could lead to serious harm.1 2 Nevertheless, these names are expected to be correctly spelled and indexed in published work. We have tested this assumption, which underlies many search strategies for systematic reviews and meta-analyses of therapeutic interventions.
We defined the following types of drug names:
Standard name: the international non-proprietary name (INN)3 or (if there was no INN) the British Approved Name (BAN; box 1).
Variant name: any deviation in spelling from the standard name that was not itself a standard name in English language nomenclatures, such as BANs or US Adopted Names (USANs). For example, we did not regard thimerosal (USAN) as a transpositional variant of thiomersal (INN), although many papers would be missed by not searching for both.
Hidden reference variant: a variant spelling that, when used as a textword search term in PubMed and other databases, identified publications that were not identified by searching for the standard name as a textword.
Box 1: Some national drug naming systems
A panel of international nomenclature experts assigns recommended international non-proprietary names (rINNs) to drugs, under the aegis of the World Health Organization.
Occasionally, an objection is raised to a name. If agreement cannot be reached, the name remains a proposed INN (pINN). Nearly 5% of all INNs are pINNs. For example, amantadine was proposed in 1965, but it has not become a rINN because an objection remains on file.
The best known national drug naming systems are the British Approved Name (BAN), dénomination commune française (DCF), Japanese Accepted Name for pharmaceuticals (JAN), and US Adopted Name (USAN).
The UK uses the INN as the BAN, except for adrenaline and noradrenaline (INNs epinephrine and norepinephrine). That is not the case elsewhere. For example, compare paracetamol (INN) and acetaminophen (USAN); salbutamol (INN) and albuterol (USAN); rifampicin (INN) and rifampin (USAN); glibenclamide (INN) and glyburide (USAN).
Some compounds that do not have INNs can still have a BAN, using a chemical name—for example, acetylsalicylic acid and glyceryl trinitrate.
Mixtures of drugs do not have INNs. In some cases, BANs have been specially created for such mixtures (eg, co-codamol is the BAN for a mixture of codeine and paracetamol).
Senior pharmacists from hospitals in Birmingham, UK provided 30 examples of drug names that were commonly misspelt on hospital prescription charts. We then chose a control set of 30 drugs at random from the Sandwell and West Birmingham Hospitals NHS Trust formulary. We ran a search in PubMed4 on 30 June 2016 for textword instances of the standard name of each drug and for spelling variants created by the following types of changes:
Substitutions (eg, i to y and vice versa; one unaccented vowel to another vowel or y; soft c to s and vice versa; hard c to k and vice versa; ch to k; f to ph and vice versa; ide to ine and vice versa; m to n and vice versa; th to t; x to ks)
Omissions (eg, prednisolone to pednisolone; propranolol to popranolol or propanolol; omission of final e)
Additions (eg, cotrimoxazole/clotrimazole to clotrimoxazole; addition of final e)
Transpositions (eg, furosemide to fruosemide; filgrastim to filgastrim)
Duplications and deduplications (eg, l to ll and vice versa; n to nn and vice versa)
Combination of changes (eg, gentamicin to gentamycine; amitriptyline to amytriptilin).
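As a rough illustration of the variant types listed above, the following Python sketch generates spellings that are one change away from a standard name. It is not the procedure we used: the function name `single_edit_variants`, the short substitution list, and the restriction to adjacent transpositions are all assumptions made for the example.

```python
from typing import Set

# Illustrative subset of the substitutions listed above (an assumption,
# not the full rule set used in the study).
SUBSTITUTIONS = [("i", "y"), ("y", "i"), ("ph", "f"), ("f", "ph"),
                 ("th", "t"), ("c", "k"), ("k", "c")]
# Letters treated as candidates for duplication (l to ll, n to nn, and so on).
DOUBLABLE = "lmnst"


def single_edit_variants(name: str) -> Set[str]:
    """Return spellings one change away from `name` (hypothetical helper)."""
    variants: Set[str] = set()
    # Substitutions: replace one occurrence at a time.
    for old, new in SUBSTITUTIONS:
        start = name.find(old)
        while start != -1:
            variants.add(name[:start] + new + name[start + len(old):])
            start = name.find(old, start + 1)
    for i in range(len(name)):
        # Omissions (this also covers deduplication of a doubled letter).
        variants.add(name[:i] + name[i + 1:])
        # Duplications of selected consonants.
        if name[i] in DOUBLABLE:
            variants.add(name[:i] + name[i] + name[i:])
    # Transpositions of adjacent letters only.
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Addition or omission of a final e.
    variants.add(name[:-1] if name.endswith("e") else name + "e")
    variants.discard(name)
    return variants
```

Combinations of changes, such as gentamicin to gentamycine, can be approximated by applying the function again to each of its own outputs, at the cost of a rapidly growing candidate list.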
We searched for “standard name[tw]” (where tw=textword) and noted the number of hits. We then searched for “variant spelling[tw] NOT standard name[tw]” and added together the number of hits for each name over all its variant spellings. We thus determined the number of hits that would have been missed by searching only for the standard name. We checked whether the retrieved references were systematic reviews, including meta-analyses.
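The two query forms described above can be assembled mechanically. The sketch below only builds the query strings and sums hit counts that have been collected separately; the function names are hypothetical, and no call to PubMed is made.

```python
from typing import Dict, Iterable, List


def standard_query(standard: str) -> str:
    """Textword query for the standard name."""
    return f"{standard}[tw]"


def hidden_variant_query(variant: str, standard: str) -> str:
    """Textword query for records that use the variant but not the standard name."""
    return f"{variant}[tw] NOT {standard}[tw]"


def build_queries(standard: str, variants: Iterable[str]) -> List[str]:
    """Standard-name query first, then one hidden-variant query per variant."""
    queries = [standard_query(standard)]
    queries.extend(hidden_variant_query(v, standard) for v in variants)
    return queries


def total_hidden_hits(hits_by_variant: Dict[str, int]) -> int:
    """Sum the hits over all variant spellings of one standard name."""
    return sum(hits_by_variant.values())
```

For example, `hidden_variant_query("gentamycin", "gentamicin")` returns the string `gentamycin[tw] NOT gentamicin[tw]`.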
Numbers of hits in PubMed after use of standard names and hidden reference variant spellings
Standard names of the test set of 30 drugs gave 325 979 hits; 160 hidden reference variants produced 3872 hits (1.19%; range 0-2068; median 49). Standard names of the control set gave 470 064 hits. Of 208 possible hidden reference variants, we found 79, which gave 766 hits (0.16%; range 0-115; median 16). Amitriptyline (8530 hits) had 18 hidden reference variant spellings (179 hits; 2.06%), the most variant names for a single standard name (tables 1 and 2).
Types of variant
Table 3 shows frequencies of the different types of spelling variants.
We examined names ending in “micin” in detail. Most of the errors involved gentamicin: the standard form “gentamicin” gave 21 384 hits and the variant “gentamycin” gave 1977 hits (9.25%). The ending “mycin” was also often substituted in fidaxomicin (2.02%) and netilmicin (2.46%; table 4). In contrast, in 19 standard drug names ending in “mycin” (218 415 hits), the hidden reference variant “micin” was rare (157 hits (0.07%); table 5).
Names ending in “in” or “ine” were also likely to generate hidden spelling variants by addition or omission of the final “e.” The 28 standard names of this type in the test and control sets combined yielded 296 973 hits and hidden spelling variants yielded 3450 hits (1.16%), compared with 499 070 hits and 1188 hits (0.24%), respectively, for the other 32 names.
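The percentages in this paragraph appear to be the hits for hidden variants divided by the hits for the standard names. The short sketch below makes that assumption explicit; `hidden_share` is a hypothetical helper, and the figures in the comments are those quoted in the text.

```python
def hidden_share(hidden_hits: int, standard_hits: int) -> float:
    """Hidden-variant hits as a percentage of standard-name hits.

    Assumption: the denominator is the number of hits for the standard
    names, not the combined total of standard and hidden hits.
    """
    if standard_hits <= 0:
        raise ValueError("standard_hits must be positive")
    return 100.0 * hidden_hits / standard_hits


# Names ending in "in" or "ine": 3450 hidden hits against 296 973 standard hits.
in_ine = round(hidden_share(3450, 296_973), 2)
# The other 32 names: 1188 hidden hits against 499 070 standard hits.
others = round(hidden_share(1188, 499_070), 2)
```

Under this assumption, `in_ine` and `others` evaluate to 1.16 and 0.24, matching the values reported above.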
Searches for systematic reviews
We found 87 systematic reviews or meta-analyses that mentioned the standard name gentamicin, 0.41% of all hits for “gentamicin[tw].” We found six further systematic reviews (6.5% of the total) in PubMed after searching for hidden reference variants of gentamicin. In Medline, the equivalent search for “gentamicin.af.” (where af=all fields) gave 19 782 hits, of which 141 (0.71%) were systematic reviews. The hidden reference variants, with 863 hits, identified 15 additional systematic reviews (9.6% of the total).
Similarly, for amitriptyline, we found 192 systematic reviews in PubMed and another five as hidden reference variants. Corresponding numbers were 110 and six for mirtazapine and 47 and two for trazodone. Thus, for these four drugs, 19 systematic reviews of 455 (4.2%) would have been missed by searching for the standard spellings only.
A variant index score
We scored various features of standard names as follows:
Number of letters
Number of syllables
Number of unaccented vowels + 1
Number of incidences of i and y + 1
Number of incidences of f or ph + 1
Number of potential duplications or deduplications (l, m, n, s, t) + 1
Standard names ending in “in” or “ine” (no=1, yes=2)
Standard names ending in “micin” (no=1, yes=2).
The product of these factors, a variant index score, was on average much higher in the test group (range 54-4480; median 524) than the control group (range 36-1440; median 272).
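The score can be sketched in Python as a product of the eight factors listed above. Several details are assumptions made for the example rather than our scoring rules: syllables and unaccented vowels depend on pronunciation and are therefore supplied by the caller, and each run of l, m, n, s, or t is counted as one potential duplication or deduplication site. Scores from this sketch should not be expected to reproduce the ranges reported here.

```python
import re


def variant_index_score(name: str, syllables: int, unaccented_vowels: int) -> int:
    """Product of the eight listed factors (hypothetical implementation).

    `syllables` and `unaccented_vowels` are passed in because they depend
    on how the name is pronounced.
    """
    name = name.lower()
    letters = len(name)
    i_or_y = name.count("i") + name.count("y") + 1
    f_or_ph = name.count("f") + name.count("ph") + 1
    # Assumption: one site per run of l, m, n, s, or t ("ll" counts once).
    doubling_sites = sum(1 for _ in re.finditer(r"([lmnst])\1*", name)) + 1
    ends_in_or_ine = 2 if name.endswith(("in", "ine")) else 1
    ends_micin = 2 if name.endswith("micin") else 1
    return (letters * syllables * (unaccented_vowels + 1) * i_or_y
            * f_or_ph * doubling_sites * ends_in_or_ine * ends_micin)
```

Because the factors are multiplied, a name that combines several risk features (for example, many i or y letters together with an “ine” ending) scores disproportionately higher than a name with only one of them.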
We have uncovered a potential indirect harm from incorrect variant spellings of drug names that has not previously been investigated, to our knowledge, although others have reported misspelt general medical textwords in Medline8 and misspellings of the word “random” and its derivatives in Medline and EMBASE.9 Difficulties in recognising and distinguishing drug names can lead to clinical harm directly, for example, when one drug name is read as another. Here, we demonstrate the extent to which medical literature searches can be frustrated by textword searches that fail to include variant spellings, since articles referenced only by the variant spelling will remain hidden. PubMed offers the correct spelling (eg, gentamicin) when you enter an incorrect one (eg, gentamycin), but not the other way round—searching for “gentamicin[tw]” does not yield incorrect spellings.
Information in systematic reviews can be lost if the review is indexed under a hidden reference variant and not under the textword for the standard name. The problem is not limited to PubMed. In Medline, 13 systematic reviews were hidden under the variant spelling “gentamycin.” In the Cochrane Database of Systematic Reviews,5 there were 15 systematic reviews of “gentamicin,” but use of the term “gentamycin” identified four otherwise hidden reviews.
The most obvious way to mitigate this problem is for authors and editors to take care over the correct spellings of drug names. Indexing could be improved, especially by ensuring that standard names are always used when it is possible to identify them. However, even with scrupulous indexing, orthographic variants will pose challenges, because one cannot expect indexers to seek out all variant spellings in a paper for inclusion under a MeSH term heading. Researchers could also search for all likely variants as textwords, although this would pose challenges for names with many potential variants. For example, 18 variants of amitriptyline returned 179 hits that would have been hidden using only the standard name.
Another solution is to use wild cards, if available. Medline allows users to search for words that are spelt with alternative letters. For example, a search for “amitriptyline.af.” yields 8092 hits. Searching for “am#tr#pt#line.af.” uncovers all variant spellings with i to y substitutions (and vice versa) in amitriptyline (table 2), revealing 123 hidden references. The textword “am#tr#pt#l*.af.” truncated at the letter l uncovers variants of the last few letters (for example, ending in “lin,” “line,” “llin,” “lline”) without sacrificing specificity, and gives further hits. However, this does not exhaust all variant forms. For example, the hidden reference variant amitiptyline, generated by omission, was missed.
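The behaviour of these wild card patterns can be mimicked with regular expressions. The sketch below assumes that “#” stands for exactly one character and that “*” stands for any run of trailing characters, as in the examples above; `wildcard_matches` is a hypothetical helper and does not query Medline.

```python
import re


def wildcard_matches(pattern: str, word: str) -> bool:
    """Test whether `word` matches a wild card pattern (hypothetical helper).

    Assumptions: "#" matches exactly one character and "*" matches zero or
    more characters; everything else must match literally.
    """
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), word) is not None


candidates = ["amitriptyline", "amytriptiline", "amitriptylin",
              "amitriptylline", "amitiptyline"]
fixed_ending = [w for w in candidates if wildcard_matches("am#tr#pt#line", w)]
truncated = [w for w in candidates if wildcard_matches("am#tr#pt#l*", w)]
```

In this sketch, `fixed_ending` contains only the first two candidates, `truncated` adds “amitriptylin” and “amitriptylline,” and the omission variant “amitiptyline” matches neither pattern.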
The variant index score that we have calculated from eight important features associated with hidden reference variants affords insight into the likelihood that newly coined names might prove problematic. Combining the index score with Trigram-2b or the Levenshtein distance, which measure how likely names are to be confused,6 7 could help reduce problems with new names.
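For readers unfamiliar with the Levenshtein distance, it is the smallest number of single letter insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch of the standard dynamic programming calculation; it does not cover Trigram-2b, and it is not taken from the cited work.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    # previous[j] holds the distance between the first i-1 letters of a
    # and the first j letters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

By this measure, gentamicin and gentamycin are one edit apart, whereas a transposition such as furosemide to fruosemide counts as two edits, because the plain Levenshtein distance has no swap operation.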
It has been suggested that all relevant spelling variants should be included in search strategies. However, this recommendation did not refer to incorrectly spelt variants as opposed to variants in standard spelling, such as those between US and UK English (eg, anemia and anaemia), and did not mention drug names.10
Although we systematically generated variants of standard names of drugs (as described in the methods), we could have missed some variants and underestimated the frequencies of hidden reference variants. In the Xmas spirit, we offer table 6, illustrating other variant spellings.
What is already known on this topic
Spelling errors are not uncommon in databases such as PubMed and Medline
Drug names are frequently misspelt in these databases and in hospital prescription charts
What this study adds
Database searches using only drug names spelt correctly will miss relevant references in which the names are spelt incorrectly
These references, which include systematic reviews, will remain hidden unless searches are also undertaken using possible misspellings
Authors and editors should be more vigilant about spelling drug names correctly, and indexers of databases such as PubMed should cross index incorrect spelling variants to correctly spelt names in both directions
When performing searches involving drug names, researchers should include incorrect spellings among their search terms
Note added in proof: Both authors found it hard to proofread an article intended to contain many variant spellings. We apologise if, inadvertently, we have sometimes spelt drug names correctly.
Contributors: The authors contributed equally to the genesis, analysis, and interpretation of the data and the writing of the paper. REF is the guarantor.
Funding: There was no specific funding for this study.
Competing interests: Both authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: REF and JKA have had their names misspelt from time to time; both have occasionally undertaken research and medicolegal work that has entailed literature searches; both have published articles on nomenclature and definitions. JKA is chairman of the expert advisory group on nomenclature of the British Pharmacopoeia Commission; however, the opinions expressed in this article do not necessarily represent those of that organisation or other members of that group. They declare no other interests.
Ethical approval: Approval was not required.
Data sharing: No additional data available.
The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.