Why the impact factor of journals should not be used for evaluating research
BMJ 1997; 314 doi: http://dx.doi.org/10.1136/bmj.314.7079.497 (Published 15 February 1997) Cite this as: BMJ 1997;314:497
- Per O Seglen, professor
- Institute for Studies in Research and Higher Education (NIFU), Hegdehaugsveien 31, N-0352 Oslo, Norway
- Accepted 9 January 1997
Evaluating scientific quality is a notoriously difficult problem which has no standard solution. Ideally, published scientific results should be scrutinised by true experts in the field and given scores for quality and quantity according to established rules. In practice, however, what is called peer review is usually performed by committees with general competence rather than with the specialist's insight that is needed to assess primary research data. Committees tend, therefore, to resort to secondary criteria like crude publication counts, journal prestige, the reputation of authors and institutions, and estimated importance and relevance of the research field,1 making peer review as much of a lottery as of a rational process.2 3
On this background, it is hardly surprising that alternative methods for evaluating research are being sought, such as citation rates and journal impact factors, which seem to be quantitative and objective indicators directly related to published science. The citation data are obtained from a database produced by the Institute for Scientific Information (ISI) in Philadelphia, which continuously records scientific citations as represented by the reference lists of articles from a large number of the world's scientific journals. The references are rearranged in the database to show how many times each publication has been cited within a certain period, and by whom, and the results are published as the Science Citation Index (SCI). On the basis of the Science Citation Index and authors' publication lists, the annual citation rate of papers by a scientific author or research group can thus be calculated. Similarly, the citation rate of a scientific journal—known as the journal impact factor—can be calculated as the mean citation rate of all the articles contained in the journal.4 Journal impact factors, which are published annually in SCI Journal Citation Reports, are widely regarded as a quality ranking for journals and used extensively by leading journals in their advertising.
Use of journal impact factors conceals the difference in article citation rates (articles in the most cited half of articles in a journal are cited 10 times as often as the least cited half)
Journals' impact factors are determined by technicalities unrelated to the scientific quality of their articles
Journal impact factors depend on the research field: high impact factors are likely in journals covering large areas of basic research with a rapidly expanding but short lived literature that use many references per article
Article citation rates determine the journal impact factor, not vice versa
Since journal impact factors are so readily available, it has been tempting to use them for evaluating individual scientists or research groups. On the assumption that the journal is representative of its articles, the journal impact factors of an author's articles can simply be added up to obtain an apparently objective and quantitative measure of the author's scientific achievement. In Italy, the use of journal impact factors was recently advocated to remedy the purported subjectivity and bias in appointments to higher academic positions.5 In the Nordic countries, journal impact factors have, on occasion, been used in the evaluation of individuals as well as of institutions and have been proposed, or actually used, as one of the premises for allocation of university resources and positions.1 6 7 Resource allocation based on impact factors has also been reported from Canada8 and Hungary9 and, colloquially, from several other countries. The increasing awareness of journal impact factors, and the possibility of their use in evaluation, is already changing scientists' publication behaviour towards publishing in journals with maximum impact,9 10 often at the expense of specialist journals that might actually be more appropriate vehicles for the research in question.
Given the increasing use of journal impact factors—as well as the (less explicit) use of journal prestige—in research evaluation, a critical examination of this indicator seems necessary (see box).
Problems associated with the use of journal impact factors
Journal impact factors are not statistically representative of individual journal articles
Journal impact factors correlate poorly with actual citations of individual articles
Authors use many criteria other than impact when submitting to journals
Citations to “non-citable” items are erroneously included in the database
Self citations are not corrected for
Review articles are heavily cited and inflate the impact factor of journals
Long articles collect many citations and give high journal impact factors
Short publication lag allows many short term journal self citations and gives a high journal impact factor
Citations in the national language of the journal are preferred by the journal's authors
Selective journal self citation: articles tend to preferentially cite other articles in the same journal
Coverage of the database is not complete
Books are not included in the database as a source for citations
Database has an English language bias
Database is dominated by American publications
Journal set in database may vary from year to year
Impact factor is a function of the number of references per article in the research field
Research fields with literature that rapidly becomes obsolete are favoured
Impact factor depends on dynamics (expansion or contraction) of the research field
Small research fields tend to lack journals with high impact
Relations between fields (clinical v basic research, for example) strongly determine the journal impact factor
Citation rate of article determines journal impact, but not vice versa
Is the journal impact factor really representative of the individual journal articles?
Relation of journal impact factor and citation rate of article
For the journal's impact factor to be reasonably representative of its articles, the citation rate of individual articles in the journal should show a narrow distribution, preferably a Gaussian distribution, around the mean value (the journal's impact factor). Figure 1 shows that this is far from being the case: three different biochemical journals all showed skewed distributions of articles' citation rates, with only a few articles anywhere near the population mean.11
The uneven contribution of the various articles to the journal impact is further illustrated in figure 2: the cumulative curve shows that the most cited 15% of the articles account for 50% of the citations, and the most cited 50% of the articles account for 90% of the citations. In other words, the most cited half of the articles are cited, on average, 10 times as often as the least cited half. Assigning the same score (the journal impact factor) to all articles masks this tremendous difference, which is the exact opposite of what an evaluation is meant to achieve. Even the uncited articles are then given full credit for the impact of the few highly cited articles that predominantly determine the value of the journal impact factor.
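The calculation behind a cumulative curve of this kind can be sketched as follows. The citation counts are invented for illustration (they are not the data behind the figure), and `top_share` is a hypothetical helper name.

```python
def top_share(citations, fraction):
    """Share of all citations received by the most cited `fraction`
    of articles (0 < fraction <= 1)."""
    ranked = sorted(citations, reverse=True)
    k = round(len(ranked) * fraction)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical, deliberately skewed citation counts for 20 articles.
counts = [40, 28, 22, 18, 14, 12, 10, 8, 6, 4,
          3, 3, 2, 2, 2, 2, 2, 1, 1, 0]

# The mean plays the role of the journal impact factor here.
mean_rate = sum(counts) / len(counts)

share_top_15 = top_share(counts, 0.15)    # most cited 15% of articles
share_top_half = top_share(counts, 0.50)  # most cited 50% of articles

# Average citation rate of the top half relative to the bottom half.
ranked = sorted(counts, reverse=True)
half = len(ranked) // 2
ratio_top_to_bottom = sum(ranked[:half]) / sum(ranked[half:])

# How many articles actually reach the mean?
at_or_above_mean = sum(1 for c in counts if c >= mean_rate)

print(f"mean citation rate: {mean_rate:.1f}")
print(f"top 15% share: {share_top_15:.0%}, top 50% share: {share_top_half:.0%}")
print(f"top half vs bottom half: {ratio_top_to_bottom:.1f}x")
print(f"articles at or above the mean: {at_or_above_mean} of {len(counts)}")
```

In this invented sample a 90% share for the top half corresponds to a ratio of 9 to 1 between the two halves, which is the order of magnitude the text describes, and only a minority of articles reach the mean.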
Since any large, random sample of journal articles will correlate well with the corresponding average of journal impact factors,12 the impact factors may seem reasonably representative after all. However, the correlation between journal impact and actual citation rate of articles from individual scientists or research groups is often poor9 12 (fig 3). Clearly, scientific authors do not necessarily publish their most citable work in journals of the highest impact, nor do their articles necessarily match the impact of the journals they appear in. Although some authors may take journals' impact factors into consideration when submitting an article, other factors are (or at least were) equally or more important, such as the journal's subject area and its relevance to the author's specialty, the fairness and rapidity of the editorial process, the probability of acceptance, publication lag, and publication cost (page charges).13
Journal impact factors are representative only when the evaluated research is absolutely average (relative to the journals used), a premise which really makes any evaluation superfluous. In actual practice, however, even samples as large as a nation's scientific output are far from being random and representative of the journals they have been published in: for example, during the period 1989-93, articles on general medicine in Turkey would have had an expected citation rate of 1.3 (relative to the world average) on the basis of journal impact, but the actual citation was only 0.3.14 The use of journal impact factors can therefore be as misleading for countries as for individuals.
Journal impact factors are calculated in a way that causes bias
Apart from being non-representative, the journal impact factor is encumbered with several shortcomings of a technical and more fundamental nature. The factor is generally defined as the recorded number of citations within a certain year (for example, 1996) to the items published in the journal during the two preceding years (1995 and 1994), divided by the number of such items (this would be the equivalent of the average citation rate of an item during the first and second calendar year after the year of publication). However, the Science Citation Index database includes only normal articles, notes, and reviews in the denominator as citable items, but records citations to all types of documents (editorials, letters, meeting abstracts, etc) in the numerator; citations to translated journal versions are even listed twice.15 16 17 Because of this flawed computation, a journal that includes meeting reports, interesting editorials, and a lively correspondence section can have its impact factor greatly inflated relative to journals that lack such items. Editors who want to raise the impact of their journals should make frequent reference to their previous editorials, since the database makes no correction for self citations. The inclusion of review articles, which generally receive many more citations than ordinary articles,17 18 is also recommended. Furthermore, because citation rate is roughly proportional to the length of the article,19 journals might wish to publish long, rather than short, articles. If correction were made for article length, “communications” journals like Biochemical and Biophysical Research Communications and FEBS Letters would get impact factors as high as, or higher than, the high impact journals within the field, like Journal of Biological Chemistry.20 21
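The effect of the mismatch between numerator and denominator can be illustrated with a minimal sketch. All numbers are invented, and `ItemType` and `impact_factor` are hypothetical names; this follows the definition given in the text and is not the Institute for Scientific Information's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class ItemType:
    """One category of published item (hypothetical numbers)."""
    name: str
    published: int   # items published in the two preceding years
    citations: int   # citations to those items in the census year
    citable: bool    # counted in the denominator?


def impact_factor(items, matched=False):
    """Citations in one year to items from the two preceding years,
    divided by the number of citable items.

    matched=False mimics the computation described in the text:
    citations to every item type go into the numerator, but only
    "citable" items are counted in the denominator.
    matched=True restricts the numerator to citable items as well.
    """
    denominator = sum(i.published for i in items if i.citable)
    numerator = sum(i.citations for i in items if i.citable or not matched)
    return numerator / denominator


journal = [
    ItemType("articles", published=200, citations=400, citable=True),
    ItemType("reviews", published=20, citations=150, citable=True),
    ItemType("editorials", published=50, citations=60, citable=False),
    ItemType("letters", published=300, citations=50, citable=False),
]

reported_if = impact_factor(journal)               # 660 / 220
matched_if = impact_factor(journal, matched=True)  # 550 / 220

print(f"as described in the text: {reported_if:.2f}")
print(f"numerator matched to denominator: {matched_if:.2f}")
```

In this invented case the editorials and letters add 110 citations to the numerator without adding anything to the denominator, lifting the factor from 2.5 to 3.0.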
The use of an extremely short term index (citations to articles published only in the past two years) in calculating the impact factor introduces a strong temporal bias, with several consequences. For example, articles in journals with short publication lags will contain relatively many up to date citations and thus contribute heavily to the impact factors of all cited journals. Since articles in a given journal tend to cite articles from the same journal,22 rapid publication is self serving with respect to journal impact, and significantly correlated with it.23 Dynamic research fields with high activity and short publication lags, such as biochemistry and molecular biology, have a correspondingly high proportion of citations to recent publications—and hence higher journal impact factors—than, for example, ecology and mathematics.23 24 Russian journals, which are cited mainly by other Russian journals,25 are reported to have particularly long publication lags, resulting in generally low impact factors.26 Pure technicalities can therefore account for several-fold differences in journal impact.
Limitations of the database
The Science Citation Index database covers about 3200 journals8; the estimated world total is about 126 000.27 The coverage varies considerably between research fields: in one university, 90% of the chemistry faculty's publications, but only 30% of the biology faculty's publications, were in the database.28 Since the impact factor of any journal will be proportional to the database coverage of its research field, such discrepancies mean that journals from an underrepresented field that are included will receive low impact factors. Furthermore, the journal set in the database is not constant but may vary in composition from year to year.24 29 In many research fields a substantial fraction of scientific output is published in the form of books, which are not included as source items in the database; they therefore have no impact factor.30 In mathematics, leading publications that were not included in the Science Citation Index database were cited more frequently than the leading publications that were included.31 Clearly, such systematic omissions from the database can cause serious bias in evaluations based on impact factor.
The preference of the Science Citation Index database for English language journals28 will contribute to a low impact factor for the few non-English journals that are included,32 since most citations to papers in languages other than English are given by other papers in the same language.25 27 33 The Institute for Scientific Information's database for the social sciences contained only two German social science journals, whereas a German database contained 542.34 Specifically, American scientists, who seem particularly prone to citing each other,33 35 dominate these databases to such an extent (over half of the citations) as to raise both the citation rate and the mean journal impact of American science 30% above the world average,14 the rest of the world then falling below average. This bias is aggravated by the use of a short term index: for example, in American publications within clinical medicine, 83% of references in the same year were to other papers by American scientists (many of them undoubtedly self citations), a value 25% higher than the stable level reached after three years (which would, incidentally, also be biased by self citations and citations of other American work).33 Thus, both the apparent quality lead of American science and the values of the various journal impact factors are, to an important extent, determined by the large volume, the self citations, and the national citation bias of American science,27 in combination with the short term index used by the Science Citation Index for calculating journal impact factors.
Journal impact factors depend on the research field
Citation habits and citation dynamics can be so different in different research fields as to make evaluative comparisons on the basis of citation rate or journal impact difficult or impossible. For example, biochemistry and molecular biology articles were cited about five times as often as pharmacy articles.33 Several factors have been found to contribute to such differences among fields of research.
The citation impact of a research field is directly proportional to the mean number of references per article, which varies considerably from field to field (it is twice as high in biochemistry as in mathematics, for example).24 Within the arts and humanities, references to articles are hardly used at all, leaving these research fields (and others) virtually uncited,36 a matter of considerable consternation among science administrators unfamiliar with citation kinetics.37
In highly dynamic research fields, such as biochemistry and molecular biology, where published reports rapidly become obsolete, a large proportion of citations are captured by the short term index used to calculate journal impact factors, as previously discussed,38 but fields with a more durable literature, such as mathematics, have a smaller fraction of short term citations and hence lower journal impact factors. This field property combines with the low number of references per article to give mathematics a recorded citation impact that is only a quarter that of biochemistry.24
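One rough way to read this combination, offered as a sketch rather than as the cited analysis, is to treat a field's recorded short term impact I as proportional to the mean number of references per article r multiplied by the fraction s of those references that fall within the two year window. The symbols and the multiplicative form are assumptions made here for illustration.

```latex
\[
\frac{I_{\text{math}}}{I_{\text{biochem}}}
  = \frac{r_{\text{math}}}{r_{\text{biochem}}}
    \cdot \frac{s_{\text{math}}}{s_{\text{biochem}}}
  \approx \frac{1}{2} \cdot \frac{s_{\text{math}}}{s_{\text{biochem}}}
\]
```

Under this reading, with references per article about twice as numerous in biochemistry as in mathematics, a recorded impact of one quarter would correspond to a short term citation fraction in mathematics of roughly half that in biochemistry.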
In young and rapidly expanding research fields, the number of publications making citations is large relative to the amount of citable material, leading to high citation rates for articles and high journal impact factors for the field.39 40
In a largely self contained research field, the mean article (or journal) citation rate is independent of the size of the field,41 but the absolute range will be wider in a large field, meaning higher impact factors for the top journals.42 Such differences become obvious when comparing review journals, which tend to top their field (table 1). Leading scientists in a small field may thus be at a disadvantage compared with their colleagues in larger fields, since they lack access to journals of equally high citation impact.43
Most research fields are, however, not completely self contained, the most important field factor probably being the ability of a research field to be cited by adjacent fields. The relation between basic and clinical medicine is a case in point: clinical medicine draws heavily on basic science, but not vice versa. The result is that basic medicine is cited three to five times more than clinical medicine, and this is reflected in journal impact factors.42 44 45 The outcome of an evaluation based on impact factors in medicine will therefore depend on the position of research groups or institutions along the basic-clinical axis.33
In measures of citation rates of articles, attempts to take research field into account often consist of expressing citation rate relative to some citation impact specific to the field.46 Such field corrections range from simply dividing the article's citation rate by the impact factor of its journal28 (which punishes publication in high impact journals) to the use of complex, author specific, field indicators based on reference lists47 48 (which punishes citations to high impact journals). However, field corrections cannot readily be applied to journal impact factors, since many research fields are dominated by one or a few journals, in which case corrections might merely generate relative impact factors of unit value. Even within large fields, the tendency of journals to subspecialise with certain subjects is likely to generate significant differences in journal impact: in a single biochemical journal there was a 10-fold difference in citation rates in subfields.19
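Two of the points above can be made concrete with a small sketch: the simplest correction (article citation rate divided by journal impact factor) and the near-unit result when a field is dominated by one journal. The function names, the article-weighted field mean, and all numbers are assumptions for illustration, not methods taken from the cited studies.

```python
def relative_citation_rate(article_rate, journal_if):
    """Simplest field correction named in the text:
    the article's citation rate divided by its journal's impact factor."""
    return article_rate / journal_if


def relative_impact_factor(journal_if, field_ifs, field_article_counts):
    """Journal impact factor divided by the article-weighted mean
    impact factor of the field's journals (an assumed definition)."""
    weighted = sum(f * n for f, n in zip(field_ifs, field_article_counts))
    field_mean = weighted / sum(field_article_counts)
    return journal_if / field_mean


# The same article, cited 6 times a year, scored in two different journals.
in_modest_journal = relative_citation_rate(6.0, journal_if=2.0)
in_high_journal = relative_citation_rate(6.0, journal_if=6.0)

# A hypothetical field in which one journal publishes 95% of the articles.
dominant = relative_impact_factor(
    4.0, field_ifs=[4.0, 1.0], field_article_counts=[950, 50]
)

print(f"score in modest journal: {in_modest_journal:.1f}")
print(f"score in high impact journal: {in_high_journal:.1f}")
print(f"relative impact factor of dominant journal: {dominant:.2f}")
```

The identical article scores three times lower when it appears in the higher impact journal, and the dominant journal's corrected factor sits close to 1 regardless of how heavily it is cited.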
Is the impact of an article increased by publication in a high impact journal?
It is widely assumed that publication in a high impact journal will enhance the impact of an article (the “free ride” hypothesis). In a comparison of two groups of scientific authors with similar journal preference who differed twofold in mean citation rate for articles, however, the relative difference was the same (twofold) throughout a range of journals with impact factors of 0.5 to 8.0.12 If the high impact journals had contributed “free” citations, independently of the article contents, the relative difference would have been expected to diminish as a function of increasing journal impact.49 These data suggest that the journals do not offer any free ride. The citation rates of the articles determine the journal impact factor (a truism illustrated by the good correlation between aggregate citation rates of article and aggregate journal impact found in these data), but not vice versa.
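The logic of this test can be sketched with a toy additive model, in which an article's citation rate is its content-driven rate plus a bonus proportional to the journal impact factor. The model, the `free_ride` parameter, and the numbers are assumptions for illustration and are not the analysis reported in the cited work.

```python
def citation_rate(intrinsic, journal_if, free_ride=0.0):
    """Toy additive model: content-driven citations plus a journal bonus
    that is independent of the article's content."""
    return intrinsic + free_ride * journal_if


def group_ratio(journal_if, free_ride):
    """Ratio of citation rates for two author groups whose
    content-driven rates differ twofold, in the same journal."""
    high = citation_rate(2.0, journal_if, free_ride)
    low = citation_rate(1.0, journal_if, free_ride)
    return high / low


# Impact factors spanning the range mentioned in the text (0.5 to 8.0).
impact_factors = [0.5, 1.0, 2.0, 4.0, 8.0]

no_free_ride = [group_ratio(j, free_ride=0.0) for j in impact_factors]
with_free_ride = [group_ratio(j, free_ride=0.5) for j in impact_factors]

for j, a, b in zip(impact_factors, no_free_ride, with_free_ride):
    print(f"IF {j:>3}: ratio without bonus {a:.2f}, with bonus {b:.2f}")
```

With no journal bonus the twofold difference is preserved at every impact factor, whereas any content-independent bonus shrinks the ratio towards 1 as journal impact rises. The observed constant twofold difference is the pattern expected without a bonus.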
If scientific authors are not detectably rewarded with a higher impact by publishing in high impact journals, why are we so adamant on doing it? The answer, of course, is that as long as there are people out there who judge our science by its wrapping rather than by its contents, we cannot afford to take any chances. Although journal impact factors are rarely used explicitly, their implicit counterpart, journal prestige, is widely held to be a valid evaluation criterion50 and is probably the most used indicator besides a straightforward count of publications. As we have seen, however, the journal cannot in any way be taken as representative of the article. Even if it could, the journal impact factor would still be far from being a quality indicator: citation impact is primarily a measure of scientific utility rather than of scientific quality, and authors' selection of references is subject to strong biases unrelated to quality.51 52 For evaluation of scientific quality, there seems to be no alternative to qualified experts reading the publications. Much can be done, however, to improve and standardise the principles, procedures, and criteria used in evaluation, and the scientific community would be well served if efforts could be concentrated on this rather than on developing ever more sophisticated versions of basically useless indicators. In the words of Sidney Brenner, “What matters absolutely is the scientific content of a paper, and nothing will substitute for either knowing or reading it.”53
Conflict of interest: None.