After having heard Björn Brembs's speech (see http://dx.doi.org/10.7557/5.3226) at the 9th Munin Conference, and re-reading some of P O Seglen's articles on the Impact Factor (IF), I decided to write out some of my reflections on the use, or abuse, of the IF on my blog. My interest in the IF lies primarily in the fact that it is a major obstacle to a stronger and quicker transition from toll access (TA) to open access (OA) publishing. This was also a strong point made by Claudio Aspesi at the 6th Conference on Open Access Scholarly Publishing (COASP) in Paris, 2014. It is not in his PowerPoint, but an important point he made was that as long as the IF is important for researchers and research evaluators, OA will have problems taking a major market share.
What are the problems with the Impact Factor?
There are numerous problems with the IF; they can be grouped thematically. A major problem is that the IF is owned by a private company that has strong financial interests in keeping the IF working as it is. It is owned by Thomson Reuters, and is published in their ISI Web of Science Journal Citation Reports.
The data the IF is based on
- The IF is based on a count of citations, and takes for granted that a citation is a sign of scientific quality. This is a highly debatable position, and it is untenable at the micro level (i.e. article or author level). There are various reasons for citing, some of them negative. And Seglen (1991) notes that “a citation is primarily a measure of utility rather than of quality”.
- Using citations as the only basis means you only look at the importance of an article for research itself (only research cites). Other impacts, e.g. on society, are completely ignored. And citations only give credit to authors, not to other participants in the process of creating an article, like peer reviewers (who can have great influence on the final article) or editors.
- The data the IF is based on are only a small portion of the data available. The IF only takes into account citations from journals (leaving e.g. monographs out), and only a fraction of all journals are taken into account. The sample of journals is highly skewed towards STM (science, technology and medicine), while HSS (humanities and social sciences) is only superficially covered. There is also a strong language bias towards English-language journals, leaving other major world languages out.
- It is at best unclear what may be cited and what may cite. There are also strong signs that what counts and what is counted is negotiable; Brembs shows an example of this in his presentation (see above).
- Different kinds of content will invariably receive different rates of citation; it is a well-known fact that method articles and review articles are cited much more than other content. The IF is heavily influenced by this. Articles describing negative results are generally rarely cited.
- Citation patterns are very different between fields, and so are co-authorship patterns. Both influence the IF, the former more so than the latter. There is e.g. a marked difference in the number of references per article between fields; some of this may be inherent in how science and writing are performed, but it may also be due to different citation norms in different fields. What is common knowledge, and what needs a citation, may differ. And research shows that citing is a very imprecise activity: most articles contain citations that should not be there, and lack others. The IF would have to be “field normalized” to take this into account, but the published IF is not field normalized. And field normalization is not necessarily without problems – how do you define a field, and to which field does a journal belong?
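To make the idea of field normalization concrete, here is a minimal sketch in Python. The field names, the average citation rates, and the function name are all hypothetical, invented for illustration; real bibliometric normalization (and the definition of a "field") is considerably more involved.

```python
# Hypothetical average citation counts per article for two fields.
# A real analysis would derive these from a citation database.
field_averages = {"cell biology": 25.0, "mathematics": 4.0}

def normalized_citations(citations, field):
    """Return an article's citation count relative to its field's average.

    A value of 1.0 means exactly average for that field.
    """
    return citations / field_averages[field]

# The same raw count means very different things in different fields:
# 20 citations is below average in cell biology, but far above it in mathematics.
print(normalized_citations(20, "cell biology"))  # 0.8
print(normalized_citations(20, "mathematics"))   # 5.0
```

This is exactly the correction the published IF does not make: it compares raw citation counts across fields with very different citation norms.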
The way the IF is calculated
- A general problem with the formula for calculating the IF is that few researchers – even the believers – know how it is done. Björn Brembs's presentation contains a very nice illustration showing how it is calculated. The math is simple, but it can still be confusing to many researchers.
- The IF only counts citations in one year to articles in the two preceding years. We know that the average time over which an article may be cited varies widely between fields; in some fast-moving fields a two-year window may be appropriate, in other fields it is wholly inappropriate. In scholarly fields a citing article may take two years from being written to being published, hence citing too late to count even if the cited article was fresh off the press when the citing article was written. The IF thus measures the velocity of scientific advance rather than the quality of the research published.
- The IF is an average (number of citations divided by the number of items that could be cited). Averages are a wonderful instrument, but only under some circumstances. One is that the numbers you look at are centered around the average – a distribution approaching a normal distribution.
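The two points above can be summed up in the formula itself. The sketch below implements the standard two-year IF calculation described in this section; the journal and its numbers are invented for illustration.

```python
def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Two-year Impact Factor for year Y.

    citations_in_year: citations received in year Y to items the journal
        published in years Y-1 and Y-2.
    citable_items_prev_two_years: number of citable items the journal
        published in years Y-1 and Y-2 (the negotiable denominator).
    """
    return citations_in_year / citable_items_prev_two_years

# Hypothetical journal: 240 citations in 2014 to its 2012-2013 content,
# which comprised 120 citable items.
print(impact_factor(240, 120))  # 2.0
```

Note that both inputs are negotiable in practice: what counts as a citation goes into the numerator, while reclassifying content out of the "citable items" category shrinks the denominator, and either move raises the IF.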
Figure 4 from Lundberg (2006:16). © Jonas Lundberg. Reprinted with permission from Lundberg.
Lundberg (2006:16) shows in Figure 4 a typical distribution of citations. His numbers are for 424,480 life science articles published in 2000, counting citations 2000–2006. You need little statistical knowledge to see that this distribution bears no similarity to a normal distribution; it is extremely skewed. The average or mean (corresponding to the IF) is 16, the median (i.e. the middle value) is 8 and the mode (the most typical value) is 0. The average (mean) is thus not even remotely representative of the underlying data. Even my high school math textbook on descriptive statistics (Hamre 1973:28) clearly advises against using the mean as a measure when the data are skewed. We see from Lundberg's figure that the vast majority of articles receive significantly fewer citations than the average indicates, while a minority receive more – some extremely many more – citations than the average indicates.
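A toy example makes the gap between mean, median and mode tangible. The citation counts below are invented, constructed only so that their summary statistics mirror the ones Lundberg reports (mean 16, median 8, mode 0); they are not his data.

```python
from statistics import mean, median, mode

# Hypothetical citation counts for 13 articles: most get few or no
# citations, a couple get very many - a skew like Lundberg's Figure 4.
citations = [0, 0, 0, 0, 3, 5, 8, 10, 12, 15, 30, 55, 70]

print(mean(citations))    # 16  <- the "IF-like" average
print(median(citations))  # 8   <- the typical article does half as well
print(mode(citations))    # 0   <- the most common outcome is no citations
```

Two heavily cited outliers are enough to double the mean relative to the median, which is precisely why an average is a poor summary of any individual article in such a distribution.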
The way the IF is manipulated
We know that the IF is dependent upon many decisions made along the way, both by authors and by editors. And we know that editors need citations in their branding and marketing of the journal to prospective authors. Among the practices – some quite respectable, some not so – that influence the IF are:
- Editors using non-scientific citing items – e.g. editorial content – to promote articles in their own journals, thus earning a citation and increasing their own IF.
- Citations-on-demand: editors or peer reviewers asking/demanding that the author put in one or more citations to their articles/journals.
- Editors weeding out articles, irrespective of scientific quality, that seem to have little potential for being cited. This means that it is nearly impossible to get negative results published in a high-ranking journal. Paradoxically, since little-cited articles also cite, their being published somewhere is important for high-impact journals to get a high IF. High-IF journals need low-impact journals.
- Editors soliciting highly cited article types, most common are review articles and method articles. These are useful articles for authors. Review articles review and sum up recent research in a field, and are often cited instead of the original papers reviewed there. One could debate whether review articles themselves are research articles, as their function is one of evaluation. Method articles are referred to by authors using a specific method that is described and validated in the article.
- Editors sorting rarely cited content out of the scientific-article category into other categories, so that it is not counted among the citable items in the denominator of the IF formula. Such content will, however, often still be a citing item.
- As the time span for getting citations that count towards the IF is the two years following the year of publication, it is important that the article is published early in the year, so that as many readers as possible have a chance to cite it in the following year. An article published in December will get few citations the following year, as it takes time for articles citing it to get written, reviewed and published. Some authors have noticed a very skewed distribution of articles over the year. I suspect this is one of the reasons for the increasingly popular phenomenon of “online before print”, under various names, which makes the article available for reading and citing before it is formally published – even though paper is immaterial (sorry about the pun) in today's dissemination of scientific knowledge. Articles may now “risk” being cited before they are published …
The way the IF is (ab)used
The IF is to some extent used for what it was intended for: to evaluate the usefulness of journals for library collections. But it has increasingly been used to evaluate individual research and researchers when employing, promoting or funding. It was not intended for this use, and from the above we should agree that it is not suited for the purpose either – even if we were to accept the shaky premises it is based upon. When the IF of the journal in which an article is published is used as a measure of the quality of that research, instead of evaluating the research itself, we will err in one of two possible ways: as the IF is an average in an extremely skewed distribution, we will assign too high a quality to most articles, and hence over-employ, over-promote, over-pay and over-finance the more mediocre and less interesting research. At the same time we will overlook, and under-employ, under-promote, under-pay and under-finance, the few outstanding pieces of research which are hidden behind the IF.
The IF does not work as an instrument for the evaluation of individual research. To quote Seglen (1991) again: “Clearly, journal impact factors cannot be used even for an approximate estimate of real article impact, and would be grossly misleading as evaluation parameters.” Fortunately, some institutions have realized this, and vow not to use the IF for evaluation in this way. The San Francisco Declaration on Research Assessment (DORA) (http://am.ascb.org/dora/) is an initiative to reduce the (ab)use of the Impact Factor in research evaluation, and also in the marketing and branding of journals. And the RCUK Policy on Open Access and Supporting Guidance (RCUK = Research Councils UK) document contains a promise to evaluate the research itself and not where it is printed: “When assessing proposals for research funding RCUK considers that it is the quality of the research proposed, and not where an author has or is intending to publish, that is of paramount importance; […]” The Wellcome Trust has a similar promise.
There is hope. One hopes …
Brembs, B (2014) When decade-old functionality would be progress – the desolate state of our scholarly infrastructure. Keynote speech at the 9th Munin Conference, Tromsø November 26th 2014. http://dx.doi.org/10.7557/5.3226
Hamre, A (1973) Beskrivende statistikk. Del 3. Aschehoug, Oslo.
Lundberg, J (2006) Bibliometrics as a research assessment tool: impact beyond the impact factor. Dissertation for the degree of Ph.D. at Karolinska Institutet. http://hdl.handle.net/10616/39489
Seglen, P O (1991) Citation frequency and journal impact: valid indicators of scientific quality? Journal of Internal Medicine 229: 109–111.