Impact measures of this kind are inexact and should not, in our view, be relied on for a detailed ranking of research achievements (it could be described as ‘a scale that can distinguish an elephant from a rabbit but not a horse from a cow’).
(from an evaluation report in economics)
In recent years there has been increasing interest in how evaluation systems and resource allocation models affect research. A central question is how a growing focus on quantifiable performance affects researchers' practices and priorities. A number of documented and possible effects have been identified at a more general level (de Rijcke et al., 2016), but empirical studies of how bibliometrics is used in evaluating individuals are few: assessment at the individual level is difficult to study empirically, and previous discussions have therefore been based largely on individual examples and anecdotes. At the same time, this use of indicators is particularly consequential for individual careers, as employment and research funding are at stake.
In our paper, Indicators as judgment devices (Hammarfelt and Rushforth, 2017), we draw on 188 referee reports in which independent reviewers rank candidates for lectureships and professorships in biomedicine and economics. This material allows us to study how indicators and journal rankings are used to make judgments on, and differentiate between, candidates. Our interpretation is that referees use indicators as an aid to legitimate decisions, and our framework builds on Lucien Karpik (2010) and his theory of 'judgment devices'. According to this theory, such devices are used when the consumer (in this case the referee) needs to choose between a range of products (in this case candidates) that cannot easily be compared. Whenever there is a wealth of good candidates, as in many of the reports we studied, judgment devices provide a means of settling situations with an abundance of possible options.
The most famous bibliometric indicator is undoubtedly the Journal Impact Factor (JIF) (Rushforth and de Rijcke, 2015).1 The JIF is used widely in our material, primarily in biomedicine but also in economics. In many biomedical cases, the JIF serves as a benchmark for what should be regarded as a good journal, and a lower limit for 'quality' of around 3 seems to apply. In economics it is considerably more difficult to find an exact limit, although experts suggest 0.5 and 0.9 as possible target values. Broadly speaking, even many statements that do not explicitly use the JIF still refer to 'high impact' journals.
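For readers unfamiliar with the measure, the classic two-year JIF is in essence a journal's mean citation rate: citations received in one year to items the journal published in the two preceding years, divided by the number of citable items from those years. A minimal sketch with purely illustrative numbers:

```python
def journal_impact_factor(citations_to_prev_two_years: int,
                          citable_items_prev_two_years: int) -> float:
    """Classic two-year JIF: mean citations per citable item."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Illustrative only: 900 citations in year Y to items published in Y-1 and Y-2,
# across 300 citable items from those two years.
jif = journal_impact_factor(900, 300)
print(jif)  # 3.0 -- a journal at the 'quality' threshold mentioned above
```

Note that, as a mean over a skewed distribution, the JIF says little about the citation rate of any individual article in the journal, which is one reason its use at the individual level is contested.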
The h-index focuses on the evaluation of individual researchers. Unlike JIF, which measures the average citation rate of journals, the h-index is an attempt to summarize a researcher's combined productivity and influence. In our study, the h-index is usually used as background information, for instance alongside affiliation, age, or gender, as is the case here:
XXXX publishes in good to very good journals including Plos Genetics, FASEB J, and Mol Biol Cell. H–factor=18. (referee report in biomedicine).
The h-index thus becomes what we call a 'totalizing indicator': an entire career summarized in a single figure. In several cases we also find that the measure coincides with the overall assessment of candidates.
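The computation behind such a single figure is simple. A minimal sketch of Hirsch's definition, using made-up citation counts:

```python
def h_index(citation_counts: list[int]) -> int:
    """h-index: the largest h such that the researcher has h papers,
    each cited at least h times (Hirsch, 2005)."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical researcher with nine papers:
print(h_index([50, 30, 22, 18, 18, 10, 5, 3, 1]))  # 6
```

The example also illustrates why the measure is 'totalizing': the one highly cited paper (50 citations) and the barely cited ones contribute nothing beyond their rank, so very different careers can collapse into the same number.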
Whereas citations and JIF are popular evaluation tools in biomedicine, economists tend to assess the value of articles based on classifications and lists of journals. The frequent use of journal rankings - we found five different lists in our material! - can be related to a tendency in economics to organize itself hierarchically. Another explanation is that citations and the JIF are less applicable in economics compared to biomedicine, due to the lower volume and rate of articles and citations in the former.
Our material shows how indicators can be combined and compared, and it is not uncommon for domain experts to be knowledgeable in bibliometrics. A clear example of how candidates are compared through different types of indicators is this table with quantifiable data about authorship, publications and citations.
(From referee report in biomedicine)
This table suggests an ambitious expert with significant knowledge of bibliometrics: it is accompanied by no fewer than eight footnotes explaining the importance and weaknesses of the individual measures. An interesting detail is that the expert uses the median number of citations rather than the mean, a statistically well-founded choice given the highly skewed distribution of citations.
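Why the median is the sounder choice here can be seen with a small, entirely made-up example of a skewed citation distribution:

```python
import statistics

# Hypothetical citation counts for one candidate's ten papers:
# typical values are single-digit, with one highly cited outlier.
citations = [0, 1, 1, 2, 3, 4, 6, 9, 15, 210]

print(statistics.mean(citations))    # 25.1 -- dragged up by the single outlier
print(statistics.median(citations))  # 3.5  -- closer to the typical paper
```

A mean of 25 would flatter this record considerably; the median better reflects the body of the candidate's work, which is presumably why the expert preferred it.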
Toward Citizen Bibliometrics
Our findings lead us to argue that bibliometric indicators should be understood as 'evaluation tools' that are integrated into disciplinary evaluation practices. Furthermore, depending on how a field is organized, different types of indicators, such as citations in biomedicine or journal rankings in economics, will become dominant. Some in the scientometrics field have referred to this type of bibliometric practice as 'amateurish' and warned of its propagation (e.g. Gläser and Laudel, 2007). However, we would argue that the experts in these Swedish evaluation documents often appear relatively knowledgeable about the strengths and weaknesses of the indicators. Furthermore, they possess skills that professional bibliometricians lack: they know how these measures are valued within their own discipline. Rather than amateurs, they may perhaps be better called 'citizen bibliometricians' (Leydesdorff et al., 2016). To be clear, our position is not that the use of bibliometrics in these contexts is unproblematic: indicators of this kind should be used with great caution at the individual level. Yet to completely reject these measures as inappropriate across all contexts would, in our view, be premature (see also Waltman and Traag, 2017, who reach a similar conclusion on the use of the JIF, albeit via different means).2
de Rijcke, S., Wouters, P. F., Rushforth, A. D., Franssen, T. P. & Hammarfelt, B. (2016). Evaluation practices and effects of indicator use—a literature review. Research Evaluation, 25, 161-169.
Gläser, J. & Laudel, G. (2007). The social construction of bibliometric evaluations. In: Whitley, R. & Gläser, J. (eds.) The Changing Governance of the Sciences. Springer.
Hammarfelt, B. & Rushforth, A. D. (2017). Indicators as judgment devices: An empirical study of citizen bibliometrics in research evaluation. Research Evaluation, rvx018.
Karpik, L. (2010). Valuing the unique: The economics of singularities. Princeton University Press.
Leydesdorff, L., Wouters, P. & Bornmann, L. (2016). Professional and citizen bibliometrics: complementarities and ambivalences in the development and use of indicators—a state-of-the-art report. Scientometrics, 109, 2129-2150.
Rushforth, A. & de Rijcke, S. (2015). Accounting for Impact? The Journal Impact Factor and the Making of Biomedical Research in the Netherlands. Minerva, 53, 117-139.
Waltman, L. & Traag, V. A. (2017). Use of the journal impact factor for assessing individual articles need not be wrong. arXiv:1703.02334.
1 See also the blog by Sarah de Rijcke