Why does a single size of metrics not fit all? What is indicated by the indicators?

In a CWTS blog entitled “Responsible metrics: One size doesn’t fit all” (March 29, 2018), Ludo Waltman contributes to a discussion about the Leiden Manifesto (Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015) and the discussion of “professional and citizen bibliometrics” (Leydesdorff, Wouters, & Bornmann, 2016), with the argument that the role of expert assessment in scientometric evaluations differs between the micro- and the macro-level. At the micro-level, he agrees with the principle of the Leiden Manifesto that “quantitative evaluation should support qualitative, expert assessment.” The expert’s role at the macro-level, however, can be defined as “combining – in a balanced and well-informed manner – many different pieces of information that together provide the big picture of macro-level research performance.” In other words, the expert should guide macro-level evaluations, while micro-level evaluations require embedding of the qualitative and quantitative aspects in each other.

In my opinion, expertise should not only be ‘informed’ by micro- or macro-level policy contexts but also be ‘theoretically informed’. An expert provides one possible window on the expertise under evaluation. As Waltman signals, other forms of expertise and thus theoretical contexts are needed at the macro or the micro level because the systems under study are different. Similarly, one can expect the substantive expertise needed in the case of evaluating an academic hospital to be different from evaluating, for example, a humanities department studying the history and philosophy of science. Expertise is domain-specific knowledge with the status of a ‘proto-theory’: an expert articulates (intuitive and informed) assumptions about the systems and dynamics under study.

In addition to a perspective on the field, however, an expert has a position and therefore specific interests in her field of expertise. For example, an expert may assume that the organization under study can use an evaluation for improving its scientific performance in terms of publications and citations, or that one can be informed strategically about how to position the organization internationally (e.g., van Vree et al., 2017). The institutional and intellectual orientations among experts can be expected to vary. Furthermore, complex macro-questions can sometimes be addressed with scientometric indicators without the need for expert advice. For example: can “interdisciplinarity” be stimulated by policy incentives, or is international collaboration a better stimulus to interdisciplinarity? What are the pros and cons?

I therefore think that the issue is not the enrollment of an expert, but the access to expertise. Improving the incorporation of relevant knowledge and expertise means opening the black box of the systems under study. Sometimes, this may require specific expertise; in other cases, one may wish to consult an expert and obtain a second opinion. How can one use the evaluations and the relevant feedback for developing a research agenda about the mediations between organized knowledge production and evaluation processes (Dahler-Larsen, 2014)? The scientometric indicators can be improved (in terms of their validity) by developing the proto-theoretical assumptions of the experts further into a theory about the systems dynamics to be evaluated. The theoretical elaboration may lead to counter-intuitive insights. Experts, however, tend to operate on the basis of intuitions that are potentially less transparent. In contrast to integration into the ‘vision’ of an expert, the differentiation of perspectives in scholarly communications can also be appreciated. The normative evaluation and the analytical perspectives in the fields of study to be evaluated can be expected to generate frictions that inform us about relevant contexts.

For example, in the case of social-impact studies, the assumption of knowledge having an “impact” as an output (the so-called “linear” input/output model) is common among experts and practitioners, but at odds with theorizing about the development of science communication (e.g., Bauer, 2009; Guetzkow, Lamont, & Mallard, 2004). Similarly, the enlightenment-driven ideal of Open Access does not sufficiently take into account what this may mean for the organization of knowledge production in a department: if there are scarce resources for author processing fees, who is entitled to publish? The old-boys networks, or are the funds prioritized for PhD students and postdocs? Not incidentally, most funding agencies champion Open Access: it creates a new dependency relation for academic research by shifting control from the editorial process to institutional decision making.

Evaluations can be improved by a further understanding of the system under study. Naïve policy measures (for example, those based on input/output models) predictably lead to unintended consequences (Ashby, 1958). The expert herself, for example, may not be aware of her position in the system under study. Evaluations can be used and misused in power games that are beyond control. Our academic task is to critically assess the scientometric enterprise not as a purpose in itself, but as a tool in evaluation practices and at the same time a constitutive contribution to science, technology, and innovation studies as an academic enterprise (Wyatt et al., 2017).

An understanding of what is indicated by the indicators can improve the scientometric enterprise. Citations, for example, are not necessarily an indicator of “quality” or “impact.” In fields with a rapidly developing research front, authors have to position their papers in relation to other groups and programs by making references. In slower-moving fields, citation patterns may crystallize within the time-frame of ten or more years (Price, 1970). In the former case, one does not measure “quality” but “centrality,” and in the latter one measures “codification.” Provocatively, we have called this the “impact fallacy” (Leydesdorff, Bornmann, Comins, & Milojević, 2016), which adds to the possibility of an ecological fallacy when judging authors or institutions in terms of the impact factors of the journals in which they publish.
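The ecological fallacy mentioned above can be illustrated with a small numerical sketch. The citation counts below are invented for illustration only; the point is that a journal-level average (of which the impact factor is one variant) can be dominated by a single highly cited paper and therefore says little about any individual paper or author.

```python
from statistics import mean, median

# Hypothetical citation counts for six papers in one journal (illustrative data only).
journal_citations = [0, 0, 1, 1, 2, 50]

# Journal-level average, analogous in spirit to an impact factor.
journal_average = mean(journal_citations)

# What a "typical" paper in this journal receives.
typical_paper = median(journal_citations)

# Number of papers cited less often than the journal average.
below_average = sum(c < journal_average for c in journal_citations)

print(journal_average, typical_paper, below_average)
```

In this invented case, five of the six papers are cited less often than the journal average, so attributing the journal's score to each author would overstate the citation record of nearly all of them.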

In sum, the intellectual issue is not to improve the social acceptability of scientometric evaluations by inserting expertise-based semantics in the reports or manifestos. Such a synthesis conceals the problems from a practical (that is, policy) perspective. For example, adherence to the Ten Principles for research metrics of the Leiden Manifesto legitimates practices that are sometimes erroneous, but it does not by itself provide analytical and empirical research perspectives. The normalization of citation data central to Principle 6 of the Leiden Manifesto (“Account for variation by field in publication and citation practices”) is a case in point: it is often (e.g., when using the indicator MNCS, that is, the Mean Normalized Citation Score) based on field delineations that have not been updated (Principle 9) despite their known weaknesses (Pudovkin & Garfield, 2002, p. 1113, fn. 1; Leydesdorff & Bornmann, 2016; van Eck et al., 2013). The tensions between normative and analytical perspectives are pervasive and provide us with points of access for raising salient research questions in science and technology studies (Giddens, 1976).
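The dependence of a field-normalized indicator on the field delineation can be sketched as follows. This is a simplified illustration, not the procedure used by any particular provider: it assumes that each paper's citation count is divided by the mean citation count of its field in a reference set, and that the unit's score is the mean of these ratios (actual implementations typically also normalize for publication year and document type). The field labels and citation counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def field_baselines(reference_set):
    """Mean citations per field over a reference set of (field, citations) pairs."""
    by_field = defaultdict(list)
    for field, citations in reference_set:
        by_field[field].append(citations)
    return {field: mean(counts) for field, counts in by_field.items()}

def mncs(unit_papers, baselines):
    """Mean of each paper's citations divided by its field's baseline."""
    return mean(citations / baselines[field] for field, citations in unit_papers)

# Hypothetical reference set with clinical and basic research delineated separately.
split_reference = [("clinical", 2), ("clinical", 4), ("clinical", 6),
                   ("basic", 10), ("basic", 20), ("basic", 30)]
# The same papers when both are lumped into a single category.
merged_reference = [("medicine", c) for _, c in split_reference]

# A hypothetical clinical unit with two papers, cited 4 and 8 times.
unit_split = [("clinical", 4), ("clinical", 8)]
unit_merged = [("medicine", 4), ("medicine", 8)]

mncs_split = mncs(unit_split, field_baselines(split_reference))     # baseline 4
mncs_merged = mncs(unit_merged, field_baselines(merged_reference))  # baseline 12

print(mncs_split, mncs_merged)
```

With these invented numbers, the same two papers score well above the baseline (1.5) under the finer delineation and well below it (about 0.5) under the coarser one, which is why the choice and maintenance of field categories matters for the validity of the indicator.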

In a policy context, one may have to compromise on pragmatic grounds; but in an academic context, the tensions among perspectives do not have to be released. The initial step, in my view, is to consider specific expertise as a window on relevant scholarly discourses. Importing relevant insights from these discourses into scientometrics as the quantitatively-oriented branch of science and technology studies may help to clarify the differences between the (quasi-)industry of indicator production, the use of these reports by clients, and the longer-term research program of academic units (such as CWTS) active in this field.

References

Ashby, W. R. (1958). Requisite variety and its implications for the control of complex systems. Cybernetica, 1(2), 1-17.

Bauer, M. W. (2009). The evolution of public understanding of science—discourse and comparative evidence. Science, technology and society, 14(2), 221-240.

Dahler-Larsen, P. (2014). Constitutive Effects of Performance Indicators: Getting beyond unintended consequences. Public Management Review, 16(7), 969-986.

Giddens, A. (1976). New Rules of Sociological Method. London: Hutchinson.

Guetzkow, J., Lamont, M., & Mallard, G. (2004). What is Originality in the Humanities and the Social Sciences? American Sociological Review, 69(2), 190-212.

Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520, 429-431.

Leydesdorff, L., & Bornmann, L. (2016). The operationalization of “fields” as WoS subject categories (WCs) in evaluative bibliometrics: The cases of “library and information science” and “science & technology studies”. Journal of the Association for Information Science and Technology, 67(3), 707-714. doi: 10.1002/asi.23408

Leydesdorff, L., Bornmann, L., Comins, J., & Milojević, S. (2016). Citations: Indicators of Quality? The Impact Fallacy. Frontiers in Research Metrics and Analytics, 1(Article 1). doi: 10.3389/frma.2016.00001

Leydesdorff, L., Wouters, P., & Bornmann, L. (2016). Professional and citizen bibliometrics: complementarities and ambivalences in the development and use of indicators—a state-of-the-art report. Scientometrics, 109(3), 2129-2150.

Price, D. J. de Solla (1970). Citation Measures of Hard Science, Soft Science, Technology, and Nonscience. In C. E. Nelson & D. K. Pollock (Eds.), Communication among Scientists and Engineers (pp. 3-22). Lexington, MA: Heath.

Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. Journal of the American Society for Information Science and Technology, 53(13), 1113-1119.

Van Eck, N. J., Waltman, L., van Raan, A. F., Klautz, R. J., & Peul, W. C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.

van Vree, F., Prins, A., Spaapen, J., van Leeuwen, T., Duindam, D., & Boekhout, M. (2017). Handleiding Evaluatie van geesteswetenschappelijk onderzoek volgens het SEP. At https://www.qrih.nl/nl/over-qrih/de-handleiding.

Wyatt, S., Milojević, S., Park, H. W., & Leydesdorff, L. (2017). The Intellectual and Practical Contributions of Scientometrics to STS. In U. Felt, R. Fouché, C. Miller & L. Smith-Doerr (Eds.), Handbook of Science and Technology Studies (4th edition) (pp. 87-112). Cambridge, MA: MIT Press.


About Loet Leydesdorff