
A farewell to the MNCS indicator?



Is the mean normalized citation score (MNCS), one of the most commonly used field-normalized scientometric indicators, fundamentally flawed as an indicator of scientific performance? This is the topic of a sharp debate in the scientometric community. Journal of Informetrics, of which I am the Editor-in-Chief, today published a special section in which a number of prominent scientometricians provide their viewpoint on this question. The debate opens with a provocative paper by Giovanni Abramo and Ciriaco Andrea D’Angelo, two well-known Italian scientometricians. They argue that the MNCS and other similar indicators are invalid indicators of scientific performance. Their critical perspective can be highly relevant, because the MNCS and related indicators are widely used in evaluations of research groups and scientific institutes.

To illustrate the essence of the debate, let us take a look at an example. Suppose we have two research groups in the field of physics. These groups have exactly the same resources, for instance the same number of researchers and the same amount of funding. Suppose further that during a certain time period group X has produced 60 publications, of which there are 40 that, after normalization for field and publication year, have a citation score of 2.0 and 20 that have a citation score of 0.5. In the same time period, group Y has produced 100 publications, half of them with a normalized citation score of 2.0 and half of them with a normalized citation score of 0.5. Based only on this information, which of the two research groups seems to be the one that is performing better?

Using the standard scientometric indicators that we use at CWTS, the answer to this question would be that research group X is performing better. At CWTS, we normally use the MNCS indicator – and other so-called size-independent indicators, such as the PP(top 10%) indicator – to answer questions like the one above. The MNCS indicator equals the average normalized citation score of the publications of a research group. The indicator has a value of (40 × 2.0 + 20 × 0.5) / 60 = 1.50 for group X and a value of (50 × 2.0 + 50 × 0.5) / 100 = 1.25 for group Y. Hence, on average the publications of group X have a higher normalized citation score than the publications of group Y, and therefore according to the MNCS indicator group X is performing better than group Y. The same answer would be obtained using indicators analogous to the MNCS that are used by other scientometric centers and that are provided in commercial scientometric analysis tools, such as InCites and SciVal.
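The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch of the arithmetic in the example, not the code used at CWTS; the function name `mncs` and the representation of a group as (number of publications, normalized citation score) pairs are assumptions made for illustration.

```python
def mncs(publications):
    """Average normalized citation score of a set of publications.

    `publications` is a list of (count, normalized_citation_score) pairs,
    a hypothetical representation chosen for this example.
    """
    total_publications = sum(count for count, _ in publications)
    total_score = sum(count * score for count, score in publications)
    return total_score / total_publications


# Publication profiles of the two groups in the example.
group_x = [(40, 2.0), (20, 0.5)]
group_y = [(50, 2.0), (50, 0.5)]

print(f"MNCS of group X: {mncs(group_x):.2f}")  # 1.50
print(f"MNCS of group Y: {mncs(group_y):.2f}")  # 1.25
```

Note that the function looks only at publications and their citation scores. Nothing about the resources of a group enters the calculation.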

Does the reasoning followed by the MNCS indicator make sense, and should research group X indeed be considered the one that is performing better? A counterargument could be as follows. Groups X and Y have exactly the same resources. Using these resources, group Y outperforms group X in terms of both its number of highly cited publications (50 vs. 40 publications with a normalized citation score of 2.0) and its number of lowly cited publications (50 vs. 20 publications with a normalized citation score of 0.5). In other words, using the same resources, group Y produces 10 more highly cited publications than group X and 30 more lowly cited publications. Since group Y is more productive than group X both in terms of highly cited publications and in terms of lowly cited publications, the conclusion should be that group Y is the one that is performing better.
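The counterargument can also be written out as a short calculation. The sketch below is an illustration under the assumption stated in the example, namely that both groups have identical resources, so that their outputs can be compared directly. The helper names and the total normalized citation score are illustrative choices, not indicators defined in this post.

```python
# Number of publications per normalized citation score, from the example.
group_x = {2.0: 40, 0.5: 20}
group_y = {2.0: 50, 0.5: 50}


def extra_publications(a, b):
    """For each citation score, how many more publications b has than a."""
    scores = sorted(set(a) | set(b), reverse=True)
    return {score: b.get(score, 0) - a.get(score, 0) for score in scores}


def total_normalized_citations(group):
    """Size-dependent total: sum of normalized citation scores."""
    return sum(score * count for score, count in group.items())


diff = extra_publications(group_x, group_y)
print(diff)  # {2.0: 10, 0.5: 30}

# With equal resources, the totals are directly comparable.
print(total_normalized_citations(group_x))  # 90.0
print(total_normalized_citations(group_y))  # 125.0
```

Group Y is ahead in every citation class and in the total, even though its average score per publication is lower.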

This reasoning is followed by Abramo and D’Angelo in their paper A farewell to the MNCS and like size-independent indicators. This paper is the starting point for the debate that is taking place in the above-mentioned special section of Journal of Informetrics. Abramo and D’Angelo argue that research group Y is performing better in our example, and they conclude that indicators such as the MNCS, according to which group X is the better one, are fundamentally flawed as indicators of scientific performance. Since the MNCS and other similar indicators are widely used, the conclusion drawn by Abramo and D’Angelo could have far-reaching consequences.

Are Abramo and D’Angelo right in claiming that in our example research group Y is performing better than research group X? I agree that group Y should be considered the better one, and therefore I believe that Abramo and D’Angelo are indeed right. The next question of course is why the MNCS indicator fails to identify group Y as the better one. The reason is that the MNCS indicator does not use all information available in our example. In our example, groups X and Y have exactly the same resources, but this information is not used by the MNCS indicator. The MNCS indicator takes into account only the outputs produced by the groups, so the publications and their citations, and it does not take into account the resources available to the groups. For this reason, the MNCS indicator is unable to find out that group Y is performing better than group X. This is a fundamental problem of the MNCS indicator and of any scientometric indicator that does not take into account the resources research groups have available.

One may wonder why indicators such as the MNCS are being used so frequently even though they suffer from the above-mentioned fundamental problem. As I explain in a response (free preprint available here) that I have written to the paper by Abramo and D’Angelo, the answer is that in most cases there does not seem to be a better alternative. In my response, co-authored with CWTS colleagues Nees Jan van Eck, Martijn Visser, and Paul Wouters, it is pointed out that information about the resources of a research group usually is not available, and if this information is available, the accuracy of the information is often questionable. For this reason, scientometric assessments of the performance of research groups usually need to be made based only on information about the outputs of the groups. Since such assessments are based on incomplete information, they provide only a partial indication of the performance of research groups. This is the case for scientometric assessments in which the MNCS indicator is used, but in fact it always applies when indicators are used that take into account only the outputs of a research group.

A more extensive discussion of the limitations of indicators such as the MNCS, and of the proper use of these indicators, can be found in the above-mentioned special section of Journal of Informetrics. The special section includes eight responses to the paper by Abramo and D’Angelo. These responses, written by scientometricians with diverse backgrounds and ideas, provide a rich set of perspectives on the fundamental criticism of Abramo and D’Angelo on commonly used scientometric indicators. Gunnar Sivertsen for instance offers an interesting plea for methodological pragmatism, allowing for a diversity of indicators each providing different insights and serving different purposes. Mike Thelwall argues that collecting the data required for calculating the indicators advocated by Abramo and D’Angelo is too expensive. Lutz Bornmann and Robin Haunschild even claim that employing the indicators proposed by Abramo and D’Angelo may be undesirable because of the risk of misuse by politicians and decision makers.

Many different viewpoints can be taken on the criticism of Abramo and D’Angelo, but probably the main lesson to be learned from the debate is that we need to continuously remind ourselves of the limitations of scientometric indicators. As scientometricians we need to be humble about the possibilities offered by our indicators.


About Ludo Waltman

Senior researcher and deputy director of CWTS. Ludo leads the Quantitative Science Studies (QSS) research group. His core research interests focus on the analysis and visualization of bibliometric networks and the development of scientometric indicators.


2 comments

  • Loet Leydesdorff May 31st, 2016 5:57 pm
    Dear Ludo and colleagues,
    The Mean Normalized Citation Score (MNCS) was proposed by Waltman et al. (2011a and b) in response to a critique of the previous “crown indicator” (CPP/FCSm; Moed et al., 1995) of the Leiden Center for Science and Technology Studies (CWTS). The old “crown indicator” had been based on a violation of the order of operations, which prescribes that one should first multiply and divide and only thereafter add and subtract (Opthof & Leydesdorff, 2010; cf. Gingras & Larivière, 2011). The new “crown indicator” repaired this problem, but did not sufficiently reflect on two other problems with these “crown indicators”: (1) the use of the mean and (2) the problem of field delineation. Field delineation is needed in evaluation practices because one cannot compare citation scores in different disciplines.
    1. In a reaction to the above discussion, Bornmann & Mutz (2011) proposed to use percentile ranks as a non-parametric alternative to using the means of citation distributions for the normalization. Note that the Science and Engineering Indicators of the U.S. National Science Board have used percentile ranks (top-1%, top-10%, etc.) for decades. Citation distributions are often skewed, and the use of the mean is then not advisable. At the time (2011), we joined forces in a paper entitled “Turning the Tables of Citation Analysis One More Time: Principles for comparing sets of documents,” warning, among other things, against the use of mean-based indicators as proposed by CWTS (Leydesdorff, Bornmann, Mutz, & Opthof, 2011). Indeed, the Leiden Rankings have provided the top-10% as a category since 2012 (Waltman et al., 2012), but most evaluation practices are still based on MNCS.
    2. Field delineation is an unresolved problem in evaluative bibliometrics (Leydesdorff, 2008). Like its predecessor, the new “crown indicator” uses the Web-of-Science Subject Categories (WCs) for “solving” this problem. However, these categories are notoriously flawed: some of them overlap more than others, and journals have been categorized incrementally over decades. The system itself is a remnant of the early days of the Science Citation Index with some patchwork (Pudovkin & Garfield, 2002: 1113n). In other words, the problem is not solved: many journals are misplaced and WCs can be heterogeneous. Perhaps the problem is not clearly solvable because the journals are organized horizontally in terms of disciplines and vertically in terms of hierarchies. This leads to a complex system that may not be unambiguously decomposable. The consequential uncertainty in the decomposition can be detrimental to the evaluation (Rafols et al., 2012).
    Is the current discussion laying the ground work for the introduction of a next “crown indicator”? We seem to be caught in a reflexive loop: on the assumption that policy makers and R&D managers ask for reliable indicators, CWTS and other centers need to update versions when too many flaws become visible in the results. In the meantime, the repertoires have been differentiated: one repertoire in the journals covering “advanced scientometrics improving the indicators,” another one in the reports legitimating evaluations based on “state of the art”, and a third one issuing STS-style appeals to principles in evaluation practices (e.g., “the Leiden manifesto”; Hicks et al., 2015).
    References
    Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5(1), 228-230.
    Garfield, E., Pudovkin, A. I., & Istomin, V. S. (2003). Why do we need algorithmic historiography? Journal of the American Society for Information Science and Technology, 54(5), 400-412.
    Gingras, Y., & Larivière, V. (2011). There are neither “king” nor “crown” in scientometrics: Comments on a supposed “alternative” method of normalization. Journal of Informetrics, 5(1), 226-227.
    Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520, 429-431.
    Leydesdorff, L. (2008). Caveats for the Use of Citation Indicators in Research and Journal Evaluation. Journal of the American Society for Information Science and Technology, 59(2), 278-287.
    Leydesdorff, L., Bornmann, L., Mutz, R., & Opthof, T. (2011). Turning the tables in citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370-1381.
    Moed, H. F., De Bruin, R. E., & Van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance: Database description, overview of indicators and first applications. Scientometrics, 33(3), 381-422.
    Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423-430.
    Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. Journal of the American Society for Information Science and Technology, 53(13), 1113-1119.
    Rafols, I., Leydesdorff, L., O’Hare, A., Nightingale, P., & Stirling, A. (2012). How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & management. Research Policy, 41(7), 1262-1282.
    Waltman, L., van Eck, N. J., van Leeuwen, T. N., Visser, M. S., & van Raan, A. F. J. (2011a). Towards a new crown indicator: An empirical analysis. Scientometrics, 87, 467–481.
    Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., Visser, M. S., & Van Raan, A. F. J. (2011b). Towards a New Crown Indicator: Some Theoretical Considerations. Journal of Informetrics, 5(1), 37-47.
    Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E., Tijssen, R. J., Eck, N. J., . . . Wouters, P. (2012). The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419-2432.
    • Gangan Prathap June 1st, 2016 6:18 am
      Indicator or variable         Unit X    Unit Y    Advantage of Y over X (%)
      S, input                      100       100         0.00
      P, output (0th order)         60        100        66.67
      C, outcome (1st order)        90        125        38.89
      i, impact or quality          1.5       1.25      -16.67
      X, outcome (2nd order)        135       156.25     15.74
      P/S, output productivity      0.6       1          66.67
      C/S, outcome productivity     0.9       1.25       38.89
      X/S, outcome productivity     1.35      1.5625     15.74
      E, outcome (2nd order)        165       212.5      28.79
      η                             0.82      0.74      -10.13

      Ludo's example is summarized using simple "thermodynamic" analogies in the table above. There is a bibliometric inner-core (P-i-C-X) and an econometric outer-core (S-P/S-C/S-X/S). S and P are the primary size-dependent parameters. i, the impact, is a size-independent proxy for quality, like MNCS. On this, Unit X performs better than Unit Y. P is a zeroth-order measure of output, while C and X are first- and second-order measures of outcome. C and X are composite indicators. On all these, Unit Y performs better than Unit X.
      One can carry the thermodynamic analogy further to distinguish exergy from energy. Then a measure of consistency emerges: here, Unit X is ahead of Unit Y.