Use of the journal impact factor for assessing individual articles need not be wrong

Without any doubt, the journal impact factor (IF) is one of the most debated scientometric indicators. The use of the IF for assessing individual articles and their authors is especially controversial. Most scientometricians reject this way of using the IF. They argue that the IF says something about a journal as a whole and that it is statistically incorrect to extend its interpretation to individual articles in a journal. The well-known San Francisco Declaration on Research Assessment, which has received widespread support in the scientific community, also strongly objects to the use of the IF at the level of individual articles. Even Clarivate Analytics, the company that calculates the IF, advises against IF-based assessment of individual articles.

Today we published a paper on arXiv in which we take a different position. We argue that statistical objections against the use of the IF for assessing individual articles are not convincing, and we therefore conclude that the use of the IF at the level of individual articles is not necessarily wrong, at least not from a statistical point of view.

The most common statistical objection against the use of the IF for assessing individual articles is based on the skewness of journal citation distributions. The distribution of citations over the articles published in a journal is typically highly skewed. The figure below for instance illustrates this for Journal of Informetrics. The figure shows that 20% of the articles published in Journal of Informetrics in 2013 and 2014 are responsible for 55% of the citations contributing to the 2015 IF of the journal. Similar results can be obtained for other journals. An extensive analysis of the skewness of journal citation distributions can be found in a paper published last year by Vincent Larivière and colleagues.
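
The kind of concentration figure quoted above (20% of the articles accounting for 55% of the citations) is a simple share computation. The following is a minimal sketch of it. The function name `top_share` and the citation counts are made up for illustration; they are not the Journal of Informetrics data.

```python
def top_share(citations, fraction=0.2):
    """Share of all citations received by the most-cited `fraction` of articles."""
    ranked = sorted(citations, reverse=True)
    # Number of articles in the top group (at least one article).
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    if total == 0:
        return 0.0  # no citations at all, so no concentration to report
    return sum(ranked[:k]) / total


# Hypothetical citation counts for ten articles in one journal.
example_counts = [10, 5, 3, 1, 1, 0, 0, 0, 0, 0]
share = top_share(example_counts, fraction=0.2)
```

In this made-up example the two most-cited articles hold 15 of the 20 citations, so `share` is 0.75. A perfectly even distribution would give a value equal to `fraction`.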

Because of the skewness of journal citation distributions, the IF of a journal is not representative of the number of citations received by an individual article in the journal. For this reason, many scientometricians claim that it is statistically incorrect to use the IF at the level of individual articles.

In our paper published today, we argue that this statistical objection against the use of the IF for assessing individual articles is misguided. We agree that the IF is not representative of the number of citations of an individual article, but we do not agree that this necessarily leads to the conclusion that the IF should not be used at the level of individual articles.

The essence of our argument can be summarized as follows. The number of citations of an article can be seen as a proxy of the ‘value’ (e.g., the impact or quality) of the article. However, how accurately citations approximate the value of an article is unclear. Some may consider citations to be a quite accurate indicator of the value of an article, while others may regard citations only as a weak indicator of an article’s value.

When citations are seen as a relatively accurate indicator of the value of an article, the skewness of journal citation distributions implies that the distribution of the values of the articles published in a journal must be highly skewed as well. In this situation, we agree that it is statistically incorrect to use the IF for assessing individual articles.

However, things are different when citations are regarded as a relatively inaccurate indicator of the value of an article. It is then possible that the articles published in a journal all have a more or less similar value. This might for instance be due to the peer review system of a journal, which may ensure that articles whose value is below a certain threshold are not published in the journal. When citations are a relatively inaccurate indicator of the value of an article, some articles in a journal may receive many more citations than others, even though the articles are all of more or less similar value. This may then cause the citation distribution of a journal to be highly skewed, despite the fact that the journal is relatively homogeneous in terms of the values of its articles. In this scenario, assessing individual articles based on citations may be questionable and assessment based on the IF of the journal in which an article has appeared may actually be more accurate.

In our paper, the above argument is elaborated in much more detail. We present an extensive conceptual discussion, but we also report computer simulations to support our argument. The figure below shows the outcome of one of the simulations that we have performed. The horizontal axis indicates how ‘noisy’ citations are as a proxy of the value of an article. The higher the value along this axis, the more citations are affected by noise and the more difficult it is to obtain a good proxy of the value of an article. The vertical axis shows the accuracy of both citations and the IF as indicators of the value of an article. As can be expected, the more citations are affected by noise, the lower the accuracy of both citations and the IF. When there is little or no noise, citations and the IF both have a high accuracy, but the IF is less accurate than citations. Importantly, when the amount of noise increases, the accuracy of citations decreases faster than the accuracy of the IF, and therefore the IF becomes more accurate than citations. For large amounts of noise, citations and the IF both become highly inaccurate.
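
To make the mechanism concrete, here is a toy simulation in the same spirit. It is a deliberately simplified sketch rather than the model used in the paper. The function names, the parameter values, the use of standard deviations `sigma_r` (review noise) and `sigma_c` (citation noise), the log-scale stand-in for the IF (a journal mean of log-citations), and Pearson correlation as the accuracy measure are all assumptions made for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(sigma_r, sigma_c, n_journals=50, journal_size=100, seed=0):
    """Return (accuracy of citations, accuracy of journal-level indicator).

    Assumed toy model: each article has a latent log-value; peer review and
    citations each observe that value with independent Gaussian noise.
    """
    rng = random.Random(seed)
    n = n_journals * journal_size
    value = [rng.gauss(0.0, 1.0) for _ in range(n)]
    review = [v + rng.gauss(0.0, sigma_r) for v in value]
    log_cit = [v + rng.gauss(0.0, sigma_c) for v in value]

    # Articles are sorted by review score and cut into equally sized journals,
    # so the best-reviewed articles end up together in the 'top' journal.
    order = sorted(range(n), key=lambda i: review[i], reverse=True)
    journal_if = [0.0] * n
    for j in range(n_journals):
        members = order[j * journal_size:(j + 1) * journal_size]
        mean_cit = sum(log_cit[i] for i in members) / journal_size
        for i in members:
            journal_if[i] = mean_cit  # every article inherits its journal's mean

    return pearson(value, log_cit), pearson(value, journal_if)


low_noise = simulate(sigma_r=0.5, sigma_c=0.1)   # citations track value closely
high_noise = simulate(sigma_r=0.5, sigma_c=3.0)  # citations are very noisy
```

In this toy setup, the first element of `low_noise` (citations) is larger than the second (journal-level indicator), while for `high_noise` the order is reversed. This is the crossover described above. Where exactly the crossover occurs depends entirely on the assumed parameters.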

Our conclusion is that statistical considerations do not convincingly support the rejection of the use of the IF for assessing individual articles and their authors. Whether the use of the IF at the level of individual articles should be rejected and whether article-level citation statistics should be preferred over the IF depends on how accurately citations can be considered to represent the value of an article. Empirical follow-up research is needed to shed more light on this. From a statistical point of view, it is not necessarily wrong to use the IF for assessing individual articles. Scientometricians need to reconsider their opinion on this issue. On the other hand, we recognize that there may be important non-statistical reasons to argue against the use of the IF in the evaluation of scientific research. The IF plays a pervasive role in today’s science system, and this may have all kinds of undesirable consequences. The debate on the IF is highly important and should continue, but it should not be based on misplaced statistical arguments.

About Ludo Waltman

Ludo Waltman is professor of Quantitative Science Studies and scientific director at the Centre for Science and Technology Studies (CWTS) at Leiden University. He is a coordinator of the Information & Openness focal area and a member of the Evaluation & Culture focal area. Ludo is co-chair of the Research on Research Institute (RoRI).

About Vincent Traag

Senior researcher and bibliometric consultant. Vincent's research focuses on complex networks and social dynamics. He holds a Master's degree in sociology and a PhD in applied mathematics, and tries to combine the two in his work.

17 comments

Mamoona December 14th, 2017 11:59 am

Hi,
Dear Sir,
Ludo Waltman, can you help me with how to perform bibliometric experiments?

Ivan Sterligov March 16th, 2017 9:08 pm

Dear Ludo and Vincent,
many thanks for a fresh take on this issue!
As a research administrator, I am sometimes fascinated by the short-sightedness of academic scientometricians relying solely on Seglen's arguments and bashing us for using IFs. The last episode of such bashing I've witnessed was by Ronald Rousseau & Co. at the Nordic Workshop on Bibliometrics in autumn, see "A replication study: Seglen’s work on journal impact factors".
We use IF quartiles not because they predict citations, but because we believe that Q1 predicts high peer review quality (ofc this is true mainly for hard science, and for some areas within it with low IFs - like math - this prediction could be worse). This simple truth is strangely very difficult to grasp. As a young university in a country with real problems with organizing full-scale international peer review of our faculty, the idea of outsourcing such review via the use of IF quartiles looks definitely not that bad. Because what matters for us is a sound external evaluation of our faculty's recent work.
Are there any studies that show that high relative IF (quartile, percentile, whatever) or SNIP or SJR or AI do not correlate with quality of peer review?

- Ludo Waltman March 17th, 2017 8:09 am
  
  Thanks Ivan for your comments. I agree that the scientometric research community sometimes has too strong a tendency to bash research managers. Research managers need to make decisions and do not always have the luxury of using the optimal scientometric methods. In my view, your way of using the impact factor may not be unreasonable in certain situations.
  To my knowledge, the most extensive study comparing journal-level and article-level metrics with peer review outcomes is the Metric Tide study cited in the concluding section of my paper. In that study, journal-level metrics correlate relatively strongly with peer review outcomes.
  
- Loet Leydesdorff March 17th, 2017 8:50 am
  
  Dear Igor, Ludo, and colleagues,
  “Because what matters for us is a _sound_ external evaluation of our faculty’s recent work.” At issue is the soundness of the inference from journal-level indicators to the level of individual articles. As argued, this is an ecological fallacy: one loses control about the quality of the prediction. The prediction may be reasonably good, excellent or poor (with different probabilities).
  Ludo follows up with: “In that study, journal-level metrics correlate relatively strongly with peer review outcomes.” Note the “relatively strongly” and the precaution of “in that study”. First, it can be different in another study—one would have to control for this correlation in another study. Second, “relatively strong” may be not good enough at the level of individual researchers (when career decisions have to be taken). Otherwise, I agree that there is a tension between professional and citizen scientometrics (Leydesdorff, Wouters, & Bornmann, 2016).
  In sum, one is advised against using journal measures for the evaluation of individual articles or researchers. It may work, but the chances of errors cannot be neglected. This is not meant to bash research managers, but just to warn them of potential mistakes.
  Best,
  Loet
  
  - Vincent Traag March 17th, 2017 3:45 pm
    
    Dear Loet,
    The point is that neither for citations nor for the IF do we know the error that is made in assessing individual articles. Both will have some degree of error, and depending on which has a higher error, either the one or the other may be preferred (although a combination of both may even work better). In your conclusion you state that the chances of errors cannot be neglected. We agree, but this equally holds for citations. The results from the Metric Tide report indicate that journal metrics do correlate with peer-review results, as Ludo already indicated. In fact, the correlations are comparable to the correlations between citations and peer review. Hence, both citations and IF will entail some degree of error compared to peer-review. Moreover, peer review itself is also not perfect, and will therefore also have some error in assessing the individual articles.
    Best,
    Vincent
    
  - Ivan Sterligov March 17th, 2017 4:16 pm
    
    Dear Loet,
    Thanks for your advice. I'd like to ask you a question.
    Would you kindly name some of the journals in JCR 2015 IF Q1 in BIOCHEMISTRY & MOLECULAR BIOLOGY that did not provide sound peer review for (almost all) articles they've published in 2016? (let us put review journals aside)
    Best regards,
    Ivan
    PS how do you guys decide who's a pro and who's a meager citizen? Perhaps there's a tiny space in between for those who do applied bibliometrics and are well aware of various developments in the field... Those who try to navigate a path between costs, limitations, vice-rectors, ministries, rankings, geopolitics and all sorts of things you meet in the real world.
    Not pretending to be such a person, however :)
    
    - Loet Leydesdorff March 18th, 2017 3:58 am
      
      Dear Vincent, Igor, Ludo, and colleagues,
      The issue is now shifting in our exchanges to a discussion of the relative merits of citation rates or journal indicators for impact assessment at the level of individual papers. My point was about using journal indicators and the ecological fallacy implied. In the case of using citation rates, I used the terminology “impact fallacy” in the title (!) of Leydesdorff, Bornmann, Comins, & Milojević (2016) [that is, “Citations: Indicators of Quality? The Impact Fallacy.”]
      I agree with you, Vincent, that one can try to assess empirically how serious the error is against peer review results. However, the results of peer review are often not publicly available for meta-evaluation. In a number of studies, we found the best rejected proposals to be significantly better than the granted ones when tested against bibliometric indicators (Bornmann, Leydesdorff, & Van den Besselaar, 2010; van den Besselaar, 2017; van den Besselaar & Leydesdorff, 2009). These decisions were based on peer review. In my opinion, we more or less understand how old-boys networks (“invisible colleges”) inadvertently operate. This can also be discussed in terms of the constructivist versus normative theories of citation.
      It seems to me that Igor raises another issue: the use of indicators in evaluation practices. These are normative discourses (e.g., research management and science policies). In normative discourse the issue is not whether an indicator is valid, but whether it is legitimate to use it. Policy makers can excellently work with descriptive or erroneous statistics. For example, the previous Leiden crown indicator functioned excellently for more than a decade in the evaluation (Opthof & Leydesdorff, 2010). Things change sometimes under the pressure of critique. Using Q1 IFs, for example, may lead to Type I and Type II errors. Excellent papers may be published also in the other quartiles.
      Best,
      Loet
      References:
      Bornmann, L., Leydesdorff, L., & Van den Besselaar, P. (2010). A Meta-evaluation of Scientific Research Proposals: Different Ways of Comparing Rejected to Awarded Applications. Journal of Informetrics, 4(3), 211-220. doi: 10.1016/j.joi.2009.10.004
      Leydesdorff, L., Bornmann, L., Comins, J., & Milojević, S. (2016). Citations: Indicators of Quality? The Impact Fallacy. Frontiers in Research Metrics and Analytics, 1(Article 1). doi: 10.3389/frma.2016.00001
      Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423-430.
      van den Besselaar, P. (2017). Analyzing the quality of funding decisions, a reply. Research Evaluation, 26(1), 53-54.
      van den Besselaar, P., & Leydesdorff, L. (2009). Past performance, peer review, and project selection: A case study in the social and behavioral sciences. Research Evaluation, 18(4), 273-288.
      
Stephen Curry March 12th, 2017 2:50 pm

Hi Ludo and Vincent – thanks for drawing to my attention this blogpost and the accompanying paper – which I have now read.
Let me say at the outset that as a “prominent critic of the IF”, I would never accuse you guys of being statistically illiterate… ;-)
Your paper presents an interesting theoretical model of the uncertainty around the meaning and interpretation of the JIF and citation counts in terms of the value of any piece of research (which we can all agree is context-dependent and hard to define). And I think it’s true that, within a specified set of assumptions and constraints, it may be that the JIF can be a more accurate indicator than the citation count of the (undefined) value of a paper. The difficulty I have is in seeing whether this conclusion can reasonably be applied to the real world – outside the limitations of your model.
I don’t have time to go into all the details but let me highlight a couple of major concerns. The first is illustrated in Figure 1 which shows that only at values of sigma-r squared (the parameter that characterises the accuracy of the peer review system) that are 0.4 or lower does the JIF appear to offer marginally higher accuracy. But as you rightly point out, we have no way of knowing the value of this parameter. And, as Fig 1 shows, at higher values, the JIF offers a less accurate assessment.
It is interesting that all subsequent modelling is done with a value of sigma-r squared set at 0.4 (a situation that favours the JIF). I appreciate this may have been for convenience, to avoid repetitious analysis, but the consequences of this choice for the downstream analysis should perhaps be made clearer.
The second major concern is one that you highlight yourselves in the discussion – that the model does not accurately capture the accumulation of citations that contribute to the JIF (i.e. it uses a different timeframe) and it is not likely to accurately model author behaviour in selecting journals. I wonder does it also capture variation in the reasons behind each citation (though I take your point that over all the papers in a journal some of this variation may be self-cancelling). In any case, these real world effects seem to me to be likely to add noise to the smooth curves in your plots, adding additional uncertainty to the regime in which JIF might out-perform citation counts.
For these reasons I think the title and abstract of your paper overstate the value of the analysis. Yes, it is an interesting theoretical exercise but my conclusion would be that you have shown that the use of the journal impact factor for assessing individual articles need not be wrong using a theoretical model that partially captures the review and citation process and is valid only by assuming a level of accuracy in peer review that has not been justified experimentally. If the aim is for nuance and precision in discussions about the use of impact factors, these caveats and conditions should, in my opinion, be given more prominence.
One final point in regard to the criticism of the statistical criticism of the use of impact factors in your introduction. This emphasises a reliance by critics on the skew of the citation distribution. That is problematic technically because of the use of an arithmetic mean to characterise a non-normal distribution. However, in our preprint on citation distributions published last year (http://biorxiv.org/content/early/2016/09/11/062109) we also emphasised the spread of the distribution. Ultimately, our paper was not about suggesting the primacy of JIFs over citations but about exposing the data more fully for inspection so as to encourage evaluation of published papers on content rather than metrics. My emphasis is on practice and practicalities. As we acknowledged, these pieces of information should not be disregarded entirely but need to be tightly reined in if we are to implement processes of research evaluation that are appropriate and incentivise good work. As you well know, the tendency to over-rely on metrics creates incentives for behaviours that run counter to the higher aims of research and scholarly communication.

- Vincent Traag March 13th, 2017 2:22 pm
  
  Thank you for the feedback, and for the constructive comments! Good to hear you appreciate our model and — in theory — acknowledge its conclusion. To be sure, the core message of the paper is not that the IF is preferable to citations for assessing individual papers, but that it is a possibility. We show that the IF could be preferable, under some circumstances. The title emphasises this, and makes clear that use of the IF for assessing individual articles need not be wrong. We don't say it is right.
  The choice for sigma_r^2=0.4 is indeed somewhat arbitrary. Nonetheless, there is some reason for choosing some intermediate level of accuracy of the peer review system. Probably no one will accept the idea of a perfectly accurate peer review system, nor will it be realistic to assume that peer review is completely uninformative and has no bearing at all on the value of a paper. The conclusions in either extreme (perfectly accurate, or not at all) are also quite straightforward and therefore of little interest for further analysis: if peer review is perfect, the IF is always preferable, and if peer review is highly inaccurate, the IF is never preferable. The choice for sigma_r^2=0.4 is interesting from an analytical viewpoint because, depending on the value of sigma_c^2, either the IF or citations can be the more accurate indicator.
  Indeed, the model is not very realistic. In the real world, citations accumulate over time, and may be affected by the IF of the journal itself. Authors are unlikely to always submit to the highest ranked journal, and a more realistic model of submissions would be desirable. Perhaps peer review (and consequently the IF) reflects a different dimension of the value of an article than citations. Moreover, as you suggest, there are various reasons for citations, which could introduce some noise. These are all important extensions and qualifications of our current model. We hope that our paper will stimulate this line of research on modelling and that more realistic models will be developed. Nevertheless, we expect that we capture the core dynamics with our current model and that our results will therefore be quite robust with respect to these extensions. More realistic models may either strengthen or weaken the arguments in favour of the IF, but we expect there will always be some (plausible) ranges of parameters for which the IF could be more accurate than citations.
  Whether the IF should also be preferred in practice is of course highly relevant, and you rightly emphasise it. At the moment we don't know whether the IF could be preferred in practice. But here's the tricky part: neither can we argue unequivocally in favour of citations. The burden of proof for preferring either citations or IF weighs equally on both proponents of citations and proponents of IF. Of course, neither will be perfectly accurate indicators, and both will entail errors in assessing individual papers. It would therefore be unreasonable to praise the one as being right, and deride the other for being wrong. Both will have some merit and both will have some errors. To understand this in more detail will require further empirical research.
  If citation distributions are log-normal instead of normal, I agree that it seems to make more sense to use for instance the geometric mean rather than the arithmetic mean. Nevertheless, the core question remains: can we assess individual articles based on a journal level indicator? The question also remains if we use different indicators such as SNIP, SJR or CiteScore. Our results are robust with respect to using the geometric mean (we have tested this), and using other journal measures would probably lead to similar conclusions. Still, some indicators may be more appropriate than others.
  Finally, using the IF may create incorrect incentives and may have unintended consequences, which could indeed be a reason to restrict or even reject the use of the IF. Raising awareness of the fact that the IF does not necessarily reflect the value of a publication is laudable. Clarifying this by publishing the full citation distribution may work well. Yet at the same time, article level indicators, such as citations, rather than journal level indicators, will also not reflect the value of a publication perfectly. Using a combination of the two to inform assessment may then not be unreasonable. How this can be done in such a way that does not create wrong incentives and have unintended consequences is a complex question which I cannot immediately answer.
  
  - Stephen Curry March 18th, 2017 4:36 pm
    
    Thanks for replying Vincent. I guess ultimately my difficulty with this work arises because I am not a bibliometrician. I am primarily interested in controlling the use of metrics in the evaluation of individual researchers. For that reason I am not so interested in the finer points of the circumstances under which the JIF might be a slightly more reliable indicator of ‘value’ than a citation count. To my mind the answer to your core question – “can we assess individual articles based on a journal level indicator?” – is clear: not reliably. Although the analyses done for The Metric Tide report reveal positive correlations between peer review evaluations of papers and the SJR (journal rank) or citation counts, in both cases the correlation coefficients were pretty low – in the range 0.25-0.35 (see Table 4 in Supplementary Report II). With regard to an individual paper therefore they have rather limited predictive value. I would not argue that either metric be dismissed out of hand, they are interesting pieces of information – and certainly metrics are useful for looking at broader trends. But when it comes to assessing individual pieces of work neither is sufficiently reliable to form the basis of a reasonable judgement. The context provided by peer judgement (which of course needs to be controlled for hidden biases) is vital. I want to move to a research environment where people can have confidence that their peers will judge their contributions, not on the numbers, but on their particular merits.
    
Loet Leydesdorff March 10th, 2017 8:01 am

I agree with the authors that there are two different arguments against using the impact factor of a journal (IF) as a proxy for the quality of papers in the journal: (1) the skewness of the citation distribution, and (2) the ecological fallacy.
1. Against argument 1, the authors reason as follows: Let us assume (in scenario 2, at p. 16) that “journals are relatively homogenous in terms of the values of the articles they publish.” This relatively flat distribution of the non-observable “values” is for (unknown) statistical reasons represented by the skewed distribution of citations to these articles. The latter distribution can be observed. In this scenario, a journal measure such as the journal impact factor—in other words, the mean—could be a better predictor of the “value” of an article than its individual citation rate.
Unlike the reasoning of others who criticize the use of the IF for the evaluation of individual papers, the reasoning above would be free of assumptions. 
2. Let me add that the ecological fallacy (Robinson, 1950) does not imply that the value of an attribute to an individual is independent of the value at the group level, but that the latter may fail as a predictor of the former. One loses control of the prediction: in some cases it works; in others not.
See: Kreft, G. G., & de Leeuw, E. (1988). The see-saw effect: A multilevel problem? Quality and Quantity, 22(2), 127-137.
Abstract: Studies of school effectiveness often use measures of association, such as regression weights and correlation coefficients. These statistics are used to estimate the size of the change or “effect” that would occur in one variable (for example reading ability) given a particular change in another variable (for example sex and sex ratio). In this paper we explore the limitations of regression coefficients for use in a contextual analysis, in which both individual and contextual variables are included as independent variables. In our example “individual sex” and a context variable “sex ratio of the schoolclass” are regressors, and reading ability is the dependent variable. Our conclusion is that researchers should be careful making interpretations of effects from multiple regression analysis, when dealing with aggregate data. Even in the case (as in our example) when individual and contextual variables are made orthogonal to avoid multicollinearity, interpretation of the effects of the aggregate variable is problematical.
See also: Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351-357.
Best wishes,
Loet

- Vincent Traag March 10th, 2017 3:35 pm
  
  The skewness is a somewhat different argument than the ecological fallacy. Our point is that the skewness in the citation distribution does not necessarily entail a skewness in the 'value' of the papers. Indeed, as you sketch in your comment, the distribution of 'value' may be relatively homogeneous (i.e. closely concentrated around the mean), or "flat" as you say. If I understand correctly, you indeed agree that in that case the IF may be a relatively good indicator of the 'value' of an article (and indeed better than citations), right?
  Nevertheless, there is a relation between the argument of skewness and the ecological fallacy. Suppose for the sake of argument that the citation distribution would be very concentrated around the mean. Would you agree that the IF (as the mean of the citation distribution) would be quite an accurate indicator of the citations of individual papers in this case? But, if we do so, we would still infer individual level characteristics from a group level characteristic (the mean in this case). Would you then still consider this an ecological fallacy?
  
  - Loet Leydesdorff March 11th, 2017 7:58 am
    
    You have a perfect prediction of the value of the unobservable variable “value” in the ideal case of scenario 2. But not of the observable citation rate.
    
Michael Kurtz March 8th, 2017 9:31 pm

Hi,
The Discussion section of our recent JASIST article also discusses this. I pasted the relevant paragraphs below.
Cheers,
Michael
http://onlinelibrary.wiley.com.ezp-prod1.hul.harvard.edu/doi/10.1002/asi.23689/full
https://arxiv.org/abs/1510.09099
Another, somewhat amusing result comes from the equivalence in effectiveness of citations with peer evaluations. The widespread use of the impact factor (Garfield, 1972) of a journal to rank specific articles by an individual is now widely viewed as improper (e.g., Adler, Ewing, & Taylor, 2009). As citation counts tend to follow a power law distribution, and with about 70% of all articles having citation counts below the mean, this would seem a reasonable view. However acceptance of an article by an established journal is not a citation measure, it is a peer evaluation measure.
Peer evaluations can certainly be ranked. Being offered a position at a top ranked research organization is different from being turned down, and is also different from receiving an offer from a lesser-ranked organization. Similarly having an article accepted by a highly selective publication is different from having it rejected, and is different from having it accepted by a less selective publication. The impact factor effectively ranks the peer evaluation processes of the journals, and provides the numerical means to compare the result of the peer evaluations of individual articles by journals with citation studies.

- Vincent Traag March 9th, 2017 8:21 am
  
  Thank you for the pointer to your work, we were not aware of it. Indeed it is relevant to the discussion, although our arguments are slightly different. The core of our paper is that the skewness of the citation distribution is not necessarily an argument for rejecting the use of the IF for assessing individual articles. You suggest that the IF may measure something else than citation impact, and this is indeed a possibility. We do not explore this further in our current paper, but it is one of the options we suggest as an extension for further exploration of the subject.
  
Charles Oppenheim March 8th, 2017 11:59 am

Interesting, but kind of misses the point about the potential abuse of IF. The problem arises when senior management insists that if staff wish to get promoted/tenure/get returned for some research assessment exercise those staff must only submit articles to high IF journals. Could the authors comment on their opinion of senior managements adopting this approach?

- Vincent Traag March 9th, 2017 8:13 am
  
  The core of our paper is that the skewness of the citation distribution is not necessarily a good argument for rejecting the use of the IF for assessing individual articles (and by extension authors). This does not mean there are no other valid arguments for rejecting the use of the IF, and we would welcome further discussion on that topic.
  The insistence that research staff must only publish in high-impact journals should be rejected, I think. It creates wrong incentives and narrows scholarship to the few select journals that are considered "high-impact".
  Yet it does not fully exclude the use of the IF in assessing someone's work. When management has to choose from a number of candidates for hiring or promotion, their judgement may possibly be informed by the IF, in addition to other metrics, such as the number of citations. If the number of papers is relatively limited, a more in-depth reading of a sample may of course also be possible.
  The central question is what use of the IF would be reasonable and appropriate, without creating wrong incentives and having unintended (and undesirable) consequences. This is not easy to determine precisely, nor are consequences always easy to foresee. Personally, I intend to explore this further by formalizing and modelling such arguments and ideas. But this can, and should, also be explored further empirically and theoretically.
  