Open science is becoming a critical topic in the current science policy and research management landscape, in which open access (OA) is seen as the first necessary step towards an open work environment in modern-day scholarly life. Given its importance, the inclusion of indicators on OA penetration is a natural step in the 2019 edition of the CWTS Leiden Ranking. Until now, university ranking producers have mostly focused their efforts on reporting regular performance indicators (publications, citations, awards, reputation surveys, etc.), leaving out this important new dimension of scholarly communication. In this post, we describe the methodological approach used for assigning the OA labels in the Leiden Ranking 2019. The Leiden Ranking is the first university ranking to comprehensively include the uptake of OA publishing by universities worldwide.
CWTS has been working on collecting OA evidence over the last three years. This work was reported first at the Science and Technology Indicators Conference in Paris in 2017 (van Leeuwen et al., 2017). In the method presented at that conference, we strived for a high degree of reproducibility of our results. Our method was based on free and open data sources (DOAJ, ROAD, Crossref, PubMed Central and OpenAIRE). The method was applied to Web of Science (WoS) publication data.
From mid-2018 onwards, another source for OA tagging has become prominent, namely the Unpaywall database. CWTS has linked this data source to the WoS database, while trying to understand what this implementation means for OA uptake analyses. We compared our previously developed methodology with the new one, as well as with the Unpaywall data (Piwowar et al., 2018). The inclusion of Unpaywall in the methodology required us to investigate what OA evidence Unpaywall provides, whether this evidence aligns with our own criteria for building OA evidence, and whether there are any conceptual issues related to the typologies of OA provided by Unpaywall. For example, it is debatable whether the Bronze OA label disclosed by Unpaywall is a sustainable form of OA. The chart presented below shows the workflow followed to label OA publications, along with the total share of OA publications identified for each OA label. As the chart shows, Unpaywall does not provide OA tags but reports the origin of the extracted OA document, along with some characteristics assigned to it (e.g., type of license). The final numbers of OA uptake we have are relatively consistent with similar studies (e.g., Piwowar et al., 2018; Martín-Martín et al., 2018). We want to stress that work on the analysis of OA uptake is highly fluid, and as such our work on reporting OA uptake can be considered work in progress.
The reading of the chart starts in the upper left corner at ‘is_oa’. From there, binary decisions determine the flow. If a publication is open, the question is whether its journal is OA (journal_is_oa). If the answer is yes, the Gold OA label is assigned, while a negative answer brings us to the host type where the publication was found. If host_type is ‘repository’, the Green OA label is assigned. On the other hand, if host_type is ‘publisher’, the question of the availability of a license is raised. If ‘no license’ is found, the Bronze OA label is assigned, while ‘any license’ leads to the Hybrid OA label.
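The decision flow described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the Leiden Ranking: the function name assign_oa_label, the lowercase label strings, and the "closed" and "unknown" fallbacks are assumptions made for the example. Only is_oa, journal_is_oa, host_type and the license check come from the chart.

```python
from typing import Optional


def assign_oa_label(
    is_oa: bool,
    journal_is_oa: bool,
    host_type: Optional[str],
    license: Optional[str],
) -> str:
    """Walk the binary decisions of the chart for a single OA location.

    The "closed" and "unknown" return values are assumptions for this
    sketch; the chart itself only describes the four OA labels.
    """
    if not is_oa:
        return "closed"
    if journal_is_oa:
        return "gold"
    if host_type == "repository":
        return "green"
    if host_type == "publisher":
        # 'any license' leads to Hybrid, 'no license' to Bronze.
        return "hybrid" if license else "bronze"
    return "unknown"


if __name__ == "__main__":
    print(assign_oa_label(True, False, "publisher", None))     # bronze
    print(assign_oa_label(True, False, "publisher", "cc-by"))  # hybrid
```

Note that this sketch evaluates one location at a time and returns a single label; a publication found in several places may carry more than one label.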
Open access labels
The methodological approach that we use in the Leiden Ranking 2019 focuses on assigning OA labels to publications in the WoS database, using Unpaywall to establish the OA status of publications. Two basic principles for assigning an OA label are sustainability and legality. By sustainability we mean that it should, in principle, be possible to reproduce the OA labeling from the various sources used, repeatedly, in an open fashion, with a limited risk of the sources used disappearing behind a paywall, and particularly with a limited risk of publications reported as OA changing their status to closed. The second aspect, legality, relates to the use of data sources that represent evidence of legal OA publishing, excluding rogue or illegal OA publishing. As a consequence, we do not consider publications made available on platforms such as ResearchGate (the sustainability aspect) or Sci-Hub (the legality aspect) to be OA publications. While the criterion of sustainability is mainly a scientific requirement, namely of reproducibility and durability over time, the criterion of legality is particularly important for science policy, indicating that OA publishing aligns with policies and mandates.
Using Unpaywall data, we are able to distinguish between Gold, Hybrid, Green, and Bronze OA.
Gold OA relates to publications in OA journals. To identify Gold OA publications, we expand beyond the DOAJ list, a directory of OA journals, and select publications identified by Unpaywall in OA journals in general.
Hybrid OA is a form of OA publishing in which the author(s) of a publication pay for OA publishing in a non-OA journal, thereby creating open accessibility to a single publication in an otherwise toll access journal.
Green OA is a form of OA publishing in which publications are stored in an openly accessible database, also called an archive or repository. An important aspect in the Green OA form relates to the archival function of storing publications in an open environment, closely connected to the sustainability dimension of OA publishing. In the Green OA form, we distinguish two perspectives that relate to the degree of engagement, one that could be labeled as self-archiving, the other as general archiving. Self-archiving means the deliberate action by author(s) or librarians to store publications in either an institutional or a disciplinary archive or repository. General archiving also includes open accessibility of publications in systems such as PubMed Central (PMC), whereby the archival function is most important (the sustainability dimension of OA publishing). Importantly, Green OA publications may also have a Gold, Hybrid, or Bronze OA label. For example, publications in the OA journal PLOS ONE are not only labeled as Gold OA, but also as Green OA, because they are made available in PMC. Green OA is thereby one of the most complex elements in the overall OA landscape. As we strive for completeness, we have decided to adopt the general archiving Green OA perspective in the Leiden Ranking 2019.
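To illustrate how a publication can carry a Green OA label alongside a Gold, Hybrid, or Bronze label under the general archiving perspective, here is a minimal Python sketch. It assumes each publication record comes with a list of OA locations, each a dictionary with host_type and license keys; that record layout, the function name oa_labels, and the lowercase label strings are assumptions for the example and not a description of the actual Leiden Ranking pipeline.

```python
from typing import Dict, List, Optional, Set

Location = Dict[str, Optional[str]]


def oa_labels(is_oa: bool, journal_is_oa: bool, locations: List[Location]) -> Set[str]:
    """Return every OA label that applies to one publication.

    Assumed input: `locations` lists all places where an open copy was
    found, e.g. {"host_type": "repository", "license": None}.
    """
    labels: Set[str] = set()
    if not is_oa:
        return labels

    host_types = {loc.get("host_type") for loc in locations}

    # General archiving: any repository copy (PMC included) counts as Green.
    if "repository" in host_types:
        labels.add("green")

    if journal_is_oa:
        labels.add("gold")
    elif "publisher" in host_types:
        has_license = any(
            loc.get("license")
            for loc in locations
            if loc.get("host_type") == "publisher"
        )
        labels.add("hybrid" if has_license else "bronze")
    return labels


if __name__ == "__main__":
    # A PLOS ONE-style article: OA journal, plus a copy in PMC.
    plos_like = [
        {"host_type": "publisher", "license": "cc-by"},
        {"host_type": "repository", "license": None},
    ]
    print(sorted(oa_labels(True, True, plos_like)))  # ['gold', 'green']
```

Returning a set rather than a single label keeps the overlap visible, so a publication is counted once in the overall OA total but can contribute to more than one label-specific share.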
Bronze OA is a form of OA publishing where publishers make publications openly accessible without a clear license. According to the criteria outlined above, this is not a sustainable form of OA. However, for reasons of completeness, we have chosen to report Bronze OA as a separate OA category in the Leiden Ranking 2019, and consequently Bronze OA is also included in the overall counting of OA publications.
As an example of our results, the following chart shows the relation between the share of Green OA publications and Gold OA publications for all universities included in the Leiden Ranking 2019 for the period 2014-2017.
As always, the chosen methodology has some limitations. One of the most prominent limitations relates to the coverage of the WoS database used in the Leiden Ranking. Only publications in journals are taken into account. In domains where book and proceedings publishing is important, reporting of OA uptake based only on journal publishing underreports the actual level of OA. So for universities with large social sciences and humanities departments (where book publishing is common) or engineering departments (where conference papers are the norm), the OA indicators in the Leiden Ranking provide an incomplete picture. Another limitation relates to the linking of Unpaywall data to WoS. The linking methodology is based on Digital Object Identifiers (DOIs), which serve as unique identifiers of journal publications. Since some WoS publications do not have a DOI and therefore cannot be linked to Unpaywall data, the OA indicators in the Leiden Ranking may somewhat underrepresent the true level of OA publishing.
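The DOI-based linking, and the way records without a DOI drop out of it, can be sketched as follows. This is a simplified illustration under stated assumptions: the record layout (dictionaries with a "doi" key), the helper names normalize_doi and link_by_doi, and the list of stripped URL prefixes are hypothetical and do not describe the actual CWTS linking procedure. The normalization relies on DOIs being case-insensitive.

```python
from typing import Dict, List, Optional, Tuple

# Common ways a DOI may be written; this list is an assumption for the sketch.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip a resolver prefix; None if nothing is left."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def link_by_doi(
    wos_records: List[dict],
    unpaywall_by_doi: Dict[str, dict],
) -> Tuple[List[Tuple[dict, dict]], List[dict]]:
    """Split records into (linked pairs, unlinked records).

    `unpaywall_by_doi` is assumed to be keyed by already-normalized DOIs.
    """
    linked, unlinked = [], []
    for rec in wos_records:
        doi = normalize_doi(rec.get("doi"))
        match = unpaywall_by_doi.get(doi) if doi else None
        if match is None:
            unlinked.append(rec)  # no DOI, or DOI not found in Unpaywall
        else:
            linked.append((rec, match))
    return linked, unlinked


if __name__ == "__main__":
    wos = [
        {"id": "A", "doi": "https://doi.org/10.7717/PeerJ.4375"},
        {"id": "B", "doi": None},
    ]
    unpaywall = {"10.7717/peerj.4375": {"is_oa": True}}
    linked, unlinked = link_by_doi(wos, unpaywall)
    print(len(linked), len(unlinked))  # 1 1
```

Keeping the unlinked records in a separate list makes it easy to report what share of publications could not be matched, which gives a rough sense of how much the OA indicators may be underestimated.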
As a final note, we look forward to discussing our methodology as well as some preliminary analyses of the penetration of OA publishing in universities at the ISSI 2019 Conference in Rome in September this year.
Martín-Martín, A., Costas, R., van Leeuwen, T., & Delgado López-Cózar, E. (2018). Evidence of open access of scientific publications in Google Scholar: A large-scale analysis. Journal of Informetrics, 12(3), 819-841. https://doi.org/10.1016/j.joi.2018.06.012
Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., Norlander, B., … Haustein, S. (2018). The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ, 6, e4375. https://doi.org/10.7717/peerj.4375
van Leeuwen, T.N., Meijer, I., Yegros-Yegros, A., & Costas, R. (2017). Developing indicators on open access by combining evidence from diverse data sources. In Proceedings of the 2017 STI Conference. https://arxiv.org/abs/1802.02827