Centre for Science and Technology Studies

Blog archive

This is the archive of our old blog. Please visit our new Leiden Madtrics blog.

Visualizing freely available citation data using VOSviewer



Today we released version 1.6.6 of our VOSviewer software for constructing and visualizing bibliometric networks. The most important new feature in this version is the support for working with Crossref data. Recently, the Initiative for Open Citations (I4OC) managed to convince a large number of scientific publishers to make the reference lists of publications in their journals freely available through Crossref. Thanks to I4OC, Crossref has become a valuable data source for VOSviewer users. In this blog post, we discuss how users of the new version 1.6.6 of VOSviewer can benefit from Crossref data.

Using Crossref data in VOSviewer

There are two ways in which VOSviewer supports the use of Crossref data:

  • A VOSviewer user can provide a set of DOIs to VOSviewer. Using the application programming interface (API) of Crossref, VOSviewer will download data for the corresponding publications.
  • A VOSviewer user can work directly with the Crossref API to download data and can then provide the downloaded data as input to VOSviewer.

The first approach is the easier one, since it does not require users to work directly with the Crossref API. When users already have DOIs of the publications they would like to analyze (e.g., publications included in their university’s research information system), we recommend using the first approach. The second approach is a bit more complex, but it has the advantage of offering much more flexibility. We will now explore the second approach in more detail.

Downloading data using the Crossref API

To demonstrate the use of the Crossref API, we collect data on publications in two scientometric journals, Journal of Informetrics and Scientometrics, in the period 2007-2016. In each API call, data can be obtained for at most 1000 publications. We therefore need to make multiple API calls. We choose to make separate calls for each of the two journals.

The number of publications in Journal of Informetrics in the period 2007-2016 is below 1000, so data for this journal can be obtained in a single API call. To make this API call, we enter the following URL in a web browser:

http://api.crossref.org/works?filter=issn:1751-1577,from-pub-date:2007-01-01,until-pub-date:2016-12-31&rows=1000

The URL specifies a request for the Crossref API. The API request includes the ISSN of Journal of Informetrics (i.e., 1751-1577) as well as the start date and the end date of the time period that we are interested in. The rows parameter indicates that we would like to receive data for up to 1000 publications. By entering the above URL in a web browser, we make a call to the Crossref API requesting data on all publications in Journal of Informetrics in the period 2007-2016. After some time, the web browser presents the result of the API call. We save this result in a file called JOI.json. This is a so-called JSON file.
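For readers who prefer a script to a web browser, the call above can also be reproduced in Python. The following is a minimal sketch using only the standard library; the function names `build_works_url` and `download_to_file` are illustrative choices of our own, not part of VOSviewer or of any Crossref client library. It rebuilds exactly the URL shown above and saves the response as a JSON file.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "http://api.crossref.org/works"


def build_works_url(issn, from_date, until_date, rows=1000, offset=0):
    """Build a Crossref /works URL like the one entered in the browser."""
    filters = f"issn:{issn},from-pub-date:{from_date},until-pub-date:{until_date}"
    params = {"filter": filters, "rows": rows}
    if offset:
        params["offset"] = offset  # only needed for the second and later batches
    # safe=":," leaves the filter readable, matching the hand-written URL
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params, safe=":,")


def download_to_file(url, path):
    """Fetch one API page, save the JSON response, and return its item count."""
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return len(data["message"]["items"])
```

For example, `download_to_file(build_works_url("1751-1577", "2007-01-01", "2016-12-31"), "JOI.json")` would produce a file comparable to the one saved from the browser (this requires network access).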

We follow the same approach for Scientometrics. However, Scientometrics is a larger journal, and we therefore need to make three API calls, each of them resulting in data for at most 1000 publications. We use the following URLs:

http://api.crossref.org/works?filter=issn:0138-9130,from-pub-date:2007-01-01,until-pub-date:2016-12-31&rows=1000

http://api.crossref.org/works?filter=issn:0138-9130,from-pub-date:2007-01-01,until-pub-date:2016-12-31&rows=1000&offset=1000

http://api.crossref.org/works?filter=issn:0138-9130,from-pub-date:2007-01-01,until-pub-date:2016-12-31&rows=1000&offset=2000

The three API calls are identical except that in the second and the third call we use the offset parameter to specify that we want to obtain data for a second and a third batch of publications. We save the results of the API calls in three JSON files.
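The number of calls needed for a journal follows from simple arithmetic: with at most 1000 records per call, a journal with N publications requires ceil(N / 1000) calls, at offsets 0, 1000, 2000, and so on. The sketch below, with hypothetical helper names, computes these offsets. It assumes the total count is read from the `total-results` field of the `message` object in a Crossref response, for instance the response to the first call.

```python
import math


def total_results(response):
    """Total number of matching records reported in a parsed Crossref response."""
    return response["message"]["total-results"]


def page_offsets(total, rows=1000):
    """Offsets of the API calls needed to cover `total` records, `rows` per call."""
    n_calls = math.ceil(total / rows)
    return [i * rows for i in range(n_calls)]
```

For a hypothetical total of 2500 publications, `page_offsets(2500)` returns `[0, 1000, 2000]`, i.e., three calls, the same number used for Scientometrics above.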

We have now given a simple demonstration of the use of the Crossref API. The Crossref API offers many more options. For further information, we refer to the documentation of the API.
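One of these options is cursor-based deep paging: the first request includes `cursor=*`, and the `message` object of each response carries a `next-cursor` value to send with the next request, until a page comes back without items. Crossref's documentation recommends cursors rather than `offset` for large result sets. The sketch below only shows the paging loop; it assumes a hypothetical `fetch_page(cursor)` function that performs one request (for example, one of the URLs above with `&cursor=` and the URL-encoded cursor appended) and returns the parsed JSON.

```python
from typing import Callable, Iterator


def iterate_with_cursor(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield all work records by following Crossref's next-cursor values.

    fetch_page is a hypothetical helper: it requests one page for the given
    cursor (URL-encode the cursor, it may contain '+', '/' or '=') and
    returns the parsed JSON response.
    """
    cursor = "*"  # "*" starts a new cursor-based scan
    while True:
        message = fetch_page(cursor)["message"]
        items = message.get("items", [])
        if not items:
            break  # an empty page marks the end of the result set
        yield from items
        cursor = message["next-cursor"]
```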

Creating bibliometric visualizations based on Crossref data

We first use the downloaded Crossref data to visualize a co-authorship network of researchers in the field of scientometrics. In the Create Map wizard in VOSviewer, we choose the option Create a map based on bibliographic data. In the second step of the wizard, we go to the Crossref JSON tab, where we select the four downloaded JSON files. After choosing to perform a co-authorship analysis, we simply use the default choices in the remaining steps of the wizard. The visualization of the resulting co-authorship network is presented below.
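To make the idea behind a co-authorship network concrete, the sketch below counts, for every pair of authors, the number of downloaded publications they co-authored. It is an illustration under simple assumptions (authors identified by a naive "family, given" label, full counting, one increment per co-authored publication), not a description of how VOSviewer builds the network internally. It relies on the `author` list that Crossref records carry, with `given` and `family` keys and, for group authors, a `name` key; `load_items`, `author_name`, and `coauthorship_links` are hypothetical names.

```python
import json
from collections import Counter
from itertools import combinations


def load_items(paths):
    """Concatenate the work records ("message" -> "items") of saved responses."""
    items = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            items.extend(json.load(f)["message"]["items"])
    return items


def author_name(author):
    """Naive author label: 'family, given', or 'name' for group authors."""
    if "family" in author:
        return f"{author['family']}, {author.get('given', '')}".strip(", ")
    return author.get("name", "")


def coauthorship_links(items):
    """Full counting: each co-authored publication adds 1 to an author pair."""
    links = Counter()
    for item in items:
        # the set drops repeated names within a record; "" marks unusable entries
        names = sorted({author_name(a) for a in item.get("author", [])} - {""})
        for pair in combinations(names, 2):
            links[pair] += 1
    return links
```

Usage would look like `coauthorship_links(load_items(["JOI.json", "Scientometrics1.json", "Scientometrics2.json", "Scientometrics3.json"]))`, where the three Scientometrics file names are hypothetical.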

Crossref co-authorship network of scientometric researchers

Next, we use our Crossref data to visualize a bibliographic coupling network of publications in the field of scientometrics. Two publications have a bibliographic coupling link if they have one or more references in common. We again choose the option Create a map based on bibliographic data in the Create Map wizard. After selecting our four JSON files, we choose to perform a bibliographic coupling analysis at the document level. We use the default choices in the remaining steps of the wizard, which means that our bibliographic coupling network includes the 500 publications with the largest number of bibliographic coupling links. The visualization of the network is shown below.
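The definition of bibliographic coupling translates directly into code. The sketch below, with hypothetical function names, indexes the downloaded records by the DOIs they cite, counts the shared references for every pair of publications, and then keeps the n publications with the largest number of coupling links. It assumes that a reference can be matched only when Crossref supplies a `DOI` key in the entries of the `reference` list and ignores references available only as unstructured text, so its results may differ from VOSviewer's own matching.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bibliographic_coupling(items):
    """Coupling strength = number of cited DOIs shared by a pair of works."""
    citing = defaultdict(set)  # cited DOI -> DOIs of the works citing it
    for item in items:
        for ref in item.get("reference", []):
            if "DOI" in ref:  # unmatched, unstructured references are skipped
                citing[ref["DOI"].lower()].add(item["DOI"].lower())
    strength = Counter()
    for works in citing.values():
        for pair in combinations(sorted(works), 2):
            strength[pair] += 1
    return strength


def top_by_links(strength, n=500):
    """The n publications with the most coupling links (not link strength)."""
    degree = Counter()
    for a, b in strength:
        degree[a] += 1
        degree[b] += 1
    return [doc for doc, _ in degree.most_common(n)]
```

Indexing by cited DOI avoids comparing every pair of publications directly: only pairs that share at least one reference are ever generated.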

Crossref bibliographic coupling network of scientometric publications

Examination of the bibliographic coupling network may reveal something unexpected. The 500 publications included in the bibliographic coupling network have all appeared in Scientometrics. The network does not include publications from Journal of Informetrics. This demonstrates an important limitation of Crossref data. Thanks to I4OC, many publishers nowadays make the reference lists of publications in their journals available through Crossref. However, some publishers do not (yet?) participate in I4OC. This is also the case for Elsevier, the publisher of Journal of Informetrics. Because the reference lists of publications in Journal of Informetrics are not available through Crossref, publications from this journal cannot be included in a bibliographic coupling analysis based on Crossref data.
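This limitation can be checked in the downloaded data itself. The sketch below, using a hypothetical `reference_coverage` function, computes for each journal (taken from the first entry of the `container-title` field) the share of records that contain a non-empty `reference` list. A journal with a share of zero, as one would expect for Journal of Informetrics here, cannot contribute any publications to a bibliographic coupling analysis.

```python
from collections import defaultdict


def reference_coverage(items):
    """Per journal, the fraction of records whose reference list is present."""
    totals = defaultdict(int)
    with_refs = defaultdict(int)
    for item in items:
        titles = item.get("container-title") or ["(unknown)"]
        journal = titles[0]
        totals[journal] += 1
        if item.get("reference"):  # absent or empty: no open reference list
            with_refs[journal] += 1
    return {journal: with_refs[journal] / totals[journal] for journal in totals}
```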

Large-scale example

We have now provided relatively small-scale examples of the use of Crossref data in VOSviewer. It is also possible to use Crossref data at a much larger scale in VOSviewer, but this requires a significant effort in preprocessing the data. To illustrate the large-scale use of Crossref data, we use the data to visualize a citation network of 5000 journals from all fields of science.

Using the Crossref API, we downloaded data for all publications in the period 1980-2016. The amount of data was very large, and the data therefore needed to be preprocessed before it could be provided as input to VOSviewer. The data was stored in a relational database. Using this database, we identified all journals (as well as conference proceedings and book series) that have at least 100 publications for which a reference list is available. We then constructed the network of citation links between the identified journals. The direction of a citation link was ignored, so no distinction was made between a citation from journal A to journal B and a citation from journal B to journal A. The journal citation network was saved in a VOSviewer network file, and this file was used as input for VOSviewer. In VOSviewer, the 5000 journals with the largest number of citation links with other journals were selected and the citation network of these 5000 journals was visualized. The resulting visualization is presented below. An interactive version of the visualization can also be opened in VOSviewer.
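The preprocessing steps described above can be sketched in a few lines, assuming the citations have already been resolved to (citing journal, cited journal) pairs, for example by a query on the relational database. This is a hypothetical illustration, not the database pipeline used for the visualization: `journal_network` applies the threshold of 100 publications with a reference list and merges the two directions of each journal pair, and `select_top` mimics the selection, made in VOSviewer, of the journals with the largest number of citation links. Dropping journal self-citations is our own assumption. Writing the result to a VOSviewer network file is omitted; the VOSviewer manual describes that format.

```python
from collections import Counter


def journal_network(citations, pubs_with_refs, min_pubs=100):
    """Undirected journal citation network with link weights.

    citations: iterable of (citing_journal, cited_journal) pairs, one per
    citation (hypothetical pre-resolved input).
    pubs_with_refs: journal -> number of its publications with a reference list.
    """
    eligible = {j for j, n in pubs_with_refs.items() if n >= min_pubs}
    links = Counter()
    for citing, cited in citations:
        # assumption: self-citations are dropped, as they link no two journals
        if citing == cited or citing not in eligible or cited not in eligible:
            continue
        links[tuple(sorted((citing, cited)))] += 1  # A->B and B->A share one link
    return links


def select_top(links, n=5000):
    """Keep links among the n journals with the largest number of links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    keep = {journal for journal, _ in degree.most_common(n)}
    return {pair: w for pair, w in links.items() if pair[0] in keep and pair[1] in keep}
```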

Crossref citation network of journals

The visualization shows a structure of science that is well known from earlier large-scale bibliometric visualizations, which were based on Web of Science or Scopus data. Mathematics, computer science, and engineering journals can be found in the center of the bottom area of the visualization. Journals in the physical sciences are located in the right area of the visualization, while journals in the life and medical sciences can be found in the top area. Finally, social science journals are located in the bottom-left area of the visualization. Some important journals are missing from the visualization. These journals are published by publishers that do not participate in I4OC and do not make the reference lists of their publications available through Crossref.

Conclusion

Thanks to I4OC, Crossref has become a valuable source of freely available citation data. Crossref citation data can be used for many purposes, including the analysis and visualization of citation networks of journals, researchers, and individual publications. Version 1.6.6 of VOSviewer provides direct support for the use of Crossref data for visualizing citation networks. We hope that this new functionality of VOSviewer offers a convincing demonstration of the value of freely available citation data. We encourage publishers that do not yet participate in I4OC to join the initiative and to make the reference lists of publications in their journals freely available.


About Nees Jan van Eck

Senior researcher, head of data science, and coordinator of the Information & Openness focal area. Nees Jan's research focuses on infrastructures and the development of tools and algorithms to support research assessment, science policy, and scholarly communication.

About Ludo Waltman

Ludo Waltman is professor of Quantitative Science Studies and scientific director at the Centre for Science and Technology Studies (CWTS) at Leiden University. He is a coordinator of the Information & Openness focal area and a member of the Evaluation & Culture focal area. Ludo is co-chair of the Research on Research Institute (RoRI).


1 comment

  • David Shotton October 24th, 2017 11:59 am
    Your work provides a clear example of how journal articles from publishers who do not open the reference lists they already deposit to Crossref will increasingly lose visibility in bibliometric analyses.
    As far as the upcoming generation of scholars is concerned, if it's not openly available on the Web, it does not exist.