15 Dec. 2015

A digest of Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison

A DIGEST OF
Harzing, A.W.; Alakangas, S. (2016) Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison, in press for Scientometrics.
OBJECTIVES
§  To provide a systematic and comprehensive comparison of the coverage of the three major bibliometric databases: Google Scholar, Scopus and the Web of Science. Various comparisons are carried out:
§  2-year longitudinal comparison (2013-2015) with quarterly data-points
§ Cross-disciplinary comparison across all major disciplinary areas (Humanities, Social Sciences, Engineering, Sciences and Life Sciences).
§  Comparison of 4 different metrics:
    • publications
    • citations
    • h-index
    • hI,annual, or hIa (h-index corrected for career length and number of co-authors)
METHODOLOGY

Sample
Documents published by 146 Associate Professors and Full Professors at the University of Melbourne, and the citations these documents have received according to the Web of Science, Scopus, and Google Scholar.
Design
§  Two full professors (1 male, 1 female) and two associate professors (1 male, 1 female) were selected for each of the 37 disciplines represented at the University of Melbourne
§  Individual academics were selected randomly within each sub-discipline, although individuals with very common names were avoided to mitigate problems with author disambiguation. Overall, 56.2% of the sample was male
§  Search queries for individual authors were refined on an iterative basis through a detailed comparison of the results for the three databases
§  Searches for Google Scholar were defined in the Publish or Perish software, which made it feasible to run monthly data collection for 2 years
§  For Scopus and the Web of Science, data were collected on a quarterly basis only
Measures
§  Total number of publications per academic
§  Total number of citations per academic
§  The growth rate of papers and citations over time was first calculated for each academic individually and then averaged over the 146 academics
§  h-index per academic
§  hIa = hI,norm / academic age, where:
    • hI,norm: the h-index of the normalized citation counts, obtained by dividing each paper's number of citations by its number of authors
    • Academic age: number of years elapsed since first publication
Period analyzed:  2013-2015
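The h-index, hI,norm and hIa defined under Measures can be illustrated with a short Python sketch. This is an illustration of the definitions as worded in this digest, not the Publish or Perish implementation. The function names and the sample publication record are hypothetical, and academic age is assumed to be the current year minus the year of first publication.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers):
    """h-index of the author-normalized citation counts.

    `papers` is a list of (citations, number_of_authors) pairs.
    """
    return h_index([cites / n_authors for cites, n_authors in papers])


def hi_annual(papers, first_pub_year, current_year):
    """hI,norm divided by academic age (assumed: current year - first year)."""
    academic_age = current_year - first_pub_year
    if academic_age <= 0:
        raise ValueError("academic age must be positive")
    return hi_norm(papers) / academic_age


# Hypothetical publication record: (citations, number of authors) per paper.
papers = [(40, 2), (30, 3), (12, 1), (9, 3), (4, 2), (1, 1)]

raw_h = h_index([cites for cites, _ in papers])  # h-index on raw citations
norm_h = hi_norm(papers)                         # h-index after dividing by authors
hia = hi_annual(papers, first_pub_year=2005, current_year=2015)
print(raw_h, norm_h, round(hia, 2))
```

In this made-up record, dividing by the number of authors lowers the h-index from 4 to 3, and dividing by an academic age of 10 years gives an hIa of 0.3.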
RESULTS
Coverage: Web of Science, Scopus, Google Scholar
1.   The results show that Google Scholar provides the most comprehensive coverage and that coverage for the Web of Science and Scopus is similar. Google Scholar indexes five times as many documents as WoS and three times as many as Scopus (Figure 1). Regarding citations, Google Scholar reports twice as many as WoS and Scopus.
Figure 1. Average number of papers and citations per academic across three databases (July 2015)
Data source: re-elaborated from Harzing & Alakangas (2015)
2.   Drilling down to the level of individual academics, we found that Google Scholar provides broader coverage, and thus higher research metrics, than the Web of Science for essentially all academics: only one author has fewer citations in Google Scholar than in the Web of Science. For Scopus the same was true for more than 90% of the academics in terms of publications and for more than three quarters of the academics in terms of citations. A larger number of individual academics show lower research metrics in Scopus than in the Web of Science.
Cross-disciplinary comparison across all major disciplinary areas
3.   The number of papers and citations in Google Scholar is substantially higher than in both the Web of Science and Scopus for every discipline. However, the differences are particularly large for the Humanities and the Social Sciences, where Google Scholar reports 4 to 6 times as many papers and 4 to 9 times as many citations as the two other databases (Figure 2).
Figure 2. Average number of papers and citations per academic across five disciplines and three databases (July 2015)
Data source: Harzing & Alakangas (2015)

Longitudinal comparisons: The growth rate of Web of Science, Scopus, and Google Scholar
4.  A consistent and reasonably stable quarterly growth for both publications and citations across the three databases is observed (Table 1). This suggests that all three databases provide sufficient stability of coverage to be used for more detailed cross-disciplinary comparisons.

Table 1. Average quarterly increase in papers, citations and h-index per academic across the three databases (July 2013–July 2015)
Data source: Harzing & Alakangas (2015)

                  Papers    Citations    h-index
Scopus            2.7       5.1          2.9
Google Scholar    2.5       4.4          2.1
Web of Science    2.2       4.2          2.1
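The averaging order described under Measures (growth computed for each academic first, then averaged over all academics) can be sketched as follows. This is a minimal illustration only: the quarterly counts are invented, the function names are hypothetical, and growth is assumed to be expressed as a percentage change from one quarter to the next, which the digest does not state explicitly.

```python
def quarterly_growth(series):
    """Mean quarter-on-quarter percentage change for one academic.

    `series` holds that academic's counts (papers or citations) at
    consecutive quarterly data points.
    """
    if len(series) < 2:
        raise ValueError("need at least two data points")
    changes = []
    for previous, current in zip(series, series[1:]):
        if previous == 0:
            raise ValueError("percentage growth is undefined from a zero count")
        changes.append((current - previous) / previous * 100)
    return sum(changes) / len(changes)


def average_growth(all_series):
    """Average of the per-academic growth rates (not growth of pooled totals)."""
    rates = [quarterly_growth(series) for series in all_series]
    return sum(rates) / len(rates)


# Invented citation counts for two academics over three quarterly data points.
citations_by_academic = [
    [100, 110, 121],  # grows 10% in each quarter
    [200, 204, 204],  # grows 2%, then 0%
]

print(round(average_growth(citations_by_academic), 2))
```

Averaging per-academic rates gives every academic equal weight, so a highly cited academic does not dominate the result as would happen if growth were computed on pooled totals.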


5.  It is worth noting that Scopus is currently undergoing a major expansion process, the Scopus Cited Reference Expansion Program initiated in March 2014, which aims to include cited references in Scopus going back to 1970 for pre-1996 content (Figure 3).
Figure 3. Cumulative number of pre-1996 items added to Scopus from November 2014 to November 2015
Data source: http://goo.gl/PyHw3m

Metrics comparison between the three databases

6.   The data source and the specific metrics used change the conclusions that can be drawn from cross-disciplinary comparisons. More specifically, this study found that when using the h-index as a metric and the Web of Science as a data source, the average academic in the Life Sciences and Sciences had an h-index that was nearly eight times as high as their counterpart in the Humanities, and two to three times as high as their counterparts in Engineering and the Social Sciences, respectively. However, when using the hI-annual and Google Scholar or Scopus as a data source, the average academic in the Life Sciences, Sciences, Engineering and the Social Sciences shows a very similar research performance, whereas the average Humanities academic has an hI-annual that is half to two-thirds as high as that of the other disciplines.
Table 2. Comparison of the average h-index and hI-annual per academic for five different disciplines in three different databases, July 2015
Data source:  Harzing & Alakangas (2015)


CONCLUSION


What is already known on this topic

Google Scholar has a much broader document and citation coverage than Web of Science and Scopus. This study supports the findings of Orduña-Malea et al. (2015) and Martín-Martín et al. (2014).

The longitudinal analysis showed a consistent and reasonably stable quarterly growth for both publications and citations across the three databases, confirming the findings of De Winter et al. (2014), Harzing (2014), and Orduña-Malea & Delgado López-Cózar (2014).

What this study adds

This study is the first to present a comprehensive longitudinal and cross-disciplinary comparison across three major sources of citation data (Web of Science, Scopus and Google Scholar) in five major disciplines (Humanities, Social Sciences, Engineering, Sciences, and Life Sciences). The small size and the limitations of the sample (only four researchers are studied in each sub-discipline) make it difficult to generalize the results to the 37 sub-disciplines studied, an issue that should be addressed in future studies.

The most suggestive results of this work showed that both the data source and the specific metrics used change the conclusions that can be drawn from cross-disciplinary comparisons.

The study thus argues that a fair and inclusive cross-disciplinary comparison of research performance is possible, provided Google Scholar or Scopus is used as a data source, and the recently introduced hI-annual (an h-index corrected for career length and co-authorship patterns) is selected as the metric of choice.

REFERENCES

De Winter, J. C., Zadpoor, A. A., & Dodou, D. (2014). The expansion of Google Scholar versus Web of Science: A longitudinal study. Scientometrics, 98(2), 1547–1565. DOI: 10.1007/s11192-013-1089-2

Harzing, A. W. (2014). A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, 98(1), 565–575. DOI: 10.1007/s11192-013-0975-y

Martín-Martín, A., Orduña-Malea, E., Ayllón, J. M., & Delgado López-Cózar, E. (2014). Does Google Scholar contain all highly cited documents (1950-2013)? EC3 Working Papers, 19. DOI: 10.13140/RG.2.1.2791.8163

Orduña-Malea, E., Ayllón, J. M., Martín-Martín, A., & Delgado López-Cózar, E. (2015). Methods for estimating the size of Google Scholar. Scientometrics, 104(3), 931–949. DOI: 10.1007/s11192-015-1614-6

Orduña-Malea, E., & Delgado López-Cózar, E. (2014). Google Scholar Metrics Evolution: an analysis according to languages. Scientometrics, 98(3), 2353–2367. DOI: 10.1007/s11192-013-1164-8