15 Dec. 2015

A digest of Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison

Harzing, A.W.; Alakangas, S. (2016) Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison, in press for Scientometrics.
§  To provide a systematic and comprehensive comparison of the coverage of the three major bibliometric databases: Google Scholar, Scopus and the Web of Science. Various comparisons are carried out:
§  2-year longitudinal comparison (2013-2015) with quarterly data-points
§ Cross-disciplinary comparison across all major disciplinary areas (Humanities, Social Sciences, Engineering, Sciences and Life Sciences).
§  Comparison of 4 different metrics:
    • publications
    • citations
    • h-index
    • hI,annual (h-index corrected for career length and number of co-authors)

Documents published by 146 Associate Professors and Full Professors at the University of Melbourne and the citations these documents have received according to Web of Science, Scopus, and Google Scholar
§  Two full professors (1 male, 1 female) and two associate professors (1 male, 1 female) were selected for each of the 37 disciplines represented at the University of Melbourne
§  Individual academics were selected randomly within each sub-discipline, although individuals with very common names were avoided to mitigate problems with author disambiguation. Overall, 56.2% of the sample was male
§  Search queries for individual authors were refined on an iterative basis through a detailed comparison of the results for the three databases
§  Searches for Google Scholar were defined in the Publish or Perish software, which made it possible to run the data collection monthly for 2 years
§  For Scopus and the Web of Science, data were collected on a quarterly basis only
§  Total number of publications per academic
§  Total number of citations per academic
§  The growth rate of papers and citations over time was first calculated for each academic individually and then averaged over the 146 academics
§  h-index per academic
§  hIa = hI norm / academic age, where:
  § hI norm: normalize the number of citations for each paper by dividing it by the number of authors of that paper, and then calculate the h-index of the normalized citation counts
  § Academic age: number of years elapsed since first publication
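The three metric definitions above can be sketched in a few lines of Python. This is only an illustration of the definitions as summarized here, not the Publish or Perish implementation: the function names, the sample paper list, and the choice to treat an academic age below one year as one year are all assumptions made for the example.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers):
    """h-index of the per-author normalized citation counts.

    `papers` is a list of (citations, number_of_authors) pairs.
    """
    return h_index([cites / authors for cites, authors in papers])


def hi_annual(papers, first_publication_year, current_year):
    """hI norm divided by academic age (years since first publication).

    Assumption: an age below one year is treated as one year
    to avoid dividing by zero.
    """
    academic_age = max(1, current_year - first_publication_year)
    return hi_norm(papers) / academic_age


# Hypothetical publication record: (citations, number of authors)
example_papers = [(30, 3), (20, 2), (12, 4), (6, 1), (2, 2)]

print(h_index([cites for cites, _ in example_papers]))  # 4
print(hi_norm(example_papers))                          # 3
print(hi_annual(example_papers, 2005, 2015))
```

In this made-up record the raw h-index is 4, while dividing each paper's citations by its author count lowers the normalized h-index to 3. That is the co-authorship correction the hIa is meant to capture.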
Period analyzed:  2013-2015
Coverage: Web of Science, Scopus, Google Scholar
1.   The results show that Google Scholar provides the most comprehensive coverage and that coverage for the Web of Science and Scopus is similar. Google Scholar indexes five times as many documents as WoS and three times as many as Scopus (Figure 1). Google Scholar also reports twice as many citations as WoS and Scopus.
Figure 1. Average number of papers and citations per academic across three databases (July 2015)
Data source: re-elaborated from Harzing & Alakangas (2015)
2.   Drilling down to the level of individual academics, we found that Google Scholar provides broader coverage, and thus higher research metrics, than the Web of Science for all academics. For Scopus the same was true for more than 90% of the academics in terms of publications and for more than three quarters of the academics in terms of citations. There is only one author with fewer citations in Google Scholar than in the Web of Science. A larger number of individual academics show lower research metrics in Scopus than in the Web of Science.
Cross-disciplinary comparison across all major disciplinary areas
3.   The number of papers and citations in Google Scholar is substantially higher than in both the Web of Science and Scopus for every discipline. However, the differences are particularly large for the Humanities and the Social Sciences, where Google Scholar reports 6–4 times as many papers and 9–4 times as many citations as the two other databases (Figure 2).
Figure 2. Average number of papers and citations per academic across five disciplines and three databases (July 2015)
Data source: Harzing & Alakangas (2015)

Longitudinal comparisons: The growth rate of Web of Science, Scopus, and Google Scholar
4.  A consistent and reasonably stable quarterly growth for both publications and citations across the three databases is observed (Table 1). This suggests that all three databases provide sufficient stability of coverage to be used for more detailed cross-disciplinary comparisons.
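As a rough illustration of how such an average could be obtained, the sketch below computes the quarter-on-quarter percentage change for each academic and then averages across academics, following the two-step procedure described in the methods. The percentage-change formula, the function names and the sample counts are assumptions for this example; the digest does not spell out the exact formula used in the paper.

```python
from statistics import mean


def quarterly_growth_rates(counts):
    """Percentage change between consecutive quarterly counts.

    Quarters that start from zero are skipped, since a percentage
    change is undefined there (an assumption of this sketch).
    """
    return [
        (later - earlier) / earlier * 100
        for earlier, later in zip(counts, counts[1:])
        if earlier > 0
    ]


def average_quarterly_growth(series_by_academic):
    """Average each academic's growth first, then average over academics."""
    per_academic = []
    for counts in series_by_academic.values():
        rates = quarterly_growth_rates(counts)
        if rates:  # ignore academics with fewer than two usable data points
            per_academic.append(mean(rates))
    return mean(per_academic)


# Hypothetical quarterly citation counts for two academics
citations = {
    "academic_a": [100, 110, 121],
    "academic_b": [50, 50, 55],
}
print(average_quarterly_growth(citations))
```

Averaging per academic first gives every academic equal weight, so a few highly cited individuals do not dominate the overall figure.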

Table 1. Average quarterly increase in papers, citations and h-index per academic across the three databases (July 2013–July 2015)
Data source: Harzing & Alakangas (2015)

5.  It is worth noting that Scopus is currently undergoing a major expansion process, the Scopus Cited Reference Expansion Program initiated in March 2014, which aims to include cited references in Scopus going back to 1970 for pre-1996 content (Figure 3).
Figure 3. Cumulative number of pre-1996 items added to Scopus from November 2014 to November 2015
Data source: http://goo.gl/PyHw3m

Metrics comparison between the three databases

6.   The data source and the specific metrics used change the conclusions that can be drawn from cross-disciplinary comparisons. More specifically, this study found that when using the h-index as a metric and the Web of Science as a data source, the average academic in the Life Sciences and Sciences had an h-index that was nearly eight times as high as their counterpart in the Humanities, and two to three times as high as their counterparts in Engineering and the Social Sciences, respectively. However, when using the hI-annual and Google Scholar or Scopus as a data source, the average academic in the Life Sciences, Sciences, Engineering and the Social Sciences shows a very similar research performance, whereas the average Humanities academic has an hI-annual that is half to two-thirds as high as in the other disciplines.
Table 2. Comparison of average h-index and hI-annual per academic for five different disciplines in three different databases, July 2015
Data source: Harzing & Alakangas (2015)


What is already known on this topic

Google Scholar has a much broader document and citation coverage than Web of Science and Scopus. This study supports the findings of Orduña-Malea et al. (2015) and Martín-Martín et al. (2014).

The longitudinal analysis showed a consistent and reasonably stable quarterly growth for both publications and citations across the three databases, confirming the findings of De Winter et al. (2014), Harzing (2014), and Orduña-Malea & Delgado López-Cózar (2014).

What this study adds

This study is the first to present a comprehensive longitudinal and cross-disciplinary comparison across three major sources of citation data (Web of Science, Scopus and Google Scholar) in five major disciplines (Humanities, Social Sciences, Engineering, Sciences, and Life Sciences). The small size and the limitations of the sample (only four researchers are studied in each subdiscipline) make it difficult to generalize the results to the 37 subdisciplines studied, an issue that should be addressed in future studies.

The most suggestive results of this work showed that both the data source and the specific metrics used change the conclusions that can be drawn from cross-disciplinary comparisons.

The study thus argues that a fair and inclusive cross-disciplinary comparison of research performance is possible, provided Google Scholar or Scopus is used as a data source, and the recently introduced hI-annual (an h-index corrected for career length and co-authorship patterns) is selected as the metric of choice.


De Winter, J. C., Zadpoor, A. A., & Dodou, D. (2014). The expansion of Google Scholar versus Web of Science: A longitudinal study. Scientometrics, 98(2), 1547–1565. DOI: 10.1007/s11192-013-1089-2

Harzing, A. W. (2014). A longitudinal study of Google Scholar coverage between 2012 and 2013. Scientometrics, 98(1), 565–575. DOI: 10.1007/s11192-013-0975-y

Martín-Martín, A., Orduña-Malea, E., Ayllón, J. M., & Delgado López-Cózar, E. (2014). Does Google Scholar contain all highly cited documents (1950–2013)? EC3 Working Papers, 19. DOI: 10.13140/RG.2.1.2791.8163

Orduña-Malea, E., Ayllón, J. M., Martín-Martín, A., & Delgado López-Cózar, E. (2015). Methods for estimating the size of Google Scholar. Scientometrics, 104(3), 931–949. DOI: 10.1007/s11192-015-1614-6

Orduña-Malea, E., & Delgado López-Cózar, E. (2014). Google Scholar Metrics Evolution: an analysis according to languages. Scientometrics, 98(3), 2353–2367. DOI: 10.1007/s11192-013-1164-8

4 Nov. 2015

Bibliometrics & the bibliometricians in Google Scholar Citations and ResearcherID, ResearchGate, Mendeley, Twitter

In keeping with the research line the EC3 Research Group began several years ago aimed at unravelling the inner depths of Google Scholar and testing its capabilities as a tool for scientific evaluation, this time we have turned our efforts to finding new uses for Google Scholar Citations (GSC). Based on the information available on every GSC public profile, a procedure has been developed to collect data from the scientists working on a given field of study, and to aggregate that data in order to present metrics at various levels: authors, documents, journals, and book publishers. Thus, GSC data would presumably allow us to present a picture of the history and scientific communication patterns of a discipline. In order to explore the feasibility of this project, we decided to select the field of Bibliometrics, Scientometrics, Informetrics, Webometrics, and Altmetrics as our test subject.
Once we’ve seen the picture of the discipline that can be observed through the data available in GSC, we also want to compare it to its counterparts in other academic web services, like ResearcherID, a researcher identification system launched by Thomson Reuters, mainly built upon data from Web of Science (which has been and still is the go-to source for many researchers in the field of research evaluation), and other profiling services which have arisen in the wake of the Web 2.0 movement: ResearchGate, an academic social network, and Mendeley, a social reference manager which also offers profiling features. These are the most widely known tools worldwide for academic profiling. In addition, we also include the links to the authors' homepages (the first tool researchers used to showcase their scientific activities on the Web), and Twitter, the popular microblogging site, in order to learn how much presence bibliometricians have on this platform and the kind of communication activities in which they take part there.

28 different indicators from 813 authors are displayed. The data is presented "as is": no filtering or cleaning of the data has been carried out. From the ranking of 813 bibliometricians who have made their Google Scholar Citations profile public, and the top 1057 most cited documents in those profiles, two additional rankings have been developed: a journal ranking and a publisher ranking, both according to the number of citations received.
In short, our aim is to present a multifaceted and integral perspective of the discipline, as well as to provide the opportunity for an easy and intuitive comparison of these products and the reflections of scientific activity each of them portrays. In addition, we also want to bring attention to the new platforms that are offering scientific performance metrics and look into what their meaning could be. With this step, we enter the altmetrics debate, but with a different approach: we do it from the individuals' perspective, and not only from the perspective of the documents they publish. Ultimately, we want to see what these tools really measure by applying them precisely to those who do the measuring.

We are currently analysing the data displayed in this product; the results will be presented shortly in a working paper.

The product is accessible from:

28 Sep. 2015

The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching

Haddaway NR, Collins AM, Coughlin D, Kirk S
The Role of Google Scholar in Evidence Reviews and Its Applicability to Grey Literature Searching.
PLoS ONE 10(9): e0138237

This paper analyses the use of Google Scholar as a source of research literature to help answer the following questions:
1. What proportion of Google Scholar search results is academic literature and what proportion grey literature, and how does this vary between different topics?
2. How much overlap is there between the results obtained from Google Scholar and those obtained from Web of Science?
3. What proportion of Google Scholar and Web of Science search results are duplicates and what causes this duplication?
4. Are articles included in previous environmental systematic reviews identifiable by using Google Scholar alone?
5. Is Google Scholar an effective means of finding grey literature relative to that identified from hand searches of organisational websites?

Using seven systematic review case studies from environmental science, this paper analyses the utility of Google Scholar in systematic reviews and in searches for grey literature. The search strings used herein were either taken directly from the string used in Google Scholar in each systematic review’s methods or were based on the review’s academic search string where Google Scholar was not originally searched. Searches in Google Scholar were performed both at “full text” (i.e. the entire full text of each document was searched for the specified terms) and “title” (i.e. only the title of each document was searched for the specified terms) level using the advanced search facility.
Since Google Scholar displays a maximum of 1,000 search results, this was the maximum number of citations that could be analyzed.

Between 8 and 39% of full text search results from Google Scholar were classed as grey literature (mean ± SD: 19% ± 11), and between 8 and 64% of title search results (40% ± 17).
- Google Scholar's search results show a greater percentage of grey literature in title search results (43.0%) than in full text results (18.9%).
- Grey literature was most concentrated around page 80 (± 15 SD) of the results for full text searches, and around page 35 (± 25 SD) for title searches.
- Google Scholar demonstrated modest overlap with Web of Science title searches: this overlap ranged from 10 to 67% of the total results in Web of Science.
- The percentage of total results that are duplicate records ranges from 0.56 to 2.93% for Google Scholar and from 0.03 to 0.05% for Web of Science.
- Many of the included articles from the six published systematic review case studies were identified when searching for those articles specifically in Google Scholar (94.3 to 100% of studies). However, a significant proportion of studies in one review [31] were not found at all using Google Scholar (31.5%).
When searching specifically for individual articles, Google Scholar catalogued a larger proportion of articles than Web of Science (% of total in Google Scholar / % of total in Web of Science: SR1, 98.3/96.7; SR4, 94.3/83.9; SR6, 99.4/89.7).
None of the 84 grey literature articles identified by systematic review 5 were found within the exported Google Scholar search results (68 total records from title searches and 1,000 of a total 49,700 records from full text searches). However, when searched for specifically, 61 of the 84 articles were identified by Google Scholar.
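The overlap and duplication percentages reported above could be computed along the following lines. This is only a sketch: matching records by a normalized title is an assumption made for illustration, not the matching procedure of Haddaway et al., and the function names and sample titles are hypothetical.

```python
import re


def normalize_title(title):
    """Lower-case a title and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def duplicate_percentage(titles):
    """Share of records in one result set that repeat an earlier record."""
    if not titles:
        return 0.0
    normalized = [normalize_title(t) for t in titles]
    duplicates = len(normalized) - len(set(normalized))
    return 100 * duplicates / len(normalized)


def overlap_percentage(reference_titles, other_titles):
    """Share of distinct reference records that also appear in the other set."""
    reference = {normalize_title(t) for t in reference_titles}
    other = {normalize_title(t) for t in other_titles}
    if not reference:
        return 0.0
    return 100 * len(reference & other) / len(reference)


# Hypothetical exported result lists
web_of_science = [
    "Peatland Restoration: A Review",
    "Wetland carbon flux",
    "Grazing and biodiversity",
]
google_scholar = [
    "peatland restoration - a review",
    "Grazing and Biodiversity.",
    "Grazing and biodiversity",
    "An unpublished thesis on hedgerows",
]

print(overlap_percentage(web_of_science, google_scholar))
print(duplicate_percentage(google_scholar))  # 25.0
```

Expressing overlap relative to the Web of Science set mirrors how the figures above are reported ("of the total results in Web of Science"). Using the Google Scholar set as the denominator would give a different figure.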

16 Sep. 2015

Improvements in Google Scholar Citations are for the summer: creating an institutional affiliation link feature

It seems that the Mountain View company has a special fondness for making changes to its flagship products in the summer. Last year, on August 21st, it announced a "Fresh Look of Scholar Profiles"; in 2015 we have learnt at almost the same time of year (not from the official Google Scholar blog, which has not provided any information, but from a tweet by Isidro Aguillo) that "Google Scholar Citations add links to institution`s names (incl acronyms) in correct-built affiliations of profiles".

We definitely welcome this new initiative, which improves the product by offering a new and easy way to find scholars belonging to a specific institution. Previously this required specific searches by institution name or email domain in the search box, a tedious and very unfriendly process. Now, just by clicking on the name of the institution, we can identify all scholars belonging to an organization, as well as the global scientific interests and thematic focus of that institution. Incidentally, it will also facilitate the morbid, as well as dangerous, evaluative exercises that some institutions have already carried out based on these profiles.

In the end, Google has implemented a new information search feature in the form of an authority control tool for institutional affiliations that lies halfway between classic controlled search and natural language.

Always vigilant to the changes Google introduces in its products, we have prepared a report where we explore the current implementation of this new feature. First, this new tool is described, pointing out its main characteristics and functioning. Next, the coverage and precision of the tool are evaluated. Two special cases (Google Inc. and Spanish Universities) are briefly treated with the purpose of illustrating some aspects of the accuracy of the tool for the task of gathering authors within their appropriate institution. Finally, some inconsistencies, errors and malfunctions are identified, categorized and described. The report finishes by providing some suggestions to improve the feature. The general conclusion is that the standardized institutional affiliation link provided by Google Scholar Citations, despite working pretty well for a large number of institutions (especially Anglo-Saxon universities), still has a number of shortcomings and pitfalls which need to be addressed in order to make this authority control tool fully useful worldwide, both for searching purposes and for metric tasks.

9 Sep. 2015

Developing an Open-Source Bibliometric Ranking Website Using Google Scholar Citation Profiles for Researchers in the Field of Biomedical informatics

Dean F. Sittig, Allison B. McCoy, Adam Wright, Jimmy Lin
Developing an open-source bibliometric ranking website using Google Scholar Citation Profiles for researchers in the field of Biomedical informatics 
Sarkar et al. (Eds.). MEDINFO 2015: eHealth-enabled Health. IMIA and IOS Press, 2015.
DOI 10.3233/978-1-61499-564-7-1004

The principal objective of this work is to develop a searchable, interactive, automatically updating, open source, bibliometric ranking website using Google Scholar Citation Profiles that includes over 1,170 Biomedical Informatics researchers from around the world: the Biomedical Informatics Researchers ranking website (rank.informatics-review.com). 
This list contains only researchers who have a Google Scholar Profile.
The website is composed of four key components that work together to create an automatically updating ranking website: 
(1) list of biomedical informatics researchers
(2) Google Scholar scraper
(3) display page
(4) updater
This open-source application is written in Node.js® and built using commonly-available open source libraries. It takes as input the list of researchers and then iteratively retrieves the listing of each person’s Google Scholar citation counts, the total number of citations, the year of first citation, the i10-index, and the h-index. These values are extracted based on matching the relevant elements from each page’s DOM (Document Object Model) structure.
In addition to extracting raw statistics from profile pages, the application also calculates the citations/year, i10-index/year, and h-index/year; all computed values are written into a file in JSON format, which facilitates the display as well as downstream processing by other applications.
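The per-year calculation and JSON export described above might look roughly like this. The original application is written in Node.js; this sketch uses Python purely for illustration and leaves out the scraping step. The field names, the sample profile and the choice to divide by the number of years since the first citation (with a minimum of one year) are assumptions, not details confirmed by the paper.

```python
import json


def add_per_year_metrics(profile, current_year):
    """Return a copy of `profile` with citations, i10-index and h-index per year.

    Assumption: "per year" means divided by the years elapsed since the
    first citation, with a floor of one year.
    """
    years = max(1, current_year - profile["first_citation_year"])
    enriched = dict(profile)
    enriched["citations_per_year"] = profile["total_citations"] / years
    enriched["i10_index_per_year"] = profile["i10_index"] / years
    enriched["h_index_per_year"] = profile["h_index"] / years
    return enriched


def profiles_to_json(profiles, current_year):
    """Serialize the enriched profiles so other tools can reuse them."""
    enriched = [add_per_year_metrics(p, current_year) for p in profiles]
    return json.dumps(enriched, indent=2)


# Hypothetical researcher profile with made-up numbers
sample_profiles = [
    {
        "name": "Researcher A",
        "total_citations": 1200,
        "h_index": 18,
        "i10_index": 30,
        "first_citation_year": 2003,
    }
]

print(profiles_to_json(sample_profiles, current_year=2015))
```

Keeping the output as plain JSON means a display page or any other downstream consumer can read it without depending on the code that produced it.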
The h-index is correlated with both total citations (r² = 0.8) and the i10-index (r² = 0.93).
The Biomedical Informatics Researchers ranking website is 

30 Jul. 2015

How is an academic social site populated? A demographic study of Google Scholar Citations population

José Luis Ortega
How is an academic social site populated? A demographic study of Google Scholar Citations population 
Scientometrics, 2015, 104(1):1–18 
DOI 10.1007/s11192-015-1593-7

The principal objective of this work is to describe the growth of GSC in its initial period (2011–2012) through a set of personal attributes such as bibliometric indicators, positions, disciplines, organizations and countries. This objective aims to make clear the biases that could appear in this population and discuss how they would affect research evaluation.
Several research questions can be formulated from this primary objective:
• How does the number of profiles in GSC grow, and how can it be estimated?
• How have the characteristics that define this population (bibliometric indicators, position, discipline, affiliation and country) evolved during this initial period?
• What consequences could this distribution of profiles have for research evaluation?
Quarterly samples from December 2011 to December 2012 were extracted from Google Scholar Citations to analyse the number of members, distribution of their bibliometric indicators, positions, institutional and country affiliations and the labels to describe their scientific activity.
- Google Scholar Citations grew very fast during 2012, from 26,600 profiles in December 2011 to 187,301 in December 2012, at least according to the harvested data; the author's estimates suggest 236,000 profiles, which is close to 10 times the initial size.
- Most of the users are young researchers at the start of their scientific careers.
- From the subject matter point of view, Google Scholar Citations has been dominated since its beginnings by researchers close to Computer Science and related disciplines. However, the latest samples show the emergence of researchers from Physics, Environmental Sciences and Medicine, who balance the thematic distribution of the service.
- Both country and institutional distributions show evidence that this service is being populated by waves of researchers: first from English-speaking countries, where Harvard University and Massachusetts Institute of Technology stood out; then from European countries; and finally from emerging countries, notably Brazil with its Universidade de São Paulo and Universidade Estadual Paulista.
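The growth factors implied by the profile counts above can be checked with a few lines of arithmetic. The variable and function names are illustrative only; the three counts are the ones quoted in this summary.

```python
def growth_factor(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


profiles_dec_2011 = 26_600
harvested_dec_2012 = 187_301
estimated_dec_2012 = 236_000

harvested_factor = growth_factor(profiles_dec_2011, harvested_dec_2012)
estimated_factor = growth_factor(profiles_dec_2011, estimated_dec_2012)

print(f"harvested: x{harvested_factor:.2f}")
print(f"estimated: x{estimated_factor:.2f}")
```

The harvested data correspond to roughly a sevenfold increase over the year, and the estimated total to just under ninefold.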

With the fast growth of Google Scholar, the question is: what is the situation in July 2015?