21 Dec. 2016

Does Dirty Data Affect Google Scholar Citations? The case of the academic profiles of 11 Turkish researchers

G Doğan, I Şencan, Y Tonta
ASIST '16.  Proceedings of the 79th ASIS&T Annual Meeting: Creating Knowledge, Enhancing Lives through Information & Technology. Copenhagen, Denmark, October 14-18 2016


The main goal of this study is to find out whether Google Scholar citation metrics fluctuate because of duplicate publications and citations in the database. The following research questions are addressed:

- Does Google Scholar database include duplicate publications and citations in researchers’ profiles?

- If yes, what is the impact of this practice on citation counts and Google Scholar Citations metrics such as h- and i10-index values? 
Answering these questions will shed some light on the size of the problem and help us better interpret the rankings and metrics based on Google Scholar (GS) data.

The 11 researchers based at Hacettepe University’s Department of Information Management who had public GS profiles as of January 27, 2016 were selected. Data were collected and cleaned between January 27 and March 18, 2016.
The Google Scholar profiles of the 11 researchers were checked to identify duplicate records for the same publications. Next, the number of different records for each publication and the citations to them were identified, along with the unique publication count for each researcher and the combined citation count for each publication. The h- and i10-indexes were then re-calculated for each researcher from the new publication and combined citation counts and compared with the Google Scholar Citations metrics.
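
The re-calculation step can be illustrated with a minimal Python sketch. It is not the authors' actual procedure or tooling: the function names, the record layout (a publication key paired with a set of citing-work identifiers) and the sample data are all hypothetical, and it assumes that duplicate records have already been matched to a common key by hand. It uses the standard definitions: the h-index is the largest h such that h publications have at least h citations each, and the i10-index is the number of publications with at least 10 citations.

```python
from collections import defaultdict


def merge_duplicates(records):
    """Combine records that share a publication key (hypothetical layout).

    records: iterable of (publication_key, iterable of citing-work ids).
    Using a set union means a citing work listed more than once
    (a duplicate citation) is counted only once per publication.
    """
    merged = defaultdict(set)
    for key, citing_ids in records:
        merged[key] |= set(citing_ids)
    return {key: len(ids) for key, ids in merged.items()}


def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citation_counts):
    """Number of publications with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)


# Made-up profile: "paper-a" appears as two records that share one citing work.
profile = [
    ("paper-a", {"c1", "c2", "c3"}),
    ("paper-a", {"c3", "c4"}),
    ("paper-b", {"c1"}),
]
counts = merge_duplicates(profile)
print(counts, h_index(counts.values()), i10_index(counts.values()))
```

Comparing the values computed this way with those shown on the profile page would indicate how much the duplicates change each metric.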


Duplicate Publications
- 14% (n=69) of publications (N=499) were represented by more than one record (mostly 2, max. 5)
- Excluding duplicate records had almost no effect on the number of citations (only 4 of the 69 publications were affected)
- None of the researchers’ re-calculated h-indexes changed, and only one researcher’s i10-index increased by 1

Duplicate Citations
- 135 publications (55%) received a total of 364 duplicate citations, or 12% of all citations (3,079)
- When duplicate citations were removed, the citation counts of half of the 135 publications decreased by at least two citations
- Citation counts of almost all researchers decreased, some by as much as 20%
- h-indexes of more than half the researchers decreased by at least 1
- i10-indexes of four researchers decreased by 2 and 4, although one researcher’s i10-index increased by 1

These results confirm our hypothesis.

We cannot generalize: the sample is small and skewed (11 Turkish researchers) and reflects national, linguistic and disciplinary peculiarities.
Further studies with larger and more representative samples are needed.

