July 18, 2016

2016 Google Scholar Metrics released: a matter of languages... and something else

We can only be delighted by the publication of the new edition of Google Scholar Metrics (GSM) (Thursday, July 14, 2016, 6:44 PM). The fifteen-day delay with respect to the release of the previous edition in 2015 (Thursday, June 25, 2015, 12:16 PM) was starting to worry us, but we now see that these worries were unfounded. This year, GSM has been the last of the citation-based journal evaluation products to be updated: the new editions of the Journal Citation Reports, Journal Metrics, and the SCImago Journal Rank were released in June.

As we said last year, we can only welcome the American company's decision to keep supporting GSM, a free product which is also very different from traditional journal rankings. Competition is healthy, and scientists can only be pleased about this variety of search and ranking tools, especially when they are offered free of charge.

There haven't been any structural changes in this new edition. The total number of publications that can be visualised in the 2016 rankings is 7,398. However, since 1,664 of them (22.5%) are classified in more than one subject area, the number of unique publications is lower: 5,734.
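
The figures above can be checked with a few lines of Python. This is only a sketch: it assumes that each of the 1,664 multi-area publications accounts for exactly one repeated listing, which is inferred from the arithmetic rather than stated by GSM, and the variable names are hypothetical.

```python
# Figures quoted in the text for the 2016 edition of GSM.
listed_entries = 7398     # entries visible across all 2016 rankings
repeated_entries = 1664   # assumption: one extra listing per multi-area publication

# Unique publications once the repeated listings are discounted.
unique_publications = listed_entries - repeated_entries

# Share of the listed entries that are classified in more than one area.
repeated_share = repeated_entries / listed_entries * 100

print(unique_publications)       # 5734
print(round(repeated_share, 1))  # 22.5
```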

The main differences with respect to last year's version are the inclusion of five additional language rankings (Russian, Korean, Polish, Ukrainian and Indonesian) and the removal of two others: Italian and Dutch. The addition of new language rankings is welcome, as they enrich the product. That is why we don't understand the decision to remove the Italian and Dutch rankings.

Another important change in this new version of Google Scholar Metrics is the removal of many Working Papers and Discussion Papers series. If users search for "working papers", "discussion papers", "working paper", or "discussion paper" in GSM's search box, they will only get 7 results in total.

For example, in the previous edition of Scholar Metrics, the CEPR Discussion Papers (h5-index: 112) was ranked #4 in the general category Business, Economics & Management. This series even made it to the top 100 publications in the English ranking (93rd position). Similarly, the IZA Discussion Papers (h5-index: 82) was ranked #8 in the general category Business, Economics & Management. These two series are not to be found in the new edition of Scholar Metrics.

One might think this has been caused by a change in GSM's inclusion policies [1]. They may have decided to remove all working papers and discussion papers, but if that were the case, they shouldn't have included other working paper series, like NBER Working Papers, currently #1 in the general category Business, Economics & Management, and also #1 in the subcategory Economics. They have also maintained all the subcategories available at arXiv. This is clearly inconsistent.

Apart from these differences, Google has simply updated the data, which means that some of the limitations outlined in previous studies still persist [2-7]: the visualisation of a limited number of publications (100 for those that are not published in English), the lack of categorisation by subject areas and disciplines for non-English publications, and normalisation problems (unification of journal titles, problems in the linking of documents, and problems in the search and retrieval of publication titles).

For instance, there are three different entries for the Brazilian Journal of Anesthesiology.

One of the main sources of errors in GSM is journals published in several languages. Journals published in their original native language and, at the same time, in English, are quite common. GSM has decided to create separate entries for each of the languages in which a journal is published.

This decision is arguable, but at the very least, it should be applied consistently to all journals. The journal Revista Española de Cardiología, however, received a different treatment: the Spanish and English versions were merged.

In the case of Revista Española de Enfermedades Digestivas and Revista Portuguesa de Neumología, they weren't able to successfully separate the two versions, since both versions present articles in the original languages (Spanish and Portuguese, respectively), and English.

In the case of Giornale italiano di medicina del lavoro ed ergonomia, they only identified the English version, but not the Italian one.

There are also several errors in the linking of documents, with links that point to bare references or to incorrect full texts.

In some cases, like the journal Nutrición Hospitalaria, we find dead links, links to the PDFs in SciELO, links to Dialnet, and links to the various repositories where authors have archived their articles. This is probably why the title of the journal presents up to three variants.

Over the years we have detected journals that are included in GSM even though they do not seem to meet all of its inclusion criteria (mainly the minimum of 100 articles published in the last five years). One example is the journal Area Abierta: Google Scholar indexes only 43 articles published by this journal between 2011 and 2015, yet it is still included. Additionally, the most cited article listed for this journal is incorrect, because it actually points to an article published in another journal.

The journal Investigaciones de Historia Económica presents a similar case: it has not published the minimum of 100 original articles in the last five years, and still it is included in GSM. If we search Google Scholar for the articles published by this journal, we see that it publishes a large number of book reviews. These reviews were probably counted as articles when the data was computed.

Lastly, it should be remembered that journals do not always present a uniform typographic design in their titles or in the titles of their articles.

Having said that, there are fewer errors than in previous years.

In our previous studies, we have described again and again the underlying philosophy embedded in all of Google’s academic products. These products have been created in the image and likeness of Google’s general search engine: fast, simple, easy to use, understand and calculate, and, last but not least, accessible to everyone free of charge. GSM follows all these precepts, and it is, in the end, nothing more than:

- A hybrid between a bibliometric tool (indicators based on citation counts), and a bibliography (a list of highly cited documents, and of the documents that cite them).
- It offers a simple, straightforward journal classification scheme (although it also includes some conferences and repositories).
- It is based on two basic bibliometric indicators (the h index, and the median number of citations for the articles that make up the h index).
- It covers a single five-year time frame (the current one being 2011-2015).
- It uses rudimentary journal inclusion criteria, namely: publishing at least 100 articles during the last five-year period, and having received at least one citation.
- It provides lists of publications according to the language their documents are written in. For all languages other than English (11 in total: Chinese, Portuguese, German, Spanish, French, Japanese, Russian, Korean, Polish, Ukrainian and Indonesian) it offers lists of only 100 titles: those with the highest h index. For English publications, however, it shows a total of 4,737 different publications, grouped in 8 subject areas. For each publication, it shows the titles of the documents whose citations contribute to the h index, and for each one of these documents, in turn, the titles of the documents that cite them.
- It provides a search feature that, for any given set of keywords, will retrieve a list of up to 20 publications whose titles contain the selected keywords. When more than 20 publications satisfy the query, only the 20 with the highest h index will be displayed.
- It doesn’t perform any kind of quality control, either in the indexing process or in the information visualization process.
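
The two indicators and the inclusion rule listed above are simple enough to sketch in a few lines of Python. This is an illustrative sketch, not Google's implementation: the function names, the `min_articles` parameter and the sample citation counts are hypothetical, and it assumes we already have the citation count of every article a publication issued in the five-year window.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_median(citations):
    """Median citation count of the articles that make up the h index."""
    ranked = sorted(citations, reverse=True)
    core = ranked[:h_index(citations)]
    return median(core) if core else 0


def meets_inclusion_criteria(citations, min_articles=100):
    """At least `min_articles` articles and at least one citation received."""
    return len(citations) >= min_articles and any(c > 0 for c in citations)


# Hypothetical citation counts for six articles of one publication.
sample = [10, 8, 5, 4, 3, 0]
print(h_index(sample))                   # 4
print(h_median(sample))                  # 6.5
print(meets_inclusion_criteria(sample))  # False (only 6 articles)
```

Under this rule, a journal with only 43 indexed articles in the window fails the check regardless of how many citations it has received, and lowering `min_articles` to 50 would not change that.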

To sum up, GSM is a minimalist information product with few features, closed (it cannot be customized by the user), and simple (navigating it only takes a few clicks). If GSM wants to improve as a bibliometric tool, it should incorporate a wider range of features. At the very least, it should:

- Display the total number of publications indexed in GSM, as well as their countries and language of publication. Our estimations lead us to believe that this figure is probably higher than 40,000 [8]. In the case of Spain, there are over 1,000 publications indexed, which make up about 45% of the total number of academic publications in Spain [9-11].
- Provide some other basic and descriptive bibliometric indicators, like the total number of documents published in the publications indexed in GSM, and the total number of citations received in the analysed time frame. These are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator. Other indicators could be added in order to elucidate other issues like self-citation rates, impact over time (immediacy index), or to normalize results (citation average).
- Provide the complete list of documents of any given publication that have received n citations and especially those that have received 0 citations. This would allow us to verify the accuracy of the information provided by this product. It is true, much to Google’s credit, that this information could be extracted, though not easily, from Google Scholar.
- Provide a detailed list of the conferences and repositories included in the product. The statement Google makes about including some conferences in the Engineering & Computer Science area, and some document collections like the mega-repositories arXiv, RePEc and SSRN, is much too vague.
- Define the criteria that have been followed for the creation of the classification scheme (areas and disciplines), and the rules and procedures followed when assigning publications to these areas and disciplines.
- Enable the selection of different time frames for the calculation of indicators and the visualization and sorting of publications. The significant disparities in publishing processes and citation habits between areas (publishing speed, pace of obsolescence) require the possibility to customize the time frame according to the particularities of any given subject area.
- Enable access to previous versions of Google Scholar Metrics (2007-2011, 2008-2012, 2009-2013, 2010-2014) to ensure that it is possible to assess the evolution of publications over time. Moreover, they could dare to venture into the unknown and do something no one else has done before: a dynamic product, with indicators and rankings updated in real time, just as Google Scholar does.
- Enable browsing publications by language, country and discipline, and directly display all results for these selections.
- Remove visualization restrictions: currently 100 results for each language and 20 for each discipline or keyword search.
- Enable the visualization of results by country of publication and by publisher.
- Enable sorting results according to various criteria (publication title, country, language, publishers), as well as according to other indicators (h index, h median, number of documents per publication, number of citations, self-citation rate…).
- Enable searching not only by publication title, but also by country and language of publication.
- Enable an option for exporting global results, as well as results by discipline, or those of a custom query.
- Enable an option for reporting errors detected by users, so they can be fixed (duplicate titles, erroneous titles, incorrect links, deficient calculations…).
- Lastly, reducing the minimum number of articles published in the last 5 years from 100 to 50 might be a good idea. 20 articles per year is not a difficult goal for journals written in English, especially in areas like natural sciences and health. However, there are many local journals published in non-English-speaking countries, especially in the Arts & Humanities, that just can't reach that number of articles.

As we already said two years ago.


Granada, July 15, 2016, 10:10 PM.

