Journal of Informetrics

Volume 12, Issue 1, February 2018, Pages 330-343

Regular article
Effect of publication month on citation impact

https://doi.org/10.1016/j.joi.2018.01.012

Highlights

  • Large-scale analysis of the incidence of publication month bias.

  • Month bias substantively distorts citation impact indicators.

  • The bias is shown to influence scores in quantitative science evaluation scenarios.

  • A method to eliminate month bias is introduced and evaluated.

Abstract

A standard procedure in citation analysis is that all papers published in one year are assessed at the same later point in time, implicitly treating all publications as if they were published on the exact same date. This leads to a systematic bias in favor of early-month publications and against late-month publications. This contribution analyses the size of this distortion on a large body of publications from all disciplines over citation windows of up to 15 years. It is found that early-month publications enjoy a substantial citation advantage, which arises from citations received in the first three years after publication. While the advantage is stronger for author self-citations than for citations from others, it cannot be eliminated by excluding self-citations. The bias decreases only slowly over longer citation windows due to the continuing influence of the earlier years’ citations. Because of the substantial extent and long persistence of the distortions, it would be useful to remove or control for this bias in research and evaluation studies which use citation data. It is demonstrated that this can be achieved by using the newly introduced concept of month-based citation windows.

Introduction

Citation impact normalization is a central concept in the construction of advanced bibliometric indicators, which eliminate the effects of scientific discipline, document type and publication date (Waltman, 2016). By delineating sets of publications that are similar to each other in content and formal characteristics, using these sets to compute reference values, and computing relative impact indicators based on those reference values, the heterogeneity in citation counts due to these factors is removed. The intention is to make fair comparisons possible, to compare like with like (Schubert & Braun, 1986). The basic formal characteristics, as opposed to the content (disciplinary area), that are taken into account are document type (such as research articles, review papers, letters, editorials, etc.) and publication date. Further characteristics have also been demonstrated to co-vary with citation counts, for example differences between methodological, theoretical and empirical works (Peritz, 1983), clinical vs. basic research in medicine (Van Eck, Waltman, van Raan, Klautz, & Peul, 2013) or clinical study level (e.g. Bhandari et al., 2007).
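As an illustration of this normalization idea, and not as code from the study, the following Python sketch computes a simple relative impact indicator: each paper’s citation count is divided by the mean citation count of its reference set, here defined by discipline, document type and publication year. All records and group labels in the example are invented.

```python
# Illustrative sketch of citation impact normalization (not from the paper).
# Reference sets are defined by (discipline, document type, publication year);
# the relative indicator is observed citations / mean citations of the set.
from collections import defaultdict
from statistics import mean

# Hypothetical records: (paper_id, discipline, doc_type, pub_year, citations)
papers = [
    ("p1", "chemistry", "Article", 2000, 12),
    ("p2", "chemistry", "Article", 2000, 3),
    ("p3", "sociology", "Article", 2000, 4),
    ("p4", "sociology", "Article", 2000, 1),
]

# Reference values: mean citation count per reference set.
groups = defaultdict(list)
for pid, field, dtype, year, cites in papers:
    groups[(field, dtype, year)].append(cites)
reference_values = {key: mean(vals) for key, vals in groups.items()}

# Relative impact indicator per paper.
for pid, field, dtype, year, cites in papers:
    score = cites / reference_values[(field, dtype, year)]
    print(pid, round(score, 2))
```

In this toy example, p1 scores 1.6 against its chemistry reference set (mean 7.5 citations) and p3 also scores 1.6 against the sociology set (mean 2.5 citations), illustrating how normalization makes papers from differently cited fields comparable.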

One important component of normalization is controlling for publication date since, ceteris paribus, the more time has passed since publication, the more papers will have been published whose authors had the opportunity to read and cite a given publication. The publication year is commonly used to operationalize publication date. This practice rests on the implicit assumption that, for the question of interest of a study, it makes no difference when exactly in a year a paper was published. The fact that, when citations are counted at some later date, documents published in January have had eleven months more to be read and cited than works published in December of the same year raises the question of whether the above assumption is justified and, if it is not, under which conditions and how a more precise publication date ought to be used in citation analysis.

The question of the influence of a more exact publication date is related to the problem of choosing adequate citation windows, the period in which citations to papers in a set of publications are counted. A citation window that is very short, say two years, leads to a more obvious bias against papers published towards the end of the investigation period compared to those published towards the beginning. Consider the following simple illustration. Citations are counted at the end of the year after publication (2-year citation window). Then papers from January had 24 months to be read and cited, assuming they were published on the first day of the month and citations were counted after the last day of the citation window, while December papers had 11 months less, just 13 months, which is 54% of the time period of the January papers. This relative disadvantage becomes smaller as the citation window length is increased. In a five-year citation window, for example, the December papers had 82% of the citation duration of the January papers. Citations do not accumulate uniformly over time, and one is not only concerned with January and December papers, so this reckoning does not say much about the actual size of the distortion. But it might serve as a first-order approximation model. Just how big this ‘head-start’ effect is in reality and at what point in time it vanishes is the topic investigated in this paper.
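The arithmetic of this first-order approximation can be made explicit in a few lines of Python. The sketch below is only an illustrative aid, not part of the study; it assumes, as above, that papers appear on the first day of their publication month and that citations are counted after the last day of the citation window.

```python
# First-order approximation of the 'head-start' effect: months of citation
# exposure by publication month, assuming publication on the first day of
# the month and citation counting after the last day of the window.
def exposure_months(pub_month: int, window_years: int) -> int:
    """Months available for citation for a paper published in month
    pub_month (1 = January) under a window_years-year citation window."""
    return 12 * window_years - (pub_month - 1)

for window in (2, 5):
    jan = exposure_months(1, window)
    dec = exposure_months(12, window)
    print(f"{window}-year window: December/January exposure = "
          f"{dec}/{jan} months = {dec / jan:.0%}")
# 2-year window: December/January exposure = 13/24 months = 54%
# 5-year window: December/January exposure = 49/60 months = 82%
```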

The article is organized as follows. In the next section, previous work on the topic is briefly reviewed and some knowledge gaps which this study addresses are pointed out. Next, the data on which the study is based are presented. The major part of this contribution comprises the analysis of the results regarding the month effect from several points of view, including its size as reflected in basic citation scores and in regression analysis, also taking into consideration the online publication date, the change of the effect size over longer citation windows, and its presence and patterns across disciplines. Furthermore, we introduce a method to eliminate the month bias and use the resulting corrected citation counts to demonstrate the bias in a simulated academic impact assessment of institutions under realistic conditions similar to currently employed research evaluation procedures. We finish with a discussion of the results and their implications for the field.

Section snippets

Related work

There have been a number of prior studies that have noted and investigated the month effect. They will be briefly reviewed in the following and their results used as a point of departure for this study.

Haslam et al. (2008) used publication month as a control variable throughout their regression analyses of factors influencing citation impact in a psychology sub-discipline. Their criterion was the natural logarithm of articles’ citations counted after ten years. In their results, the

Data sets

Data set A consists of all journal publications of document type ‘Article’ from the year 2000, obtained from Clarivate Analytics’ Web of Science1 (n = 767,959), for which publication month data was either available in the source data or could be estimated, as will be reported below. This year was chosen in order to be able to

Missing month data estimation and validation

The month estimation method outlined above was applied to all issue records in data set A without publication dates specific to a month, that is, those with no data, with a range of months, or with a seasonal date (i.e. ‘SPR’, ‘SUM’, ‘FAL’, ‘WIN’). In order to validate the results, 339 issues were looked up on journal websites. For 199 of these issues the publication month(s) could be found. Issues spanning multiple months according to the official dates were transformed as described above. In
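The estimation and transformation rules themselves are described in an earlier part of the full text and are not reproduced in this snippet. Purely as an illustration, and under assumptions not taken from the paper (midpoint months for seasonal codes and month ranges), a mapping of such incomplete issue dates to a single representative month could look like the following Python sketch; the function and the season-to-month table are hypothetical.

```python
# Hypothetical sketch only: maps issue date strings without a specific month
# (seasonal codes or month ranges) to a single representative month.
# The assignments below are assumed midpoints, not the paper's actual rules.
from typing import Optional

SEASON_TO_MONTH = {"SPR": 4, "SUM": 7, "FAL": 10, "WIN": 1}
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def estimate_month(date_field: str) -> Optional[int]:
    """Return a representative month (1-12) for an issue date string,
    or None if no month can be inferred."""
    date_field = date_field.strip().upper()
    if date_field in SEASON_TO_MONTH:
        return SEASON_TO_MONTH[date_field]
    if "-" in date_field:  # e.g. 'JAN-MAR': use the midpoint month
        start, end = date_field.split("-", 1)
        if start in MONTHS and end in MONTHS:
            return (MONTHS.index(start) + 1 + MONTHS.index(end) + 1) // 2
    return None

print(estimate_month("SPR"))      # 4
print(estimate_month("JAN-MAR"))  # 2
print(estimate_month(""))         # None
```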

Limitations

The present study has some limitations. The study considered only the publication years 2000 and 2009. No temporal dynamics in publication month bias were investigated. The variable of interest was the point of publication of individual articles within a year which was approximated by using the recorded or estimated month of publication of an issue. The publication month was estimated in a portion of the data but the estimation method was shown to work well. Another limitation is that the

Acknowledgements

The author thanks Nees Jan van Eck for providing data set B and for stimulating discussions which greatly improved this study. Part of the analysis was conducted using infrastructure funded through BMBF project 01PQ17001.

Mr Paul Donner studied Library and Information Science (LIS) at the Humboldt University of Berlin. In 2012 he graduated with a Master of Arts in LIS with a thesis in the field of bibliometrics. He has been working as a bibliometrics researcher for DZHW and its predecessor iFQ since August 2013.

