Effect of publication month on citation impact
Introduction
Citation impact normalization is a central concept in the construction of advanced bibliometric indicators that eliminate the effects of scientific discipline, document type and publication date (Waltman, 2016). Sets of publications that are similar to each other in content and formal characteristics are delineated, reference values are computed from these sets, and relative impact indicators are calculated against those reference values; in this way the heterogeneity in citation counts due to these factors is removed. The intention is to make fair comparisons possible, to compare like with like (Schubert & Braun, 1986). The basic formal characteristics, as opposed to the content (disciplinary area), that are taken into account are document type (such as research articles, review papers, letters, editorials, etc.) and publication date. Further characteristics have also been shown to co-vary with citation counts, for example differences between methodological, theoretical and empirical works (Peritz, 1983), clinical vs. basic research in medicine (Van Eck, Waltman, van Raan, Klautz, & Peul, 2013), or clinical study level (e.g. Bhandari et al., 2007).
One important component of normalization is controlling for publication date, as, ceteris paribus, the more time has passed since publication, the more papers will have been published whose authors had the opportunity to read and cite a given publication. The publication year is commonly used to operationalize publication date. This practice rests on the implicit assumption that, for the question of interest of a study, it makes no difference when exactly in a year a paper was published. The fact that, when citations are counted at some later date, documents published in January have had eleven months more to be read and cited than works published in December of the same year raises the question of whether the above assumption is justified, and, if it is not, under which conditions and how a more precise publication date ought to be used in citation analysis.
The question of the influence of a more exact publication date is related to the problem of choosing adequate citation windows, the period in which citations to papers in a set of publications are counted. A very short citation window, say two years, would obviously bias results against papers published towards the end of the investigation period compared to those published towards the beginning. Consider the following simple illustration. Citations are counted at the end of the year after publication (a 2-year citation window). Then papers from January had 24 months to be read and cited, assuming they were published on the first day of the month and citations were counted after the last day of the citation window, while December papers had 11 months less, just 13 months, which is 54% of the time available to the January papers. This relative disadvantage becomes smaller as the citation window length increases. In a five-year citation window, for example, the December papers had 82% of the citation duration of the January papers. Citations do not accumulate uniformly over time, and one is not only concerned with January and December papers, so this reckoning does not say much about the actual size of the distortion. But it might serve as a first-order approximation model. Just how big this ‘head-start’ effect is in reality and at what point in time it vanishes is the topic investigated in this paper.
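The arithmetic behind this first-order approximation can be made explicit. The sketch below reproduces the figures from the illustration above, under the same simplifying assumptions (publication on the first day of the month, citations counted through the last day of the window):

```python
def citation_months(pub_month: int, window_years: int) -> int:
    """Months available for citation for a paper published in `pub_month`
    (1 = January) of year Y, when citations are counted through the end
    of year Y + window_years - 1."""
    return 12 * window_years - (pub_month - 1)

# Two-year window: January papers get 24 months, December papers 13.
print(citation_months(1, 2))                            # 24
print(citation_months(12, 2))                           # 13
print(citation_months(12, 2) / citation_months(1, 2))   # ~0.54

# Five-year window: the relative disadvantage shrinks to ~82%.
print(citation_months(12, 5) / citation_months(1, 5))   # 49/60 ~ 0.82
```

As the text notes, this is only a duration ratio; since citations do not accrue uniformly over time, the real distortion must be measured empirically.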
The article is organized as follows. In the next section, previous work on the topic is briefly reviewed and the knowledge gaps that this study addresses are pointed out. Next, the data on which the study is based are presented. The major part of this contribution comprises the analysis of the month effect from several points of view, including its size as reflected in basic citation scores and in regression analysis, also taking into consideration the online publication date, the change of the effect size over longer citation windows, and its presence and patterns across disciplines. Furthermore, we introduce a method to eliminate the month bias and use the resulting corrected citation counts to demonstrate the bias in a simulated academic impact assessment of institutions under realistic conditions similar to currently employed research evaluation procedures. We finish with a discussion of the results and their implications for the field.
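To give a concrete sense of what a month-bias correction could look like, the sketch below normalizes each paper's citation count by the mean citation count of papers published in the same month, by analogy with how field normalization divides by a field-specific expected value. This is an illustrative assumption, not necessarily the correction method introduced in this paper:

```python
from collections import defaultdict

def month_normalized_citations(papers):
    """papers: list of (publication_month, citation_count) tuples.
    Returns, for each paper, its citation count divided by the mean
    citation count of all papers from the same publication month."""
    totals = defaultdict(lambda: [0, 0])  # month -> [citation sum, paper count]
    for month, cites in papers:
        totals[month][0] += cites
        totals[month][1] += 1
    means = {m: s / n for m, (s, n) in totals.items()}
    return [cites / means[month] for month, cites in papers]

# Hypothetical toy data: two January papers, two December papers.
papers = [(1, 12), (1, 8), (12, 5), (12, 5)]
print(month_normalized_citations(papers))  # [1.2, 0.8, 1.0, 1.0]
```

Under such a scheme, a December paper is compared only against other December papers, so the head-start of early-year publications no longer inflates their relative scores.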
Related work
A number of prior studies have noted and investigated the month effect. They are briefly reviewed in the following and their results used as a point of departure for this study.
Haslam et al. (2008) used publication month as a control variable throughout their regression analyses of influence factors of citation impact in a psychology sub-discipline. Their criterion was the natural logarithm of articles’ citations counted after ten years. In their results, the
Data sets
Data set A consists of all journal publications of document type ‘Article’ from the year 2000, obtained from Clarivate Analytics’ Web of Science (n = 767,959), for which publication month data was either available in the source data or could be estimated, as will be reported below. This year was chosen in order to be able to
Missing month data estimation and validation
The month estimation method outlined above was applied for data set A to all issue records without publication dates specific to a month, that is, those with no data, with a range of months and with a seasonal date (i.e. ‘SPR’, ‘SUM’, ‘FAL’, ‘WIN’). In order to validate the results, 339 issues were looked up on journal websites. For 199 of these issues the publication month(s) could be found. Issues spanning multiple months according to the official dates were transformed as described above. In
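The exact transformation used in the study is not reproduced in this snippet, but a plausible normalization of WoS-style date fields can be sketched as follows. The seasonal mapping and the midpoint rule for month ranges are illustrative assumptions, not necessarily the procedure applied in the study:

```python
# Northern-hemisphere seasons mapped to a representative mid-season month
# (assumed mapping for illustration).
SEASON_TO_MONTH = {"SPR": 4, "SUM": 7, "FAL": 10, "WIN": 1}

MONTHS = {m: i + 1 for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"])}

def normalize_month(date_field):
    """Return a single month number for a WoS-style date field such as
    'MAR', a seasonal label like 'SPR', or a range like 'JAN-MAR'
    (resolved to the midpoint); None if no month is recoverable."""
    date_field = date_field.strip().upper()
    if date_field in SEASON_TO_MONTH:
        return SEASON_TO_MONTH[date_field]
    if date_field in MONTHS:
        return MONTHS[date_field]
    if "-" in date_field:
        start, _, end = date_field.partition("-")
        if start in MONTHS and end in MONTHS:
            return (MONTHS[start] + MONTHS[end]) // 2
    return None

print(normalize_month("SPR"))      # 4
print(normalize_month("JAN-MAR"))  # 2
```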
Limitations
The present study has some limitations. The study considered only the publication years 2000 and 2009. No temporal dynamics in publication month bias were investigated. The variable of interest was the point of publication of individual articles within a year which was approximated by using the recorded or estimated month of publication of an issue. The publication month was estimated in a portion of the data but the estimation method was shown to work well. Another limitation is that the
Acknowledgements
The author thanks Nees Jan van Eck for providing data set B and for stimulating discussions which greatly improved this study. Part of the analysis was conducted using infrastructure funded through BMBF project 01PQ17001.
Mr Paul Donner studied Library and Information Science (LIS) at the Humboldt University of Berlin. In 2012 he graduated with a Master of Arts in LIS with a thesis in the field of bibliometrics. He has been working as a bibliometrics researcher for DZHW and its predecessor iFQ since August 2013.
References (15)
- et al. (2011). A combined bibliometric indicator to predict article impact. Information Processing & Management.
- (2016). The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression. Journal of Informetrics.
- Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics.
- (2015). Excellence in Research for Australia (ERA) 2015 evaluation handbook.
- Bhandari, M., et al. (2007). Factors associated with citation rates in the orthopedic literature. Canadian Journal of Surgery.
- et al. (2015). The citation evolution law of papers published in the same year but different month. Learned Publishing.
- (1978). False publication dates and other rip-offs. Current Contents.