Introduction

There are frequent discussions in the scientometric literature, as well as in higher education evaluations, about how long the period on which a citation analysis is based should be (e.g. Research Evaluation and Policy Project 2005, p. 20f.). According to Adams (2005), a short-term citation period of 1 or 2 years would make it possible to monitor and evaluate the citation rate at an early stage, whereas short citation periods can also distort the data. Publications differ both in how quickly they age (Glänzel and Schoepflin 1995) and in their durability (Costas et al. 2010). With regard to investigating the research impact of German-language business schools, researchers such as Dyckhoff and Schmitz (2007) analyse comparatively long periods of 10 and 14 years, while Dilger (2010) used a relatively short citation period of two and a half years. Long citation periods have the benefit of compensating for short-term random fluctuations to a certain degree. The drawback is the greater effort required to survey the citation indicators.

With regard to determining suitable citation periods for generating informative and reliable results, it must be recognized that there are only a modest number of studies and that no uniform consensus regarding a suitable citation period for certain analyses can be derived from them. Furthermore, various mathematical methods are applied, such as half-lives, linear regression models, and correlation analyses (see, e.g., Abramo et al. 2019; Rovira-Esteva et al. 2019; Wang 2013). Only simple citation indicators are used in these studies. Beginning with the first citation analysis by Gross and Gross (1927), a wide array of citation indicators has been developed over time. In order to determine the impact of research performance, indicators such as the h-index, developed by Hirsch (2005), are used. Because departments have differing habits relating to publication and citation (Glänzel et al. 2008), comparisons can only be made between disciplines if standardized indicators are used. As already noted, the variation in citation indicator values caused by different citation periods has so far been largely ignored in bibliometric studies. This raises the following questions:

What citation period for citations of scientific publications will maximize the informative value of research performance analyses? Can differences be seen between different citation indicators?

Furthermore, existing studies are based either on whole data sets or on individual disciplines. The results of the latter studies vary depending on the discipline being analysed. So far, however, there is no comparative analysis of different citation periods across disciplines, which raises the following question:

Does the length of the citation period that can be shown to have the maximum informative value vary between disciplines?

Our paper’s contribution is threefold and can be summed up as follows:

  • Comprehensive inclusion of different citation periods to analyse how citation processes may change over the timeframes considered

  • Analysis of different relevant citation indicators, especially the first-time inclusion of a journal-normalized indicator

  • Consistent consideration of the results from the viewpoint of university research evaluation

Our paper is structured as follows: The next section gives a brief account of the current state of research with regard to the stated research questions. Additionally, we concretize these research questions. Next, we show the design of our research, before presenting the results and analysing them concerning our research questions. The article concludes with the study’s limitations and an outlook for further research.

Literature overview and detailed research questions

As already mentioned, the effect of the chosen citation period has so far been the subject of only a modest amount of empirical research. In the following, we give a brief overview of the existing studies and their conclusions:

Using half-lives for 70,000 publications and a time frame of 55 years, Rovira-Esteva et al. (2019) determined that 50% of citations occur within 5 years. From the 10th year onwards, only a small percentage increase in citations can be registered. The authors draw the conclusion that a period of between 5 and 10 years is sufficient to reach valid statements regarding the long-term perception of a publication. An approach for a differentiated analysis of various fields can also be found within this study.

Abramo et al. (2019) base their analysis on 123,128 publications from Web of Science (WoS). They use a linear regression model and combine the journal impact factor and the number of citations. The authors conclude that a citation period of 3 years is sufficient to determine the long-term effects of scientific publications.

Wang et al. (2019) investigated the correlation between previous citations and new citations for 36 journals over a period of 10 years, from 2008 until 2017. They use several citation periods to determine the period in which previous citations have the greatest influence on future citations of the publications considered. It was found that the correlation between previous citations and new citations decreases significantly with increasing time windows. Thus, more recent citations show a greater influence on the future citations of a publication than ‘older’ citations.

Liu et al. (2015) performed a comparison of journal impact factors with different citation periods and “peer-reviewed results” for ophthalmological journals. The highest correlations between peer-reviewed results and journal impact factor were found for a citation period of 3 and 4 years, which leads to the conclusion that a 3-year citation period is sufficient to reflect the actual impact of journals. Leydesdorff et al. (2013) came to the same conclusions, especially considering that a 2-year citation period does not cover disciplines with a slow citation build-up. Dorta-Gonzalez and Dorta-Gonzalez (2013), on the other hand, proposed to use the citation period with the highest average number of citations for the calculation of the journal impact factor.

Wang (2013) studies the correlations between all citations over 31 years and the cumulative citations in all possible periods for 746,460 publications from 1980 available on WoS. The results of the correlation analyses, i.e. the Spearman rank correlation coefficient between the total number of citations and the cumulative citations, of 0.754 for 3 years, 0.871 for 5 years, and 0.948 for 10 years, imply that even a citation period of 3 years can show long-term effects and establish a fundamental trend that will not change.

The delayed impact of publications, the phenomenon of so-called “sleeping beauties”, is used in the scientific community as a counter-argument against short or medium-term citation periods for citation indicators. Glänzel et al. (2003) analyse publications from 1980 over a period of 21 years to gain insights into the significance of the share of publications with delayed impact. They find that 76% of all publications were cited in the first 3 years after the publication year. Publications that are not cited within a period of 3 or 5 years are associated with a lower expected citation impact, so delayed uptake does not lead to a postponement of the citation process by several years. Publications that attract a lot of attention in later periods represent extreme cases (see Glänzel et al. 2003; van Raan 2004a); according to Glänzel et al. (2003), only 0.00014% of almost 450,000 publications were sleeping beauties. In 2008, Glänzel stated “that the particular choice of a standard citation window cannot be made responsible for possibly negative results of an otherwise correct bibliometric evaluation study”.

Gonzalez and Gonzalez (2016) examine the long-term impact of publications. They postulate that the length of the citation period with the highest significance varies over time and by discipline. In the literature, citation periods with a fixed start and end are predominantly used. This does not reflect different levels of maturity in specific disciplines and can advantage or disadvantage some publications. From this, the authors conclude that a variable citation period that takes into account the differences between the disciplines being analysed is more suitable.

Based on the existing studies, it can be said that no consensus has been reached to date as to what effect the length of the citation period since a publication’s release has on the validity of the citation indicators used. The recommendations vary between 3 and 5 years. Since we wish to pursue these issues in this article, we concretise our research questions by formulating six more detailed ones:

Ketzler and Zimmermann (2013) find that the age of an article has a significant positive effect on the number of citations. It is therefore to be expected that the results of the citation indicators will differ for citation periods of different lengths, since analysis of longer citation periods involves an increased volume of citations. Consequently, if the citation period is extended, the number of uncited publications should decrease, causing the variance in the citations obtained for each publication to increase. Nicolaisen and Frandsen (2019) discovered that the proportion of uncited publications reacts sensitively to the length of the citation period. Hu and Wu (2014) found that for a citation period of 12 years or longer, the percentage of papers never cited is stable and the probability of these publications being cited in the future is very low. Using duration analyses, van Dalen and Henkens (2005) showed, contrary to the prevailing myth, that the chance of being cited decreases with age and that non-citability does not play a role in the temporal distribution of citations. Based on these findings, our first research question is as follows:

(R1) Does the informative content of citation indicators increase for longer citation periods?

A publication ages based on the principle of phases of maturing and decline of its citation volume (Glänzel and Schoepflin 1995). We can conclude from this that once a publication transitions from the maturing phase to the decline phase, analysis of a longer citation period provides no additional informative value with regard to determining its impact. At this point, saturation is reached. This raises the question of whether the trend towards saturation can be detected at a certain point during the maturing process. Due to the heterogeneous citation behaviour of departments or research fields, the maturation of citations up to their distribution peak follows different subject-specific time patterns (see, e.g., Abramo et al. 2011; Lehmann et al. 2006; Glänzel and Moed 2002). Since the existence and the time of occurrence of the high citation numbers of sleeping beauties cannot be known in advance, it is difficult to define the length of time until the citations mature (see, e.g., El Aichouchi and Gorry 2018; Song et al. 2018; van Raan and Winnink 2018; Teixeira et al. 2017). Based on this, we formulate our second detailed research question:

(R2) Is there a citation period for which the long-term impact of citations can be determined and after which a longer period of analysis of the citation rate does not provide any additional informative value?

Within the framework of the above-mentioned maturing process, the question arises how the distribution of citations changes over time. Since Lotka’s law (Lotka 1926), it has been known that citation distributions are skewed, and a large number of studies confirm this law. These studies conclude that citations are distributed heterogeneously among publications: a small percentage of publications generate the majority of citations, while a large percentage of publications receive zero or only a few citations (see, e.g., Ruiz-Castillo and Costas 2014, 2018; Radicchi and Castellano 2012; Albarran et al. 2011; Albarran and Ruiz-Castillo 2011; Radicchi et al. 2008). According to Ruiz-Castillo and Costas (2018), the skewness of the individual citation distributions varies greatly within individual disciplines, but the average skewness across all disciplines is of a similar magnitude. With regard to the observation of the citation distribution over time, Li et al. (2013) found that the forms of the citation distributions within each year under consideration are similar. We would like to combine the analyses of Ruiz-Castillo and Costas (2018) and Li et al. (2013) and raise the question:

(R3) To what extent does the uneven distribution of citation numbers differ for the three disciplines under consideration and does it change over time?

The previously mentioned studies are predominantly based on the number of citations. As already established, the research deficit relates especially to the issue of how far the citation period can be limited for other citation indicators. In this article, we will therefore address the h-index as a simple indicator and the J-factor as a standardized indicator.

The h-index has the advantage of not increasing uniformly with each additional citation, but instead only when the citations in the h-core change, which means that it is more robust in the face of fluctuations and citation peaks (Hirsch 2005). The h-index has already been subjected to a number of empirical studies by the bibliometrics community with regard to its validity and its potential to compensate for distortions (see, e.g., Hirsch 2007; Alonso et al. 2009; Jensen et al. 2009; Honekopp and Khan 2012; Sharma et al. 2013). In order to eliminate the disadvantages (supposedly) inherent in the h-index, a variety of further indicators have been developed. On the basis of several studies into the relationships between the h-index and its variants, it has been established that the h-index is highly correlated with its variants (see, e.g., Bar-Ilan 2008; Schreiber 2008; Bornmann and Daniel 2009; Bornmann et al. 2011). As a consequence, considering another related and/or derived indicator in addition to the original h-index provides no added information value, and we therefore do not consider any of these variants. H-index variants that take the currentness of the publications into consideration can also be disregarded, since the publications from the disciplines being studied come from a narrow range of publication ages. Variants that include the duration of the author’s publication career or the number of co-authors are not relevant here, as this study is not conducted at the level of individual scientists.
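To recall the mechanics behind this robustness, the following minimal sketch computes the h-index from a list of citation counts, following the definition by Hirsch (2005) that h is the largest number of publications that have each received at least h citations; the citation counts shown are invented for illustration.

```python
def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each (Hirsch 2005)."""
    ranked = sorted(citation_counts, reverse=True)  # most-cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank          # this publication still belongs to the h-core
        else:
            break
    return h

# Invented citation counts for one institution:
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
# Additional citations to the already highly cited first paper leave h unchanged:
print(h_index([40, 8, 5, 3, 3, 1, 0]))  # -> 3
```

The second call illustrates the robustness mentioned above: extra citations outside the decisive rank positions do not move the index.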

Schreiber (2015) analysed the development of the timed h-index as a function of the length of the citation period. He found that in most groups the median of all received citations is reached for a citation period of less than 5 years. With short citation periods of two (Fiala 2014) or 3 years (Van Raan 2006), publications that are still at the beginning of the process of citation accumulation are used to calculate the h-index. Consequently, there are still only small differences in the citation volume of the individual publications, which reduces the selectivity of the h-index values. However, Pan and Fortunato (2014) conclude that the values of the h-index for a 5-year citation period are already based on more differentiated and higher citation numbers, which provides a higher selectivity of the h-index values and reduces the influence of fluctuations.

Based on previous studies, it remains unclear how the value of the h-index is affected when the citation period is extended. Due to the design of the h-index, its value does not necessarily and uniformly increase with each additional citation, since only core publications have an effect on this index. This means that the increasing volume of citations that comes with growing citation periods does not cause the h-index to increase proportionally. As a consequence, looking at longer citation periods should imply only limited added informative value, which is why we ask the following:

(R4) Does the added informative value of the h-index remain limited in comparison with the absolute number of citations and the citations per publication when the citation period is extended?

To permit interdisciplinary comparisons when evaluating research performance, discipline-specific differences are accounted for using methods of standardization. A distinction is drawn between field-normalized and journal-normalized methods. In this regard, Glänzel et al. (2008) compare the characteristics of the field-normalized indicator Normalized Mean Citation Rate (NMCR) for 676 European institutions and two citation periods. Field-normalized indicators compare the citation rate of a publication with that of all publications from the same discipline. In the above-mentioned study, there are high correlations between the indicators for all institutions for the 3-year and 5-year citation periods. As a consequence, it was concluded that a 3-year citation period is sufficient to reach valid statements on citation impact with regard to the NMCR indicator.

A major weakness of field normalization is the need for an a priori classification of disciplines and sub-disciplines. Any delineation between disciplines will always prompt debate despite the existence of different classification systems. Journal-normalized methods, by contrast, are based on the idea of taking the mean citation rates of the unit being studied for publications from the same year of publication and of the same document type, and relating these to the mean citation rate of all publications from the respective journal. This process is applied to each journal in question. As a consequence, normalization is independent of the choice of field classification.

Because of this advantage and the fact that a study of journal-normalized citation indicators has not been carried out to date, we use the J-factor developed by Ball et al. (2009) in this publication. The J-factor analyses the ratio between the citation rates of the publications of a unit \(I\) being studied (e.g. a department or university) in a journal \(z\) and the citation rates of the publications from a reference group \(St\) in the journal \(z\). This means that for each unit being studied, the number of citations per publication \({\text{CPP}}^{I}(z)\) in a particular journal \(z\) is related to the mean number of citations per publication \({\text{CPP}}^{St}(z)\) for all publications in the journal \(z\) with the same year of publication and document type. This ratio is weighted by the proportion of publications \(P^{I}(z)\) of the unit being studied in this journal in relation to all publications \(P^{I}({\text{all}})\) of the unit being studied in the evaluation period. Summing the results over all journals generates the J-factor for a unit:

$$J(I,St) = \sum_{z} \frac{\text{CPP}^{I}(z)}{\text{CPP}^{St}(z)} \cdot \frac{P^{I}(z)}{P^{I}(\text{all})}$$

The J-factor implies a relative evaluation of the specific citation performance in comparison to the citation performance of the reference group. A J-factor of 1 means a citation performance equal to the reference group; a J-factor greater than 1 indicates an above-average citation performance in relation to the reference group, while a J-factor less than 1 signals a below-average citation performance.
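To make the aggregation across journals concrete, here is a minimal sketch of the J-factor calculation as defined above; the dictionaries and numbers are hypothetical illustrations, not data from this study, and in practice the reference values would be matched by publication year and document type.

```python
def j_factor(unit_pubs, unit_cpp, reference_cpp):
    """
    J-factor following Ball et al. (2009):
    sum over journals z of (CPP^I(z) / CPP^St(z)) * (P^I(z) / P^I(all)).

    unit_pubs:     {journal: number of publications of the unit in that journal}
    unit_cpp:      {journal: citations per publication of the unit}
    reference_cpp: {journal: citations per publication of the reference group,
                    matched by publication year and document type}
    """
    total_pubs = sum(unit_pubs.values())
    return sum(
        (unit_cpp[z] / reference_cpp[z]) * (pubs / total_pubs)
        for z, pubs in unit_pubs.items()
    )

# Hypothetical example: a unit publishing in two journals.
pubs   = {"Journal A": 6, "Journal B": 4}
cpp_i  = {"Journal A": 5.0, "Journal B": 2.0}
cpp_st = {"Journal A": 4.0, "Journal B": 4.0}
print(j_factor(pubs, cpp_i, cpp_st))  # 1.25 * 0.6 + 0.5 * 0.4 = 0.95
```

A value of 0.95 in this invented example would indicate a citation performance slightly below that of the reference group.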

One benefit of the J-factor is that the analysis can be performed for a period of unlimited length. The disadvantage is that the unit being studied can also publish in journals unrelated to the discipline, which then show different citation behaviours and can distort the data as a result. In contrast to simple citation indicators, an increase in the J-factor cannot be achieved simply by increasing the citation numbers, since the higher number of citations also appears in the reference group. This means that the behaviour of the J-factor depends on the citation development of the respective reference group even when the number of citations of the unit being studied increases. Accordingly, the development of the J-factor as the number of citations increases is barely predictable. However, we assume that the J-factor is more informative the more citations are incorporated into its calculation. The number of citations from both the unit being studied and the reference group will continue to increase as the citation period grows longer. Accordingly, with regard to journal-normalized citation indicators, we ask whether looking at longer citation periods leads to added informative value for the J-factor:

(R5) Does the informative value of the J-factor increase as the citation period grows longer?

As already noted, disciplines differ in their citation behaviour, regarding both the citation volume and the timespan in which they display their citation potential. Because of this, short, or even too short, citation periods can lead to an inaccurate evaluation for some disciplines (Glänzel et al. 2008). Accordingly, the length of the citation period should be selected to match the citation behaviour of the respective discipline (see, e.g., Wang 2013; Waltman et al. 2011). Based on the preceding findings that disciplines differ in their citation behaviour and consequently require different timespans for their citation potential to mature, we finally ask the following:

(R6) Do the results concerning the length of the citation periods with regard to the effect of the research differ depending on the discipline being studied?

Study design

Data collection

The starting point for our data collection process is the set of publications from the so-called ‘CHE ranking’, carried out by the Center for Higher Education Development (CHE). The CHE provides a list of all chairs and institutes from different disciplines of universities in Germany. For these chairs and institutes, the CHE creates a discipline-specific ranking in a 3-year cycle. One indicator used is the absolute number of publications during the 3 years under consideration (Berghoff et al. 2009). The bibliometrics team at Forschungszentrum Jülich annually prepares the publication data for these chairs and institutes. All publications are determined top-down, starting at the university level and going down to a specific institute or chair, according to the ‘work done at’ method in WoS. If several institutions co-authored a publication, each participating institution is credited once for the publication; fractional counting is not applied. In this way, publications are assigned to fields without using a classification scheme. For our study, we took the CHE publication lists for the chairs and institutes of business administration, biology, and medicine for the period 2007 to 2009. Only the document types “Article” and “Review” are taken into account.

Based on these publication data, we determined the citation data for the three stated disciplines in the WoS local installation of the Competence Centre for Bibliometrics. We look at ten consecutive cumulative citation periods, ending in the years 2009 to 2018. The first citation period covers 3 years and runs from 2007 up to and including 2009, the second from 2007 up to 2010, the third from 2007 up to 2011, and so on. Based on these absolute publication and citation data, we calculate the following indicators for each chair or institute and for each citation period: the citations per publication, the h-index, and the J-factor. Although it may appear otherwise, the data collection process and the level represented in the study are not at the field level but at the institutional level. Furthermore, there is no methodological reason why the indicators should not work at the field level: as long as the examined subset is part of the overall benchmark, the h-index and the J-factor also work at the field level (see van Raan 2004b; Malesios and Psarakis 2014).
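For illustration, the cumulative citation windows described above can be constructed as in the following sketch; it assumes that citation counts per citing year are available for each publication, and all identifiers and numbers are invented rather than taken from the CHE or WoS data.

```python
# Hypothetical per-publication citation counts, keyed by the citing year.
publications = {
    "pub_1": {2008: 1, 2009: 2, 2011: 4, 2015: 1},
    "pub_2": {2010: 3, 2012: 2},
}

def citations_in_window(per_year, start, end):
    """Sum the citations received from `start` up to and including `end`."""
    return sum(n for year, n in per_year.items() if start <= year <= end)

# Ten cumulative periods: 2007-2009, 2007-2010, ..., 2007-2018.
for end_year in range(2009, 2019):
    counts = [citations_in_window(c, 2007, end_year) for c in publications.values()]
    total = sum(counts)
    cpp = total / len(counts)  # citations per publication for this window
    print(f"2007-{end_year}: total citations = {total}, citations per publication = {cpp:.1f}")
```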

Analysis instruments

In order to determine informative citation periods, various mathematical and/or statistical methods have been used in the bibliometric literature, such as cited half-lives, linear regressions, and correlation coefficients. The cited half-life is the timespan within which half of all the citations received by a publication occur (De Bellis 2009). Linear regressions are used to make forecasts and to describe relationships between two variables. However, neither method is suitable for testing our hypotheses.

By contrast, correlation coefficients describe the relationship between two or more variables and are therefore generally suitable for addressing our research questions. To use them, however, we first need to define what we mean below by a citation indicator’s added information value. On the one hand, an increase in an indicator, e.g. in the absolute number of citations, always provides added information value, since the higher figure reflects the cited paper’s perception in the community. In this respect, it naturally makes a difference whether a paper has been cited 5 or 500 times.

However, performance indicators are frequently used in higher education evaluations to generate rankings of individual scientists, departments, or universities. For this purpose, as already mentioned, the CHE has developed a research ranking, which has undergone critical analysis by Clermont and Dirksen (2016). In rankings, the precise value of the indicator is irrelevant to the viewer; what is important is the rank position. Therefore, there is added information value in this kind of analysis when the rankings differ strongly from each other as the citation period varies. Ranking correlation coefficients are suitable for such a study. There are two well-known methods for calculating them: Spearman’s ρ (Spearman 1904) and Kendall’s τ (Kendall 1938).

According to Spearman’s method, pairs of ranks are formed and compared with each other (Xu et al. 2013). However, problems occur when there are ties, i.e. when two values have the same ranking (Schendera 2004). As a solution to this, an arithmetic mean is formed, which has negative effects on the results’ informative value. Furthermore, Spearman assumes that the differences present in the rankings are equivalent, i.e. that the difference between first and second place is equivalent to the difference between last and second-to-last. We were not able to establish a relationship of this kind with our collected data.

In contrast to Spearman, Kendall does not assume identical differences in the rankings. Kendall’s ranking correlation coefficient assumes a given ranking based on one indicator (e.g. a ranking based on the number of citations between 2007 and 2009). It is then measured how often a ranking based on another indicator (e.g. a ranking based on the number of citations between 2007 and 2018) “breaks” the initial ranking. The resulting number is divided by the number of all possible pair comparisons, so that the coefficient lies between − 1 and + 1. A value of + 1 means that all ranking positions are identical; a value of − 1 means that the ranking positions are exactly reversed.

Let us assume a ranking of all n universities under consideration based on the initial indicator x1. Starting from the first position of this initial ranking, each unit is compared pairwise with every unit ranked after it, so that 0.5n(n − 1) pair comparisons are carried out. For each pair it is analysed whether it is concordant or discordant. If the ranking based on x2 orders the two units of a pair in the same way as the ranking based on x1, the pair is concordant; accordingly, a pair is discordant if the two rankings order the units in opposite ways. If we denote the number of concordant pairs as C and the number of discordant pairs as D, Kendall’s τ is defined as follows:

$$\tau = \frac{C - D}{{0.5n\left( {n - 1} \right)}}$$

If there are ties, i.e. identical rank positions, the above formula has to be extended, since tied pairs are neither concordant nor discordant. We refrain from an explicit presentation here and refer to the relevant statistical literature.
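A minimal sketch of the pair-comparison procedure just described (the variant without tie correction, sometimes called τ-a) might look as follows; the two rankings are invented, and for data with ties a tie-corrected variant such as the one implemented in scipy.stats.kendalltau (τ-b) would be used instead.

```python
from itertools import combinations
from scipy.stats import kendalltau  # tie-corrected variant (tau-b)

def kendall_tau_a(rank_x, rank_y):
    """Kendall's tau without tie correction: (C - D) / (0.5 * n * (n - 1))."""
    n = len(rank_x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        order = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if order > 0:
            concordant += 1   # both rankings order the pair the same way
        elif order < 0:
            discordant += 1   # the rankings order the pair in opposite ways
        # order == 0 would be a tie, ignored in this simple variant
    return (concordant - discordant) / (0.5 * n * (n - 1))

# Invented rankings: by citations 2007-2009 vs. by citations 2007-2018.
rank_short = [1, 2, 3, 4, 5]
rank_long  = [2, 1, 3, 4, 5]
print(kendall_tau_a(rank_short, rank_long))   # -> 0.8
tau_b, _ = kendalltau(rank_short, rank_long)
print(tau_b)                                  # same value here, since there are no ties
```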

An advantage of Kendall’s ranking correlation coefficient is that all rankings can be compared with each other, and not just individual ranking pairs. For our study, this means that, for each indicator, all ten measured values for the ten citation periods can be compared with each other. According to a study by Xu et al. (2013), Kendall’s method is also preferable for smaller data sets and in the presence of outliers. This corresponds to our data situation, with a skewed distribution (see also the next section), potential outliers, and a low volume of data from 86 universities for business administration, 68 for biology, and 44 for medicine. On the basis of these properties of the two methods and the stated characteristics of our data, we use Kendall’s ranking correlation coefficient.

Results

Descriptive analysis

The descriptive analyses of the three disciplines under consideration with regard to the ten generated citation periods for the individual indicators are shown in Tables 1, 2 and 3. The location of the data is described by the arithmetic mean, median, minimum, and maximum values for each indicator and citation period. The dispersion of the data is shown by the standard deviation and the coefficient of variation.

Table 1 Descriptive analysis: business administration
Table 2 Descriptive analysis: biology
Table 3 Descriptive analysis: medicine

A consistent pattern emerges for the arithmetic mean of the citation indicators across all disciplines and citation periods. The indicators “number of citations” and “citations per publication” increase strongly in the first 4 years. For business administration, the increase between the citation periods 2007 to 2009 and 2007 to 2010 is 109% (biology and medicine: approx. 178%), between 2007 to 2010 and 2007 to 2011 it is 65% (biology and medicine: approx. 95%), between 2007 to 2011 and 2007 to 2012 it is 44% (biology and medicine: approx. 54%), and between 2007 to 2012 and 2007 to 2013 it is 33% (biology and medicine: approx. 34%). In the following periods, the growth levels off at approx. 20% and falls below 20% afterwards. In the last analysed citation period (from 2007 to 2018), there is only a growth of about 3% for all the disciplines considered.

Similar tendencies can be seen for the h-index, although, as discussed in our literature overview, it is not on the same scale as the citations and the citations per paper. The metric for business administration increases between the citation periods 2007 to 2009 and 2007 to 2010 by 47% (biology and medicine: approx. 65%), between 2007 to 2010 and 2007 to 2011 by 27% (biology and medicine: approx. 40%), between 2007 to 2011 and 2007 to 2012 by 18% (biology and medicine: approx. 28%), and between 2007 to 2012 and 2007 to 2013 by 10% (biology and medicine: approx. 20%). From the following citation period on, the growth is below 10% (biology and medicine: approx. below 15%) and falls to 1% for all three disciplines in the later periods. The smaller rise for the h-index can be attributed to the fact that this metric exhibits a certain level of robustness with respect to the length of the citation period and to the fact that only a certain subset of the citations (the h-core) is taken into account when calculating the h-index.

The results regarding the arithmetic mean of the J-factor paint a different picture. The J-factor between 2007 to 2009 and 2007 to 2010 increases by around 30% for biology and medicine, while for business administration the increase is smaller (around 10%). For the remaining citation periods, the J-factor stagnates, varying between − 1% and + 1% for all three disciplines.

If we take the median into account, we see that it diverges considerably from the arithmetic mean for the absolute number of citations, lying far below it. This leads us to conclude that the distribution of the citation numbers is right-skewed, in line with Saam and Reiter (1999), Seglen (1992), and Price (1965). This result is also confirmed by the high standard deviations and coefficients of variation in all three disciplines, which indicate a high degree of dispersion. To analyse this, we refer to the Lorenz curve, which displays the relative concentration of the citations’ frequency distributions, i.e. it shows the unevenness of a distribution. In Fig. 1 the cumulative percentage of the universities is shown on the abscissa and the cumulative percentage of the number of citations is shown on the ordinate. Thus, each point on the curve shows what percentage of the universities obtains what proportion of the overall number of citations. The bisector drawn in the figure is a hypothetical construct serving as a standard of comparison for the citation distributions of the three disciplines; it cannot be achieved in reality. It can be interpreted as an equal distribution of citations, i.e. each university receives exactly the same number of citations.

Fig. 1 Lorenz curves of citation distributions for all citation periods and disciplines

For business administration, it can be seen that 80% of the universities receive just around 30% of the citations for their publications, while the remaining 20% receive almost 70% of the citations. The citations in biology and medicine are more uniformly distributed than those in business administration: in biology, 80% of the universities receive 50% of the citations (60% in medicine), while the remaining 20% receive the other 50% (40% in medicine).

In order to quantify the uneven distribution more precisely, we calculate the “Gini coefficient”. It provides information about the degree of unevenness of the citation distributions and is determined from the ratio of the area bounded by the 45° line and the Lorenz curve to the overall area below the 45° line (Pyatt 1976): \(G = \frac{2\sum_{i = 1}^{n} i\,x_{i}}{n\sum_{i = 1}^{n} x_{i}} - \frac{n + 1}{n}\), where the \(x_{i}\) are the citation counts sorted in ascending order.

A Gini coefficient of 0 implies that all the citations are evenly distributed among the universities, while a Gini coefficient of 1 means that a single university has received all the citations. Table 4 lists the Gini coefficients found for the three disciplines and citation periods. These once again confirm the statements made based on the Lorenz curves.

Table 4 Gini coefficients
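For reference, the Lorenz coordinates plotted in Fig. 1 and the Gini formula quoted above can be computed as in the following sketch; the citation totals used here are invented placeholders, not values underlying Tables 1, 2, 3 and 4.

```python
import numpy as np

def lorenz_points(values):
    """Cumulative share of universities (x) vs. cumulative share of citations (y)."""
    v = np.sort(np.asarray(values, dtype=float))          # ascending order
    return np.arange(1, len(v) + 1) / len(v), np.cumsum(v) / v.sum()

def gini(values):
    """Gini coefficient: (2 * sum_i i*x_i) / (n * sum_i x_i) - (n + 1) / n, x_i sorted ascending."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * np.sum(x)) - (n + 1) / n

# Invented citation totals for five universities:
citations = [3, 10, 25, 80, 400]
print(lorenz_points(citations)[1])  # cumulative citation shares
print(gini(citations))              # approx. 0.67, i.e. a clearly uneven distribution
```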

The disciplines’ different publication and citation behaviours can be attributed to different citation habits and differing degrees of coverage in WoS. For example, business administration has the lowest number of publications and citations in the present comparison of the disciplines and, based on the Lorenz curve, the most uneven citation distribution. Biology, by contrast, shows 19 times the number of publications and 35 times the number of citations of business administration, and its Lorenz curve runs closer to the bisector. Medicine publishes 5 times (105 times) and is cited 4 times (150 times) as much as biology (business administration), and its Lorenz curve shows the most even distribution of the three disciplines. Based on our dataset, we are thus able to identify discipline-dependent differences in the distributional skewness. With respect to the development of the uneven distribution over time, the Gini coefficient changes only minimally. While the Gini coefficient for business administration and medicine increases by only 3% and 4%, respectively, the Gini coefficient for biology even decreases by 1%. These findings are confirmed by the course of the Lorenz curves over time.

Correlation analyses

An extract of the values from the correlation analyses for the citation indicators being studied and the included citation periods for the individual disciplines can be found in Tables 5, 6 and 7. In the following, we look in particular at the correlations of the individual citation periods with the last period, covering the years 2007 to 2018. Resulting correlations of over 90% give sufficient indication that further survey periods provide little additional information. Therefore, we assume that an appropriate citation period (especially from the point of view of a ranking) is reached precisely when the ranking correlation coefficient exceeds this value of 90%.

Table 5 Extract of linear correlations and ranking correlations for the citation indicators in business administration
Table 6 Extract of linear correlations and ranking correlations for the citation indicators in biology
Table 7 Extract of linear correlations and ranking correlations for the citation indicators in medicine

Intertemporal evaluation of the correlations of the citation indicators from the ten consecutive citation periods confirms the previously established results. A rank correlation of over 90% for the number of citations is registered for business administration between the citation periods 2007 to 2012 and 2007 to 2013, for biology between 2007 to 2010 and 2007 to 2011, and for medicine between 2007 to 2009 and 2007 to 2010. While there are already rank correlation coefficients of more than 80% for biology and medicine from the first citation period on, there are more changes in business administration between 2009 and 2011. We have already seen in the descriptive analysis that the number of citations rises more strongly for business administration during the first years. It seems that the time between publication and citation is longer in business administration than in biology and medicine.

Looking at the citations per paper, it can be seen that there are rank correlation coefficients for business administration and biology of over 90% from 2014 onwards. For medicine, however, rank correlations of over 90% are already achieved in the citation period 2012.

For the h-index, it turns out that rank correlations of over 90% exist for all three disciplines from 2012 onwards. In most cases there is also a high correlation of over 90% between the h-index and the absolute number of citations across all citation periods, as well as between the h-index values for the individual citation periods. The rank correlation coefficients also reach high values. Accordingly, the rankings based on the h-index hardly change depending on which citation period is considered. The high rank correlations between the absolute number of citations and the h-index indicate that the rankings resulting from the two indicators are similar (see also the results in Clermont et al. 2017).

The J-factor reveals a different pattern. The correlation coefficients for the J-factor primarily indicate a positive linear correlation. However, it is apparent that the rank correlations for the J-factor diverge between the different citation periods. Rank correlations of over 80% are achieved in business administration between the J-factors for 2013 and 2014, in biology between 2012 and 2013, and in medicine between 2014 and 2015. For business administration and medicine, the J-factor rank correlations exceed 90% between the citation periods 2007 to 2016 and 2007 to 2017. For biology, however, rank correlations of 90% are already reached between 2007 to 2015 and 2007 to 2016.

Implications

In this section, we answer our formulated research questions based on the results presented above. In order to do this, we will in each case begin by repeating the research question.

(R1) Does the informative content of citation indicators increase for longer citation periods?

As already explained in our literature overview, a distinction should be drawn between additional informative value from an increase in citations (and thus in the impact of a paper) and additional informative value from a change in the rankings. With regard to the general impact of a paper, our analyses show that the citations received increase sharply between the first four citation periods up to 2012. From then on, a moderate increase in citations continues until stagnation is reached. This produces a larger population of received citations, which leads us to anticipate that, generally speaking, extending the citation period from a short to a medium length is accompanied by greater informative value in terms of increased validity. Regarding the rankings, it has emerged that high rank correlations of more than 90% result between all the citation periods for the indicator “number of citations” in medicine, and for biology one period later. For business administration, 90% is reached for the citation period 2007 to 2012. This shows that, regarding the formation of a ranking, although the absolute numbers of citations increase over time, an additional analysis of the citation metric at a later point in the survey only provides redundant information and is not necessary. For this study, this means that considering citation periods extending beyond 2012 is unnecessary for the formation of a ranking.

(R2) Is there a citation period for which the long-term impact of citations can be determined and after which a longer period of analysis of the citation rate does not provide any additional informative value?

The above findings support an affirmative answer to this research question. There is a citation period in which the trend of the citation rate can be detected early; further extended periods do not necessarily provide any additional informative value. The position parameters illustrate that the citation numbers no longer increase sharply between the citation periods ending in 2013 and 2018. This allows us to conclude that the transition between the maturing phase and the decline phase of the publications took place around 2013. From a ranking perspective, this period is even earlier (when regarding the absolute number of citations). This result is mainly in line with Glänzel (2008), who stated: “A 3-year citation window suffices at both the national and the institutional level if properly standardised and normalised citation indicators are used.” Finally, in a sound bibliometric evaluation, the same rules of the game are applied to all units of assessment. However, as we will discuss later, the first part of this statement is not valid for the standardised and normalised citation indicator which we used here.

(R3) To what extent does the uneven distribution of citation numbers differ for the three disciplines under consideration and does it change over time?

In the course of the descriptive analysis of the citation numbers over time and for all three disciplines, our results confirm the current state of research concerning citation distributions: they show a skewed distribution independent of the discipline and independent of time. This means that the results for all the disciplines are in line with Lotka’s law: a small share of authors generate a large proportion of the total citations (Ruiz-Castillo and Costas 2018). However, differences in the degree of the citations’ unequal distribution occur between the disciplines. A trend appears to be emerging that citations are more uniformly distributed as the number of publications and citations increases. One explanation could be that outliers, for example highly cited publications, can distort the citation distribution, while a larger number of publications leads to a greater diversity of data sets, which results in a balancing effect overall. However, this does not mean that a uniform distribution could be reached; this would only occur in the almost impossible case that all the citations were homogeneously distributed among the publications. Our investigations show that even if the time horizon of citation generation is extended, the citation distribution remains largely constant, whereas Katz (2016) finds that longer citation periods are associated with an increase in the skewness of the citation distribution. This means that whether a publication receives a high or low impact, and thus high or low citation numbers, is already crystallized at an early stage of citation accumulation. It seems that publications that were already heavily cited at the beginning are also heavily cited in the maturing process up to the peak of citation, while publications that are cited infrequently receive relatively little attention in later periods. We should be aware of the fact that, by citing a scientific publication, the author makes a decision about its relevance that can be reflected in performance indicators. Otherwise, the correlation between intellectual relevance assessments of publications and bibliometric indicators (Breuer et al. 2020) could not be explained. This is made all the more apparent by the fact that the underlying publication set was not changed at any time during this study. This means that all changes in distribution patterns observed here are based solely on the perception of the underlying publications. The perception of a publication, i.e. whether it makes an essential contribution to a research topic or not, does not really change during the period of observation.

Consequently, a kind of citation pattern emerges for each publication that hardly changes over time. Only “sleeping beauties” change their patterns over time; however, as already noted, these are very rare (Glänzel et al. 2003), so they do not influence the general citation pattern.

(R4) Does the added informative value of the h-index remain limited in comparison with the absolute number of citations and the citations per publication when the citation period is extended?

The findings regarding the informative content from research question R1 can largely be transferred to the h-index. We can see similar tendencies, and there are no specific differences between the disciplines. While the rankings based on the number of citations are slightly influenced by subject-specific citation behaviour, the rankings based on the h-index hardly change and remain stable. This can be attributed to the construction of the h-index. Although the absolute values increase less strongly for the h-index, this is precisely what we expected, because the metric is designed in such a way that it does not uniformly increase with each additional citation. Therefore, a general increase in the added informative value can also be assumed for the h-index, while the ranking proves to be stable after a period of just 6 years for all three disciplines.

(R5) Does the informative value of the J-factor increase as the citation period grows longer?

Regarding the J-factor, in contrast with the other citation indicators, we have established differences in its properties between the citation periods. Only small correlations of between 20% and 60% can be registered between the first three citation periods (2007 to 2009 up to 2007 to 2011) for all three disciplines. The low correlations imply that the informative value of the J-factor diverges when these citation periods are used. As a consequence, considering extended citation periods provides additional informative value. This is due to the fact that numerous publications have not yet been cited in a relatively short citation period. If the citation periods are extended, the number of publications that have not been cited decreases, with the result that the J-factors for business administration and medicine for the 2016 period and for biology for the 2015 period can be viewed as valid and informative. Thus, at least in relation to the ranking, a further extension no longer provides any significant additional value.

(R6) Do the results concerning the length of the citation periods with regard to the effect of the research differ depending on the discipline being studied?

Over the course of the analyses of the position parameters and correlations, a nearly homogeneous pattern was noted for all the disciplines being studied. Specifically, a sharp increase in the citations is registered for the position parameters between the citation periods 2007 to 2009 and 2007 to 2012, while only small increases result from 2007 to 2013 onwards. The dispersion parameters, however, show great differences. This can be attributed to the fact that the disciplines have different citation volumes, i.e. the more citations there are, the more the effect of outliers is reduced and the more uniform the distribution is. Because the citation distributions are only slightly affected by the length of the citation periods (except for business administration), they can be disregarded hereafter. Looking at the correlation coefficients shows that there are high correlations between the ten consecutive citation periods for all three disciplines and that the differences in the degree of correlation are negligible. Based on the empirical findings, we can draw the conclusion that the length of the citation periods with regard to the effect of the research does not differ between the disciplines.

Limitations and outlook

With regard to collecting real data sets, it should be noted that only a portion of all the available publications can be included in the analysis. This can be attributed to the fact that coverage of the publications in the existing literature databases is not complete. The existing literature databases differ in the extent of their coverage and also show great differences between the various disciplines. According to a study by Craig et al. (2014), coverage in WoS is 47% in economics, between 69% and 90% for biology, and 84% for medicine. According to Heinze et al. (2019), coverage is 53.8% for economics, 83% for biology, and 80.5% for medicine. Ultimately, it always needs to be checked for each discipline whether the covered subset constitutes a sufficiently representative sample.

All WoS publications from the examined disciplines of the “article” or “review” document types have been included in this study. Other document types, particularly little-cited ones such as proceedings papers or letters, can cause fluctuations, especially in the J-factor, because relatively low absolute publication figures on the part of the benchmark could relatively quickly cause ratios well above the mean.

Finally, it should be noted that this study and its findings are based on a general approach. As a consequence, correlations are determined for complete rankings, based on the correlation and rank correlation analyses. However, it should be noted that major changes may still occur for a specific university in a particular discipline, even if the correlation coefficients are high.

Regarding further research, the extent to which our findings related to the J-factor are also valid for other normalizing citation indicators should be studied. For this purpose, the analysis could be expanded to include another journal-normalized indicator and a field-normalized indicator. Looking at just one further journal-normalized indicator would not be sufficient, as it would not be possible to distinguish whether the findings are due to the method of normalization per se or are caused specifically by journal normalization. With regard to the internationalization of research, it would also be of interest to compare the effects of research in Germany with other countries (Haustein and Tunger 2013).