
Journal of Informetrics

Volume 6, Issue 4, October 2012, Pages 645-654

How important is choice of the scaling factor in standardizing citations?

https://doi.org/10.1016/j.joi.2012.07.002

Abstract

Because of the variations in citation behavior across research fields, appropriate standardization must be applied as part of any bibliometric analysis of the productivity of individual scientists and research organizations. Such standardization involves scaling by some factor that characterizes the distribution of the citations of articles from the same year and subject category. In this work we analyze the sensitivity of researchers’ productivity rankings to the scaling factor chosen to standardize their citations. To do this we first prepare the productivity rankings for all researchers (more than 30,000) operating in the hard sciences in Italy over the period 2004–2008. We then measure the shifts in rankings caused by adopting scaling factors other than the one that seems most effective for comparing the impact of publications in different fields: the citation average of the distribution of cited-only publications.

Highlights

• We compare different citation scaling factors in individual productivity assessment.
• We contrast the rankings stemming from five different standardization modes.
• The citation average of cited-only publications is taken as the benchmark.
• The second best appears to be the citation median of cited-only publications.
• Some fields show heavy variations when using scaling factors other than the benchmark.

Introduction

Field-standardization of citations is now common practice for any serious bibliometric analysis, applied to comparative measurement of research performance of individuals, entire organizations, departments, or other units. This is necessary because of the different citation behavior of researchers in various fields. A number of studies have shown that there is generally a different time distribution of citations across fields (Gupta et al., 2005, Hurt, 1987, Peters and Van Raan, 1994, Peterson et al., 2010, Redner, 1998, Stringer et al., 2010, Vieira and Gomes, 2010). In schematic terms, the number of citations observed at time t for an article in mathematics is different from the number observed at the same time for an article of the same quality in physics, published in the same year. To make citations comparable for articles that belong to different fields, bibliometricians standardize citations by applying a scaling factor. Failure to carry out such field standardization can cause notable distortions in measures of performance, as demonstrated by various studies (Abramo and D’Angelo, 2007, Leydesdorff, 2011, Lundberg, 2007).

Standardization involves first classifying each article according to its subject category and then scaling the citations. The scaling is carried out by multiplying the citations of each article by a factor that characterizes the distribution of the citations of all articles from the same year and subject category (for example, the inverse of the median or mean). In actual practice, bibliometricians adopt different scaling factors. The well-known “crown indicator”, originated by the Leiden University CWTS, scales the citations of a given publication set with respect to the mean of the category distribution (Moed, De Bruin, & van Leeuwen, 1995). The Karolinska Institute's “field normalized citation score” also uses the mean as scaling factor, applied to the citations of each publication (Rehn, Kronman, & Wadskog, 2007). Vinkler (1996), in his relative subfield citedness (Rw) indicator (where w refers to “world”), relates the number of citations obtained by the set of papers evaluated to the number of citations received by the same number of papers published in journals dedicated to the respective discipline, field or subfield. The current authors introduced the “scientific strength” indicator. For this performance indicator, they originally standardized citations by the mean (Abramo & D’Angelo, 2011) but recently, observing the strong skewness of the citation distributions, have switched to the median of the distribution (Abramo, D’Angelo, & Di Costa, 2011). A different overall approach is seen in the “relative impact index” indicator, developed by the Swiss Federal Government's Centre for Science and Technology Studies and reported in the bibliometric handbook for Karolinska Institutet. Here, the citation count is fractionalized with regard to the length of the reference list. Other citation indicators take into account the high skewness of citation distributions, rating each publication in terms of its percentile in the citation distribution (Bornmann and Mutz, 2010, Leydesdorff et al., 2011).

Few scholars have carried out studies aimed at identifying the most appropriate scaling factor. Radicchi, Fortunato, and Castellano (2008) showed that citation distributions from 20 different disciplines and years could be rescaled onto a universal curve by applying, as scaling factor, the average number of citations per article. Following up this work, Radicchi and Castellano (2011) later provided a deeper study of the fields within physics, and confirmed that “when a rescaling procedure by the average is used, it is possible to compare impartially articles across years and fields”, adding that “the median is less sensitive to possible extreme events such as the presence of highly cited papers, but dividing the raw number of cites by the median value leads to less fair comparisons and only for sufficiently old publications”. These empirical analyses refer to specific disciplines, and the extension of the results to other disciplines cannot be readily assumed. Albarrán, Crespo, Ortuño, and Ruiz-Castillo (2011) and Waltman, van Eck, and van Raan (2012), analyzing a much larger dataset of publications, confirmed that the results hold for many but not all scientific fields. Recently, Radicchi and Castellano (2012), expanding the dataset for their analysis (about 4,000,000 documents published in 6 distinct years in 8304 scientific journals), introduced a simple mapping able to transform the citation distribution within a specific field into a universal power law depending on two parameters. Each parameter is specific to a field (i.e. subject category), but for the vast majority of subject categories the power-law exponent is constant. The only subject categories for which the transformation is not a power-law function are hybrid ones, such as multidisciplinary sciences, or not well defined ones, such as engineering, petroleum or biodiversity conservation. In contrast, Lundberg (2007) suggested that, due to the strong skewness of citation distributions, it was preferable to use the median or geometric mean to scale citations, but then demonstrated that the “item oriented field normalized logarithm-based citation z-score average” (or citation z-score) performed still better.
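To make the scaling step concrete, the following sketch (our own illustration, not code from the paper; the publication records, subject categories and citation counts are hypothetical) rescales citations by the mean, the median, or the mean of cited-only publications of the same year and subject category.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical publication records: (WoS subject category, year, citations).
publications = [
    ("Mathematics", 2004, 3),
    ("Mathematics", 2004, 0),
    ("Mathematics", 2004, 12),
    ("Physics, Particles & Fields", 2004, 40),
    ("Physics, Particles & Fields", 2004, 7),
    ("Physics, Particles & Fields", 2004, 0),
]

# Group citation counts by (subject category, year).
groups = defaultdict(list)
for category, year, cites in publications:
    groups[(category, year)].append(cites)

def scaling_factor(counts, mode):
    """Scaling factor for one (subject category, year) citation distribution."""
    if mode == "mean":
        return mean(counts)
    if mode == "median":
        return median(counts)
    if mode == "mean_cited_only":  # mean of publications with at least one citation
        cited = [c for c in counts if c > 0]
        return mean(cited) if cited else 1.0
    raise ValueError(f"unknown mode: {mode}")

# Standardized citations: raw citations divided by the chosen factor
# (a zero factor, e.g. a median of 0, is guarded against here).
for mode in ("mean", "median", "mean_cited_only"):
    factors = {key: scaling_factor(counts, mode) for key, counts in groups.items()}
    scaled = [
        cites / factors[(category, year)] if factors[(category, year)] else 0.0
        for category, year, cites in publications
    ]
    print(mode, [round(s, 2) for s in scaled])
```

The cited-only mean corresponds to the scaling mode that Abramo, Cicero, and D’Angelo (2012) found most effective; the other two modes are the common alternatives discussed above.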

All these studies, intended to support the choice of the most effective scaling factor for evaluation exercises, suffer from a common limitation: their test conditions do not simulate the typical practices of an evaluation exercise. Recently, Abramo, Cicero, and D’Angelo (2012) overcame this limitation by simulating the terms of reference of a typical national research assessment exercise. With reference to all Italian universities’ publications in two different years, they compared the effectiveness of six different methods of standardizing citations for all subject categories in the hard sciences, and concluded that the citation average seems the most effective scaling factor, when the average is based only on the publications actually cited.

Observing that different practitioners adopt different methods, in this work we conduct an analysis of the sensitivity of individual researchers’ productivity rankings to the scaling factor chosen to standardize citations. The reference context for the study is the Italian university system, limited to the disciplines where scientific performance can be evaluated by means of bibliometric techniques, meaning the hard sciences. For each standardization mode we calculate the performance rankings for all researchers belonging to these disciplines over the period 2004–2008. In light of the findings from the work cited above (Abramo et al., 2012), we take the performance rankings derived from standardization by the citation average of cited-only publications as the benchmark for our analysis. Finally, we measure the shifts in rankings from the benchmark caused by adopting different scaling factors. To the best of our knowledge, the literature does not offer any similar studies that compare and evaluate the results obtained from different scaling factors.
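One straightforward way to quantify this kind of sensitivity, sketched below under our own assumptions (hypothetical researcher scores; the paper's exact shift measures are not reproduced here), is to compare the ranking produced by each standardization mode against the benchmark ranking, for instance with a Spearman rank correlation.

```python
from scipy.stats import spearmanr

# Hypothetical productivity scores for the same four researchers under the
# benchmark (citation average of cited-only publications) and one alternative.
benchmark_scores   = {"r1": 4.2, "r2": 1.1, "r3": 2.5, "r4": 0.3}
alternative_scores = {"r1": 3.9, "r2": 1.4, "r3": 2.6, "r4": 0.2}

researchers = sorted(benchmark_scores)
rho, _ = spearmanr(
    [benchmark_scores[r] for r in researchers],
    [alternative_scores[r] for r in researchers],
)
print(f"Spearman correlation with the benchmark ranking: {rho:.3f}")
```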

In the next section of the paper we illustrate the methodology for measurement of individual research performance, the reference dataset and the different scaling factors adopted. In the third section we compare the performance rankings obtained from the application of different scaling factors to the study population. In the final section we comment on the results and draw conclusions.

Section snippets

Methodology and dataset

Research activity is a production process in which the inputs consist of human, tangible (scientific instruments, materials, etc.), and intangible (accumulated knowledge, social networks, etc.) resources; and where output, i.e. the new knowledge, has a complex character of both tangible nature (publications, patents, conference presentations, databases, protocols, etc.), and intangible nature (tacit knowledge, consulting activity, etc.). The new-knowledge production function has therefore a

Results and analysis

To measure the shifts from the benchmark we carry out a four-step process. First is the standardization of citations for each publication in the dataset, in each SDS and for each standardization mode. Second is the calculation of the individual performance rankings of the researchers within their SDSs. Third is the overall measurement of the shifts in rankings with respect to the benchmark. Finally, we focus the analysis on measurement of the shifts for the top 25% and the bottom 25% of performers.
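A minimal sketch of these four steps is given below, under assumed data structures (hypothetical standardized scores for the researchers of a single SDS; the shift measure shown, the absolute percentile change, is one plausible choice rather than the paper's own definition).

```python
import statistics

def percentile_ranks(scores):
    """Map researcher -> percentile rank (100 = top performer) within one SDS."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {r: 100.0 * (n - i) / n for i, r in enumerate(ordered)}

# Hypothetical standardized productivity scores for one SDS under the
# benchmark mode and one alternative standardization mode.
benchmark = {"r1": 4.2, "r2": 1.1, "r3": 2.5, "r4": 0.3, "r5": 3.0}
alternative = {"r1": 3.9, "r2": 2.6, "r3": 2.4, "r4": 0.2, "r5": 3.3}

bench_pct = percentile_ranks(benchmark)
alt_pct = percentile_ranks(alternative)

# Overall shift: absolute percentile change per researcher, summarized by the mean.
shifts = {r: abs(bench_pct[r] - alt_pct[r]) for r in bench_pct}
print("mean shift:", statistics.mean(shifts.values()))

# Focus on the top 25% and bottom 25% of performers under the benchmark.
top = [r for r, p in bench_pct.items() if p > 75]
bottom = [r for r, p in bench_pct.items() if p <= 25]
print("top-quartile shifts:", {r: shifts[r] for r in top})
print("bottom-quartile shifts:", {r: shifts[r] for r in bottom})
```

Repeating this for each SDS and each standardization mode yields, for every researcher, a shift from the benchmark ranking that can be aggregated overall and within the top and bottom quartiles.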

As

Conclusions

In bibliometrics, any serious comparative analysis of research performance for individuals or organizations requires field standardization of citations, due to the different citation behaviors across fields of research. A preceding study by the authors (Abramo et al., 2012) demonstrated that the citation average (with the average based only on publications actually cited) seems the most effective scaling factor. This work has followed up on the previous analysis,

References (26)

  • H.M. Gupta et al.

    Power-law distributions for the citation index of scientific publications and scientists

    Brazilian Journal of Physics

    (2005)
  • L. Leydesdorff

    An evaluation of impacts in “nanoscience & nanotechnology:” Steps towards standards for citation analysis

  • L. Leydesdorff et al.

    Turning the tables in citation analysis one more time: Principles for comparing sets of documents

    Journal of the American Society for Information Science and Technology

    (2011)
Cited by (18)

    • Does your surname affect the citability of your publications?

      2017, Journal of Informetrics
      Citation Excerpt:

      We also normalize the citations by scientific field. This avoids the distortions otherwise caused by variations in citation behavior across fields (Abramo, Cicero, & D’Angelo, 2012a). We use the indicator called “Article Impact Index” (AII), calculated as the ratio of the number of citations received by the publication, to the average of the citations for all cited Italian publications of the same year and WoS journal subject category.

    • A review of the literature on citation impact indicators

      2016, Journal of Informetrics
      Citation Excerpt:

      However, Albarrán et al. (2011c) and Waltman, Van Eck, and Van Raan (2012b) claim that this conclusion is too strong and that no perfect universality of citation distributions is obtained. Abramo, Cicero, and D’Angelo (2012c, 2012d) compare a number of normalization approaches and suggest that the best normalization is obtained by dividing the actual number of citations of a publication by the average number of citations of all publications that are in the same field and that have at least one citation. Radicchi and Castellano (2012b) introduce a normalization approach that is based on a transformation of citation counts by a two-parameter power-law function.

    • Investigating the universal distributions of normalized indicators and developing field-independent index

      2013, Journal of Informetrics
      Citation Excerpt:

      Therefore, developing a field-independent index is necessary. To develop a field-independent index, scholars usually argue the issues of scaling methods (Abramo, Cicero, & D’Angelo, 2012; Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009; Waltman, Eck, Leeuwen, Visser, & Raan, 2011a; Waltman, Eck, Leeuwen, Visser, & Raan, 2011b). Iglesias and Pecharromán (2007) proposed the multiplication of the h-index of one author by the ratio of the average number of citations received by all papers published in the field of “Physics” to that in the field where the author belongs.
