How important is choice of the scaling factor in standardizing citations?
Highlights
► We compare different citation scaling factors in individual productivity assessment.
► We contrast the rankings stemming from five different standardization modes.
► The citation average of cited-only publications is taken as the benchmark.
► The second best appears to be the citation median of cited-only publications.
► Some fields show heavy variations when using scaling factors other than the benchmark.
Introduction
Field-standardization of citations is now common practice for any serious bibliometric analysis, applied to comparative measurement of research performance of individuals, entire organizations, departments, or other units. This is necessary because of the different citation behavior of researchers in various fields. A number of studies have shown that there is generally a different time distribution of citations across fields (Gupta et al., 2005, Hurt, 1987, Peters and Van Raan, 1994, Peterson et al., 2010, Redner, 1998, Stringer et al., 2010, Vieira and Gomes, 2010). In schematic terms, the number of citations observed at time t for an article in mathematics is different from the number observed at the same time for an article of the same quality in physics, published in the same year. To make citations comparable for articles that belong to different fields, bibliometricians standardize citations by applying a scaling factor. Failure to carry out such field standardization can cause notable distortions in measures of performance, as demonstrated by various studies (Abramo and D’Angelo, 2007, Leydesdorff, 2011, Lundberg, 2007).
Standardization involves first classifying each article according to its subject category and then scaling the citations. The scaling is carried out by multiplying the citations of each article by a factor that characterizes the distribution of the citations of all articles from the same year and subject category (for example, the inverse of the median or mean). In actual practice, bibliometricians adopt different scaling factors. The well-known “crown indicator”, originated by the Leiden University CWTS, scales the citations of a given publication set with respect to the mean of the category distribution (Moed, De Bruin, & van Leeuwen, 1995). The Karolinska Institute's “field normalized citation score” also uses the mean as the scaling factor, applied to the citations for each publication (Rehn, Kronman, & Wadsko, 2007). Vinkler (1996), in his relative subfield citedness (Rw) indicator (where w refers to “world”), relates the number of citations obtained by the set of papers evaluated to the number of citations received by the same number of papers published in journals dedicated to the respective discipline, field or subfield. The current authors introduced the “scientific strength” indicator. For this performance indicator, they originally standardized citations by the mean (Abramo & D’Angelo, 2011) but recently, observing the strong skewness of the citation distributions, have switched to the median of the distribution (Abramo, D’Angelo, & Di Costa, 2011). A different overall approach is seen in the “relative impact index” indicator, developed by the Swiss Federal Government's Centre for Science and Technology Studies and reported in the bibliometric handbook for Karolinska Institutet. Here, the citation count is fractionalized with regard to the length of the reference list.
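The mean- and median-based scaling described above can be sketched as follows. This is a minimal illustration on invented toy counts; in a real exercise the baseline distribution would comprise all articles of the same year and WoS subject category, and the function name `scale` is our own:

```python
from statistics import mean, median

# Hypothetical toy data: raw citation counts of all articles in a
# (year, subject category) cell. Real baselines are far larger.
field_citations = {
    ("2004", "Mathematics"): [0, 1, 1, 2, 3, 8],
    ("2004", "Physics"):     [0, 2, 4, 5, 9, 40],
}

def scale(cites, year, category, factor="mean"):
    """Divide an article's raw citations by a field baseline
    (the mean or the median of the category distribution)."""
    dist = field_citations[(year, category)]
    baseline = mean(dist) if factor == "mean" else median(dist)
    return cites / baseline

# The same raw count yields different normalized scores under each factor:
print(scale(4, "2004", "Physics", "mean"))    # 4 / 10.0 = 0.4
print(scale(4, "2004", "Physics", "median"))  # 4 / 4.5  ≈ 0.889
```

The skewness issue noted in the text is visible even here: the single highly cited paper (40 citations) pulls the Physics mean to 10.0, while the median stays at 4.5.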
Other citation indicators take into account the high skewness of citation distributions, rating each publication in terms of its percentile in the citation distribution (Bornmann and Mutz, 2010, Leydesdorff et al., 2011). Few scholars have carried out studies aimed at identifying the most appropriate scaling factor. Radicchi, Fortunato, and Castellano (2008) showed that citation distributions from 20 different disciplines and years could be rescaled onto a universal curve by applying, as the scaling factor, the average number of citations per article. Following up this work, Radicchi and Castellano (2011) later provided a deeper study of the fields exclusive to physics, and confirmed that “when a rescaling procedure by the average is used, it is possible to compare impartially articles across years and fields”, adding that “the median is less sensitive to possible extreme events such as the presence of highly cited papers, but dividing the raw number of cites by the median value leads to less fair comparisons and only for sufficiently old publications”. These empirical analyses refer to specific disciplines, and the extension of the results to other disciplines cannot be readily assumed. Albarrán, Crespo, Ortuño, and Ruiz-Castillo (2011) and Waltman, van Eck, and van Raan (2012), analyzing a much larger dataset of publications, confirmed that the results hold for many but not all scientific fields. Recently, Radicchi and Castellano (2012), expanding the dataset for their analysis (about 4,000,000 documents published in 6 distinct years in 8304 scientific journals), introduced a simple mapping able to transform the citation distribution within a specific field into a universal power law that depends on two parameters. Each parameter is specific to a field (i.e. subject category), but for the vast majority of subject categories the power-law exponent is constant.
The only subject categories for which the transformation is not a power-law function are hybrid ones, such as multidisciplinary sciences, or ones that are not well defined, such as engineering, petroleum or biodiversity conservation. In contrast, Lundberg (2007) suggested that, due to the strong skewness of citation distributions, it was preferable to use the median or geometric mean to scale citations, but then demonstrated that the “item oriented field normalized logarithm-based citation z-score average” (or citation z-score) performed still better.
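The percentile-based rating mentioned at the start of this passage can be sketched as follows. This is a toy illustration under one common percentile convention (fraction of same-field papers with fewer citations); the cited indicators use more refined variants, and the function name is our own:

```python
def citation_percentile(cites, field_dist):
    """Percentile rank of a paper within its field's citation
    distribution: percentage of papers with strictly fewer citations."""
    below = sum(1 for c in field_dist if c < cites)
    return 100 * below / len(field_dist)

# Hypothetical field distribution of citation counts:
dist = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]
print(citation_percentile(8, dist))  # 60.0: 6 of 10 papers are cited less
```

Because a percentile depends only on rank order, it is insensitive to how extreme the tail of the distribution is, which is precisely why such indicators are attractive for skewed citation data.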
All these studies, intended to support the choice of the most effective scaling factor for evaluation exercises, suffer from the conditions surrounding the tests, which did not simulate the typical practices of an evaluation exercise. Recently, Abramo, Cicero, and D’Angelo (2012) overcame this limitation by simulating the terms of reference of a typical national research assessment exercise. With reference to all Italian universities’ publications in two different years, they compared the effectiveness of six different methods of standardizing citations for all subject categories in the hard sciences, and concluded that the citation average seems the most effective scaling factor when the average is based only on the publications actually cited.
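The distinction between the ordinary mean and the cited-only mean used as the benchmark here can be made concrete with a small sketch (toy counts; the function name is our own):

```python
def cited_only_mean(field_dist):
    """Mean citations computed over cited-only publications,
    i.e. excluding papers with zero citations."""
    cited = [c for c in field_dist if c > 0]
    return sum(cited) / len(cited)

# Hypothetical field distribution with two uncited papers:
dist = [0, 0, 1, 2, 3, 6]
print(cited_only_mean(dist))  # 12 / 4 = 3.0
print(sum(dist) / len(dist))  # ordinary mean: 12 / 6 = 2.0
```

Excluding uncited papers raises the baseline, so fields with many uncited papers do not see their scaling factor dragged down by the uncited mass.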
Observing that different practitioners adopt different methods, in this work we conduct an analysis of the sensitivity of individual researchers’ productivity rankings to the scaling factor chosen to standardize citations. The reference context for the study is the Italian university system, limited to the disciplines where scientific performance can be evaluated by means of bibliometric techniques, meaning the hard sciences. For each standardization mode we calculate the performance rankings for all researchers belonging to these science disciplines over the period 2004–2008. In light of the findings from the work cited above (Abramo et al., 2012), we take the performance rankings derived from standardization by the citation average of cited-only publications as the benchmark for our analysis. Finally, we measure the shifts in rankings from the benchmark caused by adopting different scaling factors. To the best of our knowledge, the literature does not offer any similar studies that compare and evaluate the results obtained from different scaling factors.
In the next section of the paper we illustrate the methodology for measurement of individual research performance, the reference dataset and the different scaling factors adopted. In the third section we compare the performance rankings obtained from the application of different scaling factors to the study population. In the final section we comment on the results and draw conclusions.
Section snippets
Methodology and dataset
Research activity is a production process in which the inputs consist of human, tangible (scientific instruments, materials, etc.), and intangible (accumulated knowledge, social networks, etc.) resources; and where output, i.e. the new knowledge, has a complex character of both tangible nature (publications, patents, conference presentations, databases, protocols, etc.), and intangible nature (tacit knowledge, consulting activity, etc.). The new-knowledge production function has therefore a
Results and analysis
To measure the shifts from the benchmark we carry out a four-step process. First is the standardization of citations for each publication in the dataset, in each SDS and for each standardization mode. Second is the calculation of individual performance rankings for the researchers within their SDSs. The third step is the overall measurement of the shifts in rankings with respect to the benchmark. Finally, we focus the analysis on measurement of shifts for the top 25% and the bottom 25% of performers.
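The rank-shift measurement at the core of this process can be sketched as follows. This is a toy illustration with invented scores and our own function name, not the paper's actual computation:

```python
def rank(scores):
    """Map researcher -> rank position (1 = best); ties broken by name."""
    ordered = sorted(scores, key=lambda r: (-scores[r], r))
    return {r: i + 1 for i, r in enumerate(ordered)}

# Hypothetical productivity scores under two standardization modes:
benchmark   = {"A": 2.0, "B": 1.5, "C": 0.9, "D": 0.1}  # cited-only average
alternative = {"A": 1.8, "B": 2.1, "C": 0.8, "D": 0.2}  # e.g. median scaling

rb, ra = rank(benchmark), rank(alternative)
shifts = {r: abs(rb[r] - ra[r]) for r in rb}
print(shifts)                               # A and B swap the top two spots
mean_shift = sum(shifts.values()) / len(shifts)
print(mean_shift)                           # average absolute rank shift
```

In the real analysis the same comparison is run per SDS over the whole study population, and then repeated restricting attention to the top 25% and bottom 25% of performers under the benchmark.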
As
Conclusions
In bibliometrics, any serious comparative analysis of research performance for individuals or organizations requires field standardization of citations, due to the presence of different citations behaviors across different fields of research. A preceding study by the authors (Abramo et al., 2012) demonstrated that the citations average (with the average based only on publications actually cited) seems the most effective scaling factor. This work has followed up on the previous analysis,
References (26)
- et al. Revisiting the scaling of citations for research assessment. Journal of Informetrics (2012)
- Conceptual citation differences in science, technology, and social sciences literature. Information Processing & Management (1987)
- Lifting the crown-citation z-score. Journal of Informetrics (2007)
- et al. Citations to scientific articles: Its distribution and dependence on the article features. Journal of Informetrics (2010)
- et al. Measuring science: Irresistible temptations, easy shortcuts and dangerous consequences. Current Science (2007)
- et al. National-scale research performance assessment at the individual level. Scientometrics (2011)
- et al. Research productivity: Are higher academic ranks more productive than lower ones? Scientometrics (2011)
- et al. The skewness of science in 219 sub-fields and a number of aggregates
- et al. A heuristic approach to author name disambiguation in large-scale bibliometric databases. Journal of the American Society for Information Science and Technology (2011)
- et al. Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics (2010)
- Power-law distributions for the citation index of scientific publications and scientists. Brazilian Journal of Physics
- An evaluation of impacts in “nanoscience & nanotechnology”: Steps towards standards for citation analysis
- Turning the tables in citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology
Cited by (18)
- Does your surname affect the citability of your publications? (2017, Journal of Informetrics). Citation excerpt: “We also normalize the citations by scientific field. This avoids the distortions otherwise caused by variations in citation behavior across fields (Abramo, Cicero, & D’Angelo, 2012a). We use the indicator called ‘Article Impact Index’ (AII), calculated as the ratio of the number of citations received by the publication to the average of the citations for all cited Italian publications of the same year and WoS journal subject category.”
- A review of the literature on citation impact indicators (2016, Journal of Informetrics). Citation excerpt: “However, Albarrán et al. (2011c) and Waltman, Van Eck, and Van Raan (2012b) claim that this conclusion is too strong and that no perfect universality of citation distributions is obtained. Abramo, Cicero, and D’Angelo (2012c, 2012d) compare a number of normalization approaches and suggest that the best normalization is obtained by dividing the actual number of citations of a publication by the average number of citations of all publications that are in the same field and that have at least one citation. Radicchi and Castellano (2012b) introduce a normalization approach that is based on a transformation of citation counts by a two-parameter power-law function.”
- Quantitative evaluation of alternative field normalization procedures (2013, Journal of Informetrics)
- Investigating the universal distributions of normalized indicators and developing field-independent index (2013, Journal of Informetrics). Citation excerpt: “Therefore, developing a field-independent index is necessary. To develop a field-independent index, scholars usually argue the issues of scaling methods (Abramo, Cicero, & D’Angelo, 2012; Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009; Waltman, Eck, Leeuwen, Visser, & Raan, 2011a; Waltman, Eck, Leeuwen, Visser, & Raan, 2011b). Iglesias and Pecharromán (2007) proposed the multiplication of the h-index of one author by the ratio of the average number of citations received by all papers published in the field of ‘Physics’ to that in the field where the author belongs.”
- Citation-based Quantitative Evaluations on Scientific Publications: A Literature Review on Citation-based Impact Indicator (2021, Documentation, Information and Knowledge)
- Field normalization of scientometric indicators (2019, Springer Handbooks)