A generative model of article citation networks of a subject from a large-scale citation database

Published in Scientometrics

Abstract

In this paper, we analyze the structure of the article citation network of a particular subject obtained from the Web of Science (WoS) database. Specifically, we modify a model proposed by Caldarelli et al. (Phys Rev Lett 89(25):258702, 2002) and develop a generative model for article citation networks in which an article receives citations according to a newly defined property called “importance”. Since the importance of an article cannot be measured directly, we use the in-degree of an article, that is, the number of citations it receives from other articles, as a surrogate for its importance. We simulate in-degree distributions to estimate the parameters of the tapered Pareto distribution. Comparisons between the generated data and data from the real network show that the generative model performs well, especially for the citation networks of recent years.
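The abstract does not spell out the model equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each article's importance is drawn from a tapered Pareto distribution with survival function (a/x)^beta * exp((a - x)/theta) for x >= a, and that a new article cites earlier articles with probability proportional to their importance. The function names, the fixed number of references per article, and all parameter values are hypothetical choices made for illustration.

```python
import random


def sample_tapered_pareto(a, beta, theta, rng):
    """Draw one value with survival function (a/x)**beta * exp((a - x)/theta), x >= a.

    The product of a Pareto survival function and a shifted exponential
    survival function is the survival function of the minimum of the two
    independent variables, so we sample both and keep the smaller one.
    """
    pareto = a * (1.0 - rng.random()) ** (-1.0 / beta)   # Pareto(a, beta)
    shifted_exp = a + rng.expovariate(1.0 / theta)       # a + Exponential(mean theta)
    return min(pareto, shifted_exp)


def generate_citation_network(n_articles, refs_per_article, a, beta, theta, seed=0):
    """Grow a toy citation network (assumed mechanism, for illustration only).

    Articles arrive one at a time. Each new article cites up to
    `refs_per_article` distinct earlier articles, chosen with probability
    proportional to the importance of the cited article.
    """
    rng = random.Random(seed)
    importance = []
    in_degree = []
    edges = []  # (citing article, cited article)
    for new in range(n_articles):
        importance.append(sample_tapered_pareto(a, beta, theta, rng))
        in_degree.append(0)
        k = min(refs_per_article, new)  # cannot cite more articles than exist
        cited = set()
        while len(cited) < k:
            # Weighted draw among earlier articles; repeats are discarded.
            pick = rng.choices(range(new), weights=importance[:new], k=1)[0]
            cited.add(pick)
        for old in cited:
            edges.append((new, old))
            in_degree[old] += 1
    return importance, in_degree, edges


if __name__ == "__main__":
    imp, deg, edges = generate_citation_network(2000, 5, a=1.0, beta=1.5, theta=50.0)
    print("articles:", len(imp), "citations:", len(edges), "max in-degree:", max(deg))
```

In a sketch like this, the simulated in-degree sequence could then be compared with the in-degree distribution of a real citation network to tune `beta` and `theta`; how the paper actually performs that estimation is not stated in the abstract.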

References

  • Albert, R., Jeong, H., & Barabási, A. L. (1999). Diameter of the world-wide web. Nature, 401, 130–131.

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

  • Barabási, A. L., Albert, R., Jeong, H., & Bianconi, G. (2000). Power-law distribution of the world wide web. Science, 287, 2115a.

  • Bianconi, G., & Barabási, A. L. (2001). Competition and multiscaling in evolving networks. Europhysics Letters (EPL), 54(4), 436–442.

  • Caldarelli, G., Capocci, A., De Los Rios, P., & Muñoz, M. A. (2002). Scale-free networks from varying vertex intrinsic fitness. Physical Review Letters, 89(25), 258702.

  • Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.

  • Dorogovtsev, S. N., & Mendes, J. F. F. (2001). Effect of the accelerating growth of communications networks on their structure. Physical Review E, 63(2), 025101.

  • Dorogovtsev, S. N., Mendes, J. F. F., & Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical Review Letters, 85(21), 4633.

  • Erdős, P., & Rényi, A. (1959). On random graphs I. Publicationes Mathematicae Debrecen, 6, 290.

  • Gilbert, E. N. (1959). Random graphs. Annals of Mathematical Statistics, 30(4), 1141–1144.

  • Jung, H., Lee, J. G., Lee, N., & Kim, S. H. (2018). Comparison of fitness and popularity: Fitness-popularity dynamic network model. Journal of Statistical Mechanics, 2018(12), 123403.

  • Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431.

  • Krapivsky, P. L., & Redner, S. (2001). Organization of growing random networks. Physical Review E, 63(6), 066123.

  • Kagan, Y. Y., & Schoenberg, F. P. (2001). Estimation of the upper cutoff parameter for the tapered Pareto distribution. Journal of Applied Probability, 38A, 168–185.

  • Mandelbrot, B. B. (1965). Information theory and psycholinguistics. In B. B. Wolman & E. Nagel (Eds.), Scientific psychology. Basic Books.

  • Newman, M. E. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167–256.

  • Newman, M. E. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.

  • Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., & Giles, C. L. (2002). Winners don’t take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99, 5207–5211.

  • Pham, T., Sheridan, P., & Shimodaira, H. (2016). Joint estimation of preferential attachment and node fitness in growing complex networks. Scientific Reports, 6, 32558.

  • Phoa, F. K. H., & Sanchez, J. (2013). Modeling the browsing behaviour of world wide web users. Open Journal of Statistics, 3, 145–154.

  • Phoa, F. K. H., & Lin, W. C. (2013). High-quality winners take more: Modeling non-scale-free bulletin forums with content variations. Journal of Data Science, 11, 559–573.

  • Pritchard, A. (1969). Statistical bibliography or bibliometrics? Journal of Documentation, 25(4), 348–349.

  • Van Noortwijk, J. M. (2009). A survey of the application of Gamma processes in maintenance. Reliability Engineering and System Safety, 94(1), 2–11.

Acknowledgements

The authors would like to thank Clarivate Analytics for providing access to the raw data of the Web of Science database for research investigations, the URA team of ISM for transforming the data into the neo4j database and providing it for analysis in this work, and Ms. Ula Tzu-Ning Kung for providing English editing services for this paper. In addition, the authors would like to thank the two reviewers, who provided many constructive comments and suggestions that improved the quality of this paper. This project was partly supported by Academia Sinica Grant No. AS-TP-109-M07 and the Ministry of Science and Technology (Taiwan) Grant Nos. 107-2118-M-001-011-MY3, 107-2321-B-001-038, 108-2321-B-001-016, and 109-2321-B-001-013. The third author was partly supported by JSPS KAKENHI Grant Number JP20K11715.

Author information

Corresponding author

Correspondence to Frederick Kin Hing Phoa.

Appendix

See appendix Figs. 10–26.

Fig. 10: 1981–1990 network

Fig. 11: 1982–1991 network

Fig. 12: 1983–1992 network

Fig. 13: 1984–1993 network

Fig. 14: 1985–1994 network

Fig. 15: 1986–1995 network

Fig. 16: 1987–1996 network

Fig. 17: 1988–1997 network

Fig. 18: 1989–1998 network

Fig. 19: 1990–1999 network

Fig. 20: 1991–2000 network

Fig. 21: 1992–2001 network

Fig. 22: 1993–2002 network

Fig. 23: 1994–2003 network

Fig. 24: 1995–2004 network

Fig. 25: 1996–2005 network

Fig. 26: 1997–2006 network

About this article

Cite this article

Chang, L.LH., Phoa, F.K.H. & Nakano, J. A generative model of article citation networks of a subject from a large-scale citation database. Scientometrics 126, 7373–7395 (2021). https://doi.org/10.1007/s11192-021-04037-3

  • DOI: https://doi.org/10.1007/s11192-021-04037-3
