Abstract
This paper examines the differences and similarities between two on-line computer science citation databases, DBLP and CiteSeer. The database entries in DBLP are inserted manually, while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies.
We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers.
Despite their differences, the two databases exhibit citation distributions that are similar to each other but significantly different from those reported in previous analyses of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time.
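The abstract does not spell out the exponential model, so the following is only a toy sketch of how such a bias against single-author papers could arise. It assumes (this is an illustration, not the authors' stated model) that each author independently submits a paper with some fixed probability, so a paper is included if at least one author submits it. The names `inclusion_probability` and `p_submit`, and the value 0.2, are hypothetical.

```python
import math


def inclusion_probability(num_authors: int, p_submit: float) -> float:
    """Chance that at least one of `num_authors` independent authors submits.

    ASSUMPTION: independent submissions with a shared probability `p_submit`.
    This is an illustrative toy model, not the model from the paper.
    """
    if num_authors < 1:
        raise ValueError("num_authors must be at least 1")
    if not 0.0 <= p_submit < 1.0:
        raise ValueError("p_submit must be in [0, 1)")
    # 1 - (1 - p)^n rewritten in exponential form: 1 - exp(-beta * n)
    beta = -math.log(1.0 - p_submit)
    return 1.0 - math.exp(-beta * num_authors)


if __name__ == "__main__":
    # Hypothetical per-author submission probability, chosen only for illustration.
    for n in range(1, 6):
        print(n, round(inclusion_probability(n, 0.2), 4))
```

Under these assumptions the inclusion probability grows with the number of authors, so single-author papers are the least likely to be captured, which is the qualitative effect the abstract reports for CiteSeer.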
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Petricek, V., Cox, I.J., Han, H., Councill, I.G., Giles, C.L. (2005). A Comparison of On-Line Computer Science Citation Databases. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_39
DOI: https://doi.org/10.1007/11551362_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28767-4
Online ISBN: 978-3-540-31931-3
eBook Packages: Computer Science, Computer Science (R0)