A Comparison of On-Line Computer Science Citation Databases

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3652)

Abstract

This paper examines the differences and similarities between two on-line computer science citation databases, DBLP and CiteSeer. The database entries in DBLP are inserted manually, while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies.

We show that the CiteSeer database contains considerably fewer single-author papers. This bias can be modeled by an exponential process with an intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers.

Despite their differences, the two databases exhibit citation distributions that are similar to each other but significantly different from those reported in previous analyses of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petricek, V., Cox, I.J., Han, H., Councill, I.G., Giles, C.L. (2005). A Comparison of On-Line Computer Science Citation Databases. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_39

  • DOI: https://doi.org/10.1007/11551362_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28767-4

  • Online ISBN: 978-3-540-31931-3

  • eBook Packages: Computer Science, Computer Science (R0)
