Abstract
Computer science has experienced dramatic growth and diversification over the last twenty years. Towards a current understanding of the structure of the discipline, we analyze a large sample of the computer science literature from the DBLP database. For insight into the features of this cohort and the relationships among its components, we constructed article-level clusters based on either direct citations or co-citations and reconciled them with major and minor subject categories in the All Science Journal Classification. Clustering by direct citation and by co-citation yields complementary insights, and both approaches document growth in the number and scope of computer science publications. Our analysis also reveals cross-category clusters, some of which interact with external fields, such as the biological sciences, while others remain inward looking.
References
Almeida, H., Guedes, D., Meira, W., Jr., & Zaki, M. (2012). Towards a better quality metric for graph cluster evaluation. Journal of Information and Data Management (JIDM), 3, 378–393.
Archambault, E., Campbell, D., Gingras, Y., & Lariviere, V. (2009). Comparing bibliometric statistics obtained from the Web of Science and Scopus. Journal of the American Society for Information Science and Technology. https://doi.org/10.1002/asi.21062.
Association for Computing Machinery: Computing Classification System (2012). https://dl.acm.org/ccs/ccs.cfm. Accessed June 2019.
Boyack, K., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389–2404. https://doi.org/10.1002/asi.21419.
Boyack, K. W. (2017). Investigating the effect of global data on topic detection. Scientometrics, 111(2), 999–1015. https://doi.org/10.1007/s11192-017-2297-y.
Boyack, K. W., Newman, D., Duhon, R. J., Klavans, R., Patek, M., Biberstine, J. R., et al. (2011). Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLOS ONE, 6(3), e18029. https://doi.org/10.1371/journal.pone.0018029.
Boyack, K. W., Patek, M., Ungar, L. H., Yoon, P., & Klavans, R. (2014). Classification of individual articles from all of science by research level. Journal of Informetrics, 8(1), 1–12. https://doi.org/10.1016/j.joi.2013.10.005.
Boyack, K. W., Small, H., & Klavans, R. (2013). Improving the accuracy of co-citation clustering using full text. Journal of the American Society for Information Science and Technology, 64(9), 1759–1767. https://doi.org/10.1002/asi.22896.
Chakraborty, T. (2018). Role of interdisciplinarity in computer sciences: Quantification, impact and life trajectory. Scientometrics, 114(3), 1011–1029. https://doi.org/10.1007/s11192-017-2628-z.
Clarivate Analytics: Web of Science (2019). https://clarivate.com/webofsciencegroup/solutions/web-of-science/. Accessed Dec 2019.
Dhillon, I., Guan, Y., & Kulis, B. (2007). Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 29(11), 1944–1957.
Elsevier: Scopus (2019). https://www.scopus.com/home.uri. Accessed Dec 2019.
Emmons, S., Kobourov, S., Gallant, M., & Börner, K. (2016). Analysis of network clustering algorithms and cluster quality metrics at scale. PLOS ONE, 11(7), e0159161.
Glänzel, W., & Thijs, B. (2017). Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset. Scientometrics, 111(2), 1071–1087. https://doi.org/10.1007/s11192-017-2301-6.
Kessler, M. M. (1965). Comparison of the results of bibliographic coupling and analytic subject indexing. American Documentation, 16(3), 223–233. https://doi.org/10.1002/asi.5090160309.
Klavans, R., & Boyack, K. W. (2017). Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? Journal of the Association for Information Science and Technology, 68(4), 984–998. https://doi.org/10.1002/asi.23734.
Korobskiy, D., Davey, A., Liu, S., Devarakonda, S., & Chacko, G. (2019). Enhanced Research Network Informatics Environment (ERNIE). Github repository, NET ESolutions Corporation. https://github.com/NETESOLUTIONS/ERNIE. Accessed Dec 2019.
Marshakova-Shaikevich, I. (1973). System of document connections based on references. Nauchno–Tekhnicheskaya Informatsiya Seriya 2-Informatsionnye Protsessy I Sistemy, 6(4), 3–8.
National Academies of Sciences, Engineering, and Medicine, et al. (2018). Assessing and Responding to the Growth of Computer Science Undergraduate Enrollments. The National Academies Press, Washington, DC. https://doi.org/10.17226/24926
National Science Foundation: Classification of Fields of Study (2012). https://www.nsf.gov/statistics/nsf13327/pdf/tabb1.pdf. Accessed June 2019.
Perianes-Rodriguez, A., & Ruiz-Castillo, J. (2017). A comparison of the Web of Science and publication-level classification systems of science. Journal of Informetrics, 11, 32–45. https://doi.org/10.1016/j.joi.2016.10.007.
Pham, M. C., & Klamma, R. (2010). The structure of the computer science knowledge network. In 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM). https://doi.org/10.1109/ASONAM.2010.58
Salton, G., & Bergmark, D. (1979). A citation study of computer science literature. IEEE Transactions on Professional Communication, PC–22(3), 146–158. https://doi.org/10.1109/TPC.1979.6501740.
Shu, F., Julien, C. A., Zhang, L., Qiu, J., Zhang, J., & Larivière, V. (2019). Comparing journal and paper level classifications of science. Journal of Informetrics, 13(1), 202–225. https://doi.org/10.1016/j.joi.2018.12.005.
Shun, J., Roosta-Khorasani, F., Fountoulakis, K., & Mahoney, M. W. (2016). Parallel local graph clustering. Proceedings of the VLDB Endowment, 9(12), 1041–1052. https://doi.org/10.14778/2994509.2994522.
Siebel, T. (2019). Digital transformation: survive and thrive in an era of mass extinction. New York: RosettaBooks.
Sjögårde, P., & Ahlgren, P. (2019). Granularity of algorithmically constructed publication-level classifications of research publications: Identification of specialties. Quantitative Science Studies (pp 1–32). https://doi.org/10.1162/qss_a_00004.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269. https://doi.org/10.1002/asi.4630240406.
Small, H., & Griffith, B. C. (1974). The structure of scientific literatures I: Identifying and graphing specialties. Science Studies, 4(1), 17–40. https://doi.org/10.1177/030631277400400102.
Small, H., & Sweeney, E. (1985). Clustering the science citation index using co-citations. Scientometrics, 7(3), 391–409. https://doi.org/10.1007/BF02017157.
The dblp Team: dblp Computer Science Bibliography (2018). https://dblp.org/xml/release/dblp-2018-08-01.xml.gz. Accessed June 2019.
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, 9(1), 1–12. https://doi.org/10.1038/s41598-019-41695-z.
Šubelj, L., van Eck, N. J., & Waltman, L. (2016). Clustering scientific publications based on citation relations: A systematic comparison of different methods. PLOS ONE, 11(4), e0154404. https://doi.org/10.1371/journal.pone.0154404.
Waltman, L., & van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63(12), 2378–2392. https://doi.org/10.1002/asi.22748.
Wang, Q., & Waltman, L. (2016). Large-scale analysis of the accuracy of the journal classification systems of Web of Science and Scopus. Journal of Informetrics, 10(2), 347–364. https://doi.org/10.1016/j.joi.2016.02.003.
Acknowledgements
The authors thank Henry Small for very helpful discussions. Research and development reported in this publication was partially supported by funds from the National Institute on Drug Abuse, National Institutes of Health, US Department of Health and Human Services, under Contract No. HHSN271201800040C (N44DA-18-1216). TW is supported by the Grainger Foundation. Citation data used in this paper were drawn from Scopus as implemented in the ERNIE project (Korobskiy et al., 2019), a collaboration between NET ESolutions Corporation and Elsevier Inc. We thank our Elsevier colleagues for their support of the ERNIE project.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest. Elsevier personnel played no role in conceptualization, experimental design, review of results, or conclusions presented. The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, NET ESolutions Corporation, or Elsevier Inc.
Cite this article
Devarakonda, S., Korobskiy, D., Warnow, T. et al. Viewing computer science through citation analysis: Salton and Bergmark Redux. Scientometrics 125, 271–287 (2020). https://doi.org/10.1007/s11192-020-03624-0