DOI: 10.1145/3529372.3533285
short-paper

TinyGenius: intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

Published: 20 June 2022

ABSTRACT

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can process scholarly articles autonomously at scale and create machine-readable representations of their content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements through crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. The explainability of the employed NLP methods is crucial for giving crowd workers the context they need to support their decisions. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.
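
The validation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Statement` class, the majority-vote rule, the `min_votes` threshold, and every identifier and example value below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    """One NLP-extracted claim about a paper, stored as a triple (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    method: str  # assumed label for the NLP method that produced the claim
    votes: List[bool] = field(default_factory=list)  # True = a worker judged it correct

    def add_vote(self, is_correct: bool) -> None:
        """Record the answer of one crowd worker to one microtask."""
        self.votes.append(is_correct)

    def verdict(self, min_votes: int = 3) -> Optional[bool]:
        """Return the majority verdict, or None with too few votes or a tie."""
        if len(self.votes) < min_votes:
            return None
        yes = sum(self.votes)
        no = len(self.votes) - yes
        if yes == no:
            return None
        return yes > no


def validated(statements: List[Statement], min_votes: int = 3) -> List[Statement]:
    """Keep only the statements that the crowd confirmed by majority."""
    return [s for s in statements if s.verdict(min_votes) is True]


# Example with made-up data: one confirmed, one rejected, one still undecided.
topic = Statement("paper:1", "hasTopic", "knowledge graphs", method="topic-modeling")
entity = Statement("paper:1", "mentions", "Python", method="entity-recognition")
summary = Statement("paper:1", "hasSummary", "A short summary.", method="summarization")

for answer in (True, True, False):
    topic.add_vote(answer)
for answer in (False, False, True):
    entity.add_vote(answer)
summary.add_vote(True)

confirmed = validated([topic, entity, summary])
```

Recording the producing method next to each statement is one possible way to show a worker where a claim came from. A production system would more likely attach votes as statement-level metadata in the graph itself.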


          Published in

          JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
          June 2022
          392 pages
          ISBN: 9781450393454
          DOI: 10.1145/3529372
          General Chairs: Akiko Aizawa, Thomas Mandl, Zeljko Carevic
          Program Chairs: Annika Hinze, Philipp Mayr, Philipp Schaer

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          JCDL '22 Paper Acceptance Rate: 35 of 132 submissions, 27%
          Overall Acceptance Rate: 415 of 1,482 submissions, 28%
