short-paper

TinyGenius: intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

Authors:
Allard Oelen, TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Markus Stocker, TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Sören Auer, TIB Leibniz Information Centre for Science and Technology, Hannover, Germany

JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, June 2022, Article No.: 5, Pages 1–5, https://doi.org/10.1145/3529372.3533285

Published: 20 June 2022

ABSTRACT

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can process scholarly articles autonomously at scale and create machine-readable representations of the article content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements through crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. The explainability of the employed NLP methods is crucial for giving crowd workers the context they need to support their decisions. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.
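The validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Statement` structure, the vote labels, the simple-majority rule, and the `min_votes` threshold are all assumptions made for illustration. Each NLP-extracted statement carries the name of the method that produced it and a short context string for the worker, and crowd votes collected through microtasks decide whether the statement is accepted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One NLP-extracted claim about a paper (hypothetical structure)."""
    paper_id: str
    predicate: str
    value: str
    nlp_method: str   # which NLP tool produced the statement
    context: str      # short explanation shown to the crowd worker
    votes: list = field(default_factory=list)  # "correct" / "incorrect"


def record_vote(statement, vote):
    """Store a single microtask answer from one crowd worker."""
    if vote not in ("correct", "incorrect"):
        raise ValueError(f"unknown vote: {vote!r}")
    statement.votes.append(vote)


def validation_status(statement, min_votes=3):
    """Return 'accepted', 'rejected', or 'undecided' by simple majority.

    min_votes is an assumed threshold, not a value taken from the paper.
    """
    if len(statement.votes) < min_votes:
        return "undecided"
    counts = Counter(statement.votes)
    if counts["correct"] > counts["incorrect"]:
        return "accepted"
    if counts["incorrect"] > counts["correct"]:
        return "rejected"
    return "undecided"
```

A statement stays "undecided" until enough workers have answered, so low-traffic statements are never silently accepted. A real system would likely weight votes or track worker reliability, which this sketch leaves out.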


Published in
JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
June 2022
392 pages
ISBN: 9781450393454
DOI: 10.1145/3529372
General Chairs:
Akiko Aizawa, National Institute of Informatics, Japan
Thomas Mandl, University of Hildesheim, Germany
Zeljko Carevic, GESIS - Leibniz Institute for the Social Sciences, Germany
Program Chairs:
Annika Hinze, University of Waikato, New Zealand
Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany
Philipp Schaer, TH Köln (University of Applied Sciences), Germany
Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdsourcing microtasks
intelligent user interfaces
knowledge graph validation
scholarly knowledge graphs
Acceptance Rates
JCDL '22 Paper Acceptance Rate: 35 of 132 submissions, 27%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%