Abstract
Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is important in various applications. This paper presents an architecture, MwTExt, for automatic extraction of multi-word terms (MWTs) from such expressions within un-annotated English documents. Natural language processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on the lexical patterns (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can further be used to form compound concepts within an ontology. The lexical descriptions of MWTs are encoded in the Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and the results are compared with those obtained by Text2Onto, an ontology learning tool, and by term extractors such as TermRaider and TerMine. The results indicate that MwTExt performs better at extracting accurate lexicalized MWTs, with an average precision of 97%.
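The three lexical patterns named in the abstract can be illustrated with a small sketch. This is not the MwTExt implementation: it assumes the input has already been tokenized and tagged with Penn Treebank part-of-speech tags (for example by a shallow parser), treats every tag starting with NN as a noun and the tag IN as a preposition, and uses a greedy longest-match scan. The names `coarse`, `extract_mwts` and `PATTERNS` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag); assumed input format

# Patterns from the abstract, longest first so the scan prefers the longest match.
# N = noun, P = preposition.
PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]


def coarse(tag: str) -> str:
    """Map a Penn Treebank tag to a coarse class (an assumption of this sketch)."""
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":  # note: IN also covers subordinating conjunctions
        return "P"
    return "O"


def extract_mwts(tagged: List[Tagged]) -> List[str]:
    """Return candidate multi-word terms matching one of PATTERNS."""
    classes = [coarse(tag) for _, tag in tagged]
    found = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(classes[i:i + n]) == pattern:
                found.append(" ".join(tok for tok, _ in tagged[i:i + n]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no pattern starts here; move on one token
    return found


sentence = [
    ("The", "DT"), ("theory", "NN"), ("of", "IN"), ("computation", "NN"),
    ("studies", "VBZ"), ("models", "NNS"), ("of", "IN"), ("program", "NN"),
    ("execution", "NN"), (".", "."),
]
print(extract_mwts(sentence))
# ['theory of computation', 'models of program execution']
```

Precision, as reported in the abstract, would then be the number of extracted candidates judged correct divided by the total number of candidates extracted.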
Abbreviations
- MWTs: Multi-word terms
- MWEs: Multi-word expressions
- MwTExt: Multi-word terms extraction
- NLP: Natural language processing
- OWL: Web ontology language
- XML: eXtensible markup language
References
Proceedings references
Graliński F, Savary A, Czerepowicka M, Makowiecki F (2010) Computational lexicography of multi-word units: how efficient can it be? In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 1–9
Attia M, Toral A, Tounsi L, Pecina P, van Genabith J (2010) Automatic extraction of Arabic multiword expressions. In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 18–26
Cimiano P, Völker J (2005) Text2Onto: a framework for ontology learning and data-driven change discovery. In: Proceedings of the 10th international conference on applications of natural language to information systems (NLDB), vol 3513, pp 227–238
Stanković R, Krstev C, Obradović I, Lazić B, Trtovac A (2016) Rule-based automatic multi-word term extraction and lemmatization. In: Tenth international conference on language resources and evaluation
Liu Y, Shi M, Li C (2016) Domain ontology concept extraction method based on text. In: IEEE ICIS
Riedl M, Biemann C (2015) A single word is not enough: ranking multiword expressions using distributional semantics. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2430–2440
Ramisch C (2012) A generic framework for multiword expressions treatment: from acquisition to applications. In: Proceedings of the ACL 2012 student research workshop, Jeju, Republic of Korea
Drymonas EG (2009) Ontology learning from text based on multi-word term concepts: the OntoGain method. Master of Science thesis, Technical University of Crete, Greece
Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG (1999) Domain specific keyphrase extraction. In: Proc. sixteenth international joint conference on artificial intelligence, Morgan Kaufmann Publishers, pp 668–673
Bonin F, Dell’Orletta F, Venturi G, Montemagni S (2010) Contrastive filtering of domain-specific multi-word terms from different types of corpora. In: Proceedings of the workshop on multiword expressions: from theory to applications, pp 76–79
Jiang X, Tan A-H (2005) Mining ontological knowledge from domain-specific text documents. In: Proceedings of the fifth IEEE international conference on data mining
Journal references
Gruber T (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
Buitelaar P, Olejnik D, Sintek M (2004) A protégé plug-in for ontology extraction from text based on linguistic analysis. In: Davies J et al (eds) The semantic web: research and applications. ESWS 2004, LNCS 3053. Springer, Berlin
Velardi P, Faralli S, Navigli R (2013) OntoLearn reloaded: a graph-based algorithm for taxonomy induction. Assoc Comput Linguist 39(3):665–707
Wong W, Liu W, Bennamoun M (2012) Ontology learning from text: a look back and into the future. ACM Comput Surv (CSUR) 44(4):20
Biemann C (2005) Ontology learning from text: a survey of methods. LDV Forum 20(2):75–93
Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multi-word terms. Int J Digit Librar 3(2):117–132 (TerMine)
Meryem H, Ouatik SA, Lachkar A (2014) A novel method for Arabic multi-word term extraction. Int J Database Manag Syst (IJDMS) 6(3):53–67
Other references
TermRaider. https://gate.ac.uk/projects/neon/termraider.html
The Stanford CoreNLP. http://nlp.stanford.edu/software/corenlp.shtml
Web Ontology Language—Wikipedia. https://en.wikipedia.org/wiki/Web_Ontology_Language
Precision and recall—Wikipedia. https://en.wikipedia.org/wiki/Precision_and_recall
Weka—Wikipedia. https://en.wikipedia.org/wiki/Weka_(machine_learning)
About this article
Cite this article
Thanawala, P., Pareek, J. MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology. Int. j. inf. tecnol. 10, 303–311 (2018). https://doi.org/10.1007/s41870-018-0111-6
DOI: https://doi.org/10.1007/s41870-018-0111-6