MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology

  • Original Research
International Journal of Information Technology

Abstract

Multiword expressions are an omnipresent element of natural language, and their construal as a linguistic resource is important in many applications. This paper presents MwTExt, an architecture for the automatic extraction of multi-word terms (MWTs) from such expressions in un-annotated English documents. Natural Language Processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on lexical patterns such as (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can then be used to form compound concepts within an ontology. The lexical descriptions of the MWTs are encoded in the Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and its results are compared with those of Text2Onto, an ontology learning tool, and of term extractors such as TermRaider and TerMine. The results indicate that MwTExt extracts lexicalized MWTs more accurately, with an average precision of 97%.
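
The pattern-matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the input has already been POS-tagged with Penn Treebank tags, and the names `PATTERNS`, `coarse` and `extract_mwts` are hypothetical. It scans the tagged tokens and greedily matches the three lexical patterns, trying the longest first. Note that the Penn Treebank tag `IN` also covers subordinating conjunctions, so a real system would need additional filtering.

```python
# Hypothetical sketch: match Noun-Preposition-Noun style patterns over
# tokens that are already POS-tagged (Penn Treebank tags assumed).

PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]  # ordered longest first so the greedy scan prefers longer terms


def coarse(tag):
    """Map a Penn Treebank tag to N (noun), P (preposition) or O (other)."""
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


def extract_mwts(tagged):
    """Return candidate multi-word terms from a list of (word, tag) pairs."""
    classes = [coarse(tag) for _, tag in tagged]
    terms, i = [], 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(classes[i:i + n]) == pattern:
                terms.append(" ".join(word for word, _ in tagged[i:i + n]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no pattern starts here; move to the next token
    return terms


sentence = [
    ("The", "DT"), ("analysis", "NN"), ("of", "IN"), ("algorithms", "NNS"),
    ("relies", "VBZ"), ("on", "IN"), ("the", "DT"), ("theory", "NN"),
    ("of", "IN"), ("computation", "NN"), ("complexity", "NN"), (".", "."),
]
print(extract_mwts(sentence))
# ['analysis of algorithms', 'theory of computation complexity']
```

Trying the longest pattern first keeps a five-token term from being split into a shorter match plus leftover tokens; a frequency or domain-relevance filter would normally follow this step.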

Abbreviations

MWTs: Multi-word terms
MWEs: Multi-word expressions
MwTExt: Multi-word terms extraction
NLP: Natural language processing
OWL: Web ontology language
XML: eXtensible markup language

References

Proceedings references

  1. Graliński F, Savary A, Czerepowicka M, Makowiecki F (2010) Computational lexicography of multi-word units: how efficient can it be? In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 1–9

  2. Attia M, Toral A, Tounsi L, Pecina P, van Genabith J (2010) Automatic extraction of Arabic multiword expressions. In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 18–26

  3. Cimiano P, Völker J (2005) Text2Onto: a framework for ontology learning and data-driven change discovery. In: Proceedings of the 10th international conference on applications of natural language to information systems (NLDB), vol 3513, pp 227–238

  4. Stanković R, Krstev C, Obradović I, Lazić B, Trtovac A (2016) Rule-based automatic multi-word term extraction and lemmatization. In: Tenth international conference on language resources and evaluation

  5. Liu Y, Shi M, Li C (2016) Domain ontology concept extraction method based on text. In: IEEE ICIS

  6. Riedl M, Biemann C (2015) A single word is not enough: ranking multiword expressions using distributional semantics. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2430–2440

  7. Ramisch C (2012) A generic framework for multiword expressions treatment: from acquisition to applications. In: Proceedings of the ACL 2012 student research workshop, Jeju, Republic of Korea

  8. Drymonas EG (2009) Ontology learning from text based on multi-word term concepts: the OntoGain method. Master of Science thesis, Technical University of Crete, Greece

  9. Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG (1999) Domain specific keyphrase extraction. In: Proceedings of the sixteenth international joint conference on artificial intelligence, Morgan Kaufmann Publishers, pp 668–673

  10. Bonin F, Dell’Orletta F, Venturi G, Montemagni S (2010) Contrastive filtering of domain-specific multi-word terms from different types of corpora. In: Proceedings of the workshop on multiword expressions: from theory to applications, pp 76–79

  11. Jiang X, Tan A-H (2005) Mining ontological knowledge from domain-specific text documents. In: Proceedings of the fifth IEEE international conference on data mining

Journal references

  1. Gruber T (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220

  2. Buitelaar P, Olejnik D, Sintek M (2004) A Protégé plug-in for ontology extraction from text based on linguistic analysis. In: Davies J et al (eds) The semantic web: research and applications. ESWS 2004, LNCS 3053. Springer, Berlin

  3. Velardi P, Faralli S, Navigli R (2013) OntoLearn reloaded: a graph-based algorithm for taxonomy induction. Assoc Comput Linguist 39(3):665–707

  4. Wong W, Liu W, Bennamoun M (2012) Ontology learning from text: a look back and into the future. ACM Comput Surv (CSUR) 44(4):20

  5. Biemann C (2005) Ontology learning from text: a survey of methods. LDV Forum 20(2):75–93

  6. Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multi-word terms. Int J Digit Librar 3(2):117–132 (TerMine)

  7. Meryem H, Ouatik SA, Lachkar A (2014) A novel method for Arabic multi-word term extraction. Int J Database Manag Syst (IJDMS) 6(3):53–67

Other references

  1. TermRaider. https://gate.ac.uk/projects/neon/termraider.html

  2. The Stanford CoreNLP. http://nlp.stanford.edu/software/corenlp.shtml

  3. Web Ontology Language—Wikipedia. https://en.wikipedia.org/wiki/Web_Ontology_Language

  4. Precision and recall—Wikipedia. https://en.wikipedia.org/wiki/Precision_and_recall

  5. Weka—Wikipedia. https://en.wikipedia.org/wiki/Weka_(machine_learning)

Author information

Corresponding author

Correspondence to Pratik Thanawala.

About this article

Cite this article

Thanawala, P., Pareek, J. MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology. Int. j. inf. tecnol. 10, 303–311 (2018). https://doi.org/10.1007/s41870-018-0111-6

  • DOI: https://doi.org/10.1007/s41870-018-0111-6
