Abstract
We present a novel end-to-end model that jointly extracts semantic relations and their argument entities from sentences. The model requires no handcrafted features or auxiliary toolkits, so it can easily be extended to a wide range of sequence tagging tasks. We also study a new way of using word morphology features for relation extraction: the word morphology feature is combined with the semantic feature to enrich the representational capacity of the input vectors. We then develop an input information enhanced unit for the bidirectional long short-term memory network (Bi-LSTM) to overcome the information loss caused by the gate operations and concatenation operations in the LSTM memory unit. In addition, a new tagging scheme with uncertain labels, together with a corresponding objective function, is used to reduce interference from non-entity words. Experiments are performed on three datasets: the New York Times (NYT) and ACE2005 datasets for relation extraction, and the SemEval 2010 Task 8 dataset for relation classification. The results demonstrate that our model achieves a significant improvement over the state-of-the-art model for relation extraction on the NYT dataset and achieves competitive performance on the ACE2005 dataset.
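The abstract combines a word morphology feature with a semantic feature but does not say how either is computed. As a minimal sketch, assuming a character-level CNN with max pooling for morphology (the approach of the cited Ma and Hovy and Chiu and Nichols papers) and a lowercased word-embedding lookup for semantics, each token's input vector can be formed by concatenating the two. All dimensions, the random initialisation, and the names `char_cnn` and `input_vector` are hypothetical; a trained model would learn the lookup tables and filters.

```python
import random

# Hypothetical toy sizes; the abstract does not state the real dimensions.
CHAR_DIM, N_FILTERS, WINDOW, WORD_DIM = 4, 3, 3, 5
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# Lookup tables filled lazily with random vectors (learned in a real model).
char_emb = {}
word_emb = {}
# Each filter spans WINDOW consecutive characters: (flattened weights, bias).
filters = [(rand_vec(WINDOW * CHAR_DIM), rng.uniform(-0.1, 0.1)) for _ in range(N_FILTERS)]


def embed_char(c):
    if c not in char_emb:
        char_emb[c] = rand_vec(CHAR_DIM)
    return char_emb[c]


def embed_word(w):
    # Semantic feature: case-insensitive word embedding (an assumption).
    key = w.lower()
    if key not in word_emb:
        word_emb[key] = rand_vec(WORD_DIM)
    return word_emb[key]


def char_cnn(word):
    """Morphology feature: 1-D convolution over characters, then max pooling."""
    pad = [0.0] * CHAR_DIM
    half = WINDOW // 2
    chars = [pad] * half + [embed_char(c) for c in word] + [pad] * half
    features = []
    for weights, bias in filters:
        best = float("-inf")
        for start in range(len(chars) - WINDOW + 1):
            window = [x for vec in chars[start:start + WINDOW] for x in vec]
            best = max(best, sum(w * x for w, x in zip(weights, window)) + bias)
        features.append(best)
    return features


def input_vector(word):
    """Enriched input: [semantic feature ; morphology feature]."""
    return embed_word(word) + char_cnn(word)


print(len(input_vector("Obama")))  # 8 = WORD_DIM + N_FILTERS
```

Because the character channel sees case and affixes while the word channel is lowercased, "Obama" and "obama" share the semantic part of the vector but can differ in the morphology part, which is the kind of extra signal a morphology feature is meant to contribute.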
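To make the "gate operations and concatenation operations in the LSTM memory unit" concrete, the sketch below shows one step of a standard LSTM cell in the widely used forget-gate formulation, plus a toy bidirectional wrapper. It is not the paper's input information enhanced unit, whose details are not given in the abstract. Note that the input x_t is concatenated with h_{t-1} and reaches h_t only through sigmoid-gated and tanh-squashed transformations. Weights are random and the class and function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with input, forget and output gates."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        concat_dim = hidden_dim + input_dim

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(concat_dim)] for _ in range(hidden_dim)]

        # One weight matrix and bias per gate / candidate, all applied to [h_{t-1}; x_t].
        self.W = {k: mat() for k in ("i", "f", "o", "g")}
        self.b = {k: [0.0] * hidden_dim for k in ("i", "f", "o", "g")}
        self.hidden_dim = hidden_dim

    def step(self, x_t, h_prev, c_prev):
        z = h_prev + x_t  # concatenation [h_{t-1}; x_t]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in self.W}
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c_t = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
        h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
        return h_t, c_t


def run_bilstm(xs, fwd, bwd):
    """Run two cells in opposite directions and concatenate their states per token."""
    def run(cell, seq):
        h, c = [0.0] * cell.hidden_dim, [0.0] * cell.hidden_dim
        out = []
        for x in seq:
            h, c = cell.step(x, h, c)
            out.append(h)
        return out

    forward = run(fwd, xs)
    backward = run(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


states = run_bilstm([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], LSTMCell(3, 2, seed=0), LSTMCell(3, 2, seed=1))
print(len(states), len(states[0]))  # 2 4
```

Every path from x_t to h_t passes through at least one gate and one tanh, so parts of the raw input can be attenuated before reaching the output; this is the loss the abstract's enhanced unit is designed to counter.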
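The NYT and ACE2005 comparisons concern relation extraction, which is usually scored at the triple level. The sketch below computes micro-averaged precision, recall and F1, assuming the common exact-match criterion: a predicted (head, relation, tail) triple counts as correct only if all three parts match a gold triple in the same sentence. The paper's exact matching rules may differ, and the entity pairs and relation labels in the example are made up.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation type, tail entity)


def triple_prf(gold: Iterable[Set[Triple]], pred: Iterable[Set[Triple]]):
    """Micro-averaged precision, recall and F1 over per-sentence triple sets."""
    gold, pred = list(gold), list(pred)
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # exact-match true positives
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{("Obama", "born_in", "Honolulu")},
        {("Jobs", "founded", "Apple"), ("Jobs", "lived_in", "Palo Alto")}]
pred = [{("Obama", "born_in", "Honolulu")},
        {("Jobs", "founded", "Apple"), ("Apple", "located_in", "Cupertino")}]
p, r, f1 = triple_prf(gold, pred)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # 0.667 0.667 0.667
```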
Notes
The NYT dataset can be downloaded from https://github.com/shanzhenren/CoType.
References
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Proceedings of the 28th international conference on neural information processing systems - volume 1, NIPS’15, pp 649–657
Chiu J, Nichols E (2016) Named entity recognition with bidirectional LSTM-CNNs. Trans Assoc Comput Linguist 4:357
Cao K, Rei M (2016) A joint model for word embedding and word morphology. In: Proceedings of the 1st workshop on representation learning for NLP (Association for Computational Linguistics, 2016), pp 18–26. https://doi.org/10.18653/v1/W16-1603. http://www.aclweb.org/anthology/W16-1603
Ma X, Hovy E (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2016), pp 1064–1074. https://doi.org/10.18653/v1/P16-1101. http://www.aclweb.org/anthology/P16-1101
Miwa M, Bansal M (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2016), pp 1105–1116. https://doi.org/10.18653/v1/P16-1105. http://www.aclweb.org/anthology/P16-1105
Zheng S, Wang F, Bao H, Hao Y, Zhou P, Xu B (2017) Joint extraction of entities and relations based on a novel tagging scheme. In: Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2017), pp 1227–1236. https://doi.org/10.18653/v1/P17-1113. http://www.aclweb.org/anthology/P17-1113
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735. https://doi.org/10.1162/neco.1997.9.8.1735
Graves A, Jaitly N, Mohamed AR (2014) Hybrid speech recognition with deep bidirectional LSTM. In: IEEE workshop on automatic speech recognition and understanding, pp 273–278
Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. In: COLING 1992 volume 2: the 15th international conference on computational linguistics. http://www.aclweb.org/anthology/C92-2082
Brin S (1999) Extracting patterns and relations from the World Wide Web. In: Selected papers from the international workshop on The World Wide Web and databases, WebDB ’98. Springer, London. pp 172–183. http://dl.acm.org/citation.cfm?id=646543.696220
Agichtein E, Gravano L (2000) Snowball: extracting relations from large plain-text collections. In: Proceedings of the fifth ACM conference on digital libraries, DL ’00. ACM, New York, pp 85–94. https://doi.org/10.1145/336597.336644
Blum A, Lafferty J, Rwebangira MR, Reddy R (2004) Semi-supervised learning using randomized mincuts. In: Proceedings of the twenty-first international conference on machine learning, ICML ’04. ACM, New York, p 13. https://doi.org/10.1145/1015330.1015429
Oakes MP (2005) Using Hearst’s rules for the automatic acquisition of hyponyms for mining a pharmaceutical corpus. In: International workshop text mining research, practice and opportunities, proceedings, Borovets, Bulgaria, 24 September 2005, held in conjunction with RANLP, pp 63–67
Chen J, Ji D, Tan CL, Niu Z (2006) Relation extraction using label propagation based semi-supervised learning. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (Association for Computational Linguistics, 2006), pp 129–136. http://www.aclweb.org/anthology/P06-1017
Bunescu R, Mooney R (2007) Learning to extract relations from the Web using minimal supervision. In: Proceedings of the 45th annual meeting of the association of computational linguistics (Association for Computational Linguistics, 2007), pp 576–583. http://www.aclweb.org/anthology/P07-1073
Bollegala DT, Matsuo Y, Ishizuka M (2010) Relational duality: unsupervised extraction of semantic relations between entities on the Web. In: Proceedings of the 19th international conference on World Wide Web, WWW ’10. ACM, New York, pp 151–160. https://doi.org/10.1145/1772690.1772707
Nakashole N, Tylenda T, Weikum G (2013) Fine-grained semantic typing of emerging entities. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2013), pp 1488–1497. http://www.aclweb.org/anthology/P13-1146
Zelenko D, Aone C, Richardella A (2003) Kernel methods for relation extraction. J Mach Learn Res 3:1083
Bunescu RC, Mooney RJ (2005) Subsequence kernels for relation extraction. In: Proceedings of the 18th international conference on neural information processing systems, NIPS’05. MIT Press, Cambridge, pp 171–178. http://dl.acm.org/citation.cfm?id=2976248.2976270
Qian L, Zhou G, Kong F, Zhu Q, Qian P (2008) Exploiting constituent dependencies for tree kernel-based semantic relation extraction. In: Proceedings of the 22nd international conference on computational linguistics (Coling 2008) (Coling 2008 Organizing Committee, 2008), pp 697–704. http://www.aclweb.org/anthology/C08-1088
Xu K, Feng Y, Huang S, Zhao D (2015) Semantic relation classification via convolutional neural networks with simple negative sampling. In: Proceedings of the 2015 conference on empirical methods in natural language processing (Association for Computational Linguistics, 2015), pp 536–540. https://doi.org/10.18653/v1/D15-1062. http://www.aclweb.org/anthology/D15-1062
Zhang H, Sun Y, Zhao M, Chow TWS, Wu QMJ (2019) Understanding subtitles by character-level sequence-to-sequence learning. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2019.2900159
dos Santos C, Guimarães V (2015) Boosting named entity recognition with neural character embeddings. In: Proceedings of the fifth named entity workshop (Association for Computational Linguistics, 2015), pp 25–33. https://doi.org/10.18653/v1/W15-3904. http://www.aclweb.org/anthology/W15-3904
Zhang H, Li J, Ji Y, Yue H (2017) Understanding subtitles by character-level sequence-to-sequence learning. IEEE Trans Ind Inform 13(2):616. https://doi.org/10.1109/TII.2016.2601521
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541. https://doi.org/10.1162/neco.1989.1.4.541
Xu Y, Mou L, Li G, Chen Y, Peng H, Jin Z (2015) Classifying relations via long short term memory networks along shortest dependency paths. In: Proceedings of the 2015 conference on empirical methods in natural language processing (Association for Computational Linguistics, 2015), pp 1785–1794. https://doi.org/10.18653/v1/D15-1206
Xu Y, Jia R, Mou L, Li G, Chen Y, Lu Y, Jin Z (2016) Improved relation classification by deep recurrent neural networks with data augmentation. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers (The COLING 2016 Organizing Committee, 2016), pp 1461–1470. http://www.aclweb.org/anthology/C16-1138
Zhang S, Zheng D, Hu X, Yang M (2015) Bidirectional long short-term memory networks for relation classification. In: Proceedings of the 29th Pacific Asia conference on language, information and computation, pp 73–78. http://www.aclweb.org/anthology/Y15-1009
Zeng D, Liu K, Lai S, Zhou G, Zhao J (2014) Relation classification via convolutional deep neural network. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers (Dublin City University and Association for Computational Linguistics, 2014), pp 2335–2344. http://www.aclweb.org/anthology/C14-1220
Wang L, Cao Z, de Melo G, Liu Z (2016) Relation classification via multi-level attention CNNs. In: Proceedings of the 54th annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2016), pp 1298–1307. https://doi.org/10.18653/v1/P16-1123. http://www.aclweb.org/anthology/P16-1123
dos Santos C, Xiang B, Zhou B (2015) Classifying relations by ranking with convolutional neural networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers) (Association for Computational Linguistics, 2015), pp 626–634. https://doi.org/10.3115/v1/P15-1061. http://www.aclweb.org/anthology/P15-1061
Vu NT, Adel H, Gupta P, Schütze H (2016) Combining recurrent and convolutional neural networks for relation classification. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies (Association for Computational Linguistics, 2016), pp 534–539. https://doi.org/10.18653/v1/N16-1065. http://www.aclweb.org/anthology/N16-1065
Yang B, Cardie C (2013) Joint inference for fine-grained opinion extraction. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2013), pp 1640–1649. http://www.aclweb.org/anthology/P13-1161
Singh S, Riedel S, Martin B, Zheng J, McCallum A (2013) Joint inference of entities, relations, and coreference. In: Proceedings of the 2013 workshop on automated knowledge base construction, AKBC ’13. ACM, New York, pp 1–6. https://doi.org/10.1145/2509558.2509559
Miwa M, Sasaki Y (2014) Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (Association for Computational Linguistics, 2014), pp 1858–1869. https://doi.org/10.3115/v1/D14-1200. http://www.aclweb.org/anthology/D14-1200
Li Q, Ji H (2014) Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers) (Association for Computational Linguistics, 2014), pp 402–412. https://doi.org/10.3115/v1/P14-1038. http://www.aclweb.org/anthology/P14-1038
Mintz M, Bills S, Snow R, Jurafsky D (2009) Distant supervision for relation extraction without labeled data. In: Proceedings of the joint conference of the 47th annual meeting of the acl and the 4th international joint conference on natural language processing of the AFNLP (Association for Computational Linguistics, 2009), pp 1003–1011. http://www.aclweb.org/anthology/P09-1113
Riedel S, Yao L, McCallum A (2010) Modeling relations and their mentions without labeled text. In: European conference on machine learning and knowledge discovery in databases, pp 148–163
Hoffmann R, Zhang C, Ling X, Zettlemoyer L, Weld DS (2011) Knowledge-based weak supervision for information extraction of overlapping relations. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (Association for Computational Linguistics, 2011), pp 541–550. http://www.aclweb.org/anthology/P11-1055
Ren X, Wu Z, He W, Qu M, Voss CR, Ji H, Abdelzaher TF, Han J (2017) CoType: joint extraction of typed entities and relations with knowledge bases. In: Proceedings of the 26th international conference on World Wide Web (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2017), WWW ’17, pp 1015–1024. https://doi.org/10.1145/3038912.3052708
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems - volume 2 (Curran Associates Inc., USA, 2013), NIPS’13, pp 3111–3119. http://dl.acm.org/citation.cfm?id=2999792.2999959
Peng N, Poon H, Quirk C, Toutanova K, Yih W (2017) Cross-sentence N-ary relation extraction with graph LSTMs. Trans Assoc Comput Linguist 5:101
Gormley MR, Yu M, Dredze M (2015) Improved relation extraction with feature-rich compositional embedding models. In: Proceedings of the 2015 conference on empirical methods in natural language processing (Association for Computational Linguistics, 2015), pp 1774–1784. https://doi.org/10.18653/v1/D15-1205. http://www.aclweb.org/anthology/D15-1205
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) LINE: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web, WWW ’15. https://www.microsoft.com/en-us/research/publication/line-large-scale-information-network-embedding/
Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (Association for Computational Linguistics, 2012), pp 1201–1211. http://www.aclweb.org/anthology/D12-1110
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR)
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929
Acknowledgements
The authors would like to thank Xiang Ren, Zeqiu Wu, Wenqi He et al. for constructing and releasing the public NYT dataset. The authors are also grateful to Mikolov et al. for their publicly available program for training word embeddings. This research work is supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0803302 and 2016QY03D0602) and the National Natural Science Foundation of China (No. 61751201).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lei, M., Huang, H., Feng, C. et al. An input information enhanced model for relation extraction. Neural Comput & Applic 31, 9113–9126 (2019). https://doi.org/10.1007/s00521-019-04430-3
DOI: https://doi.org/10.1007/s00521-019-04430-3