Intrinsic Evaluation for English–Tamil Bilingual Word Embeddings

Sanjanasri, J. P.; Menon, Vijay Krishna; Rajendran, S.; Soman, K. P.; Anand Kumar, M.

doi:10.1007/978-981-13-6095-4_3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 910)

Abstract

Despite the growth of bilingual word embeddings, no prior work directly evaluates them for the English–Tamil language pair. In this paper, we present a dataset and an evaluation paradigm for English–Tamil bilingual word vector models. The dataset contains words that cover a range of concepts occurring in natural language. Word pairs are scored on similarity rather than association or relatedness; hence, pairs that are associated but not literally similar receive a low rating. The measures are further quantified to ensure consistency in the dataset, mimicking the cognitive phenomena. As a result, the dataset can be used by non-native speakers with minimal effort. We also present some inferences and insights into the semantics captured by word vectors and human cognition.
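
As an illustration of how a similarity-scored word-pair dataset is typically used for intrinsic evaluation, the sketch below compares human ratings with cosine similarities from a bilingual embedding space using Spearman rank correlation. This is a common setup for similarity benchmarks in general; the abstract does not state which metric the authors use, so the choice of Spearman correlation is an assumption. The function names, the toy vectors, the word pairs, and the ratings are all hypothetical and are not taken from the paper's dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs, en_vectors, ta_vectors):
    """Correlate human ratings with model similarities.

    Pairs with an out-of-vocabulary word are skipped; the number of
    pairs actually scored is returned alongside the correlation.
    """
    human, model = [], []
    for en_word, ta_word, rating in pairs:
        if en_word in en_vectors and ta_word in ta_vectors:
            human.append(rating)
            model.append(cosine(en_vectors[en_word], ta_vectors[ta_word]))
    return spearman(human, model), len(human)


# Hypothetical 3-dimensional vectors in a shared bilingual space.
en_vectors = {"book": [0.9, 0.1, 0.0], "water": [0.1, 0.9, 0.1], "tree": [0.0, 0.2, 0.9]}
ta_vectors = {"புத்தகம்": [0.8, 0.2, 0.1], "தண்ணீர்": [0.2, 0.8, 0.0], "மரம்": [0.1, 0.1, 0.9]}

# Hypothetical (English word, Tamil word, human similarity rating) triples.
pairs = [
    ("book", "புத்தகம்", 9.5),
    ("water", "தண்ணீர்", 9.0),
    ("book", "தண்ணீர்", 1.0),
    ("tree", "புத்தகம்", 2.5),
]

rho, n_scored = evaluate(pairs, en_vectors, ta_vectors)
print(f"Spearman correlation over {n_scored} pairs: {rho:.3f}")
```

A rank correlation is used here because human ratings and cosine similarities live on different scales; only the ordering of the pairs is compared. Under a similarity-based scoring scheme like the one the abstract describes, a model that assigns high cosine similarity to merely associated pairs would be penalised by such a measure.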

Notes

1.
Each word is appended with an id (:id) to understand the mapping between the sentences.
2.
All the participants were informed about this study, and they have provided their consent to be part of this.

Author information

Authors and Affiliations

Center for Computational Engineering & Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
J. P. Sanjanasri, Vijay Krishna Menon, S. Rajendran & K. P. Soman
Department of Information Technology, National Institute of Technology, Surathkal, Karnataka, India
M. Anand Kumar

Corresponding author

Correspondence to J. P. Sanjanasri.

Editor information

Editors and Affiliations

School of Computer Science and Information Technology, Indian Institute of Information Technology and Management—Kerala (IIITM-K), Trivandrum, Kerala, India
Sabu M. Thampi
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
Ljiljana Trajkovic
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Sushmita Mitra
Indian Institute of Information Technology, Allahabad (IIIT-A), Allahabad, Uttar Pradesh, India
P. Nagabhushan
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Jayanta Mukhopadhyay
Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
Juan M. Corchado
Dipartimento di Ingegneria dell’Informazione (DINFO), Università degli Studi di Firenze, Florence, Italy
Stefano Berretti
Indian Institute of Space Science and Technology, Trivandrum, Kerala, India
Deepak Mishra

About this paper

Cite this paper

Sanjanasri, J.P., Menon, V.K., Rajendran, S., Soman, K.P., Anand Kumar, M. (2020). Intrinsic Evaluation for English–Tamil Bilingual Word Embeddings. In: Thampi, S., et al. Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol 910. Springer, Singapore. https://doi.org/10.1007/978-981-13-6095-4_3

DOI: https://doi.org/10.1007/978-981-13-6095-4_3
Published: 24 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6094-7
Online ISBN: 978-981-13-6095-4
eBook Packages: Intelligent Technologies and Robotics (R0)
