Abstract
In today’s digital era, the proliferation of vernacular languages such as Hindi, Marathi, Bengali, Tamil, and Malayalam cannot be overlooked. Social media sites like Facebook and Twitter are rich sources of opinionated content in these languages. Work on analyzing public opinion has concentrated on English, with very few Sentiment Analysis studies of less-resourced or morphologically rich languages like Marathi. Moreover, interpreting Sentiment Analysis results in their local context is a challenge for under-resourced languages. This paper presents sentiment prediction on Twitter for the Marathi language using supervised machine learning techniques. It is the first attempt to experiment on a dataset of Marathi political tweets. A benchmark dataset of 4248 tweets about the four major political parties of Maharashtra (India) is created. Multinomial Naïve Bayes, Support Vector Machines with both linear and RBF kernels, Logistic Regression, and Random Forest classifiers are trained on Term Frequency-Inverse Document Frequency (TF-IDF) features to classify the tweets as positive or negative. The performance of the Sentiment Analysis model is evaluated using the standard measures of accuracy, precision, recall, and F1-score. The experimental results show that Multinomial Naïve Bayes outperforms the other classifiers, with a maximum accuracy of 87.29% in the prediction of the Indian State Assembly Election 2019. The proposed model ranks first among the Naïve Bayes classifiers employed in current state-of-the-art sentiment analysis of Indian text.
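The pipeline summarized in the abstract (TF-IDF features, a Multinomial Naïve Bayes classifier, and the standard evaluation measures) can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing value `alpha=1.0`, and the six toy sentences, which are invented and not drawn from the 4248-tweet dataset. The abstract does not state the authors' tooling or hyperparameters, so this should not be read as their implementation. Fractional TF-IDF weights are used in place of raw counts, which is a common practical choice with the multinomial model.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: plain whitespace tokenization; the paper's preprocessing is not given here.
    return text.split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}

def tfidf(doc, idf):
    # Term frequency times IDF; tokens unseen during fitting are ignored.
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}

class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (illustrative value)

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {tok for vec in vectors for tok in vec}
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        # Sum the TF-IDF weight of every token per class.
        weight = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for tok, w in vec.items():
                weight[label][tok] += w
        self.log_lik = {}
        for c in self.classes:
            total = sum(weight[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                tok: math.log((weight[c][tok] + self.alpha) / total) for tok in vocab
            }
        return self

    def predict(self, vectors):
        preds = []
        for vec in vectors:
            scores = {
                c: self.log_prior[c]
                + sum(w * self.log_lik[c][tok] for tok, w in vec.items())
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds

def evaluate(y_true, y_pred, positive="positive"):
    # Accuracy, precision, recall, and F1 with respect to the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented toy sentences (NOT from the paper's dataset).
train_docs = [
    "सरकार चांगले काम",
    "उत्तम निर्णय चांगले नेतृत्व",
    "वाईट निर्णय खराब काम",
    "सरकार खराब वाईट धोरण",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = ["चांगले नेतृत्व उत्तम", "खराब धोरण वाईट"]
test_labels = ["positive", "negative"]

idf = fit_idf(train_docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in train_docs], train_labels)
predictions = model.predict([tfidf(d, idf) for d in test_docs])
metrics = evaluate(test_labels, predictions)
print(predictions, metrics)
```

On real data, the same three stages would be fitted on a training split and scored on a held-out split; the other classifiers named in the abstract would replace only the `MultinomialNB` stage while consuming the same TF-IDF vectors.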
Cite this article
Patil, R.S., Kolhe, S.R. Supervised classifiers with TF-IDF features for sentiment analysis of Marathi tweets. Soc. Netw. Anal. Min. 12, 51 (2022). https://doi.org/10.1007/s13278-022-00877-w
DOI: https://doi.org/10.1007/s13278-022-00877-w