Supervised classifiers with TF-IDF features for sentiment analysis of Marathi tweets

Patil, Rupali S.; Kolhe, Satish R.

doi:10.1007/s13278-022-00877-w

Original Article
Published: 05 May 2022

Volume 12, article number 51, (2022)
Social Network Analysis and Mining

Abstract

In today's digital era, the growing online use of vernacular languages such as Hindi, Marathi, Bengali, Tamil, and Malayalam cannot be overlooked. Social media sites like Facebook and Twitter are rich sources of opinionated content in these languages. Work on analyzing public opinion has concentrated on English, with very few Sentiment Analysis studies of minor or morphologically rich languages like Marathi. Moreover, interpreting Sentiment Analysis results in their local context is a challenge for under-resourced languages. This paper presents sentiment prediction on Twitter for the Marathi language using supervised machine learning techniques. It is the first attempt to experiment on a dataset of Marathi political tweets. A benchmark dataset of 4248 tweets about the four major political parties of Maharashtra (India) is created. Multinomial Naïve Bayes, Support Vector Machines with both linear and RBF kernels, Logistic Regression, and Random Forest classifiers are trained with Term Frequency-Inverse Document Frequency (TF-IDF) features to classify the tweets as positive or negative. The performance of the Sentiment Analysis model is evaluated using the standard measures, viz. accuracy, precision, recall, and F1-score. The experimental results show that Multinomial Naïve Bayes outperforms all the other classifiers, with a maximum accuracy of 87.29% in the prediction of the Indian State Assembly Election 2019. The proposed model ranks first among the Naïve Bayes classifiers employed in current state-of-the-art sentiment analysis of Indian text.
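
The abstract does not show the authors' implementation, so the following is only a minimal from-scratch sketch of the general idea: TF-IDF weights fed to a Multinomial Naïve Bayes classifier, scored with accuracy, precision, recall, and F1. Everything in it is an assumption made for illustration: the function names, the smoothed IDF variant, Laplace smoothing with `alpha=1.0`, and the tiny English toy corpus, which stands in for tokenized tweets and is not taken from the paper's dataset.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Smoothed IDF per term (assumed variant): log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def to_tfidf(doc, idf):
    """Map a tokenized document to {term: count * idf}; unseen terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    model = {}
    for label, count in Counter(labels).items():
        total = sum(sums[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((sums[label][t] + alpha) / total) for t in vocab}
        model[label] = (math.log(count / len(labels)), log_like)
    return model

def predict(model, vec):
    """Return the label with the highest log prior plus weighted log likelihood."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)

def precision_recall_f1(y_true, y_pred, positive):
    """Binary precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy corpus (already tokenized); not data from the paper.
train_docs = [
    ["good", "leader", "great", "work"],
    ["great", "victory", "good"],
    ["bad", "policy", "poor", "work"],
    ["poor", "leader", "bad"],
]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = [["good", "great", "policy"], ["bad", "poor", "victory"]]
test_labels = ["positive", "negative"]

idf = fit_idf(train_docs)
model = train_mnb([to_tfidf(d, idf) for d in train_docs], train_labels, set(idf))
predictions = [predict(model, to_tfidf(d, idf)) for d in test_docs]
accuracy = sum(t == p for t, p in zip(test_labels, predictions)) / len(test_labels)
print(predictions, accuracy, precision_recall_f1(test_labels, predictions, "positive"))
```

A real experiment would need Marathi-aware tokenization, a held-out split of the labelled tweets, and the other classifiers named in the abstract; a library implementation would normally be preferable to this hand-rolled version.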

Author information

Authors and Affiliations

School of Computer Sciences, Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, Maharashtra, 425001, India
Rupali S. Patil & Satish R. Kolhe

Authors

Rupali S. Patil
Satish R. Kolhe

Corresponding author

Correspondence to Rupali S. Patil.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Patil, R.S., Kolhe, S.R. Supervised classifiers with TF-IDF features for sentiment analysis of Marathi tweets. Soc. Netw. Anal. Min. 12, 51 (2022). https://doi.org/10.1007/s13278-022-00877-w

Received: 24 April 2021
Revised: 31 March 2022
Accepted: 02 April 2022
Published: 05 May 2022
DOI: https://doi.org/10.1007/s13278-022-00877-w
