ABSTRACT
Various aspects of the recent COVID-19 outbreak have been extensively discussed on online social media platforms, particularly on Twitter. Geotagging COVID-19-related discourse data on Twitter is essential for understanding the different discourse facets and their regional relevance, including calls for social distancing, acceptance of measures implemented to contain the spread of the virus, anti-vaccination campaigns, and misinformation. In this paper, we aim to enrich TweetsCOV19, a large COVID-19 discourse knowledge base of more than 20 million tweets, with geographic information. For this purpose, we evaluate two state-of-the-art geotagging algorithms: (1) DeepGeo, which predicts the tweet location, and (2) GeoLocation, which predicts the user location. We compare pre-trained models with models trained on context-specific ground truth geolocation data extracted from TweetsCOV19. Models trained on our context-specific data achieve an improvement of more than 6.7% in Acc@25 compared to the pre-trained models. Further, our results show that DeepGeo outperforms GeoLocation and that longer tweets are, in general, easier to geotag. Finally, we use the two geotagging methods to study the distribution of tweets per country in TweetsCOV19 and compare their geographic coverage, i.e., the number of countries and cities each algorithm can detect.
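The Acc@25 figure reported above is a distance-thresholded accuracy. As a minimal sketch, assuming Acc@25 denotes the share of predictions that fall within 25 km of the true location, it could be computed as shown here. The abstract does not state the unit or the distance measure, so the kilometre threshold and the haversine great-circle distance are assumptions. The function names `haversine_km` and `acc_at_k` are hypothetical and are not taken from DeepGeo or GeoLocation.

```python
import math

# Mean Earth radius in kilometres (assumption: distances are measured in km).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def acc_at_k(predicted, actual, k_km=25.0):
    """Fraction of predictions lying within k_km of the ground-truth location.

    predicted, actual: equally long sequences of (latitude, longitude) pairs.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (p_lat, p_lon), (a_lat, a_lon) in zip(predicted, actual)
        if haversine_km(p_lat, p_lon, a_lat, a_lon) <= k_km
    )
    return hits / len(predicted)


if __name__ == "__main__":
    # Illustrative coordinates only; these are not data from TweetsCOV19.
    truth = [(52.52, 13.405)] * 3
    preds = [(52.52, 13.405), (52.62, 13.405), (48.8566, 2.3522)]
    print(f"Acc@25 = {acc_at_k(preds, truth):.3f}")
```

Under this reading, comparing a pre-trained model with a model trained on context-specific data amounts to evaluating `acc_at_k` on the same held-out ground truth for both sets of predictions.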
Index Terms
- Geotagging TweetsCOV19: Enriching a COVID-19 Twitter Discourse Knowledge Base with Geographic Information