DOI: 10.1145/3487553.3524623
short-paper

Geotagging TweetsCOV19: Enriching a COVID-19 Twitter Discourse Knowledge Base with Geographic Information

Published: 16 August 2022

ABSTRACT

Various aspects of the recent COVID-19 outbreak have been extensively discussed on online social media platforms, in particular on Twitter. Geotagging COVID-19-related discourse data on Twitter is essential for understanding the different facets of this discourse and their regional relevance, including calls for social distancing, acceptance of measures implemented to contain the spread of the virus, anti-vaccination campaigns, and misinformation. In this paper, we aim to enrich TweetsCOV19, a large COVID-19 discourse knowledge base of more than 20 million tweets, with geographic information. For this purpose, we evaluate two state-of-the-art geotagging algorithms: (1) DeepGeo, which predicts the tweet location, and (2) GeoLocation, which predicts the user location. We compare pre-trained models with models trained on context-specific ground-truth geolocation data extracted from TweetsCOV19. Models trained on our context-specific data achieve an improvement of more than 6.7% in Acc@25 over the pre-trained models. Further, our results show that DeepGeo outperforms GeoLocation and that longer tweets are, in general, easier to geotag. Finally, we use the two geotagging methods to study the distribution of tweets per country in TweetsCOV19 and to compare their geographic coverage, i.e., the number of countries and cities each algorithm can detect.
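The abstract reports results in Acc@25 without defining it. In the geolocation literature, Acc@k usually denotes the share of predictions that fall within a distance k of the true location. The sketch below illustrates that reading only. It assumes the threshold is in kilometres and that distance is great-circle (haversine) distance, neither of which is stated in the abstract, and the names `haversine_km` and `acc_at` are hypothetical, not taken from DeepGeo or GeoLocation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def acc_at(predicted, actual, threshold_km=25.0):
    """Fraction of predictions within threshold_km of the true coordinates.

    predicted and actual are equal-length sequences of (lat, lon) pairs.
    The kilometre unit is an assumption; the abstract does not state it.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (alat, alon) in zip(predicted, actual)
        if haversine_km(plat, plon, alat, alon) <= threshold_km
    )
    return hits / len(predicted)


# Illustrative coordinates only, not data from TweetsCOV19.
actual = [(52.52, 13.405), (52.52, 13.405), (52.52, 13.405)]
predicted = [
    (52.52, 13.405),    # exact match
    (52.62, 13.405),    # 0.1 degrees of latitude away, roughly 11 km
    (48.8566, 2.3522),  # Paris, several hundred kilometres away
]
score = acc_at(predicted, actual)
```

Under these assumptions, two of the three illustrative predictions count as hits, so `score` is two thirds.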


    • Published in

      WWW '22: Companion Proceedings of the Web Conference 2022
      April 2022
      1338 pages
      ISBN: 9781450391306
      DOI: 10.1145/3487553

      Copyright © 2022 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
