short-paper

Geotagging TweetsCOV19: Enriching a COVID-19 Twitter Discourse Knowledge Base with Geographic Information

Authors:
Dimitar Dimitrov, GESIS - Leibniz Institute for the Social Sciences, Germany
Dennis Segeth, Heinrich Heine University Düsseldorf, Germany
Stefan Dietze, GESIS - Leibniz Institute for the Social Sciences, Germany and Heinrich Heine University Düsseldorf, Germany

WWW '22: Companion Proceedings of the Web Conference 2022, April 2022, pages 438–442. https://doi.org/10.1145/3487553.3524623

Published: 16 August 2022

ABSTRACT

Various aspects of the recent COVID-19 outbreak have been extensively discussed on online social media platforms, in particular on Twitter. Geotagging COVID-19-related discourse data on Twitter is essential for understanding the different facets of the discourse and their regional relevance, including calls for social distancing, acceptance of measures implemented to contain the spread of the virus, anti-vaccination campaigns, and misinformation. In this paper, we aim to enrich TweetsCOV19, a large COVID-19 discourse knowledge base of more than 20 million tweets, with geographic information. For this purpose, we evaluate two state-of-the-art geotagging algorithms: (1) DeepGeo, which predicts the tweet location, and (2) GeoLocation, which predicts the user location. We compare pre-trained models with models trained on context-specific ground-truth geolocation data extracted from TweetsCOV19. Models trained on our context-specific data achieve an improvement of more than 6.7% in Acc@25 over the pre-trained models. Further, our results show that DeepGeo outperforms GeoLocation and that longer tweets are, in general, easier to geotag. Finally, we use the two geotagging methods to study the distribution of tweets per country in TweetsCOV19 and compare their geographic coverage, i.e., the number of countries and cities each algorithm can detect.
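The abstract reports results in Acc@25 without defining the metric. A common convention in text-based geolocation work is that Acc@k is the fraction of predictions whose great-circle distance to the ground-truth location is at most k kilometres (Acc@161, roughly 100 miles, is another frequently reported cut-off). The Python sketch below computes Acc@k under that assumption using the haversine formula. The helper names `haversine_km` and `acc_at_k`, the 6371 km Earth radius, and the sample coordinates are illustrative choices, not taken from the paper, whose exact unit and distance computation may differ.

```python
import math
from typing import Sequence, Tuple

# (latitude, longitude) in decimal degrees
LatLon = Tuple[float, float]

# Mean Earth radius in km (assumption; other radii shift distances slightly).
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points in kilometres (haversine formula)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def acc_at_k(predicted: Sequence[LatLon], gold: Sequence[LatLon], k_km: float = 25.0) -> float:
    """Hypothetical helper: share of predictions within k_km of the ground truth."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    hits = sum(haversine_km(p, g) <= k_km for p, g in zip(predicted, gold))
    return hits / len(gold)


# Illustrative data only: ground truth Düsseldorf and Cologne;
# predictions near Düsseldorf (hit) and Berlin (miss).
gold = [(51.2277, 6.7735), (50.9375, 6.9603)]
pred = [(51.2200, 6.7800), (52.5200, 13.4050)]
print(acc_at_k(pred, gold))  # 0.5
```

Because Acc@k only counts hits inside a radius, it ignores how far off the misses are, which is why geolocation studies often report it alongside mean or median error distance.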

References

Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, and Marc Najork. 2021. Dynamic Language Models for Continuously Evolving Content. (2021), 2514–2524.
CDC. 2020. How COVID-19 Spreads. https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/how-covid-spreads.html
Swarup Chandra, Latifur Khan, and Fahad Bin Muhaya. 2011. Estimating Twitter User Location Using Social Interactions–A Content Based Approach. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing. 838–843. https://doi.org/10.1109/PASSAT/SocialCom.2011.120
Zhiyuan Cheng, James Caverlee, and Kyumin Lee. 2010. You are where you tweet: a content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM international conference on Information and knowledge management. Association for Computing Machinery, Toronto, Canada, 759–768. https://dl.acm.org/doi/10.1145/1871437.1871535
Ryan Compton, David Jurgens, and David Allen. 2014. Geotagging one hundred million Twitter accounts with total variation minimization. In 2014 IEEE International Conference on Big Data (Big Data). 393–401. https://doi.org/10.1109/BigData.2014.7004256
Clodoveu A Davis Jr, Gisele L Pappa, Diogo Rennó Rocha De Oliveira, and Filipe de L. Arcanjo. 2011. Inferring the location of twitter messages based on user relationships. Transactions in GIS 15, 6 (2011), 735–751.
Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and Stefan Dietze. 2020. TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, Ireland, 2991–2998. https://dl.acm.org/doi/10.1145/3340531.3412765
Pavlos Fafalios, Vasileios Iosifidis, Eirini Ntoutsi, and Stefan Dietze. 2018. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. In The Semantic Web - 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings (Lecture Notes in Computer Science, Vol. 10843). Springer, 177–190. https://doi.org/10.1007/978-3-319-93417-4_12
Brent Hecht, Lichan Hong, Bongwon Suh, and Ed H Chi. 2011. Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In Proceedings of the SIGCHI conference on human factors in computing systems. 237–246.
Mans Hulden, Miikka Silfverberg, and Jerid Francom. 2015. Kernel Density Estimation for Text-Based Geolocation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (Austin, Texas). AAAI Press, 145–150.
David Jurgens. 2013. That’s what friends are for: Inferring location in online social media platforms based on social relationships. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 7.
Rabindra Lamsal. 2020. Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence (2020).
Jey Han Lau, Lianhua Chi, Khoi-Nguyen Tran, and Trevor Cohn. 2017. End-to-end Network for Twitter Geolocation Prediction and Hashing. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, Taipei, Taiwan, 744–753. https://www.aclweb.org/anthology/I17-1075/
Umair Qazi, Muhammad Imran, Ferda Ofli, and Filipe Arcanjo. 2020. GeoCoV19: a dataset of hundreds of millions of multilingual COVID-19 tweets with location information. SIGSPATIAL Special 12, 1 (2020).
Afshin Rahimi, Duy Vu, Trevor Cohn, and Timothy Baldwin. 2015. Exploiting Text and Network Context for Geolocation of Social Media Users. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Denver, Colorado, 1362–1367. https://www.aclweb.org/anthology/N15-1153.pdf
Luke Sloan, Jeffrey Morgan, William Housley, Matthew Williams, Adam Edwards, Pete Burnap, and Omer Rana. 2013. Knowing the tweeters: Deriving sociologically relevant demographics from Twitter. Sociological research online 18, 3 (2013), 74–84.
WHO. 2020. Coronavirus disease (COVID-19) pandemic. https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/novel-coronavirus-2019-ncov
Benjamin Wing and Jason Baldridge. 2014. Hierarchical discriminative classification for text-based geolocation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 336–348.

Index Terms

1. Information systems
  1. World Wide Web
    1. Web mining
      1. Data extraction and integration

Published in
WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN: 9781450391306
DOI: 10.1145/3487553
Editors:
Frédérique Laforest, INSA Lyon, France
Raphaël Troncy, EURECOM, France
Lionel Médini, Université Lyon 1, France
Ivan Herman, W3C / retired
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
COVID-19
Twitter
discourse
evaluation
geotagging
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total Citations: 3
- Total Downloads: 64
- Downloads (Last 12 months): 24
- Downloads (Last 6 weeks): 3