Beyond Bot Detection: Combating Fraudulent Online Survey Takers✱

Authors:
Ziyi Zhang (University of Wisconsin-Madison, USA)
Shuofei Zhu (The Pennsylvania State University, USA)
Jaron Mink (University of Illinois at Urbana-Champaign, USA)
Aiping Xiong (The Pennsylvania State University, USA)
Linhai Song (The Pennsylvania State University, USA)
Gang Wang (University of Illinois at Urbana-Champaign, USA)

WWW '22: Proceedings of the ACM Web Conference 2022, April 2022, Pages 699–709
https://doi.org/10.1145/3485447.3512230

Published: 25 April 2022

ABSTRACT

Different techniques have been recommended to detect fraudulent responses in online surveys, but little research has systematically tested the extent to which they actually work in practice. In this paper, we conduct an empirical evaluation of 22 anti-fraud tests in two complementary online surveys. The first survey recruits Rust programmers on public online forums and social media networks. We find that fraudulent respondents exhibit both bot and human characteristics. Among the anti-fraud tests, those designed based on domain knowledge are the most effective. By combining individual tests, we can achieve detection performance as good as that of commercial techniques while making the results more explainable. To explore these tests in a broader context, we ran a different survey on Amazon Mechanical Turk (MTurk). The results show that for a generic survey that does not require users to have any domain knowledge, it is more difficult to distinguish fraudulent responses. However, a subset of tests still remains effective.
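
To make the idea of combining individual tests concrete, the following is a minimal sketch and not the authors' method. The response fields, the three example tests, the thresholds, and the `min_flags` voting rule are all hypothetical choices made for illustration. It shows how several pass/fail checks could be merged into one decision while keeping the list of triggered tests as an explanation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical response record; the field names are illustrative only.
Response = Dict[str, Any]


@dataclass(frozen=True)
class AntiFraudTest:
    name: str
    flags: Callable[[Response], bool]  # returns True when the response looks suspicious


def failed_tests(response: Response, tests: List[AntiFraudTest]) -> List[str]:
    """Return the names of the tests that flag this response."""
    return [t.name for t in tests if t.flags(response)]


def is_fraudulent(response: Response, tests: List[AntiFraudTest], min_flags: int = 2) -> bool:
    """Assumed voting rule: flag a response when at least `min_flags` tests trigger."""
    return len(failed_tests(response, tests)) >= min_flags


# Three made-up tests; the 60-second cutoff is an arbitrary placeholder.
TESTS = [
    AntiFraudTest("too_fast", lambda r: r["seconds_taken"] < 60),
    AntiFraudTest("failed_domain_question", lambda r: not r["domain_answer_correct"]),
    AntiFraudTest("duplicate_ip", lambda r: r["ip_seen_before"]),
]

suspect = {"seconds_taken": 45, "domain_answer_correct": False, "ip_seen_before": False}
honest = {"seconds_taken": 600, "domain_answer_correct": True, "ip_seen_before": True}

print(failed_tests(suspect, TESTS), is_fraudulent(suspect, TESTS))
print(failed_tests(honest, TESTS), is_fraudulent(honest, TESTS))
```

Because the decision is a count over named tests, each flagged response carries the names of the checks it failed, which is one simple way such a combination could stay explainable.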



Published in
WWW '22: Proceedings of the ACM Web Conference 2022
April 2022
3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447
Editors:
Frédérique Laforest (INSA Lyon, France)
Raphaël Troncy (EURECOM, France)
Elena Simperl (King’s College London, UK)
Deepak Agarwal (Pinterest, USA)
Aristides Gionis (KTH Royal Institute of Technology, Sweden)
Ivan Herman (W3C / retired)
Lionel Médini (Université Lyon 1, France)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: Fraud Detection, Online Survey
Qualifiers: research-article, Research, Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%