DOI: 10.1145/3485447.3512230
research-article
Public Access

Beyond Bot Detection: Combating Fraudulent Online Survey Takers

Published: 25 April 2022

ABSTRACT

Different techniques have been recommended to detect fraudulent responses in online surveys, but little research has systematically tested the extent to which they actually work in practice. In this paper, we conduct an empirical evaluation of 22 anti-fraud tests in two complementary online surveys. The first survey recruits Rust programmers on public online forums and social media networks. We find that fraudulent respondents exhibit both bot and human characteristics. Among the different anti-fraud tests, those designed based on domain knowledge are the most effective. By combining individual tests, we can achieve detection performance as good as that of commercial techniques while making the results more explainable. To explore these tests in a broader context, we run a second survey on Amazon Mechanical Turk (MTurk). The results show that for a generic survey that does not require respondents to have any domain knowledge, it is more difficult to distinguish fraudulent responses. However, a subset of the tests remains effective.
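The abstract does not say how the individual tests are combined. Purely as an illustration, one simple way to combine binary anti-fraud tests while keeping the outcome explainable is a vote: flag a response when at least k tests fire, and report which tests fired. This is a sketch under stated assumptions, not the authors' method; the `Response` fields, the 60-second cutoff, the Rust domain-knowledge question and its expected answer, the three-word free-text rule, and the threshold k = 2 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    """Hypothetical survey response; these fields are illustrative assumptions."""
    completion_seconds: float  # time taken to finish the survey
    domain_answer: str         # answer to a hypothetical Rust knowledge question
    free_text: str             # answer to an open-ended question


# Each test returns True when the response looks fraudulent.
Test = Callable[[Response], bool]

TESTS: Dict[str, Test] = {
    # Assumed cutoff: finishing in under 60 seconds is treated as suspicious.
    "too_fast": lambda r: r.completion_seconds < 60,
    # Hypothetical domain-knowledge question: "Which part of the Rust compiler
    # enforces ownership and borrowing rules?" Expected answer: "borrow checker".
    "failed_domain_question": lambda r: r.domain_answer.strip().lower() != "borrow checker",
    # Assumed low-effort signal: fewer than three words of free text.
    "low_effort_text": lambda r: len(r.free_text.split()) < 3,
}


def classify(r: Response, threshold: int = 2) -> Tuple[bool, List[str]]:
    """Flag a response when at least `threshold` tests fire; also return which fired."""
    fired = [name for name, test in TESTS.items() if test(r)]
    return len(fired) >= threshold, fired


if __name__ == "__main__":
    suspect = Response(completion_seconds=45, domain_answer="garbage collector", free_text="good")
    print(classify(suspect))  # (True, ['too_fast', 'failed_domain_question', 'low_effort_text'])
```

Returning the names of the tests that fired, rather than only a yes/no verdict or an opaque score, is what makes such a combination explainable: a reviewer can see why a given response was flagged.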


              Published in
              WWW '22: Proceedings of the ACM Web Conference 2022
              April 2022
              3764 pages
              ISBN:9781450390965
              DOI:10.1145/3485447

              Copyright © 2022 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Qualifiers

              • research-article
              • Research
              • Refereed limited

              Acceptance Rates

              Overall acceptance rate: 1,899 of 8,196 submissions, 23%
