DOI: 10.1145/3203217.3203265
short-paper

Online anomaly detection framework for spark systems via stage-task behavior modeling

Published: 08 May 2018

ABSTRACT

With the rapid growth of Big Data, Apache Spark has come into widespread use. As system scale grows, however, application delays caused by abnormal tasks or nodes have become a common problem in Spark systems. In this paper, we propose an anomaly detection approach based on stage-task behavior modeling. First, we assume that the abnormal behavior of tasks reflects an abnormal condition on the node that runs them. Then, from the collected Spark runtime logs, we extract a four-dimensional feature vector related to task execution status and classify each task's behavior as normal or abnormal; the distribution of abnormal tasks across nodes is then used to identify abnormal nodes. We build the online framework on Spark Streaming, and it can integrate offline learning methods such as logistic regression, a simple and effective classifier for low-dimensional feature vectors. Our experiments show that the accuracy of real-time anomaly detection reaches about 91%, and the presented cases show that the framework is effective at detecting abnormal nodes.
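The pipeline described in the abstract (classify each task from a four-dimensional feature vector, then look at how abnormal tasks are distributed over nodes) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four features, so the feature values here are synthetic, the node names are hypothetical, and "highest share of abnormal tasks" is only an assumed stand-in for the paper's node-level decision rule. It runs in plain Python rather than on Spark Streaming.

```python
import math
import random
from collections import Counter

FEATURES = 4  # the abstract mentions a four-dimensional feature vector


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Probability that a task is abnormal; w[0] is the bias term.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.1, epochs=300):
    # Batch gradient descent on the logistic (log-loss) objective.
    w = [0.0] * (FEATURES + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def make_task(rng, abnormal):
    # ASSUMPTION: synthetic, normalized features; abnormal tasks are shifted up.
    center = 2.0 if abnormal else 0.0
    return [rng.gauss(center, 0.3) for _ in range(FEATURES)]


rng = random.Random(0)

# Offline phase: fit the classifier on labeled historical tasks.
train_y = [0] * 200 + [1] * 40
train_X = [make_task(rng, label == 1) for label in train_y]
w = train_logistic(train_X, train_y)

# Online phase (simulated as one batch): hypothetical node-3 is degraded.
batch = []
for node in ("node-1", "node-2", "node-3", "node-4"):
    n_bad = 30 if node == "node-3" else 2
    for i in range(50):
        truth = 1 if i < n_bad else 0
        batch.append((node, make_task(rng, truth == 1), truth))

pred = [1 if score(w, x) >= 0.5 else 0 for _, x, _ in batch]
accuracy = sum(p == t for p, (_, _, t) in zip(pred, batch)) / len(batch)

# Node-level view: share of each node's tasks that were flagged abnormal.
total = Counter(node for node, _, _ in batch)
flagged = Counter(node for (node, _, _), p in zip(batch, pred) if p == 1)
shares = {node: flagged[node] / total[node] for node in total}
suspect = max(shares, key=shares.get)

print(accuracy, shares, suspect)
```

In a streaming deployment the batch loop would presumably be replaced by micro-batches of parsed log records, with the offline-trained weights applied to each one; a statistical test on the per-node shares could replace the simple maximum used here.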

Published in

    CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers
    May 2018
    401 pages
    ISBN: 9781450357616
    DOI: 10.1145/3203217

    Copyright © 2018 ACM

    Publisher: Association for Computing Machinery, New York, NY, United States

    Overall acceptance rate: 240 of 680 submissions, 35%
