ABSTRACT
With the rapid growth of Big Data, Apache Spark has come into widespread use. However, as system scale grows, application delays caused by abnormal tasks or nodes have become a common problem in Spark systems. In this paper, we propose an anomaly detection approach based on modeling stage-task behaviors. First, we assume that the abnormal behavior of tasks reflects an abnormal condition of the node on which they run. Then, from the collected Spark runtime logs, we extract a four-dimensional feature vector related to task execution status and classify each task's behavior as normal or abnormal; the distribution of abnormal tasks is then used to discover abnormal nodes. We also build an online framework on Spark Streaming that can integrate offline learning methods such as logistic regression, a simple yet powerful classifier for low-dimensional feature vectors. Our experiments show that the accuracy of real-time anomaly detection reaches about 91%, and the presented cases show that our framework is effective at detecting abnormal nodes.
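The two-step idea described in the abstract (classify tasks, then locate nodes from where the abnormal tasks fall) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the four features are unnamed placeholders scaled to [0, 1], the training data is synthetic, the classifier is a plain gradient-descent logistic regression written from scratch, and the node rule is a simple threshold on the share of abnormal tasks, which is not necessarily the rule used in the paper.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return 1 (abnormal) or 0 (normal) for one feature vector."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0


def abnormal_nodes(w, tasks, threshold=0.5):
    """Flag nodes whose share of abnormal tasks exceeds the threshold.

    tasks is an iterable of (node_name, feature_vector) pairs.
    The threshold rule is an illustrative assumption.
    """
    total, bad = defaultdict(int), defaultdict(int)
    for node, x in tasks:
        total[node] += 1
        bad[node] += predict(w, x)
    return sorted(node for node in total if bad[node] / total[node] > threshold)


# Synthetic training set: four placeholder features per task, label 1 = abnormal.
X_train = [
    [0.10, 0.20, 0.10, 0.10], [0.20, 0.10, 0.20, 0.10],
    [0.15, 0.10, 0.10, 0.20], [0.10, 0.10, 0.20, 0.20],
    [0.90, 0.80, 0.70, 0.90], [0.80, 0.90, 0.90, 0.70],
    [0.70, 0.80, 0.90, 0.80], [0.90, 0.90, 0.80, 0.80],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
w = train_logistic(X_train, y_train)

# Hypothetical tasks observed on three nodes.
tasks = [
    ("node-1", [0.10, 0.20, 0.10, 0.10]),
    ("node-1", [0.20, 0.10, 0.10, 0.20]),
    ("node-2", [0.90, 0.80, 0.90, 0.80]),
    ("node-2", [0.80, 0.90, 0.70, 0.90]),
    ("node-2", [0.10, 0.10, 0.20, 0.10]),
    ("node-3", [0.15, 0.20, 0.10, 0.10]),
    ("node-3", [0.10, 0.10, 0.10, 0.20]),
    ("node-3", [0.85, 0.80, 0.90, 0.90]),
]
flagged = abnormal_nodes(w, tasks)
print(flagged)
```

In an online setting, the trained weights would be applied to feature vectors as they arrive from the log stream, and the per-node counts would be kept over a sliding window rather than a fixed list.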