short-paper

Online anomaly detection framework for spark systems via stage-task behavior modeling

Authors:
Rui Ren
Institute of Computing Technology, CAS and University of Chinese Academy of Sciences

Shuai Tian
Kunming University of Science and Technology

Lei Wang
Institute of Computing Technology, CAS

CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers, May 2018, Pages 256–259, https://doi.org/10.1145/3203217.3203265

Published: 08 May 2018

ABSTRACT

With the rapid growth of Big Data, Apache Spark has come into widespread use. However, as system scale grows, application delays caused by abnormal tasks or nodes have become a common problem in Spark systems. In this paper, we propose an anomaly detection approach based on stage-task behavior modeling. First, we assume that the abnormal behavior of tasks reflects an abnormal condition on the node that runs them. Then, from the collected Spark runtime logs, we extract a four-dimensional feature vector related to task execution status and classify task behaviors as normal or abnormal; the distribution of abnormal tasks is then used to discover abnormal nodes. We build the online framework on Spark Streaming, and it can integrate offline learning methods such as logistic regression, a simple and powerful classifier for low-dimensional feature vectors. Our experiments show that the accuracy of real-time anomaly detection reaches about 91%, and the presented cases show that the framework is effective for detecting abnormal nodes.
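
The pipeline the abstract outlines (a four-dimensional feature vector per task, a binary task classifier, then node-level detection from the distribution of abnormal tasks) can be illustrated with a minimal sketch. It uses a logistic regression classifier, the abstract's example of an offline learning method, written in plain Python. Everything concrete here is an assumption for illustration: the abstract does not name the four features, so the vectors are just four hypothetical values normalized to [0, 1]; the training data is synthetic; and the rule "flag a node when at least half of its tasks are abnormal" (`min_fraction`) is a stand-in for whatever criterion the paper applies. The Spark Streaming integration is not shown.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias stored last) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return 1 (abnormal) or 0 (normal) for one task feature vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5 else 0


def abnormal_nodes(tasks, w, min_fraction=0.5):
    """Flag nodes whose share of abnormal tasks reaches min_fraction (assumed rule)."""
    counts = defaultdict(lambda: [0, 0])  # node -> [abnormal tasks, total tasks]
    for node, features in tasks:
        counts[node][0] += predict(w, features)
        counts[node][1] += 1
    return sorted(n for n, (bad, total) in counts.items() if bad / total >= min_fraction)


# Synthetic, hypothetical training set: four normalized features per task.
X_train = [
    [0.10, 0.10, 0.20, 0.10], [0.20, 0.10, 0.10, 0.20],
    [0.15, 0.20, 0.10, 0.10], [0.10, 0.15, 0.20, 0.20],
    [0.90, 0.80, 0.70, 0.90], [0.80, 0.90, 0.90, 0.70],
    [0.85, 0.70, 0.80, 0.80], [0.90, 0.90, 0.80, 0.90],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = normal task, 1 = abnormal task

w = train_logistic(X_train, y_train)

# Hypothetical (node, feature vector) pairs, as if parsed from runtime logs.
tasks = [
    ("node1", [0.10, 0.20, 0.10, 0.10]),
    ("node1", [0.20, 0.10, 0.20, 0.10]),
    ("node2", [0.90, 0.80, 0.90, 0.80]),
    ("node2", [0.80, 0.90, 0.70, 0.90]),
    ("node2", [0.10, 0.10, 0.20, 0.20]),
    ("node3", [0.15, 0.10, 0.10, 0.20]),
    ("node3", [0.10, 0.20, 0.10, 0.10]),
    ("node3", [0.85, 0.90, 0.80, 0.90]),
]

flagged = abnormal_nodes(tasks, w)
print(flagged)
```

Separating the per-task classifier from the per-node aggregation mirrors the abstract's assumption that abnormal task behavior reflects the state of the node; the aggregation rule can be swapped out without retraining the classifier.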

Published in
CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers
May 2018
401 pages
ISBN:9781450357616
DOI:10.1145/3203217
General Chairs:
David Kaeli
Northeastern University
Miquel Pericàs
Chalmers University of Technology, SE
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
feature extraction
offline logical regression
realtime anomaly detection
spark system
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 240 of 680 submissions, 35%