EbAT: Online Methods for Detecting Utility Cloud Anomalies
Chengwei Wang, CERCS, Georgia Institute of Technology
Thesis Advisor: Prof. Karsten Schwan
Outline
• Problem Description
  - Challenges of anomaly detection in utility cloud
  - State of the art
  - Positioning our solution
• Solution Description
• Experiment Evaluation and Discussion
  - Experiment I: RUBiS Benchmark
  - Discussion: Hadoop Experiment
Challenge I: Large Scale in Hardware
• 8,192,000 cores / 81,920,000 VMs on the infrastructure
• Billions of metrics
• Anomalies are buried like needles in a haystack!
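A quick back-of-the-envelope check of this scale can be sketched as follows. The core and VM counts are the ones quoted on the slide; the per-VM metric count is a hypothetical value chosen only for illustration, since the slides do not state one.

```python
# Figures quoted on the slide.
CORES = 8_192_000
VMS = 81_920_000


def total_metrics(vms: int, metrics_per_vm: int) -> int:
    """Metric streams to monitor if every VM exposes the same number of metrics."""
    return vms * metrics_per_vm


# Ratio implied by the two slide figures.
vms_per_core = VMS // CORES

# ASSUMPTION: 50 metrics per VM is a made-up value, not taken from the slides.
HYPOTHETICAL_METRICS_PER_VM = 50

print("VMs per core:", vms_per_core)
print("Metric streams:", total_metrics(VMS, HYPOTHETICAL_METRICS_PER_VM))
```

Even a modest per-VM metric count puts the total in the billions, which is consistent with the "Billions of metrics" claim.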
Challenge II: Large Scale in Software
[Figure: software stack of a utility cloud, annotated with "Horizontal Crossing" and "Vertical Crossing". Levels from top to bottom: APP level (Map-Reduce, Social-Networking, C2C/B2B/B2C, Video-Sharing), OS level (guest OSes), VM level (hypervisors), platform level (platforms).]
'Cloud' Characteristics:
1. Millions of Services
2. Mixture of Heterogeneous Apps
3. Multi Levels of Software Stacks
4. VM Creations/Migrations/Termination
Anomaly Detection Challenges:
1. Scalability
2. Lack of Pre-Knowledge
3. Vertical/Horizontal Correlating
4. Dynamism of Workload Patterns
Problem Definition
• Aggregation Problem
  1. Scalability: reduce the data volume in communication and analysis.
  2. Retain valuable information for anomaly detection and identification.
  3. 'Horizontal crossing' and 'vertical crossing': metrics in different levels/components are considered collectively.
• Detection Problem
  1. Designate, online, when the utility cloud is experiencing anomalies.
  2. High detection rate and low false alarm rate.
  3. Unsupervised method with minimal pre-knowledge about normal or abnormal behaviors.
• Zoom-In Problem
  Localize anomalies so as to narrow the search scope for further diagnosis of their causes.
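The two detection goals can be made concrete with a minimal sketch. The slides do not define the metrics at this point, so the definitions below are an assumption: one common convention, with detection rate as TP / (TP + FN) and false alarm rate as FP / (FP + TN) over per-window labels. Some works instead report false alarms relative to the number of alarms raised. The function name and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple


def detection_metrics(truth: Sequence[bool], alarms: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_alarm_rate) for per-window labels.

    ASSUMPTION: detection rate = TP / (TP + FN) and
    false alarm rate = FP / (FP + TN); the slides do not fix a definition.
    """
    if len(truth) != len(alarms):
        raise ValueError("truth and alarms must have the same length")
    pairs = [(bool(t), bool(a)) for t, a in zip(truth, alarms)]
    tp = sum(1 for t, a in pairs if t and a)          # anomaly, alarm raised
    fn = sum(1 for t, a in pairs if t and not a)      # anomaly, missed
    fp = sum(1 for t, a in pairs if not t and a)      # normal, alarm raised
    tn = sum(1 for t, a in pairs if not t and not a)  # normal, no alarm
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical labels for eight monitoring windows (True = anomalous / alarm).
truth = [False, False, True, True, False, False, False, True]
alarms = [False, True, True, False, False, False, False, True]
dr, far = detection_metrics(truth, alarms)
print(f"detection rate = {dr:.3f}, false alarm rate = {far:.3f}")
```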
State of the Art Issues
• Industrial World: Threshold-Based Approaches
  1. Incremental False Alarm Rate: the false alarm rate grows as the number of monitored metrics increases.
  2. Detection after the fact.
  3. Scalability issues: online monitoring of a huge volume of metrics.
• Academia: Statistical Approaches
  1. Sophisticated and promising approaches for problem diagnosis.
  2. None claims to be scalable.
  3. Most of them use heavy-weight machine learning algorithms.
  4. Most of them require pre-knowledge.
Details in backup slides.
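The "incremental false alarm rate" point can be illustrated with a simple model. This is a sketch under a stated assumption, not an analysis from the slides: each metric has its own threshold, thresholds fire falsely with the same probability p per sample, and they do so independently. Under that assumption the chance of at least one false alarm across n metrics is 1 - (1 - p)^n. The function name and the value of p are hypothetical.

```python
def prob_any_false_alarm(n_metrics: int, per_metric_rate: float) -> float:
    """Probability that at least one of n independent thresholds fires falsely.

    ASSUMPTION: thresholds are independent and share the same false alarm
    probability per sample; real metrics are usually correlated.
    """
    if n_metrics < 0:
        raise ValueError("n_metrics must be non-negative")
    if not 0.0 <= per_metric_rate <= 1.0:
        raise ValueError("per_metric_rate must lie in [0, 1]")
    return 1.0 - (1.0 - per_metric_rate) ** n_metrics


# Hypothetical per-metric false alarm probability of 1% per sample.
P = 0.01
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_false_alarm(n, P), 4))
```

The probability rises monotonically with the number of monitored metrics, which is why per-metric thresholds become noisy at cloud scale even when each individual threshold is well tuned.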