ASTUTE: Detecting a Different Class of Traffic Anomalies. Fernando Silveira 1,2 , Christophe Diot 1 , Nina Taft 3 , Ramesh Govindan 4 1 Technicolor 2 UPMC Paris Universitas 3 Intel Labs Berkeley 4 University of Southern California ACM SIGCOMM 2010.
ASTUTE: Detecting a Different Class of Traffic Anomalies • ASTUTE stands for A Short-Timescale Uncorrelated-Traffic Equilibrium • Compared with the Kalman filter and wavelet analysis, ASTUTE finds anomalies with different features • Kalman & Wavelet can detect: • few large flows • ASTUTE can detect: • many small flows
Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments Speaker: Li-Ming Chen
Anomaly Detection • Traffic anomalies (in large ISPs & enterprise networks) come from: • Malicious activities (e.g., DoS, port scan) • Misconfigurations/failures of network components (e.g., link failure, routing problem) • Legitimate events (e.g., large file transfers, flash crowds) • Anomaly detection: • Build a statistical model of normal traffic • An anomaly is defined as a deviation from the normal model
Motivation: Challenges in Anomaly Detection • Anomaly Detection: • Pros: • Can detect new anomalies! • Cons: • Training takes time • Training data is never guaranteed to be clean • Periodic (re)training is required • False alarms • Can we detect anomalies without having to learn what is normal?
Observation • Network traffic exhibits equilibrium: • When many flows are multiplexed on a non-saturated link, their volume changes over short timescales tend to cancel each other out • This makes the average change across flows close to zero • The equilibrium property: • Holds if the flows are independent • Is violated by traffic changes caused by several, potentially small, correlated flows • These correlated changes roughly correspond to traffic anomalies
Goal • Propose a new approach to anomaly detection based on ASTUTE • A mathematical model that describes “A Short-Timescale Uncorrelated-Traffic Equilibrium” • Advantages: • No training: computationally simple and immune to data poisoning • Accurately detects a well-defined class of traffic anomalies • Theoretical guarantees on the false positive rate • Evaluate the performance against the Kalman filter and wavelet analysis
Equilibrium Model • Measure flow volume on a link for each time bin • Flow: a set of packets that share the same values for a given set of traffic features (e.g., 5-tuple) • Binning: use time bins to study the evolution of a flow • Flow volume: number of packets in the flow during the corresponding bin • Flow f starts at time bin $s_f$ and continues for $d_f$ bins • The volume of flow f in each time bin can be represented as a vector with entries such as $x_{f,i}$ and $x_{f,i+1}$ [Figure: timeline divided into consecutive bins i and i+1]
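The binning step on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `bin_flow_volumes`, the (timestamp, flow key) input layout, the 60-second bin size, and the sample packets are all assumptions, not something taken from the slides or the paper.

```python
from collections import Counter, defaultdict


def bin_flow_volumes(packets, bin_size):
    """Count packets per flow and per time bin.

    packets: iterable of (timestamp_seconds, flow_key) pairs, where flow_key
    is any hashable flow identifier (here a hypothetical 5-tuple).
    Returns {flow_key: Counter({bin_index: packet_count})}.
    """
    volumes = defaultdict(Counter)
    for timestamp, flow_key in packets:
        bin_index = int(timestamp // bin_size)  # time bin the packet falls in
        volumes[flow_key][bin_index] += 1       # flow volume = packets in that bin
    return dict(volumes)


# Hypothetical packets: (timestamp in seconds, 5-tuple)
flow_a = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
flow_b = ("10.0.0.3", "10.0.0.4", 5555, 53, "udp")
packets = [(0.5, flow_a), (1.2, flow_a), (61.0, flow_a), (2.0, flow_b)]

volumes = bin_flow_volumes(packets, bin_size=60)
print(volumes[flow_a])  # flow_a: 2 packets in bin 0, 1 packet in bin 1
```

A `Counter` returns 0 for bins in which a flow sent nothing, which matches the idea that an inactive flow has volume zero in that bin.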
Equilibrium Model: Focus on Volume Changes of Flows • The volume of flow f in each time bin can be represented as a vector with entries such as $x_{f,i}$ and $x_{f,i+1}$ • F: set of flows that are active in bin i or i+1 • Volume change of f from i to i+1: $\delta_{f,i} = x_{f,i+1} - x_{f,i}$ [Figure: timeline divided into consecutive bins i and i+1]
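As a small illustration of the volume change between two consecutive bins, the sketch below computes the difference for every flow active in either bin. The function name `volume_changes`, the flow ids, and the packet counts are hypothetical; a flow absent from a bin is assumed to have volume 0 there.

```python
def volume_changes(volumes_i, volumes_next):
    """Change in packet count from bin i to bin i+1 for each active flow.

    Each argument maps flow id -> packet count in that bin. A flow missing
    from a bin is treated as having volume 0 in that bin.
    """
    active_flows = set(volumes_i) | set(volumes_next)  # flows active in i or i+1
    return {
        flow: volumes_next.get(flow, 0) - volumes_i.get(flow, 0)
        for flow in active_flows
    }


# Hypothetical per-bin volumes: flow "a" ends, flow "c" starts, flow "b" shrinks
bin_i = {"a": 3, "b": 5}
bin_next = {"b": 4, "c": 2}

deltas = volume_changes(bin_i, bin_next)
print(sorted(deltas.items()))  # [('a', -3), ('b', -1), ('c', 2)]
```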
Consequences of the ASTUTE Model • Assumptions: • (A1) Flow independence • (A2) Stationarity • Theorem 1 (consequences of ASTUTE) • Intuition: independent flows cancel each other out
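ASTUTE-based Anomaly Detection Method • Given: • A detection threshold K(p) • A pair of consecutive time bins i and i+1 • Measure: • Set of active flows, F (F also denotes the number of these flows) • Mean volume change, $\hat{\delta}_i$ • Variance of volume changes, $\hat{\sigma}_i^2$ • Compute the AAV (ASTUTE Assessment Value): $K' = \frac{\hat{\delta}_i}{\hat{\sigma}_i}\sqrt{F}$ • Flag an alarm if: $|K'| > K(p)$ [Figure: toy example copied from the authors' slides, with volume changes 0, +2, and -1 between bins i and i+1; result: no alarm]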
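Note: About Volume Changes • Requirement: • Only consider traffic on non-saturated links, using short-timescale bins • Volume change $\delta_{f,i} = x_{f,i+1} - x_{f,i}$ (for the F flows that are active at bin i): • Mean: $\hat{\delta}_i = \frac{1}{F}\sum_{f} \delta_{f,i}$ • Standard deviation: $\hat{\sigma}_i = \sqrt{\frac{1}{F-1}\sum_{f} (\delta_{f,i} - \hat{\delta}_i)^2}$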
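The detection step can be sketched as follows. The sketch assumes the AAV is the mean volume change divided by the standard deviation and scaled by the square root of the number of flows, and that the sample standard deviation (n - 1 denominator) is used. The function names, the threshold of 3.0, and the input lists are illustrative choices, not values from the slides.

```python
import math
import statistics


def astute_assessment_value(deltas):
    """AAV = mean(deltas) / stdev(deltas) * sqrt(number of flows).

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(deltas)
    if n < 2:
        raise ValueError("need at least two flows to estimate the deviation")
    mean = statistics.fmean(deltas)
    std = statistics.stdev(deltas)
    if std == 0:
        # Every flow changed by the same amount: no spread to normalize by.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / std * math.sqrt(n)


def is_anomalous(deltas, threshold):
    """Flag the bin pair when |AAV| exceeds the detection threshold."""
    return abs(astute_assessment_value(deltas)) > threshold


toy = [0, 2, -1]                 # changes that roughly cancel out
correlated = [1] * 18 + [2, 0]   # many small flows moving the same way

print(is_anomalous(toy, threshold=3.0))         # False
print(is_anomalous(correlated, threshold=3.0))  # True
```

The three-flow input only illustrates the arithmetic; the confidence-interval argument behind the threshold relies on a large number of flows.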
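Note: About Detection Threshold • For large F, the mean volume change $\hat{\delta}_i$ has a (1-p) confidence interval given by the central limit theorem: $\hat{\delta}_i \pm K(p)\,\hat{\sigma}_i/\sqrt{F}$, where $\hat{\sigma}_i$ is the standard deviation of the volume changes • If the interval contains zero, then F satisfies ASTUTE • Otherwise, there is an ASTUTE anomaly at time bin i • The smallest value of K(p) for which the interval contains zero is defined as the AAV [Figure: (1-p) confidence interval between -K(p) and K(p), centered at 0, with tail probability p/2 on each side]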
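One way to obtain K(p) numerically is to take it as the two-sided quantile of the standard normal distribution, which is what the central-limit argument on this slide suggests. The sketch below makes that assumption explicit; the function names and the numbers in the example (mean 0.2, standard deviation 2.0, 400 flows, p = 0.05) are made up for illustration.

```python
import math
from statistics import NormalDist


def detection_threshold(p):
    """K(p) such that a standard normal variable exceeds it in absolute value
    with probability p (assumed reading of the slide's threshold)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - p / 2)


def confidence_interval(mean, std, n_flows, p):
    """(1 - p) confidence interval for the mean volume change."""
    half_width = detection_threshold(p) * std / math.sqrt(n_flows)
    return mean - half_width, mean + half_width


k = detection_threshold(0.05)
low, high = confidence_interval(mean=0.2, std=2.0, n_flows=400, p=0.05)

print(round(k, 2))         # 1.96
print(not (low <= 0 <= high))  # True: zero lies outside the interval
```

In this example the AAV would be 0.2 / 2.0 * sqrt(400) = 2.0, which is above K(0.05), so checking the interval and comparing the AAV with the threshold give the same answer.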
Note: Situations in Which ASTUTE Is Violated • ASTUTE can be violated in two ways: • (1) False positive • Controlled by the false positive rate p • In a fraction p of the time bins, ASTUTE may be violated by normal traffic • (2) Flows violate the model's assumptions: independence & stationarity • Stationarity: • Only needed over the timescale of a typical flow duration • Authors study which bin sizes show stationary behavior • Independence: • Many flows increase/decrease their volumes at the same time!
Note: Validate the Stationarity Assumption (A2) • Stationarity: • Depends on the timescale (bin size) • In the trace: • Long timescales: daily usage bias • Short timescales: no bias! • We use short timescales to factor out violations of stationarity
Note: Validate “Gaussianity” of AAVs • Study the impact of packet sampling rate • Check distribution similarity
Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Competitors (or collaborators!?): Kalman & Wavelet • Inspect anomalies from traffic data and identify their root causes • Simulation through anomaly injection • Performance Evaluation • Conclusion & My Comments
Kalman & Wavelet (alternative anomaly detectors used for comparison) • Kalman: a spatio-temporal detector • Learns spatial and temporal correlations to predict the next values • Its threshold parameter has similar semantics to that of ASTUTE (allowing a direct comparison) • [26] A. Soule, K. Salamatian, and N. Taft, “Combining Filtering and Statistical Methods for Anomaly Detection,” in Proc. IMC, 2005. • Wavelet: a frequency-based detector • Decomposes signals into low/medium/high frequency bands • The variance of the combined signal (medium & high frequency bands) represents anomalies • [2] P. Barford, J. Kline, D. Plonka, and A. Ron, “A Signal Analysis of Network Traffic Anomalies,” in Proc. IMW, 2002.
Kalman & Wavelet (cont’d) • Targets of these two detectors: • (1) packet volume time series • (2) entropy time series of Src. IP • (3) entropy time series of Dst. IP • (4) entropy time series of Src. port • (5) entropy time series of Dst. port
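An entropy time series for a traffic feature can be built by computing the Shannon entropy of that feature's values within each time bin. The sketch below assumes entropy is taken over the per-packet values in a bin and measured in bits; the slides do not state the exact weighting, and the function name and the sample addresses are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )


# Hypothetical source IPs seen in two consecutive time bins
bins = [
    ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"],  # two sources, evenly split
    ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1"],  # one source dominates
]

src_ip_entropy = [shannon_entropy(values) for values in bins]
print(src_ip_entropy[0])  # 1.0
```

A sudden drop or rise in such a series (for example when one source starts to dominate) is the kind of signal a time-series detector would then look for.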
Dataset • Flow traces from 3 different networks • Internet2 (between research institutions): flow sampling 0.1 • GEANT2 (public Internet, European NRENs): flow sampling 0.01 • Corporate (inside the enterprise network): no flow sampling
Manual Classification of Anomalies for Root Cause Analysis • Goal: • To perform “root cause” analysis for the anomalies found by ASTUTE, Kalman, and Wavelet • need to know the root cause first • Approach: • Use information provided by ASTUTE to help the process of manual classification of anomalies in the traffic trace • Steps: • (1) correlated anomalous flows • (2) anomalous flow identification • (3) anomalous flow classification (by hand)
Results of Anomalous Flow Classification • Take these as the criteria for labeling the anomalies found in the three traces
Simulation through Anomaly Injection • Benefit: • Simulation helps us understand how methods trade off detection rates against false positives (ROC curves) • Note: used for comparing Kalman and ASTUTE only • Approach: • For end-host activity: build a set of benchmark anomalies and inject them (recreating identified anomalies) • For outages: remove the related traffic
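Once anomalies have been injected, each time bin has a known label, and a ROC curve can be traced by sweeping the detection threshold. The sketch below shows that bookkeeping only; the function name `roc_points`, the detector scores, and the labels are invented for illustration and are not results from the slides.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, detection rate) for each threshold.

    scores: detector output per time bin (an alarm fires when |score| > threshold)
    labels: True where an anomaly was injected in that bin, False otherwise
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one anomalous and one normal bin")
    points = []
    for threshold in thresholds:
        alarms = [abs(score) > threshold for score in scores]
        true_pos = sum(1 for a, label in zip(alarms, labels) if a and label)
        false_pos = sum(1 for a, label in zip(alarms, labels) if a and not label)
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Hypothetical detector scores and injected-anomaly labels for five bins
scores = [0.5, 4.0, -6.0, 1.0, 2.5]
labels = [False, True, True, False, False]

points = roc_points(scores, labels, thresholds=[1, 2, 3])
print(points[-1])  # (0.0, 1.0)
```

Raising the threshold from 2 to 3 removes the single false alarm in this made-up data while keeping both injected anomalies detected.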
Number of Anomalies and Anomaly Overlap • Small overlap • Kalman & Wavelet have more overlap with each other • What are these anomalies?
Anomaly Types (Internet2) • Detection capabilities are different
Anomaly Types (GEANT2 & Corporate) • User characteristics differ across networks
Small Detector Overlap • Maps qualitative properties (types) of the anomalies to their quantitative properties (# flows and packets) [Figure annotations: Kalman/Wavelet (few large flows); less total volume; ASTUTE (several small flows)]
Detection Performance [Figure: detection performance plots, with panels labeled Type 1, Type 2, and Type 3]
Complementarity of ASTUTE & Kalman • After combining the two detectors, the performance is better!
Conclusion • ASTUTE detects anomalies without learning the normal behavior • Computationally simple and immune to data poisoning • Specializes in strongly correlated flows (several small flows) • Limitation: cannot find anomalies involving a few large flows • But those are easy to find! • ASTUTE and Kalman complement each other nicely • ASTUTE also provides information that is useful for root cause analysis