
ASTUTE: Detecting a Different Class of Traffic Anomalies

Fernando Silveira (1,2), Christophe Diot (1), Nina Taft (3), Ramesh Govindan (4). Affiliations: (1) Technicolor, (2) UPMC Paris Universitas, (3) Intel Labs Berkeley, (4) University of Southern California. ACM SIGCOMM 2010.


Presentation Transcript


  1. ASTUTE: Detecting a Different Class of Traffic Anomalies • Fernando Silveira (1,2), Christophe Diot (1), Nina Taft (3), Ramesh Govindan (4) • (1) Technicolor, (2) UPMC Paris Universitas, (3) Intel Labs Berkeley, (4) University of Southern California • ACM SIGCOMM 2010

  2. ASTUTE: Detecting a Different Class of Traffic Anomalies • ASTUTE stands for “A Short-Timescale Uncorrelated-Traffic Equilibrium” • Compared with the Kalman filter and wavelet analysis, ASTUTE finds anomalies with different features • Kalman & Wavelet can detect: → few large flows • ASTUTE can detect: → many small flows

  3. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments Speaker: Li-Ming Chen

  4. Anomaly Detection • Traffic anomalies (in large ISPs & enterprise networks) come from: • Malicious activities (e.g., DoS, port scans) • Misconfigurations/failures of network components (e.g., link failures, routing problems) • Legitimate events (e.g., large file transfers, flash crowds) • Anomaly detection: • Build a statistical model of normal traffic • An anomaly is defined as a deviation from the normal model

  5. Motivation: Challenges in Anomaly Detection • Anomaly detection: • Pros: • Can detect new anomalies! • Cons: • Training takes time • Training data is never guaranteed to be clean • Periodic (re)training is required • False alarms • → Can we detect anomalies without having to learn what is normal?

  6. Observation • Network traffic shows equilibrium: • When many flows are multiplexed on a non-saturated link, their volume changes over short timescales tend to cancel each other out • → making the average change across flows close to zero • The equilibrium property: • Holds if the flows are independent • Is violated by traffic changes caused by several, potentially small, correlated flows (i.e., traffic anomalies)
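The cancellation effect described on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: each flow's volume change is drawn independently from a zero-mean Gaussian (a hypothetical choice, not something the slides specify), and the function name `simulate` and its parameters are made up for illustration.

```python
import random


def simulate(num_flows=10_000, correlated=0, shift=1.0, seed=0):
    """Return the mean volume change across flows for one pair of time bins.

    Assumption (not from the slides): independent flows change by a
    zero-mean Gaussian amount. The first `correlated` flows additionally
    move together by `shift` packets, mimicking many small correlated flows.
    """
    rng = random.Random(seed)
    changes = [rng.gauss(0.0, 5.0) for _ in range(num_flows)]
    for k in range(correlated):
        changes[k] += shift  # a small, coordinated change in each flow
    return sum(changes) / len(changes)


if __name__ == "__main__":
    print("independent flows:", simulate())
    print("2000 correlated flows:", simulate(correlated=2000))
```

With independent flows the mean stays close to zero; when 2000 of the 10,000 flows each gain one packet, the mean moves by 0.2 even though no individual flow changed much.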

  7. Goal • Propose a new approach to anomaly detection based on ASTUTE • A mathematical model describing “A Short-Timescale Uncorrelated-Traffic Equilibrium” • Advantages: • No training: computationally simple and immune to data poisoning • Accurately detects a well-defined class of traffic anomalies • Theoretical guarantees on the false positive rate • Evaluate its performance against the Kalman filter and wavelet analysis

  8. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments

  9. Equilibrium Model • Measure flow volume on a link for each time bin (…, bin i, bin i+1, …) • Flow: a set of packets that share the same values for a given set of traffic features (e.g., the 5-tuple) • Binning: use time bins to study the evolution of a flow • Flow volume: number of packets in the flow during the corresponding bin, written x_{f,i} for flow f in bin i • Flow f starts at time bin s_f and continues for d_f bins • Flow f's per-bin volumes can be represented as a vector (…, x_{f,i}, x_{f,i+1}, …)

  10. Equilibrium Model: Focus on Volume Changes of Flows • Flow f's per-bin volumes can be represented as a vector (…, x_{f,i}, x_{f,i+1}, …) • F: set of flows that are active in bin i or bin i+1 • Volume change of f from i to i+1: x_{f,i+1} - x_{f,i}
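As a minimal sketch of this step, the volume changes between two consecutive bins can be computed from per-bin packet counts. The dictionary representation and the name `volume_changes` are illustrative assumptions; a flow missing from a bin is treated as having volume 0 there.

```python
def volume_changes(bin_i, bin_next):
    """Volume change of every flow active in bin i or bin i+1.

    bin_i, bin_next: dicts mapping a flow key (e.g., a 5-tuple) to the
    number of packets seen in that bin. A flow absent from a bin is
    assumed to have volume 0 in it.
    """
    active = set(bin_i) | set(bin_next)
    return {f: bin_next.get(f, 0) - bin_i.get(f, 0) for f in active}


if __name__ == "__main__":
    # Flow "a" is unchanged, "b" ends, and "c" starts in bin i+1.
    print(volume_changes({"a": 3, "b": 1}, {"a": 3, "c": 2}))
```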

  11. Consequences of the ASTUTE Model • Assumptions: • (A1) Flow independence • (A2) Stationarity • Theorem 1 (consequences of ASTUTE) • Intuition: independent flows cancel each other out

  12. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments

  13. ASTUTE-based Anomaly Detection Method • Given: • A detection threshold K(p) • A pair of consecutive time bins • Measure: • Set of active flows, F • Mean volume change • Variance of volume changes • Compute the AAV (ASTUTE Assessment Value) • Flag an alarm if |AAV| > K(p) • [Figure: a toy example over bins i and i+1 with values 0, +2, -1; outcome: No Alarm (copied from the authors' slides)]
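The detection procedure on this slide might be sketched as follows. The slide's formulas are not reproduced in this transcript, so the code assumes the AAV is the sample mean of the volume changes scaled by sqrt(F) and divided by their sample standard deviation, which is the form suggested by the central-limit-theorem argument in the deck. The function names and the handling of zero variance are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def astute_assessment_value(changes):
    """AAV for one pair of bins, assuming AAV = mean * sqrt(F) / std."""
    num_flows = len(changes)
    if num_flows < 2:
        raise ValueError("need at least two active flows")
    avg = mean(changes)
    std = stdev(changes)  # sample standard deviation (F - 1 in the denominator)
    if std == 0:
        # Assumption: identical changes in every flow are maximally
        # anomalous unless the common change is zero.
        return 0.0 if avg == 0 else math.copysign(math.inf, avg)
    return avg * math.sqrt(num_flows) / std


def flag_alarm(changes, threshold):
    """True if the volume changes violate ASTUTE at threshold K(p)."""
    return abs(astute_assessment_value(changes)) > threshold


if __name__ == "__main__":
    k = 2.576  # roughly K(p) for p = 0.01
    print(flag_alarm([0, 2, -1], k))                # changes mostly cancel
    print(flag_alarm([1, 2, 1, 2, 1, 2, 1, 2], k))  # all flows grow together
```

In the second call no single flow changes by more than two packets, yet the alarm fires because every flow moves in the same direction.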

  14. Note: About Volume Changes • Requirement: • Consider only traffic on non-saturated links, using short-timescale bins • Volume change statistics (over the F flows that are active at bin i): • Mean • Standard deviation
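The formulas for these two statistics are not reproduced in the transcript. Assuming they are the usual sample mean and sample standard deviation of the per-flow volume changes, they could be written as below. The symbols \delta_{f,i}, \hat{\delta}_i, and \hat{\sigma}_i are notation chosen here for illustration, with F used as the number of active flows.

```latex
\delta_{f,i} = x_{f,i+1} - x_{f,i},
\qquad
\hat{\delta}_i = \frac{1}{F} \sum_{f=1}^{F} \delta_{f,i},
\qquad
\hat{\sigma}_i = \sqrt{\frac{1}{F-1} \sum_{f=1}^{F} \bigl(\delta_{f,i} - \hat{\delta}_i\bigr)^2}
```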

  15. Note: About Detection Threshold • For large F, the mean volume change has a (1-p) confidence interval given by the central limit theorem • → If the interval contains zero, then F satisfies ASTUTE • → Otherwise, there is an ASTUTE anomaly at time bin i • → The smallest value of K(p) for which the interval contains zero is defined as the AAV • [Figure: (1-p) confidence interval between -K(p) and K(p), centered at 0, with tail probability p/2 outside it]
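A minimal sketch of how the threshold and the interval relate, assuming K(p) is the two-sided standard Gaussian quantile that leaves probability p/2 in each tail (consistent with the p/2 label in the figure, but an assumption nonetheless). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def detection_threshold(p):
    """K(p): standard Gaussian quantile leaving p/2 in each tail."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - p / 2)


def confidence_interval(mean_change, std_change, num_flows, p):
    """(1-p) confidence interval for the mean volume change (CLT, large F)."""
    half_width = detection_threshold(p) * std_change / math.sqrt(num_flows)
    return mean_change - half_width, mean_change + half_width


if __name__ == "__main__":
    low, high = confidence_interval(0.02, 1.0, 10_000, p=0.01)
    print(detection_threshold(0.01))  # about 2.576
    print(low <= 0 <= high)           # interval contains zero: no anomaly
```

Under this reading, a target false positive rate p translates directly into the threshold, so no training data is needed to pick it.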

  16. Note: Situations in Which ASTUTE Is Violated • ASTUTE can be violated in two ways: • (1) False positives • Controlled by the false positive rate p • In a fraction p of the time bins, ASTUTE may be violated by normal traffic • (2) Flows violate the model's assumptions (independence and stationarity) • Stationarity: • Only over the timescale of a typical flow duration • The authors study which bin sizes show stationary behavior • Independence: • Many flows increase/decrease their volumes at the same time!

  17. Note: Validating the Stationarity Assumption (A2) • Stationarity: • Depends on the timescale (bin size) • In the trace: • Long timescales: daily usage bias • Short timescales: no bias! • → We use short timescales to factor out violations of stationarity

  18. Note: Validating the “Gaussianity” of AAVs • Study the impact of the packet sampling rate • Check distribution similarity

  19. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Competitors (or collaborator!?): Kalman & Wavelet • Inspect anomalies from traffic data and identify their root causes • Simulation through anomaly injection • Performance Evaluation • Conclusion & My Comments

  20. Kalman & Wavelet (alternative anomaly detectors, used for comparison) • Kalman: a spatio-temporal detector • Learns spatial and temporal correlations to predict the next values • Its threshold parameter has similar semantics to that of ASTUTE (allowing a direct comparison) • [26] A. Soule, K. Salamatian, and N. Taft, “Combining Filtering and Statistical Methods for Anomaly Detection,” in Proc. IMC, 2005. • Wavelet: a frequency-based detector • Decomposes signals into low/medium/high frequency bands • The variance of the combined signal (medium & high freq. bands) represents anomalies • [2] P. Barford, J. Kline, D. Plonka, and A. Ron, “A Signal Analysis of Network Traffic Anomalies,” in Proc. IMW, 2002.

  21. Kalman & Wavelet (cont’d) • Targets of these two detectors: • (1) packet volume time series • (2) entropy time series of Src. IP • (3) entropy time series of Dst. IP • (4) entropy time series of Src. Port • (5) entropy time series of Dst. Port
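One point of each entropy time series could be computed as sketched below. The slide does not say which entropy definition is used; this sketch assumes the Shannon entropy (base 2) of the empirical distribution of a feature within one time bin, and the name `feature_entropy` is hypothetical.

```python
import math
from collections import Counter


def feature_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`.

    `values` holds one observation per packet (or per flow) in a single
    time bin, e.g., the source IP of each packet.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    concentrated = ["10.0.0.1"] * 4                       # one source dominates
    spread = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    print(feature_entropy(concentrated))  # low entropy
    print(feature_entropy(spread))        # high entropy
```

Evaluating this once per bin for source IP, destination IP, source port, and destination port would yield the four entropy series listed on the slide.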

  22. Dataset • Flow traces from 3 different networks • Flow sampling (in slide order): 0.1, 0.01, none • Traffic covered (in slide order): between research institutions; public Internet ↔ European NRENs; inside the enterprise network

  23. Manual Classification of Anomalies for Root Cause Analysis • Goal: • To perform “root cause” analysis for the anomalies found by ASTUTE, Kalman, and Wavelet • → Need to know the root cause first • Approach: • Use the information provided by ASTUTE to help manually classify anomalies in the traffic trace • Steps: • (1) correlated anomalous flows • (2) anomalous flow identification • (3) anomalous flow classification (by hand)

  24. Results of Anomalous Flow Classification • Take these as the criteria for labeling the anomalies found in the three traces

  25. Simulation through Anomaly Injection • Benefit: • Simulation helps us understand how methods trade off detection rates against false positives (ROC curves) • Note: used for comparing Kalman and ASTUTE only • Approach: • For end-host activity: build a set of benchmark anomalies and inject them (recreating identified anomalies) • For outages: remove the related traffic
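The trade-off between detection rate and false positives can be traced by sweeping the detection threshold over bins with known injected anomalies. The following is a sketch only: the score and label lists, and the name `roc_points`, are hypothetical and do not reproduce the authors' evaluation code.

```python
def roc_points(scores, labels, thresholds):
    """(false positive rate, detection rate) for each threshold.

    scores: detector output per time bin (e.g., the AAV).
    labels: True for bins where an anomaly was injected.
    A bin is flagged when |score| exceeds the threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both injected and clean bins")
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and abs(s) > t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and abs(s) > t)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    scores = [0.5, 3.0, -4.0, 1.0]
    labels = [False, True, True, False]
    print(roc_points(scores, labels, thresholds=[2.0, 0.7]))
```

Lowering the threshold from 2.0 to 0.7 in this toy input keeps the detection rate at 1.0 but raises the false positive rate from 0.0 to 0.5.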

  26. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments

  27. Number of Anomalies and Anomaly Overlap • Small overlap • Kalman & Wavelet overlap more with each other • What are these anomalies?

  28. Anomaly Types (Internet2) • Detection capabilities are different

  29. Anomaly Types (GEANT2 & Corporate) • User characteristics differ across networks

  30. Small Detector Overlap • Kalman/Wavelet (few large flows) • Less total volume • ASTUTE (several small flows) • (Maps the qualitative properties (types) of the anomalies to their quantitative properties (# flows and packets))

  31. Detection Performance • [Figure: detection performance panels labeled Type 1, Type 2, and Type 3]

  32. Complementarity of ASTUTE & Kalman • Combining the two detectors gives better performance!

  33. Outline • Motivation & Goal • ASTUTE – An Equilibrium Model • ASTUTE-based Anomaly Detection • Experimental Methodology • Performance Evaluation • Conclusion & My Comments

  34. Conclusion • ASTUTE detects anomalies without learning the normal behavior • Computationally simple and immune to data poisoning • Specializes in strongly correlated flows (several small flows) • Limitation: cannot find anomalies involving a few large flows • But those are easy to find! • ASTUTE and Kalman complement each other nicely • ASTUTE also provides information that is useful for root cause analysis
