Explore a method for detecting and classifying network anomalies using traffic feature distributions to analyze volume changes in OD flows. Learn about subspace analysis of link traffic and anomaly diagnosis.
Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina, Mark Crovella (cs.bu), Christophe Diot (Intel) SIGCOMM 2005
Reference • SIGCOMM 2004 – “Diagnosing Network-Wide Traffic Anomalies” • SIGCOMM 2005 – “Mining Anomalies Using Traffic Feature Distributions” • Authors: • Anukool Lakhina (Ph.D. @ Boston Univ.) • Mark Crovella (Professor @ Boston Univ.) • Christophe Diot (@ Intel Research Lab.) Speaker: Li-Ming Chen
Outline • Network-wide observation • Using subspace method to detect volume anomalies (SIGCOMM’04) • Volume vs. Traffic Feature Distribution (SIGCOMM’05) • Anomaly Diagnosis Methodology • Anomaly Detection • Anomaly Classification • Conclusion & comments
Anomaly Diagnosis • Is my network experiencing unusual conditions? • e.g., an attack, a spreading worm, equipment outages, misconfigurations, or something unknown… • Anomaly Diagnosis • Detection – is there an unusual event? • Identification – what is the best explanation? • Quantification – how serious is the problem?
Previous Work on Anomaly Detection • Largely focused on: • Point solutions for specific types of anomalies • E.g., port scans, worms, DoS… • Not a general approach • Single-link traffic data • Not a network-wide view • Rule-based classification • Not unsupervised • A general, unsupervised method for reliably detecting and classifying network anomalies is needed
Network-wide Observation • Study the proposed anomaly detection and classification framework using sampled flow data collected from all access links of backbone networks • Backbone networks: Abilene, Géant and Sprint • An OD flow is the traffic that enters at an origin PoP and exits at a destination PoP of a backbone network • PoP: Point of Presence
Volume Anomaly Detection: Problem Statement • A volume anomaly is a sudden change in the traffic volume of an OD flow • i.e., point-to-point traffic • Given link traffic measurements, diagnose the volume anomalies
Why care about OD Flows? • If we only monitor traffic on network links, a volume anomaly arising from an OD flow may not be noticeable. • Thus, a naïve approach won’t work if OD flow information isn’t available. • Problem: a network with $n$ PoPs will have $n^2$ OD flows, so OD flows are high-dimensional data.
Subspace Analysis of Link Traffic • Even if OD flow information is not available and only link traffic information is available, PCA can be applied and the subspace technique can detect volume anomalies • PCA: Principal Component Analysis • Link traffic info: the data consist of time samples of traffic volumes at all m links in the network • Y is the t × m traffic measurement matrix • An arbitrary row y of Y denotes one sample • Reasons: • Links share OD flows • The set of OD flows is also low-dimensional • Use PCA to separate normal and anomalous traffic
The Subspace Method • An approach to separate normal from anomalous traffic • Normal subspace, $S$: space spanned by the first $k$ principal components • Anomalous subspace, $\tilde{S}$: space spanned by the remaining principal components • Then, decompose traffic on all links by projecting onto $S$ and $\tilde{S}$ to obtain $y = \hat{y} + \tilde{y}$, where $y$ is the traffic vector of all links at a particular point in time, $\hat{y}$ is the normal traffic vector, and $\tilde{y}$ is the residual traffic vector
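The decomposition on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `subspace_decompose`, the synthetic data, and the choice `k=1` are assumptions made for the example.

```python
import numpy as np

def subspace_decompose(Y, k):
    """Split each row of Y (t x m link measurements) into normal and residual parts."""
    Yc = Y - Y.mean(axis=0)                 # center each link's time series
    # Rows of Vt are the principal components of the centered data.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                            # m x k basis of the normal subspace
    Y_normal = Yc @ P @ P.T                 # projection onto the normal subspace
    Y_resid = Yc - Y_normal                 # projection onto the anomalous subspace
    return Y_normal, Y_resid

# Synthetic example (assumed data): one shared traffic pattern plus small noise.
rng = np.random.default_rng(0)
t, m = 200, 6
loadings = np.array([[1.0, 0.5, 2.0, 1.5, 0.8, 1.2]])
Y = rng.normal(size=(t, 1)) @ loadings + 0.01 * rng.normal(size=(t, m))
Y_normal, Y_resid = subspace_decompose(Y, k=1)
```

Each row satisfies y = ŷ + ỹ after centering, and the two parts are orthogonal because they are projections onto complementary subspaces.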
A Geometric Illustration • In general, anomalous traffic results in a large value of $\|\tilde{y}\|$ • Capture the size of the residual vector using the squared prediction error (SPE): $\mathrm{SPE} = \|\tilde{y}\|^2$ • [Figure: traffic vector $y$ plotted against Traffic on Link 1 and Traffic on Link 2]
Subspace Analysis Results • Note that during an anomaly, the normal component doesn’t change much, while the residual component changes considerably. • Thus, anomalies can be detected by setting a threshold on the residual component.
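A rough sketch of threshold-based detection on the residual follows. The threshold here is an empirical quantile of the SPE values, which is an assumption chosen for simplicity and not necessarily the rule used in the papers; `spe_detect`, the synthetic traffic, and the injected spike are likewise hypothetical.

```python
import numpy as np

def spe_detect(Y, k, quantile=0.99):
    """Return per-time-bin SPE and the indices whose SPE exceeds the threshold."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                              # normal-subspace basis
    resid = Yc - Yc @ P @ P.T                 # residual traffic vectors
    spe = np.sum(resid ** 2, axis=1)          # squared prediction error per row
    threshold = np.quantile(spe, quantile)    # ASSUMPTION: empirical quantile threshold
    return spe, np.flatnonzero(spe > threshold)

# Synthetic link traffic (assumed): one shared pattern on 6 links plus noise.
rng = np.random.default_rng(0)
t, m = 200, 6
Y = rng.normal(size=(t, 1)) @ np.ones((1, m)) + 0.01 * rng.normal(size=(t, m))
Y[120, 3] += 5.0                              # inject a spike on one link at t = 120
spe, flagged = spe_detect(Y, k=1)
```

Because the spike does not follow the shared pattern, almost all of it lands in the residual, so the SPE at that time bin stands far above the rest. A quantile threshold always flags a fixed fraction of bins, so some flagged bins are false alarms by construction.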
Outline • Network-wide observation • Using subspace method to detect volume anomalies (SIGCOMM’04) • Volume vs. Traffic Feature Distribution (SIGCOMM’05) • Anomaly Diagnosis Methodology • Anomaly Detection • Anomaly Classification • Conclusion & comments
Introduction • Challenges for automatically detecting and classifying anomalies: • Anomalies are a moving target (can span a vast range of events) • New anomalies will continue to arise • Anomalies present in network-wide traffic data are buried like needles in a haystack • Goal of this paper: • Seek methods that are able to detect a diverse and general set of network anomalies • With high detection rate and low false alarm rate • Seek to mine the anomalies from the data by discovering and interpreting the patterns present in network-wide traffic
Traffic Feature Distributions • Most anomalies share a common characteristic: they change the distribution of packet header fields • Anomalies can be detected and distinguished by inspecting traffic features: • 4-tuple: SrcIP, SrcPort, DstIP, DstPort
Volume vs. Traffic Feature Distribution • Volume-based detection schemes have been successful in isolating large traffic changes • But a large number of anomalies do NOT cause detectable disruptions in traffic volume • Using traffic feature distributions • Augments volume-based anomaly detection • Traffic distributions can reveal valuable information about the structure of anomalies • -> information which is not present in traffic volume measures
Traffic Feature Distributions • [Figure: histograms of # packets per destination port and per destination IP, for typical traffic and for a port scan] • Port scan, destination ports: dispersed histogram, high entropy (~450 new destination ports) • Port scan, destination IPs: concentrated histogram, low entropy (one destination, the victim, dominates) • Summarize using the sample entropy of histogram X: $H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log_2 \frac{n_i}{S}$, where symbol $i$ occurs $n_i$ times and $S$ is the total # of observations
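The sample entropy above is straightforward to compute from a list of observed feature values. The sketch below is illustrative only: `sample_entropy` and the two toy inputs (a single dominant value, and 450 distinct destination ports) are assumptions chosen to mirror the concentrated and dispersed cases on the slide.

```python
import math
from collections import Counter

def sample_entropy(observations):
    """Sample entropy (in bits) of the empirical histogram of the observations."""
    counts = Counter(observations)            # n_i for each distinct symbol i
    total = sum(counts.values())              # S, the total number of observations
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Concentrated histogram: every packet goes to the same destination (low entropy).
concentrated = ["10.0.0.1"] * 1000
# Dispersed histogram: one packet to each of 450 distinct ports (high entropy).
dispersed = list(range(1024, 1024 + 450))

h_low = sample_entropy(concentrated)
h_high = sample_entropy(dispersed)
```

The entropy is 0 when a single value accounts for all observations and reaches its maximum, log2 of the number of distinct values, when all values are equally frequent.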
Port scan anomalies viewed in terms of traffic volume and in terms of entropy • The port scan is dwarfed in volume metrics… • But it stands out in feature entropy, which also reveals its structure
Entropy-based scheme • In the volume-based scheme, the # of packets or bytes per time slot was the variable. • In the entropy-based scheme, the entropy of every traffic feature in every time slot is the variable. • This gives us a three-way data matrix H. • H(t, p, k) denotes the entropy of traffic feature k in OD flow p at time t. • To apply the subspace method, we need to unfold it into a single-way representation.
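Multiway Subspace Method (multi-way to single-way) • Unfold the three-way matrix $H$ into a single-way matrix with one row per time slot and one column per (traffic feature, OD flow) pair • Now apply the usual subspace decomposition (PCA) • Every row $h$ of the unfolded matrix will be decomposed into $h = \hat{h} + \tilde{h}$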
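The unfolding step can be sketched with NumPy as follows. The array layout `H[t, p, k]` follows the H(t, p, k) notation used earlier; the function name `unfold` and the column order (all OD flows for feature 0, then feature 1, and so on) are assumptions for illustration.

```python
import numpy as np

def unfold(H):
    """Unfold a (time, OD flow, feature) array into a (time, feature * OD flow) matrix."""
    t, p, k = H.shape
    # Move the feature axis ahead of the OD-flow axis, then flatten both into columns,
    # so that column index = feature * p + od_flow.
    return H.transpose(0, 2, 1).reshape(t, k * p)

# Toy example (assumed sizes): 3 time slots, 2 OD flows, 4 traffic features.
H = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
U = unfold(H)
```

Each row of `U` then holds every feature entropy of every OD flow for one time slot, which is the form the single-way subspace method expects.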
Comparing Entropy Detections with Detections in Volume Metrics (1) • [Figure: scatter plot with regions labeled "Found in Entropy Only", "Found in both metrics", and "Found in Volume Only"] • Points that lie to the right of the vertical line are volume-detected anomalies, and points that lie above the horizontal line are detected in entropy.
Comparing Entropy Detections with Detections in Volume Metrics (2)
Detection Rate by Injecting Real Anomalies • Evaluation Methodology • Superimpose known anomaly traces onto OD flows • Test sensitivity at varying anomaly intensities by thinning the trace • Results are averaged over a sequence of experiments • [Figure labels: 12%, 1.3%, 6.3%, 0.63%]
Classifying Anomalies by Clustering • Enables unsupervised classification • Each anomaly is a point in 4-D space: • h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)] • Questions: • Do anomalies form clusters in this space? • Are the clusters meaningful? • Internally consistent, Externally different • What can we learn from the clusters? • Use Hierarchical Agglomerative Algorithm for determining clusters • Minimizes intra-cluster variation and maximizes inter-cluster variation
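A minimal sketch of hierarchical agglomerative clustering on 4-D entropy vectors is given below. The slide does not state which linkage rule is used, so average linkage with Euclidean distance is an assumption here, and the four sample points are made-up values, not data from the paper.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start with one cluster per point

    def linkage(a, b):
        # ASSUMPTION: average linkage, i.e. mean pairwise Euclidean distance.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]    # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical anomalies as [H(srcIP), H(dstIP), H(srcPort), H(dstPort)] vectors.
points = [
    (-0.9, -0.8, 0.1, 0.9),
    (-1.0, -0.7, 0.0, 1.0),
    (0.8, -0.9, 0.7, -0.8),
    (0.9, -1.0, 0.8, -0.9),
]
clusters = agglomerative(points, n_clusters=2)
```

This brute-force version recomputes all pairwise linkages at every merge, which is fine for a handful of anomalies but scales poorly; a library implementation would be preferable for real datasets.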
Clustering Known Anomalies (2-D view) • [Figure: clusters labeled Code Red scanning, multi-source DoS attack, and single-source DoS attack]
Abilene anomaly clusters (3-D view) • Results of both clustering algorithms are consistent • Heuristics identify about 10 clusters in the dataset
Anomaly Clusters in Abilene data
Conclusion • Feature distributions, as summarized by entropy, are promising for general anomaly diagnosis • Network-Wide Detection: • Entropy significantly augments volume metrics • Highly sensitive: detection rates of 90% are possible, even when the anomaly is 1% of background traffic • Anomaly Classification: • Clusters are meaningful, and reveal new anomalies
Comments • The paper only discusses anomaly detection on offline data. Can it be enhanced for online anomaly detection? • We still need volume-based detection because feature distributions do not identify all anomalies. • Can other fields in the packet header be used for anomaly detection?