Network Anomography

Network Anomography Yin Zhang – University of Texas at Austin Zihui Ge and Albert Greenberg – AT&T Labs Matthew Roughan – University of Adelaide IMC 2005

Outline • Introduction • Background • Network Anomography • Anomaly detection method • Inference algorithm • Dynamic Network Anomography • Evaluation Methodology • Results • Conclusions & comments

Introduction • In IP networks, anomaly detection is an important first step • Respond to unexpected problems • Assure high performance and security • Many types of network problems cause abnormal patterns to appear in the network traffic • DDoS attacks, network worms, vendor implementation bugs, network misconfigurations

Introduction • Network Anomography • Combining “anomalous” with “tomography” • Inferring network-level anomalies from widely available data aggregates • Network Tomography • Inference of traffic matrices from individual link load measurements • Simple Network Management Protocol (SNMP) [34]

Introduction • Link loads and traffic matrices are related by a simple linear equation b = Ax • b: link load measurements • A: routing matrix • x: traffic matrix • Anomography is more complex • Anomaly detection is performed on a series of measurements over a period of time • Anomalies have dramatically different properties from normal traffic

Introduction • Major contributions • A powerful framework that encompasses a wide class of methods for network anomography • Clearly decouples the inference and anomaly detection steps • A new algorithm for dynamic anomography • Isolates routing changes from traffic anomalies • Robust to missing link load measurements • An extensive and thorough evaluation using data sets from a Tier-1 ISP and Internet2’s Abilene network

Background • Network Tomography • Inferring Origin-Destination (OD) traffic from link load measurements, b = Ax • For a network with n links, m OD flows • Routing matrix is the n × m matrix A • A = [aij], where aij is the fraction of traffic from flow j that appears on link i [5, 14, 22, 23, 28, 31, 34, 35]
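To make the relation b = Ax concrete, here is a minimal sketch on a made-up network. The 3-link, 4-flow topology and the flow volumes are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical toy network: 3 links (rows) and 4 OD flows (columns).
# A[i, j] is the fraction of flow j that crosses link i.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# Made-up OD flow volumes (the traffic matrix written as a vector x).
x = np.array([10.0, 5.0, 2.0, 8.0])

# Link loads follow from the linear relation b = A x.
b = A @ x

# With fewer links than flows, A cannot be inverted directly:
# many different x produce the same b.
rank = np.linalg.matrix_rank(A)
print(b, rank, A.shape[1])
```

Because the rank of A is smaller than the number of flows, recovering x from b is an underconstrained problem, which is why tomography needs additional assumptions.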

Network Anomography • Assume that routing matrices A are time-invariant • B = AX • B = [b1 b2 … bt] • X = [x1 x2 … xt] • Two basic solution strategies to network anomography • Early inverse • Late inverse

Network Anomography • Early-inverse • Network tomography → Anomaly detection • Drawback • Errors in the inference step can contaminate the anomaly detection step • Computationally expensive • Late-inverse • “lossy” inference • Extract the anomalous traffic from the link load observations, then form and solve a new set of inference problems

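Network Anomography • The anomalous link traffic B̃ is formed by multiplying B with a transformation matrix T • Spatial anomography: T multiplies B from the left (B̃ = TB) • Temporal anomography: T multiplies B from the right (B̃ = BT)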
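The difference between the two families can be sketched with matrix shapes. This assumes B is stored as links × time, so a spatial method multiplies B by T from the left and a temporal method multiplies it from the right. The two transformation matrices below (cross-link mean removal and first differencing) are illustrative choices, not the methods evaluated in the paper.

```python
import numpy as np

n_links, t_steps = 3, 5
rng = np.random.default_rng(0)
B = rng.uniform(50, 100, size=(n_links, t_steps))  # made-up link loads

# Spatial: T_s is n x n and mixes links within each time step.
# Illustrative choice: subtract the cross-link mean at every time step.
T_s = np.eye(n_links) - np.ones((n_links, n_links)) / n_links
B_spatial = T_s @ B

# Temporal: T_t is t x t and mixes time steps within each link.
# Illustrative choice: first difference, so column k becomes b_k - b_(k-1).
T_t = np.eye(t_steps) - np.eye(t_steps, k=1)
T_t[:, 0] = 0.0  # the first column has no previous sample
B_temporal = B @ T_t
```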

Spatial Anomography • Spatial PCA • PCA (Principal Component Analysis) • Finding dominant patterns • In [19], Lakhina et al. proposed a subspace analysis of link traffic for anomaly detection • Identify a coordinate transformation of B • Under the new coordinate system, the link traffic data have the greatest variance along the first axis, the second greatest along the second axis, and so forth • These axes are called the principal components

Spatial Anomography • Principal component matrix • P = [v1 v2 … vm]^T • Divide the link traffic space into normal and anomalous subspaces • Lakhina et al. [19] developed a threshold-based separation method • Examine the projection of link load data on each axis in order • As soon as a projection is found that contains a 3σ deviation from the mean, that principal component and all subsequent components are assigned to the anomalous subspace

Spatial Anomography • Anomalous subspace • Pa = [vr vr+1 … vm]^T • vr is the first component that fails to pass the threshold test • Anomalous traffic can be extracted from the link load observations by • First projecting the data into the anomalous subspace and then transforming it back • Transformation matrix
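The subspace separation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: B is stored as links × time, each link's series is centered first, the principal axes come from an SVD, and the transformation is taken to be Pa^T Pa, which is what "project, then transform back" gives when the rows of Pa are the anomalous components. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def spatial_pca_transform(B, k_sigma=3.0):
    """Return (T, r): T projects link loads onto the anomalous subspace."""
    n = B.shape[0]
    Bc = B - B.mean(axis=1, keepdims=True)   # center each link's series
    # Columns of U are principal axes in link space, ordered by variance.
    U, _, _ = np.linalg.svd(Bc, full_matrices=True)
    r = n                                    # default: no anomalous axes
    for i in range(n):
        proj = U[:, i] @ Bc                  # projection on axis i over time
        sd = proj.std()
        if sd > 0 and np.any(np.abs(proj - proj.mean()) > k_sigma * sd):
            r = i                            # first axis failing the 3-sigma test
            break
    Pa = U[:, r:].T                          # rows are the anomalous components
    T = Pa.T @ Pa                            # project, then map back
    return T, r

rng = np.random.default_rng(1)
n_links, t_steps = 5, 200
B = 100 + rng.normal(0, 1, size=(n_links, t_steps))  # made-up link loads
B[2, 120] += 40                                      # injected spike
T, r = spatial_pca_transform(B)
B_anom = T @ (B - B.mean(axis=1, keepdims=True))     # anomalous link traffic
```

Since T is built from orthonormal vectors, it is a symmetric projection matrix, so applying it twice changes nothing.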

Temporal Anomography • ARIMA (AutoRegressive Integrated Moving Average) • Linear time-series forecasting technique • Captures the linear dependency of future values on past values • The forecast errors are identified as the anomalous link traffic
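As a rough illustration of "forecast errors as anomalies", the sketch below uses exponential smoothing (EWMA), which corresponds to the special case ARIMA(0,1,1). The smoothing weight alpha, the function name and the test series are assumptions for illustration; the paper's actual model orders and parameters are not reproduced here.

```python
import numpy as np

def ewma_forecast_errors(B, alpha=0.3):
    """Forecast errors per link; B has shape (links, time)."""
    B = np.asarray(B, dtype=float)
    forecast = B[:, 0].copy()           # initialize with the first sample
    errors = np.zeros_like(B)
    for k in range(1, B.shape[1]):
        errors[:, k] = B[:, k] - forecast             # observed minus predicted
        forecast = alpha * B[:, k] + (1 - alpha) * forecast
    return errors

# One made-up link with a flat load and a single spike at time step 5.
B = np.full((1, 10), 100.0)
B[0, 5] = 200.0
errors = ewma_forecast_errors(B)
```

The spike shows up as a large positive error at step 5, followed by a smaller negative error while the forecast settles back.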

Temporal Anomography • Fourier Analysis • Decomposing a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases • Low frequency components capture the daily and weekly traffic patterns • High frequency components represent sudden changes in traffic behavior • The high frequency components of the traffic data are used as the anomalous link traffic
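A minimal sketch of the high-pass idea, assuming one week of 5-minute samples (288 per day) and a hypothetical cutoff of 20 frequency bins. The series, the spike position and the cutoff are all made up for illustration.

```python
import numpy as np

def high_frequency_part(series, cutoff_bins):
    """Zero the lowest `cutoff_bins` frequency bins (including the mean) and invert."""
    spectrum = np.fft.rfft(series)
    spectrum[:cutoff_bins] = 0.0
    return np.fft.irfft(spectrum, n=len(series))

k = np.arange(7 * 288)                               # one week of 5-minute samples
series = 1000 + 300 * np.sin(2 * np.pi * k / 288)    # made-up daily cycle
series[1000] += 500                                  # injected sudden change
residual = high_frequency_part(series, cutoff_bins=20)
spike_at = int(np.argmax(np.abs(residual)))
```

The daily cycle lives in a low frequency bin and is removed, so the largest residual sits at the injected spike.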

Temporal Anomography • Wavelet Analysis • Mathematical functions that cut up data into different frequency components • Superior to traditional Fourier methods • Filter out the low frequency components • Temporal PCA • Apply PCA on B^T as opposed to B as used in spatial PCA
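Filtering out the low frequency components can be sketched with the Haar wavelet, chosen here only because it is the simplest one; the slides do not say which wavelet the paper uses. For Haar, the coarse approximation after several levels is just the block average, so removing it leaves the detail (high frequency) part. The function name and data are hypothetical.

```python
import numpy as np

def haar_high_pass(x, levels=3):
    """Remove the coarsest Haar approximation after `levels` splits."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    approx = x
    for _ in range(levels):
        approx = (approx[0::2] + approx[1::2]) / 2.0   # pairwise averages
    # Expand the coarse approximation back to full length and subtract it.
    smooth = np.repeat(approx, 2 ** levels)
    return x - smooth

x = np.full(16, 100.0)   # made-up flat link load
x[5] += 80.0             # injected spike
detail = haar_high_pass(x, levels=3)
```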

Inference Algorithms • Present three common inference algorithms for solving the linear inverse problem • Deal with the underconstrained linear system by searching for a solution that minimizes some notion of vector norm • Pseudoinverse Solution • Sparsity Maximization • Greedy algorithm
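Two of these ideas can be sketched in a few lines. The pseudoinverse gives the minimum l2-norm solution. The greedy routine below is a generic matching-pursuit style search for a sparse solution and is an assumption for illustration, not necessarily the greedy algorithm of the paper. The toy routing matrix and the anomaly are made up.

```python
import numpy as np

def pseudoinverse_solution(A, b):
    """Minimum l2-norm solution of the underconstrained system A x = b."""
    return np.linalg.pinv(A) @ b

def greedy_sparse_solution(A, b, max_nonzero=1, tol=1e-9):
    """Greedy (matching-pursuit style) search for a sparse x with A x close to b."""
    m = A.shape[1]
    support, x = [], np.zeros(m)
    residual = np.asarray(b, dtype=float).copy()
    col_norms = np.linalg.norm(A, axis=0)
    for _ in range(max_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        # Pick the column most correlated with what is still unexplained.
        scores = np.abs(A.T @ residual) / np.where(col_norms > 0, col_norms, 1.0)
        if support:
            scores[support] = -1.0
        support.append(int(np.argmax(scores)))
        # Refit the selected flows by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x = np.zeros(m)
        x[support] = coef
        residual = b - A @ x
    return x

A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
x_true = np.array([0.0, 10.0, 0.0, 0.0])   # a single anomalous OD flow
b = A @ x_true
x_l2 = pseudoinverse_solution(A, b)
x_greedy = greedy_sparse_solution(A, b, max_nonzero=1)
```

Both results reproduce the link-level anomaly, but the pseudoinverse spreads it over several flows while the greedy search attributes it to one flow.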

Dynamic Network Anomography • Goal • Allow for dynamic routing changes • The normal “self-healing” behavior of the network • If some measurements are missing (at time j), a consistent problem can still be formed • By setting the appropriate rows of Aj to zero • Seek a solution which is consistent with the equation, but also minimizes the norm
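The handling of missing measurements can be sketched as below. It assumes missing link loads are marked as NaN and that the matching entries of the load vector are also set to zero so the zeroed rows stay consistent; both are assumptions of this sketch, as are the function name and the toy data. The minimum-norm solve uses the pseudoinverse for simplicity.

```python
import numpy as np

def mask_missing(A_j, b_j):
    """Zero the rows of A_j (and entries of b_j) where the link load is missing."""
    missing = np.isnan(b_j)
    A_masked = A_j.copy()
    A_masked[missing, :] = 0.0
    b_masked = np.where(missing, 0.0, b_j)
    return A_masked, b_masked

A_j = np.array([[1.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])
b_j = np.array([10.0, 10.0, np.nan])       # third link's measurement is missing
A_m, b_m = mask_missing(A_j, b_j)

# Minimum-norm solution consistent with the remaining measurements.
x_j = np.linalg.pinv(A_m) @ b_m
```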

Dynamic Network Anomography • Use the ARIMA model because • it can be written in a form such that the set of constraints does not grow with t • Also developed two techniques [33] to reduce the size of the minimization problems • Routing changes are infrequent (i.e. not in every time interval) and local (i.e. only in a small subset of rows)

Evaluation Methodology • Two large backbone networks (USA) • Internet2’s Abilene network • 12 routers, 15 backbone links, 144 OD flows • Tier-1 ISP • Hundreds of routers, thousands of links; the total number of OD flows is reduced to about 6000

Evaluation Methodology • Ideally, compare the set of anomalies identified by each of the methods to the set of “true” network anomalies • Very difficult task • Instead, perform pair-wise comparisons, based on the top-ranked anomalies identified by each of the anomography methods • Set BM(j): the M top-ranked anomalies found by applying anomaly detection method j directly to the OD flow data • Set AN(i): the N largest anomalies inferred from link load data by anomography method i

Evaluation Methodology • AN(i): Anomography method i • BM(j): Benchmark j • False Positives • N < M (ex: N=30, M=50) • Top 30 anomalies of A that are not in the top 50 of B • False Negatives • N > M (ex: N=50, M=30) • Top 30 anomalies of B that are not in the top 50 of A • Detection rate • The ratio of the overlap between the two sets
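The pair-wise comparison can be sketched with plain sets. This assumes each method produces a score per candidate anomaly, keyed by a hypothetical label, and it reads false negatives in the usual sense: benchmark anomalies that the anomography method misses. The labels and scores are made up.

```python
def top_k(scores, k):
    """Keys of the k largest anomaly scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def false_positives(anomography, benchmark, n, m):
    """Top-n anomography anomalies absent from the top-m benchmark set (n < m)."""
    return top_k(anomography, n) - top_k(benchmark, m)

def false_negatives(anomography, benchmark, n, m):
    """Top-m benchmark anomalies absent from the top-n anomography set (n > m)."""
    return top_k(benchmark, m) - top_k(anomography, n)

# Made-up anomaly scores keyed by hypothetical (flow, time) labels.
a_scores = {"f1@t3": 9, "f2@t7": 8, "f3@t1": 7, "f4@t2": 1}
b_scores = {"f1@t3": 9, "f3@t1": 8, "f5@t9": 7, "f2@t7": 1}

fp = false_positives(a_scores, b_scores, n=2, m=3)
fn = false_negatives(a_scores, b_scores, n=3, m=2)
```

Using a larger set on the reference side (M > N for false positives, N > M for false negatives) makes the comparison tolerant of small differences in ranking.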

Results – Inference Techniques

Results – Robustness

Results – Impact of Route Change

Results – Anomography Methods

Results – Cross Validation

Conclusions • A powerful framework for anomography • Separates the anomaly detection component from the inference component • Puts forward a number of novel algorithms • Anomaly detection, inference component • Spatial versus temporal approaches • New dynamic anomography algorithm • Handles routing changes • Robust to missing data • Evaluates anomography methods • ARIMA-based methods and l1 norm minimization show high fidelity and robustness

Comments • Traffic pattern anomaly detection • Only volume and number of flows • An idea about finding deviation • PCA, 3σ deviation from the mean • Mathematics is too hard
