Applying PCA for Traffic Anomaly Detection: Problems and Solutions Daniela Brauckhoff (ETH Zurich, CH), Kave Salamatian (Lancaster University, UK), Martin May (Thomson, CH) IEEE INFOCOM (April 2009)
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
What is PCA? • PCA: Principal Component Analysis • PCA's usage: reduce the dimensionality of the data • e.g., a picture of size 1024 * 768 has a dimensionality of length * width, i.e., 786,432 values • PCA can be used to reduce this dimensionality while keeping most of the variance (see the sketch below)
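As a minimal sketch (not from the original slides), the following NumPy snippet illustrates the dimension-reduction idea on an image-sized array: each 768-dimensional row is compressed to a handful of principal-component coordinates. The data, the names, and the choice of k are purely illustrative.

```python
import numpy as np

# Illustrative data: a 1024 x 768 "image", i.e. 1024 samples of dimension 768.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 768))

# Center the data (PCA requires zero-mean columns).
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first k components: each 768-dimensional row is
# compressed to k coordinates.
k = 20
scores = X_centered @ Vt[:k].T          # shape (1024, k)

# Approximate reconstruction from the k components.
X_approx = scores @ Vt[:k] + X.mean(axis=0)
print(scores.shape, X_approx.shape)
```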
What is PCA? (cont.1) • Illustration of PCA (figure omitted) • Reference: http://blog.finalevil.com/2008/07/pca.html
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Problems and Solutions • Consider the temporal correlation of the data • Extend PCA accordingly • Replace PCA by the Karhunen-Loeve transform
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Two different interpretations • As an efficient representation that transforms the data to a new coordinate system • The projection onto the first coordinate has the greatest variance • As a modeling technique • using a finite number of terms of an orthogonal series expansion of the signal with uncorrelated coefficients
Background • Suppose that we have a column vector of K correlated random variables: $\mathbf{x} = (x_1, \dots, x_K)^T$ • Arranged as a data matrix $X \in \mathbb{R}^{K \times n}$ • Each random variable $x_i$ has its own observation vector of $n$ (possibly dependent) realizations, forming row $i$ of $X$ • Note: the random variables are the metrics collected from the network
Background (cont.1) • In order to find the characteristics of the above data collected from the network • i.e., the most suitable basis $\{\phi_1, \dots, \phi_K\}$, where each $\phi_i$ is an eigenvector of the covariance matrix defined as $\Sigma = E\left[(\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T\right]$, estimated by $\hat{\Sigma} = \frac{1}{n}(X - \bar{\mathbf{x}}\mathbf{1}^T)(X - \bar{\mathbf{x}}\mathbf{1}^T)^T$ • where $\bar{\mathbf{x}}$ is a column vector containing the means of $x_1, \dots, x_K$
Background (cont.2) • The most suitable basis: $\{\phi_1, \dots, \phi_K\}$ • How to find the $\phi_i$ and the corresponding eigenvalues $\lambda_i$? • i.e., solve the linear eigenvalue equation $\hat{\Sigma}\phi_i = \lambda_i \phi_i$ • Method: SVD (Singular Value Decomposition) of $\hat{\Sigma}$ • Note: $W = [\phi_1, \dots, \phi_K]$ is the basis change matrix (see the sketch below)
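A hedged sketch of this step, assuming a K x n data matrix with one metric per row as in the notation above (variable and function names are mine, not the paper's): estimate the covariance matrix, then obtain its eigenvectors and eigenvalues with an SVD.

```python
import numpy as np

def pca_basis(X):
    """Return the basis-change matrix W (columns = eigenvectors of the
    estimated covariance), the eigenvalues, and the mean vector, for a
    K x n data matrix X whose rows are the K metrics."""
    K, n = X.shape
    mu = X.mean(axis=1, keepdims=True)       # column vector of means
    Xc = X - mu                              # zero-mean data
    Sigma_hat = (Xc @ Xc.T) / n              # estimated K x K covariance
    # For a symmetric positive semi-definite matrix, the SVD gives the
    # eigenvectors in U and the eigenvalues as singular values.
    U, eigvals, _ = np.linalg.svd(Sigma_hat)
    return U, eigvals, mu

# Illustrative usage: 28 metrics, 192 observations (two days of 15-min bins).
rng = np.random.default_rng(1)
X = rng.normal(size=(28, 192))
W, eigvals, mu = pca_basis(X)
print(W.shape, eigvals[:5])
```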
Background (cont.3) • But $W$ is a basis change matrix only when $\mathbf{x}$ is zero mean • Meanwhile, $\mathbf{x}$ must be replaced by $\mathbf{x} - \bar{\mathbf{x}}$ • Not taking care of this could lead to large errors when using PCA • Rewrite the initial vector of random variables as $\mathbf{x} - \bar{\mathbf{x}} = \sum_i s_i \phi_i$ with uncorrelated coefficients $s_i$ • This is the essential property, i.e., what makes the (centered) data suitable for a PCA representation (a small illustration follows below)
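A small, purely illustrative example of why the zero-mean requirement matters: without mean removal, the leading direction found by the decomposition is pulled toward the mean vector instead of the direction of greatest variance.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two variables with a large non-zero mean; variance is mostly along axis 0.
X = rng.normal(size=(2, 500)) * np.array([[5.0], [0.5]]) + np.array([[100.0], [100.0]])

# Without mean removal: the decomposition is dominated by the mean vector.
U_raw, _, _ = np.linalg.svd((X @ X.T) / X.shape[1])

# With mean removal: the first eigenvector recovers the true variance direction.
Xc = X - X.mean(axis=1, keepdims=True)
U_ctr, _, _ = np.linalg.svd((Xc @ Xc.T) / X.shape[1])

print("first direction without centering:", U_raw[:, 0])   # close to [0.71, 0.71]
print("first direction with centering:   ", U_ctr[:, 0])   # close to [1, 0]
```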
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Stochastic Process • The extension of PCA to stochastic processes that have temporal as well as spatial correlations • Assume we have a K-vector of zero-mean stationary stochastic processes $\mathbf{x}(t) = (x_1(t), \dots, x_K(t))^T$ • with a covariance function $R(t, s) = E[\mathbf{x}(t)\mathbf{x}(s)^T]$, which by stationarity depends only on $t - s$
Stochastic Process (cont.1) • The multi-dimensional Karhunen-Loeve theorem states that one can rewrite this vector as a series expansion (named the KL expansion): $\mathbf{x}(t) = \sum_{i} z_i \boldsymbol{\phi}_i(t)$ with uncorrelated coefficients $z_i$ • Compare with the PCA representation $\mathbf{x} - \bar{\mathbf{x}} = \sum_i s_i \phi_i$: the basis functions now depend on time
Stochastic Process (cont.2) • How do we get the basis functions $\boldsymbol{\phi}_i(t)$? • Solve the linear integral equations $\int R(t, s)\,\boldsymbol{\phi}_i(s)\,ds = \lambda_i \boldsymbol{\phi}_i(t)$ • Compare with the PCA eigenvalue equation $\hat{\Sigma}\phi_i = \lambda_i \phi_i$ • Then the coefficients can be obtained by $z_i = \int \mathbf{x}(t)^T \boldsymbol{\phi}_i(t)\,dt$
Stochastic Process (cont.3) • The Galerkin method transforms the above integral equations into a matrix problem that can be solved by applying the SVD technique • It is possible to derive the KL expansion using only a finite number of samples • Time-sampled version: $\mathbf{x}[k] = \mathbf{x}(kT)$ for a sampling period $T$ • Finally, we obtain a discrete version of the KL expansion: $\mathbf{x}[k] = \sum_i z_i \boldsymbol{\phi}_i[k]$
Stochastic Process (cont.4) • Construct a KN × (n − N) observation matrix from the time-sampled data (each column stacks N consecutive samples of the K metrics) • with KN eigenvectors of the resulting covariance matrix
Stochastic Process (cont.5) • Use this observation matrix to estimate all the needed spatio-temporal covariances (see the sketch below)
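The following is a hedged sketch (my own variable names, not the authors' code) of the discrete construction described on the last few slides: stack N time-lagged copies of the K zero-mean metrics into a KN x (n - N) observation matrix, estimate the spatio-temporal covariance from it, and obtain its KN eigenvectors via SVD. Setting N = 1 reduces to standard PCA.

```python
import numpy as np

def kl_basis(X, N):
    """Discrete KL (spatio-temporal) basis.
    X : K x n data matrix (K metrics, n time samples), assumed zero mean.
    N : temporal correlation range; N = 1 reduces to standard PCA."""
    K, n = X.shape
    # KN x (n - N) observation matrix: column t stacks x[t], x[t+1], ..., x[t+N-1].
    cols = [np.concatenate([X[:, t + j] for j in range(N)]) for t in range(n - N)]
    Z = np.stack(cols, axis=1)                 # shape (K*N, n - N)
    # Estimated spatio-temporal covariance and its KN eigenvectors via SVD.
    Sigma = (Z @ Z.T) / (n - N)
    U, eigvals, _ = np.linalg.svd(Sigma)
    return U, eigvals

# Illustrative usage: 28 metrics, 192 samples, temporal range N = 3.
rng = np.random.default_rng(3)
X = rng.normal(size=(28, 192))
X = X - X.mean(axis=1, keepdims=True)          # enforce zero mean
U, eigvals = kl_basis(X, N=3)
print(U.shape)                                 # (84, 84): KN eigenvectors of length KN
```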
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Data Set and Metrics • Three weeks of NetFlow data collected on one of the peering links of a medium-sized ISP (SWITCH, AS559) • Recorded in August 2007 • Comprises a variety of traffic anomalies occurring in daily operation, such as network scans, denial-of-service attacks, alpha flows, etc.
Data Set and Metrics (cont.1) • Computing the detection metrics: • Distinguish between incoming and outgoing traffic, as well as UDP and TCP flows • For each of these four categories, compute seven commonly used traffic features: • byte, packet, and flow counts • source and destination IP address entropy • source and destination IP address counts
Data Set and Metrics (cont.2) • All metrics are obtained by aggregating the traffic into 15-minute intervals, resulting in a 28 × 96 matrix per measurement day (see the sketch below) • Anomalies were identified by visual inspection • This resulted in 28 detected anomalous events in UDP traffic and 73 in TCP traffic
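As a rough, assumption-laden sketch (the flow-record fields and function names are hypothetical, not the authors' code), one block of that matrix could be computed by binning the flow records of a single category into 15-minute intervals and evaluating the seven features per bin:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(counter):
    """Empirical entropy (in bits) of a count distribution."""
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values())

def features_per_bin(flows, bin_seconds=900):
    """flows: iterable of (timestamp, n_bytes, n_packets, src_ip, dst_ip)
    records for one traffic category (e.g. incoming UDP).
    Returns {bin_index: [bytes, packets, flows, H(src), H(dst), #src, #dst]}."""
    bins = defaultdict(list)
    for ts, n_bytes, n_pkts, src, dst in flows:
        bins[int(ts // bin_seconds)].append((n_bytes, n_pkts, src, dst))
    feats = {}
    for b, recs in bins.items():
        src_cnt = Counter(r[2] for r in recs)
        dst_cnt = Counter(r[3] for r in recs)
        feats[b] = [
            sum(r[0] for r in recs),   # byte count
            sum(r[1] for r in recs),   # packet count
            len(recs),                 # flow count
            entropy(src_cnt),          # source IP address entropy
            entropy(dst_cnt),          # destination IP address entropy
            len(src_cnt),              # distinct source IP count
            len(dst_cnt),              # distinct destination IP count
        ]
    return feats

# Toy usage: three flows falling into two 15-minute bins.
flows = [
    (10,   500, 4, "10.0.0.1", "192.0.2.1"),
    (200, 1500, 9, "10.0.0.2", "192.0.2.1"),
    (950,  300, 2, "10.0.0.1", "192.0.2.5"),
]
print(features_per_bin(flows))
```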
Data Set and Metrics (cont.3) • Use the metrics from the first two days for building the model • Derive a spatio-temporal correlation matrix with the temporal correlation range set to N = 1, ..., 5 • Note that setting N = 1 gives the standard PCA approach • Apply the SVD to the data, resulting in a basis change matrix
ROC curves • The Receiver Operating Characteristic (ROC) curve combines the two parameters and captures the essential trade-off between them • false positive rate and true positive rate (a sketch of the computation follows below)
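A hedged sketch of how such a curve can be traced, assuming an anomaly score per time bin and ground-truth labels from the visual inspection described earlier (function and variable names are mine, not the authors'): sweep a detection threshold over the scores and record the true-positive and false-positive rates at each setting.

```python
import numpy as np

def roc_curve(scores, labels):
    """scores: anomaly score per time bin (higher = more anomalous).
    labels: 1 for bins containing a labeled anomaly, 0 otherwise.
    Returns arrays of (false positive rate, true positive rate) pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(np.unique(scores))[::-1]   # sweep from strict to lax
    fpr, tpr = [], []
    for th in thresholds:
        detected = scores >= th
        tp = np.sum(detected & (labels == 1))
        fp = np.sum(detected & (labels == 0))
        tpr.append(tp / max(labels.sum(), 1))
        fpr.append(fp / max((labels == 0).sum(), 1))
    return np.array(fpr), np.array(tpr)

# Toy usage with random scores and labels for 96 bins (one day).
rng = np.random.default_rng(4)
scores = rng.random(96)
labels = (rng.random(96) < 0.1).astype(int)
fpr, tpr = roc_curve(scores, labels)
print(fpr[:5], tpr[:5])
```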
ROC curves (cont.3) • The comparison of ROC curves shows a considerable improvement in anomaly detection performance when using the KL expansion with N = 2, 3, consistently for both UDP and TCP traffic, and a decrease in performance for N ≥ 4
Effect of non-stationarity • Stationarity issue: • For N ≥ 4 the performance decreases • As N increases, the model contains more parameters and becomes more sensitive to non-stationarity of the traffic metrics
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Conclusion • Direct application of the PCA method results in poor performance in terms of ROC curves • The correct framework is not classical PCA but rather the Karhunen-Loeve expansion • A Galerkin method is provided for developing a predictive model; an important improvement is attained when temporal correlation is considered
Q & A Thank you!