Applying PCA for Traffic Anomaly Detection: Problems and Solutions Daniela Brauckhoff (ETH Zurich, CH), Kave Salamatian (Lancaster University, UK), Martin May (Thomson, CH) IEEE INFOCOM (April 2009)
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
What is PCA? • PCA: Principal Component Analysis • PCA's usage: reduce the dimensionality of the data • e.g., a picture of size 1024 * 768 has a dimensionality of length * width, i.e., 786,432 values • PCA can be used to reduce this dimensionality while keeping most of the variance (see the sketch below)
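As a minimal sketch (not from the original slides), the following NumPy snippet illustrates the dimension-reduction idea on an image-sized array: each 768-dimensional row is compressed to a handful of principal-component coordinates. The data, the names, and the choice of k are purely illustrative.

```python
import numpy as np

# Illustrative data: a 1024 x 768 "image", i.e. 1024 samples of dimension 768.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 768))

# Center the data (PCA requires zero-mean columns).
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first k components: each 768-dimensional row is
# compressed to k coordinates.
k = 20
scores = X_centered @ Vt[:k].T          # shape (1024, k)

# Approximate reconstruction from the k components.
X_approx = scores @ Vt[:k] + X.mean(axis=0)
print(scores.shape, X_approx.shape)
```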
What is PCA? (cont.1) • Illustration of PCA (figure omitted) • Reference: http://blog.finalevil.com/2008/07/pca.html
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Problems and Solutions • Consider the temporal correlation of the data • Extend PCA accordingly • Replace PCA by the Karhunen-Loeve transform
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Two different interpretations • As an efficient representation that transforms the data to a new coordinate system • The projection onto the first coordinate has the greatest variance • As a modeling technique • using a finite number of terms of an orthogonal series expansion of the signal with uncorrelated coefficients
Background • Suppose that we have a column vector of K correlated random variables: $\mathbf{x} = (x_1, \dots, x_K)^T$ • Arranged as a data matrix $X \in \mathbb{R}^{K \times n}$ • Each random variable $x_i$ has its own observation vector of $n$ (possibly dependent) realizations, forming row $i$ of $X$ • Note: the random variables are the metrics collected from the network
Background (cont.1) • In order to find the characteristics of the above data collected from the network • i.e., the most suitable basis $\{\phi_1, \dots, \phi_K\}$, where each $\phi_i$ is an eigenvector of the covariance matrix defined as $\Sigma = E\left[(\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T\right]$, estimated by $\hat{\Sigma} = \frac{1}{n}(X - \bar{\mathbf{x}}\mathbf{1}^T)(X - \bar{\mathbf{x}}\mathbf{1}^T)^T$ • where $\bar{\mathbf{x}}$ is a column vector containing the means of $x_1, \dots, x_K$
Background (cont.2) • The most suitable basis: $\{\phi_1, \dots, \phi_K\}$ • How to find the $\phi_i$ and the corresponding eigenvalues $\lambda_i$? • i.e., solve the linear eigenvalue equation $\hat{\Sigma}\phi_i = \lambda_i \phi_i$ • Method: SVD (Singular Value Decomposition) of $\hat{\Sigma}$ • Note: $W = [\phi_1, \dots, \phi_K]$ is the basis change matrix (see the sketch below)
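A hedged sketch of this step, assuming a K x n data matrix with one metric per row as in the notation above (variable and function names are mine, not the paper's): estimate the covariance matrix, then obtain its eigenvectors and eigenvalues with an SVD.

```python
import numpy as np

def pca_basis(X):
    """Return the basis-change matrix W (columns = eigenvectors of the
    estimated covariance), the eigenvalues, and the mean vector, for a
    K x n data matrix X whose rows are the K metrics."""
    K, n = X.shape
    mu = X.mean(axis=1, keepdims=True)       # column vector of means
    Xc = X - mu                              # zero-mean data
    Sigma_hat = (Xc @ Xc.T) / n              # estimated K x K covariance
    # For a symmetric positive semi-definite matrix, the SVD gives the
    # eigenvectors in U and the eigenvalues as singular values.
    U, eigvals, _ = np.linalg.svd(Sigma_hat)
    return U, eigvals, mu

# Illustrative usage: 28 metrics, 192 observations (two days of 15-min bins).
rng = np.random.default_rng(1)
X = rng.normal(size=(28, 192))
W, eigvals, mu = pca_basis(X)
print(W.shape, eigvals[:5])
```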
Background (cont.3) • But $W$ is a basis change matrix only when $\mathbf{x}$ is zero mean • Meanwhile, $\mathbf{x}$ must be replaced by $\mathbf{x} - \bar{\mathbf{x}}$ • Not taking care of this could lead to large errors when using PCA • Rewrite the initial vector of random variables as $\mathbf{x} - \bar{\mathbf{x}} = \sum_i s_i \phi_i$ with uncorrelated coefficients $s_i$ • This is the essential property, i.e., what makes the (centered) data suitable for a PCA representation (a small illustration follows below)
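A small, purely illustrative example of why the zero-mean requirement matters: without mean removal, the leading direction found by the decomposition is pulled toward the mean vector instead of the direction of greatest variance.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two variables with a large non-zero mean; variance is mostly along axis 0.
X = rng.normal(size=(2, 500)) * np.array([[5.0], [0.5]]) + np.array([[100.0], [100.0]])

# Without mean removal: the decomposition is dominated by the mean vector.
U_raw, _, _ = np.linalg.svd((X @ X.T) / X.shape[1])

# With mean removal: the first eigenvector recovers the true variance direction.
Xc = X - X.mean(axis=1, keepdims=True)
U_ctr, _, _ = np.linalg.svd((Xc @ Xc.T) / X.shape[1])

print("first direction without centering:", U_raw[:, 0])   # close to [0.71, 0.71]
print("first direction with centering:   ", U_ctr[:, 0])   # close to [1, 0]
```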
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Stochastic Process • The extension of PCA to stochastic processes that have temporal as well as spatial correlations • Assume we have a K-vector of zero-mean stationary stochastic processes $\mathbf{x}(t) = (x_1(t), \dots, x_K(t))^T$ • with a covariance function $R(t, s) = E[\mathbf{x}(t)\mathbf{x}(s)^T]$, which by stationarity depends only on $t - s$
Stochastic Process (cont.1) • The multi-dimensional Karhunen-Loeve theorem states that one can rewrite this vector as a series expansion (named the KL expansion): $\mathbf{x}(t) = \sum_{i} z_i \boldsymbol{\phi}_i(t)$ with uncorrelated coefficients $z_i$ • Compare with the PCA representation $\mathbf{x} - \bar{\mathbf{x}} = \sum_i s_i \phi_i$: the basis functions now depend on time
Stochastic Process (cont.2) • How do we get the basis functions $\boldsymbol{\phi}_i(t)$? • Solve the linear integral equations $\int R(t, s)\,\boldsymbol{\phi}_i(s)\,ds = \lambda_i \boldsymbol{\phi}_i(t)$ • Compare with the PCA eigenvalue equation $\hat{\Sigma}\phi_i = \lambda_i \phi_i$ • Then the coefficients can be obtained by $z_i = \int \mathbf{x}(t)^T \boldsymbol{\phi}_i(t)\,dt$
Stochastic Process (cont.3) • The Galerkin method transforms the above integral equations into a matrix problem that can be solved by applying the SVD technique • It is possible to derive the KL expansion using only a finite number of samples • Time-sampled version: $\mathbf{x}[k] = \mathbf{x}(kT)$ for a sampling period $T$ • Finally, we obtain a discrete version of the KL expansion: $\mathbf{x}[k] = \sum_i z_i \boldsymbol{\phi}_i[k]$
Stochastic Process (cont.4) • Construct a KN × (n − N) observation matrix from the time-sampled data (each column stacks N consecutive samples of the K metrics) • with KN eigenvectors of the resulting covariance matrix
Stochastic Process (cont.5) • Use this observation matrix to estimate all the needed spatio-temporal covariances (see the sketch below)
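The following is a hedged sketch (my own variable names, not the authors' code) of the discrete construction described on the last few slides: stack N time-lagged copies of the K zero-mean metrics into a KN x (n - N) observation matrix, estimate the spatio-temporal covariance from it, and obtain its KN eigenvectors via SVD. Setting N = 1 reduces to standard PCA.

```python
import numpy as np

def kl_basis(X, N):
    """Discrete KL (spatio-temporal) basis.
    X : K x n data matrix (K metrics, n time samples), assumed zero mean.
    N : temporal correlation range; N = 1 reduces to standard PCA."""
    K, n = X.shape
    # KN x (n - N) observation matrix: column t stacks x[t], x[t+1], ..., x[t+N-1].
    cols = [np.concatenate([X[:, t + j] for j in range(N)]) for t in range(n - N)]
    Z = np.stack(cols, axis=1)                 # shape (K*N, n - N)
    # Estimated spatio-temporal covariance and its KN eigenvectors via SVD.
    Sigma = (Z @ Z.T) / (n - N)
    U, eigvals, _ = np.linalg.svd(Sigma)
    return U, eigvals

# Illustrative usage: 28 metrics, 192 samples, temporal range N = 3.
rng = np.random.default_rng(3)
X = rng.normal(size=(28, 192))
X = X - X.mean(axis=1, keepdims=True)          # enforce zero mean
U, eigvals = kl_basis(X, N=3)
print(U.shape)                                 # (84, 84): KN eigenvectors of length KN
```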
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Data Set and Metrics • Three weeks of NetFlow data collected on one of the peering links of a medium-sized ISP (SWITCH, AS559) • Recorded in August 2007 • Comprises a variety of traffic anomalies occurring in daily operation, such as network scans, denial-of-service attacks, alpha flows, etc.
Data Set and Metrics (cont.1) • Computing the detection metrics: • Distinguish between incoming and outgoing traffic, as well as UDP and TCP flows • For each of these four categories, compute seven commonly used traffic features: • byte, packet, and flow counts • source and destination IP address entropy • source and destination IP address counts
Data Set and Metrics (cont.2) • All metrics are obtained by aggregating the traffic into 15-minute intervals, resulting in a 28 × 96 matrix per measurement day (see the sketch below) • Anomalies were identified by visual inspection • This resulted in 28 detected anomalous events in UDP traffic and 73 in TCP traffic
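As a rough, assumption-laden sketch (the flow-record fields and function names are hypothetical, not the authors' code), one block of that matrix could be computed by binning the flow records of a single category into 15-minute intervals and evaluating the seven features per bin:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(counter):
    """Empirical entropy (in bits) of a count distribution."""
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values())

def features_per_bin(flows, bin_seconds=900):
    """flows: iterable of (timestamp, n_bytes, n_packets, src_ip, dst_ip)
    records for one traffic category (e.g. incoming UDP).
    Returns {bin_index: [bytes, packets, flows, H(src), H(dst), #src, #dst]}."""
    bins = defaultdict(list)
    for ts, n_bytes, n_pkts, src, dst in flows:
        bins[int(ts // bin_seconds)].append((n_bytes, n_pkts, src, dst))
    feats = {}
    for b, recs in bins.items():
        src_cnt = Counter(r[2] for r in recs)
        dst_cnt = Counter(r[3] for r in recs)
        feats[b] = [
            sum(r[0] for r in recs),   # byte count
            sum(r[1] for r in recs),   # packet count
            len(recs),                 # flow count
            entropy(src_cnt),          # source IP address entropy
            entropy(dst_cnt),          # destination IP address entropy
            len(src_cnt),              # distinct source IP count
            len(dst_cnt),              # distinct destination IP count
        ]
    return feats

# Toy usage: three flows falling into two 15-minute bins.
flows = [
    (10,   500, 4, "10.0.0.1", "192.0.2.1"),
    (200, 1500, 9, "10.0.0.2", "192.0.2.1"),
    (950,  300, 2, "10.0.0.1", "192.0.2.5"),
]
print(features_per_bin(flows))
```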
Data Set and Metrics (cont.3) • Use the metrics from the first two days for building the model • Derive a spatio-temporal correlation matrix with the temporal correlation range set to N = 1, ..., 5 • Note that setting N = 1 gives the standard PCA approach • Apply the SVD to the data, resulting in a basis change matrix
ROC curves • The Receiver Operating Characteristic (ROC) curve combines the two parameters and captures the essential trade-off between them • false positive rate and true positive rate (a sketch of the computation follows below)
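A hedged sketch of how such a curve can be traced, assuming an anomaly score per time bin and ground-truth labels from the visual inspection described earlier (function and variable names are mine, not the authors'): sweep a detection threshold over the scores and record the true-positive and false-positive rates at each setting.

```python
import numpy as np

def roc_curve(scores, labels):
    """scores: anomaly score per time bin (higher = more anomalous).
    labels: 1 for bins containing a labeled anomaly, 0 otherwise.
    Returns arrays of (false positive rate, true positive rate) pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(np.unique(scores))[::-1]   # sweep from strict to lax
    fpr, tpr = [], []
    for th in thresholds:
        detected = scores >= th
        tp = np.sum(detected & (labels == 1))
        fp = np.sum(detected & (labels == 0))
        tpr.append(tp / max(labels.sum(), 1))
        fpr.append(fp / max((labels == 0).sum(), 1))
    return np.array(fpr), np.array(tpr)

# Toy usage with random scores and labels for 96 bins (one day).
rng = np.random.default_rng(4)
scores = rng.random(96)
labels = (rng.random(96) < 0.1).astype(int)
fpr, tpr = roc_curve(scores, labels)
print(fpr[:5], tpr[:5])
```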
ROC curves (cont.3) • The comparison of ROC curves shows a considerable improvement in anomaly detection performance when using the KL expansion with N = 2, 3, consistently for both UDP and TCP traffic, and a decrease in performance for N ≥ 4
Effect of non-stationarity • Stationarity issue: • For N ≥ 4 the performance decreases • As N increases, the model contains more parameters and becomes more sensitive to non-stationarity of the traffic metrics
Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion
Conclusion • Direct application of the PCA method results in poor performance in terms of ROC curves • The correct framework is not classical PCA but rather the Karhunen-Loeve expansion • A Galerkin method is provided for developing a predictive model; an important improvement is attained when temporal correlation is considered
Q & A Thank you!