
Applying PCA for Traffic Anomaly Detection: Problems and Solutions

Applying PCA for Traffic Anomaly Detection: Problems and Solutions. Daniela Brauckhoff (ETH Zurich, CH), Kave Salamatian (Lancaster University, FR), Martin May (Thomson, CH). IEEE INFOCOM (April, 2009).


Presentation Transcript


  1. Applying PCA for Traffic Anomaly Detection: Problems and Solutions • Daniela Brauckhoff (ETH Zurich, CH) • Kave Salamatian (Lancaster University, FR) • Martin May (Thomson, CH) • IEEE INFOCOM (April, 2009)

  2. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  3. What is PCA? • PCA • Principal Component Analysis • PCA's usage • reduce the feature dimension of the data • e.g., a picture of size 1024 × 768 • its feature dimension is its length × width • i.e., 786432 feature values • use PCA to reduce this feature dimension
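The dimension-reduction idea on this slide can be sketched with NumPy. This is a minimal illustration on hypothetical synthetic data (50 features driven by 3 latent factors), not the picture example from the slide; the function name `pca_reduce` and all sizes are assumptions made for the sketch.

```python
import numpy as np

def pca_reduce(samples, n_components):
    """Project samples (rows = observations) onto the top principal components."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components, mean

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 50 features driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

reduced, components, mean = pca_reduce(data, n_components=3)
# Map the 3 coefficients per observation back to the original 50 features.
reconstructed = reduced @ components + mean
relative_error = np.linalg.norm(data - reconstructed) / np.linalg.norm(data)
print(reduced.shape, relative_error)
```

Each observation is stored with 3 numbers instead of 50, and the reconstruction error stays small because the synthetic data really has only three dominant directions.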

  4. What is PCA? (cont.1) • Reference: http://blog.finalevil.com/2008/07/pca.html

  5. What is PCA? (cont.2)

  6. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  7. Problems and Solutions • Consider the temporal correlation of the data • Extend PCA accordingly • Replace it with the Karhunen-Loeve Transform

  8. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  9. Two different interpretations • As an efficient representation that transforms the data to a new coordinate system • the projection on the first coordinate contains the greatest variance • As a modeling technique • using a finite number of terms of an orthogonal series expansion of the signal with uncorrelated coefficients

  10. Background • Suppose that we have a column vector of correlated random variables: • Matrix => • Each random variable has its own observation vector, built from N dependent realization vectors: • Note: • The random variables are the data collected from the network

  11. Background (cont.1) • In order to find the characteristics of the above data collected from the network • i.e., the most suitable basis, • where each basis vector is an eigenvector of the covariance matrix, which is estimated from the observations • the estimate uses a column vector containing the means of the random variables

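  12. Background (cont.2) • The most suitable basis: • How to find the eigenvectors and eigenvalues, respectively? • i.e., solve the following linear equation (the eigenvalue equation of the covariance matrix): • Method: SVD (Singular Value Decomposition) • Note: the eigenvectors form the basis change matrix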

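  13. Background (cont.3) • But the matrix of eigenvectors is a basis change matrix only when the vector of random variables is zero mean • Otherwise, the vector must be replaced by its mean-removed version before the basis change • not taking care of this could lead to large errors when using PCA • Rewrite the initial vector of random variables as its mean plus an expansion on the new basis • Uncorrelated coefficients are the essential property! • i.e., what makes the data suitable for PCA representation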
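The role of mean removal can be checked numerically. The sketch below uses assumed notation that is not taken from the slides (`X` for the observations, `mu` for the mean vector, `U` for the basis change matrix, `Y` for the coefficients) and hypothetical synthetic data with a large non-zero mean. It shows that the coefficients are uncorrelated after centering, and that skipping the centering makes the leading direction follow the mean rather than the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical observations: K = 3 correlated variables (rows), large non-zero mean.
K, n_obs = 3, 5000
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.2]])
X = mixing @ rng.normal(size=(K, n_obs)) + np.array([[100.0], [50.0], [10.0]])

# Estimate the mean vector and the covariance matrix from the observations.
mu = X.mean(axis=1, keepdims=True)
cov = (X - mu) @ (X - mu).T / (n_obs - 1)

# Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
U = eigvecs.T  # basis change matrix: rows are eigenvectors

# Basis change applied to the mean-removed data gives uncorrelated coefficients.
Y = U @ (X - mu)
cov_Y = np.cov(Y)
off_diagonal = cov_Y - np.diag(np.diag(cov_Y))

# The original vector is recovered as the mean plus an expansion on the basis.
X_rebuilt = mu + U.T @ Y

# Without centering, the leading direction mostly points along the mean vector.
second_moment = X @ X.T / n_obs
_, raw_vecs = np.linalg.eigh(second_moment)
mean_direction = (mu / np.linalg.norm(mu)).ravel()
alignment = abs(raw_vecs[:, -1] @ mean_direction)
print(np.abs(off_diagonal).max(), alignment)
```

An `alignment` close to 1 means the uncentered decomposition spends its first component on describing the mean, which is the kind of error the slide warns about.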

  14. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  15. Stochastic Process • Extend PCA to stochastic processes that have temporal as well as spatial correlations • Assume we have a K-vector of zero-mean stationary stochastic processes • with a covariance function

  16. Stochastic Process (cont.1) • The multi-dimensional Karhunen-Loeve theorem states that one can rewrite this vector as a series expansion (named the KL expansion): • Compare with the PCA case:

  17. Stochastic Process (cont.2) • How to get the basis functions? • Solve the linear integral equations: • Compare with the PCA case: • Then the coefficients can be obtained by

  18. Stochastic Process (cont.3) • But the Galerkin method transforms the above integral equations into a matrix problem that can be solved by applying the SVD technique • It is possible to derive the KL expansion using only a finite number of samples • Time-sampled version => • Finally, we obtain a discrete version of the KL expansion as:

  19. Stochastic Process (cont.4) • Construct a KN × (n − N) observation matrix • With KN eigenvectors

  20. Stochastic Process (cont.5) • Use the observation matrix to estimate all the needed spatio-temporal covariances
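One way to build such an observation matrix is to stack N consecutive samples of the K metrics into each column. The sketch below is an assumption about the construction, not the authors' code: the function names are hypothetical, and keeping every full window yields n − N + 1 columns, one more than the n − N stated on the slide, which presumably reflects a different boundary convention.

```python
import numpy as np

def lagged_observation_matrix(metrics, N):
    """Stack N consecutive samples of a (K, n) metric matrix into columns of height K*N."""
    K, n = metrics.shape
    if not 1 <= N <= n:
        raise ValueError("N must be between 1 and the number of samples")
    n_windows = n - N + 1  # every full window of N consecutive samples
    # Block `lag` holds the samples at offset `lag` inside each window.
    return np.vstack([metrics[:, lag:lag + n_windows] for lag in range(N)])

def spatio_temporal_covariance(metrics, N):
    """Sample covariance of the stacked columns, shape (K*N, K*N)."""
    Z = lagged_observation_matrix(metrics, N)
    centered = Z - Z.mean(axis=1, keepdims=True)
    return centered @ centered.T / (Z.shape[1] - 1)

rng = np.random.default_rng(2)
metrics = rng.normal(size=(3, 20))  # hypothetical: K = 3 metrics, n = 20 samples
Z = lagged_observation_matrix(metrics, N=2)
C = spatio_temporal_covariance(metrics, N=2)
print(Z.shape, C.shape)  # (6, 19) (6, 6)
```

With N = 1 the stacked matrix is the metric matrix itself, so the result reduces to the ordinary covariance matrix used by standard PCA.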

  21. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  22. Data Set and Metrics • Three weeks of NetFlow data • collected on one of the peering links of a medium-sized ISP (SWITCH, AS559) • recorded in August 2007 • comprising a variety of traffic anomalies • that happen in daily operation, such as network scans, denial of service attacks, alpha flows, etc.

  23. Data Set and Metrics (cont.1) • Computing the detection metrics: • distinguish between incoming and outgoing traffic, as well as UDP and TCP flows • for each of these four categories, compute seven commonly used traffic features: • byte counts • packet counts • flow counts • source and destination IP address entropy • source and destination IP address counts
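The seven features can be computed from the flow records of one category and one time interval roughly as follows. The record layout `(src_ip, dst_ip, n_bytes, n_packets)` and the function names are hypothetical, and weighting the entropy by the number of flows per address is an assumption; the slides do not say how the address distribution is formed.

```python
import math
from collections import Counter

def address_entropy(addresses):
    """Shannon entropy (in bits) of the empirical address distribution."""
    counts = Counter(addresses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def traffic_features(flows):
    """flows: iterable of (src_ip, dst_ip, n_bytes, n_packets) tuples (hypothetical layout)."""
    flows = list(flows)
    sources = [f[0] for f in flows]
    destinations = [f[1] for f in flows]
    return {
        "bytes": sum(f[2] for f in flows),
        "packets": sum(f[3] for f in flows),
        "flows": len(flows),
        "src_entropy": address_entropy(sources),
        "dst_entropy": address_entropy(destinations),
        "src_count": len(set(sources)),
        "dst_count": len(set(destinations)),
    }

# Hypothetical flows: two sources sending to a single destination.
example_flows = [
    ("10.0.0.1", "192.0.2.1", 1500, 3),
    ("10.0.0.1", "192.0.2.1", 400, 1),
    ("10.0.0.2", "192.0.2.1", 9000, 12),
    ("10.0.0.2", "192.0.2.1", 60, 1),
]
features = traffic_features(example_flows)
print(features)
```

In this example the source entropy is 1 bit (two equally frequent sources) and the destination entropy is 0 (a single destination), which is the kind of concentration that scans and denial of service attacks tend to shift.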

  24. Data Set and Metrics (cont.2) • All metrics are obtained by aggregating the traffic in 15-minute intervals, resulting in a 28 × 96 matrix per measurement day (4 categories × 7 features, 96 intervals per day) • Anomalies were identified by visual inspection • This resulted in 28 detected anomalous events in UDP and 73 in TCP traffic
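The 28 × 96 shape follows from 4 categories × 7 features and 24 h / 15 min intervals. A small sketch of the bookkeeping, where the row ordering, the feature names, and the helper functions are assumptions made for illustration:

```python
import numpy as np

BIN_SECONDS = 15 * 60
BINS_PER_DAY = 24 * 60 * 60 // BIN_SECONDS  # 96 intervals per day
CATEGORIES = [(d, p) for d in ("in", "out") for p in ("UDP", "TCP")]
FEATURES = ["bytes", "packets", "flows",
            "src_entropy", "dst_entropy", "src_count", "dst_count"]

def row_index(direction, protocol, feature):
    """Row of the day matrix for one (category, feature) pair; ordering is an assumption."""
    return CATEGORIES.index((direction, protocol)) * len(FEATURES) + FEATURES.index(feature)

def bin_index(seconds_since_midnight):
    """Index of the 15-minute interval that contains the given time of day."""
    return int(seconds_since_midnight // BIN_SECONDS)

day_matrix = np.zeros((len(CATEGORIES) * len(FEATURES), BINS_PER_DAY))
# Hypothetical update: 1500 bytes of incoming TCP traffic seen at 00:20:00.
day_matrix[row_index("in", "TCP", "bytes"), bin_index(20 * 60)] += 1500
print(day_matrix.shape)  # (28, 96)
```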

  25. Data Set and Metrics (cont.3) • Use the metrics of the first two days for building the model • Derive a spatio-temporal correlation matrix with the temporal correlation range set to N = 1, ..., 5 • Note that setting N = 1 gives the standard PCA approach • Apply SVD to the data, resulting in a basis change matrix
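A toy version of this procedure is sketched below. The slides do not state how a time interval is scored, so the residual energy outside the leading components (the usual PCA subspace approach) is used here as an assumption. The data are hypothetical synthetic sinusoids with a daily period, and all names are illustrative.

```python
import numpy as np

def stack_lags(metrics, N):
    """Stack N consecutive samples of a (K, n) matrix into columns of height K*N."""
    K, n = metrics.shape
    n_windows = n - N + 1
    return np.vstack([metrics[:, lag:lag + n_windows] for lag in range(N)])

def fit_model(train, N, n_components):
    """Return the mean and the leading basis vectors of the stacked training data."""
    Z = stack_lags(train, N)
    mean = Z.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are eigenvectors of its covariance.
    left, _, _ = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, left[:, :n_components]

def residual_scores(metrics, N, mean, basis):
    """Energy of each stacked column that the leading basis vectors do not explain."""
    Z = stack_lags(metrics, N) - mean
    residual = Z - basis @ (basis.T @ Z)
    return np.sum(residual ** 2, axis=0)

rng = np.random.default_rng(3)
amplitudes = np.array([[1.0], [2.0], [0.5], [1.5]])
phases = np.array([[0.0], [0.7], [1.9], [2.8]])

def synthetic_days(n_days):
    """Hypothetical traffic: 4 metrics with a daily cycle of 96 intervals plus noise."""
    t = np.arange(96 * n_days)
    daily = amplitudes * np.sin(2 * np.pi * t / 96 + phases)
    return daily + 0.05 * rng.normal(size=daily.shape)

train = synthetic_days(2)   # first two days build the model
test = synthetic_days(1)
test[2, 40] += 5.0          # injected anomaly in metric 2 at interval 40

N = 2
mean, basis = fit_model(train, N, n_components=2)
scores = residual_scores(test, N, mean, basis)
print(int(np.argmax(scores)))
```

With N = 2 the injected spike appears in the two windows that contain interval 40, so the highest score falls on one of them. Setting N = 1 in the same code gives the standard PCA variant.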

  26. ROC curves • The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate as the detection threshold varies • combining the two parameters in one curve captures the essential trade-off between them

  27. ROC curves (cont.1) • The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate as the detection threshold varies • combining the two parameters in one curve captures the essential trade-off between them
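A ROC curve can be traced by sweeping a threshold over the anomaly scores and counting true and false positives at each step. The following sketch uses hypothetical scores and labels; the function names are illustrative, and the area under the curve is added only as a convenient one-number summary.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for decreasing thresholds."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = [(0.0, 0.0)]
    for threshold in np.sort(np.unique(scores))[::-1]:
        flagged = scores >= threshold
        tpr = np.sum(flagged & labels) / labels.sum()
        fpr = np.sum(flagged & ~labels) / (~labels).sum()
        points.append((float(fpr), float(tpr)))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical detector output: higher score means more anomalous; 1 marks a true anomaly.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points[-1], auc)
```

A detector whose curve lies closer to the top-left corner, and therefore has a larger area, finds more anomalies at the same false positive rate.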

  28. ROC curves (cont.2)

  29. ROC curves (cont.3) • The comparison of ROC curves shows a considerable improvement of the anomaly detection performance with the use of the KL expansion with N = 2, 3, consistently for UDP and TCP traffic, and thereafter a decrease for N ≥ 4

  30. Effect of non-stationarity • Stationarity issue: • for N ≥ 4 the performance decreases • when N increases, the model contains more parameters and becomes more sensitive to the stationarity of the traffic metrics
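The growth in model size can be made concrete with a little arithmetic. The sketch assumes that all K = 28 metrics enter one model and that the two training days provide n = 2 × 96 samples; the slides do not say whether UDP and TCP metrics are modeled separately, so the absolute numbers are illustrative only.

```python
K = 28                   # metrics per interval (assumed to enter a single model)
TRAIN_SAMPLES = 2 * 96   # two training days of 15-minute intervals

model_size = {}
for N in range(1, 6):
    dimension = K * N                            # height of the observation matrix
    parameters = dimension * (dimension + 1) // 2  # distinct covariance entries
    columns = TRAIN_SAMPLES - N                  # n - N columns, as on slide 19
    model_size[N] = (dimension, parameters, columns)
    print(N, dimension, parameters, columns)
```

The number of covariance entries to estimate grows roughly with N squared while the number of training columns stays almost constant, so each entry is estimated from relatively less data as N increases.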

  31. Agenda • Before Introduction • Objective • A Signal Processing View on PCA • Extension of PCA to Stochastic Processes • Validation • Conclusion

  32. Conclusion • Direct application of the PCA method results in poor performance in terms of ROC curves • The correct framework is not classical PCA but rather the Karhunen-Loeve expansion • A Galerkin method is provided for developing a predictive model • An important improvement is attained when temporal correlation is considered

  33. Q & A Thank you!
