
Anomaly Detection Through a Bayesian SVM



Presentation Transcript


1. Anomaly Detection Through a Bayesian SVM
Vasilis A. Sotiris
AMSC 664 Final Presentation, May 6th 2008
Advisor: Dr. Michael Pecht
University of Maryland, College Park, MD 20783

2. Objectives
• Develop an algorithm to detect anomalies in electronic systems (large multivariate datasets)
• Perform detection in the absence of negative-class data (one-class classification)
• Predict future system performance
• Develop an application toolbox, CALCEsvm, to implement a proof of concept on simulated and real data:
  • Simulated degradation
  • Lockheed Martin dataset

3. Motivation
• With the increasing functional complexity of on-board autonomous systems, there is growing demand for system-level:
  • health assessment,
  • fault diagnostics,
  • failure prognostics.
• This is especially important for analyzing intermittent failures, some of the most common failure modes in today's electronics.
• There is a need for efficient and reliable prognostics for electronic systems, using algorithms that can:
  • fuse sensor data,
  • discriminate false alarms from actual failures,
  • correlate faults with relevant system events,
  • and reduce redundant processing elements, which are subject to common-mode failures.

4. Algorithm Objectives
• Develop a machine learning approach to:
  • detect anomalies in large multivariate systems,
  • detect anomalies in the absence of reliable failure data.
• Mitigate false alarms and intermittent faults and failures.
• Predict future system performance.
[Figure: distribution of training data in the (x1, x2) plane, with a hypothesized fault space for the distribution of fault/failure data]

5. Data Setup
• Data is collected at times Ti from a multivariate distribution of random variables x1i,…,xmi.
• The x's are the system covariates.
• The Xi are independent random vectors.
• Class label y ∈ {−1, +1}.
• The class probability p(class|X) is estimated given X.

6. Data Decomposition (Models)
• Extract features from the data by constructing lower-dimensional models.
• X: training data, X ∈ Rn×m.
• Singular value decomposition (SVD).
• With the projection matrix H, project the data onto the model [M] and residual [R] spaces.
• k: number of principal components (k = 2).
• xM: the projection of x onto the model space [M].
• xR: the projection of x onto the residual space [R].
[Figure: a sample observation x decomposed into xM in the model space [M] and xR in the residual space [R]]
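The projection step can be sketched in a few lines of Python. This is an illustrative sketch, not the CALCEsvm implementation; the column-centering choice and k = 2 are assumptions:

```python
import numpy as np

def model_projector(X, k=2):
    """Build the projector H onto the k-dimensional model space [M] from training data X (n x m)."""
    Xc = X - X.mean(axis=0)                  # center the training data (assumed preprocessing)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                            # loadings of the first k principal components
    return Vk @ Vk.T                         # H projects onto [M]

X = np.random.randn(100, 5)                  # placeholder training data
H = model_projector(X, k=2)
x = np.random.randn(5)                       # a new observation
xM = H @ x                                   # projection onto the model space [M]
xR = x - xM                                  # residual part, since (I - H) projects onto [R]
```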

7. Two-Class Support Vector Machines
[Figure: mapping Φ from the input space to the feature space, and the resulting nonlinear decision boundary D(x) separating the −1 and +1 regions back in the input space]
• Given: nonlinearly separable labeled data xi with labels yi ∈ {+1, −1}.
• Solve a linear optimization problem to find w and b in the feature space.
• Form a nonlinear decision function by mapping back to the input space.
• The result is a decision boundary on the given training set that can be used to classify new observations.

8. Two-Class Support Vector Machines
• We seek the function that best separates the two classes of data.
• The margin M = 2/||w|| is maximized by minimizing ||w||.
• The learning problem is stated as: minimize (1/2)||w||²
• subject to: yi(wTxi + b) ≥ 1 for all i.
• The classifier function D(x) is constructed from the resulting w and b, where b sets the offset of D(x) = 0 from the origin.
[Figure: separating hyperplane D(x) = 0 with margin M between the positive and negative classes]

9. Two-Class Support Vector Machines
• Lagrangian function: LP = (1/2)||w||² − Σi αi[yi(wTxi + b) − 1], to be minimized.
• Instead of minimizing LP w.r.t. w and b, minimize LD w.r.t. α:
  LD(α) = (1/2)αTHα − pTα
• where H is the Hessian matrix with Hij = yi yj xiT xj, α = [α1,…,αn], and p is a unit vector.
• The KKT conditions connect the primal and dual problems.
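As a sketch of how this dual problem can be solved numerically, the following uses a general-purpose solver rather than the presentation's own solver; the soft-margin bound C is an added assumption:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y, C=1.0):
    """Minimize L_D(a) = 0.5 a^T H a - p^T a subject to a^T y = 0 and 0 <= a_i <= C."""
    n = len(y)
    H = np.outer(y, y) * (X @ X.T)               # Hessian: H_ij = y_i y_j x_i^T x_j
    p = np.ones(n)                               # p is a unit vector (all ones)
    fun = lambda a: 0.5 * a @ H @ a - p @ a
    jac = lambda a: H @ a - p                    # gradient of the dual objective
    cons = ({'type': 'eq', 'fun': lambda a: a @ y},)
    res = minimize(fun, np.zeros(n), jac=jac, bounds=[(0.0, C)] * n,
                   constraints=cons, method='SLSQP')
    a = res.x
    w = X.T @ (a * y)                            # w = sum_i a_i y_i x_i
    sv = a > 1e-6                                # support vectors have a_i > 0 (KKT)
    b = np.mean(y[sv] - X[sv] @ w)               # offset recovered from the support vectors
    return w, b, a
```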

10. Two-Class Support Vector Machines
• In the nonlinear case, use a kernel function centered at each xi.
• Form the same optimization problem, with Hij = yi yj K(xi, xj) in place of the linear Hessian.
• Argument: the resulting function D(x) is the best classifier for the given training set.
[Figure: nonlinear decision boundary with levels D(x) = −1, 0, +1; the support vectors lie on the margin between the distribution of training data and the distribution of fault/failure data]
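A hedged sketch of the kernelized decision function, using a Gaussian (RBF) kernel as an example choice; the presentation does not fix a specific kernel, and gamma is an assumed parameter:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix: K_ij = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def decision_function(X_new, X_train, y, a, b, gamma=1.0):
    """Nonlinear D(x) = sum_i a_i y_i K(x_i, x) + b; sign(D) gives the class."""
    K = rbf_kernel(X_train, np.atleast_2d(X_new), gamma)   # shape (n_train, n_new)
    return (a * y) @ K + b
```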

11. Bayesian Interpretation of D(x)
• The classification y ∈ {−1, +1} for any x is equivalent to asking whether p(Y=+1 | X=x) is greater or less than p(Y=−1 | X=x).
• An optimal classifier yMAP maximizes the conditional probability over the two classes.
• D(x) is obtained from the quadratic optimization problem above.
• It can be shown that D(x) is the maximum a posteriori (MAP) solution to P(Y=y | X=x), i.e., P(class|data), and therefore the optimal classifier of the given two classes.

12. One-Class Training
• In the absence of negative-class data (fault or failure information), a one-class classification approach is used.
• X = (X1, X2) ~ bivariate distribution.
• Likelihood of the positive class: L = p(X = xi | y = +1).
• Class label y ∈ {−1, +1}.
• Use the margin of this likelihood to construct the negative class.
[Figure: likelihood surface L over (X1, X2) for the training data]

13. Nonparametric Likelihood Estimation
• If the probability that any data point xi falls into the kth bin is r, then the number of the data points {x1,…,xm} falling into the kth bin follows a binomial distribution.
• Total sample size: n
• Number of samples in the kth bin: m
• Region defined by the bin: R
• MLE of r: r̂ = m/n
• Density estimate: fR(x) = (1/volume(R)) · (# samples in R / total # samples) = (m/n) / volume(R)
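A minimal one-dimensional sketch of this histogram estimator; the bin placement and width are illustrative assumptions:

```python
import numpy as np

def histogram_density(x, samples, h):
    """f_R(x) = (1/volume(R)) * (# samples in R / total # samples) for the bin R containing x."""
    n = len(samples)
    edges = np.arange(samples.min(), samples.max() + h, h)        # bins of width h
    k = np.clip(np.searchsorted(edges, x, side='right') - 1,
                0, len(edges) - 2)                                 # index of the bin R containing x
    m = np.sum((samples >= edges[k]) & (samples < edges[k + 1]))   # samples falling in R
    return (m / n) / h                                             # MLE of r divided by volume(R)
```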

14. Likelihood Estimation: Gaussian Kernel
• The volume of R: V = h^d for a bin of side h in d dimensions.
• For a uniform kernel, the number of data points m in R is the count of xi inside the window centered at x.
• Kernel function: a Gaussian centered at each sample point.
• Points xi that are close to the sample point x receive higher weight.
• The resulting density fj(x) is smooth.
• The bandwidth h is selected by a nearest-neighbor algorithm: each bin R contains kn data points.
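A sketch of the smoothed estimate in one dimension, with the bandwidth taken as the distance to the kn-th nearest neighbor; the exact bandwidth rule in CALCEsvm is not specified here, so this is an assumed variant:

```python
import numpy as np

def gaussian_kde_at(x, samples, k_n=10):
    """Gaussian-kernel density estimate at x with a nearest-neighbor bandwidth."""
    d = np.abs(samples - x)
    h = np.sort(d)[k_n - 1]                       # bandwidth: each bin R holds k_n data points
    # nearby points x_i receive higher weight, giving a smooth density
    w = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return w.mean()
```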

15. Estimate of the Negative Class
• The negative class is estimated from the likelihood of the positive class (training data): P(X1 = x1,…,Xn = xn | y1,…,yn = +1).
• A threshold t sets the likelihood ratio of positive- to negative-class probability for the given training data.
• A 1-D cross-section of the density illustrates the idea of the threshold ratio.
[Figure: 1-D cross-section of the positive-class density with threshold t; points of low likelihood, e.g., x1n+1 and x1n+2, form the negative class [N]]
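One way to realize this idea is rejection sampling: draw candidate points around the training data and keep those whose estimated positive-class likelihood falls below the threshold t. This is an illustrative sketch only; the presentation does not specify how CALCEsvm generates the negative points:

```python
import numpy as np

def sample_negative_class(train, density, t, n_neg=100, scale=2.0, seed=0):
    """Keep candidate points whose positive-class likelihood falls below the threshold t."""
    rng = np.random.default_rng(seed)
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = hi - lo
    neg = []
    while len(neg) < n_neg:
        c = rng.uniform(lo - scale * span, hi + scale * span)   # candidate around the data
        if density(c) < t:                                      # low likelihood -> negative class
            neg.append(c)
    return np.asarray(neg)
```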

16. D(x) as a Sufficient Statistic
• D(x) can be used as a sufficient statistic to classify a data point x.
• Argument: since D(x) is the optimal classifier, the posterior class probabilities are related to a point's distance from D(x) = 0.
• These probabilities can be modeled by a logistic distribution centered at D(x) = 0.

17. Posterior Class Probability
• The positive posterior class probability is given by a logistic distribution:
  p(y = +1 | x) = 1 / (1 + exp(A·D(x) + B))
• Use D(x) as the sufficient statistic for the classification of xi by replacing ai with D(xi).
• Simplify, then get the MLE for the parameters A and B.
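This is the familiar Platt-scaling construction. A minimal sketch of the MLE fit for A and B follows; the optimizer and initialization are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(D_vals, labels):
    """Fit p(y=+1|x) = 1 / (1 + exp(A*D(x) + B)) by maximum likelihood."""
    t = (np.asarray(labels) + 1) / 2.0           # map y in {-1,+1} to targets in {0,1}
    def nll(params):
        A, B = params
        p = 1.0 / (1.0 + np.exp(A * D_vals + B))
        eps = 1e-12                              # guard against log(0)
        return -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))
    res = minimize(nll, x0=np.array([-1.0, 0.0]))
    return res.x                                 # (A_MLE, B_MLE)
```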

18. Joint Probability Model
• Interested in P = P(Y | XM, XR), the joint probability of classification given the two models:
  • XM: model space [M]
  • XR: residual space [R]
• Assume XM and XR are independent.
• After some algebra, obtain the joint positive and negative posterior class probabilities P(+) and P(−).
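Under the independence assumption, the fused posterior reduces to a normalized product. A sketch of one such fusion rule, noting that the slide's exact algebra may differ:

```python
def joint_posterior(p_M, p_R):
    """Fuse the positive posteriors from [M] and [R], assuming X_M and X_R are independent."""
    pos = p_M * p_R                  # joint evidence for the positive class
    neg = (1 - p_M) * (1 - p_R)      # joint evidence for the negative class
    P_plus = pos / (pos + neg)       # normalized joint positive posterior P(+)
    return P_plus, 1.0 - P_plus      # P(+) and P(-)
```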

19. Case Studies
• Simulated degradation
• Lockheed Martin dataset

20. Case Study I: Simulated Degradation
• Given: simulated correlated data
  • X1 = gamma, X2 = Student's t, X3 = beta
• Degradation modeling:
  • a period of healthy data,
  • three successive periods of increasingly larger changes in the mean of each parameter.
• The posterior classification probability is expected to reflect these four periods:
  • first with a probability close to 1,
  • then a decreasing trend over the three successive periods.
[Figure: x1 versus observation index]
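A sketch of how such a four-period signal can be simulated; the period length, shift sizes, and distribution parameters are assumptions, and the cross-correlation between parameters is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # observations per period (assumed)
shifts = [0.0, 0.5, 1.0, 2.0]             # healthy period, then increasingly large mean shifts
x1 = np.concatenate([rng.gamma(2.0, 1.0, n) + s for s in shifts])   # X1 = gamma
x2 = np.concatenate([rng.standard_t(5, n) + s for s in shifts])     # X2 = Student's t
x3 = np.concatenate([rng.beta(2.0, 5.0, n) + s for s in shifts])    # X3 = beta
X = np.column_stack([x1, x2, x3])         # 4*n observations spanning the four periods
```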

21. Case Study I Results: Simulated Degradation
• Results: a plot of the joint positive classification probability.
[Figure: joint positive classification probability over the four periods P1-P4]

22. Case Study II: Lockheed Martin Data (Known Faulty Periods)
• Given: data set from Lockheed Martin
  • Type of data: server data, unknown parameters
  • Multivariate: 22 parameters, 2741 observations
  • Healthy period (T): observations 0-800
  • Fault periods: F1: observations 912-1040, F2: 1092-1106, F3: 1593-1651
• Training data constructed from a sample of period T, with size n = 140.
• Goal: detect the onset of the known faulty periods without knowledge of the "unhealthy" system characteristics.

23. Case Study II: Results
[Figure: classification probability over time, showing the healthy period T through observation 800 and the fault periods F1 (from observation 912) and F2]

24. Comparison Metrics of Code Accuracy (LibSVM vs. CALCEsvm)
• An established and commercially used C++ SVM package (LibSVM) was used to test the accuracy of the code.
• LibSVM feature used: two-class SVM
  • LibSVM does not provide classification probabilities for one-class SVM.
• Input to LibSVM:
  • Positive class: the same training data
  • Negative class: the estimated negative-class data from CALCEsvm
• Metric: detection accuracy, the count of correct classifications based on two criteria:
  • the classification label y,
  • a correct classification-probability estimate.

25. Detection Accuracy, LibSVM vs. CALCEsvm (Case Study I: Degradation Simulation)
• Description of the test:
  • Period 1 should be captured with a positive-class probability estimate between 80% and 100%,
  • Period 2 between 70% and 85%,
  • Period 3 between 30% and 70%,
  • Period 4 between 0% and 40%.
• Based on the class label alone, the detection accuracy of the two algorithms was almost identical.
• Based on the probability ranges, LibSVM performs better in the early stages, where the system is healthy, but worse than CALCEsvm at detecting degradation.
[Figure: per-period detection accuracy for the two algorithms over periods P1-P4]

26. Detection Accuracy, LibSVM vs. CALCEsvm (Case Study II: Lockheed Data)
• Description of the test:
  • The acceptable probability estimate for a correct positive classification should lie between 80% and 100%.
  • Similarly, the acceptable probability estimate for a negative classification should not exceed 40%.
• Based on the class label, LibSVM and CALCEsvm perform almost identically, with a small edge for CALCEsvm.
• Based on the acceptable probability estimates:
  • LibSVM does a poor job of identifying the healthy state between successive faulty periods, but performs much better at detecting the anomalies.
  • CALCEsvm performs better overall, correctly identifying the faulty and healthy periods both by class label and by acceptable probability range.

27. Summary
• For the given data, and on some additional data sets, the CALCEsvm algorithm accomplished its objectives:
  • it detected the time events for known anomalies,
  • it identified trends of degradation.
• An initial comparison of its accuracy against LibSVM is favorable.

28. Backups

29. Dual Form of the Lagrangian Function
• The dual form of the Lagrangian, obtained from the optimization problem in LD space through the KKT conditions:
  maximize LD(α) = Σi αi − (1/2) Σi Σj αi αj yi yj xiT xj
• subject to: Σi αi yi = 0 and αi ≥ 0 for all i.
• Equivalently, minimize LD = (1/2)αTHα − pTα, as on slide 9.

30. Karush-Kuhn-Tucker (KKT) Conditions
• An optimal solution (w*, b*, α*) exists if and only if the KKT conditions are satisfied. In other words, the KKT conditions are necessary and sufficient to solve for w, b, and α in a convex problem.

31. Posterior Class Probability
• Interested in finding the maximum likelihood estimates of the parameters A and B.
• The classification probability of a set of test data X = {x1,…,xk} into c = {1, 0} is given by a product of Bernoulli distributions:
  L = Πi pi^ci (1 − pi)^(1−ci)
• where pi is the probability of classification when c = 1 (y = +1) and 1 − pi is the probability of classification when c = 0 (class y = −1).

32. Posterior Class Probability
• Maximize the likelihood of correct classification y for each xi (MLE).
• Determine the parameters AMLE and BMLE from the maximum likelihood equation above.
• Use AMLE and BMLE to compute pi,MLE, where pi,MLE is:
  • the maximum likelihood estimator of the posterior class probability pi (by the invariance property of the MLE),
  • the best estimate of the classification probability for each xi.
• Currently implemented: pi,MLE = 1 / (1 + exp(AMLE·D(xi) + BMLE)).
