Support Vector Machine and its Applications Mahnaz Hassibi
Outline
• An overview of Support Vector Machines.
• The theory behind SVM learning.
• Its differences from other learning systems.
• Important issues in training a high-performance SVM model.
• Its application in feature ranking.
• Its application in multi-class classification.
• Its application in detection and diagnosis.
SVM
• SVM was introduced by Vapnik (1995).
• SVM is a statistical learning technique.
• The accuracy of classical regression depends on a well-defined probability density function (PDF).
• There is not always enough knowledge about the probability distribution of the system.
• SVM can perform well even in systems with an unknown distribution.
• SVM performs well in high-dimensional feature spaces.
• Its training problem is a convex quadratic program, so the solution it finds is the global minimum.
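Margin and Support Vectors
• $\mathbf{x}_i \cdot \mathbf{w} + b \ge +1$ for $y_i = +1$
• $\mathbf{x}_i \cdot \mathbf{w} + b \le -1$ for $y_i = -1$
• The support vectors are the training points that lie on these two hyperplanes.
• Margin $= 2/\|\mathbf{w}\|$.
• The goal is to maximize the margin between the two parallel hyperplanes:
- Maximize the margin $2/\|\mathbf{w}\|$, i.e., minimize $\|\mathbf{w}\|^2$,
- subject to the constraints $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \ge 0$ for all $i$ (the two inequalities above combined).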
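As a small numeric check of the margin formula, the sketch below tests the constraints $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1$, lists the support vectors, and computes $2/\|\mathbf{w}\|$. The four data points and the hand-picked hyperplane $\mathbf{w} = (0.5, 0.5)$, $b = 0$ are illustrative assumptions, not learned values (for these particular points it happens to be the maximum-margin hyperplane).

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data: two points per class in 2-D.
X = [(1.0, 1.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]

# Hand-picked hyperplane (illustrative assumption, not trained here).
w, b = (0.5, 0.5), 0.0

# Functional margins y_i (x_i . w + b); the constraints require each one to be >= 1.
functional = [yi * (dot(xi, w) + b) for xi, yi in zip(X, y)]
feasible = all(m >= 1 for m in functional)

# Support vectors lie exactly on one of the two hyperplanes (functional margin == 1).
support = [i for i, m in enumerate(functional) if math.isclose(m, 1.0)]

# Geometric margin between the two parallel hyperplanes.
margin = 2 / math.sqrt(dot(w, w))

print(functional)                           # [1.0, 2.5, 1.0, 1.5]
print(feasible, support, round(margin, 4))  # True [0, 2] 2.8284
```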
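Lagrangian Formulation:
• For constraints of the form $c_i \ge 0$, the constraint equations are multiplied by positive Lagrange multipliers $\alpha_i \ge 0$ and subtracted from the objective function:
$L_P = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i \left[ y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \right]$
Dual Formulation:
• Setting the derivatives of $L_P$ with respect to $\mathbf{w}$ and $b$ to zero gives $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $\sum_i \alpha_i y_i = 0$, which yields
$L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$, maximized subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.
• This is a convex quadratic optimization problem, so its solution is the global solution.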
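To make the primal/dual relationship concrete, the sketch below evaluates $L_D$ for a hypothetical two-point problem, recovers $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $b$, and checks that the primal objective $\frac{1}{2}\|\mathbf{w}\|^2$ equals the dual objective (zero duality gap). The data and the multipliers are assumptions: $\alpha_1 = \alpha_2 = 1/4$ was worked out by hand for this toy case rather than by a QP solver.

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical two-point problem: one example per class.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]

# sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a, and the dual reduces to
# L_D(a) = 2a - 4a^2, which is maximized at a = 1/4 (worked out by hand).
alpha = [0.25, 0.25]

def dual_objective(alpha, X, y):
    """L_D = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# Recover the primal solution: w = sum_i alpha_i y_i x_i.
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(len(X[0]))]
# A support vector (alpha_s > 0) satisfies y_s (x_s . w + b) = 1, so b = y_s - x_s . w.
b = y[0] - dot(X[0], w)

primal = 0.5 * dot(w, w)
dual = dual_objective(alpha, X, y)
print(w, b, primal, dual)  # [0.5, 0.5] 0.0 0.25 0.25
```

Equal primal and dual values confirm that these multipliers are optimal for this toy problem.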
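Kernel Function: Non-Linear SVM
Kernels:
• Kernels map the data into a higher-dimensional feature space, where a linear separating hyperplane is constructed.
• The kernel returns the dot product of the projected vectors, $K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$, so $\Phi$ never has to be computed explicitly.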
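The sketch below illustrates the kernel idea: the homogeneous degree-2 polynomial kernel $(\mathbf{x} \cdot \mathbf{z})^2$, computed in the original 2-D space, equals the dot product of the explicitly mapped vectors $\Phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ in a 3-D feature space. It also defines an RBF kernel. The specific kernels, the offset `c`, and the `gamma` parameterization of the RBF width are illustrative assumptions.

```python
import math

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z, degree=2, c=0.0):
    """Polynomial kernel (x . z + c)^degree."""
    return (dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); gamma controls the width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly_kernel(x, z)              # computed in the original 2-D space
k_explicit = dot(phi(x), phi(z))            # computed in the 3-D feature space
print(k_implicit, round(k_explicit, 6))     # 121.0 121.0
print(round(rbf_kernel(x, z, gamma=0.5), 4))  # 0.0183
```

For the RBF kernel the corresponding feature space is infinite-dimensional, which is why evaluating $K$ directly is the only practical option.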
Hyperparameter Optimization of SVM Kernels
• In nonlinear kernels, the parameters defining the kernel strongly influence the training of the SVM.
• All these methods minimize an estimate of the error with respect to the hyperparameters (RBF width, polynomial degree,…).
• Empirical selection of kernel parameters: a finite number of values is selected for each parameter, the SVM is trained for each of them, and the values with the minimum estimated error define the kernel.
• Non-linear optimization of parameters:
- Gradient descent
- Conjugate gradient
- Newton's method
- Quasi-Newton methods
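The empirical selection procedure can be sketched as an exhaustive grid search: every combination of candidate values is evaluated and the one with the lowest estimated error is kept. This is a minimal sketch under stated assumptions: `toy_error` is a hypothetical stand-in for "train an SVM with these hyperparameters and return its validation error", and the soft-margin penalty `C` is an extra hyperparameter not named on the slide.

```python
import math
from itertools import product

def grid_search(param_grid, estimate_error):
    """Try every combination in param_grid and keep the one with the lowest estimated error."""
    names = list(param_grid)
    best_params, best_err = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        err = estimate_error(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

def toy_error(C, gamma):
    """Hypothetical stand-in for 'train an SVM with (C, gamma) and return its validation error'."""
    return (math.log10(C) - 1) ** 2 + (math.log10(gamma) + 1) ** 2

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best, err = grid_search(grid, toy_error)
print(best)  # {'C': 10, 'gamma': 0.1}
```

The cost grows multiplicatively with the number of candidate values per parameter, which is what motivates the gradient-based and (quasi-)Newton alternatives listed above.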
Feature Selection and Ranking with SVM
• Smaller feature space.
• Reduced computational complexity.
• Multivariate methods are prone to overfitting.
• Problems arise when the number of variables exceeds the number of examples.
• The objective function comprises two terms:
- Accuracy of fit (maximized).
- The number of features (minimized).
• Feature ranking and elimination using a linear SVM:
- SVM performs well in high-dimensional spaces.
- Because its decision function is linear in the features, it can be used for variable ranking and selection.
- Ranking and feature elimination are based on the elements of w.
- It is a backward feature selection method.
- In each iteration, a feature or a group of features is eliminated.
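The backward elimination described above can be sketched as follows: train a linear SVM, rank the remaining features by $w_j^2$, drop the smallest, and repeat. Several assumptions apply. The trainer is a simple swapped-in method (full-batch subgradient descent on $\frac{\lambda}{2}\|\mathbf{w}\|^2$ plus the mean hinge loss, with a Pegasos-style step $1/(\lambda t)$) rather than a QP solver. The values of `lam` and `epochs`, the $w_j^2$ ranking score, and the toy data are illustrative choices.

```python
from itertools import product

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.1, epochs=200):
    """Soft-margin linear SVM via full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean hinge loss, with step size 1 / (lam * t)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        gw = [lam * wj for wj in w]      # gradient of the regularizer (b is not regularized)
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active for this example
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def svm_rfe(X, y):
    """Backward elimination: retrain, drop the feature with the smallest w_j^2, repeat."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        Xs = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(Xs, y)
        k = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(k))
    return eliminated, remaining

# Hypothetical data: feature 0 carries the label, features 1 and 2 are symmetric noise.
X, y = [], []
for label, x0 in [(1, 2.0), (1, 1.0), (-1, -2.0), (-1, -1.0)]:
    for x1, x2 in product([1.0, -1.0], [0.5, -0.5]):
        X.append([x0, x1, x2])
        y.append(label)

eliminated, kept = svm_rfe(X, y)
print("eliminated (first to last):", eliminated, "kept:", kept)  # [1, 2] ... [0]
```

Removing one feature per iteration keeps the sketch simple. For large feature sets, removing a group (for example, a fixed fraction) per iteration reduces the number of retraining steps.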
SVM-RFE Algorithm for Feature Selection (Non-Linear Approach)
• The method is based on backward elimination.
• It starts with all the features and removes one feature at a time.
• The variable whose removal causes the smallest change in $\|\mathbf{w}\|^2$ is removed from the set.
• For simplicity and lower computational effort, the Lagrange multipliers $\alpha_i$ are assumed to remain unchanged when a feature is removed.
• In this method, variable ranking is based on the sensitivity of $\|\mathbf{w}\|^2$ with respect to each variable (sensitivity analysis).
[Figure: SVM-RFE flowchart. Data sets with all features → Training SVM → Eliminating least significant feature (n = n − 1) → repeat; output: Best selected features.]
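In kernel form, $\|\mathbf{w}\|^2 = \sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$, so the sensitivity of feature $k$ can be scored as $DJ(k) = \frac{1}{2}\|\mathbf{w}\|^2 - \frac{1}{2}\|\mathbf{w}_{(-k)}\|^2$. Here $\mathbf{w}_{(-k)}$ recomputes the kernel without feature $k$ while the multipliers $\alpha$ are held fixed, so no retraining is needed. The sketch below assumes an RBF kernel and a hypothetical two-point problem whose multipliers, $\alpha_1 = \alpha_2 = 1/(1 - K(\mathbf{x}_1, \mathbf{x}_2))$, were derived by hand. In practice $\alpha$ would come from training the SVM at each elimination step.

```python
import math

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def w_norm_sq(alpha, X, y, gamma, drop=None):
    """||w||^2 = sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j), optionally with feature `drop` removed."""
    keep = [k for k in range(len(X[0])) if k != drop]
    Xs = [[row[k] for k in keep] for row in X]
    n = len(X)
    return sum(alpha[i] * alpha[j] * y[i] * y[j] * rbf(Xs[i], Xs[j], gamma)
               for i in range(n) for j in range(n))

def ranking_criterion(alpha, X, y, gamma):
    """DJ(k) = 1/2 ||w||^2 - 1/2 ||w_(-k)||^2 with alpha held fixed (no retraining)."""
    full = w_norm_sq(alpha, X, y, gamma)
    return [0.5 * (full - w_norm_sq(alpha, X, y, gamma, drop=k)) for k in range(len(X[0]))]

# Hypothetical two-point problem: feature 0 separates the classes widely, feature 1 barely.
X = [(1.0, 0.2), (-1.0, -0.2)]
y = [1, -1]
gamma = 0.5
# For two points, the dual gives alpha_1 = alpha_2 = 1 / (1 - K(x_1, x_2)) (worked out by hand).
a = 1.0 / (1.0 - rbf(X[0], X[1], gamma))
alpha = [a, a]

dj = ranking_criterion(alpha, X, y, gamma)
least_significant = min(range(len(dj)), key=lambda k: dj[k])
print([round(v, 3) for v in dj], least_significant)  # feature 1 is least significant
```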
Wrapper Method Using SVM
• SVM is used as the predictor model.
• The SVM model estimates the prediction error for a selected feature subset.
• A leave-one-out procedure can be used to estimate the probability of error.
• Because of the computational cost of the wrapper method, it is difficult to apply to high-dimensional systems.
• Wrapper methods are slower than filter methods.
[Figure: Wrapper flowchart. Data sets with all features → Feature-space subset generator (subsets S1, S2, S3) → Training SVM → Calculate leave-one-out error → Select best set.]
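The wrapper loop can be sketched with an exhaustive subset generator. Each subset is scored by the leave-one-out error of a linear SVM trained only on those features, and the subset with the lowest error is kept. Assumptions: the linear SVM trainer is a simple subgradient-descent stand-in (not a QP solver), `lam`/`epochs` and the eight-point data set are illustrative, and enumerating all $2^d - 1$ subsets is only feasible for small $d$, which illustrates why wrappers are expensive.

```python
from itertools import combinations

def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.1, epochs=500):
    """Linear SVM via full-batch subgradient descent on (lam / 2) ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def loo_error(X, y, features):
    """Leave-one-out error of a linear SVM restricted to `features`."""
    n = len(X)
    errors = 0
    for i in range(n):
        X_train = [[X[r][j] for j in features] for r in range(n) if r != i]
        y_train = [y[r] for r in range(n) if r != i]
        w, b = train_linear_svm(X_train, y_train)
        x_test = [X[i][j] for j in features]
        if y[i] * (dot(w, x_test) + b) <= 0:  # held-out point on the wrong side (or on the plane)
            errors += 1
    return errors / n

def wrapper_select(X, y):
    """Score every non-empty feature subset by LOO error and keep the best one."""
    d = len(X[0])
    best_subset, best_err = None, float("inf")
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):
            err = loo_error(X, y, subset)
            if err < best_err:  # strict '<' keeps the smaller subset on ties
                best_subset, best_err = subset, err
    return best_subset, best_err

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[2.0, 1.0], [2.0, -1.0], [1.0, 1.0], [1.0, -1.0],
     [-2.0, 1.0], [-2.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
best_subset, best_err = wrapper_select(X, y)
print(best_subset, best_err)
```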
Imbalanced Datasets
• Oversampling: replicate the data in the minority class.
- Increases the training set size and the training time.
- Increases the probability of overfitting.
• Downsampling: reduce the size of the majority class.
- It may eliminate potentially important data.
• SVM ensembles: build K binary SVMs by partitioning the majority class into K subsets, train each against the minority class, and aggregate their outputs.
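[Figure: SVM Ensembles. The majority class is split into K parts; each part, together with the minority class, trains SVM 1, SVM 2, …, SVM K, and their outputs are combined by aggregation.]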
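A minimal sketch of the SVM-ensemble idea follows. The majority class is split into K interleaved parts, each part is paired with the whole minority class to train one binary SVM, and the K outputs are aggregated by majority vote. Assumptions: labels −1/+1 for majority/minority, a simple subgradient-descent linear SVM as the base learner, majority voting as the aggregation rule (the slide does not specify one), and toy 1-D data.

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on (lam / 2) ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def train_svm_ensemble(X_major, X_minor, K):
    """Split the majority class into K parts; train one SVM per part against the whole minority class."""
    models = []
    for k in range(K):
        part = X_major[k::K]                         # k-th interleaved slice of the majority class
        X = part + X_minor
        y = [-1] * len(part) + [1] * len(X_minor)    # majority = -1, minority = +1 (assumption)
        models.append(train_linear_svm(X, y))
    return models

def ensemble_predict(models, x):
    """Aggregate the K binary SVMs by majority vote (an odd K avoids ties)."""
    votes = sum(1 if dot(w, x) + b > 0 else -1 for w, b in models)
    return 1 if votes > 0 else -1

X_major = [[-1.5], [-2.0], [-2.5], [-3.0]] * 3   # 12 hypothetical majority examples
X_minor = [[1.5], [2.0], [2.5]]                  # 3 hypothetical minority examples
models = train_svm_ensemble(X_major, X_minor, K=3)
print(ensemble_predict(models, [4.0]), ensemble_predict(models, [-4.0]))  # 1 -1
```

Each base SVM sees a roughly balanced training set (4 versus 3 examples here), and no majority example is discarded overall.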
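Multi-Class with SVM (example with four classes: 1, 2, 3, 4)
• One-against-all (1-v-r).
• One-against-one (1-v-1).
• Decision strategies: “AND” gate, majority vote, Decision Directed Acyclic Graph (DDAG).
[Figure: DDAG for four classes. Root node: 1 vs 4; second level: 1 vs 3 (after “Not 4”) and 2 vs 4 (after “Not 1”); third level: 1 vs 2, 2 vs 3, 3 vs 4; leaves: classes 1, 2, 3, 4.]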
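The two one-against-one decision strategies can be sketched as follows. DDAG keeps an ordered list of candidate classes, tests the first against the last (1 vs 4 at the root), and removes the loser (“Not 4” or “Not 1”) until one class remains. Majority vote runs all pairwise classifiers and picks the class with the most wins. The `pairwise` function is a hypothetical stand-in for trained pairwise SVMs: it prefers whichever class centre on a line is closer to `x`.

```python
from itertools import combinations

CENTERS = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # hypothetical class centres

def pairwise(i, j, x):
    """Hypothetical stand-in for the trained (i vs j) SVM: a value > 0 means class i wins."""
    return abs(x - CENTERS[j]) - abs(x - CENTERS[i])

def ddag_predict(x, classes):
    """DDAG: test first vs last candidate, eliminate the loser, repeat (K - 1 evaluations)."""
    candidates = list(classes)
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        if pairwise(i, j, x) > 0:
            candidates.pop()      # "Not j"
        else:
            candidates.pop(0)     # "Not i"
    return candidates[0]

def vote_predict(x, classes):
    """Majority vote over all K(K - 1)/2 pairwise classifiers."""
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        votes[i if pairwise(i, j, x) > 0 else j] += 1
    return max(classes, key=lambda c: votes[c])

classes = [1, 2, 3, 4]
print(ddag_predict(2.1, classes), vote_predict(2.1, classes))  # 2 2
print(ddag_predict(3.9, classes), vote_predict(3.9, classes))  # 4 4
```

Both strategies use the same K(K − 1)/2 trained classifiers, but DDAG evaluates only K − 1 of them per prediction.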
Using SVM for Detection
• Fault or abnormality detection:
- Monitoring machinery in industry.
- Interpretation in the medical field.
• Anomaly detection:
- Intrusion detection.
- Misuse detection.
- Security issues (internet, network, wireless).
• Case 1: The data sets contain both “regular” and “irregular” examples. A standard SVM can be trained as a binary classifier.
• Case 2: All examples belong to only one class. The goal is to identify individuals outside the system’s normal state:
- Available data are sampled only from the healthy class.
- No samples are available to identify outliers.
• A one-class SVM can be used to detect outliers.
One-Class SVM
• The objective is to find a function F that is positive on a subset A in feature space and negative everywhere outside A.
• Maximize the distance from the hyperplane to the origin.
• The origin is treated as the only member of the negative class.
[Figure: subset A in feature space, enclosed by a hypersphere, and a separating hyperplane at distance d from the origin.]
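A one-class SVM decision function has the form $F(\mathbf{x}) = \sum_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) - \rho$. With $\mathbf{w} = \sum_i \alpha_i \Phi(\mathbf{x}_i)$, $F(\mathbf{x}) \ge 0$ means $\Phi(\mathbf{x})$ lies on the far side of the hyperplane $\mathbf{w} \cdot \Phi(\mathbf{x}) = \rho$, away from the origin. The sketch below only illustrates this decision rule and does not solve the one-class QP. As simplifying assumptions, every $\alpha_i$ is fixed to $1/n$ (the only feasible choice when $\nu = 1$), and $\rho$ is set heuristically to an empirical quantile of the training scores. The 1-D RBF kernel, `gamma`, `outlier_frac`, and the "healthy" samples are hypothetical.

```python
import math

def rbf(u, v, gamma=1.0):
    """1-D Gaussian kernel exp(-gamma * (u - v)^2)."""
    return math.exp(-gamma * (u - v) ** 2)

def fit_one_class(train, outlier_frac=0.2, gamma=1.0):
    """Simplified one-class decision function F(x) = sum_i alpha_i K(x_i, x) - rho.
    Not a QP solver: every alpha_i is fixed to 1/n and rho is an empirical quantile."""
    n = len(train)
    alpha = [1.0 / n] * n

    def score(x):
        # w . phi(x), with w = sum_i alpha_i phi(x_i)
        return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, train))

    # rho: place roughly a fraction `outlier_frac` of the training points on the negative side
    rho = sorted(score(x) for x in train)[int(outlier_frac * n)]
    return lambda x: score(x) - rho

healthy = [-0.5, -0.2, 0.0, 0.3, 0.4]   # hypothetical samples from the healthy class only
F = fit_one_class(healthy)
print(F(0.1) > 0, F(5.0) < 0)           # True True
print(sum(F(x) < 0 for x in healthy))   # 1
```

A point far from every healthy sample gets a kernel score near zero, so it falls below $\rho$ and is flagged as an outlier without any outlier examples having been seen during training.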
Conclusions and Summary
• A Support Vector Machine is a linear classifier in feature space.
• The learning process is based on structural risk minimization (SRM).
• It performs well in high-dimensional feature spaces.
• It performs relatively well even when the number of examples is small compared with the feature-space dimensionality.
• For non-linear systems, kernels compute the dot product of the projected vectors in feature space.
• It is well suited to feature ranking and selection.
• It can be deployed for multi-class systems.
• It is a powerful tool for classification, diagnosis, and detection problems.
Conclusions and Summary (cont.)
- Pattern recognition
- Signal processing
- Data mining
- Abnormality detection
- Interpretation procedures in different areas of medicine
- Intrusion detection (internet, networking, wireless)
- Security issues, identifying different types of attacks
- Information retrieval systems
- Enhancement of search engine results, mail filtering