
Presentation Transcript


  1. Support Vector Machine and its Applications Mahnaz Hassibi

  2. Outline
  • An overview of Support Vector Machines.
  • The theory behind SVM learning.
  • Its differences from other learning systems.
  • Important issues in training an SVM model with high performance.
  • Its application in feature ranking.
  • Its application in multi-class classification.
  • Its application in detection and diagnosis.

  3. SVM
  • SVM was introduced by Vapnik (1995).
  • SVM is a statistical learning technique.
  • Classical regression accuracy depends on a well-defined PDF, and there is not always enough knowledge about the probability distribution of the system.
  • SVM can perform well even in systems with unknown distribution.
  • SVM performs well in high-dimensional feature spaces.
  • Its training has a quadratic form, which guarantees a global minimum.

  4. Margin and Support Vectors
  The support vectors lie on the two parallel hyperplanes defined by:
  x_i · w + b ≥ +1 for y_i = +1
  x_i · w + b ≤ −1 for y_i = −1
  Margin = 2/||w||
  The goal is to maximize the margin between the two parallel hyperplanes:
  • Maximize the margin 2/||w||, i.e. minimize ||w||², subject to the constraints y_i (x_i · w + b) ≥ 1 for all i.
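
  As a concrete illustration (a minimal sketch using scikit-learn; the toy data and the large C value approximating the hard margin are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin formulation
clf = SVC(kernel='linear', C=1e6).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

print("margin = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)  # points with y_i (w.x_i + b) = 1
```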

  5. Lagrangian Formulation
  For constraints of the form y_i (x_i · w + b) − 1 ≥ 0, the constraint equations are multiplied by positive Lagrange multipliers α_i ≥ 0 and subtracted from the objective function:
  L_P = ½||w||² − Σ_i α_i [y_i (x_i · w + b) − 1]
  Dual formulation: maximize
  L_D = Σ_i α_i − ½ Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)
  subject to α_i ≥ 0 and Σ_i α_i y_i = 0.
  This is a quadratic optimization problem, so the solution found is the global solution.
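
  To make the dual concrete, the following toy sketch (not from the slides; the six-point dataset and the 1e-6 support-vector threshold are illustrative choices) solves it numerically with scipy:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable dataset (illustrative)
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [0.0, -2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# H[i, j] = y_i y_j (x_i . x_j)
H = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # minimize -L_D = 0.5 * alpha^T H alpha - sum(alpha)
    return 0.5 * alpha @ H @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),            # alpha_i >= 0
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])  # sum alpha_i y_i = 0

alpha = res.x
w = ((alpha * y)[:, None] * X).sum(axis=0)   # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                            # support vectors have alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))        # from y_i (w.x_i + b) = 1 on the margin
print("w =", w, " b =", b, " support vectors:", np.where(sv)[0])
```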

  6. Kernel Function: Non-Linear SVM

  7. Kernels
  • Kernels map the data into a higher-dimensional feature space: a kernel K(x, z) evaluates the dot product of the projected vectors without computing the projection explicitly.
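
  A small worked example (illustrative, not from the slides) showing that a degree-2 polynomial kernel equals the dot product of explicitly projected vectors:

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    # Polynomial kernel K(x, z) = (x . z)^degree, computed in the input space
    return (x @ z) ** degree

def phi(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly_kernel(x, z))   # 16.0 -- kernel value computed in input space
print(phi(x) @ phi(z))     # 16.0 -- same dot product in the 3-D feature space
```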

  8. Hyperparameter Optimization of SVM Kernels
  • For nonlinear kernels, the parameters defining the kernel (RBF width, polynomial degree, …) are very important in the SVM training process.
  • All of the following methods minimize the estimated error with respect to the hyperparameters.
  • Empirical selection of kernel parameters: a finite number of values is selected for each parameter, the SVM is trained for each combination, and the values with the minimum estimated error build the kernel (see the grid-search sketch below).
  • Non-linear optimization of parameters:
  - Gradient descent
  - Conjugate gradient
  - Newton method
  - Quasi-Newton approach
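
  A minimal sketch of the empirical (grid) selection using scikit-learn's GridSearchCV; the grid values and the synthetic dataset are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Finite grid of candidate kernel parameters; the combination with the
# lowest cross-validated error is kept
grid = GridSearchCV(
    SVC(),
    param_grid={
        'kernel': ['rbf', 'poly'],
        'C': [0.1, 1, 10],
        'gamma': [0.01, 0.1, 1],  # RBF width
        'degree': [2, 3],         # polynomial degree (ignored by 'rbf')
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```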

  9. Feature Selection and Ranking with SVM
  • Smaller feature space.
  • Reduced computational complexity.
  • Multivariate methods are prone to over-fitting; problems arise when the number of variables exceeds the number of examples.
  • The objective function comprises two terms:
  - Accuracy of fit (maximized).
  - The number of features (minimized).
  • Feature ranking and elimination using a linear SVM (sketched below):
  - SVM performs well in high-dimensional spaces.
  - Because of its linear behavior it can be used for variable ranking and selection.
  - Ranking and feature elimination are based on the elements of w.
  - It is a backward feature selection: in each iteration a feature or group of features is eliminated.
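
  A minimal sketch of ranking by the elements of w with a linear SVM (scikit-learn; the synthetic dataset is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# Rank features by the magnitude of the corresponding element of w;
# features with the smallest |w_j| are candidates for elimination
w = clf.coef_.ravel()
ranking = np.argsort(np.abs(w))[::-1]
print("features from most to least important:", ranking)
```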

  10. SVM-RFE Algorithm for Feature Selection (Non-Linear Approach)
  • The method is based on backward elimination.
  • It starts with all the features and removes one feature at a time.
  • The variable whose removal causes the minimum variation of ||w||² is removed from the set.
  • For simplicity and less computational effort, it is assumed that the Lagrange multipliers remain unchanged when a feature is removed.
  • In this method, variable ranking is based on the sensitivity of ||w||² with respect to a variable (sensitivity analysis); a sketch of the elimination loop follows the flowchart below.

  11. [Flowchart: data set with all features → train SVM → eliminate least significant feature (n = n − 1) → repeat → best selected features]
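
  A sketch of the elimination loop in its simplest, linear-kernel special case, where the sensitivity of ||w||² to feature j reduces to w_j² (the dataset and stopping rule are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
features = list(range(X.shape[1]))
eliminated = []  # filled from least to most important

while len(features) > 1:
    clf = SVC(kernel='linear').fit(X[:, features], y)
    w2 = clf.coef_.ravel() ** 2      # per-feature contribution to ||w||^2
    worst = int(np.argmin(w2))       # its removal changes ||w||^2 the least
    eliminated.append(features.pop(worst))

eliminated.append(features[0])
print("features, least to most important:", eliminated)
```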

  12. Wrapper Method Using SVM
  • SVM is used as the predictor model.
  • The SVM model estimates the prediction error for a selected feature subset.
  • A leave-one-out procedure can be used to estimate the probability of error (see the sketch after the flowchart below).
  • Because of the computational cost of the wrapper method, it is difficult to apply to high-dimensional systems.
  • The wrapper method is slower than the filter method.

  13. [Flowchart: starting from the data set with all features, S1 generates feature-space subsets → S2 trains the SVM → S3 calculates the leave-one-out error → the best set is selected]
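
  A minimal sketch of the wrapper loop (scikit-learn; the exhaustive subset generator and the tiny synthetic dataset are illustrative, and already slow even at 5 features, which shows the computational-cost point):

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=5, n_informative=2,
                           random_state=0)

best_subset, best_acc = None, -1.0
# S1: generate candidate feature subsets (exhaustive here, which is exactly
# why the wrapper becomes impractical for high-dimensional systems)
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        # S2/S3: train the SVM and estimate its error by leave-one-out
        acc = cross_val_score(SVC(), X[:, list(subset)], y,
                              cv=LeaveOneOut()).mean()
        if acc > best_acc:
            best_subset, best_acc = subset, acc

print("best subset:", best_subset, "leave-one-out accuracy:", best_acc)
```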

  14. Imbalanced Datasets
  • Over-sampling: replicate the data in the minority class.
  - Increases the training set size and the training time.
  - The probability of over-fitting increases.
  • Down-sampling: reduce the size of the majority class.
  - It may eliminate some potentially important data.
  • SVM ensembles: build K binary SVMs by partitioning the majority class (sketched after the diagram below).

  15. [Diagram: SVM ensembles; the majority class is partitioned, each partition is paired with the minority class to train SVM 1 … SVM K, and the K outputs are aggregated]
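
  A sketch of such an ensemble, assuming an even random partition of the majority class and majority-vote aggregation (the function names, aggregation rule, and data are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

def fit_svm_ensemble(X_maj, X_min, K=3, seed=0):
    """Train K binary SVMs, each on one partition of the majority class
    plus the full minority class (labels: 0 = majority, 1 = minority)."""
    rng = np.random.default_rng(seed)
    models = []
    for chunk in np.array_split(rng.permutation(len(X_maj)), K):
        X = np.vstack([X_maj[chunk], X_min])
        y = np.concatenate([np.zeros(len(chunk)), np.ones(len(X_min))])
        models.append(SVC().fit(X, y))
    return models

def predict_svm_ensemble(models, X):
    # Aggregate the K member SVMs by majority vote
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Illustrative imbalanced data: 300 majority vs. 30 minority examples
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(300, 2))
X_min = rng.normal(2.5, 0.5, size=(30, 2))
models = fit_svm_ensemble(X_maj, X_min, K=3)
print(predict_svm_ensemble(models, np.array([[0.0, 0.0], [2.5, 2.5]])))
```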

  16. Multi-Class with SVM
  • One-against-all (1-v-r): one binary SVM per class.
  • One-against-one (1-v-1): one binary SVM per pair of classes, combined by:
  - "AND" gate
  - Majority vote
  - Decision Directed Acyclic Graph (DDAG)
  [Diagram: a four-class DDAG; the root node tests 1 vs 4, each decision eliminates one class ("not 4", "not 1", …), and paths through nodes such as 2 vs 4, 1 vs 3, 1 vs 2, 3 vs 4 and 2 vs 3 end in a leaf labeled with the predicted class 1-4]
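
  A usage sketch of both schemes with scikit-learn's built-in wrappers (the iris data is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC, SVC

X, y = load_iris(return_X_y=True)

# 1-v-r: one binary SVM per class (class vs. all others)
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
# 1-v-1: one binary SVM per pair of classes, combined by majority vote
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)

print("1-v-r accuracy:", ovr.score(X, y))
print("1-v-1 accuracy:", ovo.score(X, y))
```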

  17. Using SVM for Detection
  • Fault or abnormality detection:
  - Monitoring machinery in industry.
  - Interpretation in the medical field.
  • Anomaly detection:
  - Intrusion detection
  - Misuse detection
  - Security issues (internet, network, wireless)

  18. Case 1: When data sets comprise both "regular" and "irregular" examples, a standard SVM can be deployed for binary training.
  • Case 2: When examples belong to only one class, the goal is to identify individuals outside the system's normal state:
  - Available data are sampled only from the healthy class.
  - There is no sample available to identify outliers.
  • A one-class SVM can be used to detect outliers.

  19. One-Class SVM
  • The objective is to find a function F(·) that is positive on a subset A of the feature space and negative on everything outside A.
  • Maximize the distance d from the hyperplane to the origin; the origin is the first member of the negative class.
  [Diagram: subset A enclosed by a hypersphere in feature space, separated from the origin by a hyperplane at distance d]
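
  A usage sketch with scikit-learn's OneClassSVM; the Gaussian training data and the nu value are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # samples from the healthy class only

# nu bounds the fraction of training points allowed to fall outside region A
oc = OneClassSVM(kernel='rbf', nu=0.05).fit(X_train)

X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),   # healthy-like points
                    rng.normal(6.0, 0.5, size=(3, 2))])  # obvious outliers
print(oc.predict(X_test))  # +1 = inside region A, -1 = outlier
```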

  20. Conclusions and Summary
  • The Support Vector Machine is a linear classifier in feature space.
  • The learning process is based on structural risk minimization (SRM).
  • It performs well in high-dimensional feature spaces.
  • It performs relatively well even with a small example set versus a high feature-space dimensionality.
  • For non-linear systems, kernels represent the dot product of the projected vectors in feature space.
  • It has good applications in feature ranking and selection.
  • It can be deployed for multi-class systems.
  • It is a powerful tool for classification, diagnosis and detection problems.

  21. Conclusions and Summary (cont.)
  Application areas include:
  - Pattern recognition
  - Signal processing
  - Data mining
  - Abnormality detection
  - Interpretation procedures in different areas of medicine
  - Intrusion detection (internet, networking, wireless)
  - Security issues, identifying different types of attacks
  - Information retrieval systems
  - Enhancement of search engine results, mail filtering, …
