SVM and Its Related Applications
Jung-Ying Wang
5/24/2006
Outline • Basic concept of SVM • SVC formulations • Kernel function • Model selection (tuning SVM hyperparameters) • SVM application: breast cancer diagnosis • Prediction of protein secondary structure • SVM application in protein fold assignment
Introduction • Data classification • training • testing • Learning • supervised learning (classification) • unsupervised learning (clustering)
Basic Concept of SVM • Consider the linearly separable case • Training data from two classes: $(x_i, y_i)$, $i = 1, \dots, l$, with labels $y_i \in \{+1, -1\}$
Decision Function • $f(x) = w^T x + b$ • $f(x) > 0 \Rightarrow$ class 1 • $f(x) < 0 \Rightarrow$ class 2 • How to find a good $(w, b)$? • There are many possible choices of $(w, b)$ (see the sketch below)
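As a minimal illustration of this decision rule (the weight vector, bias, and test point below are made-up values, not from the slides):

```python
import numpy as np

# Hypothetical trained parameters (illustrative values only)
w = np.array([0.5, -1.2])   # weight vector
b = 0.3                     # bias term

def decide(x):
    """Linear SVM decision rule: sign of f(x) = w^T x + b."""
    f = np.dot(w, x) + b
    return 1 if f > 0 else 2   # class 1 if f(x) > 0, class 2 if f(x) < 0

print(decide(np.array([1.0, 0.2])))  # f(x) = 0.56 > 0 -> class 1
```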
Support Vector Machines • a promising technique for data classification • statistical learning theory: maximize the distance (margin) between the two classes • linear separating hyperplane
Questions • 1. How to solve for $(w, b)$? • 2. What about the linearly nonseparable case? • 3. Is this $(w, b)$ good? • 4. What about the multi-class case?
Method to Handle the Nonseparable Case • nonlinear case: map the input data into a higher-dimensional feature space via a mapping $\phi(x)$
Questions • 1. How to choose the mapping $\phi$? • 2. Is it really better? Yes. • Sometimes, even in a high-dimensional feature space, the data may still not be separable • Solution: allow training errors
Example: a nonlinear separating curve in the input space corresponds to a linear hyperplane in the high-dimensional feature space
SVC Formulations (the soft-margin hyperplane)
$$\min_{w,\, b,\, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$$
subject to $y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i$ and $\xi_i \ge 0$, $i = 1, \dots, l$.
Expectation: if the data are separable, all $\xi_i = 0$.
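A minimal sketch of training such a soft-margin classifier, here using scikit-learn rather than the software named later in the slides; the toy data and the value C = 1 are illustrative assumptions:

```python
from sklearn.svm import SVC
import numpy as np

# Toy two-class data (illustrative values only)
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([2, 2, 1, 1])

# C controls the penalty on the slack variables xi_i:
# large C -> fewer training errors, smaller margin
clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)
print(clf.predict([[0.1, 0.0]]))  # -> [2]
```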
How to solve an optimization problem with constraints? Use Lagrange multipliers. Given an optimization problem $\min f(w)$ subject to $g_i(w) \le 0$, form the Lagrangian $L(w, \alpha) = f(w) + \sum_i \alpha_i g_i(w)$ with multipliers $\alpha_i \ge 0$.
Why is the Dual better than the Primal? • Consider the following primal problem (P): # variables = dimension of $\phi(x)$ for $w$ (a very big number) + 1 for $b$ + $l$ for $\xi$ • The dual problem (D): # variables = $l$ • Derive its dual.
Derive the Dual
The primal Lagrangian for the problem is
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i \left( w^T \phi(x_i) + b \right) - 1 + \xi_i \right] - \sum_{i=1}^{l} \mu_i \xi_i$$
The corresponding dual is found by differentiating with respect to $w$, $\xi$, and $b$:
$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_i \alpha_i y_i \phi(x_i), \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_i \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow C = \alpha_i + \mu_i$$
Resubstituting these relations into the primal yields the dual objective function:
$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, \phi(x_i)^T \phi(x_j)$$
Let $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ (the kernel function). Hence, maximizing the above objective over $\alpha$ (subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$) is equivalent to maximizing
$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
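A small numeric sketch of evaluating this dual objective for given multipliers; the data, labels, and $\alpha$ values below are made up for illustration, with an RBF kernel standing in for $K$:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def dual_objective(alpha, y, K):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij."""
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# Illustrative values only
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.3, 0.2])   # satisfies sum_i alpha_i y_i = 0
print(dual_objective(alpha, y, rbf_kernel(X)))
```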
Primal and dual problems have the same KKT conditions • Primal: # variables very large (a shortcoming) • Dual: # of variables = $l$ • The dual involves only inner products $\phi(x_i)^T \phi(x_j)$ in the high-dimensional space, which reduces computational time • For special choices of $\phi$, the kernel $K(x_i, x_j)$ can be calculated efficiently without forming $\phi(x)$ explicitly
Model Selection (Tuning SVM Hyperparameters) • Cross-validation: helps avoid overfitting • Ex: 10-fold cross-validation: the $l$ training examples are split into 10 groups; each time, 9 groups serve as training data and 1 group as test data (see the sketch below) • LOO (leave-one-out): cross-validation with $l$ groups; each time, $l - 1$ examples are used for training and 1 for testing
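A minimal sketch of 10-fold cross-validation with scikit-learn; the synthetic data and the RBF parameters are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=9, random_state=0)

# 10-fold CV: each fold trains on 9/10 of the data, tests on 1/10
scores = cross_val_score(SVC(kernel='rbf', C=1.0, gamma=0.1), X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.4f}")
```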
Model Selection • The most commonly used model selection method is the grid method: try all combinations of $(C, \gamma)$ over a grid and keep the pair with the best cross-validation accuracy, as sketched below
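A sketch of the grid method, here with scikit-learn's GridSearchCV; the grid values are illustrative assumptions (the slides' own grid appears in the LIBSVM results later):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=9, random_state=0)

# Exponentially spaced grid over C and the RBF width gamma
param_grid = {'C': [2**k for k in range(-5, 11, 2)],
              'gamma': [2**k for k in range(-11, 3, 2)]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```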
Model Selection of SVMs Using GA Approach • Peng-Wei Chen, Jung-Ying Wang and Hahn-Ming Lee; 2004 IJCNN International Joint Conference on Neural Networks, 26-29 July 2004. • Abstract: A new automatic search methodology for model selection of support vector machines, based on a GA-based tuning algorithm, is proposed to search for adequate hyperparameters of SVMs.
Model Selection of SVMs Using GA Approach
Procedure: GA-based Model Selection Algorithm
Begin
  Read in dataset;
  Initialize hyperparameters;
  While (not termination condition) do
    Train SVMs;
    Estimate generalization error;
    Create hyperparameters by tuning algorithm;
  End
  Output the best hyperparameters;
End
Experiment Setup • The initial population is selected at random; each chromosome is a single bit string of fixed length 20 • Each bit can take the value 0 or 1 • The first 10 bits encode the integer value of C, and the remaining 10 bits encode the decimal value of σ • The suggested population size N = 20 is used • A crossover rate of 0.8 and a mutation rate of 1/20 = 0.05 are chosen (a decoding sketch follows)
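A minimal sketch of decoding such a chromosome, under the assumption (not spelled out in the slides) that the first 10 bits are read as an unsigned integer for C and the last 10 bits as a binary fraction for σ:

```python
import random

def decode(chromosome):
    """Decode a 20-bit chromosome into (C, sigma).

    Assumption: bits 0-9 form an unsigned integer C in [0, 1023],
    bits 10-19 form a binary fraction sigma in [0, 1).
    """
    c_bits, s_bits = chromosome[:10], chromosome[10:]
    C = int(''.join(map(str, c_bits)), 2)
    sigma = sum(bit * 2**-(i + 1) for i, bit in enumerate(s_bits))
    return C, sigma

random.seed(0)
chrom = [random.randint(0, 1) for _ in range(20)]
print(decode(chrom))
```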
Coding for Weka (ARFF header)
@relation breast_training
@attribute a1 real
@attribute a2 real
@attribute a3 real
@attribute a4 real
@attribute a5 real
@attribute a6 real
@attribute a7 real
@attribute a8 real
@attribute a9 real
@attribute class {2,4}
Coding for Weka (ARFF data)
@data
5,1,1,1,2,1,3,1,1,2
5,4,4,5,7,10,3,2,1,2
3,1,1,1,2,2,3,1,1,2
6,8,8,1,3,4,3,7,1,2
8,10,10,7,10,10,7,3,8,4
8,10,5,3,8,4,4,10,3,4
10,3,5,4,3,7,3,5,3,4
6,10,10,10,10,10,8,10,10,4
1,1,1,1,2,10,3,1,1,2
2,1,2,1,2,1,3,1,1,2
2,1,1,1,2,1,1,1,5,2
Running Results: Using Weka 3.3.6 • predictor: Support Vector Machines (in Weka: the Sequential Minimal Optimization (SMO) algorithm) • Weka SMO result for the 400 training examples
Software and Model Selection • software: LIBSVM • mapping function: the Radial Basis Function (RBF) kernel • find the best penalty parameter C and kernel parameter g (γ) • use cross-validation for model selection
LIBSVM Model Selection Using the Grid Method
-c 1000 -g 10      3-fold accuracy = 69.8389
-c 1000 -g 1000    3-fold accuracy = 69.8389
-c 1    -g 0.002   3-fold accuracy = 97.0717   (winner)
-c 1    -g 0.004   3-fold accuracy = 96.9253
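A sketch of reproducing this search with LIBSVM's Python interface; the file name breast.txt is a placeholder, the grid values are illustrative, and when the -v option is given, svm_train returns the cross-validation accuracy instead of a model:

```python
from libsvm.svmutil import svm_read_problem, svm_train

# Placeholder path to data in LIBSVM sparse format (see the next slide)
y, x = svm_read_problem('breast.txt')

best = (None, None, 0.0)
for c in [1, 1000]:                     # illustrative grid values
    for g in [0.002, 0.004, 10, 1000]:
        # With -v, svm_train runs 3-fold CV and returns the accuracy
        acc = svm_train(y, x, f'-c {c} -g {g} -v 3 -q')
        if acc > best[2]:
            best = (c, g, acc)
print(f"winner: -c {best[0]} -g {best[1]} accuracy={best[2]:.4f}")
```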
Coding for LIBSVM
2 1:2 2:3 3:1 4:1 5:5 6:1 7:1 8:1 9:1
2 1:3 2:2 3:2 4:3 5:2 6:3 7:3 8:1 9:1
4 1:10 2:10 3:10 4:7 5:10 6:10 7:8 8:2 9:1
2 1:4 2:3 3:3 4:1 5:2 6:1 7:3 8:3 9:1
2 1:5 2:1 3:3 4:1 5:2 6:1 7:2 8:1 9:1
2 1:3 2:1 3:1 4:1 5:2 6:1 7:1 8:1 9:1
4 1:9 2:10 3:10 4:10 5:10 6:10 7:10 8:10 9:1
2 1:5 2:3 3:6 4:1 5:2 6:1 7:1 8:1 9:1
4 1:8 2:7 3:8 4:2 5:4 6:2 7:5 8:10 9:1
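A small sketch of producing this sparse `label index:value` format from dense rows (the two sample rows are taken from the data above):

```python
def to_libsvm_line(label, features):
    """Format one example as 'label index:value ...' (indices start at 1)."""
    pairs = [f"{i + 1}:{v}" for i, v in enumerate(features) if v != 0]
    return f"{label} " + " ".join(pairs)

rows = [(2, [2, 3, 1, 1, 5, 1, 1, 1, 1]),
        (4, [10, 10, 10, 7, 10, 10, 8, 2, 1])]
for label, feats in rows:
    print(to_libsvm_line(label, feats))
# -> 2 1:2 2:3 3:1 4:1 5:5 6:1 7:1 8:1 9:1
# -> 4 1:10 2:10 3:10 4:7 5:10 6:10 7:8 8:2 9:1
```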
Multi-class SVM • one-against-all method: k SVM models (k = the number of classes); the ith SVM is trained with all examples of the ith class as positive and all others as negative • one-against-one method: k(k-1)/2 classifiers, each trained on the data from one pair of classes (see the sketch below)
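A minimal sketch of the one-against-one scheme with majority voting, on toy data; in practice LIBSVM implements this internally:

```python
from itertools import combinations
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def one_against_one(X, y, x_new):
    """Train k(k-1)/2 binary SVMs, one per class pair, and vote."""
    votes = []
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        clf = SVC(kernel='linear').fit(X[mask], y[mask])
        votes.append(clf.predict([x_new])[0])
    return Counter(votes).most_common(1)[0][0]

# Toy 3-class data -> 3*(3-1)/2 = 3 pairwise classifiers
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])
y = np.array([0, 0, 1, 1, 2, 2])
print(one_against_one(X, y, [5.2, 5.1]))  # -> 1
```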
SVM Application in Bioinformatics • Prediction of protein secondary structure • SVM application in protein fold assignment
Introduction to Secondary Structure • The prediction of protein secondary structure is an important step to determine structural properties of proteins. • The secondary structure consists of local folding regularities maintained by hydrogen bonds and is traditionally subdivided into three classes: alpha-helices, beta-sheets, and coil.
Coding Example: Protein Secondary Structure Prediction • given an amino-acid sequence • predict a secondary-structure state (α, β, coil) for each residue in the sequence • coding: consider a moving window of n (typically 13-21) neighboring residues, e.g. FGWYALVLAMFFYOYQEKSVMKKGD (see the sketch below)
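A sketch of the window coding, assuming the common one-hot scheme over the 20 standard amino acids (one bit per residue type, so a window of n residues becomes a 20·n-dimensional vector); the details are an assumption, not the slides' exact scheme:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def window_features(seq, center, n=13):
    """One-hot encode the window of n residues centered at `center`.

    Positions off either end of the sequence (and non-standard
    residue codes) stay all-zero. Returns a vector of length 20 * n.
    """
    half = n // 2
    vec = [0] * (20 * n)
    for w in range(n):
        pos = center - half + w
        if 0 <= pos < len(seq) and seq[pos] in AA_INDEX:
            vec[20 * w + AA_INDEX[seq[pos]]] = 1
    return vec

seq = "FGWYALVLAMFFYOYQEKSVMKKGD"
x = window_features(seq, center=6)   # features for residue 'V' at position 6
print(len(x), sum(x))                # -> 260 13
```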
Methods • statistical information (Figureau et al., 2003; Yan et al., 2004) • neural networks (Qian and Sejnowski, 1988; Rost and Sander, 1993; Pollastri et al., 2002; Cai et al., 2003; Kaur and Raghava, 2004; Wood and Hirst, 2004; Lin et al., 2005) • nearest-neighbor algorithms • hidden Markov models • support vector machines (Hua and Sun, 2001; Hyunsoo and Haesun, 2003; Ward et al., 2003; Guo et al., 2004)
Milestones • In 1988, neural networks first achieved about 62% accuracy (Qian and Sejnowski, 1988; Holley and Karplus, 1989). • In 1993, using evolutionary information, a neural network system improved the prediction accuracy to over 70% (Rost and Sander, 1993). • More recent neural-network approaches (e.g. Baldi et al., 1999; Petersen et al., 2000; Pollastri and McLysaght, 2005) achieve even higher accuracy (> 78%).
Benchmarks (Data Sets Used in Protein Secondary Structure Prediction) • Rost and Sander data set (Rost and Sander, 1993), referred to as RS126 • Note that the RS126 data set consists of 25,184 data points in three classes: 47% coil, 32% helix, and 21% strand • Cuff and Barton data set (Cuff and Barton, 1999), referred to as CB513 • The prediction accuracy is verified by 7-fold cross-validation.
Secondary Structure Assignment • Assignments follow the DSSP (Dictionary of Secondary Structures of Proteins) algorithm (Kabsch and Sander, 1983), which distinguishes eight secondary structure classes. • We converted the eight types into three classes in the following way: H (α-helix), I (π-helix), and G (310-helix) as helix (α); E (extended strand) as β-strand (β); and all others as coil (c). A mapping sketch follows. • Different conversion methods influence the prediction accuracy to some extent, as discussed by Cuff and Barton (Cuff and Barton, 1999).
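This conversion is a straightforward lookup; a minimal sketch (the DSSP string below is invented for illustration):

```python
# DSSP 8-class -> 3-class reduction used in the slides:
# H, G, I -> helix (a); E -> strand (b); everything else -> coil (c)
DSSP_TO_3 = {'H': 'a', 'G': 'a', 'I': 'a', 'E': 'b'}

def reduce_dssp(dssp_states):
    """Map a string of 8-class DSSP codes to the 3-class alphabet."""
    return ''.join(DSSP_TO_3.get(s, 'c') for s in dssp_states)

print(reduce_dssp("HHHHGGTT EEEESS"))  # -> 'aaaaaacccbbbbcc'
```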