Support Vector Machine (SVM) Based on Nello Cristianini's presentation: http://www.support-vector.net/tutorial.html
Basic Idea • Use a Linear Learning Machine (LLM). • Overcome the linearity constraint: • Map non-linearly to a higher-dimensional space. • Select between hyperplanes: • Use the margin as a test. • Generalization depends on the margin.
General idea • [Figure: Original Problem and Transformed Problem]
Kernel Based Algorithms • Two separate components • Learning algorithm: • works in an embedded space • Kernel function: • performs the embedding
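Basic Example: Kernel Perceptron • Hyperplane classification • $f(x) = \langle w, x \rangle + b = \langle w', x' \rangle$ (bias absorbed: $w' = (w, b)$, $x' = (x, 1)$) • $h(x) = \mathrm{sign}(f(x))$ • Perceptron Algorithm: • Sample: $(x_i, t_i)$, $t_i \in \{-1, +1\}$ • IF $t_i \langle w_k, x_i \rangle \le 0$ THEN /* Error */ • $w_{k+1} = w_k + t_i x_i$ • $k = k + 1$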
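The update rule on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: the function name `perceptron`, the toy data, and the epoch cap are assumptions, and the bias is absorbed by appending a constant 1 to every input. The mistake test uses `<= 0` so that the all-zero starting vector counts as a mistake and triggers the first update.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def perceptron(samples, max_epochs=100):
    """Primal perceptron. samples: list of (x, t) with t in {-1, +1}.

    Each x is assumed to carry a trailing constant 1 so the bias is
    part of w (the <w', x'> form).
    """
    w = [0.0] * len(samples[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        updated = False
        for x, t in samples:
            if t * dot(w, x) <= 0:  # error on this sample
                w = [wi + t * xi for wi, xi in zip(w, x)]
                mistakes += 1
                updated = True
        if not updated:  # a full pass without errors: done
            break
    return w, mistakes


# Hypothetical, linearly separable toy data (last coordinate is the constant 1).
data = [
    ((2.0, 1.0, 1.0), +1),
    ((1.0, 3.0, 1.0), +1),
    ((-1.0, -1.0, 1.0), -1),
    ((-2.0, 0.5, 1.0), -1),
]
w, k = perceptron(data)
```

On this toy set a single update is enough: the first sample is added to the zero vector and every point is then classified correctly.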
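Recall • Margin of hyperplane $w$: $\gamma = \min_i t_i \langle w, x_i \rangle / \|w\|$ • Mistake bound: at most $(R/\gamma)^2$ mistakes, where $R = \max_i \|x_i\|$ and $\gamma$ is the margin of a separating hyperplane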
Observations • The solution is a linear combination of the inputs • $w = \sum_i a_i t_i x_i$ • where $a_i \ge 0$ • Mistake driven • Only points on which we make a mistake influence the solution! • Support vectors • The points with non-zero $a_i$
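Dual representation • Rewrite basic function: • $f(x) = \langle w, x \rangle + b = \sum_i a_i t_i \langle x_i, x \rangle + b$ • $w = \sum_i a_i t_i x_i$ • Change update rule: • IF $t_j \left( \sum_i a_i t_i \langle x_i, x_j \rangle + b \right) \le 0$ • THEN $a_j = a_j + 1$ • Observation: • Data appears only inside inner products!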
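A minimal sketch of the dual update follows, assuming the bias is absorbed into the inputs as a trailing constant 1 (so $b$ is omitted) and using a hypothetical toy data set. The names `dual_perceptron` and `gram` are illustrative. Note that the training loop only ever reads the table of inner products, never the raw coordinates.

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(p * q for p, q in zip(u, v))


def dual_perceptron(xs, ts, max_epochs=100):
    """Dual perceptron: learns one counter a_i per training point."""
    n = len(xs)
    a = [0] * n
    # All the data the algorithm needs: pairwise inner products.
    gram = [[dot(xi, xj) for xj in xs] for xi in xs]
    for _ in range(max_epochs):
        updated = False
        for j in range(n):
            f_j = sum(a[i] * ts[i] * gram[i][j] for i in range(n))
            if ts[j] * f_j <= 0:  # mistake on point j
                a[j] += 1
                updated = True
        if not updated:
            break
    return a


# Hypothetical toy data; the last coordinate is the constant 1 (absorbed bias).
xs = [(2.0, 1.0, 1.0), (1.0, 3.0, 1.0), (-1.0, -1.0, 1.0), (-2.0, 0.5, 1.0)]
ts = [+1, +1, -1, -1]

a = dual_perceptron(xs, ts)
# Recover the primal weights w = sum_i a_i t_i x_i.
w = [sum(a[i] * ts[i] * xs[i][d] for i in range(len(xs))) for d in range(3)]
# Support vectors: the points with non-zero a_i.
support = [i for i, ai in enumerate(a) if ai != 0]
```

Here only the first point ever causes a mistake, so it is the single support vector and the recovered `w` equals that point.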
Limitation of Perceptron • Only linear separations • Only converges for linearly separable data • Only defined on vectorial data
The idea of a Kernel • Embed data in a different space • Possibly higher dimension • Linearly separable in the new space. • [Figure: Original Problem and Transformed Problem]
Kernel Mapping • Need only to compute inner-products. • Mapping: M(x) • Kernel: K(x,y) = < M(x) , M(y)> • Dimensionality of M(x): unimportant! • Need only to compute K(x,y) • Using it in the embedded space: • Replace <x,y> by K(x,y)
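To illustrate the last bullet, the sketch below runs the same dual perceptron twice on an XOR-style data set, once with the plain inner product and once with it replaced by a kernel. Everything here is an illustrative assumption: the function names, the data, the epoch cap, and the choice of the quadratic kernel $\langle x, z \rangle^2$. No bias term is used.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def quadratic_kernel(x, z):
    return dot(x, z) ** 2


def kernel_perceptron(xs, ts, kernel, max_epochs=50):
    """Dual perceptron with <x_i, x_j> replaced by kernel(x_i, x_j).

    Returns (a, converged); converged is True when a full pass over
    the data produced no mistakes.
    """
    n = len(xs)
    a = [0] * n
    gram = [[kernel(xi, xj) for xj in xs] for xi in xs]
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(n):
            f_j = sum(a[i] * ts[i] * gram[i][j] for i in range(n))
            if ts[j] * f_j <= 0:
                a[j] += 1
                mistakes += 1
        if mistakes == 0:
            return a, True
    return a, False


# XOR-style labels: not linearly separable in the original space.
xor_xs = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
xor_ts = [-1, -1, +1, +1]

a_lin, ok_lin = kernel_perceptron(xor_xs, xor_ts, linear_kernel)
a_quad, ok_quad = kernel_perceptron(xor_xs, xor_ts, quadratic_kernel)
```

With the linear kernel the loop can never finish a clean pass, because the first two points are negatives of each other and share a label. With the quadratic kernel the data become separable in the embedded space and the algorithm stops after two updates.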
Example • $x = (x_1, x_2)$; $z = (z_1, z_2)$; $K(x, z) = \langle x, z \rangle^2$
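Expanding the square gives $\langle x, z \rangle^2 = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2$, which is the inner product of the embedded points under the map $M(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. The short check below confirms this numerically; the sample points are arbitrary, and this $M$ is one valid embedding for the kernel rather than the only one.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def K(x, z):
    """Kernel from the slide: squared inner product."""
    return dot(x, z) ** 2


def M(x):
    """One explicit embedding of a 2-D point whose inner product equals K."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = K(x, z)              # computed in the original 2-D space
embedded_value = dot(M(x), M(z))    # computed in the 3-D embedded space
```

Both values agree up to floating-point rounding, without the kernel ever constructing the 3-D vectors.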
Polynomial Kernel • [Figure: Original Problem and Transformed Problem]
Example of Basic Kernels • Polynomial • $K(x, z) = \langle x, z \rangle^d$ • Gaussian • $K(x, z) = \exp\left( -\|x - z\|^2 / (2\sigma^2) \right)$, with width parameter $\sigma$
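Both kernels are short to write down in code. In this sketch the function names and default arguments are assumptions; `sigma` is a width parameter, and `sigma = 1` gives $\exp(-\|x - z\|^2 / 2)$.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def polynomial_kernel(x, z, d=2):
    """K(x, z) = <x, z>^d."""
    return dot(x, z) ** d


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


x, z = (1.0, 2.0), (3.0, 4.0)
poly_value = polynomial_kernel(x, z, d=2)   # <x, z> = 11, squared
gauss_self = gaussian_kernel(x, x)          # zero distance
gauss_value = gaussian_kernel(x, z)         # ||x - z||^2 = 8
```

The Gaussian kernel always equals 1 for identical points and decays towards 0 as the points move apart; a larger `sigma` slows that decay.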
Kernel: Closure Properties • $K(x, z) = K_1(x, z) + c$, for a constant $c \ge 0$ • $K(x, z) = c \cdot K_1(x, z)$, for a constant $c \ge 0$ • $K(x, z) = K_1(x, z) \cdot K_2(x, z)$ • $K(x, z) = K_1(x, z) + K_2(x, z)$ • Create new kernels using basic ones!
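The closure rules can be written as small combinators that take kernels and return a new kernel. The sketch below also runs a numerical spot check: a valid kernel must give a non-negative quadratic form $\sum_{i,j} v_i v_j K(x_i, x_j)$ for every vector $v$. The check samples random $v$, so it is a sanity test and not a proof. All names, the constants, and the random points are assumptions.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def linear(x, z):
    return dot(x, z)


def gaussian(x, z):
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, z)) / 2)


# Closure rules as kernel combinators (c is assumed non-negative).
def add_const(k1, c):
    return lambda x, z: k1(x, z) + c


def scale(k1, c):
    return lambda x, z: c * k1(x, z)


def product(k1, k2):
    return lambda x, z: k1(x, z) * k2(x, z)


def add(k1, k2):
    return lambda x, z: k1(x, z) + k2(x, z)


def quad_form(kernel, points, v):
    """v^T G v for the Gram matrix G of `kernel` on `points`."""
    n = len(points)
    return sum(v[i] * v[j] * kernel(points[i], points[j])
               for i in range(n) for j in range(n))


rng = random.Random(0)  # fixed seed so the check is repeatable
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
combos = {
    "K1 + c": add_const(linear, 1.0),
    "c * K1": scale(gaussian, 3.0),
    "K1 * K2": product(linear, gaussian),
    "K1 + K2": add(linear, gaussian),
}
# Smallest quadratic form seen over 200 random vectors, per combined kernel.
min_forms = {
    name: min(quad_form(k, points, [rng.uniform(-1, 1) for _ in points])
              for _ in range(200))
    for name, k in combos.items()
}
```

Every entry of `min_forms` should be non-negative up to rounding error, which is consistent with each combination being a kernel.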
Support Vector Machines • Linear Learning Machines (LLM) • Use dual representation • Work in the kernel induced feature space • $f(x) = \sum_i a_i t_i K(x_i, x) + b$ • Which hyperplane to select?
Generalization of SVM • PAC theory: • error = $O(\mathrm{VCdim} / m)$ • Problem: $\mathrm{VCdim} \gg m$ • No preference between consistent hyperplanes
Margin based bounds • H: Basic Hypothesis class • conv(H): finite convex combinations of H • D: Distribution over X and {+1,-1} • S: Sample of size m over D
Margin based bounds • THEOREM: for every f in conv(H)
Maximal Margin Classifier • Maximizes the margin • Minimizes the overfitting due to margin selection. • Increases margin • Rather than reduce dimensionality
Margins • Geometric margin: $\min_i t_i f(x_i) / \|w\|$ • Functional margin: $\min_i t_i f(x_i)$ • [Figure label: $f(x)$]
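Main trick in SVM • Insist on a functional margin of at least 1. • Support vectors have margin exactly 1. • Geometric margin = $1 / \|w\|$ • Proof: the geometric margin is the functional margin divided by $\|w\|$, and the functional margin is 1.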
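The rescaling argument can be checked numerically. The sketch below computes both margins for a hypothetical hyperplane and data set, then rescales $(w, b)$ so that the functional margin becomes 1; the geometric margin is unchanged by the rescaling and equals $1 / \|w\|$ afterwards. The function name `margins` and all numbers are illustrative.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def margins(w, b, xs, ts):
    """Return (functional margin, geometric margin) of hyperplane (w, b)."""
    functional = min(t * (dot(w, x) + b) for x, t in zip(xs, ts))
    return functional, functional / math.sqrt(dot(w, w))


# Hypothetical separable data and a separating hyperplane.
xs = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
ts = [+1, +1, -1, -1]
w, b = (1.0, 1.0), -2.0

functional, geometric = margins(w, b, xs, ts)

# Rescale so the functional margin is exactly 1.
w_scaled = tuple(wi / functional for wi in w)
b_scaled = b / functional
functional_scaled, geometric_scaled = margins(w_scaled, b_scaled, xs, ts)
inverse_norm = 1.0 / math.sqrt(dot(w_scaled, w_scaled))
```

Since the geometric margin does not depend on the scale of $(w, b)$, fixing the functional margin to 1 turns "maximize the margin" into "make $\|w\|$ as small as possible".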
SVM criteria • Find a hyperplane $(w, b)$ • That minimizes: $\|w\|^2 = \langle w, w \rangle$ • Subject to: • for all $i$: $t_i (\langle w, x_i \rangle + b) \ge 1$
Quadratic Programming • Quadratic goal function. • Linear constraints. • Unique optimum (the problem is convex). • Polynomial time algorithms.
Dual Problem • Maximize • $W(a) = \sum_i a_i - \frac{1}{2} \sum_{i,j} a_i t_i a_j t_j K(x_i, x_j)$ • Subject to • $\sum_i a_i t_i = 0$ • $a_i \ge 0$
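As a worked illustration, the dual can be solved by hand for two points, $x_1 = (1, 0)$ with $t_1 = +1$ and $x_2 = (-1, 0)$ with $t_2 = -1$, using the linear kernel. The equality constraint forces $a_1 = a_2 = \alpha$, so $W$ reduces to $2\alpha - 2\alpha^2$, which peaks at $\alpha = 1/2$. The sketch below confirms this with a simple grid search. The toy data, the grid, and the function name `dual_objective` are assumptions, and a real solver would use quadratic programming rather than a grid.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def dual_objective(a, ts, gram):
    """W(a) = sum_i a_i - 1/2 sum_{i,j} a_i t_i a_j t_j K(x_i, x_j)."""
    n = len(a)
    quad = sum(a[i] * ts[i] * a[j] * ts[j] * gram[i][j]
               for i in range(n) for j in range(n))
    return sum(a) - 0.5 * quad


xs = [(1.0, 0.0), (-1.0, 0.0)]
ts = [+1, -1]
gram = [[dot(xi, xj) for xj in xs] for xi in xs]  # linear kernel

# sum_i a_i t_i = 0 forces a_1 = a_2 = alpha; search alpha >= 0 on a grid.
grid = [i / 100 for i in range(201)]
best_alpha = max(grid, key=lambda alpha: dual_objective([alpha, alpha], ts, gram))
best_value = dual_objective([best_alpha, best_alpha], ts, gram)

# Recover the primal weight vector w = sum_i a_i t_i x_i.
w = [sum(best_alpha * t * x[d] for x, t in zip(xs, ts)) for d in range(2)]
```

The recovered weight vector is $(1, 0)$, so $\|w\| = 1$ and both points sit at functional margin exactly 1 with $b = 0$; both are support vectors.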
Applications: Text • Classify a text into given categories • Sports, news, business, science, … • Feature space • Bag of words • Huge sparse vector!
Applications: Text • Practicalities: • $M_w(x) = \mathrm{tf}_w \log(\mathrm{idf}_w) / K$ • $\mathrm{tf}_w$ = term frequency of $w$ in the text $x$ • $\mathrm{idf}_w$ = inverse document frequency • $\mathrm{idf}_w$ = # documents / # documents with $w$ • Inner product $\langle M(x), M(z) \rangle$ • sparse vectors • SVM: finds a hyperplane in “document space”
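A minimal sketch of this feature map with sparse vectors stored as dictionaries follows. The slide does not define $K$; the code assumes it is a normalizing constant that scales each document vector to unit length. The tiny corpus and the names `tfidf_vector` and `sparse_dot` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse map {word: tf_w * log(idf_w) / K} for one document.

    Assumption: K normalizes the vector to unit Euclidean length.
    """
    tf = Counter(tokens)
    vec = {w: tf[w] * math.log(n_docs / doc_freq[w]) for w in tf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0:
        return vec
    return {w: v / norm for w, v in vec.items()}


def sparse_dot(u, v):
    """Inner product that only visits words present in both vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v[w] for w, val in u.items() if w in v)


docs = [
    "team wins match".split(),
    "team loses match".split(),
    "market stocks fall".split(),
]
# Document frequency: number of documents containing each word.
doc_freq = Counter(w for d in docs for w in set(d))
vectors = [tfidf_vector(d, doc_freq, len(docs)) for d in docs]

sports_similarity = sparse_dot(vectors[0], vectors[1])  # share "team", "match"
cross_similarity = sparse_dot(vectors[0], vectors[2])   # no shared words
```

Because the inner product only touches words common to both documents, its cost depends on document length and not on vocabulary size, which is what makes the huge sparse feature space practical.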