VC theory, Support vectors and Hedged prediction technology
Overfitting in classification
• Assume a family C of classifiers of points in feature space F. A family of classifiers is a map from C × F to {0,1} (negative and positive class).
• For each subset X of F and each c in C, c(X) defines a partitioning of X into two classes.
• C shatters X if every partitioning of X is accomplished by some c in C.
• If some point set X of size d is shattered by C, then the VC-dimension is at least d.
• If no point set of d+1 elements can be shattered by C, then the VC-dimension is at most d.
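As an illustration of shattering (a minimal sketch of my own, not from the slides; the helper names are hypothetical), take C to be the halfplanes w·x + b ≥ 0 in the plane. A point set is shattered iff every ±1 labeling admits a separating (w, b), which is a linear feasibility problem:

```python
# Sketch: check whether the family of halfplanes in R^2 shatters a point set.
from itertools import product
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """True if some w, b satisfy y_i (w . x_i + b) >= 1 for all i (feasibility LP)."""
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])  # rows: -y_i [x_i, 1]
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(X.shape[1] + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (X.shape[1] + 1))
    return res.success

def shattered_by_halfplanes(X):
    """True if every +/-1 labeling of X is realized by some halfplane."""
    return all(linearly_separable(X, np.array(lab))
               for lab in product([-1, 1], repeat=len(X)))

triangle = np.array([[0, 0], [1, 0], [0, 1]])          # 3 non-collinear points
square   = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # 4 points
print(shattered_by_halfplanes(triangle))  # True
print(shattered_by_halfplanes(square))    # False (the XOR labeling fails)
```

The three non-collinear points are shattered, the four corners of the square are not; this is the VC-dimension statement of the next slide.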
VC-dimension of hyperplanes
• Points on the line (hyperplanes in E^1) shatter any two points, but not three.
• The set of lines in the plane shatters any three non-collinear points, but no four points.
• By Radon's theorem, any d+2 points in E^d can be partitioned into two blocks whose convex hulls intersect; that partitioning cannot be realized by a hyperplane, so no set of d+2 points is shattered.
• The VC-dimension of hyperplanes in E^d is thus d+1.
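A sketch (my own, not from the slides) of why such a partition always exists: any d+2 points in E^d are affinely dependent, and splitting the dependence coefficients by sign gives two blocks whose convex hulls share a point.

```python
# Compute a Radon partition of d+2 points in R^d from an affine dependence
# sum(l_i x_i) = 0, sum(l_i) = 0 (a null-space vector of the matrix [x_i; 1]).
import numpy as np

def radon_partition(points):
    pts = np.asarray(points, dtype=float)              # shape (d+2, d)
    A = np.hstack([pts, np.ones((len(pts), 1))]).T     # shape (d+1, d+2)
    _, _, vt = np.linalg.svd(A)
    lam = vt[-1]                                       # null-space vector: A @ lam ~ 0
    pos, neg = lam > 0, lam < 0
    common = (lam[pos] @ pts[pos]) / lam[pos].sum()    # point in both convex hulls
    return pts[pos], pts[neg], common

# The two blocks may come out in either order (the null vector's sign is arbitrary).
P, N, x = radon_partition([[0, 0], [3, 0], [0, 3], [1, 1]])
print(P, N, x)   # blocks {(0,0),(3,0),(0,3)} and {(1,1)}; common point (1,1)
```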
Why VC-dimension?
• Elegant and pedagogical, but not very useful in practice.
• Bounds the future error of a classifier (PAC-learning).
• Assumes an exchangeable distribution of the (x_i, y_i).
• For the first N points, the training error for c is the observed error rate for c.
• The goodness of selecting from C a classifier with the best performance on the training set depends on the VC-dimension h:
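The bound indicated by the colon is presumably Vapnik's generalization bound (my reconstruction, not preserved on the slide): with probability at least 1 − η,

$$
R(c) \;\le\; R_{\mathrm{emp}}(c) \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h}+1\right)+\ln\frac{4}{\eta}}{N}},
$$

where R(c) is the true error of c, R_emp(c) its training error on the N examples, and h the VC-dimension of the family C.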
Classify with hyperplanes
Frank Rosenblatt (1928 – 1971)
Pioneering work in classifying by hyperplanes in high-dimensional spaces. Criticized by Minsky and Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation. Predecessor to SVM/kernel methods.
Find parallel hyperplanes
• Separate examples by wide-margin hyperplanes (classification).
• Enclose examples between hyperplanes (regression).
• If necessary, non-linearly map examples to a high-dimensional space where they are better separated.
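For reference, the wide-margin separation in the classification case is usually posed as the standard (hard-margin) quadratic program — not written out on the slide — maximizing the margin 2/‖w‖:

$$
\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,(w\cdot x_i + b)\;\ge\;1,\qquad i=1,\dots,N.
$$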
Find parallel hyperplanes – Classification
Red: true separating plane. Blue: wide-margin separation in the sample. Classify by the plane between the blue planes.
Find parallel hyperplanes – Regression
Red: true central plane. Blue: narrowest margin enclosing the sample. For a new x_k: predict y_k so that (x_k, y_k) lies on the mid-plane (dotted).
Soft Margins
The quadratic programming formulation also goes through with soft margins. Specification of the softness constant C is part of most packages. However, no prior rule for setting C is established, and experimentation is necessary for each application. The choice is between narrowing the margin, allowing more outliers, and using a more liberal kernel (to be described).
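For reference, the standard soft-margin (C-SVM) formulation introduces slack variables ξ_i and trades margin width against training violations via C:

$$
\min_{w,b,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{subject to}\quad
y_i\,(w\cdot x_i + b)\;\ge\;1-\xi_i,\quad \xi_i\ge 0 .
$$

A large C penalizes violations heavily (narrow margin, few outliers tolerated); a small C widens the margin at the cost of more misclassified training examples.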
SVM packages
• Inputs: x_i, y_i, and KERNEL and SOFTNESS information.
• Only output is the dual coefficient vector α; non-zero coefficients indicate support vectors.
• The hyperplane is obtained from the support vectors as w = Σ_i α_i y_i x_i, giving the decision function f(x) = sign(Σ_i α_i y_i K(x_i, x) + b).
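As an illustration (scikit-learn is my choice here; the slide does not name a package), a typical call supplies the kernel and the softness constant C and returns the support vectors and dual coefficients:

```python
# Sketch using scikit-learn's SVC (one possible SVM package, not named on the slide).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # examples x_i in R^2
y = np.sign(X[:, 0] + X[:, 1])                 # labels y_i in {-1, +1}

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # KERNEL and SOFTNESS information
clf.fit(X, y)

print(clf.support_)                  # indices of the support vectors
print(clf.dual_coef_)                # the non-zero alpha_i * y_i
print(clf.decision_function(X[:3]))  # sum_i alpha_i y_i K(x_i, x) + b; sign gives the class
```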
Kernel Trick
Example: 2D space (x1, x2). Map to 5D space (c1·x1, c2·x2, c3·x1², c4·x1·x2, c5·x2²).
K(x, y) = (x·y + 1)²
= 2·x1·y1 + 2·x2·y2 + x1²·y1² + x2²·y2² + 2·x1·x2·y1·y2 + 1
= Φ(x)·Φ(y) + 1,
where Φ(x) = Φ((x1, x2)) = (√2·x1, √2·x2, x1², √2·x1·x2, x2²).
(The constant 1 can be absorbed into the hyperplane offset, or carried as a sixth, constant coordinate of Φ.)
Hyperplanes in R^5 are mapped back to conic sections in R^2!
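A quick numerical check of this identity (a throwaway script of my own, not from the slides):

```python
# Verify (x.y + 1)^2 == Phi(x).Phi(y) + 1 for the 5D map above, on random points.
import numpy as np

def phi(p):
    x1, x2 = p
    return np.array([np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

rng = np.random.default_rng(1)
for _ in range(5):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lhs = (x @ y + 1) ** 2      # kernel evaluated directly in R^2
    rhs = phi(x) @ phi(y) + 1   # inner product in the 5D feature space
    assert np.isclose(lhs, rhs)
print("kernel identity verified")
```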
Kernel Trick
Gaussian kernel: K(x, y) = exp(−‖x − y‖² / (2σ²)).
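A small sketch (mine, with a hypothetical function name) computing this kernel; note that libraries often use the gamma parameterization, gamma = 1/(2σ²):

```python
# Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

x, y = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(gaussian_kernel(x, y, sigma=1.0))  # exp(-2/2) = exp(-1) ~ 0.3679
# Equivalent form used by many packages: exp(-gamma * ||x - y||^2) with gamma = 1/(2 sigma^2).
```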