Exploiting Duality (Particularly the dual of SVM)

VISUAL GEOMETRY GROUP Exploiting Duality (Particularly the dual of SVM) M. Pawan Kumar

PART I : General duality theory • Basics of Mathematical Optimization • The algebra • The geometry • Examples PART II : Solving the SVM dual • General Decomposition Algorithm • Good Working Set • Implementation Details

Mathematical Optimization min f0(x) (objective function) s.t. fi(x) ≤ 0 (inequality constraints), hi(x) = 0 (equality constraints). x is a feasible point if fi(x) ≤ 0, hi(x) = 0. x is a strictly feasible point if fi(x) < 0, hi(x) = 0. Feasible region: the set of all feasible points.

Convex Optimization min f0(x) Objective function s.t. fi(x) ≤ 0 Inequality constraints hi(x) = 0 Equality constraints Feasible region is convex Objective function is convex Convex set??? Convex function???

Convex Set Line segment with endpoints $x_1$, $x_2$: $c x_1 + (1 - c) x_2$, $c \in [0,1]$

Convex Set For all line segments with endpoints $x_1$, $x_2$ in the set, all points on the line segment lie within the set.

Non-Convex Set x1 x2

Examples of Convex Sets x1 x2 Line Segment

Examples of Convex Sets x1 x2 Line

Examples of Convex Sets Hyperplane aTx - b = 0

Examples of Convex Sets Halfspace aTx - b ≤ 0

Examples of Convex Sets Second-order Cone $\|x\| \le t$ (figure axes: $x_1$, $x_2$, $t$)

Operations that Preserve Convexity Intersection Polyhedron / Polytope

Operations that Preserve Convexity Intersection

Operations that Preserve Convexity Affine Transformation $x \to Ax + b$

Convex Function f(x) x1 x2 x Blue point always lies above red point

Convex Function f(x) x1 x2 x f( c x1 + (1 - c) x2 ) ≤ c f(x1) + (1 - c) f(x2) Domain of f(.) has to be convex

Convex Function f(x) x1 x2 x f( c x1 + (1 - c) x2 ) ≤ c f(x1) + (1 - c) f(x2) -f(.) is concave

Convex Function Once-differentiable functions: $f(y) + \nabla f(y)^T (x - y) \le f(x)$, i.e. the tangent at $(y, f(y))$ lies below $f$. Twice-differentiable functions: $\nabla^2 f(x) \succeq 0$

Convex Function and Convex Sets f(x) x Epigraph of a convex function is a convex set

Examples of Convex Functions Linear function: $a^T x$. p-Norm functions: $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$. Quadratic functions: $x^T Q x$, $Q \succeq 0$
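The two convexity tests from the slides above can be tried numerically. The following is a minimal sketch, assuming NumPy; the helper names `is_psd` and `satisfies_convexity_inequality` and the example matrices are illustrative choices, not part of the original slides. Note that the second helper only tests one line segment on a grid of values of c, so it can refute convexity but cannot prove it.

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    """True if the symmetric part of Q has no eigenvalue below -tol."""
    S = 0.5 * (Q + Q.T)
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

def satisfies_convexity_inequality(f, x1, x2, num=11):
    """Check f(c x1 + (1-c) x2) <= c f(x1) + (1-c) f(x2) for c on a grid in [0, 1]."""
    for c in np.linspace(0.0, 1.0, num):
        lhs = f(c * x1 + (1 - c) * x2)
        rhs = c * f(x1) + (1 - c) * f(x2)
        if lhs > rhs + 1e-9:
            return False
    return True

# Hypothetical example matrices: one positive definite, one indefinite.
Q_convex = np.array([[2.0, 0.5], [0.5, 1.0]])
Q_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])

f_convex = lambda x: x @ Q_convex @ x
f_nonconvex = lambda x: x @ Q_nonconvex @ x

print(is_psd(Q_convex), is_psd(Q_nonconvex))
print(satisfies_convexity_inequality(f_convex, np.array([1.0, 2.0]), np.array([-1.0, 0.5])))
print(satisfies_convexity_inequality(f_nonconvex, np.array([0.0, 1.0]), np.array([0.0, -1.0])))
```

For the indefinite matrix, the segment between (0, 1) and (0, -1) passes through the origin, where the function value 0 exceeds the endpoint values of -1, so the inequality fails there.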

Operations that Preserve Convexity Non-negative weighted sum: $w_1 f_1(x) + w_2 f_2(x) + \dots$, $w_i \ge 0$. Example: $x^T Q x + a^T x + b$, $Q \succeq 0$

Operations that Preserve Convexity Pointwise maximum: $\max\{f_1(x), f_2(x)\}$. Pointwise minimum of concave functions is concave

Convex Optimization min f0(x) Objective function s.t. fi(x) ≤ 0 Inequality constraints hi(x) = 0 Equality constraints Feasible region is convex Objective function is convex

Lagrangian min $f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. $L(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$

Lagrangian Dual $L(x,\lambda,\nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$. $g(\lambda,\nu) = \min_{x \in D} L(x,\lambda,\nu)$, where $D$ is the intersection of the domains of $f_0$, $f_i$ and $h_i$

Lagrangian Dual g(,) = f0(x) minx + ∑i i fi(x) i ≥ 0 + ∑i i hi(x) Pointwise minimum of affine (concave) functions Dual function is concave

Lagrangian Dual $p^* = \min f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. For all $(\lambda,\nu)$ with $\lambda_i \ge 0$: $p^* \ge g(\lambda,\nu) = \min_x \left[ f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right]$

The Dual Problem The lower bound $g(\lambda,\nu)$ is easy to obtain but could be far from $p^*$. Best lower bound? $d^* = \max_{\lambda \ge 0,\, \nu} \min_x \left[ f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right]$. Duality gap: $p^* - d^* \ge 0$
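A small worked example may make the lower-bound property concrete. The sketch below uses a toy problem that is not in the original slides: min x^2 s.t. 1 - x ≤ 0, whose primal optimum is p* = 1 at x = 1. Its Lagrangian x^2 + λ(1 - x) is minimized over x at x = λ/2, which gives the dual function g(λ) = λ - λ^2/4. The grid over λ is an arbitrary choice made for illustration.

```python
import numpy as np

def dual_function(lam):
    """g(lam) = min_x [x^2 + lam * (1 - x)], attained at x = lam / 2."""
    x = lam / 2.0
    return x**2 + lam * (1.0 - x)

p_star = 1.0                           # primal optimum of min x^2 s.t. 1 - x <= 0
lams = np.linspace(0.0, 4.0, 401)      # dual feasible values, lam >= 0
g_vals = dual_function(lams)

d_star = g_vals.max()                  # best lower bound found on the grid
best_lam = lams[g_vals.argmax()]

print("every g(lam) <= p*:", bool(np.all(g_vals <= p_star + 1e-12)))
print("d* =", d_star, "at lam =", best_lam)
```

Every value of g(λ) stays below p*, and the best bound is reached at λ = 2, where the duality gap closes. The problem is convex and x = 2 is strictly feasible, so a zero gap is what one would expect here.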

The Geometric Interpretation $G = \{ (u, v, t) = (f_i(x), h_i(x), f_0(x)) : x \in D \}$ (figure: the set $G$ in the $(u, t)$ plane, with $p^*$ marked)

The Geometric Interpretation (, ,1)T (u, v, t) ≥ g(, ) t G p* d* u g()

The Duality Gap $p^* = \min f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$. $d^* = \max_{\lambda \ge 0,\, \nu} \min_x \left[ f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right]$. $p^* \ge d^*$

The Duality Gap p* - d* Duality Gap p* - d* ≥ 0 Weak Duality p* - d* = 0 Strong Duality

Strong Duality Problem is convex There exists a strictly feasible point Taken care of by most solvers Slater’s Condition

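At Strong Duality $f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$. Inequalities hold with equality, so $x^*$ minimizes the Lagrangian at $(\lambda^*, \nu^*)$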

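At Strong Duality $f_0(x^*) = g(\lambda^*, \nu^*) = \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right) \le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \le f_0(x^*)$. Inequalities hold with equality, so $\lambda_i^* f_i(x^*) = 0$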

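KKT Conditions Primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$. Dual feasible: $\lambda_i^* \ge 0$. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$. Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$. Necessary conditions for strong duality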

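KKT Conditions Primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$. Dual feasible: $\lambda_i^* \ge 0$. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$. Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$. Necessary and sufficient for convex problems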
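The KKT conditions above can be checked mechanically for a candidate primal-dual pair. The sketch below does so for a quadratic program with inequality constraints only, min (1/2) x^T P x + q^T x s.t. Gx ≤ h. The function name `kkt_residuals`, the restriction to inequality constraints, and the toy instance (min x^2 s.t. 1 - x ≤ 0) are assumptions made for illustration, not something taken from the slides.

```python
import numpy as np

def kkt_residuals(P, q, G, h, x, lam):
    """KKT residuals for min 1/2 x^T P x + q^T x  s.t.  G x <= h (all zero at an optimum)."""
    f = G @ x - h                                     # constraint values f_i(x)
    return {
        "primal": float(np.maximum(f, 0.0).max()),    # violation of f_i(x) <= 0
        "dual": float(np.maximum(-lam, 0.0).max()),   # violation of lam_i >= 0
        "slackness": float(np.abs(lam * f).max()),    # |lam_i * f_i(x)|
        "stationarity": float(np.abs(P @ x + q + G.T @ lam).max()),
    }

# Toy instance: min x^2 s.t. 1 - x <= 0, written as P = [[2]], q = [0], G = [[-1]], h = [-1].
P = np.array([[2.0]])
q = np.array([0.0])
G = np.array([[-1.0]])
h = np.array([-1.0])

optimal = kkt_residuals(P, q, G, h, x=np.array([1.0]), lam=np.array([2.0]))
not_optimal = kkt_residuals(P, q, G, h, x=np.array([2.0]), lam=np.array([0.0]))
print(optimal)
print(not_optimal)
```

The pair x = 1, λ = 2 satisfies all four conditions. The point x = 2 with λ = 0 is feasible and satisfies complementary slackness, but the gradient of the Lagrangian does not vanish there, so it fails stationarity.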

Linear Program min cTx s.t. Ax = b x ≥ 0
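As a worked instance of the Lagrangian dual for this linear program, the standard derivation can be sketched as follows. It assumes the constraint x ≥ 0 is written as -x ≤ 0 with multipliers λ ≥ 0, and that Ax = b carries multipliers ν; this labelling follows the general notation of the earlier slides rather than anything stated on this one.

```latex
\begin{aligned}
L(x,\lambda,\nu) &= c^T x - \lambda^T x + \nu^T (Ax - b)
                  = -b^T \nu + (c + A^T \nu - \lambda)^T x, \qquad \lambda \ge 0 \\
g(\lambda,\nu) &= \min_x L(x,\lambda,\nu) =
  \begin{cases}
    -b^T \nu & \text{if } c + A^T \nu - \lambda = 0 \\
    -\infty  & \text{otherwise}
  \end{cases} \\
d^* &= \max_{\nu} \; -b^T \nu \quad \text{s.t.} \quad A^T \nu + c \ge 0
\end{aligned}
```

The Lagrangian is linear in x, so its minimum is finite only when the coefficient of x vanishes. Eliminating λ = c + A^T ν ≥ 0 leaves a dual that is again a linear program.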

QCQP min $\frac{1}{2} x^T P_0 x + q_0^T x + r_0$ s.t. $\frac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$

Entropy Maximization min ∑i xi log(xi) s.t. Ax ≤ b ∑i xi= 1

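The SVM Framework Points $X = \{x_i\}$, labels $y = \{y_i\}$, $y_i \in \{-1, +1\}$. Separating hyperplane $w^T x + b = 0$ with margin $2/\|w\|$. $\min \frac{1}{2} w^T w + C \sum_i \xi_i$ s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$. Convex Quadratic Program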

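The SVM Dual $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$, where $Q_{ij} = y_i y_j x_i^T x_j = y_i y_j k(x_i, x_j)$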
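The step from the SVM primal to this dual can be sketched with the Lagrangian machinery from Part I. The derivation below assumes the soft-margin primal min (1/2) w^T w + C Σ_i ξ_i s.t. y_i (w^T x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0, with multipliers α_i for the margin constraints and μ_i for ξ_i ≥ 0. The symbol μ is introduced here for illustration and does not appear in the slides.

```latex
\begin{aligned}
L(w,b,\xi,\alpha,\mu) &= \tfrac{1}{2} w^T w + C \sum_i \xi_i
   - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right]
   - \sum_i \mu_i \xi_i, \qquad \alpha_i \ge 0,\; \mu_i \ge 0 \\
\nabla_w L = 0 \;&\Rightarrow\; w = \sum_i \alpha_i y_i x_i \\
\partial L / \partial b = 0 \;&\Rightarrow\; \alpha^T y = 0 \\
\partial L / \partial \xi_i = 0 \;&\Rightarrow\; C - \alpha_i - \mu_i = 0
   \;\Rightarrow\; 0 \le \alpha_i \le C \\
g(\alpha,\mu) &= \alpha^T \mathbf{1} - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
   = \alpha^T \mathbf{1} - \tfrac{1}{2} \alpha^T Q \alpha
\end{aligned}
```

Maximizing this dual function over the feasible α is the same as minimizing (1/2) α^T Q α - α^T 1 under α^T y = 0 and 0 ≤ α ≤ C1, which is the form shown on the slide.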

The SVM Dual $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$. Choose 'q' variables (best set B?). Fix the rest. Change the unfixed variables, satisfying the constraints, to decrease the objective function (a small problem). Repeat. Minimum 'q'? Till when?
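One concrete instance of this decomposition scheme is sketched below. It fixes q = 2, since the equality constraint α^T y = 0 means a single variable cannot move on its own. It picks the working set as the maximal violating pair and stops once the KKT violation falls below a tolerance. These are assumptions in the spirit of SMO-style solvers and not necessarily the choices made in the talk; the function name `solve_svm_dual` and the toy data are hypothetical.

```python
import numpy as np

def solve_svm_dual(K, y, C, tol=1e-6, max_iter=10000):
    """Decomposition with q = 2: min 1/2 a^T Q a - a^T 1, y^T a = 0, 0 <= a <= C."""
    n = len(y)
    alpha = np.zeros(n)
    grad = -np.ones(n)                    # gradient Q alpha - 1 at alpha = 0
    for _ in range(max_iter):
        score = -y * grad
        # Variables that can move in the +y direction (up) or -y direction (low).
        up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        low = ((y < 0) & (alpha < C)) | ((y > 0) & (alpha > 0))
        if not up.any() or not low.any():
            break
        i = np.where(up)[0][np.argmax(score[up])]
        j = np.where(low)[0][np.argmin(score[low])]
        gap = score[i] - score[j]         # KKT violation of the chosen pair
        if gap < tol:
            break                         # KKT conditions hold up to tol
        # Minimise along alpha_i += s*y_i, alpha_j -= s*y_j (keeps y^T alpha fixed).
        curvature = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        step = gap / curvature
        step = min(step, C - alpha[i] if y[i] > 0 else alpha[i])
        step = min(step, alpha[j] if y[j] > 0 else C - alpha[j])
        alpha[i] = np.clip(alpha[i] + step * y[i], 0.0, C)
        alpha[j] = np.clip(alpha[j] - step * y[j], 0.0, C)
        # Only two columns of the kernel matrix are needed to refresh the gradient.
        grad += y * step * (K[:, i] - K[:, j])
    return alpha

# Toy 1-D separable data with a linear kernel (illustrative values only).
X = np.array([[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = solve_svm_dual(K, y, C=10.0)
w = (alpha * y) @ X
print(alpha, w)
```

On this toy data only the two points closest to the boundary end up with non-zero α, and the recovered w is the unit vector along the first axis, which matches the maximum-margin separator one would draw by hand.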

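KKT Conditions $\min_\alpha \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$ (multiplier $\lambda^{eq}$), $0 \le \alpha \le C\mathbf{1}$ (multipliers $\lambda_i^{lo}$, $\lambda_i^{up}$). With $g(\alpha) = Q\alpha$: $-\mathbf{1} + Q\alpha + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$, $\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$, $\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$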

KKT Conditions -1 + g() + eqy - lo + up = 0 ilo i = 0 iup (i - C) = 0 ilo ≥ 0 iup ≥ 0 For all 0 < i < C -1 + gi() + eqyi = 0 For all i = 0 -1 + gi() + eqyi - ilo = 0 For all i = C -1 + gi() + eqyi + iup = 0

KKT Conditions -1 + g() + eqy - lo + up = 0 ilo i = 0 iup (i - C) = 0 ilo ≥ 0 iup ≥ 0 gi() = yi∑j jyj k(xi,xj) git() = gi(t-1) + yi∑j B (jt - jt-1)yj k(xi,xj) Best set of ‘q’ variables (Working set)
