Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion Presenter: Brian Quanz
About today’s discussion… • Last time: discussed convex opt. • Today: Will apply what we learned to 4 pattern analysis problems given in book: • (1) Smallest enclosing hypersphere (one-class SVM) • (2) SVM classification • (3) Support vector regression (SVR) • (4) On-line classification and regression
About today’s discussion… • This time for the most part: • Describe problems • Derive solutions ourselves on the board! • Apply convex opt. knowledge to solve • Mostly board work today
Recall: KKT Conditions • What we will use: • Key to remember ch. 7: • Complementary slackness -> sparse dual rep. • Convexity -> efficient global solution
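For reference, the standard KKT conditions for a convex problem min f(w) s.t. g_i(w) <= 0, h_j(w) = 0, with Lagrangian L(w, α, β) = f(w) + Σ_i α_i g_i(w) + Σ_j β_j h_j(w), are:

    \nabla_w L(w^*, \alpha^*, \beta^*) = 0                       (stationarity)
    g_i(w^*) \le 0, \quad h_j(w^*) = 0                           (primal feasibility)
    \alpha_i^* \ge 0                                             (dual feasibility)
    \alpha_i^* \, g_i(w^*) = 0                                   (complementary slackness)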
Novelty Detection: Hypersphere • From the training data, learn the support of the distribution • Capture it with a hypersphere • Points falling outside are 'novel', 'abnormal', or anomalies • A smaller sphere gives more fine-tuned novelty detection
1st: Smallest Enclosing Hypersphere • Given: a training set S = {x_1, …, x_ℓ} • Find the center c of the smallest hypersphere containing S
S.E.H. Optimization Problem • O.P.: • Let’s solve using Lagrangian and KKT and discuss
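A sketch of the optimization problem and the derivation we work through, in standard form (φ the feature map, k the kernel):

    \min_{c, r}\; r^2 \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2, \; i = 1, \dots, \ell
    L(c, r, \alpha) = r^2 + \sum_i \alpha_i \left( \|\phi(x_i) - c\|^2 - r^2 \right)
    \partial L / \partial r = 0 \;\Rightarrow\; \sum_i \alpha_i = 1, \qquad \partial L / \partial c = 0 \;\Rightarrow\; c = \sum_i \alpha_i \phi(x_i)
    \max_{\alpha}\; \sum_i \alpha_i k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_i \alpha_i = 1, \; \alpha_i \ge 0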
S.E.H.: Solution • The resulting novelty-detection function uses the Heaviside step H(x) = 1 if x >= 0, 0 otherwise: f(x) = H(||φ(x) − c||² − r²) • By strong duality (convexity), the dual optimum equals the primal optimum at the solution
Hypersphere that only contains some data – soft hypersphere • Balance missing some points against reducing the radius • Robustness: a single outlying point could throw off the whole sphere • Introduce slack variables (the approach we will see repeatedly) • Slack is 0 for points within the sphere, the squared distance beyond it otherwise
Hypersphere optimization problem • Now with trade off between radius and training point error: • Let’s derive solution again
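A sketch of the soft version in standard form, writing the trade-off parameter as C (an equivalent parameterization may be used):

    \min_{c, r, \xi}\; r^2 + C \sum_i \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le r^2 + \xi_i, \; \xi_i \ge 0

The dual is the same as before except that the constraint on the multipliers becomes the box constraint 0 ≤ α_i ≤ C.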
Remarks • If the data lies in a subspace of the feature space: • The hypersphere overestimates the support in the perpendicular directions • Can use kernel PCA first (next week's discussion) • If the data is normalized (k(x,x) = 1): • The problem corresponds to separating the data from the origin with a hyperplane (the one-class SVM view)
Maximal Margin Classifier • Data and a linear classifier f(x) = ⟨w, φ(x)⟩ + b • Hinge loss, with margin γ • Linearly separable if there exist w, b such that y_i(⟨w, φ(x_i)⟩ + b) ≥ γ for all i, for some γ > 0
Typical formulation • The typical formulation fixes the functional margin γ to 1 and lets w vary, since rescaling (w, b) doesn't affect the decision boundary; the geometric margin is then proportional to 1/||w|| • Here we instead fix ||w|| = 1 and let the functional margin γ vary
Hard Margin SVM • Arrive at optimization problem • Let’s solve
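A sketch of the two equivalent primal formulations (the fixed-norm form used on these slides, and the more common fixed-functional-margin form):

    \max_{w, b, \gamma}\; \gamma \quad \text{s.t.} \quad y_i(\langle w, \phi(x_i)\rangle + b) \ge \gamma, \; \|w\|^2 = 1
    \min_{w, b}\; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, \phi(x_i)\rangle + b) \ge 1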
Solution • Recall the KKT conditions above; applying them to the Lagrangian gives the dual problem and a sparse expansion for w
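A sketch of the resulting dual and solution, written for the fixed-functional-margin form (the fixed-norm form gives an equivalent dual up to normalization):

    \max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; \alpha_i \ge 0
    w^* = \sum_i \alpha_i^* y_i \phi(x_i), \qquad f(x) = \operatorname{sgn}\Big( \sum_i \alpha_i^* y_i k(x_i, x) + b^* \Big)

Complementary slackness gives α_i* > 0 only for points lying exactly on the margin — the support vectors, hence the sparse dual representation.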
Soft Margin Classifier • Non-separable case – introduce slack variables as before • Trade the margin off against the 1-norm of the slack (error) vector
Solve Soft Margin SVM • Let’s solve it!
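A sketch of the 1-norm soft margin problem and its dual; as with the hypersphere, the only change from the hard margin dual is the box constraint:

    \min_{w, b, \xi}\; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i(\langle w, \phi(x_i)\rangle + b) \ge 1 - \xi_i, \; \xi_i \ge 0
    \max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; 0 \le \alpha_i \le C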
Support Vector Regression • Similar idea to classification, except turned inside-out: we want points inside a band around the function rather than outside the margin • ε-insensitive loss instead of hinge loss • Compare ridge regression, which uses the squared-error loss
Support Vector Regression • But we want to encourage sparseness in the dual representation • For that we need inequality constraints • Hence the ε-insensitive loss
Epsilon-insensitive • Defines band around function for 0-loss
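In symbols, the ε-insensitive loss of f on an example (x, y) is

    |y - f(x)|_\varepsilon = \max\big(0, \; |y - f(x)| - \varepsilon\big)

so any prediction within ε of the target incurs zero loss.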
SVR (linear epsilon) • Opt. problem: • Let’s solve again
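A sketch of the standard ε-SVR primal, with slacks ξ, ξ̂ for errors above and below the band:

    \min_{w, b, \xi, \hat\xi}\; \tfrac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \hat\xi_i)
    \text{s.t.} \quad (\langle w, \phi(x_i)\rangle + b) - y_i \le \varepsilon + \xi_i, \quad y_i - (\langle w, \phi(x_i)\rangle + b) \le \varepsilon + \hat\xi_i, \quad \xi_i, \hat\xi_i \ge 0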
SVR Dual and Solution • Dual problem
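A sketch of the corresponding dual and the resulting regression function:

    \max_{\alpha, \hat\alpha}\; \sum_i y_i(\hat\alpha_i - \alpha_i) - \varepsilon \sum_i (\hat\alpha_i + \alpha_i) - \tfrac{1}{2} \sum_{i,j} (\hat\alpha_i - \alpha_i)(\hat\alpha_j - \alpha_j) k(x_i, x_j)
    \text{s.t.} \quad \sum_i (\hat\alpha_i - \alpha_i) = 0, \quad 0 \le \alpha_i, \hat\alpha_i \le C
    f(x) = \sum_i (\hat\alpha_i - \alpha_i) k(x_i, x) + b

Only points on or outside the ε-band receive non-zero coefficients, which gives the sparseness we wanted.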
Online • So far everything was batch: all data processed at once • Many tasks require data to be processed one example at a time from the start • At each step the learner: • Makes a prediction • Gets feedback (the correct value) • Updates its hypothesis • A conservative algorithm updates only when it suffers non-zero loss
Simple On-line Alg.: Perceptron • Thresholded linear function h(x) = sgn(⟨w, x⟩) • At step t+1 the weight vector is updated only if an error is made: w_{t+1} = w_t + y_i x_i • Dual update rule (zero threshold): • If y_i Σ_j α_j y_j k(x_j, x_i) ≤ 0, then α_i ← α_i + 1
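A minimal sketch of the dual (kernelized) perceptron in Python, assuming a zero threshold and a user-supplied kernel; the function and parameter names here are illustrative, not from the book.

    import numpy as np

    def kernel_perceptron(X, y, kernel, epochs=10):
        # X: (l, d) inputs, y: labels in {-1, +1}, kernel: k(x, z) -> float
        y = np.asarray(y, dtype=float)
        l = len(X)
        alpha = np.zeros(l)                                    # one dual variable per example
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        for _ in range(epochs):
            mistakes = 0
            for i in range(l):
                # predict with the current dual representation (zero threshold)
                if y[i] * np.dot(alpha * y, K[:, i]) <= 0:
                    alpha[i] += 1.0                            # conservative update: only on error
                    mistakes += 1
            if mistakes == 0:                                  # a full error-free pass: converged
                break
        return alpha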
Novikoff Theorem • Convergence bound for the hard-margin case • If the training points are contained in a ball of radius R around the origin • and w* is the hard margin SVM with no bias, ||w*|| = 1, and geometric margin γ • Initial weight: w_0 = 0 • Then the number of updates is bounded by (R/γ)²
Proof • From 2 inequalities: • Putting these together we have: • Which leads to bound:
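A sketch of the standard argument, where w_t is the weight vector after the t-th update, made on a mistaken example (x_i, y_i):

    \langle w^*, w_t \rangle = \langle w^*, w_{t-1} \rangle + y_i \langle w^*, x_i \rangle \ge \langle w^*, w_{t-1} \rangle + \gamma \;\Rightarrow\; \langle w^*, w_t \rangle \ge t\gamma
    \|w_t\|^2 = \|w_{t-1}\|^2 + 2 y_i \langle w_{t-1}, x_i \rangle + \|x_i\|^2 \le \|w_{t-1}\|^2 + R^2 \;\Rightarrow\; \|w_t\|^2 \le t R^2

(the middle term is non-positive because an update is only made on a mistake). Putting these together, using ||w*|| = 1:

    \sqrt{t}\, R \ge \|w_t\| \ge \langle w^*, w_t \rangle \ge t\gamma \;\Rightarrow\; t \le (R/\gamma)^2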
Kernel Adatron • A simple modification to the perceptron; it models the hard margin SVM with zero threshold • Update with learning rate η: α_i ← α_i + η(1 − y_i Σ_j α_j y_j k(x_j, x_i)), then clip α_i at 0 • At convergence each α_i stops changing: either α_i > 0 and the right-hand term is 0, or the right-hand term is negative and α_i = 0 — exactly the KKT conditions of the hard margin dual
Kernel Adatron – Soft Margin • 1-norm soft margin version: add an upper bound C to the values of α • 2-norm soft margin version: add a constant to the diagonal of the kernel matrix • To allow a variable threshold, updates must be made on a pair of examples at once • This results in SMO (sequential minimal optimization) • The rate of convergence of both algorithms is sensitive to the order in which examples are presented • Good heuristics exist, e.g. choose first the points that most violate the optimality conditions • A sketch of the 1-norm soft margin version follows
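A minimal sketch of the 1-norm soft margin kernel adatron in Python, assuming a zero threshold and clipping α to [0, C]; the names, learning rate, and stopping rule are illustrative, not from the book.

    import numpy as np

    def kernel_adatron(X, y, kernel, C=1.0, eta=0.1, epochs=100):
        # X: (l, d) inputs, y: labels in {-1, +1}, kernel: k(x, z) -> float
        y = np.asarray(y, dtype=float)
        l = len(X)
        alpha = np.zeros(l)
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        for _ in range(epochs):
            for i in range(l):
                # functional margin of example i under the current dual representation
                g_i = y[i] * np.dot(alpha * y, K[:, i])
                # gradient-style update, then clip to the box [0, C]
                alpha[i] = np.clip(alpha[i] + eta * (1.0 - g_i), 0.0, C)
        return alpha

With C = np.inf this reduces to the hard margin version of the previous slide; the 2-norm variant would instead add a constant to the diagonal of K and leave α unbounded above.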
On-line regression • The same on-line approach also works for the regression case • Basic gradient ascent on the SVR dual, subject to the additional constraints (a sketch follows)
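One plausible form of the update, assuming a fixed zero threshold as with the adatron (so the equality constraint drops) and a learning rate η; this is a sketch of gradient ascent on the SVR dual above, not necessarily the book's exact algorithm:

    \hat\alpha_i \leftarrow \min\!\Big(C, \; \max\big(0, \; \hat\alpha_i + \eta\,(y_i - \varepsilon - \textstyle\sum_j (\hat\alpha_j - \alpha_j) k(x_j, x_i))\big)\Big)
    \alpha_i \leftarrow \min\!\Big(C, \; \max\big(0, \; \alpha_i + \eta\,(-y_i - \varepsilon + \textstyle\sum_j (\hat\alpha_j - \alpha_j) k(x_j, x_i))\big)\Big)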
Questions • Questions, Comments?