Trading Convexity for Scalability
Marco A. Alvarez
CS7680, Department of Computer Science, Utah State University
Paper
• Collobert, R., Sinz, F., Weston, J., and Bottou, L. 2006. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), Pittsburgh, Pennsylvania, June 25-29, 2006. ACM Press, New York, NY, 201-208.
Introduction
• Previously in machine learning
  • Non-convex cost functions in MLPs
  • Difficult to optimize, yet they work efficiently in practice
• SVMs are defined by a convex cost function
  • Easier optimization (well-understood algorithms)
  • Unique solution (we can prove theorems)
• Goal of the paper: sometimes non-convexity has benefits
  • Faster training and testing (fewer support vectors)
  • Non-convex SVMs (faster and sparser)
  • Fast transductive SVMs
From SVMs
• Decision function
• Primal formulation
  • Minimize ||w|| so that the margin is maximized
  • w is a combination of a small number of training examples (sparsity)
  • The decision boundary is determined by the support vectors
• Dual formulation, subject to box and equality constraints (see below)
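A standard way to write the decision function and the primal/dual formulations referred to on this slide, assuming L training pairs (x_l, y_l), a feature map \Phi with kernel K, and slack variables \xi_l (this notation is introduced here, not taken from the slides):

  f(x) = w \cdot \Phi(x) + b = \sum_{l=1}^{L} \alpha_l y_l K(x_l, x) + b

  Primal:  \min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} \xi_l
           \quad \text{s.t.} \quad y_l \big( w \cdot \Phi(x_l) + b \big) \ge 1 - \xi_l, \quad \xi_l \ge 0

  Dual:    \max_{\alpha} \; \sum_{l=1}^{L} \alpha_l - \frac{1}{2} \sum_{l, m} \alpha_l \alpha_m \, y_l y_m \, K(x_l, x_m)
           \quad \text{s.t.} \quad 0 \le \alpha_l \le C, \quad \sum_{l=1}^{L} \alpha_l y_l = 0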
The SVM problem
• The number of support vectors increases linearly with the number of training examples L
• Cost attributed to one example (x, y): its hinge loss in the rewritten primal (see below)
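The cost expressions this slide points to can be recovered from the usual hinge-loss rewriting of the soft-margin primal; the shifted hinge H_s below follows the paper's notation:

  H_s(z) = \max(0, \; s - z), \qquad
  \min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} H_1\big( y_l f(x_l) \big)

So the cost attributed to a single example (x, y) is C H_1(y f(x)); every example with margin y f(x) < 1, outliers included, becomes a support vector, which is why the number of support vectors grows linearly with L.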
Ramp Loss Function
• [Figure: plot of the ramp loss R_s(z), with the flat region marked "Outliers" and the zero-loss region marked "Non SV"]
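The ramp loss itself is reconstructed here from the paper's definition, where s < 1 is the clipping parameter:

  R_s(z) = H_1(z) - H_s(z), \qquad H_s(z) = \max(0, \; s - z)

Because R_s is flat for z < s, outliers contribute only a constant cost and no longer become support vectors; examples with z > 1 incur zero loss, exactly as with the hinge.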
Concave-Convex Procedure (CCCP)
• Given a cost function J(θ):
  • Decompose it into a convex part and a concave part
  • At each iteration, replace the concave part by its tangent at the current point and solve the resulting convex problem
  • The cost is guaranteed to decrease at each iteration (see the sketch below)
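As a rough illustration of the procedure (not the paper's SVM solver), here is a minimal CCCP outer loop in Python on a toy one-dimensional objective; the names solve_convex and grad_cav are placeholders introduced here.

# Minimal CCCP sketch: J(t) = J_vex(t) + J_cav(t); at each iteration the
# concave part is replaced by its tangent at the current point and the
# resulting convex problem is solved.

def cccp(solve_convex, grad_cav, theta0, max_iter=100, tol=1e-9):
    """solve_convex(g): returns argmin_t  J_vex(t) + g * t
       grad_cav(t):     returns the derivative of the concave part at t."""
    theta = theta0
    for _ in range(max_iter):
        g = grad_cav(theta)           # linearize the concave part at theta
        theta_new = solve_convex(g)   # convex subproblem; J cannot increase
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy objective: J(t) = (t - 3)^2 - t^2 / 2  (convex term + concave term).
# Its minimum is at t = 6, and the CCCP iterates converge there.
theta_star = cccp(
    solve_convex=lambda g: 3.0 - g / 2.0,  # minimizer of (t - 3)^2 + g * t
    grad_cav=lambda t: -t,                 # derivative of -t^2 / 2
    theta0=0.0,
)
print(round(theta_star, 6))  # -> 6.0

The point of the sketch is only the mechanism: each convex subproblem is an ordinary (here closed-form) minimization, and the sequence of objective values is monotonically non-increasing.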
Loss Function • Cost to be minimized:
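In the paper's notation this cost is the ramp-loss SVM objective, which splits directly into the convex and concave parts that CCCP needs (θ = (w, b)):

  J^s(\theta) = \frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} R_s\big( y_l f(x_l) \big)
              = \underbrace{\frac{1}{2}\|w\|^2 + C \sum_{l=1}^{L} H_1\big( y_l f(x_l) \big)}_{J^s_{\mathrm{vex}}(\theta)}
                \; + \; \underbrace{\Big( -\, C \sum_{l=1}^{L} H_s\big( y_l f(x_l) \big) \Big)}_{J^s_{\mathrm{cav}}(\theta)}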
Balancing Constraint
• Necessary for TSVMs: without it, the optimizer tends to push all unlabeled examples into the same class
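A reconstruction of the constraint, assuming (as in the paper) that examples 1..L are labeled and L+1..L+U are unlabeled: the mean prediction on the unlabeled set is tied to the mean of the observed labels,

  \frac{1}{U} \sum_{l = L+1}^{L+U} f(x_l) \;=\; \frac{1}{L} \sum_{l=1}^{L} y_l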