Efficient Large-Scale Structured Learning
Steve Branson (Caltech), Oscar Beijbom (UC San Diego), Serge Belongie (UC San Diego)
CVPR 2013, Portland, Oregon
Overview
• Structured prediction
• Learning from larger datasets
[Figures: tiny images / large datasets; object detection; deformable part models; cost-sensitive learning over a class taxonomy (Mammal → Primate / Hoofed Mammal → Orangutan, Gorilla / Odd-toed, Even-toed)]
Overview
• Available tools for structured learning are not as refined as tools for binary classification
• Two sources of speed improvement:
  • Faster stochastic dual optimization algorithms
  • An application-specific importance sampling routine
Summary
• Usually, train time is only 1–10 times test time
• Publicly available software package:
  • Fast algorithms for multiclass SVMs and DPMs
  • API to adapt to new applications
  • Supports datasets too large to fit in memory
  • Network interface for online & active learning
Summary
Cost-sensitive multiclass SVM:
• 10–50 times faster than SVMstruct
• As fast as a 1-vs-all binary SVM
Deformable part models:
• 50–1000 times faster than SVMstruct, mining hard negatives, and SGD (PEGASOS)
Binary vs. Structured
[Diagram: a structured dataset is reduced to a binary dataset, trained with a binary learner (SVM, Boosting, Logistic Regression, etc.), and the binary output is mapped back to structured output (object detection, pose registration, attribute prediction, etc.)]
Binary vs. Structured
• Pro: the binary classifier is application independent
• Con: what is lost in terms of:
  • Accuracy at convergence?
  • Computational efficiency?
Binary vs. Structured
• Binary loss → optimized via a convex upper bound
• Structured prediction loss → optimized via a convex upper bound on the structured prediction loss
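The relationship on this slide can be made concrete with a toy example. Below is a minimal sketch (an assumed 3-class setting with 0/1 loss; all numeric values are made up) showing that the margin-rescaled structured hinge upper-bounds the prediction loss:

```python
import numpy as np

def structured_hinge(scores, yi, Delta):
    """Margin-rescaled structured hinge:
       max_y [ Delta(yi, y) + score(y) ] - score(yi).
    The max ranges over all outputs, including the predicted one,
    so the value is always >= Delta(yi, y_pred)."""
    return np.max(Delta[yi] + scores) - scores[yi]

scores = np.array([1.0, 2.5, 0.3])  # score(y) = w . phi(x, y), toy values
Delta = 1.0 - np.eye(3)             # 0/1 task loss
yi = 0                              # ground-truth label
y_pred = int(np.argmax(scores))     # prediction = highest-scoring output
print(structured_hinge(scores, yi, Delta), ">=", Delta[yi, y_pred])
```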
Binary vs. Structured
Goal: application-specific optimization algorithms that:
• Converge to lower test error than binary solutions
• Achieve lower test error for all amounts of train time
Structured SVM
• SVMs w/ structured output
• Max-margin MRFs
[Taskar et al. NIPS'03] [Tsochantaridis et al. ICML'04]
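For reference, the margin-rescaling SSVM objective from these papers (not spelled out on the slide), with Δ the task loss and Φ the joint feature map, can be written as:

```latex
\min_{w}\;\; \frac{\lambda}{2}\|w\|^2
  + \frac{1}{n}\sum_{i=1}^{n}
    \max_{\bar{y} \in \mathcal{Y}}
    \Big[ \Delta(Y_i, \bar{y})
          + w^{\top}\Phi(X_i, \bar{y})
          - w^{\top}\Phi(X_i, Y_i) \Big]
```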
Binary SVM Solvers
[Figure: solver progression, quadratic → linear in training-set size, then linear → independent of training-set size]
Stochastic dual methods:
• Are faster on multiple passes
• Can detect convergence
• Are less sensitive to regularization / learning rate
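As a concrete instance of such a stochastic dual method, here is a sketch of dual coordinate ascent for a linear binary SVM (in the style of LIBLINEAR's dual solver); the function name and toy settings are illustrative, not from the talk:

```python
import numpy as np

def dual_coordinate_ascent(X, y, C=1.0, epochs=10, seed=0):
    """Maximize the SVM dual  sum_i a_i - 0.5 ||sum_i a_i y_i x_i||^2
    subject to 0 <= a_i <= C, one coordinate at a time."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                       # w = sum_i alpha_i y_i x_i
    Q = np.maximum((X ** 2).sum(axis=1), 1e-12)
    for _ in range(epochs):
        for i in rng.permutation(n):      # one pass = one epoch
            g = y[i] * w.dot(X[i]) - 1.0  # negated dual gradient at alpha_i
            new_a = np.clip(alpha[i] - g / Q[i], 0.0, C)
            w += (new_a - alpha[i]) * y[i] * X[i]
            alpha[i] = new_a
    return w
```

Unlike SGD there is no learning rate to tune, and the dual objective provides a convergence check (the duality gap), which is the advantage the slide alludes to.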
Structured SVM Solvers
• Applied to SSVMs: [Ratliff et al. AIStats'07], [Shalev-Shwartz et al. JMLR'13]
Our Approach
• Use faster stochastic dual algorithms
• Incorporate an application-specific importance sampling routine:
  • Reduces train times when prediction time T is large
  • Incorporates tricks people use for binary methods
[Diagram: maximize the dual SSVM objective w.r.t. samples obtained from a random example via importance sampling]
Our Approach
For t = 1, … do
  • Choose a random training example (Xi, Yi)
  • ȳ1, …, ȳK ← ImportanceSample()
  • Approximately maximize the dual SSVM objective w.r.t. example i
end
(Provably fast convergence for a simple approximate solver)
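One way to instantiate this loop (a sketch, not the authors' implementation) is to make the "approximately maximize the dual" step a single Block-Coordinate Frank-Wolfe line-search update over the sampled outputs, following Lacoste-Julien et al. (cited on the next slide). The helper names, the multiclass feature map, and all settings below are illustrative:

```python
import numpy as np

def make_phi(K, d):
    """Illustrative joint feature map for K classes:
    phi(x, y) places x in the y-th block of a K*d vector."""
    def phi(x, y):
        out = np.zeros(K * d)
        out[y * d:(y + 1) * d] = x
        return out
    return phi

def train_ssvm(X, Y, K, Delta, importance_sample, lam=0.01, epochs=30, seed=0):
    """Sketch of the slide's loop: random example -> importance sample
    -> approximate dual maximization (one BCFW line-search step)."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    phi = make_phi(K, d)
    w = np.zeros(K * d)
    w_i = np.zeros((n, K * d))          # per-example dual iterates
    l_i = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, yi = X[i], Y[i]
            cands = importance_sample(w, x, yi)
            # most violated output among the sampled candidates
            aug = [Delta(yi, y) + w.dot(phi(x, y)) for y in cands]
            y_star = cands[int(np.argmax(aug))]
            w_s = (phi(x, yi) - phi(x, y_star)) / (lam * n)
            l_s = Delta(yi, y_star) / n
            diff = w_i[i] - w_s
            denom = lam * diff.dot(diff)
            gamma = 1.0 if denom < 1e-12 else float(
                np.clip((lam * diff.dot(w) - l_i[i] + l_s) / denom, 0.0, 1.0))
            new_wi = (1.0 - gamma) * w_i[i] + gamma * w_s
            w += new_wi - w_i[i]        # line-search step on example i
            w_i[i] = new_wi
            l_i[i] = (1.0 - gamma) * l_i[i] + gamma * l_s
    return w, phi
```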
Recent Papers w/ Similar Ideas
• Augmenting cutting-plane SSVMs with m-best solutions: A. Guzman-Rivera, P. Kohli, D. Batra. "DivMCuts…" AISTATS'13
• Applying stochastic dual methods to SSVMs: S. Lacoste-Julien et al. "Block-Coordinate Frank-Wolfe…" JMLR'13
Applying to New Problems
1. Define the loss function
2. Implement a feature extraction routine
3. Implement an importance sampling routine
Applying to New Problems
3. Implement an importance sampling routine that:
  • Is fast
  • Favors samples with high loss plus score
  • Favors samples with mutually uncorrelated features (small pairwise inner products)
Example: Object Detection
1. Loss function
2. Features
3. Importance sampling routine:
  • Add sliding-window scores & loss into a dense score map
  • Greedy NMS
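The greedy NMS step of the sampling routine is standard; a minimal sketch (boxes as [x1, y1, x2, y2]; the IoU threshold is an assumed parameter, not from the talk):

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, suppress boxes overlapping it
    by more than iou_thresh, and repeat on the remainder."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]
    return keep
```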
Example: Deformable Part Models
1. Loss function: sum of part losses
2. Features
3. Importance sampling routine:
  • Dynamic programming
  • Modified NMS to return a diverse set of poses
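The dynamic-programming step can be sketched for the simplest case, a chain of parts (the talk's models are tree-structured, which generalizes this by passing max-messages from leaves to root); all names and the toy scoring below are illustrative:

```python
import numpy as np

def chain_parts_infer(unary, pairwise):
    """Exact max-scoring placement for a chain of parts (Viterbi).
    unary[p][l]: appearance score of part p at location l.
    pairwise[p][l_prev, l_cur]: deformation score between
    consecutive parts p and p+1."""
    msgs = np.asarray(unary[0], dtype=float)
    back = []
    for p in range(1, len(unary)):
        cand = msgs[:, None] + pairwise[p - 1]   # (L_prev, L_cur)
        back.append(np.argmax(cand, axis=0))     # best predecessor per location
        msgs = np.max(cand, axis=0) + unary[p]
    locs = [int(np.argmax(msgs))]
    for bp in reversed(back):                    # backtrack the best path
        locs.append(int(bp[locs[-1]]))
    return list(reversed(locs)), float(np.max(msgs))
```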
Example: Cost-Sensitive Multiclass SVM
1. Loss function: class-confusion cost
[Figure: confusion-cost matrix over classes cat, fly, car, bus, dog, ant]
2. Features: e.g., bag-of-words
3. Importance sampling routine:
  • Return all classes
  • Exact solution using 1 dot product per class
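The "1 dot product per class" routine reduces to loss-augmented scoring of every class; a sketch (argument names are illustrative: W holds one weight vector per class, Delta is the class-confusion cost matrix):

```python
import numpy as np

def importance_sample_multiclass(W, x, yi, Delta):
    """Exact loss-augmented inference for a cost-sensitive multiclass
    SVM: augmented_score(c) = w_c . x + Delta(yi, c), i.e. one dot
    product per class. Returns all classes, most violated first."""
    aug = W.dot(x) + Delta[yi]
    order = np.argsort(-aug)
    return order, aug[order]
```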
Results: CUB-200-2011
• Pose mixture model, 312 part/pose detectors
• Occlusion/visibility model
• Tree-structured DPM w/ exact inference
Results: CUB-200-2011
[Plots: results with 5,794 and 400 training examples]
• ~100X faster than mining hard negatives and SVMstruct
• 10–50X faster than stochastic sub-gradient methods
• Close to convergence after 1 pass through the training set
Results: ImageNet
[Plots: comparison to other fast linear SVM solvers; comparison to other methods for cost-sensitive SVMs]
• Faster than LIBLINEAR and PEGASOS
• 50X faster than SVMstruct
Conclusion
• Orders of magnitude faster than SVMstruct
• Publicly available software package:
  • Fast algorithms for multiclass SVMs and DPMs
  • API to adapt to new applications
  • Supports datasets too large to fit in memory
  • Network interface for online & active learning