Efficient Large-Scale Structured Learning
Steve Branson (Caltech), Oscar Beijbom (UC San Diego), Serge Belongie (UC San Diego)
CVPR 2013, Portland, Oregon
Overview
• Structured prediction
• Learning from larger datasets
[Figures: tiny images / large datasets; object detection; deformable part models; cost-sensitive learning over a class taxonomy (Mammal → Primate / Hoofed Mammal → Orangutan, Gorilla / Odd-toed, Even-toed)]
Overview
• Available tools for structured learning are not as refined as tools for binary classification
• Two sources of speed improvement:
  • Faster stochastic dual optimization algorithms
  • An application-specific importance sampling routine
Summary
• Usually, train time is only 1–10 times test time
• Publicly available software package:
  • Fast algorithms for multiclass SVMs and DPMs
  • API to adapt to new applications
  • Supports datasets too large to fit in memory
  • Network interface for online & active learning
Summary
Cost-sensitive multiclass SVM:
• 10–50 times faster than SVMstruct
• As fast as a 1-vs-all binary SVM
Deformable part models:
• 50–1000 times faster than SVMstruct, mining hard negatives, and SGD (PEGASOS)
Binary vs. Structured
[Diagram: a structured dataset is reduced to a binary dataset, trained with a binary learner (SVM, Boosting, Logistic Regression, etc.), and the binary output is mapped back to structured output (object detection, pose registration, attribute prediction, etc.)]
Binary vs. Structured
• Pro: the binary classifier is application independent
• Con: what is lost in terms of:
  • Accuracy at convergence?
  • Computational efficiency?
Binary vs. Structured
• Binary loss → optimized via a convex upper bound
• Structured prediction loss → optimized via a convex upper bound on the structured prediction loss
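The relationship on this slide can be made concrete with a toy example. Below is a minimal sketch (an assumed 3-class setting with 0/1 loss; all numeric values are made up) showing that the margin-rescaled structured hinge upper-bounds the prediction loss:

```python
import numpy as np

def structured_hinge(scores, yi, Delta):
    """Margin-rescaled structured hinge:
       max_y [ Delta(yi, y) + score(y) ] - score(yi).
    The max ranges over all outputs, including the predicted one,
    so the value is always >= Delta(yi, y_pred)."""
    return np.max(Delta[yi] + scores) - scores[yi]

scores = np.array([1.0, 2.5, 0.3])  # score(y) = w . phi(x, y), toy values
Delta = 1.0 - np.eye(3)             # 0/1 task loss
yi = 0                              # ground-truth label
y_pred = int(np.argmax(scores))     # prediction = highest-scoring output
print(structured_hinge(scores, yi, Delta), ">=", Delta[yi, y_pred])
```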
Binary vs. Structured
Goal: application-specific optimization algorithms that:
• Converge to lower test error than binary solutions
• Achieve lower test error for all amounts of train time
Structured SVM
• SVMs w/ structured output
• Max-margin MRFs
[Taskar et al. NIPS'03] [Tsochantaridis et al. ICML'04]
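For reference, the margin-rescaling SSVM objective from these papers (not spelled out on the slide), with Δ the task loss and Φ the joint feature map, can be written as:

```latex
\min_{w}\;\; \frac{\lambda}{2}\|w\|^2
  + \frac{1}{n}\sum_{i=1}^{n}
    \max_{\bar{y} \in \mathcal{Y}}
    \Big[ \Delta(Y_i, \bar{y})
          + w^{\top}\Phi(X_i, \bar{y})
          - w^{\top}\Phi(X_i, Y_i) \Big]
```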
Binary SVM Solvers
[Figure: solver progression, quadratic → linear in training-set size, then linear → independent of training-set size]
Stochastic dual methods:
• Are faster on multiple passes
• Can detect convergence
• Are less sensitive to regularization / learning rate
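As a concrete instance of such a stochastic dual method, here is a sketch of dual coordinate ascent for a linear binary SVM (in the style of LIBLINEAR's dual solver); the function name and toy settings are illustrative, not from the talk:

```python
import numpy as np

def dual_coordinate_ascent(X, y, C=1.0, epochs=10, seed=0):
    """Maximize the SVM dual  sum_i a_i - 0.5 ||sum_i a_i y_i x_i||^2
    subject to 0 <= a_i <= C, one coordinate at a time."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                       # w = sum_i alpha_i y_i x_i
    Q = np.maximum((X ** 2).sum(axis=1), 1e-12)
    for _ in range(epochs):
        for i in rng.permutation(n):      # one pass = one epoch
            g = y[i] * w.dot(X[i]) - 1.0  # negated dual gradient at alpha_i
            new_a = np.clip(alpha[i] - g / Q[i], 0.0, C)
            w += (new_a - alpha[i]) * y[i] * X[i]
            alpha[i] = new_a
    return w
```

Unlike SGD there is no learning rate to tune, and the dual objective provides a convergence check (the duality gap), which is the advantage the slide alludes to.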
Structured SVM Solvers
• Applied to SSVMs: [Ratliff et al. AIStats'07], [Shalev-Shwartz et al. JMLR'13]
Our Approach
• Use faster stochastic dual algorithms
• Incorporate an application-specific importance sampling routine:
  • Reduces train times when prediction time T is large
  • Incorporates tricks people use for binary methods
[Diagram: maximize the dual SSVM objective w.r.t. samples obtained from a random example via importance sampling]
Our Approach
For t = 1, … do
  • Choose a random training example (Xi, Yi)
  • ȳ1, …, ȳK ← ImportanceSample()
  • Approximately maximize the dual SSVM objective w.r.t. example i
end
(Provably fast convergence for a simple approximate solver)
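One way to instantiate this loop (a sketch, not the authors' implementation) is to make the "approximately maximize the dual" step a single Block-Coordinate Frank-Wolfe line-search update over the sampled outputs, following Lacoste-Julien et al. (cited on the next slide). The helper names, the multiclass feature map, and all settings below are illustrative:

```python
import numpy as np

def make_phi(K, d):
    """Illustrative joint feature map for K classes:
    phi(x, y) places x in the y-th block of a K*d vector."""
    def phi(x, y):
        out = np.zeros(K * d)
        out[y * d:(y + 1) * d] = x
        return out
    return phi

def train_ssvm(X, Y, K, Delta, importance_sample, lam=0.01, epochs=30, seed=0):
    """Sketch of the slide's loop: random example -> importance sample
    -> approximate dual maximization (one BCFW line-search step)."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    phi = make_phi(K, d)
    w = np.zeros(K * d)
    w_i = np.zeros((n, K * d))          # per-example dual iterates
    l_i = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, yi = X[i], Y[i]
            cands = importance_sample(w, x, yi)
            # most violated output among the sampled candidates
            aug = [Delta(yi, y) + w.dot(phi(x, y)) for y in cands]
            y_star = cands[int(np.argmax(aug))]
            w_s = (phi(x, yi) - phi(x, y_star)) / (lam * n)
            l_s = Delta(yi, y_star) / n
            diff = w_i[i] - w_s
            denom = lam * diff.dot(diff)
            gamma = 1.0 if denom < 1e-12 else float(
                np.clip((lam * diff.dot(w) - l_i[i] + l_s) / denom, 0.0, 1.0))
            new_wi = (1.0 - gamma) * w_i[i] + gamma * w_s
            w += new_wi - w_i[i]        # line-search step on example i
            w_i[i] = new_wi
            l_i[i] = (1.0 - gamma) * l_i[i] + gamma * l_s
    return w, phi
```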
Recent Papers w/ Similar Ideas
• Augmenting cutting-plane SSVMs with m-best solutions: A. Guzman-Rivera, P. Kohli, D. Batra. "DivMCuts…" AISTATS'13
• Applying stochastic dual methods to SSVMs: S. Lacoste-Julien et al. "Block-Coordinate Frank-Wolfe…" JMLR'13
Applying to New Problems
1. Define the loss function
2. Implement a feature extraction routine
3. Implement an importance sampling routine
Applying to New Problems
3. Implement an importance sampling routine that:
  • Is fast
  • Favors samples with high loss plus score
  • Favors samples with mutually uncorrelated features (small pairwise inner products)
Example: Object Detection
1. Loss function
2. Features
3. Importance sampling routine:
  • Add sliding-window scores & loss into a dense score map
  • Greedy NMS
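The greedy NMS step of the sampling routine is standard; a minimal sketch (boxes as [x1, y1, x2, y2]; the IoU threshold is an assumed parameter, not from the talk):

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, suppress boxes overlapping it
    by more than iou_thresh, and repeat on the remainder."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]
    return keep
```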
Example: Deformable Part Models
1. Loss function: sum of part losses
2. Features
3. Importance sampling routine:
  • Dynamic programming
  • Modified NMS to return a diverse set of poses
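The dynamic-programming step can be sketched for the simplest case, a chain of parts (the talk's models are tree-structured, which generalizes this by passing max-messages from leaves to root); all names and the toy scoring below are illustrative:

```python
import numpy as np

def chain_parts_infer(unary, pairwise):
    """Exact max-scoring placement for a chain of parts (Viterbi).
    unary[p][l]: appearance score of part p at location l.
    pairwise[p][l_prev, l_cur]: deformation score between
    consecutive parts p and p+1."""
    msgs = np.asarray(unary[0], dtype=float)
    back = []
    for p in range(1, len(unary)):
        cand = msgs[:, None] + pairwise[p - 1]   # (L_prev, L_cur)
        back.append(np.argmax(cand, axis=0))     # best predecessor per location
        msgs = np.max(cand, axis=0) + unary[p]
    locs = [int(np.argmax(msgs))]
    for bp in reversed(back):                    # backtrack the best path
        locs.append(int(bp[locs[-1]]))
    return list(reversed(locs)), float(np.max(msgs))
```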
Example: Cost-Sensitive Multiclass SVM
1. Loss function: class-confusion cost
[Figure: confusion-cost matrix over classes cat, fly, car, bus, dog, ant]
2. Features: e.g., bag-of-words
3. Importance sampling routine:
  • Return all classes
  • Exact solution using 1 dot product per class
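The "1 dot product per class" routine reduces to loss-augmented scoring of every class; a sketch (argument names are illustrative: W holds one weight vector per class, Delta is the class-confusion cost matrix):

```python
import numpy as np

def importance_sample_multiclass(W, x, yi, Delta):
    """Exact loss-augmented inference for a cost-sensitive multiclass
    SVM: augmented_score(c) = w_c . x + Delta(yi, c), i.e. one dot
    product per class. Returns all classes, most violated first."""
    aug = W.dot(x) + Delta[yi]
    order = np.argsort(-aug)
    return order, aug[order]
```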
Results: CUB-200-2011
• Pose mixture model, 312 part/pose detectors
• Occlusion/visibility model
• Tree-structured DPM w/ exact inference
Results: CUB-200-2011
[Plots: results with 5,794 and 400 training examples]
• ~100X faster than mining hard negatives and SVMstruct
• 10–50X faster than stochastic sub-gradient methods
• Close to convergence after 1 pass through the training set
Results: ImageNet
[Plots: comparison to other fast linear SVM solvers; comparison to other methods for cost-sensitive SVMs]
• Faster than LIBLINEAR and PEGASOS
• 50X faster than SVMstruct
Conclusion
• Orders of magnitude faster than SVMstruct
• Publicly available software package:
  • Fast algorithms for multiclass SVMs and DPMs
  • API to adapt to new applications
  • Supports datasets too large to fit in memory
  • Network interface for online & active learning