Max-Margin Additive Classifiers for Detection
Subhransu Maji & Alexander Berg
University of California at Berkeley / Columbia University
ICCV 2009, Kyoto, Japan
Accuracy vs. Evaluation Time for SVM Classifiers
[Plot, built up over several slides: accuracy (x-axis) vs. evaluation time (y-axis). Linear kernels evaluate quickly but are less accurate; non-linear kernels are accurate but slow. Additive kernels (our CVPR 08) sit between the two: near non-linear accuracy at near-linear evaluation time.]
Made it possible to use SVMs with additive kernels for detection.
Additive Classifiers
• Much work already uses them!
• SVMs with additive kernels are additive classifiers
• Histogram-based kernels: histogram intersection, the chi-squared kernel (both sketched below)
• Pyramid Match Kernel (Grauman & Darrell, ICCV'05)
• Spatial Pyramid Match Kernel (Lazebnik et al., CVPR'06)
• …
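For concreteness, here is a minimal numpy sketch of the two histogram-based kernels named above (the function names and the small epsilon guard are ours):

import numpy as np

def histogram_intersection(x, y):
    # K_min(x, y) = sum_i min(x_i, y_i): one term per dimension
    return np.minimum(x, y).sum()

def chi_squared(x, y, eps=1e-12):
    # K_chi2(x, y) = sum_i 2 * x_i * y_i / (x_i + y_i): also one term per dimension
    return (2.0 * x * y / (x + y + eps)).sum()

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.1, 0.6, 0.3])
print(histogram_intersection(x, y))  # 0.1 + 0.5 + 0.3 = 0.9
print(chi_squared(x, y))

Both kernels decompose into a sum of one-dimensional terms, which is exactly what makes the resulting classifier additive.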
Accuracy vs. Training Time for SVM Classifiers
[Plot, built up over several slides: accuracy (x-axis) vs. training time (y-axis). Through the 1990s even linear SVMs were slow to train; today they train fast, e.g. with cutting plane, stochastic gradient descent, or dual coordinate descent methods. Non-linear kernels remain slow to train, and our CVPR 08 work sped up evaluation but not training of additive kernels. This paper brings additive classifiers down to near-linear training time.]
Makes it possible to train additive classifiers very fast.
Summary
• Additive classifiers are widely used and can provide better accuracy than linear ones
• Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim), the same as linear
• This work: additive classifiers can be trained directly, as efficiently (up to a small constant) as the best approaches for training linear classifiers
An example of the fast evaluation idea is sketched below.
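As a sketch of that evaluation speed (our own minimal illustration, assuming the per-dimension functions have already been tabulated from a trained min-kernel SVM, which is the CVPR'08 step we omit here): the additive decision function f(x) = Σᵢ hᵢ(xᵢ) costs one table lookup, here with linear interpolation, per dimension.

import numpy as np

# Tables: tables[i, b] holds the value of the 1-D function h_i at bin b.
# In practice they are built once from the support vectors; random stand-ins here.
rng = np.random.default_rng(0)
n_dims, n_bins = 656, 100
tables = rng.standard_normal((n_dims, n_bins))
edges = np.linspace(0.0, 1.0, n_bins)        # assume features lie in [0, 1]

def evaluate_additive(x):
    # One interpolated lookup per dimension: O(#dim), like a linear classifier.
    return sum(np.interp(x[i], edges, tables[i]) for i in range(n_dims))

x = rng.uniform(0.0, 1.0, n_dims)
print(evaluate_additive(x))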
Support Vector Machines
[Figure: input space mapped into an embedded space by a feature map φ]
• Kernel function: K(x, y) = ⟨φ(x), φ(y)⟩, an inner product in the embedded space
• Classification function: f(x) = Σᵢ αᵢ yᵢ K(x, xᵢ) + b
• Kernel trick: learn non-linear boundaries in the input space without computing φ explicitly
Embeddings…
• These embeddings can be high dimensional (even infinite)
• Our approach is based on embeddings that approximate kernels
• We'd like the approximation to be as accurate as possible
• We are going to run fast linear classifier training algorithms on the encoded features, so sparseness is important
Key Idea: Embedding an Additive Kernel
• Additive kernels are easy to embed: embed each dimension independently and concatenate
• Unary embedding for the min kernel on integers: φ(x) has x leading ones, so ⟨φ(x), φ(y)⟩ = min(x, y) (sketched below)
• For non-integers, approximate by quantizing
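Here is a minimal sketch of that unary embedding for a single integer dimension (n is an assumed upper bound on the feature value):

import numpy as np

def unary(x, n=10):
    # phi(x): x leading ones, zeros elsewhere
    code = np.zeros(n)
    code[:x] = 1.0
    return code

print(unary(3) @ unary(7))  # 3.0 == min(3, 7)

For a full additive kernel, the per-dimension codes are simply concatenated.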
Issues: Embedding Error
• Quantization leads to large errors
• A better encoding keeps the fractional part instead of rounding, as sketched below
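A sketch of one such better encoding (our illustration of the idea): keep ⌊x⌋ leading ones plus the fractional remainder instead of rounding. The dot product then equals min(x, y) exactly whenever x and y fall in different bins, and is only slightly off when they share one:

import numpy as np

def pwl_code(x, n=10):
    # floor(x) leading ones, then the fractional remainder x - floor(x)
    code = np.zeros(n)
    k = int(x)
    code[:k] = 1.0
    if k < n:
        code[k] = x - k
    return code

print(pwl_code(2.5) @ pwl_code(4.2), min(2.5, 4.2))  # 2.5  2.5  (exact)
print(pwl_code(2.3) @ pwl_code(2.6), min(2.3, 2.6))  # 2.18 2.3  (small error)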
Issues: Sparsity
• The dense unary code has many non-zeros per dimension; represent each value with only a few sparse non-zero entries so that fast linear solvers stay efficient (one possible realization is sketched below)
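One way to realize this, as a sketch under our own assumptions (the paper's exact sparse encoding may differ): represent each value by linearly interpolating between its two nearest bin edges, so every dimension contributes only two non-zero entries:

import numpy as np

def sparse_code(x, edges):
    # Two non-zeros per value: (bin index, weight) pairs that linearly
    # interpolate x between the two surrounding bin edges.
    k = int(np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2))
    t = (x - edges[k]) / (edges[k + 1] - edges[k])
    return [(k, 1.0 - t), (k + 1, t)]

print(sparse_code(0.37, np.linspace(0.0, 1.0, 11)))  # ~[(3, 0.3), (4, 0.7)]

With weights indexed the same way, w · φ(x) is a piecewise-linear function of x, so the sparse code loses little expressiveness.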
Linear vs. Encoded SVMs
• Linear SVM objective (solve with LIBLINEAR): minimize over w the regularized hinge loss λ/2 ‖w‖² + Σᵢ max(0, 1 − yᵢ w·xᵢ)
• Encoded SVM objective (not practical): the same objective with each xᵢ replaced by its dense encoding φ(xᵢ)
Linear vs. Encoded SVMs
• Linear SVM objective (solve with LIBLINEAR): λ/2 ‖w‖² + Σᵢ max(0, 1 − yᵢ w·xᵢ)
• Encoded SVM, modified (custom solver): replace ‖w‖² with a penalty on differences between adjacent weights within each dimension's block, which encourages smooth per-dimension functions and closely approximates the min kernel SVM
• Custom solver: PWLSGD (see paper)
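A toy subgradient sketch of the flavor of such a solver, under our own simplifications (a single weight block and plain SGD; the actual PWLSGD algorithm is in the paper): hinge loss plus a penalty on differences between adjacent weights, which pushes the learned per-dimension function to be smooth.

import numpy as np

def train_smooth_sgd(X, y, lam=1e-2, lr=0.1, epochs=20, seed=0):
    # X: encoded features (n x d), y in {-1, +1}.
    # Toy objective: sum_i hinge(y_i * w.x_i) + lam/2 * sum_j (w[j+1] - w[j])^2
    # NOTE: a faithful version would penalize differences only *within* each
    # dimension's block of the encoding, never across block boundaries.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = np.zeros(d)
            diff = np.diff(w)              # w[j+1] - w[j]
            grad[:-1] -= lam * diff        # gradient of the smoothness term
            grad[1:] += lam * diff
            if y[i] * (w @ X[i]) < 1:      # hinge subgradient
                grad -= y[i] * X[i]
            w -= lr * grad
    return w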
Linear vs. Encoded SVMs
• Linear SVM objective (solve with LIBLINEAR): λ/2 ‖w‖² + Σᵢ max(0, 1 − yᵢ w·xᵢ)
• Encoded SVM objective (solve with LIBLINEAR): the same objective on the sparse encoding φ(xᵢ), practical because φ has few non-zeros
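A sketch of this "encode + standard solver" route using scikit-learn's LinearSVC (which wraps LIBLINEAR); the encoder is the toy unary-plus-fraction code from the sketches above, and the data is synthetic:

import numpy as np
from sklearn.svm import LinearSVC

def encode(X, n_bins=10):
    # Unary-plus-fraction code per dimension, concatenated per example.
    n, d = X.shape
    out = np.zeros((n, d * n_bins))
    for i in range(n):
        for j in range(d):
            v = X[i, j] * (n_bins - 1)   # map [0, 1) features onto the bins
            k = int(v)
            out[i, j * n_bins:j * n_bins + k] = 1.0
            out[i, j * n_bins + k] = v - k
    return out

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 5))
y = np.where(np.sin(6 * X[:, 0]) + X[:, 1] > 1.0, 1, -1)   # non-linear labels
clf = LinearSVC(C=1.0).fit(encode(X), y)                   # linear in code space
print(clf.score(encode(X), y))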
Additive Classifier Choices
[Table, built up over several slides: classifiers laid out along two axes, encoding and regularization. Accuracy increases along both axes, while evaluation times stay similar.]
• Standard solver (e.g. LIBSVM) for the exact kernel SVM
• Few lines of code + standard solver (e.g. LIBLINEAR) for encoded SVMs
• Custom solver (PWLSGD) for the smoothness-regularized encoding
• These choices give the classifier notations used in the experiments below
Experiments
• "Small" scale: Caltech 101 (Fei-Fei et al.)
• "Medium" scale: DC Pedestrians (Munder & Gavrila)
• "Large" scale: INRIA Pedestrians (Dalal & Triggs)
Experiment: DC Pedestrians
[Plot: (training time, accuracy) per classifier: (3.18s, 89.25%), (1.86s, 88.80%), (363s, 89.05%), (2.98s, 85.71%), (1.89s, 72.98%)]
• ~100x faster training: training time ~ linear SVM, accuracy ~ kernel SVM
• 20,000 training examples, 656-dimensional features, 100 bins for encoding, 6-fold cross validation
Experiment: Caltech 101
[Plot: (training time, accuracy) per classifier: (291s, 55.35%), (2687s, 56.49%), (102s, 54.8%), (90s, 51.64%), (41s, 46.15%)]
• ~10x faster training, with a small loss in accuracy
• 30 training examples per category, 100 bins for encoding, Pyramid HOG features + Spatial Pyramid Match Kernel
Experiment: INRIA Pedestrians
[Cross-validation plot: (training time, score) per classifier: (140 mins, 0.95), (76s, 0.94), (27s, 0.88), (122s, 0.85), (20s, 0.82)]
• ~300x faster training: training time ~ linear SVM, accuracy ~ kernel SVM; trains the detector in < 2 mins
• SPHOG: 39,000 training examples, 2268-dimensional features, 100 bins for encoding
Take Home Messages
• Additive models are practical for large-scale data
• They can be trained discriminatively:
• Poor man's version: encode + linear SVM solver
• Middle man's version: encode + custom solver
• Rich man's version: min kernel SVM
• The embedding only approximates the kernel: a small loss in accuracy, but up to 100x speedup in training time
• Everyone should use it: see the code on our websites
• Fast IKSVM from CVPR'08, encoded SVMs, etc.