Bilinear Classifiers for Visual Recognition
Hamed Pirsiavash, Deva Ramanan, Charless Fowlkes
Computational Vision Lab, University of California Irvine
To be presented at NIPS 2009
Introduction
• Linear model for visual recognition
• Extract features from a window on the input image
• Reshape: the feature vector is x = X(:), where X is the n_y n_x × n_f matrix of features (n_f features per cell of an n_y × n_x grid)
Introduction
• Linear classifier: label a window positive when w^T x > 0
• Learn a template w (the model)
• Apply it to all possible windows, each represented by a feature vector x
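The linear decision rule above can be sketched in a few lines of NumPy (the shapes below are illustrative placeholders, not the paper's actual template size):

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature matrix for one window: n_s spatial cells x n_f features per cell.
n_s, n_f = 6, 4
X = rng.standard_normal((n_s, n_f))   # features extracted from a window
W = rng.standard_normal((n_s, n_f))   # learned template of the same shape

# Reshape (Matlab's (:), i.e. column-major order) turns both into vectors.
x = X.flatten(order="F")
w = W.flatten(order="F")

# Linear classifier: the window is labeled positive when w^T x > 0.
score = w @ x
is_positive = score > 0
```

Since both matrices are flattened the same way, the dot product equals np.sum(W * X), so no explicit reshape is needed at run time.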
Introduction
• Reshape the template w into a matrix W of size n_s × n_f, where n_s = n_x n_y
• Bilinear restriction: W = W_s W_f^T, with W_s of size n_s × d, W_f of size n_f × d, and d <= min(n_s, n_f)
Introduction
• Motivation for bilinear models
• Reduced rank: fewer parameters
• Better generalization: less over-fitting
• Run-time efficiency
• Transfer learning: share a subset of parameters between different but related tasks
Outline • Introduction • Sliding window classifiers • Bilinear model and its motivation • Extension • Related work • Experiments • Pedestrian detection • Human action classification • Conclusion
Sliding window classifiers
• Extract visual features from a spatio-temporal window
  • e.g., histograms of gradients (HOG), as in Dalal and Triggs' method
• Train a linear SVM on annotated positive and negative instances:
  min_w (1/2) w^T w + C sum_n max(0, 1 - y_n w^T x_n)
• Detection: evaluate the model on all possible windows in the space-scale domain, labeling a window positive when w^T x > 0
  • Use convolution, since the model is linear
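The SVM objective above can be evaluated directly on toy data as a sanity check (the vectors and labels below are made up for illustration):

```python
import numpy as np

def svm_objective(w, X, y, C):
    """L(w) = 1/2 ||w||^2 + C * sum_n max(0, 1 - y_n w^T x_n)."""
    margins = y * (X @ w)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * w @ w + C * hinge.sum()

w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0],    # positive example, margin 2 -> no loss
              [0.5, 0.0]])   # negative example, violated -> hinge 1.5
y = np.array([1.0, -1.0])
obj = svm_objective(w, X, y, C=1.0)  # 0.5*||w||^2 + 1.5 = 2.0
```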
Sliding window classifiers
[Figure: a sample image, its extracted features, and a sample template W]
Bilinear model (Definition)
• Visual data are better modeled as matrices/tensors than as vectors
• Why not exploit the matrix structure directly?
Bilinear model (Definition)
• Reshape the features of a window into a matrix X of size n_s × n_f (and the model w into a matrix W of the same size); then the linear score is
  w^T x = tr(W^T X)
• Bilinear restriction: factor the template as W = W_s W_f^T, with W_s of size n_s × d and W_f of size n_f × d, where d <= min(n_s, n_f) bounds the rank of W
• The score becomes w^T x = tr(W^T X) = tr(W_f W_s^T X), which is linear in W_s when W_f is fixed and linear in W_f when W_s is fixed
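The equivalence between the vectorized score and the two trace forms is easy to verify numerically (sizes below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_f, d = 8, 5, 2           # d <= min(n_s, n_f): the rank restriction

W_s = rng.standard_normal((n_s, d))
W_f = rng.standard_normal((n_f, d))
W = W_s @ W_f.T                  # bilinear template, rank at most d
X = rng.standard_normal((n_s, n_f))

# Three equivalent ways to write the score of a window:
score_vec = W.flatten(order="F") @ X.flatten(order="F")  # w^T x
score_tr = np.trace(W.T @ X)                             # tr(W^T X)
score_bi = np.trace(W_f @ W_s.T @ X)                     # tr(W_f W_s^T X)
```

The last form follows from the cyclic property of the trace: tr((W_s W_f^T)^T X) = tr(W_f W_s^T X).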
Bilinear model (Learning)
• Linear SVM for a given set of training pairs {X_n, y_n}:
  min_W (1/2) tr(W^T W) + C sum_n max(0, 1 - y_n tr(W^T X_n))
  (objective function = regularizer + hinge-loss constraints, with trade-off parameter C)
• Substitute the bilinear form W = W_s W_f^T
Bilinear model (Learning)
• Bilinear formulation:
  min_{W_s, W_f} (1/2) tr(W_f W_s^T W_s W_f^T) + C sum_n max(0, 1 - y_n tr(W_f W_s^T X_n))
• Biconvex, so solve by coordinate descent
• Fixing one set of parameters leaves a typical SVM problem (with a change of basis)
• Use an off-the-shelf SVM solver in the loop
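The alternation can be sketched on synthetic data. This is a toy stand-in, not the paper's implementation: a plain gradient solver on a squared hinge (smooth, so small fixed steps decrease the objective) replaces the off-the-shelf SVM solver, and all sizes and data are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_f, d, N, C = 6, 4, 2, 30, 1.0

# Toy training set: X_n is an n_s x n_f feature matrix, y_n in {-1, +1}.
Xs = rng.standard_normal((N, n_s, n_f))
ys = np.sign(rng.standard_normal(N))

def objective(W_s, W_f):
    W = W_s @ W_f.T
    margins = ys * np.einsum("ij,nij->n", W, Xs)   # tr(W^T X_n) per example
    hinge2 = np.maximum(0.0, 1.0 - margins) ** 2   # squared hinge (smooth)
    return 0.5 * np.sum(W * W) + C * hinge2.sum()

def grad_step(W_s, W_f, lr=1e-3, fix="f"):
    # Gradient w.r.t. the full template, then chain rule into one factor.
    W = W_s @ W_f.T
    margins = ys * np.einsum("ij,nij->n", W, Xs)
    slack = np.maximum(0.0, 1.0 - margins)
    dW = W - 2.0 * C * np.einsum("n,nij->ij", ys * slack, Xs)
    if fix == "f":                     # update W_s with W_f frozen
        return W_s - lr * dW @ W_f, W_f
    return W_s, W_f - lr * dW.T @ W_s  # update W_f with W_s frozen

W_s = rng.standard_normal((n_s, d)) * 0.1
W_f = rng.standard_normal((n_f, d)) * 0.1
start = objective(W_s, W_f)
for it in range(100):                  # alternate the two convex subproblems
    for _ in range(5):
        W_s, W_f = grad_step(W_s, W_f, fix="f")
    for _ in range(5):
        W_s, W_f = grad_step(W_s, W_f, fix="s")
end = objective(W_s, W_f)
```

With one factor frozen the subproblem is convex in the other, which is what lets each half-step be handed to a standard SVM solver in the actual method.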
Motivation
• Regularization: W_f spans a d-dimensional feature subspace (W = W_s W_f^T, d <= min(n_s, n_f)), similar to PCA, but not orthogonal and learned discriminatively, jointly with the reduced-dimensional template
• Run-time efficiency: d convolutions instead of n_f (project the features onto the subspace once, then convolve with the reduced template)
Motivation
• Transfer learning: share the subspace W_f between different problems (W = W_s W_f^T)
  • e.g., a human detector and a cat detector
• Optimize the sum of all the objective functions
• Learn a good subspace using all the data
Extension
• Multi-linear: high-order tensors
• Optimize L(W_x, W_y, W_f) instead of just L(W)
• For a 1-D feature, a rank-1 restriction gives a separable spatial filter
• Spatio-temporal templates
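The separable-filter claim for the rank-1 case can be checked numerically: correlating with the outer product w_y w_x^T equals a 1-D correlation along rows followed by one along columns. A small sketch with a hand-rolled 2-D correlation (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.standard_normal((10, 12))     # single-channel feature map
w_y = rng.standard_normal(3)
w_x = rng.standard_normal(4)
K = np.outer(w_y, w_x)                # rank-1 (separable) template

def corr2d(img, ker):
    """Dense 'valid' 2-D cross-correlation."""
    kh, kw = ker.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

full = corr2d(I, K)                   # one 2-D correlation: O(kh*kw) per output

# Separable version: 1-D correlation along rows, then along columns.
rows = np.array([np.correlate(r, w_x, mode="valid") for r in I])
sep = np.array([np.correlate(c, w_y, mode="valid") for c in rows.T]).T
```

The separable version costs O(kh + kw) per output instead of O(kh * kw), which is the run-time advantage a rank restriction buys.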
Related work (Rank restriction)
• Bilinear models
  • Often used to increase flexibility; we use them to reduce the number of parameters
  • Mostly used in generative models such as density estimation; we use them for classification
• Soft rank restriction
  • Uses the trace norm tr((W^T W)^{1/2}) rather than tr(W^T W) in the SVM to regularize the rank
  • Convex, but not easy to solve
  • Decreases the sum of the singular values rather than the number of non-zero singular values (the rank)
• Wolf et al. (CVPR'07)
  • Used a formulation similar to ours with a hard rank restriction
  • Showed results only for the soft rank restriction
  • Used it only for one task (did not consider multi-task learning)
Related work (Transfer learning)
• Dates back at least to Caruana's work (1997)
  • We were inspired by their work on multi-task learning
  • Applied to back-propagation nets and k-nearest neighbor
• Ando and Zhang's work (2005)
  • Linear model
  • All models share a component in a low-dimensional subspace (transfer)
  • Uses the same number of parameters
Experiments: Pedestrian detection
• Baseline: Dalal and Triggs' spatio-temporal classifier (ECCV'06)
• Linear SVM on 8 × 8 cells with 84 features per cell:
  • Histogram of gradients (HOG)
  • Histogram of optical flow
• Modified the learning method to make sure the spatio-temporal classifier beats the static one
Experiments: Pedestrian detection
• Datasets: INRIA motion and INRIA static
  • 3400 video frame pairs
  • 3500 static images
• Typical values: n_s = 14 × 6, n_f = 84, d = 5 or 10
• Evaluation: average precision
• Initialize with PCA in feature space
• Ours is 10 times faster
Experiments: Human action classification 1
• 1-vs-all action templates
• Voting: a second SVM on the confidence values
• Dataset: UCF Sports Action (CVPR 2008)
  • They obtained 69.2%; we got 64.8%, but with:
  • More classes: 12 rather than 9
  • A smaller dataset: 150 videos rather than 200
  • A harder evaluation protocol: 2-fold cross-validation vs. LOOCV (87 training examples rather than their 199)
Results: UCF action
• Linear: 0.518 (not always feasible)
• PCA on features: 0.444
• Bilinear: 0.648
Experiments: Human action classification 2
• Transfer setting
• Only two training examples for each of the 12 action classes
• First trained independently, then trained jointly with a shared subspace
• The C parameter was tuned for the best result
Transfer
• Model for task m: W^m = W_s^m W_f^T, with the subspace W_f shared across tasks
• Use PCA to initialize W_f, then alternate: train each task's W_s^m, then update the shared W_f by minimizing the summed objective
  min_{W_f} sum_m L(W_f, W_s^m)
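One way to see what the shared subspace buys is to count parameters. The sizes below are illustrative (they reuse the n_s = 84, n_f = 84 order of magnitude from the pedestrian experiments, with M = 12 tasks as in the action experiments, and are not the paper's exact configuration):

```python
n_s, n_f, d, M = 84, 84, 10, 12   # illustrative sizes, not the paper's exact ones

full_linear = M * n_s * n_f            # one full-rank template per task
independent = M * (n_s + n_f) * d      # bilinear, each task has its own W_s, W_f
shared = M * n_s * d + n_f * d         # bilinear with a single shared W_f
```

With these numbers the shared model stores roughly half the parameters of independent bilinear models and about an eighth of the full linear ones, which is why a common W_f helps when each task has very few training examples.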
Results: Transfer Average classification rate
Results: Transfer (for “walking”) Iteration 1 Refined at Iteration 2
Conclusion
• Introduced multi-linear classifiers
  • Exploit the natural matrix/tensor representation of spatio-temporal data
  • Trained with existing efficient linear solvers
• Shared subspace across different problems: a novel form of transfer learning
• Better performance and about a 10X run-time speed-up compared to the linear classifier
• Easy to apply to most high-dimensional features (in place of dimensionality-reduction methods like PCA)
• Simple: ~20 lines of Matlab code
Bilinear model (Learning details)
• Linear SVM for a given set of training pairs {x_n, y_n}:
  min_W L(W) = (1/2) tr(W^T W) + C sum_n max(0, 1 - y_n tr(W^T X_n))
• Bilinear formulation:
  min_{W_s, W_f} L(W_s, W_f) = (1/2) tr(W_f W_s^T W_s W_f^T) + C sum_n max(0, 1 - y_n tr(W_f W_s^T X_n))
• It is biconvex, so solve by coordinate descent
Bilinear model (Learning details)
• Each coordinate descent iteration:
  • Freeze W_f and solve a standard SVM in W~_s = W_s A^{1/2}, where A = W_f^T W_f and the data are transformed as X~_n = X_n W_f A^{-1/2}, since
    tr(W^T W) = ||W~_s||_F^2 and tr(W^T X_n) = tr(W~_s^T X~_n)
  • Then freeze W_s and solve the symmetric problem in W~_f = W_f B^{1/2}, where B = W_s^T W_s and X~_n = X_n^T W_s B^{-1/2}
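The change of basis can be verified numerically: both the regularizer and the per-example score are unchanged after the transformation (the matrix square root is taken via an eigendecomposition of the symmetric positive-definite A; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_f, d = 8, 5, 3
W_s = rng.standard_normal((n_s, d))
W_f = rng.standard_normal((n_f, d))
X = rng.standard_normal((n_s, n_f))
W = W_s @ W_f.T

# A = W_f^T W_f is symmetric positive definite here (W_f has full column
# rank with probability one), so A^{1/2} comes from its eigendecomposition.
A = W_f.T @ W_f
lam, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(lam)) @ V.T
A_half_inv = V @ np.diag(1.0 / np.sqrt(lam)) @ V.T

W_s_t = W_s @ A_half                  # the transformed template W~_s
X_t = X @ W_f @ A_half_inv            # the transformed data X~_n

reg = np.trace(W.T @ W)               # original regularizer tr(W^T W)
reg_t = np.sum(W_s_t * W_s_t)         # ||W~_s||_F^2
score = np.trace(W.T @ X)             # original score tr(W^T X)
score_t = np.trace(W_s_t.T @ X_t)     # tr(W~_s^T X~_n)
```

Because both quantities match, the subproblem in W~_s is exactly a standard linear SVM on the transformed data, which is what lets an off-the-shelf solver run inside the loop.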