1 / 38

Latent SVMs for Human Detection with a Locally Affine Deformation Field

Latent SVMs for Human Detection with a Locally Affine Deformation Field. Ľubor Ladický 1 Phil Torr 2 Andrew Zisserman 1. 1 University of Oxford 2 Oxford Brookes University. Object Detection. Find all objects of interest Enclose them tightly in a bounding box.

tricia
Download Presentation

Latent SVMs for Human Detection with a Locally Affine Deformation Field

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický1 Phil Torr2 Andrew Zisserman1 1 University of Oxford 2 Oxford Brookes University

  2. Object Detection • Find all objects of interest • Enclose them tightly in a bounding box

  3. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  4. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  5. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  6. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  7. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  8. HOG Detector • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  9. HOG Detector Classifier response : The weights w* and the bias b* learnt using the Linear SVM as: Dalal & Triggs CVPR05

  10. HOG Detector Does not fit well ! • Sliding window using learnt HOG template • Post-processing using non-maxima suppression Dalal & Triggs CVPR05

  11. Deformable Part-based Model • Allows parts to move relative to the centre • Effectively allows the template to deform • Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08

  12. Deformable Part-based Model • Allows parts to move relative to the centre • Effectively allows the template to deform • Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08

  13. Our approach Cells c displaced by the deformation field d:

  14. Our approach Cells c displaced by the deformation field d: Classifier response : Regularisation takes the form of a pairwise MRF:

  15. Our approach Cells c displaced by the deformation field d: Classifier response : The weights w* and the bias b* learnt using the Latent Linear SVM as:

  16. Comparison with our approach HOG template (no deformation) Part-based model (rigid movable parts) Our model (deformation field)

  17. Our approach Why hasn’t anyone tried it before?

  18. Our approach Why hasn’t anyone tried it before? • Latent models with many latent variables tend to overfit • Inference not feasible for a sliding window

  19. Our approach Why hasn’t anyone tried it before? • Latent models with many latent variables tend to overfit • Inference not feasible for a sliding window • Deformation field used before only for • Classification task (Duchenne et al ICCV11) • Rescoring of detections (Ladický, PhD thesis)

  20. Our approach We restrict the deformation field to be locally affine ( ):

  21. Our approach We restrict the deformation field to be locally affine ( ):

  22. Our approach We restrict the deformation field to be locally affine ( ):

  23. Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively

  24. Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively Given the deformation fields the problem is a standard linear SVM:

  25. Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as : By defining the optimisation becomes:

  26. Optimisation • The location of the cells in the first row and in the first column fully determine the location of each cell • Any locally affine deformation field can be reached by two moves : • each column i moves by (Δcdix ,Δcdiy) • each row jmoves by (Δrdjx ,Δrdjy)

  27. Optimisation • The location of the cells in the first row and in the first column fully determine the location of each cell • Any locally affine deformation field can be reached by two moves : • each column i moves by (Δcdix ,Δcdiy) • each row jmoves by (Δrdjx ,Δrdjy)

  28. Optimisation • The location of the cells in the first row and in the first column fully determine the location of each cell • Any locally affine deformation field can be reached by two moves : • each column i moves by (Δcdix ,Δcdiy) • each row jmoves by (Δrdjx ,Δrdjy) • Such moves do not alter the local affinity

  29. Optimisation • The location of the cells in the first row and in the first column fully determine the location of each cell • Any locally affine deformation field can be reached by two moves : • each column i moves by (Δcdix ,Δcdiy) • each row jmoves by (Δrdjx ,Δrdjy) • Such moves do not alter the local affinity • Both moves can be solved quickly using dynamic programming

  30. Learning multiple poses / viewpoints

  31. Learning multiple poses / viewpoints • We define a similarity measure between two training samples as : where

  32. Learning multiple poses / viewpoints • We define a similarity measure between two training samples as : where • K-medoid clustering of S matrix clusters the data into multi model

  33. Experiments • Buffy dataset (typically used for pose estimation) • Contains large variety of poses, viewpoints and aspect ratios • Consists of 748 images • Episode s5e3 used for training • Episode s5e4 used for validation • Episodes s5e2, s5e5 and s5e6 used for testing Ferrari et al. CVPR08

  34. Clustering of training samples Each row corresponds to one model (out of 10 models)

  35. Qualitative results

  36. Quantitative results

  37. Conclusion • We propose • Novel inference for locally affine deformation field (LADF) • Object detector using LADF • Clustering using LADF

  38. Thank you Questions?

More Related