A coarse-to-fine approach for fast deformable object detection

Marco Pedersoli, Andrea Vedaldi, Jordi Gonzàlez

Object detection [Fischler Elschlager 1973] [Felzenszwalb et al. 08] [Zhu et al. 10] [Vedaldi Zisserman 2009]
• Addressing the computational bottleneck
  • branch-and-bound [Blaschko Lampert 08, Lehmann et al. 09]
  • cascades [Viola Jones 01, Vedaldi et al. 09, Felzenszwalb et al. 10, Weiss Taskar 10]
  • jumping windows [Chum 07]
  • sampling windows [Gualdi et al. 10]
  • coarse-to-fine [Fleuret Geman 01, Zhang et al. 07, Pedersoli et al. 10]
[VOC 2010]

Analysis of the cost of pictorial structures

The cost of pictorial structures
• Cost of inference ($L$ = number of part locations ~ number of pixels ~ millions)
  • one part: $L$
  • two parts: $L^2$
  • …
  • $P$ parts: $L^P$
• With a tree
  • using dynamic programming: $P L^2$
  • polynomial, but still too slow in practice
• With a tree and quadratic springs
  • using the distance transform [Felzenszwalb and Huttenlocher 05]: $P L$
  • in principle, millions of times faster than dynamic programming (a factor of $L$)!
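The drop from $P L^2$ to $P L$ comes from replacing, for each tree edge, the brute-force search over all pairs of locations with a linear-time distance transform. Below is a minimal sketch of the 1D generalized distance transform (lower envelope of parabolas) in min form with unit quadratic weight, checked against the $O(L^2)$ search. The function names are ours, not from the slides. For a part model one would apply it to negated filter responses, and run it along rows and then columns for 2D, since the quadratic cost is separable.

```python
import math
import random


def brute_force_min_convolution(f):
    """O(L^2) reference: D[p] = min_q f[q] + (p - q)^2."""
    n = len(f)
    return [min(f[q] + (p - q) ** 2 for q in range(n)) for p in range(n)]


def distance_transform_1d(f):
    """O(L) generalized distance transform via the lower envelope of parabolas."""
    n = len(f)
    if n == 0:
        return []
    v = [0] * n              # roots of the parabolas that form the lower envelope
    z = [0.0] * (n + 1)      # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        # Intersection of the parabola rooted at q with the rightmost one kept so far.
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1           # the parabola at v[k] is hidden: drop it
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    d = [0.0] * n
    k = 0
    for p in range(n):
        while z[k + 1] < p:  # move to the envelope parabola that covers p
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d


random.seed(0)
f = [random.uniform(-5.0, 5.0) for _ in range(200)]
err = max(abs(a - b) for a, b in
          zip(distance_transform_1d(f), brute_force_min_convolution(f)))
print(err)
```

A non-unit spring weight $a > 0$ can be handled by rescaling, since $\min_q f[q] + a(p-q)^2 = a \min_q (f[q]/a + (p-q)^2)$.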

A notable case: deformable part models
• Deformable part model [Felzenszwalb et al. 08]
  • locations are discrete (on a grid with spacing $\delta$, the HOG cell size)
  • deformations are bounded
• Number of possible part locations: $L \to L / \delta^2$
• Cost of placing two parts: $C$ = maximum deformation size
  • instead of $L^2$ location pairs, only $LC$, with $C \ll L$
• Total geometric cost: $C P L / \delta^2$

A notable case: deformable part models
• With deformable part models
  • finding the optimal part configuration is cheap
  • the distance transform speed-up is limited (it saves at most a factor $C$)
• The standard analysis does not account for filtering
  • $F$ = size of a part filter
  • geometric cost: $C P L / \delta^2$
  • filtering cost: $F P L / \delta^2$
  • total cost: $(F + C) P L / \delta^2$
• Typical example
  • filter size: $F = 6 \times 6 \times 32$
  • deformation size: $C = 6 \times 6$
• Filtering dominates the cost of finding the optimal part configuration!
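To make the "filtering dominates" claim concrete, the sketch below plugs the slide's typical values of $F$ and $C$ into the two cost terms. The image size (640×480 pixels), $\delta = 8$ pixels and $P = 8$ parts are hypothetical values chosen only for illustration; the ratio between the two terms is $F / C$ regardless of them.

```python
def dpm_costs(F, C, P, L, delta):
    """Per-image operation counts for a deformable part model (slide notation)."""
    locations = L / delta ** 2        # discrete part locations on the HOG grid
    filtering = F * P * locations     # one F-dimensional dot product per part and location
    geometric = C * P * locations     # C candidate displacements per part and location
    return filtering, geometric


F = 6 * 6 * 32   # 6x6 HOG cells with 32 dimensions per cell -> 1152
C = 6 * 6        # 6x6 displacements -> 36
# Hypothetical setting: 640x480 image, delta = 8 pixels, P = 8 parts.
filtering, geometric = dpm_costs(F, C, P=8, L=640 * 480, delta=8)
print(filtering, geometric)                  # 44236800.0 1382400.0
print(filtering / geometric)                 # 32.0
print(filtering / (filtering + geometric))   # ~0.97
```

Under these numbers roughly 97% of the work is filtering, so even a perfect speed-up of the geometric term would leave the total cost almost unchanged.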

Accelerating deformable part models
• Cascade of deformable parts [Felzenszwalb et al. 2010]
  • detect parts sequentially
  • stop when the confidence falls below a threshold
• Coarse-to-fine localization [Pedersoli et al. 2010]
  • multi-resolution search
  • we extend this idea to deformable part models
• Deformable part model cost: $(F + C) P L / \delta^2$
  • the key is reducing the number of filter evaluations

Our contribution: Coarse-to-fine for deformable models

Our model
• Multi-resolution deformable parts
  • each part is a HOG filter
  • recursive arrangement
  • resolution doubles at each level
  • bounded deformation
• Score of a configuration $S(y)$
  • HOG filter scores
  • parent-child deformation scores
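As an illustration of $S(y)$, here is a minimal sketch for a two-level model: a root plus four children at double resolution, each child anchored at a quadrant of the root in the finer grid. The quadrant anchors, the quadratic deformation cost with weight `alpha`, and all names are assumptions made for this example, not the authors' exact parameterization.

```python
import numpy as np

# Hypothetical child anchors: the four quadrants of the parent, in finer-grid cells.
CHILD_OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def configuration_score(root_resp, child_resps, y, alpha):
    """S(y) = sum of HOG filter scores - sum of quadratic parent-child deformation costs.

    root_resp: (H, W) filter responses of the root at the coarse resolution.
    child_resps: four (2H, 2W) response maps at double resolution.
    y: [(row, col) of the root, then (row, col) of each child].
    """
    (r0, c0), children = y[0], y[1:]
    score = root_resp[r0, c0]
    for resp, (dr0, dc0), (r, c) in zip(child_resps, CHILD_OFFSETS, children):
        # The resolution doubles, so the anchor is the parent position scaled by 2.
        ar, ac = 2 * r0 + dr0, 2 * c0 + dc0
        score += resp[r, c] - alpha * ((r - ar) ** 2 + (c - ac) ** 2)
    return float(score)


root = np.zeros((4, 4))
root[1, 1] = 2.0
children = [np.zeros((8, 8)) for _ in CHILD_OFFSETS]
for resp, (dr, dc) in zip(children, CHILD_OFFSETS):
    resp[2 + dr, 2 + dc] = 1.0   # each child responds best at its anchor

at_anchors = [(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
shifted = [(1, 1), (2, 1), (2, 3), (3, 2), (3, 3)]   # child 0 moved one cell left
print(configuration_score(root, children, at_anchors, alpha=0.5))  # 6.0
print(configuration_score(root, children, shifted, alpha=0.5))     # 4.5
```

Moving a child away from its anchor both loses its filter response and pays the deformation penalty, which is why the second configuration scores lower.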

Coarse-to-Fine search
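The principle behind the search (lower resolutions drive the search at higher resolutions) can be sketched on a toy 1D chain with one part per level. This is a simplification we assume for illustration: each child is anchored at twice its parent's position, may deform by at most `c` cells at a quadratic cost `alpha * d**2`, and in 1D the number of locations doubles per level (4× in 2D). Exact inference uses dynamic programming over all locations; the coarse-to-fine variant greedily places each part near the position chosen by its parent. It is a sketch of the idea, not the authors' implementation.

```python
import numpy as np


def exact_search(resp, c, alpha):
    """Exact DP: best chain score for every root position; counts filter evaluations."""
    best = resp[-1].astype(float)
    evals = len(resp[-1])
    for r in range(len(resp) - 2, -1, -1):
        nxt, cur = best, np.empty(len(resp[r]))
        for x in range(len(resp[r])):
            anchor = 2 * x  # resolution doubles from level r to level r + 1
            cur[x] = resp[r][x] + max(
                nxt[anchor + d] - alpha * d * d
                for d in range(-c, c + 1) if 0 <= anchor + d < len(nxt))
        best = cur
        evals += len(resp[r])   # every location of this level is filtered
    return best, evals


def ctf_search(resp, c, alpha):
    """Coarse-to-fine: each part is searched only near the position chosen by its parent."""
    scores = np.empty(len(resp[0]))
    evals = len(resp[0])
    for x0 in range(len(resp[0])):
        total, x = float(resp[0][x0]), x0
        for r in range(1, len(resp)):
            anchor = 2 * x
            cands = [(resp[r][anchor + d] - alpha * d * d, anchor + d)
                     for d in range(-c, c + 1) if 0 <= anchor + d < len(resp[r])]
            evals += len(cands)   # only the 2c + 1 neighbours are filtered
            s, x = max(cands)
            total += s
        scores[x0] = total
    return scores, evals


rng = np.random.default_rng(0)
n0, levels = 16, 5
resp = [rng.normal(size=n0 * 2 ** r) for r in range(levels)]
exact, exact_evals = exact_search(resp, c=1, alpha=0.1)
ctf, ctf_evals = ctf_search(resp, c=1, alpha=0.1)
print(exact_evals, ctf_evals)
print(bool(np.all(ctf <= exact + 1e-9)))  # True
```

Because the coarse-to-fine search always returns a valid configuration, its score can never exceed the exact maximum for the same root position; its number of filter evaluations grows linearly with the number of levels instead of geometrically.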

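Quantify the saving
• 1D view (circle = part location)
• 2D view: number of filter evaluations per resolution level

  level | CTF | exact
  1     | L   | L
  2     | L   | 4L
  3     | L   | 16L

• The exact cost grows 4× per level, while CTF stays at $L$
• Overall speed-up: $4^R$, an exponentially larger saving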
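The counts in the table can be tabulated directly. The slide does not spell out how the levels are indexed, so this sketch assumes levels $r = 0, \dots, R-1$ with $L \cdot 4^r$ exact evaluations and $L$ coarse-to-fine evaluations at each level; it prints the saving at the finest level and over all levels combined.

```python
def filter_evaluations(L, R):
    """Filter evaluations per level, following the table's counts (2D, R levels)."""
    exact = [L * 4 ** r for r in range(R)]   # resolution doubles: 4x more locations per level
    ctf = [L] * R                            # coarse-to-fine: about L evaluations per level
    return exact, ctf


for R in (1, 2, 3, 4):
    exact, ctf = filter_evaluations(L=1, R=R)
    # R, saving at the finest level, saving over all levels
    print(R, exact[-1] / ctf[-1], sum(exact) / sum(ctf))
# 1 1.0 1.0
# 2 4.0 2.5
# 3 16.0 7.0
# 4 64.0 21.25
```

Under this indexing both savings grow exponentially with the number of levels.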

Lateral constraints
• Geometry in deformable part models is cheap
  • can afford additional constraints
• Lateral constraints
  • connect sibling parts
• Inference
  • use dynamic programming within each level
  • open the cycle by conditioning on one node
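The "open the cycle by conditioning on one node" step can be sketched as follows, assuming for illustration that the siblings are linked into a ring (for example, the four children of a part). `unary[i, x]` stands for the filter score minus the parent-child deformation of sibling `i` at candidate displacement `x`, and `pairwise` is a hypothetical lateral term rewarding similar displacements of neighbouring siblings. Fixing the first node turns the ring into a chain solved by dynamic programming, and trying every value of that node gives the exact maximum at cost $O(n k^3)$ instead of $k^n$.

```python
import itertools

import numpy as np


def cycle_map(unary, pairwise):
    """Maximise sum_i unary[i, x_i] + sum_i pairwise[x_i, x_{(i+1) % n}] over a ring."""
    n, k = unary.shape
    best_score, best_labels = -np.inf, None
    for x0 in range(k):  # condition node 0 on x0: the ring becomes a chain
        msg = np.full(k, -np.inf)
        msg[x0] = unary[0, x0]
        back = []
        for i in range(1, n):
            # cand[a, b]: best chain score with node i-1 at a and node i at b
            cand = msg[:, None] + pairwise + unary[i][None, :]
            back.append(cand.argmax(axis=0))
            msg = cand.max(axis=0)
        closing = msg + pairwise[:, x0]  # lateral edge from the last node back to node 0
        last = int(closing.argmax())
        if closing[last] > best_score:
            labels = [last]
            for bp in reversed(back):     # backtrack from node n-1 to node 0
                labels.append(int(bp[labels[-1]]))
            best_score, best_labels = float(closing[last]), labels[::-1]
    return best_score, best_labels


rng = np.random.default_rng(0)
n, k = 4, 9                                   # 4 siblings, 3x3 candidate displacements
disp = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
unary = rng.normal(size=(n, k))
# Hypothetical lateral term: penalise neighbouring siblings with different displacements.
pairwise = -0.5 * ((disp[:, None, :] - disp[None, :, :]) ** 2).sum(axis=2)
score, labels = cycle_map(unary, pairwise)

brute = max(
    sum(unary[i, x[i]] + pairwise[x[i], x[(i + 1) % n]] for i in range(n))
    for x in itertools.product(range(k), repeat=n))
print(np.isclose(score, brute))  # True
```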

Lateral constraints
• Why are lateral constraints useful?
  • they encourage consistent local deformations
  • without lateral constraints, siblings move independently
  • there is no way to make their motion coherent
[Figure: without lateral constraints, configurations y and y' have the same geometric cost; with lateral constraints, y can be encouraged]

Experiments

Effect of deformation size
• INRIA pedestrian dataset
  • C = deformation size (HOG cells)
  • AP = average precision (%)
  • coarse-to-fine (CTF) inference
• Remarks
  • a large C slows down inference but does not improve precision
  • even a small C allows substantial part deformation, because deformations at multiple resolutions add up

Effect of the lateral constraints
• Exact vs. coarse-to-fine (CTF) inference
  • CTF scores ~ exact inference scores
  • CTF ≤ exact (CTF returns a valid configuration, so its score is a lower bound on the exact maximum)
  • the bound is tighter with lateral constraints
• The effect is significant in training as well
  • additional coherence avoids spurious solutions
  • big improvement with coarse-to-fine search
  • example: learning the head model
[Figure: effect on the inference scores (exact score vs. CTF score) for tree and tree + lateral models; head model learned with CTF learning, tree vs. tree + lateral]

Training speed
• Structured latent SVM [Felzenszwalb et al. 08, Vedaldi et al. 09]
  • deformations of training objects are unknown
  • estimated as latent variables
• Algorithm
  • Initialization: no negative examples, no deformations
  • Outer loop
    • Inner loop
      • collect hard negative examples (CTF inference)
      • learn the model parameters (SGD)
    • estimate the deformations (CTF inference)
• The training speed is dominated by the cost of inference!
  • > 10× speed-up!
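The loop structure above can be sketched on toy data. In this minimal sketch each training example is a small set of candidate feature vectors standing in for its possible deformations, and the "inference" is a plain argmax over those candidates, where the detector would run CTF inference. Data, step size, regularisation and all names are hypothetical; the point is the nesting of hard-negative mining and SGD inside the latent re-estimation loop.

```python
import numpy as np


def train_latent_svm(pos, neg, lam=0.01, lr=0.01, outer=3, inner=3, epochs=10, seed=0):
    """Toy latent SVM with hard-negative mining.

    pos, neg: lists of (m, d) arrays; each row is the feature vector of one candidate
    latent value (a stand-in for one object deformation or placement).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(pos[0].shape[1])
    latent_pos = np.array([p[0] for p in pos])  # initialization: no deformations
    for _ in range(outer):
        for _ in range(inner):
            # Collect hard negatives: candidates that violate the margin (w . x > -1).
            hard = np.vstack([n[n @ w > -1.0] for n in neg])
            X = np.vstack([latent_pos, hard])
            y = np.concatenate([np.ones(len(latent_pos)), -np.ones(len(hard))])
            # Learn the parameters with SGD on the L2-regularised hinge loss.
            for i in rng.integers(len(y), size=epochs * len(y)):
                viol = y[i] * (X[i] @ w) < 1.0
                w -= lr * (lam * w - (y[i] * X[i] if viol else 0.0))
        # Estimate the latent deformations: best-scoring candidate of each positive.
        latent_pos = np.array([p[np.argmax(p @ w)] for p in pos])
    return w


rng = np.random.default_rng(1)


def noisy(center, m):
    return np.asarray(center, dtype=float) + 0.3 * rng.normal(size=(m, 2))


# Each positive hides one true placement near (2, 2) among two distractors near (0, 0).
pos = [np.vstack([noisy((2, 2), 1), noisy((0, 0), 2)])[rng.permutation(3)] for _ in range(20)]
neg = [noisy((-2, -2), 5) for _ in range(20)]
w = train_latent_svm(pos, neg)
pos_best = np.array([(p @ w).max() for p in pos])
neg_best = np.array([(n @ w).max() for n in neg])
print(w, pos_best.min() > neg_best.max())
```

Both inner steps and the latent re-estimation call the inference routine, which is why making inference cheaper speeds up training as much as testing.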

PASCAL VOC 2007
• Evaluation on the detection of 20 object categories
  • ~5,000 images for training, ~5,000 images for testing
• Remarks
  • very good for aeroplane, bicycle, boat, table, horse, motorbike, sheep
  • less good for bottle, sofa, tv
• Speed-accuracy trade-off
  • time is drastically reduced
  • the hit on AP is small

Comparison to the cascade of parts
• Cascade of parts [Felzenszwalb et al. 10]
  • test parts sequentially; reject when the score falls below a threshold
  • saving at unpromising locations (content dependent)
  • difficult to use in training (thresholds must be learned)
• Coarse-to-fine inference
  • saving is uniform (content independent)
  • can be used during training

Coarse-to-fine cascade of parts
• Cascade and CTF use orthogonal principles
  • easily combined
  • the speed-ups multiply!
• Example
  • apply a threshold at the root
  • plot AP vs. speed-up
  • in some cases a 100× speed-up can be achieved
[Diagram: CTF → cascade score > τ1? (else reject) → CTF → cascade score > τ2? (else reject) → CTF]
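Combining the two can be sketched by inserting a threshold test between coarse-to-fine levels, as in the diagram. The sketch assumes the same hypothetical toy setting as a 1D chain with one part per level (child anchored at twice the parent position, deformation bounded by `c`, quadratic cost `alpha * d**2`), and one threshold per level checked on the partial score; in a real cascade the thresholds would be learned. A root location rejected early costs no further filter evaluations, on top of the per-level saving of the coarse-to-fine search.

```python
import numpy as np


def ctf_cascade(resp, c, alpha, thresholds):
    """Coarse-to-fine search on a 1D chain of parts with a cascade threshold per level.

    thresholds[r - 1] is checked on the partial score before refining level r;
    a root location whose partial score does not exceed it is rejected.
    """
    detections, evals = {}, len(resp[0])
    for x0 in range(len(resp[0])):
        total, x = float(resp[0][x0]), x0
        for r in range(1, len(resp)):
            if total <= thresholds[r - 1]:
                break                      # reject: no more filters evaluated here
            anchor = 2 * x                 # resolution doubles at every level
            cands = [(resp[r][anchor + d] - alpha * d * d, anchor + d)
                     for d in range(-c, c + 1) if 0 <= anchor + d < len(resp[r])]
            evals += len(cands)
            s, x = max(cands)
            total += s
        else:
            detections[x0] = total         # survived every threshold
    return detections, evals


rng = np.random.default_rng(0)
resp = [rng.normal(size=32 * 2 ** r) for r in range(4)]
for tau in (-np.inf, 0.0, 1.0):
    dets, evals = ctf_cascade(resp, c=1, alpha=0.1, thresholds=[tau] * 3)
    print(tau, len(dets), evals)
```

Raising the thresholds trades detections for fewer filter evaluations, which is the AP vs. speed-up curve the slide refers to.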

Summary
• Analysis of deformable part models
  • filtering dominates the geometric configuration cost
  • a speed-up requires reducing filtering
• Coarse-to-fine search for deformable models
  • lower resolutions can drive the search at higher resolutions
  • lateral constraints add coherence to the search
  • exponential saving, independent of the image content
  • can be used for training too
• Practical results
  • 10× speed-up on VOC and INRIA with minimal AP loss
  • can be combined with a cascade of parts for a multiplied speed-up
• Future
  • more complex models with rotation, foreshortening, …

Thank you!
