
Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking?

Philip H.S. Torr, Pawan Kumar, Pushmeet Kohli, Matt Bray (Oxford Brookes University); Andrew Zisserman (Oxford); Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla (Cambridge).



  1. Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking? Philip H.S. Torr, Pawan Kumar, Pushmeet Kohli, Matt Bray (Oxford Brookes University); Andrew Zisserman (Oxford); Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla (Cambridge)

  2. Algebra • Unifying Conjecture • Tracking = Detection = Recognition • Detection = Segmentation • therefore • Tracking (pose estimation)=Segmentation?

  3. Objective Aim to get a clean segmentation of a human… Image Segmentation Pose Estimate??

  4. Developments • ICCV 2003, pose estimation as fast nearest neighbour plus dynamics (inspired by Gavrila and Toyama & Blake) • BMVC 2004, parts-based chamfer to make the space of templates more flexible (à la pictorial structures of Huttenlocher) • CVPR 2005, ObjCut combining segmentation and detection. • ECCV 2006, interpolation of poses using the MVRVM (Agarwal and Triggs) • ECCV 2006, combination of pose estimation and segmentation using graph cuts.

  5. Tracking as Detection (Stenger et al ICCV 2003) Detection has become very efficient, e.g. real-time face detection, pedestrian detection Example: Pedestrian detection [Gavrila & Philomin, 1999]: • Find match among large number of exemplar templates Issues: • Number of templates needed • Efficient search • Robust cost function

  6. Cascaded Classifiers

  7. 1280x1024 image, 11 subsampling levels, 80 s. Average number of filters per patch: 6.7. First filter: 19.8% of patches remaining

  8. 1280x1024 image, 11 subsampling levels, 80 s. Average number of filters per patch: 6.7. Filter 10: 0.74% of patches remaining

  9. 1280x1024 image, 11 subsampling levels, 80 s. Average number of filters per patch: 6.7. Filter 20: 0.06% of patches remaining

  10. 1280x1024 image, 11 subsampling levels, 80 s. Average number of filters per patch: 6.7. Filter 30: 0.01% of patches remaining

  11. 1280x1024 image, 11 subsampling levels, 80 s. Average number of filters per patch: 6.7. Filter 70: 0.007% of patches remaining
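
The cascade slides report how quickly patches are rejected and how few filters an average patch sees. A minimal sketch of that bookkeeping follows. The patches and filters are toy stand-ins (plain integers and threshold tests), not the classifiers used in the talk, and `run_cascade` is a hypothetical helper name.

```python
from typing import Callable, List, Sequence, Tuple


def run_cascade(
    patches: Sequence[int],
    filters: Sequence[Callable[[int], bool]],
) -> Tuple[List[int], List[float], float]:
    """Apply filters in order, dropping a patch as soon as one filter rejects it.

    Returns the surviving patches, the fraction of patches remaining after
    each filter, and the average number of filters evaluated per patch.
    """
    total = len(patches)
    remaining = list(patches)
    fractions: List[float] = []
    evaluations = 0
    for accept in filters:
        evaluations += len(remaining)  # only surviving patches reach this filter
        remaining = [p for p in remaining if accept(p)]
        fractions.append(len(remaining) / total)
        if not remaining:
            break
    return remaining, fractions, evaluations / total


# Toy data (assumption): 100 integer "patches" and three cheap tests.
patches = list(range(100))
filters = [lambda p: p >= 80, lambda p: p % 2 == 0, lambda p: p >= 96]
survivors, fractions, avg_filters = run_cascade(patches, filters)
print(survivors, fractions, avg_filters)
```

Because most patches are rejected by the first filter, the average cost per patch stays close to one filter evaluation even though the cascade has three stages.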

  12. Hierarchical Detection • Efficient template matching (Huttenlocher & Olson, Gavrila) • Idea: When matching similar objects, speed-up by forming template hierarchy found by clustering • Match prototypes first, sub-tree only if cost below threshold
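
The pruning rule on this slide (match prototypes first, descend only if the cost is below a threshold) can be sketched as a recursive tree search. This is an illustration under stated assumptions: templates are single numbers, the cost is an absolute difference, and the per-level thresholds are made-up values; `TemplateNode` and `search_tree` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class TemplateNode:
    template: float  # stand-in for a real shape template
    children: List["TemplateNode"] = field(default_factory=list)


def search_tree(
    nodes: Sequence[TemplateNode],
    cost: Callable[[float], float],
    thresholds: Sequence[float],
    depth: int = 0,
    evaluated: Optional[List[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Return matching leaf templates and the list of templates evaluated."""
    if evaluated is None:
        evaluated = []
    matches: List[float] = []
    for node in nodes:
        evaluated.append(node.template)
        if cost(node.template) > thresholds[depth]:
            continue  # prune: the whole sub-tree is skipped
        if node.children:
            sub_matches, _ = search_tree(
                node.children, cost, thresholds, depth + 1, evaluated
            )
            matches.extend(sub_matches)
        else:
            matches.append(node.template)
    return matches, evaluated


# Toy hierarchy (assumption): two prototypes, each with three leaf templates.
tree = [
    TemplateNode(2.0, [TemplateNode(1.0), TemplateNode(2.0), TemplateNode(3.0)]),
    TemplateNode(7.0, [TemplateNode(6.0), TemplateNode(7.0), TemplateNode(8.0)]),
]
observation = 7.2
matches, evaluated = search_tree(
    tree, lambda t: abs(t - observation), thresholds=[1.5, 0.5]
)
print(matches, len(evaluated))
```

Only five of the eight templates are evaluated here, because the sub-tree under the first prototype is never visited.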

  13. Trees • These search trees are the same as used for efficient nearest neighbour. • Add dynamic model and • Detection = Tracking = Recognition

  14. Evaluation at Multiple Resolutions • One traversal of tree per time step

  15. Evaluation at Multiple Resolutions • Tree: 9000 templates of hand pointing, rigid

  16. Templates at Level 1

  17. Templates at Level 2

  18. Templates at Level 3

  19. Comparison with Particle Filters • This method is grid based, • No need to render the model on line • Like efficient search • Can always use this as a proposal process for a particle filter if need be.

  20. Interpolation, MVRVM, ECCV 2006 Code available.

  21. Energy being Optimized, link to graph cuts • Combination of • Edge term (quickly evaluated using chamfer) • Interior term (quickly evaluated using integral images) • Note that possible templates are a bit like cuts that we put down, one could think of this whole process as a constrained search for the best graph cut.
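
The interior term is described as quick to evaluate with integral images. A minimal pure-Python sketch of that idea follows, assuming the interior score is simply a sum of per-pixel values over a rectangle; the function names and the toy map are illustrative, not taken from the talk.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of img[top:bottom][left:right] using four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy per-pixel score map (assumption), e.g. a quantised colour likelihood.
score_map = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(score_map)
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
whole_image = box_sum(table, 0, 0, 3, 3)
print(lower_right, whole_image)
```

After the one-off table construction, every rectangular interior sum costs four lookups regardless of the rectangle size, which is what makes evaluating many templates cheap.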

  22. Likelihood : Edges 3D Model Input Image Edge Detection Projected Contours Robust Edge Matching

  23. Chamfer Matching Input image Canny edges Distance transform Projected Contours
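
The chamfer pipeline on this slide (edge map, distance transform, projected contour) can be sketched in a few lines. Assumptions: a city-block distance transform computed by breadth-first search stands in for whatever transform the authors used, the truncation value is arbitrary, and the edge map is a toy binary grid rather than Canny output.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def distance_transform(edges: List[List[int]]) -> List[List[Optional[int]]]:
    """City-block distance from every pixel to the nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    dist: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def chamfer_cost(
    dist: List[List[Optional[int]]],
    template_points: Sequence[Tuple[int, int]],
    offset: Tuple[int, int],
    truncate: int = 3,
) -> float:
    """Mean truncated distance from shifted template points to the nearest edge."""
    h, w = len(dist), len(dist[0])
    total = 0
    for y, x in template_points:
        py, px = y + offset[0], x + offset[1]
        inside = 0 <= py < h and 0 <= px < w
        d = dist[py][px] if inside else None
        total += truncate if d is None else min(d, truncate)
    return total / len(template_points)


# Toy edge map (assumption): a vertical edge in column 2 of a 5x5 image.
edges = [[0, 0, 1, 0, 0] for _ in range(5)]
dist = distance_transform(edges)
template = [(0, 0), (1, 0), (2, 0)]  # a short vertical contour
cost_aligned = chamfer_cost(dist, template, offset=(1, 2))
cost_shifted = chamfer_cost(dist, template, offset=(1, 0))
print(cost_aligned, cost_shifted)
```

The distance transform is computed once per image, so each template placement only needs lookups at its contour points. Truncating the distance is one simple way to make the cost robust to missing edges.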

  24. Likelihood : Colour 3D Model Input Image Projected Silhouette Skin Colour Model Template Matching

  25. Template Matching = • Template Matching = constrained search for a cut/segmentation? • Detection = Segmentation?

  26. Objective Aim to get a clean segmentation of a human… Image Segmentation Pose Estimate??

  27. MRF for Interactive Image Segmentation, Boykov and Jolly [ICCV 2001] • Energy_MRF = unary likelihood + contrast term + uniform prior (Potts model) • Maximum a posteriori (MAP) solution: x* = arg min_x E(x) • Figure panels: Data (D), Unary likelihood, Pair-wise terms, MAP solution
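
To make the three energy terms concrete, here is a toy evaluation on a four-pixel chain. Everything numeric is an assumption for illustration: the unary cost is an absolute difference from a per-label mean intensity rather than a learned histogram likelihood, and the MAP labelling is found by brute-force enumeration instead of a graph cut, which is only feasible at this size.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def mrf_energy(
    labels: Sequence[int],
    pixels: Sequence[int],
    means: Dict[int, int],
    potts: float = 0.1,
    contrast: float = 1.0,
    sigma: float = 30.0,
) -> float:
    """Unary likelihood + contrast term + Potts prior on a 1D chain of pixels."""
    unary = sum(abs(p - means[x]) / 255.0 for p, x in zip(pixels, labels))
    pairwise = 0.0
    for i in range(len(pixels) - 1):
        if labels[i] != labels[i + 1]:
            diff = pixels[i] - pixels[i + 1]
            # A label change is cheap across a strong intensity edge.
            pairwise += potts + contrast * math.exp(-diff * diff / (2 * sigma ** 2))
    return unary + pairwise


def map_labelling(pixels: Sequence[int], means: Dict[int, int]) -> Tuple[int, ...]:
    """Brute-force x* = arg min_x E(x); a stand-in for a graph cut."""
    candidates = product((0, 1), repeat=len(pixels))
    return min(candidates, key=lambda x: mrf_energy(x, pixels, means))


pixels = [10, 12, 200, 205]  # toy intensities (assumption)
means = {0: 10, 1: 200}      # label 0 = background, label 1 = foreground
best = map_labelling(pixels, means)
print(best, round(mrf_energy(best, pixels, means), 3))
```

The minimiser places the single label change at the large intensity step, where the contrast term makes a boundary almost free.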

  28. However… • This energy formulation rarely provides realistic (target-like) results.

  29. Shape-Priors and Segmentation • Combine object detection with segmentation • Obj-Cut, Kumar et al., CVPR ’05 • Zhao and Davis, ICCV ’05 • Obj-Cut • Shape-Prior: Layered Pictorial Structure (LPS) • Learned exemplars for parts of the LPS model • Obtained impressive results • Figure: LPS model = Layer 2 + Layer 1

  30. LPS for Detection • Learning • Learnt automatically using a set of examples • Detection • Tree of chamfers to detect parts, assembled with pictorial structures and belief propagation.

  31. Solve via Integer Programming • SDP formulation (Torr 2001, AISTATS) • SOCP formulation (Kumar, Torr & Zisserman, this conference) • LBP (Huttenlocher, many)

  32. Obj-Cut Image Likelihood Ratio (Colour) Likelihood + Distance from  Distance from  ShapePrior

  33. Integrating Shape-Prior in MRFs Pairwise potential Pixels Labels Unary potential Prior Potts model MRF for segmentation

  34. Integrating Shape-Prior in MRFs Pairwise potential Pixels Labels Unary potential Prior Potts model Pose parameters  Pose-specific MRF

  35. Do we really need accurate models? Cow Instance Layer 2 Transformations Θ1 P(Θ1) = 0.9 Layer 1

  36. Do we really need accurate models? • Segmentation boundary can be extracted from edges • Rough 3D Shape-prior enough for region disambiguation

  37. Energy of the Pose-specific MRF • Energy to be minimized, with terms: unary term, shape prior, pairwise potential, Potts model • But what should be the value of θ?
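
The equation on this slide did not survive in the transcript. One plausible way to write an energy with the four listed terms is given here as a sketch. The symbols are assumptions that follow common MRF notation: φ(D | x_i) is the unary likelihood, φ(x_i | θ) the shape prior, φ(D | x_i, x_j) the contrast-dependent pairwise potential, ψ(x_i, x_j) the Potts term and N the set of neighbouring pixel pairs.

```latex
E(\mathbf{x}, \theta) =
  \sum_{i} \Big( \phi(D \mid x_i) + \phi(x_i \mid \theta) \Big)
  + \sum_{(i,j) \in \mathcal{N}} \Big( \phi(D \mid x_i, x_j) + \psi(x_i, x_j) \Big)

\theta^{*} = \arg\min_{\theta} \; \min_{\mathbf{x}} \; E(\mathbf{x}, \theta)
```

Read this way, the slide's closing question becomes the outer minimisation: for each candidate θ the inner minimum over x is a segmentation problem, and the pose estimate is the θ with the lowest resulting energy.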

  38. The different terms of the MRF • Figure panels: Original image • Likelihood of being foreground given a foreground histogram • Grimson-Stauffer segmentation • Shape prior model • Shape prior (distance transform) • Likelihood of being foreground given all the terms • Resulting Graph-Cuts segmentation

  39. Can segment multiple views simultaneously

  40. Solve via gradient descent • Comparable to level set methods • Could use other approaches (e.g. Objcut) • Need a graph cut per function evaluation
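
The remark that each function evaluation needs a graph cut can be illustrated with a toy pose search. This is a sketch under several assumptions: the "image" is a 1D signal, the pose θ is the centre of a fixed-width window, the inner segmentation is solved exactly by dynamic programming on a chain (standing in for a graph cut), and a discrete local search replaces gradient descent. All names and weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_chain(unary: Sequence[Tuple[float, float]], potts: float = 1.0):
    """Exact minimum-energy binary labelling of a chain MRF (Viterbi)."""
    cost = [unary[0][0], unary[0][1]]
    back: List[List[int]] = []
    for i in range(1, len(unary)):
        new_cost, ptr = [], []
        for label in (0, 1):
            options = [cost[prev] + (potts if prev != label else 0.0) for prev in (0, 1)]
            best_prev = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best_prev] + unary[i][label])
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)
    label = 0 if cost[0] <= cost[1] else 1
    energy, labels = cost[label], [label]
    for ptr in reversed(back):  # trace the best labels back to the first pixel
        label = ptr[label]
        labels.append(label)
    return energy, labels[::-1]


def energy_for_pose(signal, theta, half_width=2, shape_weight=0.5):
    """min over x of E(x, theta): one full segmentation solve per pose."""
    unary = []
    for i, value in enumerate(signal):
        outside = max(0, abs(i - theta) - half_width)
        background = value + (shape_weight if outside == 0 else 0.0)
        foreground = (1.0 - value) + shape_weight * outside
        unary.append((background, foreground))
    return solve_chain(unary)


def fit_pose(signal, theta):
    """Move theta by one pixel at a time while the minimum energy decreases."""
    best_energy, best_labels = energy_for_pose(signal, theta)
    while True:
        neighbours = [t for t in (theta - 1, theta + 1) if 0 <= t < len(signal)]
        scored = [(energy_for_pose(signal, t), t) for t in neighbours]
        (energy, labels), t = min(scored, key=lambda s: s[0][0])
        if energy >= best_energy:
            return theta, best_labels, best_energy
        theta, best_energy, best_labels = t, energy, labels


# Toy foreground evidence in [0, 1] (assumption); the object spans indices 5..9.
signal = [0.1, 0.1, 0.2, 0.1, 0.1, 0.9, 0.8, 0.9, 0.9, 0.8, 0.1, 0.1]
theta, labels, energy = fit_pose(signal, theta=3)
print(theta, labels, round(energy, 2))
```

Every step of the outer search calls `energy_for_pose` for each neighbouring pose, which is the cost that motivates reusing computation between consecutive solves.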

  41. Formulating the Pose Inference Problem

  42. However… • Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next: Dynamic Graph Cuts (ICCV 2005). • But computing the MAP of E(x) with respect to the pose means that the unary terms change at EACH iteration and the max-flow must be recomputed!

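  43. Dynamic Graph Cuts • Figure: problem PA is solved (a computationally expensive operation) to give solution SA; when A and B are similar, the differences between A and B define a simpler problem PB*, which is solved (a cheaper operation) to give SB, the solution of PB.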

  44. Dynamic Image Segmentation Image Segmentation Obtained Flows in n-edges

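  45. Our Algorithm • Figure: the first segmentation problem (graph Ga) is solved by maximum flow, giving its MAP solution and a residual graph (Gr); the difference between Ga and Gb is used to update the residual graph (G`), from which the second segmentation problem (Gb) is solved.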
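
The idea of continuing from the residual graph instead of restarting can be sketched on a tiny graph. This is not the authors' implementation. It assumes a plain breadth-first augmenting-path max-flow, handles only changes to terminal capacities, and restores feasibility by adding the same constant to both terminal edges of a node, which shifts every cut cost equally. `DynamicCut` and its methods are hypothetical names.

```python
from collections import deque


class DynamicCut:
    def __init__(self, n, s_caps, t_caps, n_links):
        self.n = n
        self.rs = list(s_caps)  # residual capacity source -> node
        self.rt = list(t_caps)  # residual capacity node -> sink
        self.res = [dict() for _ in range(n)]
        for i, j, w in n_links:  # undirected smoothness edges
            self.res[i][j] = self.res[i].get(j, 0) + w
            self.res[j][i] = self.res[j].get(i, 0) + w
        self.flow = 0
        self.offset = 0  # constant added by re-parameterisation

    def _find_path(self):
        parent = {i: None for i in range(self.n) if self.rs[i] > 0}
        queue = deque(parent)
        while queue:
            u = queue.popleft()
            if self.rt[u] > 0:
                path = [u]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            for v, r in self.res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return None

    def maxflow(self):
        """Augment until no path is left; return the number of augmentations."""
        augmentations = 0
        while True:
            path = self._find_path()
            if path is None:
                return augmentations
            hops = list(zip(path, path[1:]))
            push = min([self.rs[path[0]], self.rt[path[-1]]]
                       + [self.res[u][v] for u, v in hops])
            self.rs[path[0]] -= push
            self.rt[path[-1]] -= push
            for u, v in hops:
                self.res[u][v] -= push
                self.res[v][u] += push
            self.flow += push
            augmentations += 1

    def update_terminals(self, i, ds=0, dt=0):
        """Change node i's terminal capacities but keep the existing flow."""
        self.rs[i] += ds
        self.rt[i] += dt
        if self.rs[i] < 0:  # flow exceeds new capacity: lift both terminal edges
            self.offset -= self.rs[i]
            self.rt[i] -= self.rs[i]
            self.rs[i] = 0
        if self.rt[i] < 0:
            self.offset -= self.rt[i]
            self.rs[i] -= self.rt[i]
            self.rt[i] = 0

    def energy(self):
        return self.flow - self.offset

    def source_side(self):
        seen = {i for i in range(self.n) if self.rs[i] > 0}
        stack = list(seen)
        while stack:
            u = stack.pop()
            for v, r in self.res[u].items():
                if r > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen


links = [(0, 1, 2), (1, 2, 2), (2, 3, 2)]  # toy 4-node chain (assumption)
dyn = DynamicCut(4, [5, 4, 1, 0], [0, 1, 4, 5], links)
dyn.maxflow()                            # first problem, solved from scratch
dyn.update_terminals(2, ds=6, dt=-4)     # unary terms of node 2 change
dynamic_steps = dyn.maxflow()            # continue from the residual graph
fresh = DynamicCut(4, [5, 4, 7, 0], [0, 1, 0, 5], links)
fresh_steps = fresh.maxflow()            # second problem, solved from scratch
print(dyn.source_side(), dyn.energy(), dynamic_steps, fresh_steps)
```

On this toy chain the warm-started solve needs one augmentation where the fresh solve needs two, and both reach the same cut and the same energy.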

  46. Dynamic Graph Cut vs Active Cuts • Our method: flow recycling • Active Cuts (AC): cut recycling • Both methods: tree recycling

  47. Experimental Analysis • Running time of the dynamic algorithm • MRF consisting of 2×10^5 latent variables connected in a 4-neighborhood.

  48. Segmentation Comparison • Grimson-Stauffer • Bhatia04 • Our method

  49. Face Detector and ObjCut

  50. Segmentation
