
Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations

ZUO ZHEN, 27 SEP 2011



  1. Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations ZUO ZHEN 27 SEP 2011

  2. Outline • Introduction • Related work • Methods • Experiments • Conclusion and future work

  3. Introduction The proposed poselet classifiers are directly trained to handle the visual variation associated with a common underlying semantics.

  4. Introduction • What is a poselet? A poselet describes a particular part of the human pose under a given viewpoint. It is defined by a set of examples that are close in 3D configuration space. • Two criteria for a “good” poselet • It is easy to find the poselet given the input image (its examples are tightly clustered in appearance space). • It is easy to localize the 3D configuration of the person conditioned on the detection of the poselet (its examples are tightly clustered in configuration space). • Contributions • A new notion of part, the “poselet”, and an algorithm for selecting good poselets. • A novel dataset, H3D (Humans in 3D), annotated with 3D configuration information.

  5. Related work • Work in the pictorial structure tradition. Disadvantage: the parts are the most natural ones for constructing kinematic simulations of a moving person, but they may not correspond to the most salient features for visual recognition. • Work in the appearance-based window classification tradition. Disadvantage: not suitable for pose extraction or for localizing anatomical body parts or joints. • Hybrid approaches, which have stages of one type followed by a stage of another type. Disadvantage: the parts themselves are not jointly optimized with respect to combined appearance and configuration space criteria.

  6. Method This paper uses keypoints to annotate the joints, eyes, nose, etc. of people in order to find correspondences at training time. (Figure labels: Left Shoulder, Left Hip.)

  7. Method (H3D dataset) • H3D dataset: • 2000 human annotations • Images from Flickr with a Creative Commons Attribution License. • Provides annotations of 15 types of regions of a person, and 19 types of keypoints.

  8. Method (H3D dataset) • Why 3D not 2D?

  9. Method (H3D dataset) Left: H3D can generate conditional region probability masks. Right: H3D can generate scatter plots of the 2D screen locations of the right elbow and left ankle given the locations of both shoulders.

  10. Method (Finding Candidates) Define the (asymmetric) distance in configuration space from example s to example r as: $d_s(r) = \sum_i w_s(i)\,\lVert x_s(i) - x_r(i) \rVert_2^2\,(1 + h_{s,r}(i))$, where $x_s(i) = [x, y, z]$ are the normalized 3D coordinates of the i-th keypoint of example s. The weight term $w_s(i)$ is a Gaussian with mean at the center of the patch. The term $h_{s,r}(i)$ is a penalty based on the visibility mismatch of keypoint i in the two examples.
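
The configuration-space distance on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the distance is a weighted sum of squared keypoint displacements, each scaled by one plus the visibility penalty. The function name `config_distance`, the Gaussian width `sigma`, the penalty value `a`, and the convention that coordinates are expressed relative to the patch center are all assumptions made for the example, not values from the slides.

```python
import math

def config_distance(xs, xr, vis_s, vis_r, sigma=1.0, a=1.0):
    """Asymmetric distance from seed example s to example r.

    xs, xr       : lists of (x, y, z) keypoint coordinates, assumed to be
                   normalized so the seed patch center is the origin.
    vis_s, vis_r : lists of booleans, True when the keypoint is visible.
    sigma, a     : hypothetical Gaussian width and visibility penalty.
    """
    total = 0.0
    for ps, pr, v_s, v_r in zip(xs, xr, vis_s, vis_r):
        # Unnormalized Gaussian weight: keypoints of s that lie far from
        # the patch center contribute less to the distance.
        w = math.exp(-sum(c * c for c in ps) / (2.0 * sigma ** 2))
        # Squared Euclidean displacement of the keypoint between s and r.
        sq_dist = sum((cs - cr) ** 2 for cs, cr in zip(ps, pr))
        # Penalty applies only when visibility differs between s and r.
        h = 0.0 if v_s == v_r else a
        total += w * sq_dist * (1.0 + h)
    return total
```

The weight depends only on the keypoints of the seed example s, which is why swapping s and r generally changes the result.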

  11. Method (Generate Poselet Candidates) Example query regions (left column) and the corresponding closest matches in configuration space generated by H3D.

  12. Method (Training Poselet classifiers) • Given a seed patch • Find the closest patches (by running a scanning window over all positions and scales of all annotations) • Sort the matches by residual error • Threshold them • Use the retained patches as positive training examples to train a linear SVM with HOG features • Select a small set of poselets that are individually effective and complementary
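
The sort-and-threshold steps on this slide can be sketched as follows. The function name `select_training_patches`, the `(patch_id, residual_error)` input format, and the `max_residual` cutoff are hypothetical choices for illustration. The scanning-window search, HOG extraction, and SVM training are not shown.

```python
def select_training_patches(matches, max_residual):
    """Return ids of matched patches to use as positive training examples.

    matches      : list of (patch_id, residual_error) pairs, one per
                   candidate patch found for a given seed patch.
    max_residual : hypothetical cutoff; patches with a larger residual
                   error are discarded.
    """
    # Sort by residual error, best (smallest) first.
    ranked = sorted(matches, key=lambda m: m[1])
    # Keep only the patches that are close enough to the seed.
    return [patch_id for patch_id, err in ranked if err <= max_residual]
```

For example, `select_training_patches([("a", 0.9), ("b", 0.1), ("c", 0.4)], 0.5)` keeps `"b"` and `"c"`, in that order.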

  13. Method (For Detection & Localization) The probability of detecting the object O at position x is: $P(O \mid x) \propto \sum_i w_i\, a_i(x)$, where $a_i(x)$ is the score that poselet classifier i assigns to location x and $w_i$ is the weight of the poselet. The authors use the Max Margin Hough Transform to learn the weights.
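
A minimal sketch of the weighted voting described on this slide, assuming the detection score at a location is a weighted sum of the poselet classifier scores there. The weights are taken as given; learning them with the Max Margin Hough Transform is not shown. The names `detection_score` and `best_location` and the dictionary layout of `score_maps` are assumptions for the example.

```python
def detection_score(poselet_scores, weights):
    """Unnormalized detection score at one location.

    poselet_scores : scores assigned to this location, one per poselet.
    weights        : poselet weights, in the same order.
    """
    return sum(w * a for w, a in zip(weights, poselet_scores))

def best_location(score_maps, weights):
    """Pick the location with the highest weighted vote.

    score_maps : dict mapping a location (any hashable, e.g. an (x, y)
                 tuple) to the list of poselet scores at that location.
    """
    return max(score_maps, key=lambda x: detection_score(score_maps[x], weights))
```

Because the score is only proportional to the probability, it is suitable for ranking candidate locations but not for reading off calibrated probabilities.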

  14. Experiments (1) Detecting Human Torsos ROC curve comparing the proposed torso detection performance together with other published detectors on the H3D test set

  15. Experiments • Examples of torso detections using poselets

  16. Experiments (2) Detecting People on PASCAL VOC 2007 The poselet detector outperforms the part-based deformable detector on H3D but achieves only comparable performance on VOC 2007.

  17. Experiments (3) Detecting Keypoints Detection rate of some keypoints conditioned on true positive torso detection.

  18. Conclusion & Future Work • Conclusion The authors propose a two-layer classification/regression model for detecting people and localizing body components. The 3D annotation guides the search for good parts. • Future work Use H3D more widely.

  19. Birdlets: Subordinate Categorization Using Volumetric Primitives and Pose-Normalized Appearance

  20. Outline • Introduction • Related work • Methods • Experiments • Conclusion

  21. Introduction • Application background Current research focuses on two extremes: individuals and basic-level categories. There is little research on subordinate categorization. • What is subordinate categorization? Distinguishing categories by the differing properties of their parts.

  22. Introduction Overview of the Proposed approach

  23. Introduction • Contribution • A framework for detecting volumetric part models • A pose-normalized appearance model for comparing part appearance • A classification model for aggregating information about part properties

  24. Related work • Image features Disadvantages: view-dependent, pose variation • Part model Disadvantages: high intra-class variability, significant articulation • Hierarchy model Disadvantages: subordinate categories have both subtle and drastic appearance variation • Attribute model Disadvantages: Insufficient to model subtle differences between parts

  25. Method • Why birds? • The largest existing subordinate-level dataset (CUB-200) covers birds • Birds conform to the definition of subordinate-level categories (they share common structure and parts, with many subtle part distinctions) • Birds involve highly variable appearances and articulations (challenging)

  26. Method (PNAD) Pose-normalized appearance descriptor (PNAD) • Map points on a unit sphere onto the ellipsoid’s surface for patch sampling • Project the patches on the ellipsoid surface onto the original image plane • Extend the projected patches to extract SIFT descriptors • Concatenate the location and appearance information to form the PNAD descriptor
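
The geometric part of the steps above (sphere to ellipsoid to image plane) can be sketched as below. This is a simplified illustration under stated assumptions: a regular latitude/longitude sampling grid, an ellipsoid given by three radii, a 3x3 rotation matrix and a center, and an orthographic camera that simply drops depth. None of these choices, nor the function names, come from the slides. Patch extension and SIFT extraction are not shown.

```python
import math

def sphere_points(n_lat, n_lon):
    """Regular latitude/longitude grid of points on the unit sphere."""
    pts = []
    for i in range(1, n_lat + 1):
        theta = math.pi * i / (n_lat + 1)  # polar angle, poles excluded
        for j in range(n_lon):
            phi = 2.0 * math.pi * j / n_lon  # azimuth
            pts.append((math.sin(theta) * math.cos(phi),
                        math.sin(theta) * math.sin(phi),
                        math.cos(theta)))
    return pts

def to_ellipsoid(p, radii, rotation, center):
    """Map a unit-sphere point onto the ellipsoid surface.

    radii    : (rx, ry, rz) semi-axis lengths.
    rotation : 3x3 rotation matrix as a list of rows.
    center   : (cx, cy, cz) ellipsoid center.
    """
    scaled = [p[k] * radii[k] for k in range(3)]
    rotated = [sum(rotation[r][k] * scaled[k] for k in range(3))
               for r in range(3)]
    return tuple(rotated[k] + center[k] for k in range(3))

def project_to_image(p):
    """Orthographic projection (an assumption): keep x and y, drop depth."""
    return (p[0], p[1])
```

Because every sample starts from a fixed position on the unit sphere, the same sphere index refers to the same surface location on every detected part, regardless of the part's pose in the image.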

  27. Method (PNAD)

  28. Method (Birdlet) • Volumetric primitive templates • Two parts (head & body) • Two ellipsoids (parameters: location center, 3D orientation, scale) • Alignment (assisted by visible point features: beak tips, eyes, wingtips, feet and tails)

  29. Method (Training & Testing) • Get selection windows for detecting objects and parts in the test image (both positive and negative examples for the SVM classifier) • Get birdlets for integrated classification

  30. Method (Integrated Classification) • Stacked Evidence Trees model The Stacked Evidence Tree takes a test feature, finds a set of training features that are similar in both appearance and surface location, and returns the class label distribution across this similar set.
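
The input/output behavior described on this slide can be illustrated with a brute-force nearest-neighbor stand-in. This is explicitly not the Stacked Evidence Trees model, which is tree-based. It only mimics the idea of collecting training features that are similar in appearance and surface location and returning their class label distribution. The function name, the squared-distance similarity, and the parameters `k` and `loc_weight` are assumptions for the example.

```python
from collections import Counter

def label_distribution(test_feat, train_feats, k=3, loc_weight=1.0):
    """Class label distribution over the k most similar training features.

    test_feat   : (appearance_vector, surface_location) tuple.
    train_feats : list of (appearance_vector, surface_location, label).
    k           : hypothetical size of the similar set.
    loc_weight  : hypothetical trade-off between appearance and location.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    app_t, loc_t = test_feat
    # Rank training features by combined appearance + location distance.
    ranked = sorted(
        train_feats,
        key=lambda f: sq_dist(app_t, f[0]) + loc_weight * sq_dist(loc_t, f[1]),
    )
    counts = Counter(label for _, _, label in ranked[:k])
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Aggregating these per-feature distributions over all features of a detected part would then give a class estimate for the whole object, under the same assumptions.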

  31. Experiments • Classification Confusion Matrices (a) the PHOW/SVM Baseline (37.12% MAP), (b) the PNAD-RF performance on the top 20% of detections (40.25% MAP), and (c) the PNAD-RF performance on the ground truth part locations (66.58% MAP).

  32. Experiments • Example Volumetric Primitive Detections Top two images: the bird is detected and localized with reasonable accuracy. Bottom two images: false positive detections.

  33. Experiments Classification of Volumetric Detections. For the k top-ranked detections, the plot shows the corresponding PNAD-RF classification performance (measured by mean average precision).

  34. Conclusion • Conclusion This paper presented an approach for subordinate categorization using a pose-normalized appearance representation founded upon a volumetric part model.
