Date: 2013/05/27 Instructor: Prof. Wang, Sheng-Jyh Student: Hung, Fei-Fan

Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012).


Presentation Transcript


  1. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses • Date: 2013/05/27 • Instructor: Prof. Wang, Sheng-Jyh • Student: Hung, Fei-Fan • Paper: Yao, B., and Fei-Fei, L. IEEE Transactions on PAMI (2012)

  2. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  3. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  4. Why use context in computer vision? • Simple images vs. human activities • Without context: ~3-4% • With mutual context: (figure: results with context vs. without context)

  5. Challenges in Human Pose Estimation • Human pose estimation is challenging: difficult part appearance; self-occlusion; image regions that look like a body part • Object detection facilitates human pose estimation

  6. Challenges in Object Detection • Object detection is challenging: objects are small, low-resolution, or partially occluded; image regions can look similar to the detection target • Human pose estimation facilitates object detection

  7. The Goal • To build a mutual context model for human-object interaction (HOI) activities

  8. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  9. Model representation • Modeling the mutual context of objects and human poses • A: activity (e.g., tennis forehand, croquet shot, volleyball smash) • O: objects O1, ..., OM (e.g., tennis racket, croquet mallet, volleyball, tennis ball); M: number of bounding boxes • H: atomic pose; more than one atomic pose H can occur in an activity A • P: body parts P1, ..., PL

  10. Model representation • 𝝓1: co-occurrence compatibility between A, O, H • 𝝓2: spatial relationship between O, H • 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers • (Figure: graphical model linking the activity A, the human pose H with body parts P1, P2, ..., PL, and the objects O1, O2)
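Read together with slides 11 to 15, the bullets above suggest that the model score is a sum of five potentials. The expression below is a sketch under that assumption: the symbol $\Psi$ and the argument list of each term are inferred from the slide titles rather than copied from the slide.

```latex
\Psi(A, O, H, I) = \phi_1(A, O, H) + \phi_2(O, H) + \phi_3(O, I) + \phi_4(H, I) + \phi_5(A, I)
```

Here $I$ is the image, and inference looks for the labels $A$, $O$, $H$ that make this score as large as possible.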

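  11. 𝝓1: Co-occurrence context • Models the co-occurrence between all A, O, H • One weight per (atomic pose, object, activity) triple: the strength of the co-occurrence interaction between them • Indicator functions select the weight that matches the current labels • The sums run over the total number of atomic poses, the total number of objects, and the total number of activity classes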
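As a rough illustration of how the indicator functions pick out one weight per labelled object, the co-occurrence term can be sketched in Python. The function names, the dictionary layout of the weights, and the toy labels are hypothetical; only the structure (a sum over atomic poses, objects, and activity classes, gated by indicators) comes from the slide.

```python
def indicator(condition):
    """Return 1.0 when the condition holds, else 0.0."""
    return 1.0 if condition else 0.0


def cooccurrence_potential(activity, pose, objects, weights,
                           all_poses, all_objects, all_activities):
    """Sum co-occurrence weights over every (pose, object, activity) triple.

    `weights` is a hypothetical dict mapping (pose, object, activity) to the
    strength of their co-occurrence; missing triples count as 0.
    """
    total = 0.0
    for h in all_poses:
        for labelled_object in objects:  # one term per labelled bounding box
            for o in all_objects:
                for a in all_activities:
                    gate = (indicator(pose == h)
                            * indicator(labelled_object == o)
                            * indicator(activity == a))
                    total += gate * weights.get((h, o, a), 0.0)
    return total


# Toy labels and weights, made up for illustration only.
poses = ["pose_0", "pose_1"]
object_classes = ["tennis_racket", "tennis_ball", "volleyball"]
activities = ["tennis_forehand", "volleyball_smash"]
weights = {
    ("pose_0", "tennis_racket", "tennis_forehand"): 1.5,
    ("pose_0", "tennis_ball", "tennis_forehand"): 0.5,
    ("pose_0", "volleyball", "tennis_forehand"): -2.0,
}

score = cooccurrence_potential(
    "tennis_forehand", "pose_0", ["tennis_racket", "tennis_ball"],
    weights, poses, object_classes, activities)
```

Because every indicator product is zero except for the triples that match the current labels, the nested sum reduces to a table lookup per object; the explicit loops are kept only to mirror the slide.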

  12. 𝝓2: Spatial context • Models the spatial relationship between all objects O and different atomic poses H • Feature: a sparse binary vector that shows the relative location of an object w.r.t. a body part • Each feature vector has a corresponding weight
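One way to build such a sparse binary vector is to quantize the displacement from a body part to an object into direction and distance bins and set a single entry to 1. The sketch below does that; the number of angle bins, the distance thresholds, and the function name are assumptions chosen for illustration, not values from the slides.

```python
import math


def relative_location_feature(part_xy, object_xy,
                              n_angle_bins=8, distance_edges=(50.0, 150.0)):
    """One-hot vector for the location of an object relative to a body part.

    The binning scheme (8 directions x 3 distance ranges, in pixels) is an
    illustrative assumption.
    """
    dx = object_xy[0] - part_xy[0]
    dy = object_xy[1] - part_xy[1]

    # Direction bin: angle in [0, 2*pi) split into equal sectors.
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    angle_bin = min(int(angle / (2.0 * math.pi / n_angle_bins)),
                    n_angle_bins - 1)

    # Distance bin: count how many thresholds the distance reaches.
    distance = math.hypot(dx, dy)
    distance_bin = sum(distance >= edge for edge in distance_edges)

    feature = [0] * (n_angle_bins * (len(distance_edges) + 1))
    feature[distance_bin * n_angle_bins + angle_bin] = 1
    return feature


# Object centre 60 pixels to the right of the body part centre.
feature = relative_location_feature((100.0, 100.0), (160.0, 100.0))
```

The spatial score for one (atomic pose, object, body part) combination would then be the dot product of this vector with a learned weight vector of the same length.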

  13. 𝝓3: Modeling objects • Models the objects O in the image I using object detection scores • For every object O: a vector of detection scores, and a weight for that vector • Between objects Om and Om': a binary feature vector, and a weight for the pair of object labels

  14. 𝝓4: Modeling human pose • Models the atomic pose that H belongs to and the likelihood of the body parts • A Gaussian likelihood function for the location of each body part given the atomic pose • A vector of scores for detecting each body part in the image I
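To make the Gaussian likelihood concrete, here is a minimal sketch that scores an observed body part location against the mean location of that part in one atomic pose. It assumes an isotropic 2D Gaussian with a single standard deviation, which is a simplification for illustration; the slides do not state the covariance structure, and all names are hypothetical.

```python
import math


def gaussian_likelihood(location, mean, sigma):
    """Density of an isotropic 2D Gaussian at `location`.

    `mean` stands for the part location expected under an atomic pose;
    the isotropic covariance (sigma^2 * identity) is an assumption.
    """
    dx = location[0] - mean[0]
    dy = location[1] - mean[1]
    normalizer = 1.0 / (2.0 * math.pi * sigma ** 2)
    return normalizer * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


# A part observed at its expected location vs. 30 pixels away.
expected = (120.0, 80.0)
at_mean = gaussian_likelihood((120.0, 80.0), expected, sigma=10.0)
far_away = gaussian_likelihood((150.0, 80.0), expected, sigma=10.0)
```

A part found close to where the atomic pose expects it gets a higher likelihood, so poses whose parts are all well placed score higher in this term.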

  15. 𝝓5: Modeling activity • Models the HOI activity by training an activity classifier • Feature: the output of a one-versus-all (OVA) discriminative classifier that takes the image I as input, with one dimension per activity class • A feature weight vector for each activity class

  16. Model Properties • Models the spatial context between O and H • Object detection and human pose estimation facilitate each other • Ignores objects and body parts that are unreliable • Easy to extend to larger-scale datasets and other activities • The joint model can share all objects and atomic poses

  17. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  18. Model Learning • Assign each human pose to an atomic pose • Train detectors and classifiers • Estimate parameters by maximum likelihood

  19. Obtaining Atomic Poses (learning step: assign each human pose to an atomic pose) • Use clustering to obtain atomic poses • Normalize the annotations • Fill in missing parts using the nearest visible neighbor • Obtain a set of atomic poses by hierarchical clustering with the maximum linkage measure
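The clustering step can be sketched as agglomerative clustering in which the distance between two clusters is the largest distance between any pair of their members (maximum, or complete, linkage). The sketch below is a plain-Python illustration under two assumptions that the slide does not state: poses are flattened coordinate tuples that have already been normalized, and poses are compared with Euclidean distance.

```python
import math


def complete_linkage_clusters(points, n_clusters):
    """Agglomerative clustering with maximum (complete) linkage.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Maximum linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return clusters


# Toy "poses": 2D points standing in for normalized joint coordinates.
toy_poses = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
atomic_pose_clusters = complete_linkage_clusters(toy_poses, n_clusters=2)
```

Each resulting cluster plays the role of one atomic pose, and every training pose is assigned to the cluster it falls in. With real annotations a library routine would be the practical choice; the loop above is only meant to show what the linkage measure does.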

  20. Training Detectors and Classifiers (learning step: train detectors and classifiers) • Object detector in 𝝓3: deformable part model • Human body part detector in 𝝓4: deformable part model • Overall activity classifier in 𝝓5: spatial pyramid matching (SPM), SIFT + 3-level image pyramid

  21. Estimating Model Parameters (learning step: estimate parameters by maximum likelihood) • Estimate the model parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior
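Maximum likelihood with a zero-mean Gaussian prior on the parameters amounts to adding a squared-norm penalty to the log-likelihood. A sketch of such an objective is given below; the symbols $\theta$ (all model weights), $\sigma$ (prior standard deviation), and $n$ (training image index), as well as the conditional form of the likelihood, are assumptions introduced here for illustration.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{n} \log p(A_n, O_n, H_n \mid I_n; \theta) - \frac{\lVert \theta \rVert^{2}}{2\sigma^{2}},
\qquad
p(A, O, H \mid I; \theta) \propto \exp\Big( \sum_{k=1}^{5} \phi_k \Big)
```

The second term is the log of the Gaussian prior up to a constant, so a smaller $\sigma$ pulls the weights more strongly toward zero.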

  22. Learning result

  23. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  24. Model Inference • Input: a new image • Initialize with learned results • Update human body parts • Update object detection results • Update A and H labels

  25. Initialization (inference step: initialize with learned results) • Input: a new image • A: activity classification with SPM • O: object detection • H: human pose estimation with a pictorial structure model

  26. Update model inference (inference step: update human body parts) • Compute the marginal distribution of the human pose • Use a mixture of Gaussians to refine the prior of each body part

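  27. Update model inference (inference step: update object detection results) • Greedy forward search method • Initialize with no object in any bounding box • Select the bounding box and object label that most increase the score of the terms involving O: (O, H), (O, A, H), and (O, I) • Label the selected box as that object • Update the score • Stop when the best increase is < 0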
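The greedy forward search can be sketched as a loop that repeatedly commits the single best (box, label) assignment and stops once no assignment improves the score. In the sketch below the `gain` callback, the toy scores, and the penalty for reusing a label are all hypothetical stand-ins for the change in the model terms that involve O.

```python
def greedy_forward_search(n_boxes, labels, gain):
    """Greedily label bounding boxes until no assignment has non-negative gain.

    `gain(assignment, box, label)` is a hypothetical callback returning the
    change in score if `box` were labelled `label` given the current
    assignment. Boxes left out of the result contain no object.
    """
    assignment = {}  # start with no object in any bounding box
    while len(assignment) < n_boxes:
        candidates = [
            (gain(assignment, box, label), box, label)
            for box in range(n_boxes) if box not in assignment
            for label in labels
        ]
        best_gain, box, label = max(candidates)
        if best_gain < 0:
            break  # stop when even the best assignment lowers the score
        assignment[box] = label
    return assignment


# Toy detection scores per (box, label), made up for illustration.
unary = {
    (0, "racket"): 2.0, (0, "ball"): -1.0,
    (1, "racket"): 0.5, (1, "ball"): 1.0,
    (2, "racket"): -0.5, (2, "ball"): -0.2,
}


def toy_gain(assignment, box, label):
    # Hypothetical context term: discourage using the same label twice.
    penalty = 1.0 if label in assignment.values() else 0.0
    return unary[(box, label)] - penalty


result = greedy_forward_search(3, ["racket", "ball"], toy_gain)
```

In this toy run box 0 is labelled first, then box 1, and box 2 stays unlabelled because every remaining option has a negative gain.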

  28. Update model inference (inference step: update A and H labels) • Enumerate all possible A and H labels • Choose the pair that optimizes the model score
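Because the numbers of activity classes and atomic poses are both small, this step can be an exhaustive search over all (A, H) pairs. The sketch below assumes a hypothetical `score(activity, pose)` function that evaluates the model with the current objects and body parts held fixed; the labels and scores are toy values.

```python
from itertools import product


def best_activity_and_pose(activities, poses, score):
    """Return the (activity, pose) pair with the highest score."""
    return max(product(activities, poses), key=lambda pair: score(*pair))


# Toy score table, made up for illustration.
toy_scores = {
    ("tennis_forehand", "pose_0"): 1.2,
    ("tennis_forehand", "pose_1"): 2.7,
    ("croquet_shot", "pose_0"): 0.4,
    ("croquet_shot", "pose_1"): -0.3,
}

best_pair = best_activity_and_pose(
    ["tennis_forehand", "croquet_shot"],
    ["pose_0", "pose_1"],
    lambda activity, pose: toy_scores[(activity, pose)],
)
```

The cost is the number of activities times the number of atomic poses per image, which stays cheap as long as both label sets are modest in size.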

  29. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  30. Experimental Results (Sports Dataset)

  31. Experimental Results (Sports Dataset)

  32. Experimental Results (Sports Dataset) • Activity classification

  33. Experimental Results (PPMI Dataset)

  34. Experimental Results (PPMI Dataset)

  35. Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion

  36. Conclusion • Mutual context can significantly improve performance in difficult visual recognition problems • The joint model can share all the information • Limitation: all human body parts and objects in the training images must be annotated

  37. References • B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012. • B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010. • B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010. • S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006. • http://en.wikipedia.org/wiki/Hierarchical_clustering
