390 likes | 541 Views
Recognizing Human-Object Interaction in still Image by Modeling the Mutual Context of Objects and Human Poses. Date: 2013/05/27 Instructor : Prof. Wang , Sheng- Jyh Student: Hung, Fei -Fan. Yao, B., and Fei-fei , L. IEEE Transactions on PAMI (2012 ). Outline. Introduction
E N D
Recognizing Human-Object Interaction in still Image by Modeling the Mutual Context of Objects and Human Poses Date: 2013/05/27 Instructor: Prof. Wang, Sheng-Jyh Student: Hung, Fei-Fan Yao, B., and Fei-fei, L. IEEE Transactions on PAMI(2012)
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Why using context in computer vision? • simple image vs. human activities Without context: ~3-4% With mutual context: with context without context
Challenges in Human Pose Estimation • Human pose estimation is challenging • Object detection facilitate human pose estimation Difficult part appearance Self-occlusion Image region looks like a body part
Challenges in Object Detection • Object detection is challenging • human pose estimation facilitate object detection Small, low-resolution, partially occluded Image region similar to detection target
The Goal • To build a mutual context model in Human-Object Interaction(HOI) activities
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Model representation A: • Modeling the mutual context of object and human poses Tennis forehand Croquet shot Volleyball smash O: Tennis racket Croquet mallet Volleyball Tennis ball Body parts , M:num of bounding box H: More than one atomic pose H in A P: body parts,
Model representation activity • : co-occurrence compatibility between A,O,H • : spatial relationship between O,H • : modeling the image evidence with detectors or classifiers Human pose objects A H O1 P2 P1 PL O2
𝝓1: Co-occurrence context • co-occurrence between all A,O,H • : strength of co-occurrence interaction between A H O1 P2 P1 PL O2 : indicator function : total number of atomic poses :total number of objects :total number of activity classes
𝝓2: Spatial context : • Spatial relationship between all O and different H • : weight of • :a sparse binary vector • shows relative location • of w.r.t. A H O1 P2 P1 PL O2
𝝓3: Modeling objects • Model O in the image I using object detection score • For all object O • : vector of score of detecting • : weight of • Between Om and Om’ • : binary feature vector • : weight of and A H O1 P2 P1 PL O2
𝝓4: Modeling human pose • Model atomic pose that H belongs to and likelihood • : Gaussian likelihood function • : vector of score of detecting body part in A H O1 P2 P1 PL O2
𝝓5: Modeling activity • Model HOI activity by training activity classifier • : -dim output of one-versus-all (OVA) discriminative classifier taking image as features • : feature weight of A H O1 P2 P1 PL O2
Model Properties • Spatial context between O and H • Object detectionand human pose estimation facilitate each other • Ignore the objects and body parts that are unreliable • Flexible to extend to large scale datasets and other activities • Jointly model can share all objects and atomic poses
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Model Learning Assign human pose to atomic pose Training detectors and classifiers Estimate parameters by Maximum Likelihood
Obtaining Atomic Poses • Using clustering to obtain atomic poses • Normalize the annotations • Finding missing part • Using the nearest visible neighbor • Obtain a set of atomic poses • Hierarchical clustering with maximum linkage measure : Assign human pose to atomic pose Training detectors and classifiers Estimate parameters by Maximum Likelihood
Training Detectors and Classifiers • : Object detector in • : Human body part detector in • : Overall activity classifier in Assign human pose to atomic pose deformable part model Training detectors and classifiers • Spatial pyramid matching (SPM) • SIFT + 3 level image pyramid Estimate parameters by Maximum Likelihood
Estimating Model Parameters • Estimate by using ML approach with zero-mean Gaussian prior Assign human pose to atomic pose Training detectors and classifiers Estimate parameters by Maximum Likelihood
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Model Inference New image Update human body parts Update object detection results Initialize with learned results Update A and H labels
Initialization New image A: SPM classification O: object detection H: pictorial structure model Initialize with learned results Initialize Activity classification Object detection Human pose estimation
Update model inference • Marginal distribution of human pose: • Using mixture of Gaussian to refine the prior of body part Update human body parts Update object detection results Update A and H labels
Update model inference • Greedy forward search method : • Initial and no object in bounding box • Select • Label box as • update • Stop when <0 Update human body parts O,H O,A,H O,I Update object detection results Update A and H labels
Update model inference • Enumerate possible A and H label • Optimize Update human body parts Update object detection results Update A and H labels
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Experimental Results (Sports Dataset) • Activity classification
Outline • Introduction • Intuition and goal • Model Representation • Model Learning • Obtaining Atomic Poses • Training Detectors and Classifiers • Estimating Model Parameters • Model Inference • Experimental Results • Conclusion
Conclusion • Mutual context can significantly improve the performance in difficult visual recognition problems • The joint model can share all the information • Annotate all the human body parts and objects in training images
Reference • Yao, B., and Fei-fei, L. “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Transactions on Pattern Analysis and Machine Intelligence (2012) • B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010 • B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010. • S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006. • http://en.wikipedia.org/wiki/Hierarchical_clustering