Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities • Bangpeng Yao, Li Fei-Fei • Computer Science Department, Stanford University, USA
Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion
Introduction • Human pose estimation & object detection (figure labels: tennis racket, left arm, right arm, torso, right leg, left leg)
Introduction • Challenging:
Introduction • Mutual context: human pose estimation and object detection facilitate the recognition of each other
Introduction • Mutual context vs. no mutual context
HOI activity • A: activity class, e.g. tennis serve, volleyball smash • O: object, e.g. tennis racket, volleyball • H: human pose • P: body parts • f: visual feature • Each A has more than one type of H
The model • Each edge of the model has a potential function and a weight • Co-occurrence potentials: frequencies of co-occurrence between A, O, and H • Spatial potentials: spatial relationships among the object and body parts, computed from relative position, orientation, and scale
The model • Image-evidence potentials: model the dependence of the object and of each body part on their corresponding image evidence
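As a rough illustration of the edge, potential-function, and weight description above, the sketch below assumes the overall model score is a weighted sum of potentials over the edges that are present. The function name `model_score`, the edge labels, and all numeric values are hypothetical and are not taken from the slides.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def model_score(potentials: Dict[Edge, float],
                weights: Dict[Edge, float],
                edges: Set[Edge]) -> float:
    """Sum weight * potential over the edges that are part of the model.

    Assumption: the score is additive over edges; edges that are not in
    `edges` simply contribute nothing.
    """
    return sum(weights[e] * potentials[e] for e in edges)


# Hypothetical potential values for one image hypothesis.
potentials = {
    ("A", "O"): 0.8,    # co-occurrence of activity and object
    ("A", "H"): 0.6,    # co-occurrence of activity and pose
    ("O", "P1"): 0.5,   # spatial relation between object and a body part
    ("O", "f_O"): 0.9,  # object vs. its image evidence
}
weights = {
    ("A", "O"): 1.0,
    ("A", "H"): 2.0,
    ("O", "P1"): 0.5,
    ("O", "f_O"): 1.5,
}
# The ("O", "P1") edge is left out, as if structure learning removed it.
edges = {("A", "O"), ("A", "H"), ("O", "f_O")}

score = model_score(potentials, weights, edges)
```

Leaving an edge out of `edges` has the same effect as setting its potential to 0.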
Properties of the model • Co-occurrence context for the activity class, object, and human pose • Multiple types of human pose for each activity • Spatial context between object and body parts
Model learning • The learning step needs to achieve two goals: structure learning and parameter estimation • Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts • Parameter estimation: estimate the potential weights that maximize the discrimination between different activities
Structure learning • Objective: learn the connectivity pattern between the object, the human pose, and the body parts • Method: hill-climbing approach with a tabu list
Hill-climbing structure learning • The hill-climbing approach adds or removes edges one at a time until a maximum is reached
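The add-or-remove-one-edge search can be sketched as below. This is a generic hill-climbing loop with a tabu list, not the authors' implementation: the scoring function, the `patience` stopping rule, and the toy target structure are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def hill_climb_structure(nodes, score_fn, tabu_size=3, patience=5, max_iters=200):
    """Toggle one edge per step, never re-toggling an edge on the tabu list.

    Stops after `patience` consecutive steps without a new best score
    (an assumed stopping rule) and returns the best structure seen.
    """
    all_edges = [frozenset(pair) for pair in combinations(nodes, 2)]
    current = frozenset()                  # start with no edges
    best, best_score = current, score_fn(current)
    tabu = deque(maxlen=tabu_size)         # most recently toggled edges
    stale = 0
    for _ in range(max_iters):
        # Score every neighbour reachable by adding or removing one edge.
        moves = [(score_fn(current ^ {e}), e) for e in all_edges if e not in tabu]
        if not moves:
            break
        new_score, edge = max(moves, key=lambda m: m[0])
        current = current ^ {edge}
        tabu.append(edge)
        if new_score > best_score:
            best, best_score, stale = current, new_score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, best_score


# Toy score: negative number of edges that differ from a hypothetical target.
target = {frozenset(("O", "H")), frozenset(("H", "P1")), frozenset(("O", "P2"))}
best, best_score = hill_climb_structure(
    ["O", "H", "P1", "P2"], lambda edges: -len(edges ^ target)
)
```

The tabu list stops the search from immediately undoing a recent change, which lets it move past small plateaus instead of oscillating between two structures.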
Max-margin parameter estimation • Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities • Each training sample consists of: the potential function values (set to 0 for disconnected edges), the human pose H (the sub-class), and the class label A • A weight vector is learned for each sub-class (the r-th weight vector belongs to the r-th sub-class)
Multiclass SVM • The objective uses the L2 norm of the weight vectors • A normalization constant appears in the objective
Analysis of our learning algorithm • Using only one human pose for each HOI class is not enough to characterize all the images in that class well
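The exact objective on this slide did not survive, so the sketch below falls back on a standard multiclass hinge-loss form with L2 regularization, adapted so that each sub-class (human pose) has its own weight vector and the margin is enforced against sub-classes of other activity classes. The names `objective`, `predict_activity`, `beta`, and all numbers are assumptions for illustration only.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_activity(x, W, subclass_to_class):
    """Pick the best-scoring sub-class and report its activity class."""
    scores = [dot(w, x) for w in W]
    r = max(range(len(W)), key=scores.__getitem__)
    return subclass_to_class[r], r


def objective(samples, W, subclass_to_class, beta=1.0):
    """0.5 * sum_r ||w_r||^2 + beta * sum of hinge losses (assumed form)."""
    reg = 0.5 * sum(dot(w, w) for w in W)
    loss = 0.0
    for x, r_true in samples:
        true_score = dot(W[r_true], x)
        # Only sub-classes of a different activity class count as rivals.
        rivals = [dot(W[r], x) for r in range(len(W))
                  if subclass_to_class[r] != subclass_to_class[r_true]]
        loss += max(0.0, 1.0 + max(rivals) - true_score)
    return reg + beta * loss


# Two poses for "tennis serve", one for "volleyball smash" (hypothetical).
subclass_to_class = ["tennis serve", "tennis serve", "volleyball smash"]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
samples = [([2.0, 0.5], 0), ([-0.5, 0.0], 2)]

value = objective(samples, W, subclass_to_class, beta=1.0)
label, sub = predict_activity([2.0, 0.5], W, subclass_to_class)
```

Here `beta` plays the role of the constant that balances the L2 term against the training loss.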
Model inference, object detection, and human pose estimation • Given a new test image, our objective is to: - estimate the pose of the human - detect the object that is interacting with the human
The sports dataset • Cricket - defensive shot (player and cricket bat) • Cricket - bowling (player and cricket ball) • Croquet - shot (player and croquet mallet) • Tennis - forehand (player and tennis racket) • Tennis - serve (player and tennis racket) • Volleyball - smash (player and volleyball) • 30 images for training, 20 for testing
Better object detection • Methods compared: sliding window detector, pedestrian as context, our method
Better pose estimation • Pose estimation is still difficult • Multiple poses are better than only one pose
Upper: our method • Lower left: object detection by a scanning window • Lower right: pose estimation by the state-of-the-art pictorial structure method
Combining object and pose for HOI activity classification • Note: Gupta et al. predominantly use the background scene context
Conclusion • Treat the object and the human pose as the context of each other in different HOI activity classes • Structure learning method: discovers important connectivity patterns between objects and human poses • Future improvements: - incorporate useful background scene context to facilitate the recognition of the foreground object and activity - deal with more than one object