Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Bangpeng Yao, Li Fei-Fei. Computer Science Department, Stanford University, USA.

Presentation Transcript


  1. Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities. Bangpeng Yao, Li Fei-Fei. Computer Science Department, Stanford University, USA

  2. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  3. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  4. Introduction • Human pose estimation & object detection (figure labels: tennis racket, left arm, right arm, torso, right leg, left leg)

  5. Introduction • Challenging:

  6. Introduction • Mutual context: human pose estimation and object detection facilitate the recognition of each other

  7. Introduction • Mutual context vs. no mutual context

  8. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  9. HOI activity

  10. HOI activity • A: activity class, e.g., tennis serve, volleyball smash • O: object, e.g., tennis racket, volleyball • H: human pose • P: body parts • f: visual features • Each A has more than one type of H

  11. The model • $e$: edge of the model; $\phi_e$: potential function; $w_e$: weight • $\phi_e(A, O)$, $\phi_e(A, H)$, $\phi_e(O, H)$: frequencies of co-occurrence between A, O, and H • $\phi_e(O, P_n)$, $\phi_e(H, P_n)$, $\phi_e(P_m, P_n)$: spatial relationships among the object and body parts, computed from (position, orientation, scale)

  12. The model • $\phi_e(O, f_O)$, $\phi_e(P_n, f_{P_n})$: model the dependence of the object and a body part on their corresponding image evidence
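
One way to read slides 11 and 12 is that the model score is a weighted sum of edge potentials. The following is a minimal sketch under that assumption; the function name `model_score`, the dictionary layout, and all numeric values are hypothetical and not taken from the presentation.

```python
def model_score(potentials, weights):
    """Sum w_e * phi_e over the edges listed in `potentials`.

    Edges missing from `weights` are treated as disconnected (weight 0).
    """
    return sum(weights.get(edge, 0.0) * value for edge, value in potentials.items())


# Hypothetical potential values for one (activity, object, pose) hypothesis.
potentials = {("A", "O"): 0.5, ("A", "H"): 0.25, ("O", "P1"): 2.0}
# Hypothetical learned weights; the edge ("O", "P1") is disconnected.
weights = {("A", "O"): 2.0, ("A", "H"): 4.0}

score = model_score(potentials, weights)
```

Treating a missing weight as zero mirrors the idea that a disconnected edge contributes nothing to the score.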

  13. Properties of the model • Co-occurrence context for the activity class, object, and human pose • Multiple types of human pose for each activity • Spatial context between object and body parts
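
The co-occurrence context can be illustrated by counting how often label combinations appear together in training annotations. This is a sketch only: it assumes each training image carries a hypothetical (activity, object, pose cluster) triple, and the label names and counts below are invented for illustration.

```python
from collections import Counter


def cooccurrence_frequencies(labels):
    """Return the relative frequency of each distinct label tuple."""
    counts = Counter(labels)
    total = len(labels)
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical annotations: (activity, object, pose cluster id).
# The same activity appears with more than one pose cluster.
labels = [
    ("tennis_serve", "tennis_racket", 0),
    ("tennis_serve", "tennis_racket", 1),
    ("tennis_serve", "tennis_racket", 0),
    ("volleyball_smash", "volleyball", 2),
]

freq = cooccurrence_frequencies(labels)
```

Here `tennis_serve` co-occurs with two different pose clusters, which is the situation the "multiple types of human pose for each activity" property describes.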

  14. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  15. Model learning • The learning step needs to achieve two goals: structure learning & parameter estimation • Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts • Parameter estimation: estimate the potential weights that maximize the discrimination between different activities

  16. Structure learning • Objective: find the connectivity pattern between the object, the human pose, and the body parts • Method: hill-climbing approach with a tabu list

  17. Hill-climbing structure learning • The hill-climbing approach adds or removes edges one at a time until a maximum is reached (figure label: human pose)
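
The edge-by-edge search can be sketched as follows. This is a simplified illustration, not the presenters' implementation: it only accepts moves that improve the score, uses a fixed-length tabu list to block recently toggled edges, and relies on a toy scoring function. The names `hill_climb`, `toy_score`, and `target` are hypothetical.

```python
from collections import deque
from itertools import combinations


def hill_climb(nodes, score, tabu_size=5, max_steps=100):
    """Greedy edge search: toggle one edge per step, skipping tabu edges."""
    candidates = [frozenset(pair) for pair in combinations(nodes, 2)]
    edges = set()
    tabu = deque(maxlen=tabu_size)  # recently toggled edges are blocked
    best_score = score(edges)
    for _ in range(max_steps):
        best_move, best_move_score = None, best_score
        for edge in candidates:
            if edge in tabu:
                continue
            trial = edges ^ {edge}  # add the edge if absent, remove it if present
            s = score(trial)
            if s > best_move_score:
                best_move, best_move_score = edge, s
        if best_move is None:  # no single toggle improves the score
            break
        edges ^= {best_move}
        tabu.append(best_move)
        best_score = best_move_score
    return edges, best_score


# Toy score: reward edges in a hypothetical target structure, penalize others.
target = {frozenset({"O", "H"}), frozenset({"H", "P1"})}


def toy_score(edges):
    return len(edges & target) - len(edges - target)


learned, value = hill_climb(["O", "H", "P1", "P2"], toy_score)
```

In practice the score would be a data-driven measure of model fit rather than a comparison with a known target.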

  18. Max-margin parameter estimation • Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities • Training sample $(x_i, c_i, y_i)$: $x_i$ is the vector of potential function values (disconnected edges set to 0); $c_i$ is the human pose H; $y_i$ is the class label A • If […], then […] • $\mathbf{w}_r$ is a weight vector for the r-th sub-class

  19. Multiclass SVM • $\|\mathbf{w}_r\|_2$: the L2 norm • $\beta$: normalization constant
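
A minimal sketch of a max-margin loss over pose sub-classes, assuming one weight vector per sub-class and a mapping from each sub-class to its activity class. The exact constraints on the slides did not survive extraction, so this follows one common multiclass hinge formulation; the names `predict`, `hinge_loss`, and `subclass_to_class`, and all numbers, are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def predict(weights, subclass_to_class, x):
    """Pick the highest-scoring sub-class (pose) and return it with its activity class."""
    best = max(weights, key=lambda r: dot(weights[r], x))
    return best, subclass_to_class[best]


def hinge_loss(weights, subclass_to_class, x, c, y):
    """Margin violation of the true sub-class c against sub-classes of other classes."""
    rivals = [dot(w, x) for r, w in weights.items() if subclass_to_class[r] != y]
    if not rivals:
        return 0.0
    return max(0.0, 1.0 - (dot(weights[c], x) - max(rivals)))


# Two pose sub-classes for one activity, one sub-class for another.
weights = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
subclass_to_class = {0: "tennis_serve", 1: "tennis_serve", 2: "volleyball_smash"}

pose, activity = predict(weights, subclass_to_class, [2.0, 0.5])
loss_far = hinge_loss(weights, subclass_to_class, [2.0, 0.5], 0, "tennis_serve")
loss_near = hinge_loss(weights, subclass_to_class, [1.0, 0.5], 0, "tennis_serve")
```

Only sub-classes belonging to a different activity count as rivals, so two poses of the same activity are never pushed apart by this loss.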

  20. Analysis of our learning algorithm • Using only one human pose for each HOI class is not enough to characterize all the images in that class well

  21. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  22. Model inference, object detection, and human pose estimation • Given a new test image, our objectives are to: - estimate the pose of the human - detect the object that is interacting with the human

  23. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  24. The sports dataset • Cricket - defensive shot (player and cricket bat) • Cricket - bowling (player and cricket ball) • Croquet - shot (player and croquet mallet) • Tennis - forehand (player and tennis racket) • Tennis - serve (player and tennis racket) • Volleyball - smash (player and volleyball) • 30 images for training, 20 for testing

  25. Better object detection

  26. Better object detection • Methods compared: sliding window detector, pedestrian as context, our method

  27. Better pose estimation • Pose estimation is still difficult • Multiple poses are better than only one pose

  28. Upper: our method • Lower left: object detection by a scanning window • Lower right: pose estimation by the state-of-the-art pictorial structure method

  29. Combining object and pose for HOI activity classification • Note: Gupta et al. predominantly use the background scene context

  30. Outline • Introduction • Modeling mutual context of object and pose • Model learning • Model inference, object detection, and human pose estimation • Experiments • Conclusion

  31. Conclusion • Treat object and human pose as the context of each other in different HOI activity classes • Structure learning method: discovers important connectivity patterns between objects and human pose • Future improvements: - incorporate useful background scene context to facilitate the recognition of foreground objects and activities - deal with more than one object

  32. Thanks!!!
