Human Detection in Videos using Spatio-Temporal Pictorial Structures
Amir Roshan Zamir, Afshin Dehghan, Ruben Villegas
University of Central Florida
1. Problem
• Human detection in videos: making frame-by-frame human detection more accurate.
• Frame-by-frame detection admits numerous false detections.
• Applications: video surveillance, human tracking, action recognition, etc.

1.1 Our Approach
• Use temporal information: the transition of human parts in pictorial structures.
• False detections should be temporally less consistent than true detections.
• The transition of human parts conveys information that is ignored in the frame-by-frame scenario.

Our Contribution
• Extending spatial pictorial structures to spatio-temporal pictorial structures.

2.1 Enforcing Temporal Consistency by Post-Processing
• Per-frame human detection from Yang and Ramanan [1]: articulated pose estimation using flexible mixtures of parts.
• The per-frame detection score combines part appearance with a spatial deformation cost over the configuration of parts.
• Framework: input frame → human detection output → output with temporal consistency → output with temporal consistency and part adjustment.

2.2 Enforcing Temporal Consistency by Embedding It into the Detection Process
• A more elegant approach than post-processing (2.1): the best detections are determined during the optimization process itself.
• Configurations of parts are constrained by their transitions in time through a temporal deformation cost, in addition to the spatial deformation cost and appearance terms.

3. Learning the Transition of Parts
• Human body parts have a limited range of motion that can be approximated.
• These movements (trajectories) can be learned by training on an annotated dataset; we use the HumanEva dataset [2] for training.

Results
• Videos taken from the TRECVID MED11 dataset.
• (Figures: parts' trajectories on video; head-trajectories comparison.)
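The three scoring terms above (part appearance, spatial deformation between connected parts, and temporal deformation between consecutive frames) can be sketched as follows. This is a minimal illustration with quadratic deformation costs; all function and parameter names are hypothetical, and it is not the authors' implementation:

```python
import numpy as np

def spatial_score(parts, appearance, pairs, spring):
    """Standard pictorial-structure score for one frame:
    sum of appearance responses at each part location, minus a
    quadratic spring cost over connected part pairs."""
    score = sum(appearance[i][tuple(p)] for i, p in enumerate(parts))
    for (i, j) in pairs:
        d = parts[i] - parts[j]
        score -= spring[(i, j)] @ (d * d)  # quadratic deformation cost
    return score

def temporal_cost(parts_t, parts_prev, w_temporal):
    """Temporal deformation cost: penalize each part's displacement
    between consecutive frames."""
    c = 0.0
    for i, (p, q) in enumerate(zip(parts_t, parts_prev)):
        d = p - q
        c += w_temporal[i] @ (d * d)
    return c

def spatio_temporal_score(parts_t, parts_prev, appearance,
                          pairs, spring, w_temporal):
    """Spatio-temporal pictorial-structure score: the per-frame
    spatial score penalized by the temporal deformation cost."""
    return (spatial_score(parts_t, appearance, pairs, spring)
            - temporal_cost(parts_t, parts_prev, w_temporal))
```

With this formulation, a part configuration that scores well per frame but jumps erratically between frames is penalized, which is how temporal consistency enters the detection objective itself rather than being applied as post-processing.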
• (Figures: head trajectory and parts' trajectories before and after temporal adjustment; input frame, immediate detection output, and temporally consistent detections with and without part adjustment; annotated parts in each frame and parts' trajectories of the annotations.)
• The transitions of the annotated parts will be learned and embedded in our optimization process to restrict the detections.

Conclusion
• Based on our experiments, temporal part deformation improves human detection in videos.
• Fewer false detections and more true detections.
• Part trajectories are more precise.

Next Steps
• Apply the temporal deformation cost in the optimization process.
• Train a model that captures typical human part transitions in time.

References
[1] Y. Yang and D. Ramanan, "Articulated Pose Estimation using Flexible Mixtures of Parts," Computer Vision and Pattern Recognition (CVPR), Colorado Springs, Colorado, June 2011.
[2] L. Sigal, A. O. Balan, and M. J. Black, "HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion," International Journal of Computer Vision (IJCV), vol. 87, no. 1-2, pp. 4-27, March 2010.
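The transition-learning step, estimating how far each part typically moves between consecutive frames from annotated trajectories (such as those in HumanEva), can be sketched as below. The displacement statistics then define a Mahalanobis-style temporal deformation cost. Function names and the statistical model are illustrative assumptions, not the authors' code:

```python
import numpy as np

def learn_transition_stats(trajectories):
    """Estimate per-part mean and covariance of inter-frame displacements.

    trajectories: array of shape (num_frames, num_parts, 2), the annotated
    (y, x) position of each part in each frame."""
    disp = np.diff(trajectories, axis=0)              # (F-1, P, 2) displacements
    mean = disp.mean(axis=0)                          # (P, 2) typical motion per part
    cov = np.array([np.cov(disp[:, p, :].T)           # (P, 2, 2) motion spread
                    for p in range(disp.shape[1])])
    return mean, cov

def transition_cost(disp, mean, cov_inv):
    """Mahalanobis-style temporal cost for one part's observed displacement:
    small for motions close to the learned statistics, large for outliers."""
    d = disp - mean
    return float(d @ cov_inv @ d)
```

At detection time, candidate configurations whose part displacements fall far outside the learned motion statistics incur a high cost, restricting the detections as described above.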