Manually Annotating Multi-view Video Using A 3D Skeleton
Cong Ye (1), Steve Maddock (1) and Frances Babbage (2)
(1) Department of Computer Science, (2) School of English Literature, Language and Linguistics
The University of Sheffield
Introduction
• Video can be used to provide a record of a theatre performance
• Complex environment
• Automatic labelling of this video is difficult
• Manual annotation with semantic labels to support further computer-based study
• Aim: Label the three-dimensional (3D) movement of the actors, both in terms of their pose and stage use
Image: http://commons.wikimedia.org/wiki/File:C%27etait_mieux_avant.jpg
Method
• HumanEva video data (Sigal et al., 2010)
  • Multiview video of single person movements
  • Baseline automatic skeleton fitting algorithm
  • Ground-truth provided by optical motion capture
• Labelling
  • 1 in 10 frames from 393 frames of video data for a jogging motion
• Two experiments
  • One untimed: compare with Sigal et al. (2010)
  • One timed: compare the effect of different starting poses for labelling a single frame
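The "1 in 10 frames" sampling can be sketched as follows. This is only an illustration: the slides do not say which frame the sampling starts from, so the zero-based start index and the function name `sample_frames` are assumptions.

```python
def sample_frames(total_frames, step=10, start=0):
    """Return the indices of the frames to label.

    Assumption: frames are indexed from 0 and sampling starts at `start`.
    """
    return list(range(start, total_frames, step))


# 393 frames of the jogging sequence, every 10th frame labelled.
frames_to_label = sample_frames(393)
print(len(frames_to_label))  # number of frames that need manual labelling
```

Under these assumptions the annotator labels 40 of the 393 frames.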
Interface
• Multiview video is mapped as texture walls according to the orientation of the cameras
• Moveable camera and texture walls
• Users manipulate skeleton from any viewpoint
• Mouse input to alter joints
Experiment 1
• Untimed
• Single video vs. multiview video
• Compare labelled skeleton joint centre positions with ground-truth data
• Final error is average of all errors in all frames
• Compare with baseline algorithm (Sigal et al., 2010)
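A minimal sketch of this error measure, assuming that the per-joint error is the Euclidean distance between the labelled joint centre and the ground-truth joint centre (the slides do not name the distance). The function name `mean_joint_error` and the data layout (a list of frames, each a list of 3D joint positions) are illustrative choices, not the authors' implementation.

```python
import math


def mean_joint_error(labelled_frames, truth_frames):
    """Average joint error over all joints in all frames.

    Each argument is a list of frames; each frame is a list of (x, y, z)
    joint centres in the same order. The per-joint error is assumed to be
    the Euclidean distance, in whatever unit the input uses.
    """
    if len(labelled_frames) != len(truth_frames):
        raise ValueError("frame counts differ")
    errors = []
    for labelled, truth in zip(labelled_frames, truth_frames):
        if len(labelled) != len(truth):
            raise ValueError("joint counts differ within a frame")
        errors.extend(math.dist(p, q) for p, q in zip(labelled, truth))
    if not errors:
        raise ValueError("no joints to compare")
    return sum(errors) / len(errors)


# Toy data: two frames with two joints each (values are made up).
labelled = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (0, 4, 0)]]
truth = [[(0, 0, 0), (1, 0, 2)], [(0, 0, 0), (0, 0, 0)]]
print(mean_joint_error(labelled, truth))  # per-joint errors are 0, 2, 3 and 4
```

The same function can score a single-view labelling and a multiview labelling against the same ground truth, which is the comparison this experiment makes.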
Experiment 2
• Multiview video
• Timed
• A: Initial pose – reuse initial default pose to start labelling process for each frame
• B: Incremental pose – start with pose from last frame labelled
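The two starting-pose conditions can be sketched as a small selection function. The pose representation (a dictionary from joint name to 3D position), the function name `starting_pose` and the fallback for the first frame are assumptions made for illustration only.

```python
def starting_pose(strategy, default_pose, previous_pose=None):
    """Choose the pose the annotator starts from when labelling a frame.

    strategy "initial" (condition A): always start from the default pose.
    strategy "incremental" (condition B): start from the pose of the last
    labelled frame, falling back to the default pose for the first frame
    (the fallback is an assumption).
    A copy is returned so that editing it does not alter the source pose.
    """
    if strategy == "initial":
        return dict(default_pose)
    if strategy == "incremental":
        source = default_pose if previous_pose is None else previous_pose
        return dict(source)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical two-joint poses.
default = {"hip": (0.0, 1.0, 0.0), "knee": (0.0, 0.5, 0.0)}
last_labelled = {"hip": (0.1, 1.0, 0.2), "knee": (0.2, 0.5, 0.3)}

pose_a = starting_pose("initial", default, last_labelled)
pose_b = starting_pose("incremental", default, last_labelled)
```

In condition B the annotator only has to correct the change between consecutive labelled frames, which is the effect the timed comparison is meant to measure.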
Conclusions
• Our 3D labelling approach is comparable in accuracy to Sigal et al.'s (2010) automatic baseline algorithm
• Manual labelling is laborious. Efficiency improvements:
  • Inverse kinematics
  • Pose prediction
  • Alternative interfaces
    • Sketch-based control of skeleton pose
    • Use of a 3D depth camera (Kinect) for pose creation
• Next step: Data capture of a real performance