
Manually Annotating Multi-view Video Using A 3D Skeleton

Cong Ye¹, Steve Maddock¹ and Frances Babbage²; ¹Department of Computer Science, ²School of English Literature, Language and Linguistics, The University of Sheffield


Presentation Transcript


  1. Manually Annotating Multi-view Video Using A 3D Skeleton
     Cong Ye¹, Steve Maddock¹ and Frances Babbage²
     ¹Department of Computer Science, ²School of English Literature, Language and Linguistics, The University of Sheffield

  2. Introduction
     • Video can be used to provide a record of a theatre performance
     • Performances take place in a complex environment
     • Automatic labelling of this video is difficult
     • Manual annotation with semantic labels can support further computer-based study
     • Aim: label the three-dimensional (3D) movement of the actors, both in terms of their pose and their use of the stage
     Image: http://commons.wikimedia.org/wiki/File:C%27etait_mieux_avant.jpg

  3. Method
     • HumanEva video data (Sigal et al., 2010)
        • Multiview video of single-person movements
        • Baseline automatic skeleton-fitting algorithm
        • Ground truth provided by optical motion capture
     • Labelling
        • 1 in every 10 frames of a 393-frame video sequence of a jogging motion
     • Two experiments
        • One untimed: comparison with Sigal et al. (2010)
        • One timed: comparison of the effect of different starting poses on labelling a single frame
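The slide does not say which frame the 1-in-10 sampling starts from. A minimal sketch, assuming 0-based frame indices and that labelling starts at the first frame, shows how many frames are actually labelled; the function name `labelled_frames` is hypothetical.

```python
def labelled_frames(total_frames: int, step: int, start: int = 0) -> list[int]:
    """Indices of the frames to label when annotating 1 frame in every `step`.

    Assumption: frames are indexed from 0 and labelling begins at `start`.
    """
    return list(range(start, total_frames, step))


frames = labelled_frames(393, 10)
print(len(frames), frames[:3], frames[-1])  # 40 [0, 10, 20] 390
```

Under this assumption 40 of the 393 frames are labelled; starting at frame 10 instead would give 39.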

  4. Interface
     • Multiview video is mapped onto texture walls positioned according to the orientations of the cameras
     • Moveable camera and texture walls
     • Users manipulate the skeleton from any viewpoint
     • Mouse input is used to alter joints
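The slides do not describe how each texture wall is placed. One plausible construction, assumed here, puts each camera's video on a rectangle perpendicular to that camera's optical axis at a chosen depth. It back-projects the four image corners through a pinhole model with extrinsics X_cam = R·X_world + t. The names `texture_wall_corners`, `depth`, `fx`, `fy`, `cx` and `cy` are hypothetical and stand for per-camera calibration values.

```python
def transpose(m):
    """Transpose a 3x3 matrix given as nested tuples."""
    return tuple(tuple(m[r][c] for r in range(3)) for c in range(3))


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def texture_wall_corners(R, t, fx, fy, cx, cy, width, height, depth):
    """World-space corners of a video wall placed `depth` units in front of a camera.

    Assumed convention: X_cam = R @ X_world + t (pinhole camera, no lens distortion).
    Corners are returned in image order: top-left, top-right, bottom-right, bottom-left.
    """
    Rt = transpose(R)  # inverse of a rotation matrix is its transpose
    corners = []
    for u, v in [(0, 0), (width, 0), (width, height), (0, height)]:
        # Back-project the image corner onto the plane z_cam = depth.
        x_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
        # Undo the extrinsics: X_world = R^T (X_cam - t).
        diff = tuple(x_cam[i] - t[i] for i in range(3))
        corners.append(mat_vec(Rt, diff))
    return corners


identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(texture_wall_corners(identity, (0, 0, 0), 100, 100, 50, 50, 100, 100, 2))
# [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (1.0, 1.0, 2.0), (-1.0, 1.0, 2.0)]
```

Because every wall is built from its own camera's calibration, moving the virtual viewpoint lets the user see each video aligned with the direction it was filmed from.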

  5. Experiment 1
     • Untimed
     • Single video vs. multiview video
     • Labelled skeleton joint-centre positions are compared with ground-truth data
     • Final error is the average of the joint errors over all labelled frames
     • Results are compared with the baseline algorithm (Sigal et al., 2010)
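The error measure can be sketched as the mean Euclidean distance between labelled and ground-truth joint centres. This sketch assumes that both skeletons list the same joints in the same order and that the per-frame errors are then averaged over the labelled frames. With a fixed number of joints per frame, this equals the average of all joint errors. The function names are hypothetical, and units follow the input (for example millimetres).

```python
import math


def frame_error(labelled, ground_truth):
    """Mean Euclidean distance between corresponding joint centres in one frame."""
    if len(labelled) != len(ground_truth):
        raise ValueError("skeletons must have the same joints")
    return sum(math.dist(p, q) for p, q in zip(labelled, ground_truth)) / len(labelled)


def sequence_error(labelled_frames, ground_truth_frames):
    """Average of the per-frame errors over all labelled frames."""
    errors = [frame_error(l, g) for l, g in zip(labelled_frames, ground_truth_frames)]
    return sum(errors) / len(errors)


labelled = [[(0, 0, 0), (10, 0, 0)], [(3, 4, 0), (10, 0, 0)]]
truth = [[(0, 0, 0), (10, 0, 0)], [(0, 0, 0), (10, 0, 0)]]
print(sequence_error(labelled, truth))  # 1.25
```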

  6. Experiment 2
     • Multiview video
     • Timed
     • A (initial pose): reuse the default initial pose to start the labelling process for each frame
     • B (incremental pose): start from the pose of the last labelled frame
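The two conditions differ only in the pose the user starts editing from. A minimal sketch follows, in which a pose is assumed to be a mapping from joint name to three rotation angles. The function `starting_pose` and the joint names are hypothetical. Condition B also needs a fallback for the very first frame, which has no previous label; the slide does not say how this case was handled.

```python
from typing import Optional

# Hypothetical pose representation: joint name -> (x, y, z) rotation angles.
Pose = dict[str, tuple[float, float, float]]


def starting_pose(strategy: str, default_pose: Pose, last_labelled: Optional[Pose]) -> Pose:
    """Return a copy of the pose the user starts from when labelling a new frame."""
    if strategy == "A":
        # Initial pose: always restart from the default skeleton pose.
        return dict(default_pose)
    if strategy == "B":
        # Incremental pose: reuse the previous labelled frame's pose,
        # falling back to the default pose for the first frame (assumption).
        return dict(last_labelled if last_labelled is not None else default_pose)
    raise ValueError(f"unknown strategy: {strategy!r}")


default = {"left_knee": (0.0, 0.0, 0.0)}
last = None
for frame in [0, 10, 20]:
    pose = starting_pose("B", default, last)
    pose["left_knee"] = (float(frame), 0.0, 0.0)  # stand-in for the user's edits
    last = pose
```

Returning a copy keeps the user's edits from modifying the default pose or the previously stored label.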

  7. Conclusions
     • Our 3D labelling approach is comparable in accuracy to Sigal et al.'s (2010) automatic baseline algorithm
     • Manual labelling is laborious. Possible efficiency improvements:
        • Inverse kinematics
        • Pose prediction
        • Alternative interfaces
           • Sketch-based control of skeleton pose
           • Use of a 3D depth camera (Kinect) for pose creation
     • Next step: data capture of a real performance
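Pose prediction is listed only as future work, and the slides give no method. One simple option, assumed here, is constant-velocity extrapolation. With evenly spaced labelled frames (for example every 10th frame), the predicted joint position is 2·last − previous. The name `predict_pose` is hypothetical.

```python
def predict_pose(previous, last):
    """Predict joint positions for the next labelled frame by linear extrapolation.

    Assumes `previous` and `last` are lists of (x, y, z) joint positions from two
    consecutive, evenly spaced labelled frames, with joints in the same order.
    """
    return [tuple(2 * b - a for a, b in zip(p, q)) for p, q in zip(previous, last)]


print(predict_pose([(0, 0, 0)], [(10, 0, 5)]))  # [(20, 0, 10)]
```

Extrapolating each joint position independently does not preserve bone lengths, so in practice the prediction would more likely be applied to joint angles or followed by re-fitting the skeleton. The result would give condition B a better starting pose rather than replacing the user's corrections.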
