Manually Annotating Multi-view Video Using A 3D Skeleton

Cong Ye¹, Steve Maddock¹ and Frances Babbage²
¹Department of Computer Science, ²School of English Literature, Language and Linguistics, The University of Sheffield

Introduction • Video can be used to provide a record of a theatre performance • Complex environment • Automatic labelling of this video is difficult • Manual annotation with semantic labels to support further computer-based study • Aim: Label the three-dimensional (3D) movement of the actors, both in terms of their pose and stage use • Image source: http://commons.wikimedia.org/wiki/File:C%27etait_mieux_avant.jpg

Method • HumanEva video data (Sigal et al., 2010) • Multiview video of single person movements • Baseline automatic skeleton fitting algorithm • Ground-truth provided by optical motion capture • Labelling • 1 in 10 frames from 393 frames of video data for a jogging motion • Two experiments • One untimed – compare with Sigal et al. (2010) • One timed – comparison of the effect of different starting poses for labelling a single frame

Interface • Multiview video is mapped as texture walls according to the orientation of the cameras • Moveable camera and texture walls • Users manipulate skeleton from any viewpoint • Mouse input to alter joints

Experiment 1 • Untimed • Single video vs. multiview video • Compare labelled skeleton joint centre positions with ground-truth data • Final error is the average of all joint errors over all frames • Compare with baseline algorithm (Sigal et al., 2010)
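The error measure in Experiment 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each per-joint error is the Euclidean distance between the labelled joint centre and the ground-truth joint centre, and that poses are stored as lists of (x, y, z) tuples per frame. The function name `mean_joint_error` and the data layout are hypothetical.

```python
from math import dist
from statistics import fmean


def mean_joint_error(labelled, ground_truth):
    """Average the per-joint position error over all joints in all frames.

    Both arguments are lists of frames; each frame is a list of (x, y, z)
    joint centres in the same joint order. Using Euclidean distance as the
    per-joint error is an assumption made for this sketch.
    """
    if len(labelled) != len(ground_truth):
        raise ValueError("frame counts differ")

    errors = []
    for frame_labelled, frame_truth in zip(labelled, ground_truth):
        if len(frame_labelled) != len(frame_truth):
            raise ValueError("joint counts differ within a frame")
        # One distance per joint in this frame.
        errors.extend(dist(p, q) for p, q in zip(frame_labelled, frame_truth))

    # A single average over every joint in every frame.
    return fmean(errors)
```

Pooling all joints before averaging gives the same value as averaging per-frame means only when every frame has the same number of joints, which holds for a fixed skeleton.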

Experiment 2 • Multiview video • Timed • A: Initial pose – reuse initial default pose to start labelling process for each frame • B: Incremental pose – start with pose from last frame labelled
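The two starting conditions compared in Experiment 2 can be sketched as a small selection function. This is an illustrative sketch under stated assumptions, not the interface's actual code: a pose is represented as a dictionary from joint name to (x, y, z) position, and the names `starting_pose`, `"initial"` and `"incremental"` are hypothetical.

```python
def starting_pose(strategy, default_pose, previous_pose=None):
    """Return the pose the annotator starts from when labelling a frame.

    "initial":     always start from the default pose (condition A).
    "incremental": start from the pose of the last labelled frame
                   (condition B), falling back to the default pose when
                   no frame has been labelled yet.
    A copy is returned so edits do not alter the stored poses.
    """
    if strategy == "initial":
        return dict(default_pose)
    if strategy == "incremental":
        source = previous_pose if previous_pose is not None else default_pose
        return dict(source)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, with `default = {"hip": (0, 0, 0)}` and `last = {"hip": (1, 0, 2)}`, `starting_pose("incremental", default, last)` returns a copy of `last`, whereas `starting_pose("initial", default, last)` returns a copy of `default`.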

Conclusions • Our 3D labelling approach is comparable in accuracy to Sigal et al.'s (2010) automatic baseline algorithm • Manual labelling is laborious. Efficiency improvements: • Inverse Kinematics • Pose prediction • Alternative interfaces • Sketch-based control of skeleton pose • Use of a 3D depth camera (Kinect) for pose creation • Next Step: Data capture of a real performance
