Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley
Looking at People • Far field: the 3-pixel man • Blob tracking • Vast surveillance literature • Near field: the 300-pixel man • Limb tracking • e.g. Yacoob & Black, Rao & Shah, etc.
Medium-field Recognition The 30-Pixel Man
Appearance vs. Motion Jackson Pollock Number 21 (detail)
Goals • Recognize human actions at a distance • Low resolution, noisy data • Moving camera, occlusions • Wide range of actions (including non-periodic)
Our Approach • Motion-based approach • Non-parametric; use a large amount of data • Classify a novel motion by finding the most similar motion in the training set • Related Work • Periodicity analysis • Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al. • Model-free • Temporal Templates [Bobick & Davis] • Orientation histograms [Freeman et al.; Zelnik & Irani] • Using MoCap data [Zhao & Nevatia; Ramanan & Forsyth]
Gathering action data • Tracking • Simple correlation-based tracker • User-initialized
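A simple, user-initialized correlation-based tracker can be sketched as follows. This is a minimal illustration, not the authors' tracker: it assumes grayscale frames stored as NumPy arrays, a square box with a hypothetical half-size `half`, an exhaustive search over a hypothetical radius `search`, and a template that is re-taken from each new frame. The function names `ncc` and `track` are illustrative.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size arrays."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def track(frames, init_center, half=8, search=4):
    """Follow a user-initialized box by maximizing NCC with the previous frame's patch.

    Hypothetical parameters: `half` is the box half-size, `search` the search radius.
    """
    cy, cx = init_center
    centers = [(cy, cx)]
    template = frames[0][cy - half:cy + half, cx - half:cx + half]
    for frame in frames[1:]:
        best, best_c = -np.inf, (cy, cx)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = cy + dy, cx + dx
                # Skip candidate boxes that would fall outside the frame.
                if y - half < 0 or x - half < 0 or y + half > frame.shape[0] or x + half > frame.shape[1]:
                    continue
                score = ncc(frame[y - half:y + half, x - half:x + half], template)
                if score > best:
                    best, best_c = score, (y, x)
        cy, cx = best_c
        # Re-take the template from the current frame so it adapts to pose changes.
        template = frame[cy - half:cy + half, cx - half:cx + half]
        centers.append((cy, cx))
    return centers
```

Updating the template every frame lets the tracker follow a person whose limbs keep changing shape, at the cost of possible drift over long sequences.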
Figure-centric Representation • Stabilized spatio-temporal volume • No translation information • All motion caused by person’s limbs • Good news: indifferent to camera motion • Bad news: hard! • Good test to see if actions, not just translation, are being captured
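A figure-centric, stabilized volume can be sketched by cropping a fixed-size window around each tracked position and stacking the crops, so the person stays centered and global translation is removed. This assumes per-frame centers such as a tracker would output; the window size (hypothetical `half`) and zero padding at image borders are illustrative choices.

```python
import numpy as np

def stabilized_volume(frames, centers, half=8):
    """Stack a fixed-size window centered on the tracked person in each frame.

    Frames are zero-padded so windows near the border keep the same size.
    Hypothetical parameter: `half` is the window half-size.
    """
    crops = []
    for frame, (cy, cx) in zip(frames, centers):
        padded = np.pad(frame, half)  # zero-pad by `half` pixels on every side
        # (cy, cx) in the original frame maps to (cy + half, cx + half) when padded,
        # so this slice is the window [cy - half, cy + half) of the original frame.
        crops.append(padded[cy:cy + 2 * half, cx:cx + 2 * half])
    return np.stack(crops)  # shape: (T, 2*half, 2*half)
```

If the person only translates, every slice of the volume is identical; any remaining variation comes from the limbs, which is exactly what the representation is meant to isolate.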
Remembrance of Things Past [Figure: input sequence matched against a motion analysis database of labeled clips: run, jog, swing, walk right, walk left] • “Explain” a novel motion sequence by matching it to previously seen video clips • For each frame, match based on some temporal extent • Challenge: how to compare motions?
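Per-frame matching against the database can be sketched as nearest-neighbor labeling. This assumes a precomputed similarity matrix between the novel sequence's frames and the database frames (each entry already aggregated over some temporal extent) and one action label per database frame. The function name `classify_frames` and the majority vote over `k` neighbors are illustrative assumptions, not necessarily the authors' exact decision rule.

```python
import numpy as np
from collections import Counter

def classify_frames(sim, db_labels, k=1):
    """Label each query frame by majority vote over its k most similar database frames.

    sim[i, j]: similarity between query frame i and database frame j (higher = more similar).
    db_labels[j]: action label of database frame j.
    """
    labels = []
    for row in sim:
        top = np.argsort(row)[::-1][:k]  # indices of the k best-matching database frames
        votes = Counter(db_labels[j] for j in top)
        labels.append(votes.most_common(1)[0][0])
    return labels
```

With `k=1` this is plain nearest-neighbor classification; a small `k > 1` makes the label less sensitive to a single spurious match.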
How to describe motion? • Appearance • Not preserved across different clothing • Gradients (spatial, temporal) • Same problem (e.g. contrast reversal) • Edges/Silhouettes • Too unreliable • Optical flow • Explicitly encodes motion • Least affected by appearance • …but too noisy
Spatial Motion Descriptor [Figure: image frame → optical flow → blurred flow]
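One way to turn a noisy optical-flow field into a blurred spatial descriptor is sketched below. It assumes the horizontal and vertical flow components `fx`, `fy` have already been computed by some optical-flow method, and that the flow is split into four non-negative (half-wave rectified) channels before blurring; the Gaussian width `sigma` is a hypothetical parameter. Blurring makes the descriptor tolerant of small misalignments and flow noise.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur; np.convolve(mode="same") treats borders as zero.

    Assumes each image side is at least as long as the kernel.
    """
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def motion_channels(fx, fy, sigma=1.5):
    """Split a flow field into four non-negative channels (+x, -x, +y, -y), then blur each.

    Assumption: half-wave rectification into four channels; `sigma` is hypothetical.
    """
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([blur(c, sigma) for c in channels])  # shape: (4, H, W)
```

Keeping positive and negative motion in separate channels prevents blurring from cancelling out opposite motions (e.g. a left and a right arm swinging past each other).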
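Spatio-temporal Motion Descriptor • Frame-to-frame similarity matrix S between Sequence A and Sequence B • Motion-to-motion similarity matrix: convolve S with a (blurry) identity matrix I spanning the temporal extent E [Figure: similarity matrices for Sequence A vs. Sequence B over time t]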
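The two-step comparison can be sketched as follows: first a frame-to-frame similarity matrix, then a sum along the diagonals over a window of `T` frames, which is the same as convolving with a `T`×`T` identity matrix. This assumes each frame's descriptor has been flattened to a vector and uses normalized correlation as the frame similarity; the plain (not blurred) identity kernel and the zero padding at the sequence ends are simplifying assumptions, and a weighted ("blurry") diagonal kernel would be a natural variation.

```python
import numpy as np

def frame_similarity(desc_a, desc_b):
    """S[i, j] = normalized correlation between frame i of A and frame j of B.

    desc_a: (Ta, D), desc_b: (Tb, D) flattened per-frame descriptors.
    """
    a = desc_a - desc_a.mean(axis=1, keepdims=True)
    b = desc_b - desc_b.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True) + 1e-12
    b /= np.linalg.norm(b, axis=1, keepdims=True) + 1e-12
    return a @ b.T

def motion_similarity(S, T):
    """Sum S along diagonals over T frames (convolution with a T x T identity matrix).

    Entries outside the sequences count as zero, so the output has the same shape as S.
    """
    Ta, Tb = S.shape
    h = T // 2
    padded = np.pad(S, h)
    out = np.zeros_like(S)
    for k in range(-h, T - h):
        # Adds S[i + k, j + k] to out[i, j] for every (i, j).
        out += padded[h + k:h + k + Ta, h + k:h + k + Tb]
    return out
```

Summing along diagonals means two motions only score highly if many consecutive frames match in order, which is what distinguishes a motion match from a single-pose match.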
Football Actions: matching [Figure: input sequence and matched frames]
Football Actions: classification 10 actions; 4500 total frames; 13-frame motion descriptor
Classifying Ballet Actions 16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
Classifying Tennis Actions 6 actions; 4600 frames; 7-frame motion descriptor. Woman player used for training, man for testing.
Classifying Tennis • Red bars show classification results
Querying the Database [Figure: input sequence matched against the database (run, jog, swing, walk right, walk left); outputs: action recognition labels and joint positions]
2D Skeleton Transfer • We annotate the database with 2D joint positions • After matching, transfer the joint data to the novel sequence • Adjust the match for best fit [Figure: input sequence; transferred 2D skeletons]
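Skeleton transfer can be sketched as copying the annotated joints of each query frame's best database match. The slide's "adjust the match for best fit" is not specified further; here it is interpreted, purely as an assumption, as a small integer translation that best aligns the matched database frame with the query frame. The names `best_shift` and `transfer_joints`, the `max_shift` parameter, and the (row, col) joint convention are all hypothetical.

```python
import numpy as np

def best_shift(query, match, max_shift=2):
    """Hypothetical refinement: integer (dy, dx) shift of `match` best correlated with `query`."""
    b = query - query.mean()
    best, best_s = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll moves content at p in `match` to p + (dy, dx).
            a = np.roll(match, (dy, dx), axis=(0, 1))
            a = a - a.mean()
            score = float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
            if score > best:
                best, best_s = score, (dy, dx)
    return best_s

def transfer_joints(sim, db_joints, query_frames, db_frames, max_shift=2):
    """Copy (row, col) joints from each query frame's best database match, then shift them.

    sim[i, j]: similarity of query frame i to database frame j.
    db_joints[j]: (J, 2) joint positions annotated on database frame j.
    """
    out = []
    for i, row in enumerate(sim):
        j = int(np.argmax(row))
        dy, dx = best_shift(query_frames[i], db_frames[j], max_shift)
        out.append(db_joints[j] + np.array([dy, dx]))
    return np.stack(out)  # shape: (num_query_frames, J, 2)
```

The same transfer applies unchanged to 3D joints if the database annotations carry a third coordinate, with the 2D shift applied only to the image-plane coordinates.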
3D Skeleton Transfer • We populate the database with rendered stick figures from 3D motion capture data • Matching as before, we get 3D joint positions (kind of)! [Figure: input sequence; transferred 3D skeletons]
“Do as I Do” Motion Synthesis • Matching two things: • Motion similarity across sequences • Appearance similarity within a sequence (like VideoTextures) • Dynamic programming [Figure: input sequence → synthetic sequence]
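The dynamic-programming step can be sketched as a Viterbi-style search that picks one target frame per input frame. This assumes two precomputed cost matrices: a motion cost (how badly target frame `j` matches the input motion at time `t`, e.g. one minus the motion similarity) and a transition cost (how visible a jump from target frame `i` to target frame `j` would be, in the spirit of VideoTextures). The additive cost, the weight `alpha`, and the name `do_as_i_do` are illustrative assumptions.

```python
import numpy as np

def do_as_i_do(motion_cost, trans_cost, alpha=1.0):
    """Choose target frame j_t for each input frame t by dynamic programming.

    motion_cost[t, j]: cost of showing target frame j at input time t (motion mismatch).
    trans_cost[i, j]: cost of following target frame i with target frame j (appearance jump).
    Minimizes sum_t motion_cost[t, j_t] + alpha * sum_t trans_cost[j_{t-1}, j_t].
    """
    T, N = motion_cost.shape
    total = motion_cost[0].copy()          # best cost of any path ending at each frame
    back = np.zeros((T, N), dtype=int)     # back[t, j]: best predecessor of j at time t
    for t in range(1, T):
        cand = total[:, None] + alpha * trans_cost  # cand[i, j]: reach j from i
        back[t] = np.argmin(cand, axis=0)
        total = cand.min(axis=0) + motion_cost[t]
    # Trace the optimal path backwards from the cheapest final frame.
    path = [int(np.argmin(total))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For example, with `motion_cost = [[0, 1], [1, 0], [0, 1]]` and a transition cost of 5 for any jump, greedy per-frame matching would pick `[0, 1, 0]`, but the smooth path `[0, 0, 0]` is cheaper overall once the jumps are paid for; setting `alpha=0` recovers the greedy choice.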
“Do as I Do” [Figure panels: Source Motion; Source Appearance; Result; 3400 frames]
“Do as I Say” Synthesis • Synthesize a sequence from given action labels • e.g. video game control [Figure: action labels (run, walk left, swing, walk right, jog) drawn from the database → synthetic sequence]
“Do as I Say” • Red box shows when constraint is applied
Actor Replacement [Video: GregWorldCup.avi, DivX]
Conclusions • In the medium field, action is about motion • What we propose: • A way of matching motions at a coarse scale • What we get out: • Action recognition • Skeleton transfer • Synthesis: “Do as I Do” & “Do as I Say” • What we learned: • A lot to be said for the “little guy”!