M.Sc. Thesis Defense Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Activity Recognition • Goal: enable computers to analyze and understand human behavior • Examples: answering a phone, kissing
Action vs. Activity • Action: standing in a queue, facing left • Activity: a group of people forming a queue
Activity Recognition • Activity recognition is important: HCI, surveillance, sports, entertainment • Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.
Group Activity Recognition • Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other • Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications
Group Activity Recognition • [Figure: context in a group activity scene]
Group Activity Recognition • Two types of context: • group-person interaction • person-person interaction • [Figure: e.g., people talking]
Latent Structured Model • [Figure: activity label y at the root, a hidden layer of action labels h1, h2, …, and image features x1, x2, …, xn plus a global image feature x0]
Latent Structured Model • [Figure: the same model annotated with the two types of context: group-person interaction (activity–action links) and person-person interaction (links among the hidden action nodes) at the structure level, and the image features x0, x1, …, xn at the feature level]
Difference from Previous Work • Group Activity Recognition • Our work • Group activity recognition in realistic videos • Two new types of contextual information • A unified framework • Previous work • Single-person action recognition (Schuldt et al., ICPR 04) • Relatively simple activity recognition (Vaswani et al., CVPR 03) • Datasets collected in controlled conditions
Difference from Previous Work • Latent Structured Models • Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF; Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08) • Our work: the structure of the hidden layer is latent and is inferred automatically during learning and inference
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Structure-level Approach • [Figure: the latent structured model, with the person-person interaction links among the hidden action nodes highlighted at the structure level]
Structure-level Approach • Latent structure: it is unknown a priori which person-person links are informative • [Figure: should a person standing in a queue be linked to two people talking?]
Model Formulation • Input: an image x with per-person action labels h and an activity label y • Four types of potentials: image-action, action-activity, image-activity, and action-action
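The four potentials on the slide suggest a scoring function of roughly the following form. This is a reconstruction: the weight vectors w1–w4, the feature maps φ, ψ, η, γ, and the latent person-person graph G are placeholder names, and the thesis's exact notation may differ.

```latex
F(x, h, y; w) =
\underbrace{\sum_{j} w_1^{\top} \phi(x_j, h_j)}_{\text{image--action}}
+ \underbrace{\sum_{j} w_2^{\top} \psi(h_j, y)}_{\text{action--activity}}
+ \underbrace{w_3^{\top} \eta(x_0, y)}_{\text{image--activity}}
+ \underbrace{\sum_{(j,k) \in G} w_4^{\top} \gamma(h_j, h_k)}_{\text{action--action}}
```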
Inference • Score an image x with activity label y by maximizing over the latent variables • Exact joint inference is NP-hard!
Inference • Approximate by alternating: • Holding the graph structure Gy fixed, infer the action labels hy with loopy belief propagation (BP) • Holding hy fixed, infer Gy with an integer linear program (ILP)
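The alternating scheme above can be sketched as follows. This is an illustrative toy with random scores: ICM stands in for the loopy BP step and an unconstrained greedy edge choice stands in for the ILP, so all names and simplifications here are assumptions, not the thesis's implementation.

```python
import numpy as np

# Toy instance with random scores; in the model these come from learned potentials.
rng = np.random.default_rng(0)
n_persons, n_actions = 5, 3
unary = rng.normal(size=(n_persons, n_actions))     # image-action scores
act2activity = rng.normal(size=n_actions)           # action-activity scores (activity y held fixed)
pairwise = rng.normal(size=(n_actions, n_actions))  # action-action scores
pairwise = (pairwise + pairwise.T) / 2              # make symmetric

def score(h, G):
    """Total score of action labels h under graph structure G (a set of (j, k) pairs)."""
    s = unary[np.arange(n_persons), h].sum() + act2activity[h].sum()
    return s + sum(pairwise[h[j], h[k]] for j, k in G)

def infer(n_rounds=5):
    h = unary.argmax(axis=1)  # initialise each action independently
    G = set()
    for _ in range(n_rounds):
        # Holding G fixed, update each action label greedily
        # (ICM stands in for the loopy BP used in the thesis).
        for j in range(n_persons):
            best_a, best_s = h[j], score(h, G)
            for a in range(n_actions):
                h_try = h.copy()
                h_try[j] = a
                s = score(h_try, G)
                if s > best_s:
                    best_a, best_s = a, s
            h[j] = best_a
        # Holding h fixed, choose the graph. Unconstrained, the optimum keeps
        # exactly the positive-weight edges (the thesis adds structural
        # constraints and solves an ILP instead).
        G = {(j, k) for j in range(n_persons) for k in range(j + 1, n_persons)
             if pairwise[h[j], h[k]] > 0}
    return h, G

h, G = infer()
```

Each half-step never decreases the total score, so the procedure converges to a local maximum of the joint objective.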
Learning with Latent SVM • Optimization: the objective is non-convex; it is solved with the non-convex bundle method (Do & Artières, ICML 09)
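For reference, the standard latent-SVM training objective has the following form (notation assumed, not taken from the thesis: F is the model score, Δ a loss between activity labels, and (xⁿ, yⁿ) the training examples). The second term is a difference of two maxima, which is what makes the objective non-convex and motivates the bundle method.

```latex
\min_{w}\ \frac{1}{2}\lVert w\rVert^{2}
+ C \sum_{n} \Big[ \max_{y,\, h} \big( F(x^{n}, h, y; w) + \Delta(y, y^{n}) \big)
- \max_{h} F(x^{n}, h, y^{n}; w) \Big]
```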
Feature-level Approach • [Figure: the latent structured model, with the feature level (image features x0, x1, …, xn) highlighted]
Feature-level Approach • Model: the same layered model, but each person's feature xi is an Action Context descriptor • [Figure: activity label y, action labels h1, h2, …, Action Context descriptors x1, x2, …, xn, and image x0]
Action Context Descriptor • [Figure (a)–(c): the focal person and surrounding context persons; the spatio-temporal context region with temporal extent τ; the actions of the focal and context persons]
Action Context Descriptor • Each person is represented by a feature descriptor (e.g. HOG by Dalal & Triggs) and scored by a multi-class SVM, yielding one score per action class • The descriptor concatenates the focal person's action scores with the per-class max of the context persons' scores
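A minimal sketch of such a descriptor, assuming a purely spatial context region of radius `radius` (the thesis pools over a spatio-temporal neighbourhood); the function and parameter names are hypothetical:

```python
import numpy as np

def action_context(scores, positions, focal, radius):
    """Action Context descriptor for one focal person.

    scores    : (n_persons, n_actions) multi-class SVM scores, one row per person
    positions : (n_persons, 2) person locations in the image
    focal     : index of the focal person
    radius    : spatial radius of the context region (hypothetical parameter)
    """
    dists = np.linalg.norm(positions - positions[focal], axis=1)
    context = [j for j in range(len(scores)) if j != focal and dists[j] <= radius]
    if context:
        ctx = scores[context].max(axis=0)   # per-class max over context persons
    else:
        ctx = np.zeros(scores.shape[1])     # no one nearby: empty context part
    return np.concatenate([scores[focal], ctx])

# Tiny example: person 0 is focal; only person 1 falls inside the radius.
scores = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])
positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
ac = action_context(scores, positions, 0, radius=2.0)
```

The max-pooling makes the context part invariant to the number and ordering of nearby people, so the descriptor has a fixed length (2 × n_actions) regardless of crowd size.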
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Dataset • Collective Activity Dataset (Choi et al., VS 09) • 5 action categories (per person): crossing, waiting, queuing, walking, talking • 44 video clips
Dataset • Nursing Home Dataset • 2 activity categories (per image): fall, non-fall • 5 action categories (per person): walking, standing, sitting, bending, and falling • 22 video clips in total (2990 frames); 8 clips for testing, the rest for training; 1/3 are labeled as fall
Baselines • root (x0) + SVM (no structure) • no connections in the hidden layer • minimum spanning tree • complete graph within radius r • [Figure: the hidden-layer graph (nodes h1–h4) for each baseline and for the structure-level approach]
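The three fixed-graph baselines can be sketched as follows (illustrative helper names; `pos` holds 2-D person locations):

```python
import math

def edges_none(pos):
    """Baseline: no connections among the hidden nodes."""
    return set()

def edges_mst(pos):
    """Baseline: minimum spanning tree over person positions (Prim's algorithm)."""
    n = len(pos)
    in_tree, edges = {0}, set()
    while len(in_tree) < n:
        best = None  # (distance, node in tree, node outside tree)
        for j in in_tree:
            for k in range(n):
                if k not in in_tree:
                    d = math.dist(pos[j], pos[k])
                    if best is None or d < best[0]:
                        best = (d, j, k)
        edges.add((min(best[1], best[2]), max(best[1], best[2])))
        in_tree.add(best[2])
    return edges

def edges_within(pos, r):
    """Baseline: complete graph over all pairs within distance r."""
    n = len(pos)
    return {(j, k) for j in range(n) for k in range(j + 1, n)
            if math.dist(pos[j], pos[k]) <= r}

# Three collinear people at x = 0, 1, 5.
pos = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
```

In contrast to these fixed constructions, the structure-level approach treats the edge set itself as a latent variable chosen during inference.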
System Overview • Pipeline: Video → Person Detector → Person Descriptor → Model • Person detection: pedestrian detector by Felzenszwalb et al., or background subtraction • Person descriptors: HOG by Dalal & Triggs; LST by Loy et al., CVPR 09
Results – Incorrect Examples • [Figure: misclassified examples for crossing, waiting, walking, talking, and queuing]
Conclusion • A discriminative model for group activity recognition with context • Two new types of contextual information: • group-person interaction • person-person interaction • Structure-level: latent structure • Feature-level: Action Context descriptor • Experimental results demonstrate the effectiveness of the proposed model
Future Work • Modeling Complex Structures • Temporal dependencies among actions • Contextual Feature Descriptors • How to encode discriminative context? • Weakly Supervised Learning • e.g. multiple-instance learning for fall detection
Pairwise Weight • [Figure: the pairwise weight between action nodes hj and hk depends on the activity label y]
Results – Nursing Home Dataset 0/1 loss – optimize overall accuracy
Results – Nursing Home Dataset new loss – optimize mean per-class accuracy
Person Detectors • Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08) • Nursing Home Dataset: background subtraction on the video yields moving regions
Person Descriptors • Collective Activity Dataset: HOG • Nursing Home Dataset: Local Spatial-Temporal (LST) Descriptor (Loy et al., ICCV 09)