M.Sc. Thesis Defense Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Activity Recognition • Goal: enable computers to analyze and understand human behavior • Examples: answering a phone, kissing
Action vs. Activity • Action: standing in a queue, facing left • Activity: a group of people forming a queue
Activity Recognition • Activity recognition is important: HCI, surveillance, sports, entertainment • Activity recognition is difficult: intra-class variation, background clutter, partial occlusion, etc.
Group Activity Recognition • Motivation: human actions are rarely performed in isolation; the actions of individuals in a group can serve as context for each other • Goal: explore the benefit of contextual information for group activity recognition in challenging real-world applications
Group Activity Recognition • [Figure: context in a group activity scene]
Group Activity Recognition • Two types of context: • group-person interaction • person-person interaction • [Figure: e.g., people talking]
Latent Structured Model • [Figure: activity label y at the root, a hidden layer of action labels h1, h2, …, and image features x1, x2, …, xn plus a global image feature x0]
Latent Structured Model • [Figure: the same model annotated with the two types of context: group-person interaction (activity–action links) and person-person interaction (links among the hidden action nodes) at the structure level, and the image features x0, x1, …, xn at the feature level]
Difference from Previous Work • Group Activity Recognition • Our work • Group activity recognition in realistic videos • Two new types of contextual information • A unified framework • Previous work • Single-person action recognition (Schuldt et al., ICPR 04) • Relatively simple activity recognition (Vaswani et al., CVPR 03) • Datasets collected in controlled conditions
Difference from Previous Work • Latent Structured Models • Previous work: a pre-defined structure for the hidden layer, e.g. a tree (HCRF; Quattoni et al., PAMI 07; Felzenszwalb et al., CVPR 08) • Our work: the structure of the hidden layer is latent and is inferred automatically during learning and inference
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Structure-level Approach • [Figure: the latent structured model, with the person-person interaction links among the hidden action nodes highlighted at the structure level]
Structure-level Approach • Latent structure: it is unknown a priori which person-person links are informative • [Figure: should a person standing in a queue be linked to two people talking?]
Model Formulation • Input: an image x with per-person action labels h and an activity label y • Four types of potentials: image-action, action-activity, image-activity, and action-action
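The four potentials on the slide suggest a scoring function of roughly the following form. This is a reconstruction: the weight vectors w1–w4, the feature maps φ, ψ, η, γ, and the latent person-person graph G are placeholder names, and the thesis's exact notation may differ.

```latex
F(x, h, y; w) =
\underbrace{\sum_{j} w_1^{\top} \phi(x_j, h_j)}_{\text{image--action}}
+ \underbrace{\sum_{j} w_2^{\top} \psi(h_j, y)}_{\text{action--activity}}
+ \underbrace{w_3^{\top} \eta(x_0, y)}_{\text{image--activity}}
+ \underbrace{\sum_{(j,k) \in G} w_4^{\top} \gamma(h_j, h_k)}_{\text{action--action}}
```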
Inference • Score an image x with activity label y by maximizing over the latent variables • Exact joint inference is NP-hard!
Inference • Approximate by alternating: • Holding the graph structure Gy fixed, infer the action labels hy with loopy belief propagation (BP) • Holding hy fixed, infer Gy with an integer linear program (ILP)
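The alternating scheme above can be sketched as follows. This is an illustrative toy with random scores: ICM stands in for the loopy BP step and an unconstrained greedy edge choice stands in for the ILP, so all names and simplifications here are assumptions, not the thesis's implementation.

```python
import numpy as np

# Toy instance with random scores; in the model these come from learned potentials.
rng = np.random.default_rng(0)
n_persons, n_actions = 5, 3
unary = rng.normal(size=(n_persons, n_actions))     # image-action scores
act2activity = rng.normal(size=n_actions)           # action-activity scores (activity y held fixed)
pairwise = rng.normal(size=(n_actions, n_actions))  # action-action scores
pairwise = (pairwise + pairwise.T) / 2              # make symmetric

def score(h, G):
    """Total score of action labels h under graph structure G (a set of (j, k) pairs)."""
    s = unary[np.arange(n_persons), h].sum() + act2activity[h].sum()
    return s + sum(pairwise[h[j], h[k]] for j, k in G)

def infer(n_rounds=5):
    h = unary.argmax(axis=1)  # initialise each action independently
    G = set()
    for _ in range(n_rounds):
        # Holding G fixed, update each action label greedily
        # (ICM stands in for the loopy BP used in the thesis).
        for j in range(n_persons):
            best_a, best_s = h[j], score(h, G)
            for a in range(n_actions):
                h_try = h.copy()
                h_try[j] = a
                s = score(h_try, G)
                if s > best_s:
                    best_a, best_s = a, s
            h[j] = best_a
        # Holding h fixed, choose the graph. Unconstrained, the optimum keeps
        # exactly the positive-weight edges (the thesis adds structural
        # constraints and solves an ILP instead).
        G = {(j, k) for j in range(n_persons) for k in range(j + 1, n_persons)
             if pairwise[h[j], h[k]] > 0}
    return h, G

h, G = infer()
```

Each half-step never decreases the total score, so the procedure converges to a local maximum of the joint objective.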
Learning with Latent SVM • Optimization: the objective is non-convex; it is solved with the non-convex bundle method (Do & Artières, ICML 09)
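For reference, the standard latent-SVM training objective has the following form (notation assumed, not taken from the thesis: F is the model score, Δ a loss between activity labels, and (xⁿ, yⁿ) the training examples). The second term is a difference of two maxima, which is what makes the objective non-convex and motivates the bundle method.

```latex
\min_{w}\ \frac{1}{2}\lVert w\rVert^{2}
+ C \sum_{n} \Big[ \max_{y,\, h} \big( F(x^{n}, h, y; w) + \Delta(y, y^{n}) \big)
- \max_{h} F(x^{n}, h, y^{n}; w) \Big]
```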
Feature-level Approach • [Figure: the latent structured model, with the feature level (image features x0, x1, …, xn) highlighted]
Feature-level Approach • Model: the same layered model, but each person's feature xi is an Action Context descriptor • [Figure: activity label y, action labels h1, h2, …, Action Context descriptors x1, x2, …, xn, and image x0]
Action Context Descriptor • [Figure (a)–(c): the focal person and surrounding context persons; the spatio-temporal context region with temporal extent τ; the actions of the focal and context persons]
Action Context Descriptor • Each person is represented by a feature descriptor (e.g. HOG by Dalal & Triggs) and scored by a multi-class SVM, yielding one score per action class • The descriptor concatenates the focal person's action scores with the per-class max of the context persons' scores
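A minimal sketch of such a descriptor, assuming a purely spatial context region of radius `radius` (the thesis pools over a spatio-temporal neighbourhood); the function and parameter names are hypothetical:

```python
import numpy as np

def action_context(scores, positions, focal, radius):
    """Action Context descriptor for one focal person.

    scores    : (n_persons, n_actions) multi-class SVM scores, one row per person
    positions : (n_persons, 2) person locations in the image
    focal     : index of the focal person
    radius    : spatial radius of the context region (hypothetical parameter)
    """
    dists = np.linalg.norm(positions - positions[focal], axis=1)
    context = [j for j in range(len(scores)) if j != focal and dists[j] <= radius]
    if context:
        ctx = scores[context].max(axis=0)   # per-class max over context persons
    else:
        ctx = np.zeros(scores.shape[1])     # no one nearby: empty context part
    return np.concatenate([scores[focal], ctx])

# Tiny example: person 0 is focal; only person 1 falls inside the radius.
scores = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])
positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
ac = action_context(scores, positions, 0, radius=2.0)
```

The max-pooling makes the context part invariant to the number and ordering of nearby people, so the descriptor has a fixed length (2 × n_actions) regardless of crowd size.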
Outline • Introduction • Group Activity Recognition with Context • Structure-level (latent structures) • Feature-level (Action Context descriptor) • Experiments
Dataset • Collective Activity Dataset (Choi et al., VS 09) • 5 action categories (per person): crossing, waiting, queuing, walking, talking • 44 video clips
Dataset • Nursing Home Dataset • 2 activity categories (per image): fall, non-fall • 5 action categories (per person): walking, standing, sitting, bending, and falling • 22 video clips in total (2990 frames); 8 clips for testing, the rest for training; 1/3 are labeled as fall
Baselines • root (x0) + SVM (no structure) • no connections in the hidden layer • minimum spanning tree • complete graph within radius r • [Figure: the hidden-layer graph (nodes h1–h4) for each baseline and for the structure-level approach]
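The three fixed-graph baselines can be sketched as follows (illustrative helper names; `pos` holds 2-D person locations):

```python
import math

def edges_none(pos):
    """Baseline: no connections among the hidden nodes."""
    return set()

def edges_mst(pos):
    """Baseline: minimum spanning tree over person positions (Prim's algorithm)."""
    n = len(pos)
    in_tree, edges = {0}, set()
    while len(in_tree) < n:
        best = None  # (distance, node in tree, node outside tree)
        for j in in_tree:
            for k in range(n):
                if k not in in_tree:
                    d = math.dist(pos[j], pos[k])
                    if best is None or d < best[0]:
                        best = (d, j, k)
        edges.add((min(best[1], best[2]), max(best[1], best[2])))
        in_tree.add(best[2])
    return edges

def edges_within(pos, r):
    """Baseline: complete graph over all pairs within distance r."""
    n = len(pos)
    return {(j, k) for j in range(n) for k in range(j + 1, n)
            if math.dist(pos[j], pos[k]) <= r}

# Three collinear people at x = 0, 1, 5.
pos = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
```

In contrast to these fixed constructions, the structure-level approach treats the edge set itself as a latent variable chosen during inference.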
System Overview • Pipeline: Video → Person Detector → Person Descriptor → Model • Person detection: pedestrian detector by Felzenszwalb et al., or background subtraction • Person descriptors: HOG by Dalal & Triggs; LST by Loy et al., CVPR 09
Results – Incorrect Examples • [Figure: misclassified examples for crossing, waiting, walking, talking, and queuing]
Conclusion • A discriminative model for group activity recognition with context • Two new types of contextual information: • group-person interaction • person-person interaction • Structure-level: latent structure • Feature-level: Action Context descriptor • Experimental results demonstrate the effectiveness of the proposed model
Future Work • Modeling Complex Structures • Temporal dependencies among actions • Contextual Feature Descriptors • How to encode discriminative context? • Weakly Supervised Learning • e.g. multiple-instance learning for fall detection
Pairwise Weight • [Figure: the pairwise weight between action nodes hj and hk depends on the activity label y]
Results – Nursing Home Dataset 0/1 loss – optimize overall accuracy
Results – Nursing Home Dataset new loss – optimize mean per-class accuracy
Person Detectors • Collective Activity Dataset: pedestrian detector (Felzenszwalb et al., CVPR 08) • Nursing Home Dataset: background subtraction on the video yields moving regions
Person Descriptors • Collective Activity Dataset: HOG • Nursing Home Dataset: Local Spatial-Temporal (LST) Descriptor (Loy et al., ICCV 09)