Human Activity Recognition at Mid and Near Range

Ram Nevatia, University of Southern California • Based on work of several collaborators: F. Lv, P. Natarajan, S. Lee, C. Huang • International Workshop on Video 2009, May 26, 2009

Activity Recognition: Motivation • Activity is the key content of a video (along with scene description) • Useful for • Monitoring (alerts) • Indexing (forensic, deep analysis, entertainment…) • HCI • …

Activity Recognition: Goals • Goal is not just to give a name, but also a description (not just the verb but a sentence) • Who, what, when, where, why, etc.? • Some of these inferences require object recognition in addition to “action” recognition • Actor, object, instrument… • Context and story understanding are important to infer intent

Action as Change of State • A change in state is given by some function, say f(s, s’, t) • Example: walking changes the position of the walker • An event can also be defined over an interval where some properties of f are constant (or within a certain range) • Example: walking at a constant speed or in the same direction • Recognition methods require some estimate of the state, such as positions or poses of actors, their trajectories and their relation to scene objects
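The interval-based definition above can be illustrated with a small sketch. It is not from the talk: it assumes the state is a 2-D position sampled at a fixed rate, and the function names (`speeds`, `constant_speed_intervals`) and thresholds are hypothetical. It reports the intervals over which speed stays within a given range, as in "walking at a constant speed".

```python
from math import hypot

def speeds(positions, dt=1.0):
    """Per-step speed from (x, y) positions sampled every dt seconds."""
    return [hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]

def constant_speed_intervals(positions, lo, hi, min_len=2, dt=1.0):
    """Return (start, end) step ranges (end exclusive) where lo <= speed <= hi."""
    v = speeds(positions, dt)
    intervals, start = [], None
    for i, s in enumerate(v + [None]):  # the sentinel closes a trailing run
        inside = s is not None and lo <= s <= hi
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if i - start >= min_len:  # ignore runs shorter than min_len steps
                intervals.append((start, i))
            start = None
    return intervals

# Hypothetical track: three steps at unit speed, a pause, then one more step.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 0), (3, 0), (4, 0)]
walking = constant_speed_intervals(track, lo=0.8, hi=1.2)
```

Here `walking` holds the single interval covering the first three steps; the final isolated step is dropped because it is shorter than `min_len`.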

Event Composition • Composite events • Compositions of other, simpler events • Composition is usually, but not necessarily, a sequence operation, e.g. getting out of a car, opening a door and entering a building • Primitive events: those we choose not to decompose, e.g. walking • Primitive events can be recognized directly from observables by using standard classifiers • Graphical models, such as HMMs and CRFs, are natural tools for recognition of composite events
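A minimal sketch of sequence composition, assuming primitive events have already been detected and time-stamped. This is a deterministic stand-in, not the HMM/CRF machinery the slide refers to; the function name `match_sequence`, the tuple layout and the event labels are all hypothetical.

```python
def match_sequence(detections, pattern):
    """Greedily match primitive events to an ordered pattern.

    detections: list of (label, start, end) tuples sorted by start time.
    pattern: list of primitive labels that must occur in this order,
             each starting no earlier than the previous one ended.
    Returns the matched detections, or None if the pattern is incomplete.
    """
    matched, prev_end = [], None
    for label, start, end in detections:
        if len(matched) == len(pattern):
            break
        if label != pattern[len(matched)]:
            continue  # not the primitive we are waiting for
        if prev_end is not None and start < prev_end:
            continue  # overlaps the previous sub-event
        matched.append((label, start, end))
        prev_end = end
    return matched if len(matched) == len(pattern) else None

# Hypothetical primitive detections (label, start frame, end frame).
detections = [
    ("walk", 0, 40),
    ("exit_car", 45, 60),
    ("open_door", 90, 100),
    ("enter_building", 101, 120),
]
pattern = ["exit_car", "open_door", "enter_building"]
result = match_sequence(detections, pattern)
```

A probabilistic model would replace the hard label match with observation likelihoods and the strict ordering with transition probabilities; the sketch only shows the sequencing constraint itself.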

Hierarchical Models • Hierarchical structure of events is naturally reflected in hierarchical graphical models

Issues in Activity Recognition • Variations in image/video appearance due to changes in viewpoint, illumination, clothing, style of activity etc. • Inherent ambiguities in 2-D videos • Reliable detection and tracking of objects, especially those directly involved in activities • Temporal segmentation • “Recognition” of novel events

Mid vs Near Range • Mid-range • Limbs of human body, particularly the arms, are not distinguishable • Common approach is to detect and track moving objects and make inferences based on trajectories • Near-range • Hands/arms are visible; activities are defined by pose transitions, not just the position transitions • Pose tracking is difficult; top-down methods are commonly used

Mid-Range Example • Example of abandoned luggage detection • Based on trajectory analysis and simple object detection/recognition • Uses a simple Bayesian classifier and logical reasoning about order of sub-events • Tested on PETS and ETISEO data

Tracking in Crowded Environments • Results from CVPR09 paper

Dealing with Track Failures • In crowded environments, track fragmentation is common • Events of interest themselves may cause occlusions, e.g. two (or more) people meeting • Possible event detection can trigger a re-evaluation of the tracks • Meeting event example • People must have been separate, then get close to each other and stay together for some time • How to distinguish between passing by and meeting? Both may cause tracks to vanish.
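The meeting definition above (separate, then close, then together for some time) can be sketched as a rule on inter-person distance. This is an illustration only: it assumes two complete, time-aligned tracks, which is exactly what fragmentation and occlusion make hard in practice, and the function name and thresholds (`near`, `far`, `min_stay`) are hypothetical.

```python
from math import hypot

def classify_encounter(track_a, track_b, near=1.5, far=4.0, min_stay=5):
    """Label two aligned (x, y) tracks as 'meeting', 'passing' or 'none'.

    The pair must first be at least `far` apart; the longest run of frames
    within `near` afterwards decides between meeting and passing by.
    """
    was_far, run, longest = False, 0, 0
    for (ax, ay), (bx, by) in zip(track_a, track_b):
        d = hypot(ax - bx, ay - by)
        if d >= far:
            was_far = True
        if was_far and d <= near:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    if longest >= min_stay:
        return "meeting"
    return "passing" if longest > 0 else "none"

# Hypothetical tracks: A stands still; B either stops next to A or walks past.
a = [(0.0, 0.0)] * 12
b_meet = [(max(6.0 - t, 1.0), 0.0) for t in range(12)]
b_pass = [(6.0 - 2.0 * t, 0.0) for t in range(12)]
meet_label = classify_encounter(a, b_meet)
pass_label = classify_encounter(a, b_pass)
```

The duration threshold is what separates the two cases here; when tracks vanish during the encounter, a detected candidate event would instead be used to trigger re-evaluation of the tracks, as the slide suggests.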

Meeting Event Result (Videos) [Video panels: meeting event detection result; tracking result]

Events Requiring Fine Pose Tracking • Many events, e.g. gestures, require tracking of body pose, not just position • Human pose has many degrees of freedom • >50 joint angles/positions • Bottom-up pose tracking approaches are slow and not robust • Top-down approaches attempt to recognize activity and pose simultaneously • Note that data is usually not pre-segmented into primitive action segments • Closed-world assumption

Activity Recognition w/o Tracking [Figure: input sequence; 3D body pose; action segments labeled check watch, punch, kick, pick up, throw]

Difficulties • Viewpoint change & pose ambiguity (with a single camera view) • Spatial and temporal variations (style, speed)

Key Poses and Action Nets • Key poses are determined by an automatic method that computes large changes in energy; key poses may be shared among different actions
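The slide does not spell out the energy measure or the selection rule, so the following is only one plausible reading, stated as an assumption: motion energy is taken as the sum of squared joint-angle differences between consecutive frames, and key-pose candidates are local extrema of that energy that differ from the previously selected extremum by at least `min_change`. All names and values are hypothetical.

```python
def motion_energy(poses):
    """Energy of each frame transition: sum of squared joint-angle differences."""
    return [sum((b - a) ** 2 for a, b in zip(p0, p1))
            for p0, p1 in zip(poses, poses[1:])]

def key_pose_frames(poses, min_change):
    """Return indices i where the energy of transition i -> i+1 is a local
    extremum that differs from the last selected extremum by >= min_change."""
    e = motion_energy(poses)
    keys, last = [], None
    for i in range(1, len(e) - 1):
        is_min = e[i] <= e[i - 1] and e[i] < e[i + 1]
        is_max = e[i] >= e[i - 1] and e[i] > e[i + 1]
        if (is_min or is_max) and (last is None or abs(e[i] - last) >= min_change):
            keys.append(i)
            last = e[i]
    return keys

# Hypothetical one-joint sequence (angle in degrees): raise, hold, lower.
poses = [[0], [1], [11], [21], [22], [22], [12], [2], [1]]
keys = key_pose_frames(poses, min_change=50)
```

For this toy sequence the selected indices mark the end of the raise, the hold, and the end of the lowering motion. Sharing key poses among actions, as the slide mentions, would need an additional clustering step that is not shown.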

Experiments: Training Set • 15 action models • 177 key poses • 6372 nodes in Action Net

Action Net: Apply constraints • 0° • 10° • …

Experiments: Test Set • 50 clips, average length 1165 frames • 5 viewpoints • 10 actors (5 men, 5 women)

Experiments: Results

A Video Result [Video panels: original frame; extracted blob & ground truth; without action net; with action net]

Working with Natural Environments • Foreground segmentation is difficult • Leads to use of lower-level features, e.g. edges and optical flow • Key poses are not discriminative enough w/o accurate segmentation; actor position also needs to be inferred • We introduce the use of continuous pose sequences • More general graphical models that include • Hierarchy • Transition probabilities that may depend on observations • Observations that may depend on multiple states • Duration models (HMMs imply exponentially decaying state durations)
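The remark about duration can be made explicit with the standard HMM result. The symbol $a_{ii}$ for the self-transition probability of state $i$ is notation introduced here, not taken from the slides. The probability of staying in state $i$ for exactly $d$ consecutive steps is geometric:

```latex
P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots
\qquad\text{with}\qquad
\mathbb{E}[d] = \frac{1}{1 - a_{ii}} .
```

Because $P_i(d)$ decays exponentially in $d$, the most likely duration is always a single step, which fits poorly for actions with a typical non-trivial length; an explicit duration model replaces $P_i(d)$ with a distribution chosen or learned per state.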

Experiments • Tested the approach on videos of 6 actions: sit-on-ground (SG), standup-from-ground (StG), sit-on-chair (SC), standup-from-chair (StC), pickup (PK), point (P) • Collected instances of these actions around 4 tilt angles and 5 pan angles • A total of 400 instances over all actions, with various backgrounds • We compared the relative importance of shape, flow and duration features with our system (shape+flow+duration)

Results • Combining flow and shape produces a clear improvement • Bulk of the expense is in computing the flow
