Local spatio-temporal image features for motion interpretation

Ivan Laptev
Computational Vision and Active Perception Laboratory (CVAP), Dept of Numerical Analysis and Computer Science, KTH (Royal Institute of Technology), SE-100 44 Stockholm, Sweden

Motivation
Goal: Interpretation of dynamic scenes
Common methods:
• Camera stabilization
• Segmentation
• Tracking
Common problems:
• Complex & changing background
• Appearance of new objects
⇒ No global assumptions about the scene

Space-time
No global assumptions ⇒ Consider local spatio-temporal neighborhoods
Examples shown: hand waving, boxing

Questions
• How to find informative neighborhoods? (ICCV’03)
• How to deal with transformations in the data? (ICPR’04)
• How to describe the neighborhoods? (SCMVP’04)
• How to use obtained features for applications? (ICPR’04)

Questions
• How to find informative neighborhoods? (ICCV’03)
• How to deal with transformations in the data? (ICPR’04)
• How to describe the neighborhoods? (SCMVP’04)
• How to use obtained features for applications? (ICPR’04)

Space-Time interest points
What neighborhoods to consider?
Distinctive neighborhoods ⇒ High image variation in space and time ⇒ Look at the distribution of the gradient
• Space-time gradient ∇L = (Lx, Ly, Lt)^T
• Covariance Σ
• Spatial scale σ, temporal scale τ

Space-Time interest points
Distribution of ∇L within a local neighborhood: second-moment matrix μ (similar to Harris operator [Harris and Stephens, 1988])
High variation of ∇L ⇒ large eigenvalues of μ ⇒ local maxima of H over (x,y,t)
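A minimal NumPy sketch of the detection step described above. The slide names H but its formula did not survive extraction; the Harris-style form H = det(μ) − k·trace³(μ), the constant k, the integration-scale factor s, and all function names below are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def gauss_kernel(std):
    """Normalized 1-D Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(3.0 * std + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * std**2))
    return k / k.sum()

def smooth(vol, sigma, tau):
    """Separable Gaussian smoothing of a (t, y, x) volume (zero padding at borders).
    Every axis of the volume must be longer than the kernel."""
    out = np.asarray(vol, dtype=float)
    for axis, std in ((0, tau), (1, sigma), (2, sigma)):
        out = np.apply_along_axis(np.convolve, axis, out, gauss_kernel(std), mode="same")
    return out

def interest_operator(f, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Assumed operator H = det(mu) - k * trace(mu)**3 at every voxel."""
    L = smooth(f, sigma, tau)
    Lt, Ly, Lx = np.gradient(L)              # volume axes are (t, y, x)
    grad = (Lx, Ly, Lt)
    mu = np.empty(L.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            # Average gradient products over a window at the integration scale.
            m = smooth(grad[i] * grad[j], np.sqrt(s) * sigma, np.sqrt(s) * tau)
            mu[..., i, j] = m
            mu[..., j, i] = m
    return np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3
```

Interest points would then be taken as local maxima of the returned volume. A structure that does not change over time has Lt = 0, so det(μ) vanishes and H is not positive there; only neighborhoods with variation in both space and time respond.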

Space-Time interest points Motion event detection

Space-Time interest points Motion event detection: complex background

Space-Time interest points
• appearance/disappearance
• accelerations
• split/merge

Relations to psychology
“... The world presents us with a continuous stream of activity which the mind parses into events. Like objects, they are bounded; they have beginnings, (middles,) and ends. Like objects, they are structured, composed of parts. However, in contrast to objects, events are structured in time...” Tversky et al. (2002), in “The Imitative Mind”
• Events are well localized in time and are consistently identified by different people.
• The ability to memorize activities has been shown to depend on how finely we subdivide the motion into units.


Questions
• How to find informative neighborhoods? (ICCV’03)
• How to deal with transformations in the data? (ICCV’03): scale and frequency transformations
• How to describe the neighborhoods? (SCMVP’04)
• How to use obtained features for applications? (ICPR’04)

Spatio-temporal scale selection
Image sequence f can be influenced by changes in spatial and temporal resolution, described by a scale transformation S:
• point transformation: p' = S p
• covariance transformation: Σ' = S Σ S^T

Spatio-temporal scale selection
Want to estimate S from the data ⇒ Estimate spatial and temporal extents of image structures ⇒ Scale selection
• Scale selection in space [Lindeberg IJCV’98]
• Extension to space-time: find normalization parameters a, b, c, d for the scale-normalized derivatives

Spatio-temporal scale selection
Analyze a spatio-temporal blob
Extrema constraints give parameter values a=1, b=1/4, c=1/2, d=3/4

Spatio-temporal scale selection
⇒ The normalized spatio-temporal Laplacian operator assumes extrema values at positions and scales corresponding to the centers and the spatio-temporal extent of a Gaussian blob
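The extremum property can be checked on a Gaussian blob in closed form. The sketch below assumes the normalization is applied as σ^(2a) τ^(2b) (Lxx + Lyy) + σ^(2c) τ^(2d) Ltt with the parameter values from the slides; that exact form, the blob model, and the function names are my assumptions. Smoothing a Gaussian blob of variances (σ0², τ0²) with a Gaussian kernel of variances (σ², τ²) gives a Gaussian of variances (σ0² + σ², τ0² + τ²), so the response at the blob center needs no filtering.

```python
import numpy as np

def normalized_laplacian_at_center(sigma2, tau2, sigma0_2, tau0_2,
                                   a=1.0, b=0.25, c=0.5, d=0.75):
    """Normalized Laplacian at the center of a Gaussian blob, as a function
    of the squared filter scales sigma2 (space) and tau2 (time)."""
    s = sigma0_2 + sigma2                      # spatial variance after smoothing
    T = tau0_2 + tau2                          # temporal variance after smoothing
    g0 = 1.0 / (np.sqrt((2.0 * np.pi) ** 3) * s * np.sqrt(T))
    Lxx = Lyy = -g0 / s                        # second derivatives at the center
    Ltt = -g0 / T
    # sigma^(2a) = (sigma^2)^a, and likewise for the other exponents.
    return sigma2**a * tau2**b * (Lxx + Lyy) + sigma2**c * tau2**d * Ltt

def select_scales(sigma0_2, tau0_2, grid):
    """Return the (sigma^2, tau^2) on the grid with the strongest response."""
    S2, T2 = np.meshgrid(grid, grid, indexing="ij")
    response = normalized_laplacian_at_center(S2, T2, sigma0_2, tau0_2)
    i, j = np.unravel_index(np.argmax(np.abs(response)), response.shape)
    return grid[i], grid[j]
```

For a blob with σ0² = 4 and τ0² = 9, searching a grid that contains both values should return those same scales, which is the behavior the slide describes.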

Space-Time interest points
H depends on μ and, hence, on Σ and the scale transformation S
⇒ adapt interest points by iteratively computing:
• Scale estimation (*)
• Interest point detection (**)
1. Fix an initial Σ
2. For each detected interest point (**)
3. Estimate the scales (*)
4. Update covariance Σ
5. Re-detect using the updated Σ
6. Iterate 3-6 until convergence of the scales and the point position

Spatio-temporal scale selection Stability to size changes, e.g. camera zoom

Spatio-temporal scale selection Selection of temporal scales captures the frequency of events


Questions
• How to find informative neighborhoods? (ICCV’03)
• How to deal with transformations in the data? (ICPR’04): transformations due to camera motion
• How to describe the neighborhoods? (SCMVP’04)
• How to use obtained features for applications? (ICPR’04)
Figure: space-time views (time axis) for a stabilized camera and a stationary camera

Effect of camera motion on local jet descriptors

Galilean transformation G
• point transformation: p' = G p
• covariance transformation: Σ' = G Σ G^T
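A small numeric illustration of the two transformations. The matrix form of G used here, acting on points (x, y, t) as x' = x + vx·t, y' = y + vy·t, t' = t, is the usual Galilean form and is an assumption about the slide's lost formula; the function names are hypothetical.

```python
import numpy as np

def galilean(vx, vy):
    """Galilean transformation for points ordered as (x, y, t)."""
    return np.array([[1.0, 0.0, vx],
                     [0.0, 1.0, vy],
                     [0.0, 0.0, 1.0]])

def transform_point(G, p):
    """Point transformation p' = G p."""
    return G @ p

def transform_covariance(G, Sigma):
    """Covariance transformation Sigma' = G Sigma G^T."""
    return G @ Sigma @ G.T

G = galilean(2.0, -1.0)
p_new = transform_point(G, np.array([3.0, 4.0, 5.0]))
Sigma_new = transform_covariance(G, np.diag([4.0, 4.0, 1.0]))
```

A kernel that is separable in space and time (diagonal Σ) becomes skewed in space-time after the transformation: Σ' has non-zero space-time off-diagonal entries.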

Estimation of G
Want to ”undo” the effect of G
Need to know point correspondences ⇒ Bad
Consider local measurements:
• Space-time gradient ∇L
• Second-moment matrix μ

Estimation of G
Transformations of ∇L and μ under G
Idea: Fix the ”normal” form of μ and estimate G by normalizing.

Estimation of G
⇒ Can solve for G from μ' (*) (similar to Lucas & Kanade optic flow)
... however ⇒ Iterative method for estimating G and Σ:
1. Fix an initial Σ and G
2. Estimate G according to (*)
3. Update Σ
4. Iterate 2-3-4 until convergence of G
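The non-iterative core of step (*) can be sketched as follows. This assumes a pattern translating with velocity u, so that Lt = −(ux·Lx + uy·Ly), and the convention p' = G p with x' = x + ux·t; under these assumptions the observed μ maps back to a form with zero space-time entries via G^T μ G. Sign conventions, the unweighted average, and the function names are my assumptions, and the update of Σ from the iterative method is not included.

```python
import numpy as np

def second_moment(grads, weights=None):
    """Weighted average of outer products of (Lx, Ly, Lt) samples, shape (n, 3)."""
    grads = np.asarray(grads, dtype=float)
    if weights is None:
        weights = np.full(len(grads), 1.0 / len(grads))
    return np.einsum("n,ni,nj->ij", weights, grads, grads)

def estimate_velocity(mu):
    """Least-squares velocity from mu (the Lucas-Kanade normal equations):
    [[mu_xx, mu_xy], [mu_xy, mu_yy]] u = -[mu_xt, mu_yt]."""
    return -np.linalg.solve(mu[:2, :2], mu[:2, 2])

def galilean(vx, vy):
    return np.array([[1.0, 0.0, vx], [0.0, 1.0, vy], [0.0, 0.0, 1.0]])

def normalize(mu, u):
    """Undo the estimated motion: the result has zero x-t and y-t entries."""
    G = galilean(u[0], u[1])
    return G.T @ mu @ G
```

With gradients generated from a known velocity, `estimate_velocity` recovers that velocity and `normalize` returns a matrix whose space-time entries vanish, which is one way to read "fix the normal form of μ and estimate G by normalizing".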

Estimation of G: experiments
Comparison of local jet responses computed in corresponding neighborhoods:
• Non-adapted neighborhoods
• Galilei-adapted neighborhoods

Space-Time interest points
H depends on μ and the velocity transformation G
⇒ adapt interest points by iteratively computing:
• Velocity estimation (*)
• Interest point detection (**)
1. Fix an initial Σ
2. For each detected interest point (**)
3. Estimate the velocity (*)
4. Update covariance Σ
5. Re-detect using the updated Σ
6. Iterate 3-6 until convergence of the velocity and the point position

Adapted interest points
Interest points vs. velocity-adapted interest points, shown for a stabilized camera and a stationary camera

Evaluation: Repeatability
Synthetic experiments: f' is obtained from f by a known G (and mapped back by G^-1)
How many points in f and f' correspond?

Stability of descriptors
Define local jet descriptors
Distance between descriptors at corresponding points

Questions
• How to find informative neighborhoods? (ICCV’03)
• How to deal with transformations in the data? (ICCV’03)
• How to describe the neighborhoods? (SCMVP’04)
• How to use obtained features for applications? (ICPR’04)

Features from human actions

Space-time neighborhoods: boxing, walking, hand waving

Local space-time descriptors
A well-founded choice of local descriptors is the local jet (Koenderink and van Doorn, 1987) computed from spatio-temporal Gaussian derivatives (here at interest points pi)
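A minimal sketch of a local jet read off at one point. It assumes L is an already Gaussian-smoothed volume indexed (t, y, x), approximates derivatives by repeated finite differences, and omits any scale normalization (the slide's normalization formula did not survive extraction). The function name and component ordering are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def local_jet(L, point, order=2):
    """Derivatives of L up to `order` at `point` = (t, y, x) indices.
    For order=2 the components are Lx, Ly, Lt, Lxx, Lxy, Lxt, Lyy, Lyt, Ltt."""
    axis_of = {"x": 2, "y": 1, "t": 0}
    jet = []
    for n in range(1, order + 1):
        for combo in combinations_with_replacement("xyt", n):
            D = L
            for var in combo:                  # differentiate once per variable
                D = np.gradient(D, axis=axis_of[var])
            jet.append(D[tuple(point)])
    return np.array(jet)
```

This recomputes derivative volumes for every component, which is wasteful but keeps the sketch short; in practice one would filter with Gaussian derivative kernels directly.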

Use of descriptors: Clustering
• Group similar points in the space of image descriptors using K-means clustering
• Select significant clusters
Figure labels: clustering into clusters c1, c2, c3, c4; classification
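The two clustering steps can be sketched with a plain K-means in NumPy. The farthest-point initialization and the size threshold used as the "significance" test are assumptions chosen to keep the example deterministic; the slides do not say how clusters were initialized or selected.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])        # farthest from current centers
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:               # keep the old center if empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def significant_clusters(labels, k, min_size):
    """Indices of clusters with at least `min_size` members (assumed criterion)."""
    counts = np.bincount(labels, minlength=k)
    return [j for j in range(k) if counts[j] >= min_size]
```

Descriptors that fall into small clusters (for example an isolated outlier) are discarded, and a new point can be classified by the nearest significant cluster center.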

Use of descriptors: Clustering

Use of descriptors: Matching
• Find similar events in pairs of video sequences

Other descriptors better?
Consider the following choices for a spatio-temporal neighborhood:
• Multi-scale spatio-temporal derivatives
• Projections to orthogonal bases obtained with PCA
• Histogram-based descriptors

Multi-scale derivative filters
Derivatives up to order 2 or 4; 3 spatial scales; 3 temporal scales:
• 9 x 3 x 3 = 81 or 34 x 3 x 3 = 306 dimensional descriptors
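The counts 9 and 34 are the numbers of distinct partial derivatives in (x, y, t) of order 1 up to 2 or 4, excluding the zero-order term. A short check of that arithmetic (function names are hypothetical):

```python
from itertools import product

def derivative_orders(max_order, n_vars=3):
    """All multi-indices (i, j, k) with 1 <= i + j + k <= max_order."""
    return [m for m in product(range(max_order + 1), repeat=n_vars)
            if 1 <= sum(m) <= max_order]

def descriptor_length(max_order, n_spatial_scales=3, n_temporal_scales=3):
    """Number of derivative responses over all scale combinations."""
    return len(derivative_orders(max_order)) * n_spatial_scales * n_temporal_scales
```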

PCA descriptors
• Compute normal flow or optic flow in locally adapted spatio-temporal neighborhoods of features
• Subsample the flow fields to resolution 9x9x9 pixels
• Learn PCA basis vectors (separately for each flow) from features in training sequences
• Project flow fields of the new features onto the 100 most significant eigen-flow-vectors
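The last two steps can be sketched with an SVD-based PCA. The flow computation and subsampling are not shown; flow fields are assumed to arrive as arrays that are flattened into vectors, and the function names are hypothetical.

```python
import numpy as np

def learn_pca_basis(train, n_components=100):
    """Learn a PCA basis from training vectors of shape (n_samples, dim).
    Returns the mean and at most n_components orthonormal basis vectors (rows)."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(flow, mean, basis):
    """Descriptor of a new flow field: coefficients in the PCA basis."""
    return basis @ (np.ravel(flow) - mean)
```

Note that the number of available basis vectors is limited by the number of training samples and the vector dimension, so fewer than 100 components are returned for small training sets.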

Position-dependent histograms
• Divide the neighborhood of each point pi into M^3 subneighborhoods, here M=1,2,3
• Compute space-time gradients (Lx, Ly, Lt)^T or optic flow (vx, vy)^T at combinations of 3 temporal and 3 spatial scales, defined relative to the locally adapted detection scales
• Compute separable histograms over all subneighborhoods, derivatives/velocities and scales
...
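A sketch of the histogram step at a single scale. "Separable" is read here as one marginal histogram per gradient (or flow) component; the bin count, the clipping range, and the per-cell normalization are assumptions, as is the function name.

```python
import numpy as np

def position_dependent_histograms(components, M=2, n_bins=8, value_range=(-1.0, 1.0)):
    """Concatenate per-component histograms over M x M x M subneighborhoods.
    `components` is a sequence of (t, y, x) arrays, e.g. (Lx, Ly, Lt)."""
    lo, hi = value_range
    hists = []
    for comp in components:
        comp = np.clip(np.asarray(comp, dtype=float), lo, hi)  # keep outliers in end bins
        for t_block in np.array_split(comp, M, axis=0):
            for y_block in np.array_split(t_block, M, axis=1):
                for cell in np.array_split(y_block, M, axis=2):
                    h, _ = np.histogram(cell, bins=n_bins, range=(lo, hi))
                    hists.append(h / max(cell.size, 1))        # normalize per cell
    return np.concatenate(hists)
```

The descriptor length is (number of components) x M^3 x n_bins per scale combination; repeating the computation over the scale combinations and concatenating would give the multi-scale version.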

Evaluation: Action Recognition
Database: walking, running, jogging, handwaving, handclapping, boxing
Initially, recognition with Nearest Neighbor Classifier (NNC):
• Take sequences of X subjects for training (S_train)
• For each test sequence s_test find the closest training sequence s_train,i by minimizing the distance
• Action of s_test is regarded as recognized if class(s_test) = class(s_train,i)
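The recognition protocol above can be sketched as follows. Each sequence is assumed to be summarized by one feature vector, and the Euclidean distance stands in for the slide's distance measure, whose formula was lost; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_class(test_feature, train_features, train_labels):
    """Label of the training sequence closest to the test sequence."""
    diffs = np.asarray(train_features, dtype=float) - np.asarray(test_feature, dtype=float)
    return train_labels[int(np.argmin(np.linalg.norm(diffs, axis=1)))]

def recognition_rate(test_features, test_labels, train_features, train_labels):
    """Fraction of test sequences whose nearest training sequence has the same class."""
    hits = sum(nearest_neighbor_class(f, train_features, train_labels) == lab
               for f, lab in zip(test_features, test_labels))
    return hits / len(test_labels)
```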

Results: Recognition rates (all)
Panels: scale and velocity adapted features; scale-adapted features

Results: Recognition rates (Hist)
Panels: scale and velocity adapted features; scale-adapted features

Results: Recognition rates (Jets)
Panels: scale and velocity adapted features; scale-adapted features
