VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD Thesis by: Alex Leykin, Indiana University
Motivation
• Automated tracking and activity recognition is missing from marketing research
• Hardware is already there
• Visual information can reveal a lot about how humans interact with each other
• Can help in making intelligent marketing decisions
Goals
• Process visual information to get a formal representation of human locations (Visual Tracking)
• Extract semantic information from the tracks (Activity Analysis)
Related Work: Detection and Tracking
• Yacoob and Davis, "Learned models for estimation of rigid and articulated human motion from stationary or moving camera", IJCV 2000
• Zhao and Nevatia, "Tracking multiple humans in crowded environment", CVPR 2004
• Haritaoglu, Harwood, and Davis, "W4: Real-time surveillance of people and their activities", PAMI 2000
• Deutscher, North, Bascle, and Blake, "Tracking through singularities and discontinuities by random sampling", ICCV 1999
• Elgammal and Davis, "Probabilistic framework for segmenting people under occlusion", ICCV 2001
• Isard and MacCormick, "BraMBLe: a Bayesian multiple-blob tracker", ICCV 2001
Related Work: Activity Recognition
• Haritaoglu and Flickner, "Detection and tracking of shopping groups in stores", CVPR 2001
• Oliver, Rosario, and Pentland, "A Bayesian computer vision system for modeling human interactions", PAMI 2000
• Buzan, Sclaroff, and Kollios, "Extraction and clustering of motion trajectories in video", ICPR 2004
• Hongeng, Nevatia, and Bremond, "Video-based event recognition: activity representation and probabilistic recognition methods", CVIU 2004
• Bobick and Ivanov, "Action recognition using probabilistic parsing", CVPR 1998
Background Modeling
• Codebook model: each pixel maintains a codebook of codewords
• Each codeword stores a mean color μRGB and brightness bounds Ilow and Ihi
Adaptive Background Update
• Match pixel p to the codebook b:
  - Ilow < I(p) < Ihigh
  - colordist(RGB(p), μRGB) < TRGB
  - t(p)/thigh > Tt1 and t(p)/tlow > Tt2
• If there is no match:
  - if the codebook is saturated, the pixel is foreground
  - else create a new codeword
• Else update the matching codeword with the new pixel information
• If >1 matches, merge the matching codewords
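A minimal sketch of the codebook matching and update logic above, in Python. The codeword fields, thresholds (T_RGB, MAX_CODEWORDS) and the brightness-bound factors are illustrative assumptions, not the thesis parameters.

```python
import numpy as np

T_RGB = 15.0          # max allowed color distance to a codeword (assumed)
MAX_CODEWORDS = 5     # codebook "saturation" limit (assumed)

class Codeword:
    def __init__(self, rgb, brightness, t):
        self.mean_rgb = np.asarray(rgb, dtype=float)
        self.i_low = 0.8 * brightness       # brightness bounds around first observation
        self.i_hi = 1.2 * brightness
        self.count = 1
        self.last_update = t

def matches(cw, rgb, brightness):
    """A pixel matches a codeword if its brightness lies inside the codeword's
    bounds and its color is close to the codeword's mean color."""
    color_dist = np.linalg.norm(np.asarray(rgb, dtype=float) - cw.mean_rgb)
    return cw.i_low < brightness < cw.i_hi and color_dist < T_RGB

def update_pixel(codebook, rgb, t):
    """Return True if the pixel is foreground; otherwise absorb it into the
    codebook by creating or updating codewords."""
    brightness = float(np.mean(rgb))
    matched = [cw for cw in codebook if matches(cw, rgb, brightness)]
    if not matched:
        if len(codebook) >= MAX_CODEWORDS:   # codebook is saturated
            return True                       # -> foreground pixel
        codebook.append(Codeword(rgb, brightness, t))
        return False
    # Update the first matching codeword with the new observation
    cw = matched[0]
    cw.mean_rgb = (cw.mean_rgb * cw.count + np.asarray(rgb, dtype=float)) / (cw.count + 1)
    cw.count += 1
    cw.last_update = t
    if len(matched) > 1:                      # merge duplicate codewords
        for extra in matched[1:]:
            codebook.remove(extra)
    return False
```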
Head Detection
• Vanishing Point Projection (VPP) histogram
• Uses the vanishing point in the Z-direction
Camera Setup
• Two camera types: perspective and spherical
• Mixtures of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)
Camera Modeling
• Perspective projection: X, Y, Z are recovered from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3×4 projection matrix
• Spherical projection: for a pixel at longitude θ and latitude φ, with camera center [Xc, Yc, Zc]:
  X = cos(θ) tan(π − φ)(Zc − Ż)
  Y = sin(θ) tan(π − φ)(Zc − Ż)
  Z = Ż
• Assumption: floor plane Zf = 0
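A minimal sketch of the spherical back-projection formulas above. The offset by the camera's ground position (Xc, Yc) and the numeric values in the example are assumptions added for concreteness; the slide's formulas give coordinates relative to the camera axis.

```python
import math

def spherical_to_world(theta, phi, Zc, Z_dot=0.0, Xc=0.0, Yc=0.0):
    """Back-project a spherical-camera ray at (longitude theta, latitude phi)
    to the horizontal plane Z = Z_dot, for a camera at height Zc."""
    r = math.tan(math.pi - phi) * (Zc - Z_dot)   # horizontal range, as on the slide
    X = Xc + math.cos(theta) * r
    Y = Yc + math.sin(theta) * r
    return X, Y, Z_dot

# Example: a point seen at longitude 30 deg and latitude 120 deg by a camera
# mounted 3 m above the floor; the floor-plane assumption Zf = 0 gives Z_dot = 0.
print(spherical_to_world(math.radians(30), math.radians(120), Zc=3.0))
```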
Tracking
• Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame
• Apply Markov Chain Monte Carlo (MCMC) to estimate the next state xt from the previous state xt−1 and the observation zt
• Proposal moves: add body, delete body, recover deleted body, change size, move
Tracking
• The location of each pedestrian is estimated probabilistically based on:
  - the current image
  - the previous state of the system
  - physical constraints
• The goal of our tracking system is to find the candidate state x′ (a set of bodies along with their parameters) which, given the last known state x, best fits the current observation z:
  P(x′ | z, x) = L(z | x′) · P(x′ | x)
  where L(z | x′) is the observation likelihood and P(x′ | x) is the state prior probability
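A minimal Metropolis-Hastings sketch of the MCMC step described above. The functions log_likelihood(z, x), log_prior(x_new, x_prev) and propose(x) (applying one jump-diffuse move: add/delete/recover body, change size, move) are assumed placeholders, not the thesis implementation, and the proposal is treated as symmetric for simplicity.

```python
import math
import random

def mcmc_track(x_prev, z, log_likelihood, log_prior, propose, n_iter=1000):
    """Sample candidate states x' and return the best-scoring one."""
    current = x_prev
    current_score = log_likelihood(z, current) + log_prior(current, x_prev)
    best, best_score = current, current_score
    for _ in range(n_iter):
        candidate = propose(current)
        cand_score = log_likelihood(z, candidate) + log_prior(candidate, x_prev)
        # Accept with probability min(1, exp(score difference))
        if random.random() < math.exp(min(0.0, cand_score - current_score)):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best
```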
Tracking: Priors
• Constraints on the body parameters: N(hμ, hσ²) and N(wμ, wσ²) for body height and width
• Body coordinates are weighted uniformly within the rectangular region R of the floor map: U(x)R and U(y)R
• Temporal continuity:
  - d(wt, wt−1) and d(ht, ht−1): variation from the previous size
  - d(xt, x′t−1) and d(yt, y′t−1): variation from the Kalman-predicted position
• N(μdoor, σdoor): distance to the closest door (for new bodies)
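A minimal sketch of how the prior terms above could be combined into a log-prior (usable as log_prior in the MCMC sketch). The body representation, the Gaussian parameters and the treatment of the uniform floor-region term (a constant inside R, so it is omitted) are all illustrative assumptions.

```python
import math

H_MU, H_SIG = 1.7, 0.15      # assumed prior on body height (metres)
W_MU, W_SIG = 0.5, 0.10      # assumed prior on body width (metres)

def log_gauss(value, mu, sigma):
    return -0.5 * ((value - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def body_log_prior(body, prev_body=None, kalman_pred=None, doors=()):
    """Combine size priors, temporal-continuity terms and the door prior.
    body: dict with keys x, y, w, h; doors: iterable of (x, y) positions."""
    lp = log_gauss(body["h"], H_MU, H_SIG) + log_gauss(body["w"], W_MU, W_SIG)
    if prev_body is not None:     # temporal continuity on size
        lp += log_gauss(body["w"], prev_body["w"], 0.05)
        lp += log_gauss(body["h"], prev_body["h"], 0.05)
    if kalman_pred is not None:   # deviation from the Kalman-predicted position
        lp += log_gauss(body["x"], kalman_pred[0], 0.3)
        lp += log_gauss(body["y"], kalman_pred[1], 0.3)
    elif doors:                   # new body: prior on distance to the closest door
        d = min(math.hypot(body["x"] - dx, body["y"] - dy) for dx, dy in doors)
        lp += log_gauss(d, 0.0, 1.0)
    return lp
```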
Tracking Likelihoods: Distance Weight Plane
• Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004)
• Solution: employ a "distance weight plane" Dxy = ‖Pxyz − Cxyz‖, where P and C are the world coordinates of the camera and the reference point, respectively
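A minimal sketch of building such a distance weight plane: for every image pixel, store the Euclidean distance between the camera and the 3D point that pixel sees on the floor. The helper back_project(u, v) is an assumption (e.g. the camera model sketched earlier), not part of the slide.

```python
import numpy as np

def distance_weight_plane(width, height, camera_xyz, back_project):
    """Return an image-sized array D with D[v, u] = ||P_xyz - C_xyz||."""
    cam = np.asarray(camera_xyz, dtype=float)
    D = np.zeros((height, width), dtype=float)
    for v in range(height):
        for u in range(width):
            point = np.asarray(back_project(u, v), dtype=float)
            D[v, u] = np.linalg.norm(point - cam)   # distance camera <-> floor point
    return D
```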
Tracking Likelihoods: Z-Buffer
• 0 = background, 1 = furthermost body, 2 = next closest body, etc.
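A minimal sketch of such a body z-buffer: bodies are painted back-to-front so each pixel keeps the label of the closest body covering it, with 0 left for background. The per-body distance and binary mask inputs are assumptions about the data layout.

```python
import numpy as np

def build_z_buffer(frame_shape, bodies):
    """bodies: list of (distance_to_camera, mask) pairs; mask has frame_shape."""
    zbuf = np.zeros(frame_shape, dtype=np.uint16)            # 0 = background
    order = sorted(range(len(bodies)),                        # furthest body first
                   key=lambda i: bodies[i][0], reverse=True)
    for label, i in enumerate(order, start=1):                # 1 = furthermost body
        _, mask = bodies[i]
        zbuf[mask > 0] = label                                # closer bodies overwrite
    return zbuf
```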
Tracking Likelihoods: Color Histogram
• The color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms
• The z-buffer (Z) and the distance weight plane (D) allow the multiple-body configuration to be evaluated in one computationally efficient step
• Let I be the set of all blob pixels and O the set of body pixels
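A minimal sketch of the Bhattacharyya distance between two color histograms, as used for the color likelihood. The binning, color space and the exponential likelihood mapping mentioned at the end are assumptions.

```python
import numpy as np

def bhattacharyya_distance(hist_p, hist_q):
    p = np.array(hist_p, dtype=float)
    q = np.array(hist_q, dtype=float)
    p /= p.sum()                        # normalize to probability distributions
    q /= q.sum()
    bc = np.sum(np.sqrt(p * q))         # Bhattacharyya coefficient in [0, 1]
    return np.sqrt(max(0.0, 1.0 - bc))  # distance: 0 means identical histograms

# The likelihood can then be made to decay with the distance,
# e.g. L = exp(-lam * d**2) for some assumed weighting constant lam.
```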
Tracking: Anisotropic Weighted Mean Shift
• Comparison of the classic mean-shift with our anisotropic weighted mean-shift over frames t−1, t
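A minimal sketch of one weighted mean-shift iteration with an anisotropic (elliptical) support: pixel coordinates are weighted by per-pixel weights (e.g. color-likelihood values) inside an ellipse whose bandwidth differs along x and y. This is an illustrative sketch under those assumptions, not the thesis implementation.

```python
import numpy as np

def anisotropic_mean_shift_step(points, weights, center, bandwidth_xy):
    """points: (N, 2) pixel coords; weights: (N,); bandwidth_xy: (hx, hy)."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    h = np.asarray(bandwidth_xy, dtype=float)
    # Squared normalized distance with a diagonal (anisotropic) bandwidth
    d2 = np.sum(((pts - np.asarray(center, dtype=float)) / h) ** 2, axis=1)
    inside = d2 <= 1.0                     # flat kernel over the elliptical support
    total = np.sum(w[inside])
    if total <= 0:
        return np.asarray(center, dtype=float)
    return (pts[inside] * w[inside, None]).sum(axis=0) / total

# Iterating this step until the center stops moving tracks the mode of the
# weighted pixel distribution under the elliptical window.
```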
Actors and Events
• Shopper groups are formed by individual shoppers who shop together for some amount of time
• More than a fleeting crossing of paths
• Dwelling together
• Splitting and reuniting after a period of time
Swarming
• Shopper groups are detected based on the "swarming" idea in reverse
• Swarming is used in graphics to generate flocking behaviour in animations
• Rules that define flocking behaviour (see the sketch below):
  - Avoid collisions with neighbors
  - Maintain a fixed distance from neighbors
  - Coordinate the velocity vector with neighbors
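A minimal boids-style sketch of the three flocking rules listed above (collision avoidance, spacing, velocity alignment). The neighborhood radius, preferred spacing and weights are illustrative assumptions; the thesis uses the idea "in reverse" to detect groups rather than to animate them.

```python
import numpy as np

def flocking_step(positions, velocities, radius=2.0, spacing=1.0,
                  w_sep=1.0, w_coh=0.5, w_ali=0.5, dt=0.1):
    """positions, velocities: (N, 2) arrays; returns updated copies."""
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    new_vel = velocities.copy()
    for i in range(len(positions)):
        offsets = positions - positions[i]
        dists = np.linalg.norm(offsets, axis=1)
        near = (dists > 0) & (dists < radius)           # neighbors of agent i
        if not near.any():
            continue
        # 1. Avoid collisions: push away from neighbors that are too close
        too_close = near & (dists < spacing)
        sep = -offsets[too_close].sum(axis=0) if too_close.any() else 0.0
        # 2. Maintain spacing: move toward the neighbors' centroid
        coh = offsets[near].mean(axis=0)
        # 3. Coordinate the velocity vector with neighbors
        ali = velocities[near].mean(axis=0) - velocities[i]
        new_vel[i] += dt * (w_sep * sep + w_coh * coh + w_ali * ali)
    return positions + dt * new_vel, new_vel
```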
Tracking Customer Groups
• We treat customers as swarming agents, acting according to simple rules (e.g., stay together with swarm members)
Terminology
• Actors: shoppers (bodies detected in tracking), represented as (x, y, id)
• Swarming events are defined as short-time activity sequences of multiple agents interacting with each other
• Events could be fleeting (crossing paths); later analysis sorts this out and ignores chance encounters
Swarming
• The actors that best fit this model signal a swarming event
• Multiple swarming events are further clustered with fuzzy weights to identify shoppers in the same group over long periods
Event Detection
• Two actors come sufficiently close according to some distance measure, based on:
  - relative position pi = (xi, yi) of actor i on the floor
  - body orientations αi
  - dwelling state δi = {T, F}
• The distance between two agents is a linear combination of co-location, co-ordination and co-dwelling (see the sketch below)
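A minimal sketch of such a pairwise agent distance: a weighted sum of a co-location term (floor distance), a co-ordination term (difference in body orientation) and a co-dwelling term. The weights and the exact form of each term are illustrative assumptions.

```python
import math

def agent_distance(p_i, p_j, alpha_i, alpha_j, dwell_i, dwell_j,
                   w_loc=1.0, w_ori=0.5, w_dwell=0.5):
    """p_*: (x, y) floor positions; alpha_*: orientations in radians;
    dwell_*: booleans for the dwelling state."""
    # Co-location: Euclidean distance on the floor plane
    d_loc = math.hypot(p_i[0] - p_j[0], p_i[1] - p_j[1])
    # Co-ordination: smallest angular difference between body orientations
    d_ori = abs((alpha_i - alpha_j + math.pi) % (2 * math.pi) - math.pi)
    # Co-dwelling: 0 if both actors are dwelling, 1 otherwise
    d_dwell = 0.0 if (dwell_i and dwell_j) else 1.0
    return w_loc * d_loc + w_ori * d_ori + w_dwell * d_dwell
```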
Event Detection
• Perform agglomerative clustering of actors a into clusters C:
  - Initialize: N singleton clusters
  - Do: merge the two closest clusters
  - While not: the validity index I reaches its maximum
• The validity index I combines isolation Ini and compactness Inc
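A minimal sketch of this agglomerative loop: start from singleton clusters, repeatedly merge the two closest clusters (single linkage here), and keep the partition whose validity index is highest. The particular validity index (isolation minus compactness) is a stand-in assumption, not the thesis definition of Ini and Inc.

```python
import numpy as np

def validity(clusters, dist):
    """Illustrative validity index: between-cluster isolation minus
    within-cluster compactness (higher is better)."""
    if len(clusters) < 2:
        return -np.inf
    within = [dist[a][b] for c in clusters for ai, a in enumerate(c) for b in c[ai + 1:]]
    compactness = np.mean(within) if within else 0.0
    isolation = min(dist[np.ix_(a, b)].min()
                    for i, a in enumerate(clusters) for b in clusters[i + 1:])
    return isolation - compactness

def agglomerate(dist):
    """dist: (N, N) pairwise actor distances (e.g. from agent_distance)."""
    clusters = [[i] for i in range(len(dist))]
    best, best_score = [list(c) for c in clusters], validity(clusters, dist)
    while len(clusters) > 1:
        # Find the two clusters with the smallest single-link distance
        bi, bj, bd = 0, 1, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].min()
                if d < bd:
                    bi, bj, bd = i, j, d
        clusters[bi] = clusters[bi] + clusters[bj]
        del clusters[bj]
        score = validity(clusters, dist)
        if score > best_score:                 # keep the partition with maximum validity
            best, best_score = [list(c) for c in clusters], score
    return best
```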
Event Detection: Final Events
(figure: final detected events, plotted against the iteration number)
Activity Detection
• Shopper group detection is accomplished by clustering the short-term events over long time periods
• The events could be separated in time, but they will be part of the same shopper group if the actors are the same (the first term of the event similarity)
Activity Detection
• Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods
• Some definitions:
  - Bei = {bei}: the set of all bodies taking part in an event ei
  - τei and τej: the average times at which events ei and ej happen
Activity Detection
• Define a measure of similarity between two events, combining:
  - the overlap between the two sets of actors
  - the separation in time
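A minimal sketch of such an event similarity: the first term rewards overlap between the two events' actor sets (a Jaccard overlap is assumed here), the second decays with the separation between the events' average times. The combination by product and the decay constant are illustrative assumptions.

```python
import math

def event_similarity(bodies_i, bodies_j, tau_i, tau_j, tau_scale=30.0):
    """bodies_*: sets of actor ids; tau_*: average event times (seconds)."""
    union = set(bodies_i) | set(bodies_j)
    overlap = len(set(bodies_i) & set(bodies_j)) / max(1, len(union))
    time_term = math.exp(-abs(tau_i - tau_j) / tau_scale)   # separation in time
    return overlap * time_term
```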
Activity Detection
• Perform fuzzy agglomerative clustering
• Minimize an objective function where the wij are fuzzy weights, and ρ(.) and ψ(.) are asymmetric variants of Tukey's biweight estimators:
  - ρ(.) is the loss function from robust statistics
  - ψ(.) is the weight function
• Adaptively choose only strong fuzzy clusters
• Label the remaining clusters as activities
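For reference, a minimal sketch of the standard (symmetric) Tukey biweight loss ρ(.) and the associated weight function derived from ψ(.); the thesis uses asymmetric variants, which are not reproduced here. The tuning constant c = 4.685 is the conventional choice.

```python
def tukey_rho(r, c=4.685):
    """Loss: grows like r^2 near zero, saturates for |r| > c (robust to outliers)."""
    if abs(r) <= c:
        return (c ** 2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return c ** 2 / 6.0

def tukey_weight(r, c=4.685):
    """Weight w(r) = psi(r) / r: close to 1 near zero, exactly 0 for |r| > c."""
    if abs(r) <= c:
        return (1.0 - (r / c) ** 2) ** 2
    return 0.0
```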
Results: Swarming Activities Detected in Space-Time
• Dot location: average event location
• Dot size: validity
• Dots of the same color: belong to the same activity
Group Detection
• Partially identified groups: ≥2 people in the group correctly identified
• False positives
• Ground truth (manually determined)
• False negatives (groups missed)
Qualitative Assessments
• Longer paths provide better group detection (p-value << 1)
• Two-person groups are the easiest to detect
• Simple one-step clustering of trajectories is not sufficient for long-term group detection
• Employee tracks pose a significant problem and have to be excluded
• Several groups were missed by the operator in the initial ground truth
• The system caught groups missed by the human expert, after inspection of the results
Contributions
• Background subtraction based on a codebook (RGB + thermal)
• Introduced a head candidate selection method based on the VPP histogram
• Resolved track initialization ambiguity and non-unique body-blob correspondence
• Informed jump-diffuse transitions in the MCMC tracker
• Weight plane and z-buffer improve likelihood estimation
• Anisotropic mean-shift with an obstacle model
• Two-layer formal framework for high-level activity detection
• Implemented robust fuzzy clustering to group events into activities
Future Work • Improved Tracking (via feature points) • Demographical analysis • Focus of Attention • Sensor Fusion • Other Types of Swarming Activities
Questions? Thank you!