Video Event Recognition: Multilevel Pyramid Matching

Video Event Recognition: Multilevel Pyramid Matching • Dong Xu and Shih-Fu Chang • Digital Video and Multimedia Lab • Department of Electrical Engineering • Columbia University • *Thanks to Eric Zavesky for preparing the slides

Video Event Recognition: Problem • Online video search and video indexing • Events are characterized by an evolution of scenes, objects and actions over time • 56 events are defined in LSCOM • Example events: Airplane Flying, Car Exiting

Video Event Recognition: Challenges • Geometric and photometric variations • Cluttered background • Complex camera motion and object motion

Event Recognition: Object Tracking • Figure labels: Object Detection & Localization, Inference, Tracking (example event: “Airplane Landing”) • Detect the object of interest, track it over time, and model its spatio-temporal dynamics • Hard to detect events without explicit object motion, such as a riot

Event Recognition: Key-Frame Based Matching • Only the key frame is used for matching • Extract low-level features, compare them with other frames, and make an overall matching decision • Figure: key-frame feature similarity scores (example values 18%, 15%, 50%)

Event Recognition: Multi-level Pyramid Matching • Figure labels: feature extraction, concept detectors, EMD distance, multi-level pyramid matching

Content Representation: Low-level Features • Edge direction histogram • Gabor texture • Grid color moment (μ, σ, γ per color channel)
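To make the grid color moment feature concrete, here is a minimal sketch. The slide shows only the symbols μ, σ and γ; reading them as mean, standard deviation and skewness (signed cube root of the third central moment) is an assumption, as are the 5x5 grid and the function name `grid_color_moments`.

```python
import numpy as np

def grid_color_moments(image, grid=(5, 5)):
    """Return per-cell color moments of an H x W x C image as one flat vector.

    Assumption: the three moments are mean (mu), standard deviation (sigma)
    and skewness (gamma); the grid size is illustrative only.
    """
    image = np.asarray(image, dtype=float)
    h, w, c = image.shape
    rows, cols = grid
    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = []
    for r in range(rows):
        for q in range(cols):
            cell = image[row_edges[r]:row_edges[r + 1],
                         col_edges[q]:col_edges[q + 1]].reshape(-1, c)
            mu = cell.mean(axis=0)
            sigma = cell.std(axis=0)
            # Signed cube root of the third central moment, per channel.
            gamma = np.cbrt(((cell - mu) ** 3).mean(axis=0))
            feats.extend([mu, sigma, gamma])
    return np.concatenate(feats)
```

With a 3-channel image and a 5x5 grid this yields a vector of 5 x 5 x 3 x 3 = 225 values.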

Content Representation: Mid-level Semantic Concept Scores • Figure labels: Image Database (+/-), Concept Detectors • Train detectors on low-level features • Mid-level semantic concept features are more robust • Developed and released 374 semantic concept detectors

Earth Mover’s Distance (EMD): Approach • Supplier P has a given amount of goods • Receiver Q has a given limited capacity • d_ij: ground distance between frame i of P and frame j of Q • Weights: 1, 1/2, 1/2 in the illustrated example • Solved by linear programming • Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q • Scale variations: a frame from P can be mapped to multiple frames in Q
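The linear program behind EMD can be sketched as follows. This is a minimal illustration of the standard transportation formulation using `scipy.optimize.linprog`, not the authors' implementation; the function name `emd` and the default of equal frame weights (1/m for P, 1/n for Q) are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def emd(dist, w_p=None, w_q=None):
    """Earth Mover's Distance between clips P (m frames) and Q (n frames).

    dist[i, j] is the ground distance d_ij. Weights default to 1/m and 1/n
    (an assumption). Returns (distance, flow matrix).
    """
    dist = np.asarray(dist, dtype=float)
    m, n = dist.shape
    w_p = np.full(m, 1.0 / m) if w_p is None else np.asarray(w_p, dtype=float)
    w_q = np.full(n, 1.0 / n) if w_q is None else np.asarray(w_q, dtype=float)
    # Flow f_ij is flattened row-major: variable index = i * n + j.
    a_rows = np.kron(np.eye(m), np.ones((1, n)))  # sum_j f_ij <= w_p[i]
    a_cols = np.kron(np.ones((1, m)), np.eye(n))  # sum_i f_ij <= w_q[j]
    total = min(w_p.sum(), w_q.sum())             # total flow to be moved
    res = linprog(
        c=dist.ravel(),
        A_ub=np.vstack([a_rows, a_cols]),
        b_ub=np.concatenate([w_p, w_q]),
        A_eq=np.ones((1, m * n)),
        b_eq=[total],
        bounds=(0, None),
    )
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x.reshape(m, n)
    return float((flow * dist).sum() / total), flow
```

For a one-frame clip P and a two-frame clip Q, `emd([[2.0, 4.0]])` splits the single frame's weight 1 into two flows of 1/2, which is the scale-variation case: one frame of P is mapped to multiple frames of Q.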

Multi-level Pyramid Matching: Motivations • One clip = several subclips (stages of event evolution) • No prior knowledge about the number of stages in an event • Videos of the same event may include only a subset of stages • Solution: multi-level pyramid matching in the temporal domain

Multi-level Pyramid Matching: Algorithm • Temporally constrained hierarchical agglomerative clustering (Level-0, Level-1, Level-2) • Alignment of different subclips (Level-1 as an example): EMD distance matrix between sub-clips, integer-value alignment • Fusion of information from different levels • Figure: example clips labeled “Smoke” and “Fire”
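The three steps above can be sketched as follows. The slides do not give the details, so everything here is an assumption made for illustration: merging is restricted to temporally adjacent segments and uses the distance between segment means, the integer-valued alignment is computed as a one-to-one assignment with `scipy.optimize.linear_sum_assignment`, and fusion is a weighted sum with equal weights by default. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def temporal_segments(frames, k):
    """Split a frame sequence into k contiguous subclips.

    Only neighbouring segments may merge (the temporal constraint); the pair
    with the closest mean feature vectors is merged first (an assumption).
    """
    frames = np.asarray(frames, dtype=float)
    segments = [[i] for i in range(len(frames))]
    while len(segments) > k:
        means = [frames[s].mean(axis=0) for s in segments]
        gaps = [np.linalg.norm(means[i] - means[i + 1])
                for i in range(len(segments) - 1)]
        i = int(np.argmin(gaps))
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    return segments

def aligned_distance(subclip_dist):
    """Mean distance under the best one-to-one alignment of subclips.

    subclip_dist[i, j] is the distance (e.g. EMD) between subclip i of one
    clip and subclip j of the other.
    """
    subclip_dist = np.asarray(subclip_dist, dtype=float)
    rows, cols = linear_sum_assignment(subclip_dist)
    return float(subclip_dist[rows, cols].mean())

def fuse_levels(level_distances, weights=None):
    """Weighted sum of per-level distances (equal weights by default)."""
    d = np.asarray(level_distances, dtype=float)
    w = (np.full(len(d), 1.0 / len(d)) if weights is None
         else np.asarray(weights, dtype=float))
    return float(np.dot(w, d))
```

Because the alignment is not forced to preserve order, the first subclip of one clip can match the second subclip of the other, which is how this sketch tolerates stages appearing at different times.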

Pyramid Matching: Projected Illustration

Pyramid Matching: Animated Example

Experiments: Key-frame based feature performance • Evaluation metric: Average Precision • Dataset: TRECVID 2005
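For reference, here is a minimal sketch of non-interpolated average precision over a ranked list. The slides do not state which variant was used (for example, whether the ranked list was truncated at a fixed depth), so the untruncated form below is an assumption.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one event.

    scores: detection scores, higher means more confident.
    labels: 1 (or True) for relevant clips, 0 (or False) otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0
```

With scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, the relevant clips sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2 = 5/6.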

Experiments: EMD concept performance

Experiments: Benefits of multi-level pyramid fusion

Video Event Recognition: Conclusions • Single-level EMD outperforms the key-frame based method • Multi-level pyramid matching further improves event detection accuracy • First systematic study of diverse visual event recognition in the unconstrained broadcast news domain

Thank you very much!
