Video Event Recognition: Multilevel Pyramid Matching. Dong Xu and Shih-Fu Chang, Digital Video and Multimedia Lab, Department of Electrical Engineering, Columbia University. http://www.ntu.edu.sg/home/dongxu dongxu@ntu.edu.sg *Courtesy of Eric Zavesky, who prepared the slides.
Video Event Recognition: Problem • Online video search and video indexing • Events are characterized by an evolution of scenes, objects and actions over time • 56 events are defined in LSCOM (pictured examples: Airplane Flying, Car Exiting)
Video Event Recognition: Challenges • Geometric and photometric variations • Cluttered backgrounds • Complex camera motion and object motion
Event Recognition: Object Tracking • Detect the object of interest, track it over time, and model its spatio-temporal dynamics • Hard to detect events without explicit object motion, such as Riot • [Figure labels: Object Detection & Localization, Inference, Tracking, “Airplane Landing”]
Event Recognition: Key-Frame based Matching • Only the key-frame is used for matching • Low-level features are extracted from the key-frame and compared to other frames, followed by an overall decision on matching • [Figure labels: Keyframe, Feature, Similarity (18%, 15%, 50%, ...)]
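The key-frame baseline can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the slide does not say how the key frame is chosen or which distance is used, so the hypothetical `key_frame` helper simply takes the temporally middle frame and `key_frame_distance` uses the Euclidean distance between feature vectors.

```python
from math import dist


def key_frame(frames):
    """Pick the temporally middle frame as the key frame (an assumption)."""
    return frames[len(frames) // 2]


def key_frame_distance(clip_a, clip_b):
    """Compare two clips using only their key frames.

    Each clip is a list of per-frame feature vectors. All other frames
    are ignored, which is the limitation the slide points out.
    """
    return dist(key_frame(clip_a), key_frame(clip_b))
```

For example, two three-frame clips whose middle frames are `[3.0, 4.0]` and `[0.0, 0.0]` get a distance of 5.0 no matter what the first and last frames contain.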
Event Recognition: Multi-level Pyramid Matching • [Figure labels: feature extraction, concept detectors, EMD distance, multi-level pyramid matching]
Content Representation: Low-level Features • Grid color moment • Gabor texture • Edge direction histogram • [Figure symbols: μ, σ, γ]
Content Representation: Mid-level Semantic Concept Scores • Train detectors on low-level features • Mid-level semantic concept features are more robust • Developed and released 374 semantic concept detectors • [Figure labels: Image Database, +, -, Concept Detectors]
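As a rough illustration of how a bank of concept detectors turns one low-level feature vector into a mid-level score vector, consider the sketch below. It is not the released detectors: the linear scoring function, the sigmoid, and the `concept_scores` helper are all assumptions chosen to keep the example self-contained, and the two concept names are borrowed from the Smoke and Fire example used later in the slides.

```python
from math import exp


def concept_scores(feature, detectors):
    """Map a low-level feature vector to one score in (0, 1) per concept.

    `detectors` maps a concept name to a hypothetical (weights, bias)
    pair; real detectors would be trained on labeled data.
    """
    scores = {}
    for name, (weights, bias) in detectors.items():
        margin = sum(w * x for w, x in zip(weights, feature)) + bias
        scores[name] = 1.0 / (1.0 + exp(-margin))  # squash margin to (0, 1)
    return scores


# Hypothetical detectors for two concepts over a 2-D feature.
detectors = {
    "smoke": ([1.0, -1.0], 0.0),
    "fire": ([0.0, 2.0], -1.0),
}
frame_scores = concept_scores([1.0, 0.0], detectors)
```

With 374 detectors, each frame would be represented by a 374-dimensional score vector in place of its raw low-level features.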
Earth Mover’s Distance (EMD): Approach • Supplier P has a given amount of goods • Receiver Q has a given limited capacity • [Figure: weights 1, 1/2, 1/2 and ground distances d_ij between P and Q] • Solved by linear programming • Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q • Scale variations: a frame from P can be mapped to multiple frames in Q
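For reference, the standard EMD linear program can be written as follows. The notation is an assumption, since the slide only shows d_ij and the weights: clip P has m frames with weights w_{p_i}, clip Q has n frames with weights w_{q_j}, d_{ij} is the ground distance between frame i of P and frame j of Q, and f_{ij} is the flow from i to j.

```latex
\[
\mathrm{EMD}(P, Q) =
\frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \hat{f}_{ij}\, d_{ij}}
     {\sum_{i=1}^{m}\sum_{j=1}^{n} \hat{f}_{ij}},
\qquad
\hat{f} = \arg\min_{f} \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}
\]
\[
\text{s.t.}\quad
f_{ij} \ge 0,\qquad
\sum_{j=1}^{n} f_{ij} \le w_{p_i},\qquad
\sum_{i=1}^{m} f_{ij} \le w_{q_j},\qquad
\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij} = \min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Big)
\]
```

If every frame is weighted equally (w_{p_i} = 1/m and w_{q_j} = 1/n, which would match the 1, 1/2, 1/2 weights in the figure), both totals equal 1 and the denominator drops out. Because flow may go from any i to any j, early frames of P can match late frames of Q (temporal shift), and because one frame's weight may be split across several f_{ij}, a frame of P can match multiple frames of Q (scale variation).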
Multi-level Pyramid Matching: Motivations • One clip = several subclips (stages of event evolution) • No prior knowledge about the number of stages in an event • Videos of the same event may include only a subset of stages • Solution: multi-level pyramid matching in the temporal domain
Multi-level Pyramid Matching: Algorithm • Temporally constrained hierarchical agglomerative clustering • Alignment of different subclips (Level-1 as an example): EMD distance matrix between sub-clips, then integer-value alignment • Fusion of information from different levels • [Figure: Smoke and Fire clips partitioned into sub-clips at Level-0, Level-1 and Level-2]
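The three steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the centroid linkage in `temporal_clusters`, the brute-force search and mean cost in `align`, and the fixed weights in `fuse` are all hypothetical choices. In the full method the sub-clip distance matrix would hold EMD values; here it is just a list of lists.

```python
from itertools import permutations
from math import dist


def centroid(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def temporal_clusters(frames, k):
    """Merge temporally adjacent segments until k contiguous sub-clips remain."""
    segments = [[f] for f in frames]  # every frame starts as its own segment
    while len(segments) > k:
        # Only neighbours in time may merge, so sub-clips stay contiguous.
        gaps = [dist(centroid(segments[i]), centroid(segments[i + 1]))
                for i in range(len(segments) - 1)]
        i = gaps.index(min(gaps))
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    return segments


def align(cost):
    """One-to-one alignment of a square sub-clip distance matrix.

    Returns (mapping, mean matched distance). Brute force is fine for
    the handful of sub-clips per level; it stands in for the
    integer-valued alignment named on the slide.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n)) / n


def fuse(level_distances, weights):
    """Combine per-level distances with a weighted sum (weights assumed)."""
    return sum(w * d for w, d in zip(weights, level_distances))
```

At level l a clip would be split into 2^l sub-clips with `temporal_clusters`, the sub-clips of two clips compared pairwise to fill `cost`, matched with `align`, and the resulting per-level distances combined with `fuse`.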
Experiments: Key-frame based feature performance • Evaluation metric: Average Precision • Dataset: TRECVID 2005
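For readers unfamiliar with the metric, non-interpolated Average Precision over a ranked result list can be computed as in this minimal sketch. It assumes the list contains every relevant item; an official evaluation may instead truncate the list at a fixed depth and normalize by the total number of relevant items in the collection.

```python
def average_precision(ranked_labels):
    """Average Precision for a ranked list of relevance labels.

    `ranked_labels[i]` is truthy when the item at rank i + 1 is relevant.
    Precision is sampled at each relevant rank and then averaged.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0
```

A ranking that places all relevant clips first scores 1.0, and each relevant clip pushed further down the list lowers the score.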
Experiments: Benefits of multi-level pyramid fusion
Video Event Recognition: Conclusions • Single-level EMD outperforms the key-frame based method • Multi-level pyramid matching further improves event detection accuracy • First systematic study of diverse visual event recognition in the unconstrained broadcast news domain