220 likes | 378 Views
Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion. Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong. Consider this question first. Can you find video clips containing fire ?. …. Video 1.
E N D
Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong
Consider this question first Can you find video clips containing fire? … Video 1 … Video 2 … Video 3 time
12,756.32 kilometers http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg
The commercial search engines… • Texts (e.g., tags) are noisy even with human doing the annotation. • Need to look beyond texts!
Recent developments • There has been a great deal of interests in developing content-based approaches for video/image annotation. • Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006. • Fei-Fei, Fergus and Perona, CVPR workshop 2004. • Grauman and Darrell, ICCV 2005. • Lazebnik, Schmid, and Ponce, CVPR 2006. • Jiang, Ngo, Yang, CIVR 2007. sky, mountain, tree, bridge, river…
Recent developments • Bag of visual words J. Sivic et al, ICCV 2003
Our feature representation framework Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear
Limitations • Most existing methods aim at the assignment of concept labels individually • but concepts do not occur in isolation! • Domain change between training and testing data was not considered military personnel smoke building explosion_fire vehicle road outdoor Broadcast News Videos Documentary Videos
Method overview • Domain adaptive semantic diffusion (DASD) road vehicle Water 0.01 0.12 0.91 0.18 0.05 0.11 0.58 0.10 0.13 0.02 0.05 0.19 0.80 0.46 0.13 Jiang, Wang, Chang, Ngo, ICCV 2009
Method overview (cont.) • Domain adaptive semantic diffusion (DASD) • Semantic graph • Nodes are concepts • Edges represent concept correlation • Graph diffusion • Smooth concept detection scores w.r.t the concept correlation road 0.1 0.2 0.8 0.5 0.1 … 0.4 0.1 0.6 0.1 0.1 0.0 … 0.8 0.0 0.4 0.5 0.2 0.8 … 0.7 0.0 0.1 0.9 0.2 0.1 … 0.3 vehicle Water sky
Formulation • Energy function Detection score of concept cion test samples Concept affinity matrix
Formulation (cont.) • Gradually smooth the function makes the detection scores in accordance with the concept relationships Detection score smoothing process
Formulation (cont.) • Domain changes… -- concept affinitymatrix WEAPON 0.24 CLOUDS DESERT 0.17 0.09 CAR 0.29 0.64 0.20 SKY VEHICLE 0.09 PARKING_LOT Documentary Videos Broadcast News Videos
Formulation (cont.) • Graph adaptation Graph adaptation process
Graph adaptation - example Iteration: 8 Iteration: 12 Iteration: 0 Iteration: 4 Iteration: 16 Iteration: 20 Broadcast news video domain Documentary video domain
Experiments • Datasets • TRECVID 2005, 2006, 2007 • Baseline detectors: VIREO-374 • Graph construction: • Ground-truth labels on TRECVID 2005 TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos) WALKING MAP SPORTS WEATHER SPORTS WEATHER OFFICE BUS CLASSROOM PEOPLE-MARCHING CORP. LEADER DESERT MOUNTAIN DESERT WATER NIGHT TIME TELEPHONE MOUNTAIN EXPLOSION- FIRE TRUCK OFFICE BUILDING ANIMAL TWO PEOPLE STREET MILITARY POLICE
Results • Performance gain on TRECVID 05-07 Datasets • SD: semantic diffusion • Consistent improvement over all 3 data sets • DASD: domain adaptive semantic diffusion • Graph adaptation further improves the performance
Results (cont.) • Per-concept performance TRECVID 2006 Test Data
Results (cont.) various baseline detectors (TRECVID-06) Comparison with the state-of-the-arts
Computational time • Complexity is O(mn) • m: # concepts; n: # video shots • Only 2 milliseconds per shot/keyframe! Single image/frame computation time
Summary • A judicious approach using local features achieves very impressive results for visual annotation • Context information is helpful ! • Domain adaptive semantic diffusion • effective for enhancing video annotation accuracy • can alleviate the effect of data domain changes • highly efficient • Future directions include: • detector reliability: diffusion over directed graph • web data annotation: utilize contextual information to improve the quality of tags