1 / 22

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion. Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong. Consider this question first. Can you find video clips containing fire ?. …. Video 1.

mateo
Download Presentation

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong

  2. Consider this question first Can you find video clips containing fire? … Video 1 … Video 2 … Video 3 time

  3. 12,756.32 kilometers http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg

  4. The commercial search engines… • Texts (e.g., tags) are noisy even with human doing the annotation. • Need to look beyond texts!

  5. Recent developments • There has been a great deal of interests in developing content-based approaches for video/image annotation. • Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006. • Fei-Fei, Fergus and Perona, CVPR workshop 2004. • Grauman and Darrell, ICCV 2005. • Lazebnik, Schmid, and Ponce, CVPR 2006. • Jiang, Ngo, Yang, CIVR 2007. sky, mountain, tree, bridge, river…

  6. Recent developments • Bag of visual words J. Sivic et al, ICCV 2003

  7. Our feature representation framework Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear

  8. Recent results on TRECVID

  9. Limitations • Most existing methods aim at the assignment of concept labels individually • but concepts do not occur in isolation! • Domain change between training and testing data was not considered military personnel smoke building explosion_fire vehicle road outdoor Broadcast News Videos Documentary Videos

  10. Method overview • Domain adaptive semantic diffusion (DASD) road vehicle Water 0.01 0.12 0.91 0.18 0.05 0.11 0.58 0.10 0.13 0.02 0.05 0.19 0.80 0.46 0.13 Jiang, Wang, Chang, Ngo, ICCV 2009

  11. Method overview (cont.) • Domain adaptive semantic diffusion (DASD) • Semantic graph • Nodes are concepts • Edges represent concept correlation • Graph diffusion • Smooth concept detection scores w.r.t the concept correlation road 0.1 0.2 0.8 0.5 0.1 … 0.4 0.1 0.6 0.1 0.1 0.0 … 0.8 0.0 0.4 0.5 0.2 0.8 … 0.7 0.0 0.1 0.9 0.2 0.1 … 0.3 vehicle Water sky

  12. Formulation • Energy function Detection score of concept cion test samples Concept affinity matrix

  13. Formulation (cont.) • Gradually smooth the function makes the detection scores in accordance with the concept relationships Detection score smoothing process

  14. Formulation (cont.) • Domain changes… -- concept affinitymatrix WEAPON 0.24 CLOUDS DESERT 0.17 0.09 CAR 0.29 0.64 0.20 SKY VEHICLE 0.09 PARKING_LOT Documentary Videos Broadcast News Videos

  15. Formulation (cont.) • Graph adaptation Graph adaptation process

  16. Graph adaptation - example Iteration: 8 Iteration: 12 Iteration: 0 Iteration: 4 Iteration: 16 Iteration: 20 Broadcast news video domain Documentary video domain

  17. Experiments • Datasets • TRECVID 2005, 2006, 2007 • Baseline detectors: VIREO-374 • Graph construction: • Ground-truth labels on TRECVID 2005 TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos) WALKING MAP SPORTS WEATHER SPORTS WEATHER OFFICE BUS CLASSROOM PEOPLE-MARCHING CORP. LEADER DESERT MOUNTAIN DESERT WATER NIGHT TIME TELEPHONE MOUNTAIN EXPLOSION- FIRE TRUCK OFFICE BUILDING ANIMAL TWO PEOPLE STREET MILITARY POLICE

  18. Results • Performance gain on TRECVID 05-07 Datasets • SD: semantic diffusion • Consistent improvement over all 3 data sets • DASD: domain adaptive semantic diffusion • Graph adaptation further improves the performance

  19. Results (cont.) • Per-concept performance TRECVID 2006 Test Data

  20. Results (cont.) various baseline detectors (TRECVID-06) Comparison with the state-of-the-arts

  21. Computational time • Complexity is O(mn) • m: # concepts; n: # video shots • Only 2 milliseconds per shot/keyframe! Single image/frame computation time

  22. Summary • A judicious approach using local features achieves very impressive results for visual annotation • Context information is helpful ! • Domain adaptive semantic diffusion • effective for enhancing video annotation accuracy • can alleviate the effect of data domain changes • highly efficient • Future directions include: • detector reliability: diffusion over directed graph • web data annotation: utilize contextual information to improve the quality of tags

More Related