
Bridging the Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM)



  1. Bridging the Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi, Beckman Institute, University of Illinois at Urbana-Champaign

  2. LSCOM (Large Scale Concept Ontology for Multimedia) • A broadcast news video dataset • 200+ news videos / 170 hours • 61,901 shots • Languages • English/Arabic/Chinese

  3. Why a broadcast news ontology? • Critical mass of users, content providers, and applications • Good content availability (TRECVID, LDC, FBIS) • Shares a large set of core concepts with other domains

  4. LSCOM Provides • Richly annotated video content for the access and analysis functions required over massive amounts of video • A large-scale, useful, well-defined semantic lexicon • More than 3,000 concepts • 374 annotated concepts • A bridge over the semantic gap, from low-level features to high-level concepts

  5. An LSCOM Concept: 000 - Parade • Concept ID: 000 • Name: Parade • Definition: Multiple units of marchers, devices, bands, banners, or music. • Labeled: Yes

  6. LSCOM Hierarchy • http://www.lscom.org/ontology/index.html Thing .Individual ..Dangerous_Thing ...Dangerous_Situation ....Emergency_Incident .....Disaster_Event ......Natural_Disaster ....Natural_Hazard .....Avalanche .....Earthquake .....Mudslide .....Natural_Disaster .....Tornado ...Dangerous_Tangible_Thing ....Cutting_Device
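
The number of leading dots encodes each node's depth. As a minimal sketch (the parsing logic is ours, not part of the LSCOM release), such a dotted outline can be turned into a nested tree in Python:

```python
# Parse a dotted-depth outline (e.g. "..Dangerous_Thing") into a tree.
# Illustrative sketch only; not official LSCOM tooling.

def parse_outline(lines):
    """Turn a dotted-depth outline into a nested dict tree."""
    root = {"name": "ROOT", "children": []}
    stack = [root]  # stack[d] is the parent for a node at depth d
    for line in lines:
        depth = len(line) - len(line.lstrip("."))
        node = {"name": line.lstrip("."), "children": []}
        stack[depth]["children"].append(node)
        del stack[depth + 1:]   # pop deeper levels
        stack.append(node)      # node becomes the parent at depth + 1
    return root

tree = parse_outline([
    "Thing", ".Individual", "..Dangerous_Thing",
    "...Dangerous_Situation", "....Emergency_Incident",
])
```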

  7. Definition: What is an ontology? (Wikipedia) • An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.

  8. Ontology • Represents the visual knowledge base in a structured way • Graph structure • Tree (hierarchy) structure • Images/videos can be effectively learned and retrieved via the coherence between concepts • Logical coherence • Statistical coherence

  9. An Ontology Hierarchy: Military Vehicle

  10. An example from Wikipedia

  11. Ontology Tree for LSCOM

  12. A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite) • The aim is to span the semantic space with a small set of concepts (39 concepts). • Selection Criteria • Semantic Coverage • As many semantic concepts in news videos as possible should be covered by the lite concept set. • Compactness • These concepts should not semantically overlap. • Modelability • These concepts can be modeled with a smaller semantic gap.

  13. Selected concept dimensions • Divide the semantic space into a multi-dimensional space, where each dimension is nearly orthogonal • Program Category • Setting/Scene/Site • People • Objects • Activities • Events • Graphics

  14. Histogram of LSCOM-Lite Concepts

  15. Some example keyframes

  16. Applications • Application I: Conceptual Fusion (most basic: early fusion) • Application II: Cross-Category Classification (inter-class relations) • Application III: Event Dynamics in Concept Space

  17. Application I: Conceptual Fusion [Diagram: a video's visual features are fed to per-concept detectors (Concept 1, Concept 2, Concept 3, …, Concept n), whose outputs are combined by a classifier]
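
A minimal sketch of that pipeline, with synthetic data standing in for the real features and labels (scikit-learn's SVC is used here as a stand-in for the LIBSVM models mentioned on the next slide):

```python
# Conceptual fusion sketch: per-concept detectors map visual features to
# concept scores, and a second-stage classifier fuses those scores.
# All data below is synthetic placeholder data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))           # placeholder visual features
y = rng.integers(0, 2, size=(200, 5))    # placeholder labels for 5 concepts
target = rng.integers(0, 2, size=200)    # placeholder labels for the fused task

# Stage 1: one detector per concept (stand-in for the LIBSVM models).
detectors = [SVC().fit(X, y[:, k]) for k in range(y.shape[1])]

# Stage 2: fuse the per-concept scores with another classifier.
scores = np.column_stack([d.decision_function(X) for d in detectors])
fused = SVC().fit(scores, target)
```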

  18. LSCOM 374 Models • 374 LIBSVM models • http://www.ee.columbia.edu/ln/dvmm/columbia374/ • Features used (MPEG-7 descriptors) • Color Moments • Edge Histogram • Wavelet Texture • LIBSVM: a library for support vector machines, at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
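
A sketch of scoring one shot with such a pre-trained model, assuming the LIBSVM Python bindings (`pip install libsvm`); the model path and feature values are placeholders, and the real Columbia374 models expect the specific features listed above:

```python
# Sketch: apply a pre-trained LIBSVM concept model to one shot.
# The model path and feature values below are hypothetical placeholders.
from libsvm.svmutil import svm_load_model, svm_predict

model = svm_load_model("columbia374/concept.model")  # hypothetical path
shot = [{i + 1: v for i, v in enumerate([0.1, 0.3, 0.2])}]  # placeholder feature
# True labels are unknown at test time, so pass a dummy label.
labels, _, dec_values = svm_predict([0], shot, model)
```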

  19. Application II: Cross-Category Classification with Concept Transfer • G.-J. Qi et al., Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, CVPR 2011

  20. Instance-Level Concept Correlation [Figure: example images labeled +1/-1 for the concepts Mountain and Castle: mountain and castle together, castle only, mountain only]

  21. Transfer Function [Figure: transfer function over the joint label space: (Mountain, Castle), Mountain only, Castle only, neither of them]

  22. Model Concept Relations

  23. Automatically construct ontology in a data-driven manner

  24. Application III: Event Dynamics in Concept Space

  25. Event Detection with Concept Dynamics • W. Jiang et al., Semantic event detection based on visual concept prediction, ICME, Germany, 2008.

  26. Open Problems • Cross-Dataset Gap • Generalize the LSCOM dataset to other datasets (e.g., non-news video datasets) • Cross-Domain Gap • Text scripts are associated with news videos • Can they help information extraction for visual concepts? • Automatic ontology construction • Task-dependent vs. task-independent • Data-driven vs. prior knowledge (e.g., WordNet) • Incorporate prior human knowledge (logical relations, etc.)

  27. TRECVID Competition • Task 1: High-Level Feature Extraction • Input: subshot • Output: detection results for 39 LSCOM-Lite concepts in the subshot

  28. High-Level Feature Extraction • Each concept is assumed to be binary (absent or present) in each subshot • Submission: find subshots that contain a given concept, rank them by detection confidence score, and submit the top 2000 (see the sketch below) • Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39, using a 50% random sample of all submission pools
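
Mechanically, a run for one concept is just a confidence-sorted truncation of the subshot list (a sketch; the IDs and scores are made up):

```python
# Keep the 2000 most confident subshots for one concept (sketch;
# `results` holds placeholder (subshot_id, confidence) pairs).
results = [("shot1_1", 0.92), ("shot1_2", 0.15), ("shot2_1", 0.78)]
submission = sorted(results, key=lambda r: r[1], reverse=True)[:2000]
```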

  29. 20 Evaluated Concepts

  30. Evaluation Metric: Average Precision • Relevant subshots should be ranked higher than irrelevant ones. • AP = (1/R) · Σ_{j=1}^{N} (R_j / j) · I_j, where R is the total number of relevant images, R_j is the number of relevant images among the top j, and I_j indicates whether the jth image is relevant (1) or not (0).
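
The metric is straightforward to compute from a ranked list of relevance flags, as in this short Python sketch:

```python
# Average precision over a ranked list of 0/1 relevance flags,
# following the formula above.
def average_precision(relevance):
    R = sum(relevance)
    if R == 0:
        return 0.0
    hits, ap = 0, 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            ap += hits / j   # R_j / j at each relevant rank
    return ap / R

print(average_precision([1, 0, 1, 1, 0]))  # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806
```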

  31. Results

  32. TRECVID Competition • Task II: Video Search • Input: 24 text-based topics • Output: relevant subshots in the database

  33. Topics to search

  34. Topics to search (cont’d)

  35. Topics to search

  36. Three Types of Search Systems

  37. Results: Automatic Runs

  38. Results: Manual Runs

  39. Results: Interactive Runs

  40. Machine Problem 7: Shot Boundary Detection in Videos

  41. Goals • Detect abrupt content changes between consecutive frames • Scene changes • Scene cuts

  42. Steps • Step 1: Measure the change of content between video frames • Visual/acoustic measurements • Step 2: Compare the content distance between successive frames; if the distance is larger than a certain threshold, a shot boundary may exist (see the sketch below).
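
A minimal sketch of Step 2, assuming one L1-normalized histogram per frame (the default threshold is a placeholder to be tuned on the labeled data):

```python
# Shot-boundary sketch: flag a boundary wherever the histogram distance
# between consecutive frames exceeds a threshold.
import numpy as np

def detect_boundaries(frame_histograms, threshold=0.4):
    """frame_histograms: one L1-normalized histogram per frame."""
    hists = np.asarray(frame_histograms, dtype=float)
    # L1 distance between each frame and the next
    dists = np.abs(hists[1:] - hists[:-1]).sum(axis=1)
    # A boundary lies between frame i and i + 1 where the change is large
    return [i + 1 for i, d in enumerate(dists) if d > threshold]
```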

  43. Measuring Content Based on Visual Information • 256-dimensional color histogram • In RGB space, normalize r, g, b to [0, 1] • Build an 8×8 histogram over the normalized (nr, ng) color space

  44. Color Histograms • Divide each frame into four parts; each part gets an 8×8 histogram, giving 256-dimensional features in total (see the sketch below).
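
One reading of slides 43-44, as a hedged sketch: an 8×8 histogram over the normalized (nr, ng) chromaticity coordinates for each of the four quadrants; the exact binning in the assignment may differ.

```python
import numpy as np

def rg_histogram(block, bins=8):
    """8x8 histogram over normalized (r, g) chromaticity for one block."""
    rgb = block.reshape(-1, 3).astype(float)
    s = rgb.sum(axis=1) + 1e-8          # avoid divide-by-zero
    nr, ng = rgb[:, 0] / s, rgb[:, 1] / s
    hist, _, _ = np.histogram2d(nr, ng, bins=bins, range=[[0, 1], [0, 1]])
    return hist.ravel() / hist.sum()

def frame_feature(frame):
    """Concatenate 8x8 histograms of the four quadrants -> 256 dims."""
    h, w, _ = frame.shape
    blocks = [frame[:h//2, :w//2], frame[:h//2, w//2:],
              frame[h//2:, :w//2], frame[h//2:, w//2:]]
    return np.concatenate([rg_histogram(b) for b in blocks])
```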

  45. Acoustic Features • 12 cepstral coefficients • Energy (sum of squares of the raw signal) • Zero crossing rate (ZCR): ZCR = sum(abs(sign(S(2:N)) - sign(S(1:N-1)))) • Hint: normalize the energy so it does not dominate when computing distances between successive frames (see the sketch below)
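
A sketch of the energy and ZCR features over fixed-length frames (the frame length is a placeholder), including the normalization hint:

```python
import numpy as np

def frame_energy_zcr(signal, frame_len=512):
    """Per-frame energy and zero-crossing rate for a 1-D signal."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)                       # sum of squares
    zcr = np.abs(np.diff(np.sign(frames), axis=1)).sum(axis=1)
    # Normalize energy so it does not dominate distance computations.
    energy = energy / (energy.max() + 1e-8)
    return energy, zcr
```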

  46. Datasets • Two videos, each a little over one minute long • Shot boundaries are manually labeled

  47. What to submit • Source code • Report • Compare the shot-boundary detections returned by your algorithm with the manually labeled boundaries • Explain your choice of threshold • Explain the differences between the acoustic-based and visual-based detection results

  48. Where and when to submit • Email to ece.ece.ece.417@gmail.com • Due: May 2nd

  49. Thanks! Q&A
