A Framework for the Representation and Integration of Multimedia Content and Context Information. Radu S. Jasinschi, Philips Research. ECE-CMU, April 29, 2002
Overview • Introduction • Related work • Problem statement • Proposed formalism • Representation: content and context • Multimodal integration: • Bayesian networks: content information • Hierarchical priors: context information • Application: Video Scout • Experiments • Conclusion
Introduction • Market facts: • Digital video consumption: 300+ channels • Personal video recorders: the next wave • Web search engines: exponential growth in multimedia information • Research needs: • Content-based video analysis and retrieval of multimedia information • Indexing of high-level video content information • Proposed framework: • Content and context information • Structured representation • Probabilistic integration
Related Work • Video databases • QBIC (IBM) • Informedia (CMU) • Virage • VideoQ (Columbia University) • Probabilistic methods • M. Naphade (IBM) • N. Vasconcelos (Compaq) • Speech-driven applications • C. Neti (IBM) • T. Chen (CMU)
Problem Statement • How do we segment, index, and store many hours of video from 300+ TV channels? • How do we represent and integrate multimodal information?
Proposed Formalism • Structured multimedia representation • Content information • Granularity • Abstraction • Context information • Probabilistic method of multimodal information integration • Bayesian networks • Hierarchical priors
Multimedia Content Information • Multimedia content: objects • Three modalities • visual: shots, faces, trees, etc. • audio: speech, music, etc. • text: transcript, keywords, etc. • Structured content representation • Levels of granularity and abstraction • Allows for the consistent representation and integration of multimedia content information
Structured Content Representation • Content granularity: levels of detail • Content abstraction: semantic information
Multimedia Context Information • Context information • Underlying structure, signature, or patterns • Supports an interpretation but is not an interpretation itself • Can be used to constrain the content information, reducing the number of possible interpretations • Content versus context information
Multimedia Context Taxonomy
Semantic (Textual) Context • Formalized in the linguistic domain • Example: the proposition P (“Holmes is a detective”) is ambiguous on its own • Knowledge of its semantic context, in this case the stories of Sherlock Holmes, disambiguates the statement • Formalization: ist(context-of(“Sherlock Holmes stories”), “Holmes is a detective”) • General structure: ist(C, P), where C is the context and P is the proposition
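The ist(C, P) relation can be read as a lookup: a proposition is asserted only relative to a context. A minimal sketch, assuming a hypothetical knowledge base `KB` that maps each context name to the propositions that hold in it; `KB`, `ist`, and the second context are illustrative names, not part of the talk.

```python
# Hypothetical knowledge base: context name -> propositions asserted in that context.
KB: dict[str, set[str]] = {
    "Sherlock Holmes stories": {"Holmes is a detective"},
    "Hypothetical other context": {"Holmes is a judge"},  # illustrative only
}


def ist(context: str, proposition: str, kb: dict[str, set[str]] = KB) -> bool:
    """Return True if `proposition` is asserted to hold in `context`."""
    return proposition in kb.get(context, set())


print(ist("Sherlock Holmes stories", "Holmes is a detective"))
print(ist("Hypothetical other context", "Holmes is a detective"))
```

The same string "Holmes is a detective" is true in one context and not asserted in the other, which is the disambiguation the slide describes.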
Multimedia Context • Visual context taxonomy
Multimedia Context • Audio context taxonomy
Multimodal Integration • Combine evidence for robustness • Use all modalities: visual, audio, text • Integrate content information • Integrate content and context
Probabilistic Framework • Bayesian networks • Integrate content information at the same granularity level • Intra-modality: same mode, different attributes • Inter-modality: different modes and attributes • Link different levels of granularity • Hierarchical priors • Integrate content and context • Context used as “prior” information for content
Bayesian Network: Example
Bayesian Network: Elements • Directed acyclic graph • Conditional probability • Joint probability distribution
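The three elements fit together through the standard factorization: in a directed acyclic graph, the joint distribution is the product of each node's conditional probability given its parents, P(x1, ..., xn) = Π P(xi | parents(xi)). A minimal sketch on a hypothetical network Topic -> Audio, Topic -> Caption; the structure, labels, and numbers are illustrative assumptions, not the Video Scout network.

```python
from itertools import product

# Hypothetical network: Topic -> Audio, Topic -> Caption. Numbers are illustrative.
P_TOPIC = {"FIN": 0.5, "TALK": 0.5}
P_AUDIO_GIVEN_TOPIC = {
    "FIN": {"speech": 0.7, "music": 0.3},
    "TALK": {"speech": 0.4, "music": 0.6},
}
P_CAPTION_GIVEN_TOPIC = {
    "FIN": {"economy": 0.8, "movie": 0.2},
    "TALK": {"economy": 0.1, "movie": 0.9},
}


def joint(topic: str, audio: str, caption: str) -> float:
    """Joint probability: product of each node's conditional given its parents."""
    return (P_TOPIC[topic]
            * P_AUDIO_GIVEN_TOPIC[topic][audio]
            * P_CAPTION_GIVEN_TOPIC[topic][caption])


def posterior_topic(audio: str, caption: str) -> dict[str, float]:
    """P(Topic | audio, caption), obtained by normalizing the joint over Topic."""
    scores = {t: joint(t, audio, caption) for t in P_TOPIC}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}


# Sanity check: the factorized joint sums to 1 over all configurations.
total = sum(joint(t, a, c) for t, a, c in
            product(P_TOPIC, ["speech", "music"], ["economy", "movie"]))
print(round(total, 10))
print(posterior_topic("speech", "economy"))
```

Here the audio and caption evidence are combined inter-modally through their shared parent, which is the role the slides assign to Bayesian networks.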
Hierarchical Priors: Example
Hierarchical Priors: Elements • Chapman-Kolmogorov equation • Nested priors
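In its discrete form, the Chapman-Kolmogorov equation chains conditionals through an intermediate level, P(a | c) = Σ_b P(a | b) P(b | c), assuming a is conditionally independent of c given b. Nesting this under a top-level prior gives P(a) = Σ_c P(a | c) P(c). A minimal sketch with a hypothetical two-level hierarchy (genre c -> sub-context b -> content label a); all labels and numbers are illustrative assumptions.

```python
# Hypothetical hierarchy: genre c -> sub-context b -> content label a.
P_C = {"news": 0.6, "talk": 0.4}
P_B_GIVEN_C = {
    "news": {"anchor": 0.8, "interview": 0.2},
    "talk": {"anchor": 0.3, "interview": 0.7},
}
P_A_GIVEN_B = {
    "anchor": {"speech": 0.9, "music": 0.1},
    "interview": {"speech": 0.6, "music": 0.4},
}

Cond = dict[str, dict[str, float]]


def chain(p_a_given_b: Cond, p_b_given_c: Cond) -> Cond:
    """Chapman-Kolmogorov: P(a | c) = sum over b of P(a | b) * P(b | c)."""
    out: Cond = {}
    for c, p_b in p_b_given_c.items():
        row: dict[str, float] = {}
        for b, pb in p_b.items():
            for a, pa in p_a_given_b[b].items():
                row[a] = row.get(a, 0.0) + pa * pb
        out[c] = row
    return out


def marginal(p_a_given_c: Cond, p_c: dict[str, float]) -> dict[str, float]:
    """Nested prior: P(a) = sum over c of P(a | c) * P(c)."""
    out: dict[str, float] = {}
    for c, pc in p_c.items():
        for a, pa in p_a_given_c[c].items():
            out[a] = out.get(a, 0.0) + pa * pc
    return out


p_a_given_c = chain(P_A_GIVEN_B, P_B_GIVEN_C)
p_a = marginal(p_a_given_c, P_C)
print(p_a_given_c)
print(p_a)
```

With these numbers, P(speech | news) = 0.9 × 0.8 + 0.6 × 0.2 = 0.84 and P(speech) = 0.84 × 0.6 + 0.69 × 0.4 = 0.78.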
Content and Context Layers • Combine Bayesian networks and hierarchical priors
Application: Video Scout • End-to-end prototype of a personal video recorder • Input • Broadcast TV program video • Electronic program guide (EPG) • Personal profiles: program (PPP) and content (CPP) • Output • TV program segments, segmented and indexed by topic
Video Scout: Overview
Content and Context Layers
TV Programs • Domain structure • Commercials versus program parts • Commercials: short (~30 sec.), fast-paced • Program: long (> 5 min.), specific structure • Multimodal (visual, audio, and transcript) information • Structural correlation • Stochastic nature of multimedia information
TV Program Structure • [Figure: program segments PS 1 to PS 4 alternate with commercials (COMM); program segment PS 1 is divided into program sub-segments PSS 11, PSS 12, …, PSS 1N1, each made up of frames]
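The program hierarchy (program, program segments and commercials, sub-segments, frames) maps naturally onto nested records. A minimal sketch using Python dataclasses; the class and field names are hypothetical, and treating a PSS frame range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ProgramSubSegment:
    """A PSS: a run of frames, assumed inclusive at both ends."""
    start_frame: int
    end_frame: int

    @property
    def num_frames(self) -> int:
        return self.end_frame - self.start_frame + 1


@dataclass
class ProgramSegment:
    """A PS: ordered sub-segments PSS i1 ... PSS iNi."""
    sub_segments: list[ProgramSubSegment] = field(default_factory=list)


@dataclass
class Commercial:
    """A commercial break between program segments."""
    start_frame: int
    end_frame: int


@dataclass
class TVProgram:
    """Program segments and commercials in broadcast order."""
    parts: list[Union[ProgramSegment, Commercial]] = field(default_factory=list)

    def program_segments(self) -> list[ProgramSegment]:
        return [p for p in self.parts if isinstance(p, ProgramSegment)]


# Frame range reported for Letterman PSS # 12 (start_time = 23614, end_time = 24727).
pss12 = ProgramSubSegment(start_frame=23614, end_frame=24727)
print(pss12.num_frames)
```

Keeping commercials as first-class parts preserves broadcast order, so step 1 of the algorithm (commercial segmentation) can be expressed as filtering `parts`.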
Experiments • Input • 9 TV programs (~6 hrs.) • Financial news and talk shows • Features • Visual: keyframes, visual text, faces • Audio: noise (No), speech (Sp), music (Mu), Sp+No, Sp+Sp, and Sp+Mu • Transcript (closed captioning): 20 categories • Output: TV program segments and their classification by topic
Algorithm for Segmentation and Indexing 1. Commercial segmentation 2. Program sub-segment (PSS) generation 3. Frame-based probability generation 4. PSS probability computation: P_AUDIO_FIN, P_AUDIO_TALK, P_CC_FIN, P_CC_TALK, P_FACETEXT_FIN, P_FACETEXT_TALK 5. Combine PSS with context probabilities 6. Compute joint probabilities: P_FIN_TOPIC, P_TALK_TOPIC
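Steps 4 and 5 can be sketched as aggregating frame-level class probabilities into one distribution per PSS and then weighting that distribution by a context prior. The slides do not state either rule; averaging over frames and multiply-then-renormalize are assumptions, and `pss_probabilities` and `apply_context` are hypothetical names.

```python
def pss_probabilities(frame_probs: list[dict[str, float]]) -> dict[str, float]:
    """Step 4 (assumed rule): average frame-level class probabilities over a PSS."""
    if not frame_probs:
        raise ValueError("a PSS must contain at least one frame")
    n = len(frame_probs)
    return {k: sum(f[k] for f in frame_probs) / n for k in frame_probs[0]}


def apply_context(content: dict[str, float],
                  context_prior: dict[str, float]) -> dict[str, float]:
    """Step 5 (assumed rule): weight content by a context prior, then renormalize."""
    weighted = {k: content[k] * context_prior.get(k, 0.0) for k in content}
    z = sum(weighted.values())
    if z == 0.0:
        return dict(content)  # context gives no support: fall back to content only
    return {k: v / z for k, v in weighted.items()}


# Illustrative values only.
content = pss_probabilities([{"FIN": 0.2, "TALK": 0.8}, {"FIN": 0.4, "TALK": 0.6}])
print(content)
print(apply_context(content, {"FIN": 0.25, "TALK": 0.75}))
```

The renormalization keeps the result a probability distribution, so step 6 can treat every modality's output the same way.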
Example: Letterman • CC Categories
Example: Letterman PSS # 12 • CC categories • Mid-level audio probabilities • Mid-level visual feature probabilities
PSS Content Probabilities • PSS # = 12, start_time = 23614, end_time = 24727 (frames) • Visual • P_V_FACE = 0.91, P_V_TEXT = 0.09 • Audio • P_NOISE = 0.11, P_SPEECH = 0.74, P_MUSIC = 0.00, P_SPEECH+NOISE = 0.00, • P_SPEECH+SPEECH = 0.00, P_SPEECH+MUSIC = 0.15 • Transcript (Closed Captions) • P_CC_WEATHER = 0.20, P_CC_INTERNATIONAL = 0.20, P_CC_CRIME = 0.00, P_CC_SPORT = 0.20, P_CC_MOVIE = 0.20, P_CC_FASHION = 0.00, • P_CC_TECH_STOCK = 0.00, P_CC_MUSIC = 0.00, P_CC_AUTOMOBILE = 0.00, P_CC_WAR = 0.00, P_CC_ECONOMY = 0.20, P_CC_ENERGY = 0.00, • P_CC_STOCK = 0.00, P_CC_VIOLENCE = 0.00, P_CC_FINANCIAL = 0.00, P_CC_NATIONAL = 0.00, P_CC_BIOTECH = 0.00, P_CC_DISASTER = 0.00, • P_CC_ART = 0.00, P_CC_POLITICS = 0.00
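Each modality's values for PSS # 12 form a normalized distribution, which can be checked directly. The sketch below uses the reported numbers (zero-valued CC categories omitted) and lists the most probable labels per modality; `pss12` and `top_labels` are illustrative names. Visual and audio each have a single clear winner (FACE, SPEECH), while the five nonzero CC categories tie at 0.20, so the transcript alone does not single out a topic for this PSS.

```python
import math

# Reported probabilities for Letterman PSS # 12 (zero-valued CC categories omitted).
pss12 = {
    "visual": {"FACE": 0.91, "TEXT": 0.09},
    "audio": {"NOISE": 0.11, "SPEECH": 0.74, "MUSIC": 0.00,
              "SPEECH+NOISE": 0.00, "SPEECH+SPEECH": 0.00, "SPEECH+MUSIC": 0.15},
    "cc": {"WEATHER": 0.20, "INTERNATIONAL": 0.20, "SPORT": 0.20,
           "MOVIE": 0.20, "ECONOMY": 0.20},
}


def top_labels(dist: dict[str, float]) -> list[str]:
    """All labels tied for the highest probability, sorted alphabetically."""
    best = max(dist.values())
    return sorted(k for k, v in dist.items() if math.isclose(v, best))


for modality, dist in pss12.items():
    # Each modality should be a proper distribution over its classes.
    assert math.isclose(sum(dist.values()), 1.0), modality
    print(modality, top_labels(dist))
```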
Audio Genre Context Probabilities
Visual Genre Context Probabilities
Audio Genre Context Extraction 1. Select TV programs of a known genre 2. Segment commercials 3. Tessellate the program part into units, such as PSSs based on closed captions 4. Determine a probability for each PSS based on the vote/probability table 5. Sum the votes for each vote/probability 6. Select the best vote/probability: context (probability) pattern
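The vote/probability table itself is not reproduced in the text, so the sketch below rests on an assumed reading of steps 4 to 6: each PSS of a known-genre program votes for its most probable audio class, the votes are tallied, and the normalized tally serves as the genre's context probability pattern. `context_pattern` and the example values are hypothetical.

```python
from collections import Counter


def context_pattern(pss_probs: list[dict[str, float]]) -> dict[str, float]:
    """Hypothetical vote-based context extraction.

    Each PSS votes for its most probable audio class (assumed rule); the
    vote counts, normalized to sum to 1, form the context probability pattern.
    """
    votes = Counter(max(p, key=p.get) for p in pss_probs)
    total = sum(votes.values())
    return {label: n / total for label, n in votes.items()}


# Illustrative PSS-level audio probabilities for one known-genre program.
example = [
    {"SPEECH": 0.7, "MUSIC": 0.3},
    {"SPEECH": 0.2, "MUSIC": 0.8},
    {"SPEECH": 0.9, "MUSIC": 0.1},
    {"SPEECH": 0.6, "MUSIC": 0.4},
]
print(context_pattern(example))
```

The mode of this pattern would be the "best" vote under this reading; keeping the full normalized tally lets it be reused as a prior in step 5 of the segmentation algorithm.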
Vote/Probability Table and Results
Vote/Probability Results: News
Combining Content & Context • Final multimodal joint probabilities: P_FACETEXT_FIN = 0.0, P_FACETEXT_TALK = 1.0; P_AUDIO_FIN = 0.0, P_AUDIO_TALK = 1.0; P_CC_CAT_FIN = 0.5, P_CC_CAT_TALK = 0.5 • Final joint topic probabilities: P_FIN-TOPICS = 0.0, P_TALK-TOPICS = 0.5 • Accumulated classification results for the first 12 segments: # of FIN SEGS = 2, # of TALK SEGS = 10, Comm. = 0
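The reported joint topic probabilities are consistent with multiplying the three modality probabilities: 0.0 × 0.0 × 0.5 = 0.0 for FIN and 1.0 × 1.0 × 0.5 = 0.5 for TALK. The slide does not name the fusion rule, so the product below is an assumption that merely reproduces the reported numbers; `joint_topic` is a hypothetical name.

```python
# Reported modality-level probabilities for the Letterman example.
modality_probs = {
    "FACETEXT": {"FIN": 0.0, "TALK": 1.0},
    "AUDIO": {"FIN": 0.0, "TALK": 1.0},
    "CC_CAT": {"FIN": 0.5, "TALK": 0.5},
}


def joint_topic(probs: dict[str, dict[str, float]], topic: str) -> float:
    """Assumed fusion rule: product of the per-modality probabilities for `topic`."""
    result = 1.0
    for dist in probs.values():
        result *= dist[topic]
    return result


p_fin = joint_topic(modality_probs, "FIN")
p_talk = joint_topic(modality_probs, "TALK")
label = "TALK" if p_talk > p_fin else "FIN"
print(p_fin, p_talk, label)  # 0.0 0.5 TALK
```

A product rule lets a single modality with zero support (here face/text and audio for FIN) veto a topic, while the tied CC probabilities only scale both topics equally.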
Classification Results: Content and Context Integration • Precision: 91.4%, Recall: 85.7% • Precision: 81.1%, Recall: 86.9%
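For reference, the reported figures follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), computed per topic class. The segment counts behind the percentages are not given, so the sketch uses hypothetical counts; returning 0.0 for an empty denominator is a convention chosen here.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Returns 0.0 for a metric whose denominator is zero (a chosen convention).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts, not the ones behind the reported percentages.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.1%}, recall={r:.1%}")
```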
Classification Results: Financial News with and without Integration • [Figure panels: with context/content integration; no context/content integration]
Classification Results: Talk Show with and without Integration • [Figure panels: with context/content integration; no context/content integration]
Conclusion • Novel multimedia framework: • Representation: • Content data tessellation: granularity • Content semantic structure: abstraction • Multimedia context • Multilayered content/context structure • Multimodal integration: • Content and context • Probabilistic method: • Bayesian networks • Hierarchical priors • Video Scout: beyond the TiVo paradigm
Achievements • Exhibitions • Philips Corporate Research Exhibition (CRE) 2001 • ICME 2000 Exhibition • ACM 2000 Exhibition • Customer presentation 2000 • 7 papers • 5 International conferences (presented) • 1 International conference (accepted) • 1 Journal paper (submitted) • 14 Patents (filed)
Acknowledgement • CIM team that collaborated in this work: • Nevenka Dimitrova • Lalitha Agnihotri • Jennifer Louie • Thomas McGee • Radu Jasinschi • Dongge Li • Mei Shi • John Zimmerman