Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues
Alex Hauptmann, Howard Wactlar
Carnegie Mellon University, Pittsburgh, USA
October 2004
Our Meaning of Contexture
Definition: The weaving or assembly of [multimedia] parts into a cohesive whole in order to provide a more complete picture or information structure [to both questions and answers]
• Interpreting and communicating an associated visual and verbal context to information
• May contain language, imagery and gestures
• May illuminate the meaning or significance
• May explain its circumstances
• More like a collegial expert response than an encyclopedic source
• Accelerating discovery by both system and analyst
• Understanding video perspectives
• Subtle opinions, attitudes, biases
• Both visual and textual rhetoric
• Continuously updating video biographs and event timelines
Conceptual Overview (diagram)
Inputs: multiple multimedia information sources (foreign sources, domestic sources) and structured data. Other components shown: Analyst, Analyst’s Profile & QA History, Visualizations, synthesized video clips, contextual info.
• Extract Semantic Data: scene classification; event detection; title/topic labeling; named entity extraction; verify entities with structured data
• Context Analysis: semantic relations on entities; harden with structured data; perspective interpretation; video ontology
• Generate Information Contexture: understand questions; provide context-rich answers; produce video biographs; enable context-based iterative QA process
Biograph data example (entry types: interviewee, monologue, word quotation, visual quotation, people association; captions are truncated in the slide):
• Being an interviewee in a news program as a presidential candidate for …
• Being a CNN Military Analyst on military deployment in preparation for the wa…
• A word quotation of Retired Wesley Clark describing for the president in the …
• Being a CNN Military Analyst on action in the Iraq War.
• A visual quotation of Wesley Clark (referred commander) describing President Milosevic’s intent in …
• Wesley Clark (without being mentioned in the video clip) sitting next to Madeleine … hearing on a resolution that would direct President Clinton to …
Timeline dates shown: 9/11/2001, 3/18/1998, 7/2/1999, 9/19/2001, 3/19/2003, 9/17/2003
Scope of Work
• Contexture dialogue
• Understanding multimedia questions and their context
• Extracting information from video sources for finding answers
• Understanding the bias of source, topical, or rhetorical perspectives
• Applying broadcast TV news ontology (joint with USC)
• Incorporating video biographs and perspectives into answer contextures
• Learning from the analyst
• Integration and evaluation
Understanding Multimedia Questions
• Find shots of Pope John Paul II.
• Find shots of a rocket or missile taking off.
• Find shots of the Tomb of the Unknown Soldier at Arlington National Cemetery.
• Find shots of the front of the White House in the day-time with the fountain running.
Automatic Video Retrieval System (diagram)
A multi-modal query (example: Pope John Paul II) is run against the video library by multiple modality video analysis experts: speech transcript, video OCR, audio feature, color feature, texture feature, semantic class filter, … A weighted fusion of their similarity rankings produces the final ranked list of video shots.
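The weighted fusion step in this diagram can be sketched as follows. This is only an illustration, not the Informedia implementation: the expert names, scores, and weights are made up, and min-max normalisation of each expert's scores is an assumption the slide does not state.

```python
from collections import defaultdict

def fuse_rankings(expert_scores, weights):
    """Combine per-expert similarity scores into one ranked list of shots.

    expert_scores: {expert_name: {shot_id: similarity score}}
    weights:       {expert_name: non-negative combination weight}
    """
    fused = defaultdict(float)
    for expert, scores in expert_scores.items():
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        for shot, score in scores.items():
            # Assumption: min-max normalise so experts are on a common scale.
            fused[shot] += weights.get(expert, 0.0) * (score - lo) / span
    # Highest fused score first; ties broken by shot id for determinism.
    return sorted(fused, key=lambda shot: (-fused[shot], shot))

# Hypothetical scores from three experts for three shots.
expert_scores = {
    "speech_transcript": {"shot_a": 0.9, "shot_b": 0.2, "shot_c": 0.1},
    "video_ocr": {"shot_a": 0.3, "shot_b": 0.8},
    "color_feature": {"shot_b": 0.5, "shot_c": 0.7},
}
weights = {"speech_transcript": 0.6, "video_ocr": 0.3, "color_feature": 0.1}

ranking = fuse_rankings(expert_scores, weights)
print(ranking)
```

A shot that an expert does not return simply receives no contribution from that expert, which is one of several reasonable choices for missing scores.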
Finding the combination weights (diagram)
Offline: training queries and training data are used to learn weights. Online: the query is classified, run against the video library, and the similarity rankings from multiple experts are combined with the learned weights.
Query Types for Video Retrieval
• Named person queries, possibly with constraints: “Find shots of Yasser Arafat”, “Find shots of Ronald Reagan speaking”.
• Named object queries for an object with a unique name: “Find shots of the Statue of Liberty”, “Find shots of the Mercedes logo”.
• General object queries for a type of object, possibly qualified: “Find shots of snow-covered mountains”, “Find shots of one or more cats”.
• Scene queries for multiple types of objects in certain relationships: “Find shots of roads with lots of vehicles”, “Find shots of people spending leisure time on the beach”.
Finding the Combination Weights for Merging Search Results
• Uniform, fixed weights for all queries
• Individual weightings for each query
• Not enough known about each query
• Weightings for each of 4 query types
• Text search usually does better and is more consistent than any other single search modality
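One way to picture "weightings for each of 4 query types" is to learn a separate weight vector per type from training queries. The sketch below uses a coarse grid search that maximises mean average precision; the slides do not say how the weights were actually learned, so the search method, the two expert names, and the training data are all assumptions for illustration.

```python
from itertools import product

def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant shots."""
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranking, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def fuse(expert_scores, weights):
    """Rank shots by the weighted sum of expert scores (missing score = 0)."""
    shots = {s for scores in expert_scores.values() for s in scores}
    fused = {s: sum(w * expert_scores[e].get(s, 0.0) for e, w in weights.items())
             for s in shots}
    return sorted(shots, key=lambda s: (-fused[s], s))

def learn_weights(training_queries, experts, steps=4):
    """Grid-search one weight vector per query type (weights sum to 1)."""
    grid = [i / steps for i in range(steps + 1)]
    candidates = [c for c in product(grid, repeat=len(experts))
                  if abs(sum(c) - 1.0) < 1e-9]
    by_type = {}
    for q in training_queries:
        by_type.setdefault(q["type"], []).append(q)
    learned = {}
    for qtype, queries in by_type.items():
        def mean_ap(candidate):
            w = dict(zip(experts, candidate))
            aps = [average_precision(fuse(q["scores"], w), q["relevant"])
                   for q in queries]
            return sum(aps) / len(aps)
        learned[qtype] = dict(zip(experts, max(candidates, key=mean_ap)))
    return learned

# Hypothetical training queries: one per query type, two experts.
experts = ["text", "color"]
training_queries = [
    {"type": "person", "relevant": {"s1"},
     "scores": {"text": {"s1": 0.9, "s2": 0.2, "s3": 0.1},
                "color": {"s1": 0.1, "s2": 0.3, "s3": 1.0}}},
    {"type": "scene", "relevant": {"s3"},
     "scores": {"text": {"s1": 0.5, "s2": 0.4, "s3": 0.1},
                "color": {"s1": 0.1, "s2": 0.2, "s3": 0.9}}},
]
learned = learn_weights(training_queries, experts)
print(learned)
```

With this toy data the person query ends up favouring the text expert and the scene query the color expert, which is the kind of type-dependent behaviour the per-type weighting is meant to capture.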
Query Classification
Query X is classified in three stages:
• Named-entity extraction
  • Person name: Person Q, e.g. “Find shots of Bill Clinton”
  • Organization or location name: Specific Object Q, e.g. “Find shots of Capitol Hill”
  • No proper name: continue with POS tagging + NP chunking
• POS tagging + NP chunking
  • Multiple NPs: Scene Q, e.g. “Find shots with (multiple pedestrians) and (multiple vehicles in motion)”
  • Single NP: continue with syntactic parsing
• Syntactic parsing
  • Nested NPs: Scene Q, e.g. “Find shots of (a person diving into the water)”
  • No nested NP: General Object Q, e.g. “Find shots of (one or more cats)”
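The decision tree above can be written down as a small function. This is a sketch only: it assumes the named-entity extractor, NP chunker, and parser have already run, and the `AnalyzedQuery` container and its field names are hypothetical. Checking person names before organization or location names is also an assumption, since the slide does not say which wins when both occur.

```python
from dataclasses import dataclass, field

@dataclass
class AnalyzedQuery:
    """Hypothetical container for the output of the language analysis steps."""
    text: str
    person_names: list = field(default_factory=list)
    org_or_location_names: list = field(default_factory=list)
    noun_phrases: list = field(default_factory=list)
    has_nested_np: bool = False

def classify_query(q):
    # Stage 1: named-entity extraction.
    if q.person_names:
        return "person"
    if q.org_or_location_names:
        return "specific object"
    # Stage 2: POS tagging + NP chunking (no proper name found).
    if len(q.noun_phrases) > 1:
        return "scene"
    # Stage 3: syntactic parsing of the single noun phrase.
    return "scene" if q.has_nested_np else "general object"

examples = [
    AnalyzedQuery("Find shots of Bill Clinton", person_names=["Bill Clinton"]),
    AnalyzedQuery("Find shots of Capitol Hill",
                  org_or_location_names=["Capitol Hill"]),
    AnalyzedQuery("Find shots with multiple pedestrians and multiple vehicles in motion",
                  noun_phrases=["multiple pedestrians", "multiple vehicles in motion"]),
    AnalyzedQuery("Find shots of a person diving into the water",
                  noun_phrases=["a person diving into the water"], has_nested_np=True),
    AnalyzedQuery("Find shots of one or more cats",
                  noun_phrases=["one or more cats"]),
]
labels = [classify_query(q) for q in examples]
print(labels)
```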
Hierarchical Mixture of Experts (diagram)
The query and its query type feed a hierarchical mixture of experts that combines text retrieval with retrieval experts 1 … n to rank video shots.
Current Limitations
• Unable to assign multiple query types to one query
  • “Finding Bill Clinton speaking in front of a US flag” (person, object)
• Unable to capture the query-specific aspects
  • “Finding day-time scenes of the Federal Reserve Building, Washington DC”
Scope of Work: Extracting information from video sources for finding answers
Labeling Every Face with a News Structure Model (NSM)
Sources of information:
• Audio transcripts + Named Entity extraction
• Overlaid text
• Speaker audio characteristics
• Temporal position of name w.r.t. video segment
• Temporal structure of news (“Grammar”)
• Constraints based on image similarity
• Constraints from speaker audio similarity
Baseline Algorithm
For each shot s:
• If transcript clues exist for an anchor, or s is the first shot in the story: anchor name
• Otherwise, if transcript clues exist for a reporter: reporter name(s) by distance
• Otherwise: news-subject name(s) by distance
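The baseline decision procedure can be sketched as below. It assumes that "by distance" means the temporal distance between a name mention in the transcript and the shot, which the slide does not spell out. The shot dictionary keys, the placeholder names, and the timestamps are hypothetical.

```python
def names_by_distance(mentions, shot_time):
    """Order distinct names by how close their nearest mention is to the shot.

    mentions: list of (name, time_in_seconds) pairs from the transcript.
    """
    nearest = {}
    for name, t in mentions:
        d = abs(t - shot_time)
        if name not in nearest or d < nearest[name]:
            nearest[name] = d
    return sorted(nearest, key=lambda n: (nearest[n], n))

def label_face(shot, anchor_name, reporter_mentions, subject_mentions):
    """Return (role, candidate names) for one shot, following the baseline rules."""
    if shot["anchor_clue"] or shot["first_in_story"]:
        return "anchor", [anchor_name]
    if shot["reporter_clue"]:
        return "reporter", names_by_distance(reporter_mentions, shot["time"])
    return "news-subject", names_by_distance(subject_mentions, shot["time"])

# Hypothetical story with placeholder names and times in seconds.
reporter_mentions = [("Reporter One", 12.0)]
subject_mentions = [("Subject A", 20.0), ("Subject B", 48.0), ("Subject A", 50.0)]

first = label_face({"time": 0.0, "first_in_story": True,
                    "anchor_clue": False, "reporter_clue": False},
                   "Anchor Name", reporter_mentions, subject_mentions)
middle = label_face({"time": 14.0, "first_in_story": False,
                     "anchor_clue": False, "reporter_clue": True},
                    "Anchor Name", reporter_mentions, subject_mentions)
late = label_face({"time": 47.0, "first_in_story": False,
                   "anchor_clue": False, "reporter_clue": False},
                  "Anchor Name", reporter_mentions, subject_mentions)
print(first, middle, late)
```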
Overlaid Text with Video OCR
Overlaid text: Rep. NEWT GINGRICH
VOCR text: rgp nev~j ginuhicij
Edit distance to names:
• Bill Clinton (0.67)
• Newt Gingrich (0.46)
• David Ensor (0.72)
• Saddam Hussein (0.78)
• Elizabeth Vargas (0.88)
• Bill Richardson (0.80)
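Matching noisy VOCR output against a list of candidate names can be illustrated with a normalised Levenshtein distance. The normalisation used here (dividing by the longer string's length after lower-casing) is an assumption, so the scores it produces need not match the values on the slide; only the idea of picking the closest name is the same.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def normalized_distance(ocr_text, name):
    """Edit distance scaled to [0, 1]; 0 means identical (assumed normalisation)."""
    a, b = ocr_text.lower(), name.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def best_name(ocr_text, candidates):
    """Candidate name with the smallest normalised distance to the OCR text."""
    return min(candidates, key=lambda n: normalized_distance(ocr_text, n))

names = ["Bill Clinton", "Newt Gingrich", "David Ensor",
         "Saddam Hussein", "Elizabeth Vargas", "Bill Richardson"]
best = best_name("rgp nev~j ginuhicij", names)
print(best)
```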
Detection of Anchors, Reporters and News-Subjects
(Figure: a sequence of shots labeled anchor, news-subject, reporter, news-subject, reporter, news-subject, anchor, reporter.)
Visual Gender Classification
(Diagram labels: face detection, original scale, feature extraction with Haar wavelets, boosting classifiers, output of male or female. Example outputs in the figure are marked Correct or Error.)
Scope of Work: Understanding the bias of source, topical, or rhetorical perspectives
Show length and shot type (figure; duration labels: 2 min 24 sec 22 sec 12 sec)
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq. Perspective of broadcaster can be seen in text overlay
FOX and CNN news coverage of David Kay Report on the search for WMD in Iraq. FOX uses faster cut rate and has more participation by the anchor
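Cut rate and anchor participation can both be computed from a list of shots with a role and a duration. The sketch below shows one plausible definition of each (cuts per minute, and the anchor's share of total air time); the slides do not define the measures, and the two broadcasts and their shot lists are invented and do not represent the FOX or CNN footage.

```python
def cut_rate(shots):
    """Cuts per minute: shot boundaries divided by total duration in minutes.

    shots: list of (role, duration_in_seconds) pairs in broadcast order.
    """
    total = sum(duration for _, duration in shots)
    cuts = max(len(shots) - 1, 0)
    return cuts / (total / 60.0) if total > 0 else 0.0

def anchor_share(shots):
    """Fraction of total air time spent on shots whose role is 'anchor'."""
    total = sum(duration for _, duration in shots)
    anchor = sum(duration for role, duration in shots if role == "anchor")
    return anchor / total if total > 0 else 0.0

# Invented shot lists for two hypothetical broadcasts of the same story.
broadcast_a = [("anchor", 10), ("field", 5), ("anchor", 10), ("field", 5)]
broadcast_b = [("anchor", 20), ("field", 40)]

rate_a, rate_b = cut_rate(broadcast_a), cut_rate(broadcast_b)
share_a, share_b = anchor_share(broadcast_a), anchor_share(broadcast_b)
print(rate_a, rate_b, share_a, share_b)
```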
Scope of Work: Integration and evaluation
Metrics-based Evaluations
NIST TRECVID 2004 Video Search Evaluation
• Submitted classification results for 10 different semantic features
  • Similar to a “Routing Task” for video clips
• Submitted Informedia system video search answers for
  • Interactive runs comparing expert/novice users
  • Interactive runs using either complete or only visual information
  • Automatic/Manual runs contrasting components of the system
Results to be announced later in October ...
Thank you Carnegie Mellon University Pittsburgh, PA USA