
Alex Hauptmann, Howard Wactlar Carnegie Mellon University Pittsburgh, USA October 2004


Presentation Transcript


  1. Informedia Contexture: Analyzing and Synthesizing Video and Verbal Context for Intelligence Analysis Dialogues. Alex Hauptmann, Howard Wactlar, Carnegie Mellon University, Pittsburgh, USA, October 2004

  2. Our Meaning of Contexture. Definition: The weaving or assembly of [multimedia] parts into a cohesive whole in order to provide a more complete picture or information structure [to both questions and answers] • Interpreting and communicating an associated visual and verbal context to information • May contain language, imagery and gestures • May illuminate the meaning or significance • May explain its circumstances • More like a collegial expert response than an encyclopedic source • Accelerating discovery by both system and analyst • Understanding video perspectives • Subtle opinions, attitudes, biases • Both visual and textual rhetoric • Continuously updating video biographs and event timelines

  3. Conceptual Overview (figure). Multiple multimedia information sources (foreign sources, domestic sources, and structured data such as biograph data) feed two analysis stages. Extract Semantic Data: • Scene classification • Event detection • Title/topic labeling • Named entity extraction • Verify entities with structured data. Context Analysis: • Semantic relations on entities • Harden with structured data • Perspective interpretation • Video ontology. Generate Information Contexture, together with the analyst's profile and QA history: • Understand questions • Provide context-rich answers • Produce video biographs • Enable context-based iterative QA process. The analyst receives visualizations and synthesized video clips. The example video biograph of Wesley Clark labels clips dated between 3/18/1998 and 9/17/2003 by context type (Interviewee, Monologue, Word Quotation, Visual Quotation, People Association), for instance being a CNN military analyst on action in the Iraq War, or appearing in a hearing clip without being mentioned in it.

  4. Scope of Work • Understanding multimedia questions and their context • Extracting information from video sources for finding answers • Understanding the bias of source, topical, or rhetorical perspectives • Applying broadcast TV news ontology (joint with USC) • Contexture dialogue • Integration and evaluation • Incorporating video biographs and perspectives into answer contextures • Learning from the analyst

  5. Scope of Work (repeat of the agenda on slide 4)

  6. Scope of Work (repeat of the agenda on slide 4)

  7. Understanding Multimedia Questions • Find shots of Pope John Paul II. • Find shots of a rocket or missile taking off. • Find shots of the Tomb of the Unknown Soldier at Arlington National Cemetery. • Find shots of the front of the White House in the day-time with the fountain running.

  8. Automatic Video Retrieval System. A multi-modal query (e.g., “Pope John Paul II”) is run against the video library by multiple modality video analysis experts (speech transcripts, video OCR, audio features, color features, texture features, semantic class filter). Their similarity rankings are merged by weighted fusion into a final ranked list of video shots.
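The slide names the weighted fusion step but not its formula. As a minimal sketch, assuming each expert returns a similarity score per shot, that scores are min-max normalized per expert, and that fusion is a linear weighted sum (all assumptions, not necessarily what Informedia does), the merge could look like this; the expert names and shot IDs are hypothetical:

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one expert's similarity scores to [0, 1] so experts are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # The expert cannot discriminate between these shots: give them all the same score.
        return {shot: 1.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def fuse_rankings(expert_scores: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[str]:
    """Weighted linear fusion of per-expert scores into one ranked list of shot IDs."""
    fused: Dict[str, float] = {}
    for expert, scores in expert_scores.items():
        w = weights.get(expert, 0.0)
        for shot, s in min_max_normalize(scores).items():
            # A shot missing from an expert's list simply gets no contribution from it.
            fused[shot] = fused.get(shot, 0.0) + w * s
    # Highest fused score first; ties broken by shot ID for a deterministic order.
    return sorted(fused, key=lambda shot: (-fused[shot], shot))


if __name__ == "__main__":
    # Toy scores for three shots from two hypothetical experts.
    experts = {
        "text": {"shot_a": 3.0, "shot_b": 1.0, "shot_c": 0.0},
        "color": {"shot_a": 0.2, "shot_b": 0.9, "shot_c": 0.1},
    }
    print(fuse_rankings(experts, {"text": 0.7, "color": 0.3}))  # ['shot_a', 'shot_b', 'shot_c']
```

Normalizing first matters because raw scores from, say, a text engine and a color-histogram matcher live on unrelated scales; without it the weights would mostly compensate for scale rather than express trust in each expert.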

  9. Finding the Combination Weights. Offline: the combination weights are learned from training queries and training data. Online: a query's similarity rankings from multiple experts over the video library are combined using the learned weights.
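The slide does not say how the weights are learned. One simple possibility, sketched here under the assumption that the training data are sets of relevant shots per training query and that the objective is mean average precision, is an exhaustive grid search over weight vectors that sum to 1. This is a swapped-in technique for illustration; `learn_weights` and its parameters are hypothetical:

```python
import itertools
from typing import Dict, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant shot
    return total / len(relevant)


def fuse(expert_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Weighted sum of (already normalized) expert scores, highest first."""
    fused: Dict[str, float] = {}
    for expert, scores in expert_scores.items():
        for shot, s in scores.items():
            fused[shot] = fused.get(shot, 0.0) + weights.get(expert, 0.0) * s
    return sorted(fused, key=lambda shot: (-fused[shot], shot))


def learn_weights(training: List[Tuple[Dict[str, Dict[str, float]], Set[str]]],
                  experts: List[str], step: float = 0.1) -> Dict[str, float]:
    """Offline step: pick the weight vector (summing to 1) with the best mean AP."""
    n_steps = round(1 / step)
    best_w: Dict[str, float] = {}
    best_map = -1.0
    for combo in itertools.product(range(n_steps + 1), repeat=len(experts)):
        if sum(combo) != n_steps:
            continue  # keep only weight vectors on the simplex
        weights = {e: c / n_steps for e, c in zip(experts, combo)}
        mean_ap = sum(average_precision(fuse(scores, weights), relevant)
                      for scores, relevant in training) / len(training)
        if mean_ap > best_map:
            best_w, best_map = weights, mean_ap
    return best_w
```

The grid grows as (1/step + 1) to the power of the number of experts, so this is only practical for a handful of experts or a coarse step; a real system would more likely use a smarter optimizer.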

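  10. Finding the Combination Weights with Query Classification. As above, but queries are classified first: offline, weights are learned from training queries and training data for each query class; online, the query is classified and the weights for its class combine the similarity rankings from multiple experts over the video library.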

  11. Query Types for Video Retrieval • Named person queries, possibly with constraints: “Find shots of Yasser Arafat”, “Find shots of Ronald Reagan speaking”. • Named object queries, for an object with a unique name: “Find shots of the Statue of Liberty”, “Find shots of the Mercedes logo”. • General object queries, for a type of object, possibly qualified: “Find shots of snow-covered mountains”, “Find shots of one or more cats”. • Scene queries, for multiple types of objects in certain relationships: “Find shots of roads with lots of vehicles”, “Find shots of people spending leisure time on the beach”.

  12. Finding the Combination Weights for Merging Search Results • Uniform, fixed weights for all queries • Individual weightings for each query (but not enough is known about each query) • Weightings for each of the 4 query types • Text search usually does better and is more consistent than any other single search modality

  13. Query Classification. Query X first goes through named-entity extraction: a person name makes it a Person Q (“Find shots of Bill Clinton”); an organization or location name makes it a Specific Object Q (“Find shots of Capitol Hill”). A query with no proper name goes through POS tagging + NP chunking: multiple NPs make it a Scene Q (“Find shots with (multiple pedestrians) and (multiple vehicles in motion)”). A single NP goes through syntactic parsing: nested NPs make it a Scene Q (“Find shots of (a person diving into the water)”), while no nested NP makes it a General Object Q (“Find shots of (one or more cats)”).
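The classification tree can be expressed as a few ordered rules. This sketch does not implement NE extraction, NP chunking, or parsing; `AnalyzedQuery` and `NounPhrase` are hypothetical containers for the output of those upstream stages. Checking person names before organization/location names, and treating a query with no NP as a General Object query, are assumptions the slide does not settle:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NounPhrase:
    text: str
    nested: List["NounPhrase"] = field(default_factory=list)  # NPs embedded inside this one


@dataclass
class AnalyzedQuery:
    """Hypothetical output of upstream NE extraction, NP chunking and parsing."""
    person_names: List[str]
    org_or_location_names: List[str]
    noun_phrases: List[NounPhrase]


def classify_query(q: AnalyzedQuery) -> str:
    # Named-entity branch (precedence of person over org/location is an assumption).
    if q.person_names:
        return "Person"
    if q.org_or_location_names:
        return "Specific Object"
    # No proper name: several NPs suggest several objects in a relationship.
    if len(q.noun_phrases) > 1:
        return "Scene"
    # Single NP: a nested NP (e.g. "the water" inside the diving phrase) means a scene.
    if q.noun_phrases and q.noun_phrases[0].nested:
        return "Scene"
    return "General Object"
```

For example, `classify_query(AnalyzedQuery([], [], [NounPhrase("one or more cats")]))` returns `"General Object"`, matching the slide's cat example.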

  14. Hierarchical Mixture of Experts. A query-type gating level weights the outputs of text retrieval and retrieval experts 1 … n for the query, and the combined output ranks the video shots.
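In a mixture of experts the gate produces weights rather than a single hard decision. A simplified, single-level sketch, assuming the gate outputs a probability for each query type and that each type has its own expert weights, blends the per-type weights by those probabilities. The `TYPE_WEIGHTS` table, its expert names, and its numbers are hypothetical, not learned values from the slides:

```python
from typing import Dict

# Hypothetical expert weights per query type (illustrative numbers, not measured values).
TYPE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "Person": {"text": 0.7, "ocr": 0.2, "color": 0.1},
    "General Object": {"text": 0.3, "ocr": 0.1, "color": 0.6},
}


def gated_expert_weights(type_probs: Dict[str, float],
                         type_weights: Dict[str, Dict[str, float]] = TYPE_WEIGHTS
                         ) -> Dict[str, float]:
    """Blend per-type expert weights using the gate's query-type probabilities."""
    total = sum(type_probs.values())
    if total <= 0:
        raise ValueError("gate probabilities must sum to a positive value")
    mixed: Dict[str, float] = {}
    for qtype, p in type_probs.items():
        for expert, w in type_weights[qtype].items():
            # Each query type contributes in proportion to its renormalized probability.
            mixed[expert] = mixed.get(expert, 0.0) + (p / total) * w
    return mixed
```

With `{"Person": 0.5, "General Object": 0.5}` the blended weights are text 0.5, ocr 0.15, color 0.35. Soft gating of this kind lets a query that mixes a person and an object draw on both weightings instead of one.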

  15. Performance of different weighting schemes

  16. Performance of different weighting schemes

  17. Current Limitations • Unable to assign multiple query types to one query, e.g., “Finding Bill Clinton speaking in front of a US flag” (person, object) • Unable to capture query-specific aspects, e.g., “Finding day-time scenes of the Federal Reserve Building, Washington DC”

  18. Scope of Work (repeat of the agenda on slide 4)

  19. Labeling Every Face with a News Structure Model (NSM). Sources of information: • Audio transcripts + named entity extraction • Overlaid text • Speaker audio characteristics • Temporal position of name w.r.t. video segment • Temporal structure of news (“grammar”) • Constraints based on image similarity • Constraints from speaker audio similarity

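  20. Baseline Algorithm. For each shot s: if transcript clues exist for the anchor, or s is the first shot in the story, label s with the anchor name; otherwise, if transcript clues exist for a reporter, label s with the reporter name(s) by distance; otherwise, label s with the news-subject name(s) by distance.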
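The baseline decision tree is small enough to write out directly. This sketch assumes that the "transcript clues" arrive as precomputed booleans, that names come with the time (in seconds) at which they are mentioned in the transcript, and that "by distance" means the temporal gap between a mention and the shot. `Shot`, `baseline_names`, and the example names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shot:
    start: float          # shot start time in seconds
    end: float            # shot end time in seconds
    first_in_story: bool  # True if this is the first shot of the news story


def names_by_distance(shot: Shot, mentions: List[Tuple[str, float]]) -> List[str]:
    """Order candidate names by how far their transcript mention lies from the shot."""
    def gap(t: float) -> float:
        if shot.start <= t <= shot.end:
            return 0.0  # name is spoken during the shot itself
        return min(abs(t - shot.start), abs(t - shot.end))

    ordered: List[str] = []
    for name, _ in sorted(mentions, key=lambda m: gap(m[1])):
        if name not in ordered:  # keep only each name's closest mention
            ordered.append(name)
    return ordered


def baseline_names(shot: Shot, anchor_name: str, anchor_clue: bool,
                   reporter_clue: bool,
                   reporter_mentions: List[Tuple[str, float]],
                   subject_mentions: List[Tuple[str, float]]) -> List[str]:
    """Hypothetical rendering of the baseline decision tree for one shot."""
    if anchor_clue or shot.first_in_story:
        return [anchor_name]
    if reporter_clue:
        return names_by_distance(shot, reporter_mentions)
    return names_by_distance(shot, subject_mentions)
```

For a shot spanning 10 to 20 seconds with no anchor or reporter clue, a news-subject name mentioned at 15 seconds ranks ahead of one mentioned at 50 seconds.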

  21. Overlaid Text with Video OCR. Overlaid text: “Rep. NEWT GINGRICH”. VOCR text: “rgp nev~j ginuhicij”. Edit distance to names: Bill Clinton (0.67), Newt Gingrich (0.46), David Ensor (0.72), Saddam Hussein (0.78), Elizabeth Vargas (0.88), Bill Richardson (0.80). Despite the garbled OCR, the smallest distance (0.46) points to the correct name, Newt Gingrich.
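Matching noisy VOCR output against known names can be done with a length-normalized Levenshtein distance. The slide does not state its normalization, so dividing by the longer string's length and lowercasing both strings are assumptions, and this sketch is not expected to reproduce the slide's exact numbers:

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (an assumed normalization)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_names(vocr_text: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Candidate names paired with their distance to the OCR text, best match first."""
    scored = [(name, normalized_distance(vocr_text, name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1])
```

The candidate list would come from names found in the transcript, so even a badly garbled OCR string only has to be closer to the right name than to a few alternatives.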

  22. Detection of Anchors, Reporters and News-Subjects (figure: example faces labeled as anchor, reporter, or news-subject)

  23. Image and Audio Similarity Constraints

  24. Naming Accuracy of Different Approaches

  25. Visual Gender Classification. Pipeline: face detection, feature extraction with Haar wavelets at the original scale, boosting classifiers, output label (male/female). The example outputs (male, male, female, male, female) are marked as correct or error.
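The slide names Haar wavelet features and boosting but gives no details. A toy sketch, assuming faces are already detected and cropped to small grayscale patches flattened to a power-of-two length, uses a 1-D Haar transform (a simplification of the 2-D wavelets used on images) and discrete AdaBoost with decision stumps. Coding male as +1 and female as -1 is an arbitrary choice, and all function names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def haar_1d(signal: Sequence[float]) -> List[float]:
    """Full 1-D Haar transform: [overall average, coarsest detail, ..., finest details]."""
    data = [float(x) for x in signal]
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details: List[float] = []
    while len(data) > 1:
        pairs = range(0, len(data), 2)
        averages = [(data[k] + data[k + 1]) / 2 for k in pairs]
        differences = [(data[k] - data[k + 1]) / 2 for k in pairs]
        details = differences + details  # coarser levels end up in front
        data = averages
    return data + details


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, threshold, polarity = stump
    return polarity if x[j] >= threshold else -polarity


def train_adaboost(X: List[List[float]], y: List[int],
                   rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        best_err, best_stump = float("inf"), (0, 0.0, 1)
        for j in range(len(X[0])):
            for threshold in sorted({x[j] for x in X}):
                for polarity in (1, -1):
                    stump = (j, threshold, polarity)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best_err, best_stump = err, stump
        err = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best_stump))
        if err <= 1e-10:
            break  # training data already separated perfectly
        # Up-weight misclassified examples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best_stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

Each boosting round picks the single wavelet coefficient and threshold that best separates the currently hard examples, so the final classifier is a weighted vote over a few informative coefficients.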

  26. Interface Showing the People Labeled with Names

  27. Scope of Work (repeat of the agenda on slide 4)

  28. Finding Stories with Different Perspectives

  29. Show Length and Shot Type (figure labels: 2 min 24 sec, 22 sec, 12 sec)

  30. FOX and CNN news coverage of the David Kay report on the search for WMD in Iraq: the broadcaster's perspective can be seen in the text overlay.

  31. FOX and CNN news coverage of the David Kay report on the search for WMD in Iraq: FOX uses a faster cut rate and has more participation by the anchor.
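Cut rate and shot length, the production cues compared here, are straightforward to compute once shot boundaries are known. This sketch assumes boundary detection has already produced sorted timestamps in seconds (story start, every cut, story end); the function names are hypothetical:

```python
from typing import List, Sequence


def shot_lengths(boundaries: Sequence[float]) -> List[float]:
    """Shot durations from sorted boundary times: story start, every cut, story end."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def cuts_per_minute(boundaries: Sequence[float]) -> float:
    """Interior boundaries are cuts; divide by story duration to get a cut rate."""
    if len(boundaries) < 2:
        return 0.0
    duration = boundaries[-1] - boundaries[0]
    cuts = len(boundaries) - 2
    return 60.0 * cuts / duration if duration > 0 else 0.0


def mean_shot_length(boundaries: Sequence[float]) -> float:
    """Average shot duration in seconds; the inverse view of the cut rate."""
    lengths = shot_lengths(boundaries)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

For a 60-second story with cuts at 10 and 30 seconds, this gives 2.0 cuts per minute and a mean shot length of 20 seconds. Comparing such numbers across broadcasters covering the same event is one way to quantify an observation like a faster cut rate.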

  32. Scope of Work (repeat of the agenda on slide 4)

  33. Metrics-based Evaluations NIST TRECVID 2004 Video Search Evaluation • Submitted classification results for 10 different semantic features • Similar to a “Routing Task” for video clips • Submitted Informedia system video search answers for • Interactive runs comparing expert/novice users • Interactive runs using either complete or only visual information • Automatic/Manual runs contrasting components of the system Results to be announced later in October ...

  34. Thank you Carnegie Mellon University Pittsburgh, PA USA
