Digital Video Library Question Answering from Errorful Multimedia Streams AQUAINT PI Meeting – June 2002 Howard D. Wactlar Carnegie Mellon University, USA
Outline • Goals for QA from multimedia • Background • Informedia • Information extraction • Determining answer information • Presenting the answer and follow-up
Why is Multimedia Important • TV and radio broadcasts record human events across the globe • Broadcast interviews, analysis and opinions created globally provide varied interpretive perspectives and context • Images of people, events, maps and charts provide additional content not conveyed orally • May be correlated with the spoken words • Some pictures are worth a thousand words
Annual Video and Audio Production
Commercial
• 4,500 motion pictures -> 9,000 hours/year (4.5 TB)
• 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)
• 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)
Personal
• Photographs: 80 billion images -> 410,000 TB/yr
• Home videos: 1.4 billion tapes -> 300,000 TB/yr
• X-rays: 2 billion -> 17,000 TB/yr
Surveillance
• Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
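The station and camera figures above are simple products, and a short sketch makes it possible to re-derive them. The helper names, the 365-day year and the 1 TB = 1000 GB conversion are assumptions made for illustration; the slide does not state how its totals were computed.

```python
DAYS_PER_YEAR = 365  # assumption: the slide does not state the year length used


def annual_hours(stations, hours_per_day=4):
    """Hours of new material per year for a group of stations."""
    return stations * hours_per_day * DAYS_PER_YEAR


def gb_per_hour(terabytes, hours):
    """Storage density implied by a slide figure, assuming 1 TB = 1000 GB."""
    return terabytes * 1000 / hours


tv_hours = annual_hours(33_000)
radio_hours = annual_hours(44_000)
airport_hours_per_day = 14_000 * 140 * 24

print(f"TV:       {tv_hours:,} hrs/yr")
print(f"Radio:    {radio_hours:,} hrs/yr")
print(f"Airports: {airport_hours_per_day:,} camera-hrs/day")

# Densities implied by the slide's own (hours, TB) pairs
print(f"Film:  {gb_per_hour(4.5, 9_000):.2f} GB/hr")
print(f"TV:    {gb_per_hour(24_000, 48_000_000):.2f} GB/hr")
print(f"Radio: {gb_per_hour(3_275, 65_500_000):.2f} GB/hr")
```

Under these assumptions the TV and airport products land close to the rounded slide figures, while the straight radio product comes out somewhat below the slide's 65,500,000 hrs/yr, so that total was presumably derived from inputs not shown on the slide.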
Background
• Establishment of large video libraries as a network-searchable information resource
• Exploit operational Informedia DVL infrastructure and technology
Mission: Enable Search and Discovery in the Video Medium
REQUIREMENTS:
• Automated process for information extraction from video
• Full-content search and retrieval from any spoken language and visual document
APPROACH:
• Integration of machine speech, image and natural language understanding for library creation and exploration
[Figure: Informedia System Architecture. OFFLINE, Information Collection & Analysis: Broadcast TV, Radio and Surveillance sources; Digital Encoding; Processing (Speech Analysis, Image Analysis, Entity Extraction, Face and OCR Text Recognition); Indexing; Indexed Database (Indexed Segmented Transcript, Compressed Audio/Video & Images). ONLINE, Information Exploration & Discovery: Analyst; Multimodal Queries; Relevant Result Set; Browsing and Query Refinement; Requested Segment or Summarization; Distribution To Users.]
Related Language Processing Work • MUC, DUC, TREC especially QA track • Pronoun and Anaphora resolution • Part-of-speech tagging • Fact extraction • Summarization • Question-answering …Electronic text focus
Why is Multimedia Hard
• It’s a fundamentally linear, temporal medium
• Speech, image and language understanding are all errorful, ambiguous and incomplete
• Information must be time-synchronized and correlated across modalities for both produced and natural video
• Verbal content lacks the cues that enable syntactic analysis:
  • sentence boundaries
  • punctuation
  • capitalization
• Image recognition without known context is very limited
• Many errors from many sources!
Why We Think the Problems are Tractable • Lots of data enables LEARNING systems • Have shown that complete or perfect information is not necessary • Utilize multiple sources of information jointly: • text, image, audio, web text and databases
Research Focus
• Determining the answer information
  • Resolving co-references
  • Discovering semantic relations
  • Learning information flow
  • Hardening uncertain information
• Organizing and presenting the answer result
  • Text summaries
  • Augmenting contextual material
  • Maps, charts and images to allow follow-up questions
  • Explicit representation of uncertainty
Resolving Co-references • When is the same person mentioned (or seen, or identified) • Places referenced (in words, on signs, on maps) • Organizations cited (verbally, on signage, in charts) • Requires: • Pronoun resolution • Merge multiple spellings, abbreviations and contractions • Merge across media (OCR, audio, text, faces)
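The "merge multiple spellings, abbreviations and contractions" and "merge across media" requirements can be illustrated with a minimal sketch. It is not the Informedia method: it assumes mentions arrive as (medium, text) pairs, uses a hypothetical hand-written alias table for abbreviations, and swaps in plain string similarity from Python's `difflib` for whatever matching the real system used. All names and the 0.8 threshold are illustrative.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table mapping normalized abbreviations to full forms
ALIASES = {"un": "united nations"}


def normalize(mention):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    key = re.sub(r"[^a-z0-9 ]", "", mention.lower()).strip()
    return ALIASES.get(key, key)


def same_entity(a, b, threshold=0.8):
    """Treat two mentions as the same entity if their normalized forms are similar."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def merge_mentions(mentions, threshold=0.8):
    """Greedily cluster (medium, text) mentions that appear to co-refer."""
    clusters = []
    for medium, text in mentions:
        for cluster in clusters:
            if any(same_entity(text, other, threshold) for _, other in cluster):
                cluster.append((medium, text))
                break
        else:
            # No existing cluster matched, so start a new one
            clusters.append([(medium, text)])
    return clusters


# Made-up mentions from three media: speech transcript, OCR, and text
mentions = [
    ("speech", "john smith"),
    ("ocr", "Jon Smith"),
    ("text", "U.N."),
    ("speech", "united nations"),
]
clusters = merge_mentions(mentions)
for cluster in clusters:
    print(cluster)
```

Greedy clustering is order-dependent and would not handle pronouns or faces; it only shows how noisy spellings from different media can be pulled under one entity.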
Mining Links and Learning Semantic Relations • Visualize co-occurrence in documents, in location, in time • Location can be variably sized regions • Times can be arbitrary periods • Finding semantic roles for related named entities • Dr. X is CEO of company Y
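Co-occurrence over arbitrary time periods can be sketched as a counting problem. The sketch below assumes entity mentions have already been extracted as (day, entity) pairs and buckets them into fixed-length periods; the function name, the input shape and the sample data are illustrative, not taken from the talk. Variably sized regions could be handled the same way by bucketing on a location key.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_period(mentions, period_days):
    """Count how often each pair of entities appears in the same time period.

    mentions: iterable of (day_number, entity_name) pairs.
    period_days: length of each time bucket, in days.
    """
    buckets = defaultdict(set)
    for day, entity in mentions:
        buckets[day // period_days].add(entity)

    counts = Counter()
    for entities in buckets.values():
        # Sort so each unordered pair always maps to the same key
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


# Made-up mentions: (day, entity)
mentions = [
    (0, "Dr. X"), (1, "Company Y"),
    (8, "Dr. X"), (9, "Company Y"), (9, "City Z"),
]
weekly = cooccurrence_by_period(mentions, period_days=7)
for pair, n in weekly.most_common():
    print(pair, n)
```

Pairs with high counts are only candidates for a relation such as "Dr. X is CEO of company Y"; assigning the semantic role itself needs a further step that this sketch does not attempt.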
Active Hardening of Evidence • Extracted information is noisy • Acquire new supporting or falsifying evidence from other sources (web) • On-demand or • Automatically when original evidence is weak …Result is higher fidelity information
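One way to picture hardening is as a belief update that is triggered only when the original evidence is weak. The sketch below is an assumption, not the method described in the talk: it treats each new piece of evidence as an independent likelihood ratio (above 1 supports the fact, below 1 counts against it) and applies Bayes' rule in odds form. The `fetch_evidence` callback and the 0.7 weakness threshold are hypothetical.

```python
def update_belief(prob, likelihood_ratios):
    """Bayes' rule in odds form; assumes 0 < prob < 1 and independent evidence."""
    odds = prob / (1 - prob)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


def harden(prob, fetch_evidence, weak_below=0.7):
    """Look for more evidence only when the current belief is weak.

    fetch_evidence: hypothetical callback returning likelihood ratios gathered
    from other sources (for example, web text).
    """
    if prob >= weak_below:
        return prob  # evidence already strong enough; skip the lookup
    return update_belief(prob, fetch_evidence())


# A weak extraction (0.5) backed by one supporting source
print(harden(0.5, lambda: [3.0]))
# The same extraction contradicted by another source
print(harden(0.5, lambda: [0.5]))
# A strong extraction is returned unchanged without any lookup
print(harden(0.9, lambda: [0.1]))
```

The on-demand case from the slide corresponds to calling `update_belief` directly whenever the user asks for it, regardless of the threshold.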
[Figure: Learning Information Flow. Nodes: Hidden Source 1, Hidden Source 2, Hidden Source 3, Hidden Source 4, Wiretap 1 (Saudi Arabia), CNN, ABC, Radio Deutsche Welle (Germany), Radio Tehran (Iran). Edge labels: information flow, tightly correlated, conditional information flow, lags of 3-6 days. Topic labels: News on Middle East (407 days), Lifestyle news.]
Learning Information Flow • Where did a fact originate? • Multiple sources report facts over time, with small changes • E.g., different newspapers get the same story from an AP or Reuters source, but the story ‘looks’ different in each • Imagery is frequently reused as well • Columbia’s Newsblaster exploits this idea for summarization of the core story sentences
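The question of where a fact originated can be approximated, in the simplest case, by finding the earliest report that is a near-duplicate of a given story. The sketch below is illustrative only: it assumes reports are (day, outlet, text) tuples, measures similarity with Jaccard overlap of word bigrams, and uses an arbitrary 0.5 threshold. It is not the technique used by Informedia or Newsblaster.

```python
def bigrams(text):
    """Set of adjacent word pairs in a lowercased text."""
    words = text.lower().split()
    return {(words[i], words[i + 1]) for i in range(len(words) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def earliest_similar(reports, query_text, threshold=0.5):
    """Return (day, outlet) of the earliest report resembling query_text, or None."""
    query = bigrams(query_text)
    matches = [
        (day, outlet)
        for day, outlet, text in reports
        if jaccard(bigrams(text), query) >= threshold
    ]
    return min(matches) if matches else None


# Made-up reports: (day, outlet, text)
reports = [
    (1, "wire", "the council approved the new budget on monday"),
    (3, "paper_a", "the council approved the new budget on monday evening"),
    (2, "paper_b", "local team wins the final match"),
]
origin = earliest_similar(reports, reports[1][2])
print(origin)
```

Surface overlap only catches lightly edited copies; stories that are heavily rewritten, or reused imagery, would need other signals.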
Integrated Analysis Environment • Summarize multimedia information visually and textually • Allow explicit display of and control over acceptable level of uncertainty • Show link structure of entities and relations • Interactive visualization for drill-down and follow-up
Strategic Advantages of Multimedia Analysis and Response • Collect Large Amounts of Data • Learning Approaches • Leverage across media types • Perfection is not necessary (80% solution may be ok) • User in the loop filters remaining errors • Effective interfaces and visualizations