
Question Answering from Errorful Multimedia Streams AQUAINT PI Meeting – June 2002

Digital Video Library. Question Answering from Errorful Multimedia Streams AQUAINT PI Meeting – June 2002 Howard D. Wactlar Carnegie Mellon University, USA. Outline. Goals for QA from multimedia Background Informedia Information extraction Determining answer information



  1. Digital Video Library Question Answering from Errorful Multimedia Streams AQUAINT PI Meeting – June 2002 Howard D. Wactlar Carnegie Mellon University, USA

  2. Outline • Goals for QA from multimedia • Background • Informedia • Information extraction • Determining answer information • Presenting the answer and follow-up

  3. Why is Multimedia Important • TV and radio broadcasts record human events across the globe • Broadcast interviews, analysis and opinions created globally provide varied interpretive perspectives and context • Images of people, events, maps and charts provide additional content not conveyed orally • May be correlated with the spoken words • Some pictures are worth a thousand words

  4. Annual Video and Audio Production • Commercial: • 4,500 motion pictures -> 9,000 hours/year (4.5 TB) • 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB) • 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB) • Personal: • Photographs: 80 billion images -> 410,000 TB/yr • Home videos: 1.4 billion tapes -> 300,000 TB/yr • X-rays: 2 billion -> 17,000 TB/yr • Surveillance: • Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
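The hour totals on this slide can be roughly reproduced with simple multiplication. The sketch below assumes a 365-day year, which the slide does not state. Under that assumption the TV and airport figures come out close to the slide's rounded values, while the radio product is about 64.2 million hours, slightly below the 65,500,000 shown, so the slide presumably used somewhat different inputs or rounding.

```python
HOURS_ON_AIR_PER_DAY = 4   # broadcast hours per station per day, as on the slide
DAYS_PER_YEAR = 365        # assumption: the slide does not state the day count


def yearly_hours(stations, hours_per_day=HOURS_ON_AIR_PER_DAY):
    """Total broadcast hours per year for a given number of stations."""
    return stations * hours_per_day * DAYS_PER_YEAR


tv_hours = yearly_hours(33_000)                  # TV stations
radio_hours = yearly_hours(44_000)               # radio stations
airport_hours_per_day = 14_000 * 140 * 24        # terminals x cameras x hours

print(f"TV:      {tv_hours:,} hrs/yr")
print(f"Radio:   {radio_hours:,} hrs/yr")
print(f"Airport: {airport_hours_per_day:,} camera-hrs/day")
```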

  5. Background • Exploit operational Informedia DVL infrastructure and technology • Establishment of large video libraries as a network-searchable information resource • Mission: Enable search and discovery in the video medium • Requirements: • Automated process for information extraction from video • Full-content search and retrieval from any spoken language and visual document • Approach: • Integration of machine speech, image and natural language understanding for library creation and exploration

  6. Informedia System Architecture [Diagram. Offline, Information Collection & Analysis: broadcast TV, radio and surveillance input; digital encoding; speech analysis; image analysis; entity extraction; face and OCR text recognition; indexing. Indexed database: indexed segmented transcript; compressed audio/video and images. Online, Information Exploration & Discovery: multimodal queries; browsing and query refinement; relevant result set; requested segment or summarization; distribution to users (analyst).]

  7. Related Language Processing Work • MUC, DUC, TREC especially QA track • Pronoun and Anaphora resolution • Part-of-speech tagging • Fact extraction • Summarization • Question-answering …Electronic text focus

  8. Why is Multimedia Hard • It’s a fundamentally linear, temporal medium • Speech, image and language understanding are all errorful, ambiguous and incomplete • Information must be time-synchronized and correlated across modalities for both produced and natural video • Verbal content lacks: • sentence boundaries, • punctuation, • capitalization …the cues that enable syntactic analysis • Image recognition without known context is very limited • Many errors from many sources!

  9. Why We Think the Problems are Tractable • Lots of data enables LEARNING systems • Have shown that complete or perfect information is not necessary • Utilize multiple sources of information jointly: • text, image, audio, web text and databases

  10. Research Focus • Determining the answer information • Resolving co-references • Discovering semantic relations • Learning Information flow • Hardening uncertain information • Organizing and presenting the answer result • Text summaries • Augmenting contextual material • Maps, charts and images to allow follow-up questions • Explicit representation of uncertainty

  11. Resolving Co-references • When is the same person mentioned (or seen, or identified)? • Places referenced (in words, on signs, on maps) • Organizations cited (verbally, on signage, in charts) • Requires: • Pronoun resolution • Merging multiple spellings, abbreviations and contractions • Merging across media (OCR, audio, text, faces)
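The merging of variant spellings across media could be sketched roughly as follows. This is only an illustration, not the Informedia method: it assumes mentions arrive as (medium, surface form) pairs, uses plain string similarity from Python's standard `difflib`, and groups greedily. The names, the `merge_mentions` helper and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop punctuation so OCR capitals and dotted forms compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()


def same_entity(a, b, threshold=0.85):
    """Treat two surface forms as the same entity if their spellings are close."""
    a_n, b_n = normalize(a), normalize(b)
    if a_n == b_n:
        return True
    return SequenceMatcher(None, a_n, b_n).ratio() >= threshold


def merge_mentions(mentions, threshold=0.85):
    """Greedily group (medium, surface_form) mentions into co-reference clusters."""
    clusters = []
    for medium, name in mentions:
        for cluster in clusters:
            if any(same_entity(name, other, threshold) for _, other in cluster):
                cluster.append((medium, name))
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([(medium, name)])
    return clusters


# Made-up mentions from three media; none of these names come from the talk.
mentions = [
    ("speech", "John Smith"),
    ("ocr", "JOHN SMITH"),
    ("text", "Jon Smith"),
    ("speech", "Jane Doe"),
]
clusters = merge_mentions(mentions)
for cluster in clusters:
    print(cluster)
```

String similarity alone cannot handle pronouns, abbreviations such as initials, or faces; those would need the additional evidence the slide lists.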

  12. Mining Links and Learning Semantic Relations • Visualize co-occurrence in documents, in location, in time • Location can be variably sized regions • Times can be arbitrary periods • Finding semantic roles for related named entities • Dr. X is CEO of company Y
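Co-occurrence within a document and an arbitrary time period could be counted along these lines. This is a minimal sketch assuming each mention is a (document id, time offset in seconds, entity) triple; the `cooccurrence_counts` function, the window size and the sample data are hypothetical, with entity names borrowed from the "Dr. X" example on the slide.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(mentions, window_seconds):
    """Count, per entity pair, the documents in which both appear within the window.

    mentions: iterable of (doc_id, time_seconds, entity) triples.
    """
    by_doc = {}
    for doc_id, t, entity in mentions:
        by_doc.setdefault(doc_id, []).append((t, entity))

    counts = Counter()
    for items in by_doc.values():
        pairs_in_doc = set()
        for (t1, e1), (t2, e2) in combinations(items, 2):
            if e1 != e2 and abs(t1 - t2) <= window_seconds:
                pairs_in_doc.add(tuple(sorted((e1, e2))))
        # Each document contributes at most one count per pair.
        counts.update(pairs_in_doc)
    return counts


mentions = [
    ("news_a", 10, "Dr. X"),
    ("news_a", 25, "Company Y"),
    ("news_a", 300, "City Z"),
    ("news_b", 5, "Dr. X"),
    ("news_b", 40, "Company Y"),
]
counts = cooccurrence_counts(mentions, window_seconds=60)
print(counts.most_common())
```

Frequently co-occurring pairs are only candidates for a relation; assigning a semantic role such as "is CEO of" would require further analysis of the surrounding language.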

  13. Active Hardening of Evidence • Extracted information is noisy • Acquire new supporting or falsifying evidence from other sources (web) • On-demand or • Automatically when original evidence is weak …Result is higher fidelity information
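One simple way to picture hardening is to combine confidence scores from several sources and to fetch more evidence only when the original is weak. The sketch below uses a noisy-OR combination, which is an assumption made here for illustration and is not stated in the talk. It treats sources as independent and models supporting evidence only, not falsifying evidence. The `harden` function, the 0.8 threshold and the `fetch_more_evidence` callback are hypothetical.

```python
def combined_confidence(confidences):
    """Noisy-OR: probability that at least one independent piece of evidence is right."""
    p_all_wrong = 1.0
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        p_all_wrong *= 1.0 - p
    return 1.0 - p_all_wrong


def harden(fact_confidence, fetch_more_evidence, threshold=0.8):
    """Look for extra evidence only when the original evidence is weak."""
    if fact_confidence >= threshold:
        return fact_confidence
    extra = fetch_more_evidence()  # e.g. a web lookup; supplied by the caller
    return combined_confidence([fact_confidence, *extra])


# Stand-in callbacks returning made-up confidence scores.
weak = harden(0.5, lambda: [0.6, 0.5])
strong = harden(0.9, lambda: [0.6])
print(round(weak, 3), strong)
```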

  14. Learning Information Flow [Diagram of information flow among sources. Nodes: Hidden Source 1, Hidden Source 2, Hidden Source 3, Hidden Source 4, Wiretap 1 (Saudi Arabia), CNN, ABC, Radio Deutsche Welle (Germany), Radio Tehran (Iran). Edge and topic labels: "Information flow", "Conditional information flow", "Tightly correlated", "3-6 days" (twice), "News on Middle East, 407 days", "Lifestyle news".]

  15. Learning Information Flow • Where did a fact originate? • Multiple sources report facts over time, with small changes • E.g., different newspapers get the same story from an AP or Reuters source, but the story ‘looks’ different • Imagery is frequently reused as well • Columbia’s Newsblaster exploits this idea for summarization of the core story sentences
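Tracing a reworded story back to its earliest report could be approximated with word-shingle overlap, as sketched below. This is an illustrative stand-in and not the technique used in the talk or in Newsblaster. It assumes each report carries a source name, a day number and plain text; the `likely_origin` function, the 0.3 threshold and the sample reports are hypothetical.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences, a cheap fingerprint of the wording."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_origin(reports, query_text, min_similarity=0.3):
    """Return the earliest report whose wording overlaps the query, or None."""
    q = shingles(query_text)
    matches = [r for r in reports if jaccard(q, shingles(r["text"])) >= min_similarity]
    return min(matches, key=lambda r: r["day"]) if matches else None


# Made-up reports; the wire story appears first and is lightly reworded later.
reports = [
    {"source": "wire", "day": 1,
     "text": "officials confirmed the agreement was signed on monday"},
    {"source": "paper_a", "day": 2,
     "text": "officials confirmed the agreement was signed on monday evening"},
    {"source": "paper_b", "day": 3,
     "text": "local team wins the regional football final"},
]
origin = likely_origin(
    reports, "sources say officials confirmed the agreement was signed on monday"
)
print(origin["source"] if origin else None)
```

With errorful speech transcripts the wording overlap would be noisier than in this clean-text example, so a lower threshold or a more robust similarity measure would likely be needed.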

  16. Integrated Analysis Environment • Summarize multimedia information visually and textually • Allow explicit display of and control over acceptable level of uncertainty • Show link structure of entities and relations • Interactive visualization for drill-down and follow-up

  17. Strategic Advantages of Multimedia Analysis and Response • Collect Large Amounts of Data • Learning Approaches • Leverage across media types • Perfection is not necessary (80% solution may be ok) • User in the loop filters remaining errors • Effective interfaces and visualizations

  18. Digital Video Library
