The START Information Access System Boris Katz http://www.ai.mit.edu/projects/infolab/
The Problem: • Finding information online • Two Approaches: • 1. Keyword search (search engines, e.g., AltaVista) • 2. Natural language processing
What’s Wrong with Natural Language Processing (today)? • 1. Too hard • Full-text NL understanding still beyond reach • Intersentential reference • Paraphrasing • Summarization • Common sense implication • 2. Too slow • 3. Not all information is language • Most Web resources are not textual • Maps and Images • Sound and Video • Multimedia • Web resources are distributed across numerous non-traditional databases
What is START? • START (SynTactic Analysis using Reversible Transformations) provides multimedia information access using natural language. • Natural language • Natural language is human language. You don’t have to learn a special language to use START. Ask your questions in English; enter information using English. • Multimedia access using natural language annotations • START lets you use English to access any kind of information: text, pictures, movies, and more. • “Just the right information” • START gives you the answer you want without including a thousand others. • Virtual collaboration • START retrieves information from its own knowledge base and from databases all over the Web.
Natural Language • Natural language is human language. You don’t have to learn a special language to use START. Ask your questions in English; enter information using English.
Multimedia Access Using Natural Language Annotations • START lets you use English to access any kind of information: text, pictures, movies, and more.
Just the Right Information • START gives you the answer you want without including a thousand other answers.
Virtual Collaboration • START retrieves information from its own knowledge base and from databases all over the Web.
Natural Language Annotations • Bridge the gap between our ability to analyze natural language sentences and other information and our desire to access the huge amount of data now available on the Web. • Annotations are collections of natural language sentences and phrases that describe the content of various information segments. • START • analyzes these annotations • creates the necessary representational structures • produces special pointers to the information segments summarized by the annotations.
Natural Language Annotations [Diagram] • Information Provider: supplies a Document plus an Annotation, “Neptune was discovered using mathematics.”, to a START Server • START Servers: linked to one another (negotiation) • Information Seeker: submits the Question “How was Neptune discovered?” to a START Server • Result: the annotated Document is retrieved for the Information Seeker
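The annotation flow on this slide can be illustrated with a toy index. This is only a sketch under loud assumptions: START analyzes annotations syntactically, whereas the code below substitutes a crude content-word comparison so that it stays self-contained. The names `AnnotationIndex`, `content_words`, `STOP_WORDS`, and the `doc:` pointers are hypothetical and do not come from START.

```python
import re

# Hypothetical stop-word list. START relies on syntactic analysis instead;
# this is only a stand-in so the example runs without a parser.
STOP_WORDS = {"a", "an", "the", "was", "is", "how", "what", "using", "by", "did"}


def content_words(sentence):
    """Lowercase the sentence and keep the words that are not stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOP_WORDS}


class AnnotationIndex:
    """Maps natural language annotations to pointers at information segments."""

    def __init__(self):
        self._entries = []  # list of (content-word set, pointer) pairs

    def annotate(self, sentence, pointer):
        """Store an annotation together with a pointer to the segment it describes."""
        self._entries.append((content_words(sentence), pointer))

    def ask(self, question):
        """Return pointers whose annotation covers every content word of the question."""
        q = content_words(question)
        return [ptr for words, ptr in self._entries if q and q <= words]


index = AnnotationIndex()
index.annotate("Neptune was discovered using mathematics.", "doc:neptune-discovery")
index.annotate("Jupiter has many moons.", "doc:jupiter-moons")

answers = index.ask("How was Neptune discovered?")
print(answers)
```

The point of the design is that the document itself is never analyzed: only its short annotation is, so the pointer could equally well lead to an image, a video, or a database record.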
Uniform Access [Diagram labels: NL questions; Multimedia responses; START; Queries; Omnibase; data sources: IMDb, U.S. Census, Fortune500 Data, POTUS, HPKB] • Local knowledge base of ternary expressions • Core vocabulary • Uniform interface to multiple database formats (Web, text, etc.) • Extended lexicon
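One way to picture a "uniform interface to multiple database formats" is a single lookup call that hides how each source stores its data. The sketch below assumes an object/property lookup model, which the slide does not state; `Source`, `DictSource`, `RowSource`, and `Gateway` are hypothetical names, and the sample rows are illustrative rather than taken from the databases listed above.

```python
from typing import Optional, Protocol


class Source(Protocol):
    """Anything that can answer 'what is the <prop> of <obj>?'."""

    def get(self, obj: str, prop: str) -> Optional[str]: ...


class DictSource:
    """Source backed by a nested dict: {object: {property: value}}."""

    def __init__(self, table):
        self._table = table

    def get(self, obj, prop):
        return self._table.get(obj, {}).get(prop)


class RowSource:
    """Source backed by flat (object, property, value) rows."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, obj, prop):
        for row_obj, row_prop, value in self._rows:
            if row_obj == obj and row_prop == prop:
                return value
        return None


class Gateway:
    """Single entry point that forwards a lookup to a named source."""

    def __init__(self):
        self._sources = {}

    def register(self, name, source: Source):
        self._sources[name] = source

    def get(self, name, obj, prop):
        return self._sources[name].get(obj, prop)


gateway = Gateway()
# Illustrative sample data only.
gateway.register("movies", DictSource({"Casablanca": {"director": "Michael Curtiz"}}))
gateway.register("presidents", RowSource([("Abraham Lincoln", "birth-year", "1809")]))

print(gateway.get("movies", "Casablanca", "director"))
print(gateway.get("presidents", "Abraham Lincoln", "birth-year"))
```

The caller uses the same `get` call for both sources even though one is stored as a nested dictionary and the other as flat rows.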
How START Works [Diagram labels] • Web browser, connected to START over the WWW (English in; English and HTML out) • START components: Parser, Matcher, Generator • Data inside START: Input T-exps, T-exps from KB, Database of T-exps, Annotations, Native knowledge • Scripts linking START to Omnibase (external knowledge): Potus, IMDb, U.S. Census, World Factbook
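The Matcher step, which compares input T-exps against a database of T-exps, can be sketched as pattern matching over ternary expressions. This is a minimal illustration, not START's matcher: the `TExp` field names, the `?` wildcard convention, and the hand-written encoding of the Neptune annotation are all assumptions. In START the Parser would derive T-exps from English.

```python
from typing import NamedTuple


class TExp(NamedTuple):
    """A ternary expression; the field names are assumed for this sketch."""

    subject: str
    relation: str
    obj: str


WILDCARD = "?"  # hypothetical marker for the slot a question asks about


def match(query, fact):
    """Return bindings for wildcard slots if the fact fits the query, else None."""
    bindings = {}
    for slot, q, f in zip(TExp._fields, query, fact):
        if q == WILDCARD:
            bindings[slot] = f
        elif q != f:
            return None
    return bindings


def answer(query, kb):
    """Collect the bindings of every fact in the knowledge base that matches."""
    return [b for fact in kb if (b := match(query, fact)) is not None]


# Hand-written T-exps standing in for parser output (hypothetical encoding).
KB = [
    TExp("Neptune", "discovered-using", "mathematics"),
    TExp("Neptune", "is-a", "planet"),
]

# "How was Neptune discovered?" expressed as a T-exp with one open slot.
result = answer(TExp("Neptune", "discovered-using", WILDCARD), KB)
print(result)
```

A Generator would then turn the bound slot back into an English sentence; that step is omitted here.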