A Tripartite Question Answering Architecture for Integrating Diverse Knowledge Resources
Boris Katz, Gary Borchardt, Sue Felshin and Jimmy Lin
MIT Computer Science and Artificial Intelligence Laboratory
October 8, 2004
MIT AQUAINT Phase 2 Focus
Moving Forward
• Question answering today…
  • Mostly focused on simple questions
  • Driven by IR and named-entity detection
  • One-shot interactions: “context free”
  • Focused on textual documents
• Future directions
  • More complex questions
  • Deeper semantic processing
  • Knowledge from multiple resources
  • Extended user interactions: “scenario-based QA”
  • Multimodal QA: retrieving audio and video
Project Goals
• Develop advanced QA capabilities
• Push the envelope in NLP technology
• Create natural user-system interactions
• Provide seamless access to heterogeneous data
• Fuse knowledge from multiple resources
• Integrate linguistic, statistical, and knowledge-based strategies
• Build a comprehensive end-to-end QA system
• Focus on deployment in real-world environments
• Contribute to theories of knowledge representation and language comprehension
Top Layer: Understanding Language
• Coordinate natural language interactions with users
• Primary responsibilities:
  • Analyze natural language sentences
  • Disambiguate user information needs interactively
  • Manage discourse and dialog
Bottom Layer: Accessing Resources
• Complex questions require multiple heterogeneous resources to answer
• Our solution: OmniStore, a uniform knowledge repository based on ternary expressions
• Sources of knowledge:
  • Structured and semi-structured databases
  • Syntactic and semantic relations automatically extracted from free text
  • Natural language annotations attached to opaque knowledge segments
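To make the idea of a uniform repository of ternary expressions concrete, here is a minimal sketch. It is not OmniStore's interface, which these slides do not describe: the `Ternary` and `TernaryStore` names, the `match` method, and the relation names `capital-of` and `population` are all assumptions made for illustration. The two facts come from the Kazakhstan example used in this deck.

```python
from typing import NamedTuple, Optional


class Ternary(NamedTuple):
    """One <subject relation object> expression."""
    subject: str
    relation: str
    obj: str


class TernaryStore:
    """Toy in-memory repository of ternary expressions (hypothetical API)."""

    def __init__(self) -> None:
        self._facts = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._facts.add(Ternary(subject, relation, obj))

    def match(self, subject: Optional[str] = None,
              relation: Optional[str] = None,
              obj: Optional[str] = None) -> list:
        """Return all stored facts matching the pattern; None is a wildcard."""
        pattern = (subject, relation, obj)
        return sorted(
            fact for fact in self._facts
            if all(p is None or p == v for p, v in zip(pattern, fact))
        )


store = TernaryStore()
# Relation names are illustrative assumptions, not OmniStore vocabulary.
store.add("Almaty", "capital-of", "Kazakhstan")
store.add("Almaty", "population", "1.2 million")

# "What is the capital of Kazakhstan?" as a pattern with an open subject.
capitals = store.match(relation="capital-of", obj="Kazakhstan")
```

Because every source is reduced to the same three-slot shape, one `match` call can serve databases, extracted relations, and annotations alike; that uniformity is the point the slide makes, whatever the real storage layer looks like.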
Middle Layer: Connecting the Pieces
• Bridge the gap between questions and knowledge required to answer those questions
• Knowledge fusion and complex reasoning:
  • Decompose complex questions into combinations of simpler questions
  • Efficiently access resources required to answer individual questions
  • Combine smaller “nuggets of knowledge” into a coherent response
In This Presentation
The beginnings of...
• Explicit, syntactically-based decomposition of questions
  • Using syntactic cues to decompose questions into combinations of simpler questions
  • Answering simpler questions with different resources
• Implicit, semantically-based decomposition of questions
  • Applying domain rules to decompose questions into combinations of simpler questions
  • Answering simpler questions using the CNS WMD Terrorism Database
• Managing extended user interactions
  • Creating more natural dialog by handling ellipsis
MIT AQUAINT QA Server
Diagram labels: START+, IMPACT+, Omnibase+, WMD Terrorism database, Infoplease, Biography.com, WorldBook
Answering Complex Questions
• Syntactically decomposing questions:
  How many people live in the capital of the third largest Asian country?
  • What is the third largest Asian country?
  • What is its capital?
  • How many people live there?
• Semantically decomposing questions:
  Could HAMAS carry out an attack in the United States with biological agents?
  • Does HAMAS have the expertise to carry out an attack using biological agents?
  • Does HAMAS have the motivation to carry out an attack in the United States?
Syntactic Decomposition
• Parse questions into nested ternary expressions
• Successively resolve groups of ternary expressions containing unbound variables
• Answer sub-questions by replacing variables with values
How many people live in the capital of the 3rd largest Asian country?
1. What is the 3rd largest Asian country? ANSWER = Kazakhstan
2. What is the capital of Kazakhstan? ANSWER = Almaty
3. How many people live in Almaty? ANSWER = 1.2 million
A Complete Example
How many people live in the capital of the 3rd largest Asian country?
< < people+9814 live > in capital+9815 >
< people+9814 quantity *numeral* >
< capital+9815 related-to country+9813 >
< country+9813 is Asian >
< country+9813 is largest+9816 >
< largest+9816 mod third >

country+9813 = Kazakhstan
The third largest Asian country is Kazakhstan.
< < people+9814 live > in capital+9815 >
< people+9814 quantity *numeral* >
< capital+9815 related-to Kazakhstan >

capital+9815 = Almaty
The capital of Kazakhstan is Almaty.
< < people+9814 live > in Almaty >
< people+9814 quantity *numeral* >

*numeral* = 1.2 million
The population of Almaty is 1.2 million.
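The successive resolution in this example can be sketched as a small dependency-driven loop. This is a simplified illustration, not START's algorithm: each group of ternary expressions is collapsed into a hypothetical `SubQuestion` record, the `?`-prefixed variable names stand in for identifiers such as `country+9813`, and the `RESOURCES` table is a stand-in for whichever resource answers each sub-question. The three answers are the ones given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuestion:
    target: str    # variable this group of expressions binds
    relation: str  # which lookup answers it (hypothetical names)
    argument: str  # a constant, or a variable bound by another group


# Stand-in for the real resources; answers are those shown in the slides.
RESOURCES = {
    ("third-largest-country-in", "Asia"): "Kazakhstan",
    ("capital-of", "Kazakhstan"): "Almaty",
    ("population-of", "Almaty"): "1.2 million",
}


def is_variable(term: str) -> bool:
    return term.startswith("?")


def resolve(groups, resources):
    """Repeatedly answer any group whose argument is already known."""
    bindings = {}
    pending = list(groups)
    while pending:
        ready = [g for g in pending
                 if not is_variable(g.argument) or g.argument in bindings]
        if not ready:
            raise ValueError("no resolvable group; variables remain unbound")
        for group in ready:
            # Replace the variable with its value before asking the resource.
            argument = bindings.get(group.argument, group.argument)
            bindings[group.target] = resources[(group.relation, argument)]
            pending.remove(group)
    return bindings


# Listed innermost-last on purpose: the loop finds the right order itself.
groups = [
    SubQuestion("?numeral", "population-of", "?capital"),
    SubQuestion("?capital", "capital-of", "?country"),
    SubQuestion("?country", "third-largest-country-in", "Asia"),
]
bindings = resolve(groups, RESOURCES)
```

Each pass binds one more variable and substitutes it into the remaining groups, mirroring the Kazakhstan, Almaty, 1.2 million sequence above.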
Ellipsis
"What country in Africa has the largest population?"
"How about area?"
• There are three NPs in the previous query. Which one should be replaced?
  • Possible antecedents: "country" (rejected), "Africa" (rejected), "population" (replaced by "area")
• START employs linguistic and ontological knowledge to resolve ambiguities:
  • Lexical semantic properties of English nouns
  • Reasoning over relevant domain knowledge
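One way to picture the antecedent choice is to compare coarse semantic classes of the nouns involved. The sketch below is an assumption-laden toy, not START's method: the `SEMANTIC_CLASS` table and its class labels are invented for this example, and a real system would draw on much richer lexical and domain knowledge.

```python
from typing import Optional

# Hypothetical noun classes, invented for illustration only.
SEMANTIC_CLASS = {
    "country": "political-entity",
    "Africa": "region",
    "population": "attribute",
    "area": "attribute",
}


def resolve_ellipsis(previous_question: str, antecedents: list,
                     fragment: str) -> Optional[str]:
    """Swap the fragment in for the one antecedent sharing its class."""
    target_class = SEMANTIC_CLASS.get(fragment)
    if target_class is None:
        return None
    candidates = [np for np in antecedents
                  if SEMANTIC_CLASS.get(np) == target_class]
    if len(candidates) != 1:
        return None  # still ambiguous, or nothing compatible
    return previous_question.replace(candidates[0], fragment, 1)


expanded = resolve_ellipsis(
    "What country in Africa has the largest population?",
    ["country", "Africa", "population"],
    "area",
)
```

Here "area" shares a class only with "population", so the follow-up expands to a full question about the largest area. Returning `None` when zero or several antecedents fit leaves room for the interactive disambiguation the top layer is responsible for.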
A Visualization
Diagram labels: Natural Language Questions; Syntactic and Semantic Decomposition using Domain Knowledge; Symbolic Queries; Individual Resources (Resource 1, Resource 2, …, Resource n)
Knowledge Templates
• stylized natural language “wrappers” around selected database fields
In [1995], [religious cult] [Aum Supreme Truth] carried out a [use of agent] in [Japan], involving [chemical agent] [sarin].
Query Arguments
In [1995], [religious cult] [Aum Supreme Truth] carried out a [use of agent] in [Japan], involving [chemical agent] [sarin].
• constants (e.g., "1995")
• unnamed variables (e.g., "something")
• named variables (e.g., "some year")
• restricted variables (e.g., "some year (> 1990)")
• reported variables (e.g., "what year")
• ...
Each field can be similarly treated…
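The argument kinds listed here can be read as per-field filters over database records. The following sketch assumes a record layout and names (`Arg`, `run_query`, and field names such as `year` and `group`) that are not in the slides; only the single record is taken from the template example. Named variables, which would link the same value across several templates, are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Arg:
    """How one template field is treated in a query (hypothetical model)."""
    constant: Optional[Any] = None                       # field must equal this
    restriction: Optional[Callable[[Any], bool]] = None  # field must satisfy this
    report: bool = False                                 # include field in answer


def accepts(arg: Arg, value: Any) -> bool:
    if arg.constant is not None and value != arg.constant:
        return False
    if arg.restriction is not None and not arg.restriction(value):
        return False
    return True


def run_query(records, query):
    """Fields absent from the query behave as unnamed variables."""
    answers = []
    for record in records:
        if all(accepts(arg, record[field]) for field, arg in query.items()):
            answers.append({field: record[field]
                            for field, arg in query.items() if arg.report})
    return answers


# The one record shown in the template example; field names are assumed.
RECORDS = [{
    "year": 1995, "group_type": "religious cult",
    "group": "Aum Supreme Truth", "event_type": "use of agent",
    "country": "Japan", "agent_type": "chemical agent", "agent": "sarin",
}]

# "In what year (> 1990) did Aum Supreme Truth carry out something?"
query = {
    "year": Arg(restriction=lambda y: y > 1990, report=True),
    "group": Arg(constant="Aum Supreme Truth"),
}
answers = run_query(RECORDS, query)
```

Tightening the restriction to a later cutoff would filter the record out and return an empty list, which is how a restricted variable differs from a merely reported one.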
From Language to Queries
• Many natural language questions can be represented by the same knowledge template
“Could the KKK be involved in an attack using biological weapons?”
“Could an attack be carried out in Italy involving chemical weapons?”
“Are any groups trying to conduct an attack in the United States?”
“What groups will be able to carry out an attack in the US?”
“In what countries could Hizballah execute an attack?”
“Aum Shinrikyo could carry out an attack with what agent types?”
[<a group>] could carry out an attack in [<a country>] using a [<an agent type>].
Two Domain Rules
[some group] could carry out an attack in [some country] using a [some agent type].
  [some group] has the expertise to carry out an attack using a [some agent type].
  [some group] has the motivation to carry out an attack in [some country].
(A group could carry out an attack if the group has the expertise and the motivation to do so.)

[some group] has the expertise to carry out an attack using a [some agent type].
  In [something], [something] [some group] carried out a [something (<>attempted acquisition) (<>hoax/prank/threat) (<>plot only)] in [something], involving [some agent type] [something].
(A group has the expertise to carry out an attack if the group has been involved in a WMD terrorism incident other than an attempted acquisition, hoax, etc., or unexecuted plot.)
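The two rules can be read as a small backward-chaining step: the "could carry out" question reduces to an expertise question and a motivation question, and the expertise question reduces to a database lookup. The sketch below assumes record field names and function names that do not appear in the slides. The slides give no rule for motivation, so `MOTIVATIONS` is a placeholder input supplied purely for illustration and is not database content.

```python
# Event types that do not count as evidence of expertise (from the rule).
EXCLUDED_EVENT_TYPES = {"attempted acquisition", "hoax/prank/threat", "plot only"}

# The incident shown in the knowledge template example; field names assumed.
INCIDENTS = [{
    "year": 1995, "group": "Aum Supreme Truth", "event_type": "use of agent",
    "country": "Japan", "agent_type": "chemical agent", "agent": "sarin",
}]

# Placeholder only: the slides do not say how motivation is established.
MOTIVATIONS = {("Aum Supreme Truth", "Japan")}


def has_expertise(incidents, group: str, agent_type: str) -> bool:
    """Rule 2: involved in a qualifying incident with this agent type."""
    return any(
        incident["group"] == group
        and incident["agent_type"] == agent_type
        and incident["event_type"] not in EXCLUDED_EVENT_TYPES
        for incident in incidents
    )


def could_attack(incidents, motivations, group: str, country: str,
                 agent_type: str) -> bool:
    """Rule 1: expertise with the agent type and motivation in the country."""
    return (has_expertise(incidents, group, agent_type)
            and (group, country) in motivations)


chemical = could_attack(INCIDENTS, MOTIVATIONS,
                        "Aum Supreme Truth", "Japan", "chemical agent")
biological = could_attack(INCIDENTS, MOTIVATIONS,
                          "Aum Supreme Truth", "Japan", "biological agent")
```

With only the one sample incident loaded, the chemical-agent question succeeds and the biological-agent question fails at the expertise step, which shows how a single user question fans out into simpler queries without the user ever mentioning expertise or motivation.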
Terrorist Activities
• Which groups have been involved in attacks in the United States?
• Has Aum Shinrikyo carried out an attack in Japan with a biological agent?
• In what countries have organizations executed attacks with radiological weapons?
• Did the Japanese Red Army carry out a threat in Japan?
• Has the KKK been engaged in an attack in the US?
• What groups have put on a hoax in the United States?
• Did Aum Supreme Truth plot to use a chemical agent in the United States?
• Which groups have acquired a chemical weapon?
• What groups have issued a threat?
• Did the Animal Liberation Front issue a threat?
Database Contents
• What group types are there?
• What groups are in the WMD DB?
• Is Aum Shinrikyo portrayed in the WMD Terrorism Database?
• Is Turkey in the WMD Terrorism DB?
• Is PFLP in the WMD Terrorism DB?
• Is the Japanese Red Army specified in the WMD Terrorism Database?
• Is the KKK included?
• What event types are specified in the WMD DB?
• What countries are in the WMD DB?
• Is the Netherlands in the WMD DB?
• What agent types are in the WMD Terrorism Database?
Relationships
• What group type is Dark Harvest?
• What groups are right-wing organizations?
• What event types are left-wing groups associated with?
• What group types are in Mexico?
• What countries have criminal organizations?
• Are criminal organizations in Lithuania?
• Religious cults have a presence in what countries?
• Are nationalist groups associated with radiological agents?
• What groups are associated with use of agents?
• Is Aum Shinrikyo associated with use of agents?
• What groups does Canada have?
• The Red Army Faction is in what countries?
• What groups have a presence in Turkey?
• What groups are associated with nuclear agents?
• Has use of agents occurred in Germany?
Capabilities and Motivations
• Does Hizballah want to carry out an attack in Lebanon?
• Which groups have the motivation to carry out an attack in France?
• Does Hizballah have the expertise to carry out an attack with a chemical agent?
• What groups have the expertise to carry out an attack with a chemical agent?
• Could Hizballah conduct an attack in Turkey using a biological agent?
• Using what agent types could Hizballah execute an attack in Lebanon?
• Are the Chechen rebels able to carry out an attack in Georgia using a chemical agent?
• In what countries could Hizballah carry out an attack using biological agents?
Summary
• Initial realization of the tripartite QA architecture completed
• New capabilities:
  • Explicit, syntactically-based decomposition of questions
  • Augmented handling of elliptic questions
  • Implicit, semantically-based decomposition of questions
• Incorporated resources:
  • CNS WMD Terrorism database
  • A range of web-based resources (e.g., Infoplease)