MOQA: Meaning Oriented Question Answering • An AQUAINT Project from ILIT
CRL is a research department in the School of Arts and Sciences at NMSU • Director: Jim Cowie • Currently has a staff of 10 PhDs • Mainly focuses on language engineering research • Languages include – Arabic, Farsi, Turkish, Spanish, Chinese, Japanese, Korean Contact: Jim Cowie – jcowie@crl.nmsu.edu
CoGenTex • Advanced-technology company in Ithaca, New York • Founded in 1990 by Dr. Richard Kittredge, Dr. Tanya Korelsky, and Dr. Owen Rambow • Goal is to transform results from research in natural language processing into practical software applications • Has developed a core set of text generation tools • Current focus is on expanding the range of applications for this technology, with a particular focus on the Web. Contact: Tanya Korelsky – tanya@cogentex.com
ILIT • The Institute for Language and Information Technologies at University of Maryland Baltimore County • Sergei Nirenburg, Director • Opened September 2002 with a team of 3 senior personnel • Sergei Nirenburg • Stephen Beale • Marge McShane Contact: Sergei Nirenburg – sergei@cs.umbc.edu
Meaning-Oriented Question-Answering with Ontological Semantics • Domain: travel and meetings • Processing stages: question understanding and interpretation; determining the answer; and presenting the answer • Two kinds of data source: open text (in English, Arabic and Persian) and a structured Fact Repository containing instances of ontological entities
Project Tasks • Design and Implementation of System Architecture • Knowledge Acquisition • Question Understanding • Question Interpretation • Answer Determination • Answer Formulation • Documentation; User and Evaluator Training; Testing; and System Evaluation
[Figure: system architecture diagram. Knowledge Sources: ONTOLOGY (including goals, plans, scripts); LEXICONS for each language in the system (including names and phrases); FACT REPOSITORY (instances of goals, plans, scripts). Input: user question in English. Output: system response in English. Processing Modules and Intermediate Results: Question Understanding (production of TMRs), yielding a Basic Text Meaning Representation (TMR); Question Interpretation (task context, dialog context, user profile, analyst team profile), yielding an Extended TMR that adds a statement of active goals, plans and scripts in the system; Task-Oriented Answer Determination from the Fact Database (IE from the Fact Database); NL Query Generation and Answer Determination from open text in English, Arabic and Persian, Russian, Spanish (IR, IE, TMRs for textual fillers of IE templates); Answer Formulation and Presentation using XML; Dialog and Self-Awareness-related Goal and Plan Processing Manager (for running commentary and workflow- and context-related communication); System Working Memory with an Agenda (goal attainment and plan execution in TMR).]
Development Methodology • Rapid Prototyping • Using pre-existing components • Grow the various development activities toward an integrated system • Today – look at one example of each: User Interaction, XML - TMR, Resources, Analysis
Deliverables • A QA system in the domain of travel and meetings, with a capability to search for information in open texts in three languages and in a structured, ontology-based Fact DB; • an enhanced text analysis system for each of the languages; • a question interpretation module that takes into account user goals and the context of the dialog; • an integrated IR/IE module working on open text in three languages, on the basis of ontologically defined extraction templates;
Deliverables (Cont.) • an ontology of about 6,500 concepts; • a Fact DB of about 100,000 facts; • a system for automating the acquisition of the Fact DB; • a semantic lexicon for each of the languages in the system, with about 20,000 entries; • a decision-making module that determines the answer(s) and system action(s) at each step of the dialog/task processing; • an intuitive and intelligent multi-modal user interface, which uses natural language generation in answers and for query validation
Project Status • Approval to spend given from August 21st 2002 • UMBC – Ontology and English Lexicon Improvement, Development of Scripts, Meaning Based Text Analysis • CoGenTex – Interface design, Human Factors, Text Generation • NMSU – Text-preprocessing, Arabic and Farsi resources and analysis, data collection, system integration.
Corpora Being Collected • Arabic : • http://www.aljazirah.net • http://news.bbc.co.uk/hi/arabic/news/ • http://www.irna.com/ar/index.shtml • English: • http://www.cnn.com • http://www.bbc.co.uk/worldservice/index.shtml • http://www.irna.com/en/ • Persian: • http://www.hamshahri.net/ • http://www.bbc.co.uk/persian/index.shtml • http://www.irna.com/pe/index.shtml
Ontology • An ontology is a formally and semantically defined repository of concepts and relations about the world. • Including knowledge about events, objects, and work flow scripts • Linked to the ontology are: • fact repositories, including facts about actual events, objects, places, personalities, etc. • lexica, defining words in a language in ontological terms • “onomastica”, or multilingual proper name lists
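The links described on this slide can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the IS-A hierarchy, the lexicon entries, and the function names are hypothetical and are not taken from the MOQA ontology itself. Only the concept names TRAVEL-EVENT, INFORM, CITY and NATION appear elsewhere in these slides.

```python
# Hypothetical miniature ontology: child concept -> parent concept.
# The hierarchy below is an assumption, not the actual MOQA ontology.
IS_A = {
    "TRAVEL-EVENT": "EVENT",
    "INFORM": "EVENT",
    "CITY": "PLACE",
    "NATION": "PLACE",
    "EVENT": "ALL",
    "PLACE": "ALL",
}

# Hypothetical English lexicon: word -> ontological concept.
LEXICON_EN = {"travel": "TRAVEL-EVENT", "trip": "TRAVEL-EVENT", "city": "CITY"}

# Hypothetical onomasticon: proper name -> ontological concept.
ONOMASTICON = {"Amman": "CITY", "Jordan": "NATION"}


def ancestors(concept):
    """Return the chain of parent concepts, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if `concept` equals `candidate` or inherits from it."""
    return concept == candidate or candidate in ancestors(concept)


def concept_for(token):
    """Look a token up as a proper name first, then as an ordinary word."""
    return ONOMASTICON.get(token) or LEXICON_EN.get(token.lower())
```

In this sketch the lexicon and the onomasticon are kept separate from the hierarchy, mirroring the slide's point that they are linked to the ontology rather than part of it.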
Structured Common Fact Repository • Uniform organization for all kinds of data • Support for multiple applications and tools • Semantically anchored in general ontology • Constantly updated; today, manually; tomorrow, semi-automatically; long-term, automatically • Supports both domain knowledge and workflow specification
Populating the Fact Repository • Original Text: CIA Report 02 [15 October, 2002, from a source, said to be credible, in Jordan]: A man named Majed H., using a faked Jordanian passport and travel visa, traveled from Aman, Jordan to Chicago, Illinois on 12 July, 2002. Majed H. is now known to have resided in Afghanistan for two years (1996-1997) and has been identified as a member of Al-Qaeda. • Gloss: An unnamed source, who is very reliable, informed the CIA sometime between July 12, 2002 and October 15, 2002, of a travel-event by Majed H. on July 12, 2002, from Aman (Amman) Jordan to Chicago Illinois. In order to take the trip, Majed H. used a fake passport and visa issued by Jordan. Majed H. was located in Afghanistan from January 1, 1996, through December 31, 1997, and is a member of Al-Qaeda.
Populating the Fact Repository (2) • Facts (12 Total):
INFORM-1
  AGENT: SOCIAL-ROLE-4
  BENEFICIARY: ORGANIZATION-1
  THEME: TRAVEL-EVENT-0
  TIME: <> 07/12/2002 10/15/2002
  MODALITY-EPISTEMIC: > 0.6
……………
NATION-2
  HAS-NAME: "Jordan"
CITY-1
  HAS-NAME: "Amman"
  IN-NATION: NATION-2
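One way to picture facts such as CITY-1 and NATION-2 is as frames whose slot fillers are either literal values or references to other facts. The sketch below assumes that reading; the class and method names (`Fact`, `FactRepository`, `resolve`) are hypothetical and do not describe the project's actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A fact: an indexed instance of an ontological concept."""
    concept: str
    index: int
    slots: dict = field(default_factory=dict)

    @property
    def fact_id(self):
        return f"{self.concept}-{self.index}"


class FactRepository:
    """Hypothetical in-memory store keyed by fact id."""

    def __init__(self):
        self._facts = {}

    def add(self, fact):
        self._facts[fact.fact_id] = fact
        return fact.fact_id

    def get(self, fact_id):
        return self._facts[fact_id]

    def resolve(self, fact_id, slot):
        """Follow a slot whose filler is the id of another fact."""
        return self.get(self.get(fact_id).slots[slot])


# The two simplest facts shown on the slide.
repo = FactRepository()
repo.add(Fact("NATION", 2, {"HAS-NAME": "Jordan"}))
repo.add(Fact("CITY", 1, {"HAS-NAME": "Amman", "IN-NATION": "NATION-2"}))

nation = repo.resolve("CITY-1", "IN-NATION")
print(nation.slots["HAS-NAME"])
```

Storing references as ids rather than nested objects keeps each fact independently addressable, which is one plausible way to let several facts share a single NATION-2 entry.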
Using Ontology to Support Retrieval • Documents need to be retrieved using the language of the document • The representation of queries in the system is in terms of ontological concepts and “facts” • We will use the ontology to support retrieval in all three languages • The current experiment uses Chinese and Spanish, because ontology-to-language lexicons already exist for these languages
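The idea of retrieving documents in their own language from a concept-level query can be sketched as follows. This is an illustrative assumption, not the experiment's implementation: the lexicon entries, the PASSPORT concept name, and the functions `terms_for` and `retrieve` are hypothetical, and naive substring matching stands in for a real IR engine.

```python
# Hypothetical ontology-to-language lexicons: concept -> words in that language.
LEXICONS = {
    "es": {"TRAVEL-EVENT": ["viajar", "viaje"], "PASSPORT": ["pasaporte"]},
    "zh": {"TRAVEL-EVENT": ["旅行"], "PASSPORT": ["护照"]},
}


def terms_for(concepts, lang):
    """Expand ontological concepts into search terms for one language."""
    lexicon = LEXICONS[lang]
    terms = []
    for concept in concepts:
        terms.extend(lexicon.get(concept, []))
    return terms


def retrieve(concepts, documents):
    """Return ids of documents that mention any concept in their own language.

    `documents` is a list of (doc_id, lang, text) tuples.
    """
    hits = []
    for doc_id, lang, text in documents:
        terms = terms_for(concepts, lang)
        if any(term in text.lower() for term in terms):
            hits.append(doc_id)
    return hits


docs = [
    ("d1", "es", "El hombre usó un pasaporte falso para viajar."),
    ("d2", "zh", "他去年到芝加哥旅行。"),
    ("d3", "es", "El tiempo es bueno hoy."),
]
print(retrieve(["TRAVEL-EVENT"], docs))
```

The query itself never contains Spanish or Chinese words; each document is matched against terms generated for its own language, so adding a language in this sketch only requires another lexicon.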
Generation Tasks Months 1-6 • Subtask 1: First prototype of intelligent question answering user interface, involving hypertext generation (December demo) • Subtask 2: Gathering of end-user feedback on the interface functionality, look-and-feel and user customization • Subtask 3: Design of extensions to cover a broader collection of concepts from the ontology • Milestone for the next 6 months: Report on user feedback
MOQA User Interface • Support for both natural language queries and structured queries • Intuitive web-based multi-modal interface for answers • Tables, text, maps, time line, and social network graphs are interconnected by hyperlinks • Natural language generation used in answers and for query validation • Implemented using XML-based technology • Positive reviews at the kick-off from an HCI expert and program management
MOQA User Interface – Query Page • Support for NL-based queries and structured queries • Structured query validation with automatically generated NL paraphrase • WYSIWYM editing of structured queries
MOQA User Interface – Underlying XML Queries • Uses standard XML technology (e.g. XML-compliant browser, XML parsers, etc.) • Supports modularity – the XML representation is viewable and exchangeable between subsystems • Assures automatic validation of query instances using a query class hierarchy described by XML schemas • Uses logical expressions in XML to support complex queries
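A structured query with a nested logical expression might look like the sketch below. The element and attribute names (`query`, `and`, `or`, `condition`, `slot`, `value`) and the slot names DESTINATION, SOURCE and AGENT are assumptions made for illustration; the slides do not show the actual MOQA query schema. Schema-based validation is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET


def condition(slot, value):
    """Leaf test: the named slot must equal the given value."""
    return ET.Element("condition", {"slot": slot, "value": value})


def combine(operator, *children):
    """Build an <and> or <or> node over the given sub-expressions."""
    node = ET.Element(operator)
    node.extend(children)
    return node


def build_query(concept, expression):
    """Wrap a logical expression in a query over one ontological concept."""
    query = ET.Element("query", {"concept": concept})
    query.append(expression)
    return query


def evaluate(node, fact):
    """Evaluate a logical expression against a dict of slot fillers."""
    if node.tag == "and":
        return all(evaluate(child, fact) for child in node)
    if node.tag == "or":
        return any(evaluate(child, fact) for child in node)
    if node.tag == "condition":
        return fact.get(node.get("slot")) == node.get("value")
    raise ValueError(f"unknown element: {node.tag}")


query = build_query(
    "TRAVEL-EVENT",
    combine(
        "and",
        condition("DESTINATION", "Chicago"),
        combine("or", condition("SOURCE", "Amman"), condition("AGENT", "Majed H.")),
    ),
)

# Serialized form that could be exchanged between subsystems.
xml_text = ET.tostring(query, encoding="unicode")
print(xml_text)
```

Because the query is ordinary XML, any subsystem with an XML parser can read it back with `ET.fromstring(xml_text)` and evaluate or display it, which is the modularity point the slide makes.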
MOQA User Interface – Results Page (kick-off concept demo version) • Concept demo helped to perform requirements analysis • Demonstrated integrated display using tables, textual summary, map and time line • Illustrated filtering table data by using hyperlinks in text
MOQA User Interface – Details Page (concept demo) • Demonstrated display of additional types of information including social networks and source document extracts • Illustrated “drill-down” by hyperlinks and typical follow-up queries based on underlying ontology
MOQA User Interface – Research Plans • Presentation of partially understood natural language queries • Personalization of answer presentation both content-wise (based on user expertise) and form-wise (based on user presentation preferences) • Intelligent maintenance of session history based on typical work flow and collaboration patterns within groups • Interface portability between subject domains • Incremental evolution based on validation by domain experts