Meaning-Oriented Question-Answering with Ontological Semantics An AQUAINT Project from ILIT
CRL is a research department in the School of Arts and Sciences at NMSU • Funded externally • Currently has a staff of 10 PhDs • Mainly focuses on language engineering research • Languages include – Arabic, Farsi, Turkish, Spanish, Chinese, Japanese, Korean
CoGenTex • Advanced-technology company in Ithaca, New York • Founded in 1990 by Dr. Richard Kittredge, Dr. Tanya Korelsky, and Dr. Owen Rambow • Goal: to turn the results of natural language processing research into practical software applications • Has developed a core set of text generation tools • Current focus is on expanding the range of applications for this technology, particularly on the Web
ILIT • The Institute for Language and Information Technologies at the University of Maryland, Baltimore County • Sergei Nirenburg, Director • Begins operation in September 2002 with a team of 3 senior personnel • Close collaboration with NMSU CRL
Recent Projects: CRL • CREST: Cross-Language Retrieval, Extraction, Summarization and Translation (a TIDES project) • An Arabic-English translation system (a TIDES project) • MINDS: multilingual summarization • Keizai, MINDSEYE: cross-language retrieval • FLAX: HTML parsing • Shiraz: Farsi-English and Dari-English MT • Expedition: rapid ramp-up of MT for low-density languages
Recent Projects: CoGenTex • Production of user-directed multi-document summaries (RIPTIDES) • Multimedia display will include fluent English responses coordinated with tables, diagrams, and hypertext follow-up (Reporter) • Deep generation techniques that employ an explicit representation of communicative structure (FoG and LFS) • Rule-based text generation tools for both answer planning and syntactic generation (Exemplars and RealPro)
Meaning-Oriented Question-Answering with Ontological Semantics • Domain: travel and meetings • Question understanding and interpretation • Determining the answer • Presenting the answer • Two kinds of data sources: – open text (in English, Arabic, and one of Persian, Russian or Spanish) – a structured Fact Database containing instances of ontological entities
Project Tasks • Design and Implementation of System Architecture • Knowledge Acquisition • Question Understanding • Question Interpretation • Answer Determination • Answer Formulation • Documentation; User and Evaluator Training; Testing; and System Evaluation
System Architecture
• Input: user question in English • Output: system response in English
• Static knowledge sources:
– Ontology, including goals, plans and scripts
– Lexicons for each language in the system, including names and phrases
– Fact Database: instances of ontological entities, including goals, plans and scripts
• Processing modules:
– Question Understanding: NL query → basic Text Meaning Representation (TMR)
– Question Interpretation: Goal and Plan Processing Manager, drawing on task context, dialog context, user profile and analyst team profile
– Task-oriented answer determination from open text (English, Arabic, and one of Persian, Russian or Spanish): IR, IE, and production of TMRs for textual fillers of IE templates
– Answer determination from the Fact Database: IE from the Fact Database
– Dialog- and self-awareness-related answer determination (for running commentary and for workflow- and context-related communication)
– Generation: answer formulation and presentation
• System Working Memory (intermediate results):
– Basic TMR
– Extended TMR: adds a statement of the system's active goals, plans and scripts
– Goal Attainment and Plan Execution Agenda
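To make the data flow concrete, here is a minimal Python sketch of the module chain, assuming a drastically simplified TMR (an event head plus role fillers) and a toy Fact Database holding the single travel fact from the TMR example. All function names (`understand_question`, `interpret_question`, `determine_answer`, `formulate_answer`, `answer_question`) and the goal label `find-destination` are hypothetical, and the sketch does not reflect the project's actual analyzer, generator, or open-text IR/IE path.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy Fact Database: one fact taken from the TMR example (hypothetical format).
FACT_DB = [
    {"head": "travel", "agent": "Hakan Sukur", "source": "London", "destination": "Istanbul"},
]


@dataclass
class TMR:
    """Simplified text meaning representation: an event head plus role fillers."""
    head: str
    roles: dict[str, str]
    active_goals: list[str] = field(default_factory=list)  # filled in by interpretation ("Extended TMR")


def understand_question(question: str) -> TMR:
    """Question Understanding (toy): handles only 'Where did <name> travel to?'."""
    prefix, suffix = "Where did ", " travel to?"
    if not (question.startswith(prefix) and question.endswith(suffix)):
        raise ValueError(f"unsupported question: {question!r}")
    agent = question[len(prefix):-len(suffix)]
    return TMR(head="travel", roles={"agent": agent, "destination": "?"})


def interpret_question(tmr: TMR, task_context: str) -> TMR:
    """Question Interpretation: extend the TMR with an active goal derived from the task context."""
    tmr.active_goals.append(f"find-destination ({task_context})")
    return tmr


def determine_answer(tmr: TMR, fact_db: list[dict[str, str]]) -> str | None:
    """Answer Determination: match the known role fillers against the Fact Database."""
    known = {role: value for role, value in tmr.roles.items() if value != "?"}
    for fact in fact_db:
        if fact.get("head") == tmr.head and all(fact.get(r) == v for r, v in known.items()):
            return fact.get("destination")
    return None


def formulate_answer(tmr: TMR, answer: str | None) -> str:
    """Answer Formulation: render a one-sentence English response."""
    agent = tmr.roles["agent"]
    return f"{agent} traveled to {answer}." if answer else f"No travel facts found for {agent}."


def answer_question(question: str, task_context: str = "travel and meetings") -> str:
    tmr = interpret_question(understand_question(question), task_context)
    return formulate_answer(tmr, determine_answer(tmr, FACT_DB))


print(answer_question("Where did Hakan Sukur travel to?"))
```

Each stage only consumes the previous stage's output, which mirrors how the diagram passes the basic TMR and then the Extended TMR through working memory.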
Development Strategy • Rapid Prototyping • Using pre-existing components • Evaluation of end-to-end system performance for specific tasks
Deliverables • a QA system in the domain of travel and meetings, able to search for information in open texts in three languages and in a structured, ontology-based Fact DB; • an enhanced text analysis system for each of the languages; • a question interpretation module that takes into account user goals and the context of the dialog; • an integrated IR/IE module working on open text in three languages, on the basis of ontologically defined extraction templates;
Deliverables (Cont.) • an ontology of about 6,500 concepts; • a Fact DB of about 100,000 facts; • a system for automating the acquisition of the Fact DB; • a semantic lexicon of about 20,000 entries for each of the languages in the system; • a decision-making module that determines the answer(s) and system action(s) at each step of the dialog/task processing; • an ontological-semantic text generation module.
Sergei Nirenburg sergei@crl.nmsu.edu Jim Cowie jcowie@crl.nmsu.edu Tanya Korelsky tanya@cogentex.com Richard Kittredge richard@cogentex.com
Structured Common Fact Database • Uniform format for all kinds of data • Uniform support for multiple applications and tools • Semantically anchored in a general ontology • Constantly updated: currently manually or semi-automatically, eventually automatically • Supports both domain knowledge and workflow specification
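One way to read "uniform format, semantically anchored in a general ontology" is that every stored fact names an ontology concept and may only fill slots that concept defines. The Python sketch below illustrates that idea under stated assumptions: the `ONTOLOGY` fragment, its slot names, and the `FactDB` class are hypothetical illustrations, not the project's actual schema or storage.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: each concept lists the slots its instances may fill.
ONTOLOGY = {
    "HUMAN": {"name", "nationality", "affiliation"},
    "CITY": {"name", "country"},
    "TRAVEL": {"agent", "source", "destination", "means"},
}


@dataclass
class Fact:
    instance: str  # instance identifier, e.g. "human_544"
    concept: str   # ontology concept the instance belongs to
    slots: dict    # slot name -> filler (a literal or another instance id)


class FactDB:
    """Uniform store: every fact, whatever its source, is validated against one ontology."""

    def __init__(self, ontology: dict):
        self.ontology = ontology
        self.facts: dict = {}

    def add(self, instance: str, concept: str, **slots) -> None:
        if concept not in self.ontology:
            raise ValueError(f"unknown concept {concept!r}")
        unknown = set(slots) - self.ontology[concept]
        if unknown:
            raise ValueError(f"slots {sorted(unknown)} not defined for {concept}")
        # Re-adding an existing instance overwrites it, so facts can be updated in place.
        self.facts[instance] = Fact(instance, concept, dict(slots))

    def instances_of(self, concept: str) -> list:
        return [f for f in self.facts.values() if f.concept == concept]


db = FactDB(ONTOLOGY)
db.add("human_544", "HUMAN", name="Hakan Sukur")
db.add("location_25", "CITY", name="Istanbul")
db.add("travel_1", "TRAVEL", agent="human_544", destination="location_25")
print([f.instance for f in db.instances_of("TRAVEL")])
```

Because validation lives in one place, any tool that writes facts (manual entry, semi-automatic acquisition, or extraction) gets the same checks.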
Ontology Defined • An ontology is a formally and semantically defined repository of concepts and relations about the world • It includes knowledge about events, objects, and workflow scripts • Linked to the ontology are: – fact databases, containing facts about actual events, objects, places, personalities, etc. – “onomastica”, or multilingual proper name lists
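As a small illustration of "concepts and relations" plus a linked onomasticon, the Python sketch below models only IS-A links and maps proper names in several languages to one shared instance. The concept hierarchy (`TRAVEL` under `MOTION-EVENT`, and so on), the language codes, and the helper names are hypothetical examples chosen for the sketch, not the project's actual ontology content.

```python
# Hypothetical IS-A links: child concept -> parent concept.
PARENTS = {
    "TRAVEL": "MOTION-EVENT",
    "MOTION-EVENT": "PHYSICAL-EVENT",
    "PHYSICAL-EVENT": "EVENT",
    "CITY": "PLACE",
    "PLACE": "PHYSICAL-OBJECT",
    "PHYSICAL-OBJECT": "OBJECT",
}

# Hypothetical onomasticon: (language, surface name) -> (instance id, concept).
ONOMASTICON = {
    ("en", "Istanbul"): ("location_25", "CITY"),
    ("tr", "İstanbul"): ("location_25", "CITY"),
    ("ar", "إسطنبول"): ("location_25", "CITY"),
}


def ancestors(concept: str) -> list:
    """Walk IS-A links upward, nearest parent first."""
    chain = []
    while concept in PARENTS:
        concept = PARENTS[concept]
        chain.append(concept)
    return chain


def is_a(concept: str, ancestor: str) -> bool:
    return concept == ancestor or ancestor in ancestors(concept)


def resolve_name(lang: str, name: str):
    """Map a proper name in a given language to its shared instance, or None."""
    return ONOMASTICON.get((lang, name))


print(ancestors("TRAVEL"))
print(resolve_name("ar", "إسطنبول") == resolve_name("en", "Istanbul"))
```

Resolving names from different languages to the same instance is what lets facts extracted from English and Arabic text land on one Fact DB entry.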
Travel Tracking Template
PERSON-TRAVELLING
  NAME
  ALIAS
  NATIONALITY
  AFFILIATION
  POSITION
PURPOSE-OF-TRAVEL (attend meeting of world leaders)
DESTINATION (location of meeting)
FLIGHT-INFORMATION
  departure from
  departure time
  arrival at
  arrival time
  flight number
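An extraction template like this maps naturally onto nested records whose slots start empty and are filled as evidence is found. The Python sketch below assumes that PERSON-TRAVELLING and FLIGHT-INFORMATION group the indented slots; the class names, snake_case slot names, and `unfilled_slots` helper are hypothetical, chosen only to show how a system could report which slots still need fillers.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


@dataclass
class PersonTravelling:
    name: Optional[str] = None
    alias: Optional[str] = None
    nationality: Optional[str] = None
    affiliation: Optional[str] = None
    position: Optional[str] = None


@dataclass
class FlightInformation:
    departure_from: Optional[str] = None
    departure_time: Optional[str] = None
    arrival_at: Optional[str] = None
    arrival_time: Optional[str] = None
    flight_number: Optional[str] = None


@dataclass
class TravelTrackingTemplate:
    person_travelling: PersonTravelling = field(default_factory=PersonTravelling)
    purpose_of_travel: Optional[str] = None
    destination: Optional[str] = None
    flight_information: FlightInformation = field(default_factory=FlightInformation)

    def unfilled_slots(self) -> list:
        """List empty slots, using dotted paths for slots inside nested groups."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                missing.append(f.name)
            elif is_dataclass(value):
                missing += [f"{f.name}.{g.name}" for g in fields(value) if getattr(value, g.name) is None]
        return missing


t = TravelTrackingTemplate()
t.person_travelling.name = "Hakan Sukur"
t.destination = "Istanbul"
t.flight_information.departure_from = "London"
t.flight_information.arrival_at = "Istanbul"
t.flight_information.flight_number = "BA633"
print(t.unfilled_slots())
```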
Text Meaning Representation
Input: Hakan Sukur arrived in Istanbul from London on British Airways Flight 633 on March 2, 2002
Output:
proposition_1
  head %travel_1
  agent human_544 “Hakan Sukur”
  source location_23 “London”
  destination location_25 “Istanbul”
  means flight-17776 “BA633”
  tmr-time
    time-begin 20020302 “March 2, 2002”
  aspect
    iteration single; phase end… “departed”
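To connect the TMR to the Travel Tracking Template, the sketch below transcribes the TMR frame as nested Python data, with each role filler kept as an (instance id, surface string) pair, and copies fillers into template slots. The slot mapping in `fill_travel_template` is a hypothetical illustration, and so is the rule that an end-phase travel event dates the arrival; the date is normalized to a YYYYMMDD string with the standard `datetime.strptime`/`strftime` calls.

```python
from datetime import datetime

# TMR frame transcribed as nested data; role fillers are (instance id, surface string).
TMR = {
    "proposition_1": {
        "head": "travel_1",
        "agent": ("human_544", "Hakan Sukur"),
        "source": ("location_23", "London"),
        "destination": ("location_25", "Istanbul"),
        "means": ("flight-17776", "BA633"),
        "tmr-time": {"time-begin": "March 2, 2002"},
        "aspect": {"iteration": "single", "phase": "end"},
    }
}


def normalize_date(text: str) -> str:
    """Convert a date such as 'March 2, 2002' to YYYYMMDD form."""
    return datetime.strptime(text, "%B %d, %Y").strftime("%Y%m%d")


def fill_travel_template(prop: dict) -> dict:
    """Copy TMR role fillers into Travel Tracking Template slots (hypothetical mapping)."""
    filled = {
        "PERSON-TRAVELLING.NAME": prop["agent"][1],
        "DESTINATION": prop["destination"][1],
        "FLIGHT-INFORMATION.departure from": prop["source"][1],
        "FLIGHT-INFORMATION.arrival at": prop["destination"][1],
        "FLIGHT-INFORMATION.flight number": prop["means"][1],
    }
    # Assumption: when the TMR marks the end phase of the travel event (an arrival),
    # the TMR time is taken as the arrival date.
    if prop["aspect"]["phase"] == "end":
        filled["FLIGHT-INFORMATION.arrival time"] = normalize_date(prop["tmr-time"]["time-begin"])
    return filled


for slot, value in fill_travel_template(TMR["proposition_1"]).items():
    print(f"{slot}: {value}")
```

Keeping the instance ids alongside the surface strings means the same fillers could also be written to the Fact Database as links rather than plain text.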