Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007)
Learning for Semantic Parsing
Advisor: Hsin-Hsi Chen
Reporter: Chi-Hsin Yu
Date: 2007.08.02
Outline
• Introduction
• Sample Applications and their MRLs
• Systems for Learning Semantic Parsers
• Experimental Evaluation
• Future Research
Introduction
• Semantic parsing
  • the task of mapping a natural language (NL) sentence into a complete, formal meaning representation (MR) or logical form
• Meaning representation language (MRL)
  • a formal, unambiguous language that allows for automated inference and processing
  • e.g., first-order predicate logic
Introduction (cont.)
• MRL of this paper
  • is “executable” and can be directly used by another program to perform some task, such as
    • answering questions from a database
    • controlling the actions of a real or simulated robot
• The goal of these systems
  • is to induce an efficient and accurate semantic parser that can map novel sentences into this MRL
• Training corpus
  • sentences annotated with their MRs, i.e. (NL, MR) pairs
  • optional extra training input, such as syntactic parse trees or semantically annotated parse trees
Sample Applications and their MRLs
• Database query language
  • a sample database on U.S. geography
  • a logical query language based on Prolog
• Coaching language for robotic soccer
  • developed for the RoboCup Coach Competition
  • a formal language called CLANG
  • tactics and behaviors are expressed in terms of if-then rules
Systems for Learning Semantic Parsers
• Three approaches to learning statistical semantic parsers
  • SCISSOR (CoNLL-2005, COLING-ACL-06)
    • adds detailed semantics to a statistical syntactic parser
  • WASP (HLT/NAACL-06)
    • adapts statistical machine translation methods to map from NL to MRL
  • KRISP (COLING-ACL-06)
    • uses SVMs with a subsequence kernel specialized for text learning
Systems for Learning Semantic Parsers – SCISSOR
• SCISSOR
  • Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
  • learns a statistical parser that generates a semantically augmented parse tree (SAPT)
  • Training data: (NL, SAPT, MR) triples
• Process
  • (1) an enhanced version of Collins' parser (head-driven model 2) is trained to produce SAPTs
  • (2) a recursive procedure compositionally constructs the MR for each node in the SAPT, given the MRs of its children
Systems for Learning Semantic Parsers – SCISSOR (cont.)
[Figure; surviving labels: "Ball owner", "type concept", "predicate concept"]
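The recursive composition step in SCISSOR's process (2) can be sketched roughly as follows. This is only an illustrative toy, not SCISSOR's actual procedure: the `Node` layout, the `compose` and `render` helpers, and the example tree for "our player 2 has the ball" are all assumptions made for the sketch, and the real system also has to deal with argument constraints and imperfect parses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# An MR is either a constant (str) or a (predicate, arguments) pair.
MR = Union[str, Tuple[str, list]]

@dataclass
class Node:
    """One SAPT node (hypothetical layout): a semantic label plus children."""
    sem: Optional[str]                       # None marks a node with no meaning
    children: List["Node"] = field(default_factory=list)
    const: Optional[str] = None              # set on leaves that denote a constant

def compose(node: Node) -> Optional[MR]:
    """Build the MR of `node` bottom-up from the MRs of its children."""
    if node.sem is None:
        return None
    if not node.children:
        # Leaf: a constant, or a predicate still waiting for its arguments.
        return node.const if node.const is not None else (node.sem, [])
    child_mrs = [mr for mr in map(compose, node.children) if mr is not None]
    # The head child is the one carrying the same predicate as the parent label.
    head = next((mr for mr in child_mrs
                 if isinstance(mr, tuple) and mr[0] == node.sem), (node.sem, []))
    rest = [mr for mr in child_mrs if mr is not head]
    return (head[0], head[1] + rest)

def render(mr: MR) -> str:
    """Print an MR in predicate(arg, ...) notation."""
    if isinstance(mr, str):
        return mr
    pred, args = mr
    return f"{pred}({', '.join(render(a) for a in args)})" if args else pred

# Toy SAPT for "our player 2 has the ball" (labels chosen for illustration).
sapt = Node("bowner", [
    Node("player", [Node("team", const="our"), Node("player"), Node("unum", const="2")]),
    Node("bowner", [Node("bowner"), Node(None)]),   # "has" + "the ball" (no meaning)
])
print(render(compose(sapt)))
```

Each internal node simply takes the predicate of its head child and appends the MRs of the remaining children as arguments, which is the simplest possible reading of "compositional" here.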
Systems for Learning Semantic Parsers – WASP
• WASP
  • Word Alignment-based Semantic Parsing
  • uses Statistical Machine Translation (SMT) techniques (parallel corpora) to translate from NL to MRL
• Process
  • (1) an SMT word alignment system, GIZA++, is used to produce an N-to-1 alignment between the words in the NL sentence and a sequence of MRL productions
  • (2) a synchronous CFG (SCFG) produces complete MRs by combining these NL substrings and their translations
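To make the synchronous CFG idea in step (2) concrete, here is a minimal sketch of a synchronous derivation: every rule pairs an NL pattern with an MRL template, and each nonterminal is expanded once and substituted into both sides. The three rules and the `derive` helper are invented for illustration (CLANG-flavoured, one rule per nonterminal); a real parser such as WASP would instead search over many weighted rules for the most probable derivation of a given sentence.

```python
import re

# Hypothetical synchronous rules: nonterminal -> (NL pattern, MRL template).
# Nonterminals on the right-hand sides are written as {NAME}.
RULES = {
    "CONDITION": ("the ball is in {REGION}", "(bpos {REGION})"),
    "REGION": ("{TEAM} penalty area", "(penalty-area {TEAM})"),
    "TEAM": ("our", "our"),
}

def derive(symbol):
    """Expand `symbol` on the NL side and the MRL side in lockstep."""
    nl_pattern, mr_template = RULES[symbol]
    nonterminals = re.findall(r"\{(\w+)\}", nl_pattern)
    # Expand each nonterminal once, then reuse the result on both sides.
    expansions = {nt: derive(nt) for nt in nonterminals}
    nl = nl_pattern.format(**{nt: pair[0] for nt, pair in expansions.items()})
    mr = mr_template.format(**{nt: pair[1] for nt, pair in expansions.items()})
    return nl, mr

sentence, meaning = derive("CONDITION")
print(sentence)
print(meaning)
```

The point of the sketch is only that one derivation yields the NL string and its MR together, so finding a derivation for a sentence also yields its translation.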
Systems for Learning Semantic Parsers – KRISP
• KRISP
  • Kernel-based Robust Interpretation for Semantic Parsing
  • uses SVMs with string kernels to build semantic parsers
• Process
  • (1) learns classifiers: a word or phrase → a particular concept in the MRL
  • (2) learns classifiers: NL substrings → a production
  • (3) each classifier estimates the probability of each production covering different substrings of the sentence
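As a rough illustration of what a word-level subsequence kernel measures, the sketch below counts matching word subsequences (gaps allowed) shared by two sentences. It is a simplified, unweighted variant written for this summary: the function name and `max_len` parameter are assumptions, and string subsequence kernels of this family usually also down-weight gappy matches with a decay factor, which is omitted here.

```python
def subsequence_kernel(s, t, max_len):
    """Count pairs of matching word subsequences of length 1..max_len.

    `s` and `t` are lists of words. Subsequences may skip words (gaps),
    and every pair of matching occurrences is counted once.
    """
    rows, cols = len(s) + 1, len(t) + 1
    # prev[i][j]: matches of the previous length within s[:i] and t[:j].
    # For length 0 there is exactly one (empty) match everywhere.
    prev = [[1] * cols for _ in range(rows)]
    total = 0
    for _ in range(max_len):
        cur = [[0] * cols for _ in range(rows)]
        for i in range(1, rows):
            for j in range(1, cols):
                # Matches that do not end at both s[i-1] and t[j-1] ...
                cur[i][j] = cur[i - 1][j] + cur[i][j - 1] - cur[i - 1][j - 1]
                # ... plus matches whose last words are s[i-1] and t[j-1].
                if s[i - 1] == t[j - 1]:
                    cur[i][j] += prev[i - 1][j - 1]
        total += cur[-1][-1]
        prev = cur
    return total

a = "which states border texas".split()
b = "what states border texas".split()
c = "how long is the mississippi".split()
print(subsequence_kernel(a, b, 3))  # similar questions share many subsequences
print(subsequence_kernel(a, c, 3))  # unrelated questions share none
```

A kernel like this lets an SVM compare substrings of different sentences without hand-built features, which is the role the slide assigns to the string kernel.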
Experimental Evaluation (1)
• Two corpora of NL sentences paired with MRs
  • CLANG
    • average NL sentence length: 22.52 words
    • 300 pieces of coaching advice
  • GEOQUERY
    • average NL sentence length: 6.87 words
    • 250 questions
    • manually translated into logical form
Experimental Evaluation (2)
• Evaluation
  • 10-fold cross-validation
  • Recall: % of test sentences for which a correct MR was produced
  • Precision: % of the complete MRs produced that were correct
• Correctness criterion
  • CLANG: exact match, up to reordering of arguments
  • GEOQUERY: the query retrieves the same answer from the DB
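The two measures can be computed as in the small sketch below. It assumes precision is taken over the complete MRs the parser produced and recall over all test sentences; the `evaluate` helper and the ten outcomes are made up purely to show the arithmetic and are not results from the paper.

```python
def evaluate(outcomes):
    """Return (precision, recall) for a list of per-sentence outcomes.

    Each outcome is a pair (produced_complete_mr, is_correct).
    """
    total = len(outcomes)
    complete = sum(1 for produced, _ in outcomes if produced)
    correct = sum(1 for produced, ok in outcomes if produced and ok)
    precision = correct / complete if complete else 0.0   # correct / complete MRs
    recall = correct / total if total else 0.0            # correct / all sentences
    return precision, recall

# Made-up example: 10 test sentences, 8 complete MRs, 6 of them correct.
outcomes = [(True, True)] * 6 + [(True, False)] * 2 + [(False, False)] * 2
precision, recall = evaluate(outcomes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A parser that declines to output an MR when unsure can therefore trade recall for precision, which is why both numbers are reported.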
Future Research (1)
• SCISSOR: more accurate
  • but requires additional human annotation in the form of SAPTs → construct SAPTs automatically
• Domain & corpus
  • limited domains → open domain
  • constructing a large annotated corpus of (NL, MR) pairs
  • the OntoNotes corpus is currently being assembled
Future Research (2)
• Another way to obtain the requisite supervision
  • allow ordinary users themselves to provide the necessary feedback
• Sentence-meaning pairs could be constructed automatically
  • by inferring the meaning of a sentence from the context in which it was uttered
Future Research (3)
• Symbol Grounding Problem (SGP)
  • Harnad, S. (1990)
  • extended from the Chinese Room Argument (Searle, 1980)
  • a challenge to the Turing Test
• The Dictionary-Go-Round (1)
  • Suppose you had to learn Chinese as a second language and the only source of information you had was a Chinese/Chinese dictionary.
• The Dictionary-Go-Round (2): the SGP
  • Suppose you had to learn Chinese as a first language and the only source of information you had was a Chinese/Chinese dictionary!
• Clearly, a deep understanding of most natural language requires capturing the connection between the abstract concepts underlying words and phrases and their embodiment in the physical world.