Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date: 2007.08.02

Learning for Semantic Parsing
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007)

Outline • Introduction • Sample Applications and their MRLs • Systems for Learning Semantic Parsers • Experimental Evaluation • Future Research

Introduction • Semantic parsing • the task of mapping a natural language (NL) sentence into a complete, formal meaning representation (MR) or logical form • Meaning representation language (MRL) • a formal, unambiguous language that supports automated inference and processing • e.g., first-order predicate logic

Introduction (cont.) • The MRLs in this paper • are “executable”: an MR can be used directly by another program to perform some task, such as • answering questions from a database • controlling the actions of a real or simulated robot • Goal of these systems • to induce an efficient and accurate semantic parser that maps novel sentences into the MRL • Training corpus • sentences annotated with their MRs, i.e., (NL, MR) pairs • some systems also require extra training input, such as syntactic parse trees or semantically annotated parse trees

Sample Applications and their MRLs • Database query language • GEOQUERY: a sample database on U.S. geography • queries are written in a logical query language based on Prolog • Coaching language for robotic soccer • CLANG, a formal language developed for the RoboCup Coach Competition • tactics and behaviors are expressed as if-then rules
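To make the idea of an “executable” query MRL concrete, here is a minimal sketch. It assumes a variable-free, functional rendering of GEOQUERY-style queries as nested Python tuples (the real GEOQUERY MRL is Prolog-based), and the operator names `answer`, `capital`, `next_to`, `stateid` plus the two-state toy database are hypothetical stand-ins, not the actual GEOQUERY data or syntax.

```python
# Toy facts: a tiny illustrative subset, NOT the real GEOQUERY database.
CAPITAL = {"texas": "austin", "ohio": "columbus"}
BORDERS = {"texas": {"new mexico", "oklahoma", "arkansas", "louisiana"}}


def evaluate(mr):
    """Execute a query MR given as nested tuples (operator, *arguments).

    Every operator returns a set of entities, so operators nest freely.
    """
    op, *args = mr
    if op == "stateid":  # constant: a single named state
        return {args[0]}
    if op == "capital":  # capitals of the given states
        return {CAPITAL[s] for s in evaluate(args[0]) if s in CAPITAL}
    if op == "next_to":  # states bordering any of the given states
        return set().union(*(BORDERS.get(s, set()) for s in evaluate(args[0])))
    if op == "answer":  # top-level wrapper: return the result set
        return evaluate(args[0])
    raise ValueError(f"unknown operator: {op!r}")


# "What is the capital of Texas?"
print(evaluate(("answer", ("capital", ("stateid", "texas")))))  # {'austin'}
# "Which states border Texas?"
print(sorted(evaluate(("answer", ("next_to", ("stateid", "texas"))))))
```

Having every operator return a set is the design choice that lets a parser emit arbitrarily nested MRs and hand them straight to the interpreter; a semantic parser's only job is to produce the tuple.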

Systems for Learning Semantic Parsers • Three approaches to learning statistical semantic parsers • SCISSOR (CoNLL-2005, COLING-ACL-06) • adds detailed semantics to a statistical syntactic parser • WASP (HLT/NAACL-06) • adapts statistical machine translation methods to map from NL to MRL • KRISP (COLING-ACL-06) • uses SVMs with a subsequence kernel specialized for text learning

Systems for Learning Semantic Parsers –SCISSOR • SCISSOR • Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations • learns a statistical parser that generates a semantically augmented parse tree (SAPT) • Training data: (NL, SAPT, MR) triples • Process • (1) an enhanced version of Collins’ parser (head-driven Model 2) is trained to produce SAPTs • (2) a recursive procedure compositionally constructs the MR for each node in the SAPT from the MRs of its children
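Step (2) can be illustrated with a toy composition routine. This is a sketch under simplifying assumptions, not SCISSOR's actual procedure: each SAPT node carries only a semantic label (syntactic labels are omitted), a hypothetical `ARITY` table marks which concepts are predicates, and a node's MR is taken to be its head child's MR (the child sharing its semantic label) with open argument slots filled by the other children's MRs, left to right. The sentence “our player 2 has the ball” and the simplified CLANG-like output are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Pred:
    """A predicate MR; None in args marks an unfilled argument slot."""
    name: str
    args: list

    def __str__(self):
        return "(" + " ".join([self.name] + [str(a) for a in self.args]) + ")"


MR = Union[str, Pred]


@dataclass
class Node:
    """Simplified SAPT node: semantic label only ("null" = no semantics)."""
    sem: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None


# Hypothetical lexicon: predicate concepts and their arities.
# Any other concept is treated as a constant whose MR is its word.
ARITY = {"player": 2, "bowner": 1}


def compose(node: Node) -> Optional[MR]:
    """Build a node's MR bottom-up from the MRs of its children."""
    if node.sem == "null":
        return None
    if not node.children:  # leaf: predicate with open slots, or a constant
        if node.sem in ARITY:
            return Pred(node.sem, [None] * ARITY[node.sem])
        return node.word
    head, args = None, []
    for child in node.children:
        mr = compose(child)
        if mr is None:
            continue
        if head is None and child.sem == node.sem:
            head = mr  # head child: shares the parent's semantic label
        else:
            args.append(mr)
    if head is None:
        raise ValueError(f"no head child for {node.sem!r}")
    if isinstance(head, Pred):  # fill open slots left to right
        for i, slot in enumerate(head.args):
            if slot is None and args:
                head.args[i] = args.pop(0)
    if args:
        raise ValueError(f"unused argument MRs at {node.sem!r}")
    return head


# "our player 2 has the ball"
sapt = Node("bowner", [
    Node("player", [Node("team", word="our"),
                    Node("player", word="player"),
                    Node("unum", word="2")]),
    Node("bowner", [Node("bowner", word="has"),
                    Node("null", [Node("null", word="the"),
                                  Node("null", word="ball")])]),
])
print(compose(sapt))  # (bowner (player our 2))
```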

Systems for Learning Semantic Parsers –SCISSOR (cont.)
[Figure: example SAPT; labels: “Ball owner (type concept)”, “Predicate concept”]

Systems for Learning Semantic Parsers –WASP • WASP • Word Alignment-based Semantic Parsing • uses statistical machine translation (SMT) techniques, trained on parallel corpora, to translate from NL to MRL • Process • (1) GIZA++, an SMT word alignment system, is used to produce an n-to-1 alignment between the words in the NL sentence and a sequence of MRL productions • (2) a synchronous CFG (SCFG) builds complete MRs by combining these NL substrings and their MRL translations
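Step (2) can be sketched with a few hand-written synchronous rules. Each hypothetical rule pairs an NL pattern with an MR template that shares its nonterminals, so parsing the NL side builds the MR side at the same time. In WASP itself the rules are extracted from the GIZA++ alignments and derivations are ranked by a learned probabilistic model; this sketch has neither and simply enumerates every derivation. It also assumes no left-recursive rules and at most one occurrence of each nonterminal per rule.

```python
# Hypothetical SCFG rules: (nonterminal, NL pattern, MR template).
# Upper-case pattern symbols are nonterminals shared by both sides.
RULES = [
    ("QUERY", ["what", "is", "CITY"], "answer({CITY})"),
    ("CITY", ["the", "capital", "of", "STATE"], "capital({STATE})"),
    ("STATE", ["texas"], "stateid(texas)"),
    ("STATE", ["ohio"], "stateid(ohio)"),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}


def match(pattern, tokens):
    """Yield {nonterminal: MR} bindings for each way pattern covers all tokens."""
    if not pattern:
        if not tokens:
            yield {}
        return
    first, rest = pattern[0], pattern[1:]
    if first in NONTERMINALS:
        # A nonterminal may span any non-empty prefix of the remaining tokens.
        for split in range(1, len(tokens) + 1):
            for mr in parse(first, tokens[:split]):
                for binding in match(rest, tokens[split:]):
                    yield {first: mr, **binding}
    elif tokens and tokens[0] == first:  # terminal must match exactly
        yield from match(rest, tokens[1:])


def parse(nt, tokens):
    """Yield every MR derivable from nonterminal nt over exactly these tokens."""
    for lhs, nl_pattern, mr_template in RULES:
        if lhs == nt:
            for binding in match(nl_pattern, tokens):
                yield mr_template.format(**binding)


for mr in parse("QUERY", "what is the capital of texas".split()):
    print(mr)  # answer(capital(stateid(texas)))
```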

Systems for Learning Semantic Parsers –WASP (cont.)

Systems for Learning Semantic Parsers –KRISP • KRISP • Kernel-based Robust Interpretation for Semantic Parsing • uses SVMs with string kernels to build semantic parsers • Process • (1) learns classifiers: a word or phrase → a particular concept in the MRL • (2) learns classifiers: NL substrings → a production • (3) each classifier estimates the probability of each production covering different substrings of the sentence
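KRISP's classifiers compare NL substrings through a string kernel over shared word subsequences. As a hedged illustration, the sketch below implements the gap-weighted word-subsequence kernel (the string kernel of Lodhi et al.) by brute-force enumeration for one subsequence length n. The function names, n = 2 and decay λ = 0.5 are assumptions; KRISP's exact kernel configuration may differ.

```python
import math
from collections import defaultdict
from itertools import combinations


def subsequence_features(words, n, lam):
    """Map each length-n word subsequence to the sum of lam**span over its occurrences."""
    feats = defaultdict(float)
    for idx in combinations(range(len(words)), n):
        span = idx[-1] - idx[0] + 1  # gaps make the span longer, so weight smaller
        feats[tuple(words[i] for i in idx)] += lam ** span
    return feats


def subsequence_kernel(s, t, n=2, lam=0.5):
    """Inner product of the two gap-weighted subsequence feature vectors."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)


def normalized_kernel(s, t, n=2, lam=0.5):
    """Cosine-normalized kernel, so that K(s, s) == 1 for non-degenerate s."""
    denom = math.sqrt(subsequence_kernel(s, s, n, lam) * subsequence_kernel(t, t, n, lam))
    return subsequence_kernel(s, t, n, lam) / denom if denom else 0.0


print(normalized_kernel("our left penalty area".split(), "our penalty area".split()))
```

Because the enumeration is exponential in n, a practical implementation would use the standard dynamic-programming recursion instead. A Gram matrix of such kernel values could then be handed to an SVM that accepts precomputed kernels (for instance scikit-learn's `SVC(kernel="precomputed")`).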

Systems for Learning Semantic Parsers –KRISP (cont.)

Experimental Evaluation (1) • Two corpora of NL sentences paired with MRs • CLANG • the average NL sentence length: 22.52 words • 300 pieces of coaching advice • GEOQUERY • the average NL sentence length: 6.87 words • 250 questions • manually translated into logical form

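Experimental Evaluation (2) • Evaluation • 10-fold cross validation • Recall: % of test sentences for which a correct MR was produced • Precision: % of the produced complete MRs that were correct • Correctness criteria • CLANG: exact match with the gold MR, up to reordering of arguments • GEOQUERY: the MR retrieves the same answer from the database as the gold MR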
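The two metrics and the 10-fold protocol can be sketched as follows. The function names are hypothetical, folds are assigned round-robin without shuffling for simplicity, and `is_correct` stands in for the corpus-specific check (exact match for CLANG, same database answer for GEOQUERY).

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def precision_recall(predicted: Sequence[Optional[str]],
                     gold: Sequence[str],
                     is_correct: Callable[[str, str], bool]) -> Tuple[float, float]:
    """predicted[i] is the MR for sentence i, or None if no complete MR was produced."""
    produced = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in produced if is_correct(p, g))
    precision = correct / len(produced) if produced else 0.0  # correct / MRs produced
    recall = correct / len(gold) if gold else 0.0             # correct / all sentences
    return precision, recall


def k_fold_splits(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is tested exactly once."""
    for fold in range(k):
        test = list(range(fold, n, k))
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test


# Toy check, using exact string match as the correctness test.
p, r = precision_recall(["a", None, "b", "c"], ["a", "x", "b", "d"], lambda x, y: x == y)
print(p, r)  # 0.6666666666666666 0.5
```

Counting failed parses (`None`) against recall but not precision is what lets a parser trade coverage for accuracy, which is why the two numbers are reported separately.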

Experimental Evaluation (3)

Experimental Evaluation (4)

Future Research (1) • SCISSOR: more accurate • but requires additional human annotation in the form of SAPTs → goal: construct SAPTs automatically • Domain & corpus • limited domains → open domain • constructing a large annotated corpus of (NL, MR) pairs • the OntoNotes corpus is currently being assembled

Future Research (2) • Another way to obtain the requisite supervision • allow ordinary users to provide the necessary feedback themselves • Sentence-meaning pairs could be constructed automatically • by inferring the meaning of a sentence from the context in which it was uttered

Future Research (3) • Symbol Grounding Problem (SGP) • Harnad, S. (1990) • extends the Chinese Room Argument (Searle, 1980) • a challenge to the Turing Test • The Dictionary-Go-Round (1) • Suppose you had to learn Chinese as a second language and the only source of information you had was a Chinese/Chinese dictionary. • The Dictionary-Go-Round (2): the SGP • Suppose you had to learn Chinese as a first language and the only source of information you had was a Chinese/Chinese dictionary! • Clearly, a deep understanding of most natural language requires capturing the connection between the abstract concepts underlying words and phrases and their embodiment in the physical world.

Thanks!!
