Learning for Semantic Parsing Raymond J. Mooney Yuk Wah Wong Ruifang Ge Rohit Kate
Syntactic Natural Language Learning • Most computational research in natural-language learning has addressed “low-level” syntactic processing. • Morphology (e.g. past-tense generation) • Part-of-speech tagging • Shallow syntactic parsing (chunking) • Syntactic parsing
Semantic Natural Language Learning • Learning for semantic analysis has been restricted to relatively “shallow” meaning representations. • Word sense disambiguation (e.g. SENSEVAL) • Semantic role assignment (determining agent, patient, instrument, etc., e.g. FrameNet, PropBank) • Information extraction
Semantic Parsing • A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: logical form or meaning representation (MR). • For many applications, the desired output is immediately executable by another program. • Two application domains: • CLang: RoboCup Coach Language • GeoQuery: A Database Query Application
CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated players. • The coaching instructions are given in a formal language called CLang. • Example: the coach's advice “If the ball is in our penalty area, then all our players except player 4 should stay in our half.” is mapped by semantic parsing to the CLang expression ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our)))) • [Figure: coach and simulated soccer field]
GeoQuery: A Database Query Application • Query application for U.S. geography database containing about 800 facts [Zelle & Mooney, 1996] • Example: the user's question “How many states does the Mississippi run through?” is mapped by semantic parsing to the query answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A))
Learning Semantic Parsers • Manually programming robust semantic parsers is difficult due to the complexity of the task. • Semantic parsers can be learned automatically from sentences paired with their logical form. • [Figure: NL→MR training examples are fed to a semantic-parser learner, which produces a semantic parser; the semantic parser maps natural language to a meaning representation]
Engineering Motivation • Most computational language-learning research strives for broad coverage while sacrificing depth. • “Scaling up by dumbing down” • Realistic semantic parsing currently entails domain dependence. • Domain-dependent natural-language interfaces have a large potential market. • Learning makes developing specific applications more tractable. • Training corpora can be easily developed by tagging existing corpora of formal statements with natural-language glosses.
Cognitive Science Motivation • Most natural-language learning methods require supervised training data that is not available to a child. • General lack of negative feedback on grammar. • No treebank, sense-tagged, or semantic-role-tagged data. • Assuming a child can infer the likely meaning of an utterance from context, NL→MR pairs are more cognitively plausible training data.
Our Semantic-Parser Learners • CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003) • Separates parser-learning and semantic-lexicon learning. • Learns a deterministic parser using ILP techniques. • COCKTAIL (Tang & Mooney, 2001) • Improved ILP algorithm for CHILL. • SILT (Kate, Wong & Mooney, 2005) • Learns symbolic transformation rules for mapping directly from NL to MR. • SCISSOR (Ge & Mooney, 2005) • Integrates semantic interpretation into Collins’ statistical syntactic parser. • WASP (Wong & Mooney, 2006) • Uses syntax-based statistical machine translation methods. • KRISP (Kate & Mooney, 2006) • Uses a series of SVM classifiers employing a string-kernel to iteratively build semantic representations.
GeoQuery On-Line Demo http://www.cs.utexas.edu/users/ml/geo.html
SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations • Based on a fairly standard approach to compositional semantics [Jurafsky and Martin, 2000] • A statistical parser is used to generate a semantically augmented parse tree (SAPT) • Augment Collins’ head-driven model 2 to incorporate semantic labels • Translate SAPT into a complete formal meaning representation (MR) • Example SAPT for “our player 2 has the ball”: (S-bowner (NP-player (PRP$-team our) (NN-player player) (CD-unum 2)) (VP-bowner (VB-bowner has) (NP-null (DT-null the) (NN-null ball)))) • MR: bowner(player(our,2))
Overview of SCISSOR • Training: SAPT training examples → learner → integrated semantic parser • Testing: NL sentence → integrated semantic parser → SAPT → ComposeMR → MR
SCISSOR SAPT Parser Implementation • Semantic labels added to Bikel’s (2004) open-source version of the Collins statistical parser. • Head-driven derivation of production rules augmented to also generate semantic labels. • Parameter estimates during training employ an augmented smoothing technique to account for additional data sparsity created by semantic labels. • Parsing of test sentences to find the most probable SAPT is performed using a standard beam-search constrained version of CKY chart-parsing algorithm.
ComposeMR • Step 1: every node of the SAPT for “our player 2 has the ball” carries a semantic label: bowner, player, team, unum, or null.
ComposeMR • Step 2: predicate labels are expanded into templates with open argument slots: bowner(_) and player(_,_).
ComposeMR • Step 3: slots are filled bottom-up according to the signatures player(team,unum) and bowner(player): player(_,_) becomes player(our,2), and bowner(_) becomes bowner(player(our,2)).
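The bottom-up composition on the ComposeMR slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not SCISSOR's actual implementation: the `Node` class, the `ARITY` table, and the helper names are hypothetical, arguments are filled in surface order, and no type checking against the signatures is done.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Assumed argument counts, read off the slide's signatures
# player(team,unum) and bowner(player).
ARITY = {"bowner": 1, "player": 2}

MR = Union[str, Tuple[str, list]]  # a constant, or (predicate, filled arguments)


@dataclass
class Node:
    label: str                      # semantic label of the SAPT node
    word: Optional[str] = None      # surface word, for leaves only
    children: List["Node"] = field(default_factory=list)


def compose_mr(node: Node) -> Optional[MR]:
    """Compose the meaning of a SAPT subtree bottom-up (simplified)."""
    if node.label == "null":
        return None                         # contributes no meaning
    if not node.children:
        if node.label in ARITY:
            return (node.label, [])         # predicate with all slots open
        return node.word                    # constant such as "our" or "2"
    head, args = None, []
    for child in node.children:
        meaning = compose_mr(child)
        if child.label == node.label and head is None:
            head = meaning                  # child that carries the predicate
        elif meaning is not None:
            args.append(meaning)            # other children fill slots in order
    predicate, filled = head                # assumes a head child was found
    return (predicate, filled + args)


def to_string(mr: MR) -> str:
    """Render an MR, showing unfilled slots as '_'."""
    if isinstance(mr, str):
        return mr
    predicate, args = mr
    slots = args + ["_"] * (ARITY[predicate] - len(args))
    return f"{predicate}({','.join(to_string(a) for a in slots)})"


# SAPT for "our player 2 has the ball", with syntactic labels omitted.
sapt = Node("bowner", children=[
    Node("player", children=[
        Node("team", "our"), Node("player", "player"), Node("unum", "2")]),
    Node("bowner", children=[
        Node("bowner", "has"),
        Node("null", children=[Node("null", "the"), Node("null", "ball")])]),
])

print(to_string(compose_mr(sapt)))  # bowner(player(our,2))
```

The sketch treats the child that shares its parent's label as the head, which happens to be enough for this one tree; a real system would need a principled way to pick the head and to match arguments to typed slots.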
WASP: A Machine Translation Approach to Semantic Parsing • Based on a semantic grammar of the natural language. • Uses machine translation techniques • Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) • Word alignments (Brown et al., 1993; Och & Ney, 2003) • Hence the name: Word Alignment-based Semantic Parsing
Synchronous Context-Free Grammars (SCFG) • Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase • Generates a pair of strings in a single derivation
Compiling, Machine Translation, and Semantic Parsing • SCFG: formal language to formal language (compiling) • Alignment models: natural language to natural language (machine translation) • WASP: natural language to formal language (semantic parsing)
Context-Free Semantic Grammar • QUERY → What is CITY • CITY → the capital CITY • CITY → of STATE • STATE → Ohio
Productions of Synchronous Context-Free Grammars • Referred to as transformation rules in Kate, Wong & Mooney (2005) • QUERY → What is CITY / answer(CITY), where “What is CITY” is the pattern and “answer(CITY)” is the template
Synchronous Context-Free Grammars • QUERY → What is CITY / answer(CITY) • CITY → the capital CITY / capital(CITY) • CITY → of STATE / loc_2(STATE) • STATE → Ohio / stateid('ohio') • Starting from the pair (QUERY, QUERY), applying these rules in order derives the NL string “What is the capital of Ohio” and the MR answer(capital(loc_2(stateid('ohio')))) in a single derivation
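To make the idea of a single derivation producing two strings concrete, here is a minimal Python sketch that replays the four rules from the slide. The rule names `r1` to `r4` and the function `derive` are hypothetical. Because each of these rules has at most one non-terminal on its right-hand side, rewriting the leftmost matching symbol on both sides is enough; a general SCFG would have to track which non-terminals are linked.

```python
# Each rule: (left-hand side, NL pattern tokens, MR template tokens).
RULES = {
    "r1": ("QUERY", ["What", "is", "CITY"], ["answer", "(", "CITY", ")"]),
    "r2": ("CITY", ["the", "capital", "CITY"], ["capital", "(", "CITY", ")"]),
    "r3": ("CITY", ["of", "STATE"], ["loc_2", "(", "STATE", ")"]),
    "r4": ("STATE", ["Ohio"], ["stateid", "(", "'ohio'", ")"]),
}


def derive(rule_names, start="QUERY"):
    """Rewrite the NL side and the MR side in lockstep, one rule at a time."""
    nl, mr = [start], [start]
    for name in rule_names:
        lhs, pattern, template = RULES[name]
        i, j = nl.index(lhs), mr.index(lhs)   # leftmost occurrence on each side
        nl[i:i + 1] = pattern                 # expand the NL string
        mr[j:j + 1] = template                # expand the MR string
    return " ".join(nl), "".join(mr)


sentence, meaning = derive(["r1", "r2", "r3", "r4"])
print(sentence)  # What is the capital of Ohio
print(meaning)   # answer(capital(loc_2(stateid('ohio'))))
```

Semantic parsing runs this in the other direction: given only the sentence, the parser searches for a derivation whose NL side matches it and reads the MR off the other side.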
Parsing Model of WASP • N (non-terminals) = {QUERY, CITY, STATE, …} • S (start symbol) = QUERY • Tm (MRL terminals) = {answer, capital, loc_2, (, ), …} • Tn (NL words) = {What, is, the, capital, of, Ohio, …} • L (lexicon) = {QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio')} • λ (parameters of probabilistic model) = ?
Probabilistic Parsing Model • Derivation d1 of “capital of Ohio”: CITY → capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio') • MR yield of d1: capital(loc_2(stateid('ohio')))
Probabilistic Parsing Model • Derivation d2 of “capital of Ohio”: CITY → capital CITY / capital(CITY); CITY → of RIVER / loc_2(RIVER); RIVER → Ohio / riverid('ohio') • MR yield of d2: capital(loc_2(riverid('ohio')))
Probabilistic Parsing Model • Each rule used in a derivation contributes its weight λ • d1 (STATE reading): weights 0.5 + 0.3 + 0.5 = 1.3 • d2 (RIVER reading): weights 0.5 + 0.05 + 0.5 = 1.05 • Pr(d1 | capital of Ohio) = exp(1.3) / Z • Pr(d2 | capital of Ohio) = exp(1.05) / Z • Z is the normalization constant
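The arithmetic on this slide can be checked with a short Python sketch. It assumes, purely for illustration, that d1 and d2 are the only derivations of “capital of Ohio”, so that Z is the sum of just two terms; in the full model Z ranges over every derivation the grammar allows. The variable names are hypothetical.

```python
import math

# Rule weights read off the slide; each list holds the weights of the
# three rules used in that derivation.
weights = {
    "d1": [0.5, 0.3, 0.5],    # STATE reading of "Ohio"
    "d2": [0.5, 0.05, 0.5],   # RIVER reading of "Ohio"
}

# Score of a derivation = sum of the weights of the rules it uses.
scores = {d: sum(ws) for d, ws in weights.items()}

# Normalization constant over the derivations considered here (assumption:
# only d1 and d2 exist for this input).
Z = sum(math.exp(s) for s in scores.values())

probs = {d: math.exp(s) / Z for d, s in scores.items()}
best = max(probs, key=probs.get)

for d in sorted(probs):
    print(d, round(scores[d], 2), round(probs[d], 3))
print("most probable derivation:", best)
```

Under that assumption d1 gets roughly 0.56 of the probability mass and d2 roughly 0.44, so the parser would output the STATE reading.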
Parsing Model of WASP • N (non-terminals) = {QUERY, CITY, STATE, …} • S (start symbol) = QUERY • Tm (MRL terminals) = {answer, capital, loc_2, (, ), …} • Tn (NL words) = {What, is, the, capital, of, Ohio, …} • L (lexicon) = {QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio')} • λ (parameters of probabilistic model)
Overview of WASP • Training: lexical acquisition takes the training set {(e, f)} and an unambiguous CFG of the MRL and produces the lexicon L; parameter estimation then produces a parsing model parameterized by λ • Testing: semantic parsing maps an input sentence e' to an output MR f'
Lexical Acquisition • Transformation rules are extracted from word alignments between an NL sentence, e, and its correct MR, f, for each training example, (e, f)
Word Alignments • A mapping from French words to their meanings expressed in English • Example: “Le programme a été mis en application” aligned with “And the program has been implemented”
Lexical Acquisition • Train a statistical word alignment model (IBM Model 5) on the training set • Obtain the most probable n-to-1 word alignments for each training example • Extract transformation rules from these word alignments • Lexicon L consists of all extracted transformation rules
Word Alignment for Semantic Parsing • NL: The goalie should always stay in our half • MR: ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) ) • How to introduce syntactic tokens such as parens?
Use of MRL Grammar • NL words: The goalie should always stay in our half • The MR is represented by the top-down, left-most derivation of an unambiguous CFG: RULE → (CONDITION DIRECTIVE); CONDITION → (true); DIRECTIVE → (do TEAM {UNUM} ACTION); TEAM → our; UNUM → 1; ACTION → (pos REGION); REGION → (half TEAM); TEAM → our • NL words are aligned n-to-1 with these productions
Extracting Transformation Rules • The word “our” and the production TEAM → our yield the rule TEAM → our / our • [Figure: the words of “The goalie should always stay in our half” aligned with the productions RULE → (CONDITION DIRECTIVE), CONDITION → (true), DIRECTIVE → (do TEAM {UNUM} ACTION), TEAM → our, UNUM → 1, ACTION → (pos REGION), REGION → (half TEAM), TEAM → our]
Extracting Transformation Rules • With “our” replaced by TEAM, the word “half” and the production REGION → (half TEAM) yield the rule REGION → TEAM half / (half TEAM) • The REGION node now covers (half our)
Extracting Transformation Rules • With “our half” replaced by REGION, the words “stay in” and the production ACTION → (pos REGION) yield the rule ACTION → stay in REGION / (pos REGION) • The ACTION node now covers (pos (half our))
Probabilistic Parsing Model • Based on a maximum-entropy (log-linear) model over the derivations d of an input sentence • Features fi(d) are the number of times each transformation rule is used in a derivation d • Output translation is the yield of the most probable derivation
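The model's formula did not survive in this transcript. A standard way to write a maximum-entropy model with these features, consistent with the earlier exp(·) / Z example, is sketched below. The notation is an assumption: e is the input sentence, d ranges over its derivations, λ_i is the weight of rule i, and m(d) denotes the MR yield of d.

```latex
\Pr_{\lambda}(d \mid e) \;=\; \frac{1}{Z_{\lambda}(e)} \exp\Big( \sum_{i} \lambda_i f_i(d) \Big),
\qquad
Z_{\lambda}(e) \;=\; \sum_{d'} \exp\Big( \sum_{i} \lambda_i f_i(d') \Big)

f^{\star} \;=\; m\Big( \arg\max_{d} \Pr_{\lambda}(d \mid e) \Big)
```

Since f_i(d) counts how often rule i is used, the exponent is the sum of the weights of the rules in d, counted with multiplicity. This matches the sums 1.3 and 1.05 in the “capital of Ohio” example.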
Parameter Estimation • Maximum conditional log-likelihood criterion • Since correct derivations are not included in training data, parameters λ* are learned in an unsupervised manner • EM algorithm combined with improved iterative scaling, where hidden variables are correct derivations (Riezler et al., 2000)
KRISP: Kernel-based Robust Interpretation by Semantic Parsing • Learns semantic parser from NL sentences paired with their respective MRs given MRL grammar • Productions of MRL are treated like semantic concepts • SVM classifier is trained for each production with string subsequence kernel • These classifiers are used to compositionally build MRs of the sentences
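As a rough illustration of what a subsequence-based string kernel measures, the sketch below counts the word subsequences two sentences have in common. It is a simplified, unweighted variant chosen for brevity: the slides do not say which kernel formulation KRISP uses or whether it penalizes gaps, and the SVM training step is left out. The function name `common_subsequence_count` and the two example sentences are hypothetical.

```python
from typing import Sequence


def common_subsequence_count(s: Sequence[str], t: Sequence[str]) -> int:
    """Count pairs of (possibly non-contiguous) matching word subsequences.

    k[i][j] holds the count for the prefixes s[:i] and t[:j].
    """
    k = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            # Pairs that do not use both s[i-1] and t[j-1] (inclusion-exclusion).
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Pairs ending in this matching word: extend any earlier
                # common pair, or start a new one of length 1.
                k[i][j] += k[i - 1][j - 1] + 1
    return k[len(s)][len(t)]


a = "which states border texas".split()
b = "what states border ohio".split()
print(common_subsequence_count(a, b))  # 3: "states", "border", "states border"
```

A kernel value like this is the similarity measure a kernelized SVM classifier would use to judge whether a span of words expresses a given MRL production.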
Experimental Corpora • CLang • 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition • 22.52 words on average in NL sentences • 14.24 tokens on average in formal expressions • GeoQuery [Zelle & Mooney, 1996] • 250 queries for the given U.S. geography database • 6.87 words on average in NL sentences • 5.32 tokens on average in formal expressions • Also translated into Spanish, Turkish, & Japanese.
Experimental Methodology • Evaluated using standard 10-fold cross-validation • Correctness • CLang: output exactly matches the correct representation • GeoQuery: the resulting query retrieves the same answer as the correct representation • Metrics
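The metric definitions are missing from this transcript. A common choice for semantic parsers that may fail to produce any output is precision, recall, and F-measure, and the sketch below assumes those definitions: precision is the fraction of produced MRs that are correct, and recall is the fraction of all test sentences that receive a correct MR. The function `evaluate` is hypothetical, and it uses exact match as the correctness test, as in the CLang criterion above.

```python
from typing import Optional, Sequence, Tuple


def evaluate(outputs: Sequence[Optional[str]],
             gold: Sequence[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under exact-match correctness.

    An output of None means the parser produced no MR for that sentence.
    """
    produced = [(o, g) for o, g in zip(outputs, gold) if o is not None]
    correct = sum(1 for o, g in produced if o == g)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data: 4 test sentences, 3 outputs produced, 2 of them correct.
outputs = ["(a)", None, "(b)", "(x)"]
gold = ["(a)", "(b)", "(b)", "(c)"]
print(evaluate(outputs, gold))
```

For GeoQuery the equality test would be replaced by a comparison of the answers the two queries retrieve from the database.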
Tactical Natural Language Generation • Mapping a formal MR into NL • Can be done using statistical machine translation • Previous work focuses on using generation in interlingual MT (Hajič et al., 2004) • There has been little, if any, research on exploiting statistical MT methods for generation
Tactical Generation • Can be seen as inverse of semantic parsing • Semantic parsing: “The goalie should always stay in our half” → ((true) (do our {1} (pos (half our)))) • Tactical generation: ((true) (do our {1} (pos (half our)))) → “The goalie should always stay in our half”