A Generative Model for Parsing Natural Language to Meaning Representations
Luke S. Zettlemoyer, Massachusetts Institute of Technology
Wei Lu, Hwee Tou Ng, Wee Sun Lee, National University of Singapore
Classic Goal of NLP: Understanding Natural Language
• Mapping Natural Language (NL) to Meaning Representations (MR)
[Figure: the natural language sentence "How many states do not have rivers ?" is mapped to its meaning representation]
Meaning Representation (MR)
NL sentence: How many states do not have rivers ?
MR (tree of MR productions):
QUERY:answer(NUM)
  NUM:count(STATE)
    STATE:exclude(STATE STATE)
      STATE:state(all)
      STATE:loc_1(RIVER)
        RIVER:river(all)
MR production
• Meaning representation production (MR production)
• Example: NUM:count(STATE)
  • Semantic category: NUM
  • Function symbol: count
  • Child semantic category: STATE
• At most 2 child semantic categories
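The anatomy of an MR production can be made concrete with a small parser. This is a minimal sketch, not the authors' code: the class `MRProduction`, the function `parse_production`, and the rule that upper-case arguments are child semantic categories (while lower-case arguments such as `all` are constants) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MRProduction:
    category: str              # semantic category, e.g. "NUM"
    function: str              # function symbol, e.g. "count"
    children: Tuple[str, ...]  # child semantic categories, at most 2


# CATEGORY:function(arguments), e.g. "STATE:loc_1(RIVER)"
_PRODUCTION = re.compile(r"^([A-Z_]+):(\w+)\((.*)\)$")


def parse_production(text: str) -> MRProduction:
    match = _PRODUCTION.match(text.strip())
    if match is None:
        raise ValueError(f"not an MR production: {text!r}")
    category, function, args = match.groups()
    # Assumption: upper-case arguments are child semantic categories;
    # lower-case arguments such as "all" are constants, not children.
    children = tuple(a for a in args.split() if a.isupper())
    if len(children) > 2:
        raise ValueError("at most 2 child semantic categories are allowed")
    return MRProduction(category, function, children)


example = parse_production("NUM:count(STATE)")
print(example.category, example.function, example.children)
```

Under these assumptions, `STATE:exclude(STATE STATE)` yields two child categories and `STATE:state(all)` yields none.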
Task Description
• Training data: NL-MR pairs
• Input: A new NL sentence
• Output: An MR
Challenge
• The mapping of individual NL words to their associated MR productions is not given in the NL-MR pairs
Mapping Words to MR Productions
MR productions: QUERY:answer(NUM), NUM:count(STATE), STATE:exclude(STATE STATE), STATE:state(all), STATE:loc_1(RIVER), RIVER:river(all)
NL words: how many states do not have rivers ?
Talk Outline
• Generative model
  • Goal: flexible model that can parse a wide range of input sentences
  • Efficient algorithms for EM training and decoding
  • In practice: correct output is often in top-k list, but is not always the best scoring option
• Reranking
  • Global features
• Evaluation
  • Generative model combined with reranking technique achieves state-of-the-art performance
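Hybrid Tree
NL-MR pair: "How many states do not have rivers ?" and the MR rooted at QUERY:answer(NUM)
Hybrid tree (the children of each MR production, read left to right, form its hybrid sequence):
QUERY:answer(NUM)
  NUM:count(STATE)
    How many
    STATE:exclude(STATE STATE)
      STATE:state(all)
        states
      do not
      STATE:loc_1(RIVER)
        have
        RIVER:river(all)
          rivers
  ?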
Model Parameters
w: the NL sentence
m: the MR
T: the hybrid tree
[Figure: hybrid tree for "How many states do not have rivers ?"]
P(w,m,T) = P(QUERY:answer(NUM) | -, arg=1)
  * P(NUM ? | QUERY:answer(NUM))
  * P(NUM:count(STATE) | QUERY:answer(NUM), arg=1)
  * P(How many STATE | NUM:count(STATE))
  * P(STATE:exclude(STATE STATE) | NUM:count(STATE), arg=1)
  * P(STATE1 do not STATE2 | STATE:exclude(STATE STATE))
  * P(STATE:state(all) | STATE:exclude(STATE STATE), arg=1)
  * P(states | STATE:state(all))
  * P(STATE:loc_1(RIVER) | STATE:exclude(STATE STATE), arg=2)
  * P(have RIVER | STATE:loc_1(RIVER))
  * P(RIVER:river(all) | STATE:loc_1(RIVER), arg=1)
  * P(rivers | RIVER:river(all))
MR model parameters: ρ(m'|m,arg=k)
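The factorization above follows a top-down walk of the hybrid tree: each MR production contributes one factor conditioned on its parent and argument position, and one factor for its hybrid sequence. The sketch below only enumerates those factors as strings. The names `Node`, `category`, and `factors` are hypothetical, and storing the argument position on each node is an assumption made so that word order and argument order can differ.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Node:
    production: str
    # Hybrid sequence: NL words (str) interleaved with child nodes.
    sequence: List[Union[str, "Node"]] = field(default_factory=list)
    arg: int = 1  # argument position of this node under its parent


def category(production: str) -> str:
    """Semantic category of a production, e.g. 'NUM' for 'NUM:count(STATE)'."""
    return production.split(":")[0]


def factors(node: Node, parent: str = "-") -> Iterator[str]:
    """Yield the factors of P(w, m, T) in depth-first, left-to-right order."""
    # Factor for choosing this MR production given its parent.
    yield f"P({node.production}|{parent},arg={node.arg})"
    # Factor for the hybrid sequence, with children shown by category.
    symbols = [s if isinstance(s, str) else category(s.production)
               for s in node.sequence]
    yield f"P({' '.join(symbols)}|{node.production})"
    for s in node.sequence:
        if isinstance(s, Node):
            yield from factors(s, node.production)


# Hybrid tree for "How many states do not have rivers ?"
river = Node("RIVER:river(all)", ["rivers"])
loc = Node("STATE:loc_1(RIVER)", ["have", river], arg=2)
state = Node("STATE:state(all)", ["states"])
exclude = Node("STATE:exclude(STATE STATE)", [state, "do not", loc])
count = Node("NUM:count(STATE)", ["How many", exclude])
query = Node("QUERY:answer(NUM)", [count, "?"])

for factor in factors(query):
    print(factor)
```

The walk produces twelve factors, six for MR productions and six for hybrid sequences, in the same order as the product on the slide (the slide's STATE1/STATE2 subscripts are omitted here).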
Model Parameters
P(How many STATE | NUM:count(STATE))
  = P(m → wY | NUM:count(STATE))
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)
Pattern parameters: Φ(r|m), here the first factor, P(m → wY | NUM:count(STATE))
Hybrid Patterns
• M is an MR production, w is a word sequence
• Y and Z are, respectively, the first and second child MR productions
• Note: [] denotes an optional element
Model Parameters
P(How many STATE | NUM:count(STATE))
  = P(m → wY | NUM:count(STATE))
  * P(How | NUM:count(STATE), BEGIN)
  * P(many | NUM:count(STATE), How)
  * P(STATE | NUM:count(STATE), many)
  * P(END | NUM:count(STATE), STATE)
Emission parameters: θ(t|m,Λ), here the last four factors, which emit How, many, STATE, and END
Assumptions: Model I, II, III
Example hybrid sequence for NUM:count(STATE): BEGIN How many STATE END
• Model I (unigram model): θ(t_i | M, Λ) = P(t_i | M)
• Model II (bigram model): θ(t_i | M, Λ) = P(t_i | M, t_{i-1})
• Model III (mixgram model): θ(t_i | M, Λ) = 0.5 * [P(t_i | M, t_{i-1}) + P(t_i | M)]
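The three emission assumptions, and how they combine with a pattern probability to score one hybrid sequence, can be sketched as follows. The function names, the dictionary layout of the probability tables, and all numeric values are hypothetical toy choices, and unseen events simply get probability 0 here (no smoothing).

```python
from typing import Dict, List, Tuple

BEGIN, END = "BEGIN", "END"

Unigram = Dict[Tuple[str, str], float]      # (M, t) -> P(t | M)
Bigram = Dict[Tuple[str, str, str], float]  # (M, t_prev, t) -> P(t | M, t_prev)


def emission(model: str, unigram: Unigram, bigram: Bigram,
             m: str, prev: str, t: str) -> float:
    """theta(t | M, context) under Model I, II, or III."""
    p_uni = unigram.get((m, t), 0.0)
    p_bi = bigram.get((m, prev, t), 0.0)
    if model == "I":    # unigram: the context is ignored
        return p_uni
    if model == "II":   # bigram: the context is the previous symbol
        return p_bi
    if model == "III":  # mixgram: equal-weight interpolation of both
        return 0.5 * (p_bi + p_uni)
    raise ValueError(f"unknown model: {model!r}")


def sequence_probability(model: str, pattern_prob: float, unigram: Unigram,
                         bigram: Bigram, m: str, symbols: List[str]) -> float:
    """Pattern probability times the emission of each symbol and of END."""
    prob = pattern_prob
    prev = BEGIN
    for t in symbols + [END]:
        prob *= emission(model, unigram, bigram, m, prev, t)
        prev = t
    return prob


# Toy tables for M = NUM:count(STATE); every value is made up.
M = "NUM:count(STATE)"
seq = ["How", "many", "STATE"]
uni = {(M, t): 0.5 for t in seq + [END]}
bi = {(M, p, t): 0.25 for p, t in zip([BEGIN] + seq, seq + [END])}

for name in ("I", "II", "III"):
    print(name, sequence_probability(name, 0.5, uni, bi, M, seq))
```

With these toy tables, Model III scores the sequence between Model I and Model II, since each emission is the average of the unigram and bigram values.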
Model Parameters
• MR model parameters: Σ_{m_i} ρ(m_i | m_j, arg=k) = 1
  They model the meaning representation.
• Emission parameters: Σ_t θ(t | m_j, Λ) = 1
  They model the emission of words and semantic categories of MR productions. Λ is the context.
• Pattern parameters: Σ_r Φ(r | m_j) = 1
  They model the selection of hybrid patterns.
Parameter Estimation
• MR model parameters are easy to estimate
• Learning the emission parameters and pattern parameters is challenging
• Inside-outside algorithm with EM
  • Naïve implementation: O(n^6 m)
    • n: number of words in an NL sentence
    • m: number of MR productions in an MR
  • Improved efficient algorithm
    • Two-layer dynamic programming
    • Improved time complexity: O(n^3 m)
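The slide does not say how the MR model parameters are estimated. One natural reading, assumed here, is relative-frequency counting: because the MRs are fully observed in the training pairs, ρ(m'|m,arg=k) can be taken as the fraction of times m' fills argument k of m. The function `estimate_rho` and the toy events are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, int]  # (child production, parent production, arg)


def estimate_rho(events: Iterable[Event]) -> Dict[Event, float]:
    """Relative-frequency estimate of rho(child | parent, arg=k)."""
    counts = Counter(events)
    totals: Counter = Counter()
    for (child, parent, arg), c in counts.items():
        totals[(parent, arg)] += c
    # Each (parent, arg) context is normalized separately, so the
    # estimates for one context sum to 1.
    return {event: c / totals[(event[1], event[2])]
            for event, c in counts.items()}


# Toy observations (made up): what fills arg 1 of STATE:exclude(STATE STATE).
parent = "STATE:exclude(STATE STATE)"
events = [("STATE:state(all)", parent, 1)] * 3 + [("STATE:loc_1(RIVER)", parent, 1)]
rho = estimate_rho(events)
print(rho[("STATE:state(all)", parent, 1)])
```

The emission and pattern parameters cannot be counted this way, because the hybrid tree that aligns words with productions is hidden, which is why the slide turns to EM with the inside-outside algorithm.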
Decoding
• Given an NL sentence w, find the optimal MR M*:
  M* = argmax_m P(m|w)
     = argmax_m Σ_T P(m,T|w)
     = argmax_m Σ_T P(w,m,T)
• Instead of summing over all hybrid trees, we find the most likely hybrid tree:
  M* = argmax_m max_T P(w,m,T)
• Similar DP techniques are employed
• Implemented an exact top-k decoding algorithm
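Replacing the sum over hybrid trees with a max is an approximation, and the two objectives can select different MRs. The toy example below illustrates this. The joint probabilities and the names `decode_by_sum` and `decode_by_max` are hypothetical, and the brute-force enumeration stands in for the dynamic programming that the slides refer to.

```python
from typing import Dict, Tuple

# (MR, hybrid tree) -> P(w, m, T) for one fixed sentence w; values made up.
Joint = Dict[Tuple[str, str], float]


def decode_by_sum(joint: Joint) -> str:
    """argmax_m sum_T P(w, m, T): marginalizes over hybrid trees."""
    totals: Dict[str, float] = {}
    for (m, _tree), p in joint.items():
        totals[m] = totals.get(m, 0.0) + p
    return max(totals, key=lambda m: totals[m])


def decode_by_max(joint: Joint) -> str:
    """argmax_m max_T P(w, m, T): the MR of the single best hybrid tree."""
    best_m, _best_tree = max(joint, key=lambda key: joint[key])
    return best_m


toy = {
    ("m1", "T1"): 0.25,
    ("m1", "T2"): 0.25,
    ("m2", "T3"): 0.375,
}
print(decode_by_sum(toy), decode_by_max(toy))
```

Here m1 wins under the sum (0.5 against 0.375) while m2 owns the single most probable tree, so the two decoders disagree on this toy input.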
Reranking
• Weakness of the generative model
  • Lacks the ability to model long-range dependencies
• Reranking with the averaged perceptron
  • Output space
    • Hybrid trees from exact top-k (k=50) decoding algorithm for each training/testing instance’s NL sentence
  • Single correct reference
    • Output of Viterbi algorithm for each training instance
  • Feature functions
    • Features 1-5 are indicator functions, while feature 6 is real-valued
  • Threshold b that prunes unreliable predictions even when they score the highest, to optimize F-measure
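The reranking setup can be sketched with a standard averaged perceptron over a candidate list, plus a score threshold that withholds the prediction. This is an illustrative sketch, not the authors' implementation: the function names, the feature names, the update order, and the rule "reject when the best score is below b" are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Features = Dict[str, float]  # indicator features are 1.0; log-prob is real


def score(weights: Dict[str, float], feats: Features) -> float:
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_averaged_perceptron(
        instances: List[Tuple[List[Features], int]],
        epochs: int = 5) -> Dict[str, float]:
    """instances: (top-k candidate feature dicts, index of the reference)."""
    w: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for candidates, gold in instances:
            pred = max(range(len(candidates)),
                       key=lambda i: score(w, candidates[i]))
            if pred != gold:
                # Move weights toward the reference, away from the mistake.
                for name, value in candidates[gold].items():
                    w[name] += value
                for name, value in candidates[pred].items():
                    w[name] -= value
            # Accumulate the current weights after every instance.
            for name, value in w.items():
                total[name] += value
            steps += 1
    return {name: value / steps for name, value in total.items()}


def rerank(weights: Dict[str, float], candidates: List[Features],
           b: float) -> Optional[int]:
    """Index of the best candidate, or None if its score is below b."""
    best = max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))
    return best if score(weights, candidates[best]) >= b else None


# Toy top-2 list (made up); the second candidate is the reference.
cands = [{"rule:bad": 1.0, "logp": -1.0}, {"rule:good": 1.0, "logp": -2.0}]
weights = train_averaged_perceptron([(cands, 1)])
print(rerank(weights, cands, b=0.0), rerank(weights, cands, b=10.0))
```

Returning None trades recall for precision, which matches the slide's stated purpose of tuning b to optimize F-measure.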
Reranking Features: Examples
• Feature 1: Hybrid Rule: an MR production and its child hybrid sequence
• Feature 2: Expanded Hybrid Rule: an MR production and its child hybrid sequence, expanded
• Feature 3: Long-range Unigram: an MR production and an NL word appearing below it in the tree
• Feature 4: Grandchild Unigram: an MR production and its grandchild NL word
• Feature 5: Two Level Unigram: an MR production, its parent production, and its child NL word
• Feature 6: Model Log-Probability: logarithm of the base model’s joint probability, log(P(w,m,T))
Related Work
• SILT (2005) by Kate, Wong, and Mooney
  • A system that learns deterministic rules to transform either sentences or their syntactic parse trees to meaning structures
• WASP (2006) by Wong and Mooney
  • A system motivated by statistical machine translation techniques
• KRISP (2006) by Kate and Mooney
  • A discriminative approach where meaning representation structures are constructed from the natural language strings hierarchically
Evaluation Metrics
• Precision = (# correct output structures) / (# output structures)
• Recall = (# correct output structures) / (# input sentences)
• F measure = 2 / (1/Precision + 1/Recall)
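The three metrics can be computed directly from the counts they are defined over. A minimal sketch follows. The function name `evaluate`, the choice to return 0.0 when a denominator is zero, and the example counts are assumptions for illustration, not reported results.

```python
from typing import Tuple


def evaluate(num_correct: int, num_output: int,
             num_input: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F measure) from raw counts."""
    precision = num_correct / num_output if num_output else 0.0
    recall = num_correct / num_input if num_input else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2.0 / (1.0 / precision + 1.0 / recall)
    return precision, recall, f_measure


# Made-up counts: 6 input sentences, 4 produced an MR, 3 of those correct.
print(evaluate(num_correct=3, num_output=4, num_input=6))
```

Because a system may produce no MR for some sentences, the number of outputs can be smaller than the number of inputs, which is why precision and recall differ.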
Evaluations
• Comparison over three models
  • I/II/III: Unigram/Bigram/Mixgram model; +R: with reranking
  • Reranking is shown to be effective
  • Overall, model III with reranking performs the best
Evaluations
• Comparison with other models
  • On Geoquery:
    • Able to handle more than 25% of the inputs that could not be handled by previous systems
    • Error reduction rate of 22%
Evaluations
• Comparison on other languages
  • Achieves performance comparable to previous system
Contributions
• Introduced a hybrid tree representation framework for this task
• Proposed a new generative model that can be applied to the task of transforming NL sentences to MRs
• Developed a new dynamic programming algorithm for efficient training and decoding
• The approach, augmented with reranking, achieves state-of-the-art performance on benchmark corpora, with a notable improvement in recall