Learning Language Semantics from Ambiguous Supervision

Rohit J. Kate, Raymond J. Mooney

Semantic Parsing • Involves learning language semantics to transform natural language (NL) sentences into complete, computer-executable meaning representations (MRs) for some application • Geoquery: an example database query application • [Figure: semantic parsing maps the question “Which rivers run through the states bordering Texas?” to the query answer(traverse(next_to(stateid(‘texas’)))); running the query returns the answer Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande …]
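To make "computer-executable" concrete, here is a minimal sketch that parses a Geoquery-style MR into nested tuples and evaluates it against a tiny database. The facts in `BORDERS` and `RIVERS` are hypothetical toy data, not the real Geoquery database. The set-based semantics of `stateid`, `next_to`, `traverse` and `answer` are also an assumption made for illustration.

```python
import re

# Hypothetical toy database (NOT the real Geoquery facts).
BORDERS = {"texas": {"new mexico", "oklahoma", "arkansas", "louisiana"}}
RIVERS = {  # river -> states it runs through
    "red": {"texas", "oklahoma", "arkansas", "louisiana"},
    "gila": {"new mexico", "arizona"},
    "colorado": {"arizona", "utah"},
}

def parse(mr):
    """Parse a functional MR such as answer(traverse(...)) into nested tuples."""
    mr = mr.replace("‘", "'").replace("’", "'")   # accept typographic quotes too
    tokens = re.findall(r"[a-z_]+|'[^']*'|[(),]", mr)
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok.startswith("'"):          # quoted constant, e.g. 'texas'
            return tok.strip("'")
        args = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            args.append(term())
            while tokens[pos] == ",":
                pos += 1
                args.append(term())
            pos += 1                      # consume ")"
        return (tok, *args)

    return term()

def evaluate(term):
    """Execute a parsed MR against the toy database; every predicate returns a set."""
    op, *args = term
    if op == "stateid":
        return {args[0]}
    vals = evaluate(args[0])
    if op == "next_to":
        return set().union(*(BORDERS.get(s, set()) for s in vals))
    if op == "traverse":
        return {river for river, states in RIVERS.items() if states & vals}
    if op == "answer":
        return vals
    raise ValueError(f"unknown predicate: {op}")

print(sorted(evaluate(parse("answer(traverse(next_to(stateid('texas'))))"))))  # ['gila', 'red']
```

The point is that once a sentence is mapped to an MR, answering the question needs no further language understanding. The MR alone is run against the database.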

Learning for Semantic Parsing • Learning for semantic parsing consists of inducing, from training data, a semantic parser that can map novel sentences into their meaning representations • Many accurate learning systems for semantic parsing have recently been developed: [Ge & Mooney, 2005], [Zettlemoyer & Collins, 2005], [Wong & Mooney, 2006], [Kate & Mooney, 2006], [Nguyen, Shimazu & Phan, 2006]

Unambiguous Supervision for Learning Semantic Parsers • The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations, for example:
Which rivers run through the states bordering Texas?
    answer(traverse(next_to(stateid(‘texas’))))
What is the lowest point of the state with the largest area?
    answer(lowest(place(loc(largest_one(area(state(all)))))))
What is the largest city in states that border California?
    answer(largest(city(loc(next_to(stateid('california'))))))
…

Shortcomings of Unambiguous Supervision • It requires considerable human effort to annotate each sentence with its correct meaning representation • Does not model the type of supervision children receive when they are learning a language • Children are not taught meanings of individual sentences • They learn to identify the correct meaning of a sentence from several meanings possible in their perceptual context

[Figure: a child hears “Mary is on the phone” without knowing what it means (???)]

Ambiguous Supervision for Learning Semantic Parsers • A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics • We consider ambiguous training data of sentences associated with multiple potential meaning representations • Siskind (1996) uses this type of “referentially uncertain” training data to learn the meanings of words • Capturing meaning representations from perceptual contexts is a difficult, unsolved problem • Our system works directly with symbolic meaning representations

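Ambiguous Training Example • Sentence: “Mary is on the phone” • Candidate MRs from the perceptual context: Ironing(Mommy, Shirt), Carrying(Daddy, Bag), Working(Sister, Computer), Talking(Mary, Phone), Sitting(Mary, Chair) • Which candidate is correct is not given (???)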

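Next Ambiguous Training Example • Sentence: “Mommy is ironing shirt” • Candidate MRs from the perceptual context: Ironing(Mommy, Shirt), Working(Sister, Computer), Talking(Mary, Phone), Sitting(Mary, Chair) • Which candidate is correct is not given (???)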

Ambiguous Supervision for Learning Semantic Parsers contd. • Our model of ambiguous supervision corresponds to the type of data that would be gathered from a temporal sequence of perceptual contexts with occasional language commentary • We assume each sentence has exactly one meaning in its perceptual context • Each meaning is associated with at most one sentence in a perceptual context

Sample Ambiguous Corpus • Sentences: “Daisy gave the clock to the mouse.”, “Mommy saw that Mary gave the hammer to the dog.”, “The dog broke the box.”, “John gave the bag to the mouse.”, “The dog threw the ball.” • Candidate MRs: gave(daisy, clock, mouse), ate(mouse, orange), ate(dog, apple), saw(mother, gave(mary, dog, hammer)), broke(dog, box), gave(woman, toy, mouse), gave(john, bag, mouse), threw(dog, ball), runs(dog), saw(john, walks(man, dog)) • Linking each sentence to its candidate MRs forms a bipartite graph

Rest of the Talk • Brief background on KRISP, the semantic parsing learning system for unambiguous supervision • KRISPER: Extended system to handle ambiguous supervision • Corpus construction • Experiments

KRISP: Semantic Parser Learner for Unambiguous Supervision • KRISP: Kernel-based Robust Interpretation for Semantic Parsing [Kate & Mooney 2006] • Takes NL sentences unambiguously paired with their MRs as training data • Treats the productions of the formal MR language grammar as semantic concepts • Trains an SVM classifier for each production with a string subsequence kernel [Lodhi et al. 2002]
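KRISP's classifiers compare substrings with the string subsequence kernel of Lodhi et al. (2002). The sketch below computes that paper's gap-weighted kernel K_n over word tokens with decay factor λ (`lam`), using its recursive formulation. Several settings are assumptions, not KRISP's documented configuration: the subsequence length `n`, the value of λ, the normalization, and how the kernel is plugged into the SVM.

```python
from functools import lru_cache
from math import sqrt

def subseq_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel K_n(s, t) (Lodhi et al., 2002) over token sequences.

    Every common subsequence u of length n contributes lam ** (span in s + span in t)."""
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i, ls, lt):
        # Auxiliary K'_i on prefixes s[:ls], t[:lt]; gaps are counted to the prefix ends.
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += k_prime(i - 1, ls - 1, j - 1) * lam ** (lt - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k(ls, lt):
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += k_prime(n - 1, ls - 1, j - 1) * lam ** 2
        return total

    return k(len(s), len(t))

def normalized_kernel(s, t, n, lam=0.5):
    """Kernel value scaled so that every non-empty sequence has similarity 1 with itself."""
    denom = sqrt(subseq_kernel(s, s, n, lam) * subseq_kernel(t, t, n, lam))
    return subseq_kernel(s, t, n, lam) / denom if denom else 0.0

print(normalized_kernel("states bordering texas".split(), "states next to texas".split(), n=2))
```

The decay λ < 1 makes word subsequences with large gaps count less. That is what lets a classifier for a production such as NEXT_TO → next_to generalize across different phrasings.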

Meaning Representation Language • MR: answer(traverse(next_to(stateid(‘texas’)))) • Parse tree of MR:
ANSWER → answer(RIVER)
    RIVER → TRAVERSE(STATE)
        TRAVERSE → traverse
        STATE → NEXT_TO(STATE)
            NEXT_TO → next_to
            STATE → STATEID
                STATEID → ‘texas’
• Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), STATE → STATEID, TRAVERSE → traverse, NEXT_TO → next_to, STATEID → ‘texas’

Semantic Parsing by KRISP • The SVM classifier for each production gives the probability that a substring represents the semantic concept of the production • [Figure: for “Which rivers run through the states bordering Texas?”, the classifier for NEXT_TO → next_to assigns probabilities 0.02, 0.01 and 0.95 to three substrings, and the classifier for TRAVERSE → traverse assigns 0.91 and 0.21 to two substrings]

Semantic Parsing by KRISP • Semantic parsing is done by finding the most probable derivation of the sentence [Kate & Mooney 2006] • [Figure: derivation of “Which rivers run through the states bordering Texas?” using ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse (0.91), STATE → NEXT_TO(STATE), NEXT_TO → next_to (0.95), STATE → STATEID and STATEID → ‘texas’; the node probabilities shown are 0.89, 0.92, 0.91, 0.81, 0.95, 0.98 and 0.99] • The probability of the derivation is the product of the probabilities at its nodes.
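As a worked example of "product of the probabilities at the nodes", here is a minimal sketch of a derivation tree whose score is computed recursively. The slide attaches 0.91 to TRAVERSE → traverse and 0.95 to NEXT_TO → next_to. The assignment of the other five probabilities to nodes follows reading order, which is an assumption. Because multiplication is commutative, the product does not depend on that assignment. The `Node` class is hypothetical and is not KRISP's data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    production: str
    prob: float                       # classifier probability at this node
    children: list = field(default_factory=list)

def derivation_prob(node):
    """Probability of a derivation: product of the probabilities at all of its nodes."""
    p = node.prob
    for child in node.children:
        p *= derivation_prob(child)
    return p

# Only TRAVERSE (0.91) and NEXT_TO (0.95) are attested per node; the rest is assumed.
tree = Node("ANSWER → answer(RIVER)", 0.89, [
    Node("RIVER → TRAVERSE(STATE)", 0.92, [
        Node("TRAVERSE → traverse", 0.91),
        Node("STATE → NEXT_TO(STATE)", 0.81, [
            Node("NEXT_TO → next_to", 0.95),
            Node("STATE → STATEID", 0.98, [Node("STATEID → ‘texas’", 0.99)]),
        ]),
    ]),
])

print(round(derivation_prob(tree), 4))  # 0.5563
```

The parser compares such products across candidate derivations of the same sentence and keeps the highest-scoring one.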

Semantic Parsing by KRISP • Given a sentence and a meaning representation, KRISP can also find the probability that this meaning representation is the correct one for the sentence

KRISPER: KRISP with EM-like Retraining • Extension of KRISP that learns from ambiguous supervision • Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence

KRISPER’s Training Algorithm 1. Assume every possible meaning for a sentence is correct [Figure: the sample ambiguous corpus, with each sentence linked to all of its candidate MRs]

KRISPER’s Training Algorithm contd. 2. The resulting NL-MR pairs are weighted and given to KRISP: each pair receives weight 1/k, where k is the number of candidate MRs for its sentence (1/2 for “Daisy gave the clock to the mouse.”, 1/4 for “Mommy saw that Mary gave the hammer to the dog.”, 1/5 for “The dog broke the box.”, and 1/3 each for “John gave the bag to the mouse.” and “The dog threw the ball.”)

KRISPER’s Training Algorithm contd. 3. Estimate the confidence of each NL-MR pair using the resulting parser [Figure: the sample corpus graph with each of its 17 candidate edges labeled by the parser’s confidence, ranging from 0.11 to 0.97]

KRISPER’s Training Algorithm contd. 4. Use maximum weighted matching on the bipartite graph to find the best NL-MR pairs [Munkres, 1957] [Figure: the same confidence-labeled graph of the sample corpus]

KRISPER’s Training Algorithm contd. 5. Give the best pairs to KRISP in the next iteration, and continue until convergence
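Steps 1 to 5 can be sketched end to end as below. Two substitutions are made, and both are assumptions. First, `train` and `score` are a hypothetical toy word/symbol co-occurrence model standing in for KRISP's SVM-based parser. Second, `best_matching` is a brute-force maximum weighted matching, viable only for tiny graphs, standing in for Munkres' polynomial-time algorithm. The corpus in the usage lines is also hypothetical.

```python
import re
from collections import defaultdict
from itertools import permutations

def symbols(mr):
    # Toy features: the predicate and argument names occurring in an MR.
    return set(re.findall(r"[a-z_]+", mr))

def train(pairs):
    """Hypothetical stand-in for KRISP: weighted word/MR-symbol co-occurrence in [0, 1]."""
    cooc = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for sent, mr, weight in pairs:
        for word in set(sent.lower().split()):
            total[word] += weight
            for sym in symbols(mr):
                cooc[word][sym] += weight
    return {w: {s: c / total[w] for s, c in row.items()} for w, row in cooc.items()}

def score(model, sent, mr):
    # Confidence of (sent, mr): average over MR symbols of the best-associated word.
    words = set(sent.lower().split())
    syms = symbols(mr)
    return sum(max(model.get(w, {}).get(s, 0.0) for w in words) for s in syms) / len(syms)

def best_matching(n_sents, n_mrs, conf):
    # Brute-force maximum weighted matching: each sentence gets exactly one
    # candidate MR and each MR is used at most once.
    best, best_total = None, float("-inf")
    for assign in permutations(range(n_mrs), n_sents):
        if all((i, j) in conf for i, j in enumerate(assign)):
            total = sum(conf[i, j] for i, j in enumerate(assign))
            if total > best_total:
                best, best_total = assign, total
    return best

def krisper(sentences, mrs, edges, max_iters=10):
    """edges: set of (sentence_index, mr_index) candidate pairs (the bipartite graph)."""
    degree = defaultdict(int)
    for i, _ in edges:
        degree[i] += 1
    # Steps 1-2: every candidate is assumed correct, weighted 1/k for k candidates.
    pairs = [(sentences[i], mrs[j], 1.0 / degree[i]) for i, j in edges]
    previous = None
    for _ in range(max_iters):
        model = train(pairs)
        conf = {(i, j): score(model, sentences[i], mrs[j]) for i, j in edges}  # step 3
        match = best_matching(len(sentences), len(mrs), conf)                  # step 4
        if match is None or match == previous:
            break  # no consistent matching, or converged
        previous = match
        # Step 5: retrain only on the best pairs.
        pairs = [(sentences[i], mrs[j], 1.0) for i, j in enumerate(match)]
    return None if match is None else dict(enumerate(match))

SENTENCES = ["Mommy ate", "The dog ate", "The dog runs"]
MRS = ["ate(mommy)", "ate(dog)", "runs(dog)"]
EDGES = {(0, 0), (1, 1), (1, 2), (2, 1), (2, 2)}
print(krisper(SENTENCES, MRS, EDGES))  # {0: 0, 1: 1, 2: 2}
```

In this toy run, the unambiguous first sentence ties the word "ate" to the symbol `ate`. That evidence breaks the symmetry between the last two sentences in the first matching, and the second iteration confirms the same matching, so the loop stops.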

Corpus Construction • To our knowledge, no real-world ambiguous corpus is yet available for semantic parsing • We artificially obfuscated a real-world unambiguous corpus by adding extra distracter MRs to each training pair (Ambig-Geoquery) • We also created an artificial ambiguous corpus (Ambig-ChildWorld) that more accurately models real-world ambiguities, in which potential candidate MRs are often related

Ambig-Geoquery Corpus • Start with the unambiguous Geoquery corpus, a sequence of NL-MR pairs • Insert between 0 and a maximum number (the first ambiguity parameter) of random MRs from the corpus between each pair • For each NL sentence, form a window whose width ranges from 0 up to a maximum (the second ambiguity parameter) in either direction • The MRs inside each sentence’s window become its candidate MRs, forming the ambiguous corpus
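The construction steps can be sketched as follows. This is one possible reading of the slides, not the authors' generator. The hypothetical helper `make_ambiguous` takes `max_insert` and `max_window` as stand-ins for the two ambiguity parameters. It interprets the window as covering up to `w` positions on each side of a sentence's correct MR in the MR sequence, so the correct MR is always a candidate.

```python
import random

def make_ambiguous(pairs, max_insert, max_window, rng=None):
    """Hypothetical Ambig-Geoquery-style obfuscation.

    pairs: list of (nl, mr). Returns a list of (nl, candidate_mrs)."""
    if rng is None:
        rng = random.Random(0)
    all_mrs = [mr for _, mr in pairs]
    mr_seq, anchor = [], []   # anchor[k]: position of sentence k's correct MR
    for k, (_, mr) in enumerate(pairs):
        if k > 0:
            # Insert 0..max_insert random distracter MRs between consecutive pairs.
            n = rng.randint(0, max_insert)
            mr_seq.extend(rng.choice(all_mrs) for _ in range(n))
        anchor.append(len(mr_seq))
        mr_seq.append(mr)
    corpus = []
    for (nl, _), idx in zip(pairs, anchor):
        # Window of random width 0..max_window in either direction around the correct MR.
        w = rng.randint(0, max_window)
        lo, hi = max(0, idx - w), min(len(mr_seq), idx + w + 1)
        corpus.append((nl, mr_seq[lo:hi]))
    return corpus

pairs = [
    ("Which rivers run through the states bordering Texas?",
     "answer(traverse(next_to(stateid('texas'))))"),
    ("What is the lowest point of the state with the largest area?",
     "answer(lowest(place(loc(largest_one(area(state(all)))))))"),
    ("What is the largest city in states that border California?",
     "answer(largest(city(loc(next_to(stateid('california'))))))"),
]
for nl, cands in make_ambiguous(pairs, max_insert=2, max_window=1):
    print(len(cands), nl)
```

With both parameters at 0, every sentence keeps only its own MR and the corpus stays unambiguous. Raising either parameter adds distracters, which matches the idea of controlling the level of ambiguity through the two parameters.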

Ambig-ChildWorld Corpus • Although the Ambig-Geoquery corpus uses real-world NL-MR pairs, it does not model the relatedness between the potential MRs for each sentence that is common in perceptual contexts • Constructed a synchronous grammar [Aho & Ullman, 1972] to simultaneously generate artificial NL-MR pairs • Uses 15 verbs and 37 nouns (people, animals, things); MRs are in predicate logic without quantifiers

Ambig-ChildWorld Corpus contd. • Different perceptual contexts were modeled by choosing subsets of productions of the synchronous grammar • This leads to subsets of verbs and nouns (e.g., only Mommy, Daddy, Mary), causing more relatedness among potential MRs • For each such perceptual context, data was generated in a way similar to the Ambig-Geoquery corpus
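A synchronous grammar derives an NL string and its MR in lockstep, so every generated pair is correct by construction. The sketch below uses one hypothetical transitive-sentence rule and a miniature lexicon. The real grammar's 15 verbs, 37 nouns and rule set are not given on the slides. It then models a perceptual context, as described above, by restricting the noun productions to a subset.

```python
import random

# Hypothetical miniature lexicon (the real grammar uses 15 verbs and 37 nouns).
NOUNS = {"Mommy": "mother", "Daddy": "father", "Mary": "mary",
         "the dog": "dog", "the ball": "ball", "the box": "box"}
VERBS = {"broke": "broke", "threw": "threw", "saw": "saw"}   # NL verb -> predicate

def generate(rng, nouns=NOUNS, verbs=VERBS):
    """Apply the synchronous rule S -> < NP1 V NP2 , V(NP1, NP2) >.

    One derivation picks the nouns and the verb once and emits both the NL
    string and the MR, so the two sides always agree."""
    subj, obj = rng.sample(sorted(nouns), 2)
    verb = rng.choice(sorted(verbs))
    nl = f"{subj} {verb} {obj}."
    mr = f"{verbs[verb]}({nouns[subj]}, {nouns[obj]})"
    return nl[0].upper() + nl[1:], mr

# A perceptual context: only a subset of noun productions (Mommy, Daddy, Mary).
context = {k: NOUNS[k] for k in ("Mommy", "Daddy", "Mary")}
rng = random.Random(0)
for _ in range(3):
    print(*generate(rng, nouns=context))
```

Because all MRs generated within one context share the same few arguments, the distracter MRs that end up next to a sentence tend to resemble its correct MR. That is the relatedness the Ambig-Geoquery corpus lacks.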

Ambiguity in Corpora • Three levels of ambiguity were created by varying the two parameters: the maximum number of inserted MRs and the maximum window width

Methodology • Performed 10-fold cross-validation • Metric: the best F-measure across the precision-recall curve obtained by varying a threshold on the parser’s output confidence
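A minimal sketch of "best F-measure across the precision-recall curve" follows. It sweeps a threshold over the output confidences and keeps the best F-measure. The slide does not define precision and recall. This sketch assumes precision = correct MRs / MRs output above the threshold and recall = correct MRs / all test sentences. The example outputs are hypothetical.

```python
def best_f_measure(results, n_test):
    """Best F-measure over all confidence thresholds.

    results: (confidence, is_correct) for each test sentence that received an MR.
    n_test:  total number of test sentences.
    Assumed: precision = correct / MRs kept, recall = correct / n_test."""
    best = 0.0
    for theta in sorted({conf for conf, _ in results}):
        kept = [ok for conf, ok in results if conf >= theta]   # outputs above threshold
        correct = sum(kept)
        if correct == 0:
            continue
        precision, recall = correct / len(kept), correct / n_test
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Hypothetical outputs for 5 test sentences (one received no MR at all).
outputs = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
print(round(best_f_measure(outputs, n_test=5), 3))  # 0.667
```

Raising the threshold trades recall for precision. Reporting the maximum over all thresholds summarizes the whole curve with a single number.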

Results on Ambig-Geoquery Corpus

Results on Ambig-ChildWorld Corpus

Future Work • Construct a real-world ambiguous corpus and test this approach • Combine this system with a vision-based system that extracts MRs from perceptual contexts

Conclusions • We presented the problem of learning language semantics from ambiguous supervision • This form of supervision is more representative of the natural training environment for a language learning system • We presented an approach that learns from ambiguous supervision by iteratively re-training a system designed for unambiguous supervision • Experimental results on two artificial corpora showed that this approach can cope with ambiguities and learn accurate semantic parsers

Thank you! Questions?
