Learning Language from its Perceptual Context


Presentation Transcript


  1. Learning Language from its Perceptual Context. Ray Mooney, Department of Computer Sciences, University of Texas at Austin. Joint work with David Chen, Rohit Kate, and Yuk Wah Wong.

  2. Current State of Natural Language Learning • Most current state-of-the-art NLP systems are constructed by training on large supervised corpora. • Syntactic Parsing: Penn Treebank • Word Sense Disambiguation: SenseEval • Semantic Role Labeling: Propbank • Machine Translation: Hansards corpus • Constructing such annotated corpora is difficult, expensive, and time-consuming.

  3. Semantic Parsing • A semantic parser maps a natural-language sentence to a complete, detailed semantic representation: a logical form or meaning representation (MR). • For many applications, the desired output is immediately executable by another program. • Two application domains: • GeoQuery: A Database Query Application • CLang: RoboCup Coach Language

  4. GeoQuery: A Database Query Application • Query application for a U.S. geography database [Zelle & Mooney, 1996]. User: "How many states does the Mississippi run through?" → Semantic Parsing → answer(A, count(B, (state(B), C=riverid(mississippi), traverse(C,B)), A)) → Query Database → 10
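Slide 4's MR is directly executable against the database. As a rough illustration of what that execution looks like, here is a minimal sketch that evaluates an equivalent count query over a hypothetical toy table; the real GeoQuery system executes the Prolog logical form against the full U.S. geography database.

```python
# Minimal sketch: evaluating a GeoQuery-style meaning representation
# against a hypothetical toy table, not the actual GeoQuery backend.

# traverse(river, state): which rivers run through which states (toy data)
TRAVERSE = {
    ("mississippi", "minnesota"), ("mississippi", "wisconsin"),
    ("mississippi", "iowa"), ("mississippi", "illinois"),
    ("mississippi", "missouri"), ("mississippi", "kentucky"),
    ("mississippi", "tennessee"), ("mississippi", "arkansas"),
    ("mississippi", "mississippi"), ("mississippi", "louisiana"),
}

def count_states_traversed(river_id):
    """answer(A, count(B, (state(B), C=riverid(river_id), traverse(C, B)), A))"""
    return len({state for river, state in TRAVERSE if river == river_id})

print(count_states_traversed("mississippi"))  # -> 10
```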

  5. CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated soccer players. • The coaching instructions are given in a formal language called CLang. Coach: "If the ball is in our penalty area, then all our players except player 4 should stay in our half." → Semantic Parsing → CLang: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))
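CLang meaning representations are nested s-expressions. The following is a minimal illustrative reader (our own sketch, not part of the RoboCup coach tools) that parses the rule above into nested lists, just to make the structure explicit.

```python
# Minimal s-expression reader for CLang-style meaning representations.
# Illustrative sketch only, not part of the RoboCup coach toolkit.
import re

def parse_sexpr(text):
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1
    expr, _ = read(0)
    return expr

clang = "((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))"
print(parse_sexpr(clang))
# [['bpos', ['penalty-area', 'our']],
#  ['do', ['player-except', 'our{4}'], ['pos', ['half', 'our']]]]
```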

  6. Learning Semantic Parsers • Manually programming robust semantic parsers is difficult due to the complexity of the task. • Semantic parsers can be learned automatically from sentences paired with their logical form. [Diagram: NL/MR training examples → Semantic-Parser Learner → Semantic Parser; Natural Language → Semantic Parser → Meaning Rep]

  7. Our Semantic-Parser Learners • CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson & Mooney, 1999, 2003) • Separates parser-learning and semantic-lexicon learning. • Learns a deterministic parser using ILP techniques. • COCKTAIL (Tang & Mooney, 2001) • Improved ILP algorithm for CHILL. • SILT (Kate, Wong & Mooney, 2005) • Learns symbolic transformation rules for mapping directly from NL to MR. • SCISSOR (Ge & Mooney, 2005) • Integrates semantic interpretation into Collins' statistical syntactic parser. • WASP (Wong & Mooney, 2006; 2007) • Uses syntax-based statistical machine translation methods. • KRISP (Kate & Mooney, 2006) • Uses a series of SVM classifiers employing a string kernel to iteratively build semantic representations.

  8. WASP: A Machine Translation Approach to Semantic Parsing • Uses statistical machine translation techniques • Synchronous context-free grammars (SCFG) (Wu, 1997; Melamed, 2004; Chiang, 2005) • Word alignments (Brown et al., 1993; Och & Ney, 2003) • Hence the name: Word Alignment-based Semantic Parsing

  9. A Unifying Framework for Parsing and Generation Natural Languages Machine translation

  10. A Unifying Framework for Parsing and Generation Natural Languages Semantic parsing Machine translation Formal Languages

  11. A Unifying Framework for Parsing and Generation Natural Languages Semantic parsing Machine translation Tactical generation Formal Languages

  12. A Unifying Framework for Parsing and Generation Synchronous Parsing Natural Languages Semantic parsing Machine translation Tactical generation Formal Languages

  13. A Unifying Framework for Parsing and Generation Synchronous Parsing Natural Languages Semantic parsing Compiling: Aho & Ullman (1972) Machine translation Tactical generation Formal Languages

  14. Synchronous Context-Free Grammars (SCFG) • Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in a single phase. • Generates a pair of strings in a single derivation.

  15. Synchronous Context-Free Grammar Production Rule: QUERY → What is CITY / answer(CITY) (natural-language side / formal-language side)

  16. Synchronous Context-Free Grammar Derivation. Rules: QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio'). Applying them in a single derivation generates the pair "What is the capital of Ohio" / answer(capital(loc_2(stateid('ohio'))))
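To make the lockstep expansion concrete, here is a toy sketch of a synchronous derivation using the four productions on this slide. The two CITY occurrences are given distinct names (CITY1, CITY2) only to keep the string substitution simple; this is an illustration, not the WASP implementation.

```python
# Toy synchronous CFG: each production rewrites a nonterminal into an
# (NL template, MR template) pair, and a derivation expands both sides
# in lockstep. Illustrative sketch only.
RULES = {
    "QUERY": ("What is CITY1", "answer(CITY1)"),
    "CITY1": ("the capital CITY2", "capital(CITY2)"),
    "CITY2": ("of STATE", "loc_2(STATE)"),
    "STATE": ("Ohio", "stateid('ohio')"),
}

def derive(symbol):
    """Expand `symbol` into a (natural language, meaning representation) pair."""
    nl, mr = RULES[symbol]
    for token in nl.split():
        if token in RULES:
            sub_nl, sub_mr = derive(token)
            nl = nl.replace(token, sub_nl)
            mr = mr.replace(token, sub_mr)
    return nl, mr

print(derive("QUERY"))
# ('What is the capital of Ohio', "answer(capital(loc_2(stateid('ohio'))))")
```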

  17. CITY capital CITY / capital(CITY) CITY of STATE / loc_2(STATE) Probabilistic Parsing Model d1 CITY CITY capital capital ( CITY ) CITY of loc_2 ( STATE ) STATE Ohio stateid ( 'ohio' ) STATE Ohio / stateid('ohio')

  18. CITY capital CITY / capital(CITY) CITY of RIVER / loc_2(RIVER) Probabilistic Parsing Model d2 CITY CITY capital capital ( CITY ) CITY of loc_2 ( RIVER ) RIVER Ohio riverid ( 'ohio' ) RIVER Ohio / riverid('ohio')

  19. CITY capital CITY / capital(CITY) CITY capital CITY / capital(CITY) CITY of STATE / loc_2(STATE) CITY of RIVER / loc_2(RIVER) + + Probabilistic Parsing Model d1 d2 CITY CITY capital ( CITY ) capital ( CITY ) loc_2 ( STATE ) loc_2 ( RIVER ) stateid ( 'ohio' ) riverid ( 'ohio' ) 0.5 0.5 λ λ 0.3 0.05 0.5 0.5 STATE Ohio / stateid('ohio') RIVER Ohio / riverid('ohio') Pr(d1|capital of Ohio) =exp( ) / Z 1.3 Pr(d2|capital of Ohio) = exp( ) / Z 1.05 normalization constant

  20. Overview of WASP. Training: an unambiguous CFG of the MRL and a training set {(e, f)} are fed to lexical acquisition, producing a lexicon L (an SCFG); parameter estimation then yields an SCFG parameterized by λ. Testing: an input sentence e' is semantically parsed into an output MR f'.

  21. Tactical Generation • Can be seen as the inverse of semantic parsing: semantic parsing maps "The goalie should always stay in our half" to ((true) (do our {1} (pos (half our)))), while tactical generation maps the MR back to the sentence.

  22. Generation by Inverting WASP • The same synchronous grammar, e.g. QUERY → What is CITY / answer(CITY), is used for both generation and semantic parsing. Semantic parsing: NL is the input, MRL is the output. Tactical generation: MRL is the input, NL is the output.

  23. Learning Language from Perceptual Context • Children do not learn language from annotated corpora. • Neither do they learn language from just reading the newspaper, surfing the web, or listening to the radio. • The natural way to learn language is to perceive language in the context of its use in the physical and social world. • This requires inferring the meaning of utterances from their perceptual context.

  24. Language Grounding • The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc. • Symbol Grounding: Harnad (1990) • Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc. • Lakoff and Johnson's Metaphors We Live By • It's difficult to put my words into ideas. • Interest in competitions is up. • Most work in NLP tries to represent meaning without any connection to perception or to the physical world, circularly defining the meanings of words in terms of other words or meaningless symbols with no firm foundation.

  25. ??? “Mary is on the phone”

  26. Ambiguous Supervision for Learning Semantic Parsers • A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics. • We consider ambiguous training data of sentences associated with multiple potential MRs. • Siskind (1996) uses this type of "referentially uncertain" training data to learn meanings of words. • Extracting meaning representations from perceptual data is a difficult unsolved problem. • Our system works directly with symbolic MRs.

  27. ??? “Mary is on the phone”

  28. ??? “Mary is on the phone”

  29. ??? Ironing(Mommy, Shirt) “Mary is on the phone”

  30. ??? Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone”

  31. ??? Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) “Mary is on the phone”

  32. ??? Ambiguous Training Example Ironing(Mommy, Shirt) Carrying(Daddy, Bag) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone”

  33. Next Ambiguous Training Example Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) ??? Sitting(Mary, Chair) “Mommy is ironing a shirt”

  34. Ambiguous Supervision for Learning Semantic Parsers (contd.) • Our model of ambiguous supervision corresponds to the type of data that will be gathered from a temporal sequence of perceptual contexts with occasional language commentary. • We assume each sentence has exactly one meaning in its perceptual context. • Recently extended to handle sentences with no meaning in their perceptual context. • Each meaning is associated with at most one sentence.

  35. Sample Ambiguous Corpus. Sentences: "Daisy gave the clock to the mouse." / "Mommy saw that Mary gave the hammer to the dog." / "The dog broke the box." / "John gave the bag to the mouse." / "The dog threw the ball." Candidate MRs: gave(daisy, clock, mouse), ate(mouse, orange), ate(dog, apple), saw(mother, gave(mary, dog, hammer)), broke(dog, box), gave(woman, toy, mouse), gave(john, bag, mouse), threw(dog, ball), runs(dog), saw(john, walks(man, dog)). Each sentence is linked to the candidate MRs from its context, forming a bipartite graph.
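As a data structure, such an ambiguous corpus is simply a map from each sentence to its set of candidate MRs, i.e. the edges of the bipartite graph. A minimal sketch follows; the candidate sets are illustrative choices, not edges read off the slide's figure.

```python
# An ambiguous corpus as a bipartite graph: each sentence maps to the
# candidate MRs observed in its perceptual context. Candidate sets below
# are illustrative, not the exact edges of the slide's figure.
AMBIGUOUS_CORPUS = {
    "Daisy gave the clock to the mouse.":
        ["gave(daisy, clock, mouse)", "ate(mouse, orange)"],
    "Mommy saw that Mary gave the hammer to the dog.":
        ["ate(dog, apple)", "saw(mother, gave(mary, dog, hammer))",
         "broke(dog, box)"],
    "The dog broke the box.":
        ["broke(dog, box)", "gave(woman, toy, mouse)"],
    "John gave the bag to the mouse.":
        ["gave(john, bag, mouse)", "threw(dog, ball)"],
    "The dog threw the ball.":
        ["threw(dog, ball)", "runs(dog)", "saw(john, walks(man, dog))"],
}

# Every sentence-MR edge is a candidate training pair for the learner.
candidate_pairs = [(sentence, mr)
                   for sentence, mrs in AMBIGUOUS_CORPUS.items()
                   for mr in mrs]
print(len(candidate_pairs))  # number of edges in the bipartite graph
```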

  36. KRISPER: KRISPwith EM-like Retraining • Extension of KRISP that learns from ambiguous supervision. • Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence.

  37. KRISPER's Training Algorithm 1. Assume every possible meaning for a sentence is correct. [Bipartite graph of the sample corpus with each sentence linked to all of its candidate MRs]

  38. KRISPER's Training Algorithm 1. Assume every possible meaning for a sentence is correct. [Bipartite graph as on the previous slide]

  39. KRISPER's Training Algorithm 2. The resulting NL-MR pairs are weighted and given to KRISP. [Each sentence's candidate edges receive uniform weight: 1/2, 1/4, 1/5, 1/3, and 1/3 for the five sentences respectively]

  40. KRISPER's Training Algorithm 3. Estimate the confidence of each NL-MR pair using the resulting trained parser. [Bipartite graph with an estimated confidence on each edge, e.g. 0.92, 0.11, 0.32, 0.88, ...]

  41. KRISPER's Training Algorithm 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]. [Bipartite graph with confidence-weighted edges, e.g. 0.92, 0.88, 0.85, 0.95, 0.97, ...]

  42. KRISPER's Training Algorithm 4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]. [Same graph as the previous slide]
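The matching in step 4 is the classic assignment problem. A minimal sketch using SciPy's implementation of the Hungarian algorithm; the confidence matrix below is a small made-up example, not the full graph on the slide.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows = sentences, columns = candidate MRs; entries are the trained
# parser's confidence for each NL-MR pair (small made-up example).
# linear_sum_assignment minimizes total cost, so negate to maximize.
confidence = np.array([
    [0.92, 0.11, 0.05],
    [0.32, 0.88, 0.22],
    [0.14, 0.24, 0.85],
])
sent_idx, mr_idx = linear_sum_assignment(-confidence)
for s, m in zip(sent_idx, mr_idx):
    print(f"sentence {s} -> MR {m} (confidence {confidence[s, m]:.2f})")
```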

  43. KRISPER's Training Algorithm 5. Give the best pairs to KRISP in the next iteration, and repeat until convergence. [Bipartite graph of the sample corpus]
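Putting steps 1-5 together, here is a schematic sketch of the retraining loop. The train_krisp and best_matching arguments, and the parser.confidence interface, are hypothetical stand-ins for the real KRISP learner and the weighted matching step above; this illustrates the control flow, not the actual implementation.

```python
# Schematic sketch of KRISPER's EM-like retraining loop (steps 1-5).
# The caller supplies train_krisp (a stand-in for the KRISP learner) and
# best_matching (a stand-in for the weighted bipartite matching shown above).
def krisper_train(corpus, train_krisp, best_matching, max_iters=10):
    # Steps 1-2: start from every candidate NL-MR pair, weighted uniformly.
    pairs = [(sentence, mr, 1.0 / len(mrs))
             for sentence, mrs in corpus.items() for mr in mrs]
    parser = None
    for _ in range(max_iters):
        parser = train_krisp(pairs)                  # train on the current pairs
        # Step 3: score every candidate pair with the newly trained parser.
        # (confidence() is an assumed interface on the returned parser.)
        scores = {(sentence, mr): parser.confidence(sentence, mr)
                  for sentence, mrs in corpus.items() for mr in mrs}
        # Step 4: keep only the best one-to-one sentence/MR matching.
        best = best_matching(scores)
        new_pairs = [(sentence, mr, 1.0) for sentence, mr in best]
        if new_pairs == pairs:                       # Step 5: repeat until convergence
            break
        pairs = new_pairs
    return parser
```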

  44. Results on Ambig-ChildWorld Corpus

  45. New Challenge: Learning to Be a Sportscaster • Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e., speech and vision). • Solution: Learn from textually annotated traces of activity in a simulated environment. • Example: Traces of games in the Robocup simulator paired with textual sportscaster commentary.

  46. Grounded Language Learning in Robocup. [Diagram components: Robocup Simulator, Simulated Perception, Perceived Facts, Sportscaster ("Score!!!!"), Grounded Language Learner, SCFG, Semantic Parser, Language Generator]

  47. Robocup Sportscaster Trace. Natural Language Commentary: purple7 passes the ball out to purple6 / purple6 passes to purple2 / purple2 makes a short pass to purple3 / purple3 loses the ball to pink9. Meaning Representation: pass(purple7, purple6), ballstopped, kick(purple6), pass(purple6, purple2), ballstopped, kick(purple2), pass(purple2, purple3), kick(purple3), badPass(purple3, pink9), turnover(purple3, pink9)

  48. Robocup Sportscaster Trace. [Same commentary and meaning representations as the previous slide]

  49. Robocup Sportscaster Trace. [Same commentary and meaning representations as the previous slide]

  50. Sportscasting Data • Collected human textual commentary for the 4 Robocup championship games from 2001-2004. • Avg # events/game = 2,613 • Avg # sentences/game = 509 • Each sentence was matched to all events within the previous 5 seconds. • Avg # MRs/sentence = 2.5 (min 1, max 12) • Manually annotated with the correct matchings of sentences to MRs (for evaluation purposes only).
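A minimal sketch of the candidate-matching step described above: each commentary sentence is paired with every event whose timestamp falls in the preceding 5 seconds. The timestamps and events below are made up for illustration.

```python
# Pair each commentary sentence with every event in the previous 5 seconds.
# Timestamps and events are made-up examples for illustration only.
events = [          # (time in seconds, meaning representation)
    (12.0, "pass(purple7, purple6)"),
    (13.5, "ballstopped"),
    (14.2, "kick(purple6)"),
    (15.0, "pass(purple6, purple2)"),
]
comments = [        # (time in seconds, natural-language commentary)
    (15.5, "purple6 passes to purple2"),
]

WINDOW = 5.0
for c_time, sentence in comments:
    candidates = [mr for e_time, mr in events if c_time - WINDOW <= e_time <= c_time]
    print(sentence, "->", candidates)
# purple6 passes to purple2 -> ['pass(purple7, purple6)', 'ballstopped',
#                               'kick(purple6)', 'pass(purple6, purple2)']
```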
