
A New Perspective on Language Understanding: Distributed Representation Approach

Explore the debate on structured vs. distributed language representations in cognitive science. Learn about the Fodor/Chomsky vision, the Sentence Gestalt model, and the Google Neural Machine Translation system. Discover how a probabilistic representation of meaning, updated word by word, can account for language understanding.

ewillie

Presentation Transcript


  1. Neural Networks and Language Understanding: Do we need to rely on predetermined structured representations to understand and use language? Psychology 209 – 2019 February 21, 2019

  2. The Fodor / Chomsky Vision • We can understand and process sentences we’ve never heard before because • We use a system of structure-sensitive rules • That processes sentences according to their structure and • Composes meaning as an assemblage of parts whose meanings are already known • Fodor’s example: • The man loves the woman • The woman loves the man • He claims, among other things, that the meaning of the word ‘loves’ contributes the same thing to the meaning of the overall sentence in both cases.

  3. Some sentences that pose problems for this view • John loves Mary • Mary loves John • John loves ice cream • She felt the baby kick • John poured coffee into the container • Jill put apples into the container • I like going to the movies with friends • I like eating spaghetti with meatballs • I like eating Chinese food with chopsticks • I saw the sheep grazing in the field • I saw the grand canyon flying to New York

  4. An alternative perspective • The intention of a speaker is to convey information about a situation or event • Words, and the order in which they come, are clues to aspects of meaning, but clues to any aspect of meaning can come from anywhere in the sentence • A commitment to structure just gets in the way – a learned distributed representation that discovers how to capture sentence meaning is the best solution to the problem

  5. Two Models • The Sentence Gestalt model • St. John & McClelland (1990) • Rabovsky, Hansen & McClelland (2016) • The Google Neural Machine Translation system • Wu et al., 2016 • Gideon Lewis-Kraus, The Great AI Awakening

  6. The Sentence Gestalt Model - theory (based on McClelland, St. John, & Taraban, 1989; St. John & McClelland, 1990) • Words as “cues to meaning” (Rumelhart, 1979) that change the representation of sentence meaning (corresponding to a pattern of neural activity, modeled in an artificial neural network) • Activation state implicitly represents subjective probability distributions over the semantic features of the event described by a sentence • No assumption of a specific format for the internal representation of sentences: the representation is not directly trained but instead used as a basis to respond to probes (e.g., answer questions concerning the described event). Feedback only on responses to probes • Word-by-word update of a probabilistic representation of meaning with the goal of maximizing the agreement between the true probability of each possible answer to each possible question and the estimates the network makes given the words seen so far

  7. Simplified implemented model • Uses a simple generative model of events and sentences that describe them • Uses a simplified set of queries to constrain learning in the model (thematic roles) • Other versions of the query model are possible including • Accounting for information derived from other sources about an event or situation • Questions we might be responsible for answering based on expectations of others
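The idea on slide 7 of a generative model that produces events, sentences describing them, and thematic-role queries can be illustrated with a toy sketch. The vocabulary, the function names, and the single active-voice sentence frame below are hypothetical simplifications chosen for illustration; they are not the corpus used in the original model.

```python
import random

# Hypothetical toy vocabulary; the real model environment is much richer.
AGENTS = ["man", "woman"]
ACTIONS = {"play": ["chess"], "eat": ["soup"]}

def sample_event(rng):
    """Sample an event as a set of role-filler pairs."""
    agent = rng.choice(AGENTS)
    action = rng.choice(sorted(ACTIONS))
    patient = rng.choice(ACTIONS[action])
    return {"agent": agent, "action": action, "patient": patient}

def describe(event):
    """Render the event as a simple active sentence (one assumed frame)."""
    return f"The {event['agent']} {event['action']}s {event['patient']}."

def queries(event):
    """Thematic-role probes paired with the answers used as teaching signal."""
    return [(role + "?", filler) for role, filler in event.items()]

event = {"agent": "man", "action": "play", "patient": "chess"}
sentence = describe(event)        # "The man plays chess."
probes = queries(event)           # [("agent?", "man"), ("action?", "play"), ("patient?", "chess")]
random_event = sample_event(random.Random(0))
```

In this sketch the sentence is the model's input and the probe-answer pairs are the only source of feedback, mirroring the role the queries play in constraining learning.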

  8. update network, query network: Sentence Gestalt (100) Input (74) Hidden 1 (100) Target (176) Output (176) Hidden 2 (100) Probe (176) Sentences: “The man plays chess.” Events (role-filler pairs): Agent: man, action: play, patient: chess
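The layer sizes on slide 8 can be wired together as in the following minimal sketch. It assumes that the update network combines the current word with the previous Sentence Gestalt state and that the query network combines the Sentence Gestalt with a probe. The random untrained weights, sigmoid units, missing biases, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

# Layer sizes from slide 8; everything else here is an assumption.
N_INPUT, N_HIDDEN1, N_SG = 74, 100, 100
N_PROBE, N_HIDDEN2, N_OUTPUT = 176, 100, 176

rng = random.Random(0)

def init_weights(n_out, n_in):
    """Small random weight matrix (untrained, for illustrating shapes only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    """Fully connected layer of sigmoid units (biases omitted for brevity)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

W_h1 = init_weights(N_HIDDEN1, N_INPUT + N_SG)   # word + previous SG -> Hidden 1
W_sg = init_weights(N_SG, N_HIDDEN1)             # Hidden 1 -> Sentence Gestalt
W_h2 = init_weights(N_HIDDEN2, N_SG + N_PROBE)   # SG + probe -> Hidden 2
W_out = init_weights(N_OUTPUT, N_HIDDEN2)        # Hidden 2 -> Output

def update(word, previous_sg):
    """Update network: revise the Sentence Gestalt given the next word."""
    return layer(W_sg, layer(W_h1, word + previous_sg))

def query(sg, probe):
    """Query network: answer a probe from the current Sentence Gestalt."""
    return layer(W_out, layer(W_h2, sg + probe))

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

sg = [0.0] * N_SG
for word_index in (0, 1, 2):  # hypothetical input units for "man", "plays", "chess"
    sg = update(one_hot(word_index, N_INPUT), sg)
    answer = query(sg, one_hot(0, N_PROBE))  # hypothetical probe unit, e.g. "Agent?"
```

Because the query network only ever sees the Sentence Gestalt vector, any information needed to answer a probe has to be carried by that 100-unit state.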

  9. Learning • Model is probed for all aspects of meaning of the event after every word • Language learner observes an event and hears a sentence about it - learning based on comparison of comprehension output and event • Anticipation of sentence meaning • Minimum of cross-entropy error: activation of each feature unit corresponds to the conditional probability of that feature in that situation (Rumelhart et al., 1995) • In an ideally trained model, the change in activation at the SG layer induced by each incoming word would support an accurate update in the probabilities of semantic features ‘cued’ by that word
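The claim on slide 9 that minimizing cross-entropy error makes a unit's activation match the conditional probability of its feature can be checked numerically. The sketch below uses hypothetical data (a feature present in 7 of 10 occurrences of the same context) and a simple grid search instead of gradient descent.

```python
import math

def cross_entropy(target, activation):
    """Cross-entropy error for one binary target and one unit activation."""
    return -(target * math.log(activation)
             + (1 - target) * math.log(1 - activation))

# Hypothetical data: same context seen 10 times, feature present 7 times.
targets = [1] * 7 + [0] * 3

def mean_error(activation):
    return sum(cross_entropy(t, activation) for t in targets) / len(targets)

# Search activations 0.01 .. 0.99 for the one with the lowest mean error.
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=mean_error)
print(best)  # prints 0.7
```

The error-minimizing activation equals the relative frequency of the feature, which is why a well-trained output unit can be read as an estimate of a conditional probability.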

  10-12. update network, query network: Sentence Gestalt (100), Input (74), Hidden 1 (100), Output (176), Hidden 2 (100), Probe (176). Input: “The man”. Sentence: The man plays chess.

  13-15. Same network diagram. Input: “The man”. Probe: Agent? Sentence: The man plays chess.

  16-18. Same network diagram. Input: “The man”. Probe: Action? Sentence: The man plays chess.

  19-21. Same network diagram. Input: “The man”. Probe: Patient? Sentence: The man plays chess.

  22-24. Same network diagram. Input: “plays”. Sentence: The man plays chess.

  25-27. Same network diagram. Input: “plays”. Probe: Action? Sentence: The man plays chess.

  28-30. Same network diagram. Input: “plays”. Probe: Patient? Sentence: The man plays chess.

  31-33. Same network diagram. Input: “chess”. Sentence: The man plays chess.

  34-36. Same network diagram. Input: “chess”. Probe: Patient? Sentence: The man plays chess.

  37. St J & McC Corpus and Results • Sentences could be active or passive; constituents could be vaguely identified or left out if strongly implied. • The model could use word order and meaning as well as syntactic markers, capturing event constraints and using context to disambiguate. • Assessment of each participant depended on all words in the sentence (next slide)

  38. Changing interpretations of role fillers as a sentence unfolds

  39. Limitations and Alternatives • The query language appears to build in a commitment to structure • I see this as a limitation – queries of all kinds, posed in many different kinds of ways, are likely to be the source of teaching information for real human learners • Machine translation might seem to offer one solution to this problem, but may not really require sufficient attention to meaning. • Other approaches are definitely being explored, including various kinds of question-answering systems. • What approaches do you think might be interesting?

  40. Google’s Neural Machine Translation System

  41. Which is the original, and which is the result of E –> J –> E translation? • Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of a leopard. No one has ever explained what the leopard wanted at that altitude. • Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude. • I added two articles to the translation.

  42. Ideas in the GNMT systems • Sequence-to-Sequence model • Blissfully uncommitted to any structure whatsoever • Attention and Bi-directionality • Is some structure sneaking in? • Words seem to have a special status, but in written text, words do appear to have external reality • They still pose problems, however

  43. Sequence to Sequence Model of Sutskever, Vinyals & Le • A Sentence Gestalt-like representation • Some details: four stacked LSTMs; different LSTMs on the encoding and decoding side

  44. The Wu et al. GNMT model: Attention, Bi-directionality, and Skip-connections
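The attention idea named on slide 44 can be sketched in a few lines: at each decoding step, the decoder scores every encoder state, normalizes the scores with a softmax, and reads out a weighted average as context. The sketch below uses dot-product scoring as a deliberate simplification; the scoring function in the GNMT system itself is a small learned network, and the vectors and names here are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the weighted-average context vector."""
    # Dot-product scoring: an assumed simplification, not GNMT's scorer.
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(size)]
    return weights, context

# Hypothetical 2-dimensional states for a three-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

Since the context is recomputed at every output step, the decoder is no longer limited to a single fixed-size summary of the source sentence, which is one place where word-level structure might be said to sneak back in.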

  45. A Success and a Failure • Does well with this kind of case: • The hen that chased the dog was too shaggy • La gallina que perseguía al perro era demasiado peluda • But not this kind: • The trombone did not fit in the suitcase because it was too small • El trombón no cabía en la maleta porque era demasiado pequeño • This seems to indicate it is not fully processing the semantics -- meaning is necessary to determine the correct referent here (but more work is needed…) • The best system may ultimately have to be responsible for understanding the meaning of the sentence, rather than just producing translations.

  46. The N400 component of the ERP What is the N400? • “A temporally delimited electrical snapshot of the intersection of a feedforward flow of stimulus-driven activity with a state of the distributed, dynamically active neural landscape that is semantic memory” • Tanya asks: What does this mean???

  47. The N400 component of the ERP Our account: • Change of a representation of meaning that implicitly and probabilistically represents all aspects of meaning of the event described by a sentence
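The account on slide 47 treats the N400 as the change in the representation of meaning caused by an incoming word. One way to make this concrete is to measure how much the Sentence Gestalt activation pattern moves from one word to the next. The sketch below assumes the summed absolute difference across units as the measure of change (the slide does not specify one), and the four-unit activation patterns are hypothetical.

```python
def semantic_update(sg_before, sg_after):
    """Summed absolute change in activation across Sentence Gestalt units.

    Assumed measure of change; used here as a stand-in for N400 amplitude.
    """
    if len(sg_before) != len(sg_after):
        raise ValueError("activation vectors must have the same length")
    return sum(abs(after - before)
               for before, after in zip(sg_before, sg_after))

# Hypothetical 4-unit patterns (the model on the slides uses 100 SG units).
before = [0.2, 0.8, 0.5, 0.1]
after_expected_word = [0.25, 0.75, 0.5, 0.1]   # word confirms what was anticipated
after_unexpected_word = [0.9, 0.1, 0.2, 0.7]   # word forces a large revision

small = semantic_update(before, after_expected_word)
large = semantic_update(before, after_unexpected_word)
```

Under this reading, a highly probable word yields a small update and a semantically anomalous word yields a large one, in line with the conditions contrasted on the slides that follow.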

  48. The N400 component of the ERP Modulating variables • Semantic violations, • Contextual fit, • Frequency,... • Meaning processing • Functional basis? • Lexical access? (Lau et al., 2008) • Semantic inhibition? (Debruille, 2007) • Semantic integration? (Baggio & Hagoort, 2011)

  49. N400 correlate: High probability Low probability Semantic violation

  50. Model environment (results based on 10 runs, each trained on 800,000 sentences)
