Neural Networks and Language Understanding: Do we need to rely on predetermined structured representations to understand and use language? Psychology 209 – February 21, 2019
The Fodor / Chomsky Vision • We can understand and process sentences we’ve never heard before because • We use a system of structure sensitive rules • That processes sentences according to their structure and • Composes meaning as an assemblage of parts whose meanings are already known • Fodor’s example: • The man loves the woman • The woman loves the man • He claims, among other things, that the meaning of the word ‘loves’ contributes the same thing to the meaning of the overall sentence in both cases.
Some sentences that pose problems for this view • John loves Mary • Mary loves John • John loves ice cream • She felt the baby kick • John poured coffee into the container • Jill put apples into the container • I like going to the movies with friends • I like eating spaghetti with meatballs • I like eating Chinese food with chopsticks • I saw the sheep grazing in the field • I saw the grand canyon flying to New York
An alternative perspective • The intention of a speaker is to convey information about a situation or event • Words, and the order in which they come, are clues to aspects of meaning, but clues to any aspect of meaning can come from anywhere in the sentence • A commitment to structure just gets in the way – a learned distributed representation that discovers how to capture sentence meaning is the best solution to the problem
Two Models • The Sentence Gestalt model • St. John & McClelland (1990) • Rabovsky, Hansen & McClelland (2016) • The Google Neural Machine Translation system • Wu et al., 2016 • Gideon Lewis-Kraus, The Great AI Awakening
The Sentence Gestalt Model – theory (based on McClelland, St. John, & Taraban, 1989; St. John & McClelland, 1990) • Words as "cues to meaning" (Rumelhart, 1979) that change the representation of sentence meaning (corresponding to a pattern of neural activity, modeled in an artificial neural network) • Activation state implicitly represents subjective probability distributions over the semantic features of the event described by a sentence • No assumption of a specific format for the internal representation of sentences: the representation is not directly trained but instead used as a basis for responding to probes (e.g., answering questions concerning the described event). Feedback only on responses to probes • Word-by-word update of a probabilistic representation of meaning, with the goal of maximizing the agreement between the true probability of each possible answer to each possible question and the estimates the network makes given the words seen so far
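To pin down the probabilistic claim, a minimal formulation (the notation here is mine, not from the slides): for each semantic feature $i$ with observed value $t_i \in \{0,1\}$ in the current event, and network output $a_i$ given the words seen so far, training minimizes the cross-entropy

$$J = -\sum_i \big[\, t_i \log a_i + (1 - t_i) \log (1 - a_i) \,\big].$$

Averaged over the training environment, $J$ is minimized when $a_i = P(t_i = 1 \mid \text{words seen so far})$; this is the sense in which the activation state comes to represent subjective conditional probabilities of semantic features (cf. Rumelhart et al., 1995, cited below).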
Simplified implemented model • Uses a simple generative model of events and sentences that describe them • Uses a simplified set of queries to constrain learning in the model (thematic roles) • Other versions of the query model are possible including • Accounting for information derived from other sources about an event or situation • Questions we might be responsible for answering based on expectations of others
Model architecture • Update network: Input (74) → Hidden 1 (100) → Sentence Gestalt (100) • Query network: Sentence Gestalt (100) + Probe (176) → Hidden 2 (100) → Output (176), compared against Target (176) • Sentences: "The man plays chess." • Events (role–filler pairs): agent: man, action: play, patient: chess
Learning • Model is probed for all aspects of meaning of the event after every word • The language learner observes an event and hears a sentence about it – learning is based on comparing the comprehension output with the event • Anticipation of sentence meaning • Minimum of cross-entropy error: activation of each feature unit corresponds to the conditional probability of that feature in that situation (Rumelhart et al., 1995) • In an ideally trained model, the change in activation at the SG layer induced by each incoming word would support accurate update of the probabilities of the semantic features 'cued' by that word
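As a concrete illustration, here is a minimal PyTorch sketch of the update and query networks and of the probe-everything-after-every-word training signal. Only the layer sizes (74/100/100/176) come from the slides; the class names, sigmoid nonlinearities, and training-loop details are illustrative assumptions, not the original implementation.

```python
import torch
import torch.nn as nn

class SentenceGestalt(nn.Module):
    def __init__(self, n_words=74, n_sg=100, n_probe=176, n_hidden=100):
        super().__init__()
        # Update network: combines the current word with the previous
        # Sentence Gestalt to produce a new Sentence Gestalt.
        self.update = nn.Sequential(
            nn.Linear(n_words + n_sg, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_sg), nn.Sigmoid(),
        )
        # Query network: combines the Sentence Gestalt with a probe
        # (e.g., "Agent?") to produce activations over role fillers.
        self.query = nn.Sequential(
            nn.Linear(n_sg + n_probe, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_probe), nn.Sigmoid(),
        )

    def step(self, word, sg):
        """Update the Sentence Gestalt with one incoming word."""
        return self.update(torch.cat([word, sg], dim=-1))

    def answer(self, sg, probe):
        """Answer a probe question given the current Sentence Gestalt."""
        return self.query(torch.cat([sg, probe], dim=-1))

def train_step(model, words, probes, targets, optimizer):
    """After every word, probe all aspects of the observed event and
    accumulate cross-entropy loss; its minimum drives each feature
    activation toward that feature's conditional probability."""
    sg = torch.zeros(1, 100)              # empty Sentence Gestalt (n_sg = 100)
    loss = torch.tensor(0.0)
    for word in words:                    # word-by-word update
        sg = model.step(word, sg)
        for probe, target in zip(probes, targets):
            out = model.answer(sg, probe)
            loss = loss + nn.functional.binary_cross_entropy(out, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```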
[Animation frames: the model processes "The man plays chess." word by word. After "The man", the query network is probed with Agent?, Action?, and Patient?; after "plays", with Action? and Patient?; after "chess", with Patient?. Each probe is combined with the current Sentence Gestalt and passed through Hidden 2 to the Output layer; a code sketch of this loop follows below.]
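A sketch of that probe loop, reusing the SentenceGestalt class from the sketch above; the toy vocabulary and one-hot encoders here are hypothetical stand-ins for the model's actual input and probe encodings.

```python
import torch  # SentenceGestalt is defined in the earlier sketch

WORDS  = ["man", "plays", "chess"]        # toy vocabulary (hypothetical)
PROBES = ["agent", "action", "patient"]   # thematic-role probes

def one_hot(index, size):
    v = torch.zeros(1, size)
    v[0, index] = 1.0
    return v

model = SentenceGestalt()
sg = torch.zeros(1, 100)                  # start with an empty Sentence Gestalt
probes_per_word = {"man":   ["agent", "action", "patient"],
                   "plays": ["action", "patient"],
                   "chess": ["patient"]}
for w in WORDS:
    sg = model.step(one_hot(WORDS.index(w), 74), sg)            # update network
    for q in probes_per_word[w]:
        out = model.answer(sg, one_hot(PROBES.index(q), 176))   # query network
        # Early words leave the answers hedged across plausible fillers;
        # each new word sharpens them (e.g., "chess" pins down the patient).
        print(f"after '{w}', {q}? ->", out.argmax().item())
```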
St J & McC Corpus and Results • Sentences could be active or passive; constituents could be vaguely identified, or left out if strongly implied • The model could use word order and meaning as well as syntactic markers, capturing event constraints and using context to disambiguate • Assessment of each participant depended on all words in the sentence (next slide)
Changing interpretations of role fillers as a sentence unfolds
Limitations and Alternatives • The query language appears to build in a commitment to structure • I see this as a limitation – queries of all kinds, posed in many different kinds of ways, are likely to be the source of teaching information for real human learners • Machine translation might seem to offer one solution to this problem, but may not really require sufficient attention to meaning. • Other approaches are definitely being explored, including various kinds of question-answering systems. • What approaches do you think might be interesting?
Which is the original, and which is the result of E –> J –> E translation? • Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of a leopard. No one has ever explained what the leopard wanted at that altitude. • Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude. • I added two articles to the translation.
Ideas in the GNMT systems • Sequence-to-Sequence model • Blissfully uncommitted to any structure whatsoever • Attention and Bi-directionality • Is some structure sneaking in? • Words seem to have a special status, but in written text, words do appear to have external reality • They still pose problems, however
Sequence to Sequence Model of Sutskever, Vinyals & Le • A Sentence Gestalt-like representation • Some details: four stacked LSTMs; different LSTMs on the encoding and decoding side
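A minimal PyTorch sketch of that encoder-decoder design (sizes and names are illustrative, not those of the paper): the encoder's final state is the single fixed-size vector, the "sentence gestalt", that conditions the entire translation.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        # Different LSTMs on the encoding and decoding side,
        # four stacked layers each, as in the paper.
        self.encoder = nn.LSTM(d_model, d_model, num_layers=layers,
                               batch_first=True)
        self.decoder = nn.LSTM(d_model, d_model, num_layers=layers,
                               batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # The encoder's final (h, c) is the fixed-size summary of the
        # source sentence; it seeds the decoder's initial state.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)   # logits over the target vocabulary
```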
The Wu et al. GNMT model: Attention, Bi-directionality, and Skip-connections
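The attention idea in one function: instead of compressing the source into a single vector, the decoder re-weights all encoder states at every output step. The dot-product score below is a simplification for brevity; GNMT itself uses a learned feed-forward scoring network.

```python
import torch

def attend(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (T, d) -> context: (d,)"""
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = torch.softmax(scores, dim=0)    # attention distribution over source
    return weights @ encoder_states           # weighted sum = context vector
```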
A Success and a Failure • Does well with this kind of case: • The hen that chased the dog was too shaggy • La gallina que perseguía al perro era demasiado peluda • But not this kind: • The trombone did not fit in the suitcase because it was too small • El trombón no cabía en la maleta porque era demasiado pequeño • This seems to indicate it is not fully processing the semantics – meaning is necessary to determine the correct referent here: "it was too small" must refer to the suitcase (feminine "pequeña"), but the masculine "pequeño" agrees with the trombone (but more work is needed…) • The best system may ultimately have to be responsible for understanding the meaning of the sentence, rather than just producing translations.
The N400 component of the ERP What is the N400? • “A temporally delimited electrical snapshot of the intersection of a feedforward flow of stimulus-driven activity with a state of the distributed, dynamically active neural landscape that is semantic memory” • Tanya asks: What does this mean???
The N400 component of the ERP Our account: • Change of a representation of meaning that implicitly and probabilistically represents all aspects of meaning of the event described by a sentence
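On this account, a natural model-internal N400 correlate is the amount by which an incoming word changes the Sentence Gestalt. A one-line sketch, assuming sg_prev and sg_next are SG activation vectors from the model sketched earlier (the variable names and the specific measure are my assumptions):

```python
# Semantic update: total absolute change in Sentence Gestalt activation
# produced by the incoming word; a larger change ~ a larger N400.
semantic_update = (sg_next - sg_prev).abs().sum()
```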
The N400 component of the ERP Modulating variables • Semantic violations • Contextual fit • Frequency, … • Meaning processing • Functional basis? • Lexical access? (Lau et al., 2008) • Semantic inhibition? (Debruille, 2007) • Semantic integration? (Baggio & Hagoort, 2011)
[Figure: the model's N400 correlate (semantic update in the Sentence Gestalt) compared across high-probability, low-probability, and semantic-violation continuations.]
Model environment (results based on 10 runs, each trained on 800,000 sentences)