From Sequential Structure to Semantic Interpretation: More Connectionist Research on Language Processing • PDP Class Lecture • February 14, 2011
The Simple Recurrent Network • The network is trained on a stream of elements with sequential structure. • At step n, the target for the output is the next element. • The pattern on the hidden units is copied back to the context units. • After learning, the network comes to retain information about preceding elements of the string, allowing expectations to be conditioned by an indefinite window of prior context.
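A minimal numpy sketch of the copy-back architecture just described: predict the next element, then copy the hidden pattern to the context units. The vocabulary, layer sizes, and toy corpus below are invented for illustration and are not Elman's.

```python
import numpy as np

# Minimal Elman-style simple recurrent network (illustrative sketch).
# The context layer is a copy of the previous hidden state; the target
# at each step is the next element of the sequence.

rng = np.random.default_rng(0)
vocab = ["boy", "girl", "chase", "see", "."]   # toy vocabulary (invented)
V, H = len(vocab), 16                          # vocabulary size, hidden units

W_in  = rng.normal(0, 0.1, (H, V))   # input   -> hidden
W_ctx = rng.normal(0, 0.1, (H, H))   # context -> hidden
W_out = rng.normal(0, 0.1, (V, H))   # hidden  -> output

def one_hot(i):
    v = np.zeros(V); v[i] = 1.0
    return v

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# toy training stream (invented): the same short sentence repeated
stream = [vocab.index(w) for w in ("boy chase girl . " * 200).split()]

lr = 0.1
context = np.zeros(H)
for t in range(len(stream) - 1):
    x, target = one_hot(stream[t]), one_hot(stream[t + 1])
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = softmax(W_out @ hidden)

    # backprop (truncated: gradients do not flow back through the copied context)
    d_out = output - target                        # softmax + cross-entropy gradient
    d_hid = (W_out.T @ d_out) * (1 - hidden ** 2)  # tanh derivative
    W_out -= lr * np.outer(d_out, hidden)
    W_in  -= lr * np.outer(d_hid, x)
    W_ctx -= lr * np.outer(d_hid, context)

    context = hidden.copy()                        # copy-back to the context units
```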
Learned and imputed hidden-layer representations (average vectors over all contexts). The 'Zog' representation is derived by averaging vectors obtained by inserting the novel item in place of each occurrence of 'man'.
Components tracking constituents within clauses of different types.
Can we extend the approach to address comprehension? Who did what to whom, etc.?
Some factors in comprehension • Sentence structure and constraints on events are both important: • The boy chased the girl. • The girl chased the boy. • The car was parked by the attendant. • The car was parked by the lamppost. • We ate some food with some friends that we like. • We found a painting in the gallery that was painted by Rembrandt. • The horse raced past the barn... • The horse dragged past the barn… • The cart raced past the barn…
Alternative Approaches • Parsing based approaches: • ‘Syntax proposes, Semantics Disposes’ • Although data were collected that initially seemed to support this, further studies changed the picture (see next slide). • Beam Search and Particle Filtering • Keep several explicit alternatives active at a time; discard alternatives as they become implausible (see the sketch below). • The PDP approach • Use constituents of the sentence as they are encountered to construct a representation of the event described by the sentence directly. • Keep a single distributed representation that implicitly represents a mixture of possibilities.
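A generic sketch of the "keep several explicit alternatives, discard the implausible ones" idea. The expand_fn and score_fn arguments are hypothetical placeholders; in a real system they would come from a particular parser or probabilistic model.

```python
# Generic beam search over partial interpretations (illustrative placeholder).
def beam_search(words, expand_fn, score_fn=None, beam_width=3):
    """Keep the beam_width best partial interpretations after each word.

    expand_fn(interp, word) -> iterable of (new_interp, log_score_delta)
    """
    beam = [([], 0.0)]                      # (partial interpretation, log score)
    for word in words:
        candidates = []
        for interp, score in beam:
            for new_interp, delta in expand_fn(interp, word):
                candidates.append((new_interp, score + delta))
        # discard alternatives as they become implausible: keep only the top few
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam
```

This contrasts with the PDP approach below, which keeps no explicit list of alternatives at all: the mixture of possibilities lives implicitly in a single distributed pattern.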
A Syntactic Parsing Principle: ‘Minimal Attachment’ • The principle predicts that a prepositional phrase following a direct object will be treated as a constituent of the verb phrase of a sentence. • This leads to the prediction that subjects will be slower to read the last word of (b) relative to (a) below: (a) The spy shot the policeman with the revolver. (b) The spy shot the policeman with the binoculars. • Although this seemed to be true for the sentences used, the reverse is true for other sentences: (a) The man read the article in the magazine. (b) The man read the article in the bathtub.
What about the idea that everything depends on the verb? • The spy saw the policeman with binoculars • The spy saw the policeman with a revolver • The bird saw the birdwatcher with binoculars • The bird saw its prey with binoculars • The children collected … • The rain collected …
Additional Aspects of Sentence Comprehension • Context helps us: • Select the correct meaning of ambiguous words • The boy hit the ball with the bat. • Fill in missing information • The boy spread the peanut butter on the bread. • Shade and specify the ‘meaning’ of a particular word • The container held the apples. • The container held the coffee. • The boy kissed someone under the mistletoe. • The baby rolled the ball to her daddy. • The slugger hit the ball out of the park. • John loves Mary. • John loves ice cream. • The pope loves sinners. • The {writer/student/goat} finished the book.
The Role of Situation (Elman, 2009) • The shopper saved… • The lifeguard saved… • There was a big sale at the swimshop. The lifeguard saved… • … was skating … primes ‘arena’ • … had skated … does not
Do words have meanings, or are they clues to meaning? • For a first approximation, the lexicon is the store of words in long-term memory from which the grammar constructs phrases and sentences. • [A lexical entry] lists a small chunk of phonology, a small chunk of syntax, and a small chunk of semantics. • Ray Jackendoff • My approach suggests that comprehension, like perception, should be likened to Hebb's (1949) paleontologist, who uses his beliefs and knowledge about dinosaurs in conjunction with the clues provided by the bone fragments available to construct a full-fledged model of the original. In this case the words spoken and the actions taken by the speaker are likened to the clues of the paleontologist, and the dinosaur, to the meaning conveyed through these clues. • David Rumelhart
The Sentence Gestalt Model • Input consists of sequences of words. • After each word, the net attempts to complete a set of role-filler pairs (can probe with role or filler). • The sentence gestalt is used to constrain completion and serves as context for interpretation of the next constituent. • Rohde (2002) extended this model to allow probes for fillers of roles with respect to particular head words (e.g. verbs) so the model could deal with embedded clauses.
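A schematic forward pass in the spirit of the Sentence Gestalt architecture: each word updates a running gestalt, and the gestalt plus a probe is decoded into a role-filler pair. Layer sizes, weights, and the random stand-in vectors are invented for illustration; this is not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
W, R, F, G, H = 20, 8, 20, 50, 50        # word, role, filler, gestalt, hidden sizes (invented)

W_word   = rng.normal(0, 0.1, (G, W + G))      # (word, previous gestalt) -> new gestalt
W_probe  = rng.normal(0, 0.1, (H, G + R + F))  # (gestalt, probe) -> hidden
W_decode = rng.normal(0, 0.1, (R + F, H))      # hidden -> completed role-filler pair

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def update_gestalt(gestalt, word_vec):
    """Fold the next word into the running sentence gestalt."""
    return sigmoid(W_word @ np.concatenate([word_vec, gestalt]))

def answer_probe(gestalt, probe_vec):
    """Probe with a role (or filler) and decode the completed role-filler pair."""
    hidden = sigmoid(W_probe @ np.concatenate([gestalt, probe_vec]))
    return sigmoid(W_decode @ hidden)

# usage: process a three-word sentence (random stand-ins), then probe for a role
gestalt = np.zeros(G)
for _ in range(3):
    gestalt = update_gestalt(gestalt, rng.random(W))
pair = answer_probe(gestalt, rng.random(R + F))
```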
A probabilistic formulation of back propagation • Think of the activation of a unit as representing the network’s estimate of the probability that the unit should be on in the given context. • We can measure the degree to which the observed target values match their predicted values using a measure called ‘Cross-Entropy’: CE_p = −Σ_i [ t_ip log(a_ip) + (1 − t_ip) log(1 − a_ip) ] • If targets are actually probabilistic, minimizing CE_p maximizes the probability of the observed target values. • The minimum value of the CE will occur when the activations match the target probabilities. (SSE also has the same minimum, but lacks the explicit probabilistic interpretation.) • [CE has the practical advantage of eliminating the ‘pinned output unit’ problem.]
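A quick numeric check of the claim that the minimum of CE occurs when the activations match the target probabilities. The target probabilities below are made-up toy numbers.

```python
import numpy as np

def cross_entropy(targets, activations, eps=1e-12):
    """CE_p = -sum_i [ t_i log(a_i) + (1 - t_i) log(1 - a_i) ]"""
    a = np.clip(activations, eps, 1 - eps)
    return -np.sum(targets * np.log(a) + (1 - targets) * np.log(1 - a))

target_probs = np.array([0.9, 0.2, 0.5])   # probabilistic targets (toy values)

# With probabilistic targets, this quantity is the expected cross-entropy over
# stochastic 0/1 targets; it is smallest when activations equal the probabilities.
print(cross_entropy(target_probs, target_probs))              # minimum
print(cross_entropy(target_probs, np.array([0.5, 0.5, 0.5]))) # larger
print(cross_entropy(target_probs, np.array([0.2, 0.9, 0.5]))) # larger still
```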
Sentences can be active or passive; constituents can be vaguely identified or may be left out if strongly implied.
Changing interpretations of role fillers as a sentence unfolds
St. John’s (1992) Story Gestalt Model • Learns from stereotyped multi-proposition stories with slots and fillers • Can answer specific questions, fill in missing propositions based on typical properties of scripts, etc.
Limitations • Can only deal with ‘one level’ event structures • Cannot handle embeddings or modifiers of constituents, as in • ‘The policeman saw that the young girl was bitten by the mean dog.’ • Two follow-on approaches • Use fuller probes for completions of embedded propositions (Bryant and Miikkulainen, 2001) • Use a recursively constructed compressed representation of the semantics of the sentence (Rohde, 2002).
Hierarchical Compressed Representation of a Moderately Complex Sentence
Compressed Decodable Representation of a Head-relation-filler Triple
Rohde’s (2002) Model • Used a common representation constrained by three-role propositions and sentences. • Did prediction and production as well as comprehension.
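A generic sketch of the idea of a compressed, decodable representation of a head-relation-filler triple: an autoencoder-style bottleneck whose code has the same width as a constituent, so a compressed clause can itself fill a slot in a higher-level triple. The sizes and weights are invented for illustration and this is not Rohde's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 30                                    # width of each constituent vector and of the code

W_enc = rng.normal(0, 0.1, (D, 3 * D))    # (head, relation, filler) -> code
W_dec = rng.normal(0, 0.1, (3 * D, D))    # code -> reconstructed triple

def encode(head, relation, filler):
    """Compress a head-relation-filler triple into a single fixed-width code."""
    return np.tanh(W_enc @ np.concatenate([head, relation, filler]))

def decode(code):
    """Unpack a code back into (approximate) head, relation, filler vectors."""
    out = np.tanh(W_dec @ code)
    return out[:D], out[D:2 * D], out[2 * D:]

# Because the code is the same width as a constituent, a compressed clause can
# fill a slot in a higher-level triple, giving recursion for embedded clauses.
# (Weights here are untrained; a real model would train encode/decode so that
# decode(encode(...)) reconstructs its inputs.)
inner = encode(rng.random(D), rng.random(D), rng.random(D))
outer = encode(rng.random(D), rng.random(D), inner)
```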
One complaint remains • The model presupposes propositional representations of events … that does not seem right. • Can we get rid of the stipulation of structure and query the gestalt with an English question? • Can we create a target for learning based on an actual scene representation rather than a propositional representation?
Schematic of a Future Model