Gradience and Similarity in Sound, Word, Phrase and Meaning Jay McClelland Stanford University
Collaborators • Dave Rumelhart • Mark Seidenberg • Dave Plaut • Karalyn Patterson • Matt Lambon Ralph • Cathy Harris • Gary Lupyan • Lori Holt • Brent Vander Wyk • Joan Bybee
The Compositional View of Language (Fodor and Pylyshyn, 1988) • Linguistic objects may be atoms or more complex structures like molecules. • Molecules consist of combinations of atoms that are consistent with structural rules. • Mappings between form and meaning depend on structure-sensitive rules. • This allows languages to be combinatorial, productive, and systematic. • [ John [ hit [the ball] ] ] • [ [w [ei [t]]] [^d] ] • S → NP VP; VP → V NP; NP → … • word → stem + affix • stem → {syl} + syl′ + {syl} • syl → {onset} + rhyme; rhyme → nuc + {coda} • Subj → Agent • Verb → Action • Obj → Patient • Vi + past → stemi + [^d]
Critique • The number of units present in an expression is not always clear • The number of different categories of units is not at all clear • Real native ‘idiomatic’ language ability involves many subtle patterns not easily captured by rules • There is no generally accepted framework for characterizing how rules work
There is less discreteness in some cases than in others, and more in some domains than in others
Some cases in language where it is hard to decide on the number of units • How many words? • Cut out, cut up, cut over; cut it out? • Barstool, shipmate; another, a whole nother • How many morphemes? • Pretend, prefer, predict, prefabricate • Chocoholic, chicketarian • Strength, length; health, wealth; dearth, filth • How many syllables? • Every, memory, livery; leveling, shoveling; evening… • How many phonemes? • Teach, boy, hint, swiftly, softly • Memory, different • What happened to you?
Cases in which it is unclear how many types of units are needed • Object types: • Species • California redwoods • Butterflies along a mountain range • Types of tomatoes • Restaurants • Japanese • Italian • Seafood • Linguistic types • Word meanings • ball • run • Segment types • fuse, fusion • dirt, dirty (cf. sturdy)
Characterizations of how rules work • Rule or exception (Pinker et al.) • V + past → Stem + /^d/ • go → went; dig → dug; keep → kept; say → said • General and specific rules (Halle, Marantz) • V + past → Stem + /^d/ • if stem ends in ‘eep’: ‘ee’ → ‘eh’ • if stem = say: ‘ay’ → ‘eh’ • Output-oriented approaches • OT: e.g. ‘No Coda’ • Bybee’s output-oriented past tense schemas • A lax vowel followed by a dental, as in hit, cut, bid, waited • ‘ah’ or ‘uh’ followed by a (preferably nasalized) velar, as in sang, flung, dug, …
How do the general and the specific work together? • Past tenses • like → liked but keep → kept • pay → paid but say → said • English spelling→sound mapping • mint, hint, … but pint • save, wave, … but have • Meanings of sentences • John saw a dog • John saw a doctor
Can the contexts of application of the more specific patterns be well defined? • For the past tense • Generally, words with more complex rhymes will be more susceptible to reduction • *VV[S]t, where [S] stands for a stop consonant • Item frequency and number of other similar items both appear to contribute • For spelling to sound • Sources of spelling are lost in history • But item frequency and similar neighbors play important roles • For constructions • Characterization of constraints is generally relatively vague and seems to be a matter of degree • Subj: Human, V: saw, Obj: Professional → ‘paid a visit to’ • John saw an accountant • John saw an architect • The baby saw a doctor • The boy saw a doctor • Perhaps similarity to neighbors plays an important role here as well
Summary • Linguistic objects vary continuously in their degree of compositionality and in their degree of systematicity • While some forms seem highly compositional and some forms seem highly regular/systematic, there is generally a detectable degree of specificity in every familiar form (Goldberg) • Even nonce forms reflect specific effects of specific ‘neighbors’ • It may be useful to adopt the notion that language consists of tokens selected from a specified taxonomy of units and that linguistic mappings are determined by systems of rules… • BUT, an exact characterization is not possible in this framework • Units and rules are meta-linguistic constructs which do not play a role in language processing, language use or language acquisition. • These constructs impede understanding of language change
What will the alternative look like? It will be a system that allows continuous patterns over time (articulatory gestures and auditory waveforms) to generate graded and distributed internal representations that capture linguistic structure and mappings in ways that respect both the continuous and discrete aspects of linguistic structure, without enumeration of units or explicit representation of rules.
Units in Neural Network Models. Many neural network models rely on distributed internal representations in which there is no discrete representation of linguistic units. To date, most of these models have adopted some sort of concession to units in their inputs and outputs. We do this because we have not yet achieved the ability to avoid doing so, not because we believe these units exist.
A Connectionist Model of Word Reading (Plaut, McClelland, Seidenberg & Patterson, 1996) • Task is to learn to map spelling to sound, given spelling-sound pairs from a 3,000-word corpus. • Network learns gradually from frequency-weighted exposure to pairs in the corpus. • For each presentation of each item: • Input units corresponding to the spelling are activated. • Processing occurs through propagation of activation from input units through hidden units to output units, via weighted connections. • Output is compared to the item’s pronunciation. • Small adjustments to connections are made to reduce the difference. [Network diagram: letter units M I N T → hidden units → phoneme units /m/ /I/ /n/ /t/]
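A minimal sketch of the training loop just described, not the original Plaut et al. implementation (which used particular grapheme and phoneme slot encodings and, in later simulations, attractor dynamics). The unit counts, corpus format, and learning rate below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LETTER_UNITS, N_HIDDEN, N_PHONEME_UNITS = 105, 100, 61   # illustrative sizes

W1 = rng.normal(0.0, 0.1, (N_LETTER_UNITS, N_HIDDEN))      # letters -> hidden
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_PHONEME_UNITS))     # hidden -> phonemes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(spelling, pronunciation, lr=0.05):
    """One presentation of one item: activate the letter units, propagate
    activation through the hidden units to the phoneme units, compare the
    output to the target pronunciation, and make small weight adjustments."""
    global W1, W2
    hidden = sigmoid(spelling @ W1)
    output = sigmoid(hidden @ W2)
    error = pronunciation - output
    d_out = error * output * (1.0 - output)            # output-layer gradient
    d_hid = (d_out @ W2.T) * hidden * (1.0 - hidden)   # hidden-layer gradient
    W2 += lr * np.outer(hidden, d_out)
    W1 += lr * np.outer(spelling, d_hid)
    return float((error ** 2).sum())

def train(corpus, epochs=100):
    """Frequency-weighted exposure: items are sampled in proportion to their
    word frequency. Each corpus item is assumed to be a dict with 'spelling',
    'pronunciation', and 'freq' fields (a placeholder format)."""
    freqs = np.array([item["freq"] for item in corpus], dtype=float)
    p = freqs / freqs.sum()
    for _ in range(epochs):
        for idx in rng.choice(len(corpus), size=len(corpus), p=p):
            train_step(corpus[idx]["spelling"], corpus[idx]["pronunciation"])
```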
Aspects of the Connectionist Model • Mapping through hidden units forces the network to use overlapping internal representations. - Allows sensitivity to combinations if necessary - Yet tends to preserve overlap based on similarity • Connections used by different words with shared letters overlap, so what is learned tends to transfer across items.
Processing Regular Items: MINT and MINE • Across the vocabulary, consistent co-occurrence of M with /m/, regardless of other letters, leads to weights linking M to /m/ by way of the hidden units. • The same thing happens with the other consonants here, and with most consonants in other words. • For the vowel I: • If there is a final E, produce /ai/ • Otherwise, produce /I/
Processing an Exception: PINT • Because PINT overlaps with MINT, there’s transfer • Positive for N → /n/ and T → /t/ • Negative for I → /ai/ • Of course P benefits from learning with PINK, PINE, POST, etc. • Knowledge of regular patterns is hard at work in processing this and all other exceptions. • The only special thing the network needs to learn is what to do with the vowel. • Even this will benefit from weights acquired from cases such as MIND, FIND, PINE, etc. [Network diagram: letter units P I N T → hidden units → phoneme units /p/ /ai/ /n/ /t/]
Model captures patterns associated with ‘units’ of different scopes without explicitly representing them. • The model learns basic regular correspondences and generalizes appropriately to nonwords: • mint, rint; seat, reat; rave, mave… • It learns to produce the correct output for all exceptions in the corpus: • pint, bread, have, etc. • It is sensitive to sub-regularities such as special vowels with certain word-final clusters, c-conditioning, final-e conditioning… • sold, nold; book, grook; plead, tread, ?klead • bake, dake; rage, dage / rice, bice • It shows graded sensitivity, modulated by frequency, to item-specific, rhyme-specific, and context-sensitive correspondences. [Figure: error / settling time for high- vs. low-frequency items such as pint, bread, hint, dent.]
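A hypothetical probe of the sketch above, showing how one might test the trained weights on regular words, exceptions, and nonwords; encode_spelling and decode_phonemes are placeholder helpers standing in for whatever slot-based encoding the corpus uses.

```python
def pronounce(spelling_string, encode_spelling, decode_phonemes):
    """Forward pass only: encode a letter string, propagate through the
    hidden units, and decode the resulting phoneme activations."""
    x = encode_spelling(spelling_string)        # vector over letter units
    hidden = sigmoid(x @ W1)
    output = sigmoid(hidden @ W2)
    return decode_phonemes(output)

# e.g. pronounce("mint", ...) and the nonword pronounce("rint", ...) should
# yield the regular vowel, while pronounce("pint", ...) should yield /ai/
# once the exception has been learned from frequency-weighted exposure.
```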
How does it work? • Correspondences of different scopes are represented in the connections between the input units and the output units that depend on them. • Some correspondences, e.g. in the word-initial consonant cluster, are highly compositional, and the model treats them this way. • Others, such as those involving the pronunciation of the vowel, are highly dependent on context, but to a degree that varies with the type of item.
Elman’s Simple Recurrent Network • Finds larger units with coherent internal structure from time series of inputs. • Series are usually discretized at conventional linguistic unit boundaries, but this is just for simplicity. • Uses hidden unit state from processing of previous input as context for next input.
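A minimal forward-pass sketch of the simple recurrent network architecture (training by backpropagation is omitted); the layer sizes and the input encoding are assumptions, not Elman's original settings.

```python
import numpy as np

rng = np.random.default_rng(1)
N_INPUT, N_HIDDEN = 30, 50        # illustrative sizes (e.g. one unit per word)

W_in  = rng.normal(0.0, 0.1, (N_INPUT, N_HIDDEN))
W_ctx = rng.normal(0.0, 0.1, (N_HIDDEN, N_HIDDEN))
W_out = rng.normal(0.0, 0.1, (N_HIDDEN, N_INPUT))

def srn_step(x, context):
    """One time step: the hidden state computed from the current input plus
    the copied-back previous hidden state ('context') is used to predict
    the next element of the sequence."""
    hidden = np.tanh(x @ W_in + context @ W_ctx)
    prediction = hidden @ W_out           # scores over possible next inputs
    return prediction, hidden             # hidden becomes the next context

def run_sequence(sequence):
    """Process a sequence of input vectors, returning a prediction after
    each element."""
    context = np.zeros(N_HIDDEN)
    predictions = []
    for x in sequence:
        pred, context = srn_step(x, context)
        predictions.append(pred)
    return predictions
```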
Elman networks learn syntactic categories from word sequences
N-V Agreement and Verb Successor Prediction [Diagram: network prediction profiles over S, who, Vp, Vs, and N at successive word positions.]
Prediction with an Embedded Clause [Diagram: prediction profiles over S, who, Vp, Vs, and N across a sentence with an embedded relative clause.]
Attractor Neural Networks • Advantages • Discreteness as well as continuity • Captures general and specific in a single network for semantic as well as spelling-sound regularity • General information is learned faster and is more robust to damage, capturing development and learning • Adding context would allow context to shade or select meaning
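A minimal sketch of the attractor idea in general, using a Hopfield-style network rather than any specific published model: continuous settling dynamics nonetheless end up at one of a set of discrete stored patterns. All sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def store_patterns(patterns):
    """Hebbian weights for a set of +/-1 patterns (symmetric, zero diagonal)."""
    n_units = patterns.shape[1]
    W = (patterns.T @ patterns) / n_units
    np.fill_diagonal(W, 0.0)
    return W

def settle(W, state, steps=50, rate=0.2, gain=4.0):
    """Gradually update a continuous state; it drifts toward one of the
    stored patterns, giving a discrete outcome from continuous dynamics."""
    for _ in range(steps):
        target = np.tanh(gain * (state @ W))
        state = (1.0 - rate) * state + rate * target
    return state

# Usage: store two 'word' patterns and settle from a graded blend of them.
patterns = rng.choice([-1.0, 1.0], size=(2, 40))
W = store_patterns(patterns)
start = 0.6 * patterns[0] + 0.4 * patterns[1] + rng.normal(0.0, 0.1, 40)
final = settle(W, start)
print(final @ patterns.T / 40)   # similarity to each stored pattern;
                                 # typically highest for the dominant pattern
```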
Can we do without units on the input and the output? • I think it will be crucial to do so because speech gestures are continuous. • They have attractor-like characteristics but also vary continuously in many ways and as a function of a wide range of factors • It will then be entirely up to the characteristics of the processing system to exhibit the relevant partitioning into units
Keidel’s model that learns to translate from continuous spoken input to articulatory parameters. • The input to the model is a time series of auditory parameters from actual spoken CV syllables. • Output is the identity of the C and the V, but… • It should be possible to translate from auditory input to the continuous articulatory movements that would ‘imitate’ the input. • An important future direction
Units and Rules as Emergents • In all three example models, units and rules are emergent properties that admit of matters of degree. • We can choose to talk about such things as though they have an independent existence for descriptive convenience but they may have no separate mechanistic role in language processing, language learning, language structure, or language change. • Although many models use ‘units’ in their inputs and outputs, the claim is that this is a simplification that actually limits what the model can explain.
Beyond the Phone and the Phoneme • Some additional problems with the notions of phonetic segment. • Model of gradual language change exhibiting pressure to be regular and to be brief.
Just a Few of the Problems with Segments in Phonology • Enumeration of segment types is fraught with problems. • No universal inventory; there are cross-language similarities of segments, but every segment is different in every language (Pierrehumbert, 2001). • When we speak, the articulation of the same “segment” depends on • Phonetic context • Word frequency and familiarity • Degree of compositionality, which in turn depends on frequency • Number of competitors • Many other aspects of context… • Presence/absence of aspects of articulation is a matter of degree. • Nasal ‘segment’, release burst, duration/degree of approximation to closure in l’s, d’s and t’s… • Language change involves a gradual process of reduction/adjustment. Segments disappear gradually, not discretely. What is it halfway through the change? • The approach misses out on some of the global structure of spoken language that needs to be taken into account in any theory of phonology.
A model of language change that produces irregular past tenses (with Gary Lupyan) • Our initial interest focused on quasi-regular exceptions: • Items that add /d/ or /t/ and reduce the vowel: • Did, made, had, said, kept, heard, fled… • Items already ending in /d/ or /t/ that change (usually reduce) the vowel: • hid, slid, sat, read, bled, fought.. • We suggest these items reflect historical change sensitive to: • Pressure to be brief contingent on comprehension • Consistency in mapping between sound and meaning
Two constraints on communication [Diagram: speech links my intended meaning to your understanding of what I said, and your intended meaning to my understanding of what you said.] • The spoken form I produce is constrained: • To allow you to understand • To be as short as possible, given that it is understood.
A simplified version of this was actually explored by Lupyan and McClelland (2003) [Diagram: what I say when I want to communicate a particular message → your understanding of what I said.] • The network has a • Phonological word pattern • Corresponding semantic pattern for the present and past tense forms of 739 verbs • It is trained with the phonological word form as input, and this is used to produce a semantic pattern. • The error at the output layer is back-propagated, allowing a change in the connection weights. • The error is also back-propagated to the input units, and is used to adjust the phonological word pattern. • There is also a pressure on the phonological word form representation to be simpler, depending on how well the utterance was understood (summed error at the output units). • The improved phonological word form is then stored back in the list.
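A minimal sketch of this mechanism under stated assumptions (it is not the Lupyan & McClelland code): comprehension error drives ordinary weight learning, the same error signal is propagated back to the phonological input units, and a brevity pressure shrinks the form in proportion to how well it was understood. Unit counts, learning rates, and the brevity term are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
N_PHON, N_HID, N_SEM = 80, 60, 100        # illustrative unit counts

W_ph  = rng.normal(0.0, 0.1, (N_PHON, N_HID))   # phonology -> hidden
W_sem = rng.normal(0.0, 0.1, (N_HID, N_SEM))    # hidden -> semantics

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def communicate(phon, semantics, lr_w=0.05, lr_form=0.005, brevity=0.01):
    """One exchange: comprehension error drives both weight learning and a
    small revision of the phonological form itself, which is returned so it
    can be stored back in the list of forms."""
    global W_ph, W_sem
    hidden = sigmoid(phon @ W_ph)
    output = sigmoid(hidden @ W_sem)
    error = semantics - output
    d_out = error * output * (1.0 - output)
    d_hid = (d_out @ W_sem.T) * hidden * (1.0 - hidden)
    d_phon = d_hid @ W_ph.T                  # error propagated to the input
    # Weight learning: comprehension improves with exposure.
    W_sem += lr_w * np.outer(hidden, d_out)
    W_ph  += lr_w * np.outer(phon, d_hid)
    # Form change: follow the comprehension gradient, plus a reduction
    # pressure that grows as the utterance is better understood.
    understood = 1.0 / (1.0 + float((error ** 2).sum()))
    new_phon = phon + lr_form * d_phon - brevity * understood * phon
    return np.clip(new_phon, 0.0, 1.0)       # graded phoneme presence
```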
Model Details: L&M Simulation 2a • Semantic patterns • ‘Quasi-componential’ representations of tense plus base word meaning are created, based on including tense information in the feature vectors passed through the encoder network. • The representation of past tense varies somewhat from word to word. • Phonological patterns have one unit per phoneme but long vowels or diphthongs have an extra unit, plus a unit for the syllabic ‘ed’. Initialized with binary values (0,1). • Although units still stand for phonemes, presence/absence is a matter of degree. • Learning rate for the representation is slow relative to learning rate for the weights. • 739 monosyllabic verbs, frequency weighted. • Training corpus is fully regularized at the start of the simulation.
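One hypothetical reading of the phonological encoding described above, for illustration only: each phoneme contributes one presence unit, long vowels and diphthongs contribute an extra unit, and a final unit marks syllabic ‘ed’; values start binary but are free to become graded as forms are adjusted during the simulation.

```python
def encode_form(phonemes, long_vowels, has_syllabic_ed=False):
    """phonemes: list of phoneme symbols; long_vowels: set of symbols that
    take an extra unit. Returns a list of (label, activation) pairs whose
    activations start at 0/1 but may become graded over the simulation."""
    units = []
    for ph in phonemes:
        units.append((ph, 1.0))                  # presence of this phoneme
        if ph in long_vowels:
            units.append((ph + ":long", 1.0))    # extra unit for length
    units.append(("syllabic_ed", 1.0 if has_syllabic_ed else 0.0))
    return units

# e.g. encode_form(["h", "ai", "d"], long_vowels={"ai"}) for a form like 'hide';
# reduction can then show up as activations falling below 1.0 on some units.
```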
Simulation of Reductive Irregularization Effects • In English, frequent items are less likely to be regular. • Also, items ending in /d/ or /t/ are less likely to be regular. • The same effects emerge in the simulation. • While the past tense is usually one phoneme longer than the present, this is less true for the high-frequency past tense items. • Reduction of high-frequency past tenses affects phonemes other than the word-final /d/ or /t/. • Regularity and a role in the mapping to meaning protect the inflection.
Further Simulations • Simulation 2b showed that when irregulars were present in the training corpus, the network tended to preserve their irregularity. • In ongoing work, an extended model shows a tendency to regularize low-frequency exceptions. • Simulation 2c used a fully componential semantic representation of past tense, resulting in much less tendency to reduce.
Discussion and Future Directions • The work discussed here is a small example of what needs to be accomplished, even for a model of phonology. • Extending the approach to continuous speech input will be a big challenge. • Extending it to full sentences of continuous speech as input and output will be a bigger challenge still. • Neural network approaches are gaining prominence as processing power grows, and these things will be increasingly possible. • It will still be useful to notate specific linguistic units, but machines will not need them to communicate, any more than our minds need them to speak and understand.