Memory-Based Models of Inflectional Morphology Acquisition and Processing

Memory-Based Models of inflectional morphology acquisition and processing Walter Daelemans (with Emmanuel Keuleers, CPL) CNTS, Dept. of Linguistics University of Antwerp walter.daelemans@ua.ac.be

Contents • Old and recent results using Memory-Based Learning in inflectional morphology (Pinker: fruit fly of psycholinguistics) • Memory-based learning with TiMBL • Learning is storage • Cases • (English and Dutch past tense) • More interesting • German plural • Dutch plural • Computational psycholinguistics methodology

Nature of the human language processing architecture • How is linguistic knowledge represented? • Words and Rules; mental lexicon and grammar • But: • Fuzziness, leakage • Semi-regularity, irregularity • Similarity-Based Reasoning • How is linguistic knowledge acquired? • Innate rules / constraints • Storage of (patterns of) observable linguistic items • Words, syllables, segments, …

Experience Machine Learning BIAS Learning Component Search Rj Ri Rk Output Input Rl Performance Component

Supervised Learners • Learning is approximating an underlying function f such that • y = f (x) or finding • MAXi P(yi|x) = MAXi P(y) * P(x|yi) • Discriminative: • Directly estimate parameters of P(y|x) • BUT: No estimation of underlying distributions • Memory-Based Learning: no assumptions about underlying P(y|x)distributions; non-parametric, local • Generative: • Estimate parameters of P(y) and P(x|y) • Probability distribution functions • BUT: Conditional independence assumptions

Eager vs Lazy Learning • Eager: learning is compression • Minimal Description Length principle • Ockham’s razor • Minimize size of abstracted model (core) plus length of list of exceptions not covered by model (periphery) • Lazy: learning is storage of exemplars + analogy • In language, what is core and what is periphery? • Small disjuncts, pockets of exceptions, polymorphism, … • Zipfian distributions • “Forgetting exceptions is harmful in language learning”

Visualizing disjunctivity

English Past Tense: Words + Rules • Mental Rule(s) + Memory • The past tense of to plit is plitted • VERB + ed (Default Rule), independent from non-categorical features • Explains generalization and default behavior • Is the past tense of to spling, splinged, splang, or splung (Bybee & Moder, 1983; Prasada & Pinker, 1993) • Race between Rule and Words (Associative Memory)

Prasada & Pinker Data • Rate on a scale from 1 to 7 • Today, I spling, yesterday I splinged • Today, I spling, yesterday I splung • Today, I plip, yesterday I plipped • Today, I plip, yesterday I plup

Memory-Based Alternative(Keuleers et al. forthcoming) • Features • Phonological segmental information last two syllables (onset, nucleus, coda) • = = = spl I N • Right alignment • Classes • Automatically derived from Levenshtein distance between root and past tense form • +Id, +tId, I->A • Rating ~ class distribution nearest neighbors • Simulation of results by Prasada and Pinker (1993) & Albright & Hayes (islands of regularity) data.

Prasada and Pinker Simulation

Replication Study in Dutch

Memory-based learning and classification • Learning: • Store instances in memory • Classification: • Given new test instance X, • Compare it to all memory instances • Compute similarity between X and memory instance Y • Update the top k of closest instances (nearest neighbors) • When done, extrapolate from the knearest neighbors to the class of X

Similarity / distance • Similarity determined by • Feature relevance (Gain Ratio, Quinlan) • Value similarity (mvdm, Stanfill & Waltz) • Exemplar relevance • Number of and distance to the nearest neighbors • Heterogeneity and density of the NN

The MVDM distance function • Estimate a numeric “distance” between pairs of values • “e” is more like “i” than like “p” in a phonetic task • “book” is more like “document” than like “the” in a parsing task • “NNP” is more like “NN” than like VBD in a part of speech tagging task • An instance of unsupervised learning in the context of a supervised learning task

Distance weighted class voting • Increasing the value of k is similar to smoothing • Subtler measurement of local neighborhood: making more distant neighbors count less in the class vote • Linear inverse of distance (with respect to max) • Inverse of distance • Exponential decay

Exemplar weighting • Scale the distance of a memory instance by some externally computed factor • Class prediction strength • Frequency • Typicality • Smaller distance for “good” instances • Bigger distance for “bad” instances

TiMBLhttp://ilk.uvt.nl/timbl • Tilburg Memory-Based Learner 5.2 • Version 6.0 (soon) will be open source • Available for research and education • Lazy learning, extending k-NN and IB1 • Optimized search for NN • Internal structure: tree, not flat instance base • Tree ordered by chosen feature weight • Many built-in optional metrics: feature weights, distance function, distance weights, exemplar weights, … • Server-client architecture

Back to Inflectional Morphology: German Plural • Notoriously complex but routinely acquired (at age 5) • Evidence for Words + Rules (Dual Mechanism)? -s suffix is default/regular (novel words, surnames, acronyms, …) -s suffix is infrequent (least frequent of the five most important suffixes) Vast majority of plurals should be handled in the words route according to W+R!

The default status of -s • Similar item missing Fnöhk-s • Surname, product name Mann-s • Borrowings Kiosk-s • Acronyms BMW-s • Lexicalized phrases Vergissmeinnicht-s • Onomatopoeia, truncated roots, derived nouns, ...

Data & Representation • Symbolic features • segmental information (syllable structure 2 last syllables, right alignment) • gender • ~25,000 nouns from CELEX

Clustering on MVDM materices

Acquisition Data:Summary of previous studies • Existing nouns: (Park 78; Veit 86; Mills 86; Schamer-Wolles 88; Clahsen et al. 93; Sedlak et al. 98) • Children mainly overapply -e or -(e)n • -s plurals are learned late • Novel words: (Mugdan 77; MacWhinney 78; Phillis & Bouma 80; Schöler & Kany 89) • Children inflect novel words with -e or -(e)n • More “irregular” plural forms produced than “defaults”

MBL simulation • model overapplies mainly -en and -e • -e and -en acquired fastest • -s is learned late and imperfectly • Mainly but not completely parallel to input frequency (more -s overgeneralization than -er generalization)

Bartke, Marcus, Clahsen (1995) • 37 children age 3.6 to 6.6 • pictures of imaginary things, presented as neologisms • names or roots • rhymes of existing words or not • choice -en or -s • results: • children are aware that unusual sounding words require the default • children are aware that names require the default

MBL simulation • sort CELEX data according to rhyme • compare overgeneralization • to -en versus to -s • percentage of total number of errors • results: • when new words don’t rhyme more errors are made • overgeneralization to -en drops below the level of overgeneralization to -s

Discussion • Three “classes” of plurals: ((-en -)(-e -er))(s) the former 4 suffixes seem “regular”, can be accurately learned using information from phonology and gender -s is learned reasonably well but information is lacking • Hypothesis: more “features” are needed (syntactic, semantic, meta-linguistic, …) to enrich the “lexical similarity space” • No difference in accuracy and speed of learning with and without Umlaut • Overall generalization accuracy very high: 95% • Implicitly implements schema-based learning (Köpcke). *,*,*,*,i,r,M e

The Dutch plural • Two regular suffixes: -en and -s, in complementary distribution. • Some other infrequent inflectional processes • -eren (kind-eren, rund-eren) • latin (museum - musea), greek (corpus - corpora) • suppletion (brandweerman - brandweerlieden)

What is the default process in Dutch plural inflection? • The Dutch plural is problematic for the dual mechanism model • Looking at criteria for determining which inflectional process is the default, no single default can be determined • No overregularization in acquisition

The Dual-Mechanism Model for Dutch Plural Inflection

Inflecting non-canonical roots • Keuleers, Sandra, Daelemans, Gillis, Durieux, Martens (2007) Cognitive Psychology 878, 283-318. • The inflection of atypical words is only problematic for unassimilated borrowings in the Dutch plural • Adding orthographical information to similarity space • Allows MBL single mechanism model to learn these cases • Has influence on participant’s behavior on non-word inflection • Which is again modeled by the MBL single mechanism model

Methodology: what is a good model? • Can we say anything useful about the nature of psycholinguistic processes if we find that a model simulation with a particular combination of parameter values fits our data well? • We need robustness • For different parameter settings • On different tasks / empirical datasets

MBL model of Dutch plural inflection • 23040 simulations on each of three tasks • 1 Lexical reconstruction task: cross-validation on CELEX (919/18384). • Compare to attested forms • 2 Pseudoword tasks: • Produce plural for 80 pseudowords (Baayen et al., 2002) • Produce plural for 180 pseudowords (Keuleers et al., 2007) • Compare predictions to majority of participants.

MBL model of Dutch plural inflection • Each simulation has a unique combination of information sources • Phonological information: 1,2,3, or 4 syllables • Prosodic information: with or without stress information • Orthographic information: with or without final grapheme • Alignment: onset-nucleus-coda, start-peak-end, or peak-valley • Padding: empty or delta padding

MBL model of Dutch plural inflection • Each simulation has a unique combination of task and algorithm parameter values • Classification task: categorical or transformation labels • Number of distances (k) and distance metric: • k =1,3,5, or 7 with the overlap metric • k = 1,3,5… or 51 with the MVDM metric • Distance weighting method: zero or inverse distance decay • Type merging: neighbors with equal information merged or not merged.

Results • Best models • Holdout task: 97.8 % accuracy • Pseudoword task 1: 100 % accuracy • Pseudoword task 2: 89 % accuracy • Worst models • Holdout task: 48.8 % accuracy • Pseudoword task 1: 47.5 % accuracy • Pseudoword task 2: 51.1 % accuracy

Results • What are the outliers with very low accuracy ?

Results • Apart from one-syllable models, results from MBL appear to be quite robust. • Some small differences • Including stress information does not significantly improve mean accuracy. • Using more than 2 syllables does not improve mean accuracy, except on Baayen’s pseudowords.

Results • Some small differences • Including the final grapheme in the model improves mean accuracy by about 1 %. • hond (/hont/) - honden (/hond@/) • wagen (/wag@/) - wagens (/wag@s/) • Since we know (e.g. Baayen & Ernestus, 2001) that there is a difference in the final sounds of hond (/hont/) vs lont (/lont/) , this may even be justifiable. • Alignment and padding have almost no or very small effect on mean accuracy

Results • Some small differences • Transformation classes work better than categorical labels for pseudoword tasks • For the lexical reconstruction task, it’s the other way around.

Results • Observations on k, distance weighting and metric • Overlap metric performs almost as well as MVDM metric • Inverse Distance decay increases the robustness of the model • Different pattern of results in lexical reconstruction task than in pseudoword tasks

Results • Models with low k (1,3) perform well on the corpus prediction task, but very badly on the pseudoword tasks. • K=1 describes a system that generalizes on the basis of very few exemplars • K=5,7,… means that irregular classes with few types cannot influence generalization • Apparently this is what happens in the experimental tasks

Conclusions • MBL is a robust model for inflectional morphology learning if we use sufficient information to represent exemplars • Large memory size (+18000 items) may have rendered the parameters less important • K is an important parameter with potential psycholinguistic relevance • Simulations should report ranges rather than best simulation • Simulations should address more than 1 task

Additional work with TiMBL • CNTS - CPL • Dutch gender, word stress, plural • English past tense • German plural • David Eddington • Spanish stress and gender • Italian conjugation • Andrea Krott, Baayen & Schreuder • Dutch and German compound linking element • Ingo Plag et al. • English compound stress

Memory-Based Models of Inflectional Morphology Acquisition and Processing

Memory-Based Models of Inflectional Morphology Acquisition and Processing

Presentation Transcript

Memory Models

Models of Memory

The Morphology Syntax INTERFACE in inflectional languages OR Is Czech an inflectional language

A Connectionist model of English inflectional morphology

Models of Memory

PDP Models of Morphology

Acquisition and Processing of LiDAR Data

Morphemes , morpheme classification, inflectional and derivational morphology

Data Acquisition and Processing

Models of Memory

Models of memory

Models of memory

Information Processing and Memory

Acquisition of Morphology by Computer: Unsupervised learning

Paradigms Organize Inflectional Morphology

A Non-Parametric Bayesian Approach to Inflectional Morphology Jason Eisner

Incomplete Acquisition: The Burden of Aspectual Morphology

MODELS OF MEMORY

Memory Processing

Models of Memory

Models of memory

Models of Memory