Morphological Analysis Chapter 3
Morphology • Morpheme = "minimal meaning-bearing unit in a language" • Morphology handles the formation of words by using morphemes • base form (stem, lemma), e.g., believe • affixes (suffixes, prefixes, infixes), e.g., un-, -able, -ly • Morphological parsing = the task of recognizing the morphemes inside a word • e.g., hands, foxes, children • Important for many tasks • machine translation, information retrieval, parsing, text simplification, etc.
Morphemes and Words • Combine morphemes to create words • Inflection • combination of a word stem with a grammatical morpheme • yields the same word class, e.g., clean (verb), clean-ing (verb) • Derivation • combination of a word stem with a grammatical morpheme • yields a different word class, e.g., delight (verb), delight-ful (adj) • Compounding • combination of multiple word stems • Cliticization • combination of a word stem with a clitic • combines words from different syntactic categories, e.g., I’ve = I + have
Inflectional Morphology • Inflectional Morphology • word stem + grammatical morpheme, e.g., cat + s • only for nouns, verbs, and some adjectives • Nouns • plural: • regular: +s, +es • irregular: mouse - mice; ox - oxen • many spelling rules, e.g., -y -> -ies, like butterfly - butterflies • possessive: +'s, +' • Verbs • main verbs (sleep, eat, walk) • modal verbs (can, will, should) • primary verbs (be, have, do)
Inflectional Morphology (verbs) • Verb inflections for: • main verbs (sleep, eat, walk); primary verbs (be, have, do)
Morphological form      Regularly inflected form
stem                    walk      merge     try       map
-s form                 walks     merges    tries     maps
-ing participle         walking   merging   trying    mapping
past; -ed participle    walked    merged    tried     mapped
Morphological form      Irregularly inflected form
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut
Inflectional Morphology (nouns) • Noun inflections for: • regular nouns (cat, hand); irregular nouns (child, ox)
Morphological form      Regularly inflected form
stem                    cat       hand
plural form             cats      hands
Morphological form      Irregularly inflected form
stem                    child     ox
plural form             children  oxen
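The noun patterns above can be sketched as a small generator: look up irregular plurals first, then fall back to spelling rules. This is only an illustrative sketch. The name `pluralize`, the `IRREGULAR_PLURALS` table, and the exact rule conditions (consonant + y, and -es after s, x, z, ch, sh) are assumptions added here and are far from a complete treatment of English.

```python
# Hypothetical irregular table; entries are the examples used in the slides.
IRREGULAR_PLURALS = {"child": "children", "ox": "oxen", "mouse": "mice"}

def pluralize(stem):
    """Return a plural form for a noun stem (toy rules, not exhaustive)."""
    # Irregular forms must be listed explicitly.
    if stem in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[stem]
    # Spelling rule: consonant + y -> ies (butterfly -> butterflies).
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"
    # Spelling rule: add -es after sibilant-like endings (fox -> foxes).
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    # Default regular plural.
    return stem + "s"

for noun in ["cat", "hand", "child", "ox", "butterfly", "fox"]:
    print(noun, "->", pluralize(noun))
```

The order of the checks matters: the irregular lookup has to come first, otherwise "ox" would be caught by the -es rule.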
Inflectional and Derivational Morphology (adjectives) • Adjective inflections and derivations:
prefix un-            unhappy             adjective, negation
suffix -ly            happily             adverb, manner
suffix -ier, -iest    happier, happiest   comparative, superlative
suffix -ness          happiness           noun
• plus combinations, like unhappiest, unhappiness • Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g., no negation for big.
Morphology and FSAs • We’d like to use the machinery provided by FSAs to capture these facts about morphology • Recognition: • Accept strings that are in the language • Reject strings that are not • In a way that doesn’t require us to, in effect, list all the words in the language
Computational Lexicons • Depending on the purpose, computational lexicons have various types of information • Between FrameNet and WordNet, we saw POS, word sense, subcategorization, semantic roles, and lexical semantic relations • For our purposes now, we care about stems, irregular forms, and information about affixes
Starting Simply • Let’s start simply: • Regular singular nouns listed explicitly in lexicon • Regular plural nouns have an -s on the end • Irregulars listed explicitly too
Now Plug in the Words • Recognition of valid words • But “foxs” isn’t right; we’ll see how to fix that
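A letter-level recognizer along these lines can be sketched as follows. The slide's figure is not available here, so the state numbering, the function names `build_fsa` and `accepts`, and the toy word lists are all assumptions. The point of the sketch is that a bare "stem + s" arc accepts "foxs" and rejects "foxes", which is exactly the problem noted above.

```python
def build_fsa(regular, irregular_sg, irregular_pl):
    """Build a letter-level FSA as (transitions, accepting_states)."""
    transitions = {}   # (state, letter) -> next state; state 0 is the start
    accepting = set()

    def add_path(state, letters):
        # Follow existing arcs, creating a new state for each missing arc.
        for letter in letters:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = len(transitions) + 1
            state = transitions[key]
        return state

    for word in regular:
        stem_end = add_path(0, word)
        accepting.add(stem_end)                  # regular singular
        accepting.add(add_path(stem_end, "s"))   # regular plural: stem + s
    for word in list(irregular_sg) + list(irregular_pl):
        accepting.add(add_path(0, word))         # irregulars listed explicitly
    return transitions, accepting

def accepts(fsa, word):
    """Run the word through the machine one letter at a time."""
    transitions, accepting = fsa
    state = 0
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:
            return False
    return state in accepting

# Hypothetical toy lexicon.
FSA = build_fsa({"cat", "fox", "hand"}, {"goose", "mouse"}, {"geese", "mice"})
for w in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, accepts(FSA, w))
```

Because the words are stored as shared letter paths rather than consulted as a flat list at run time, recognition is a single left-to-right walk through the machine.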
Parsing/Generation vs. Recognition • We can now run strings through these machines to recognize strings in the language • But recognition is usually not quite what we need • Often if we find some string in the language we might like to assign a structure to it (parsing) • Or we might have some structure and we want to produce a surface form for it (production/generation) • Example • From “cats” to “cat +N +PL”
Finite State Transducers • Add another tape • Add extra symbols to the transitions • On one tape we read “cats”, on the other we write “cat +N +PL”
Applications • The kind of parsing we’re talking about is normally called morphological analysis • It can either be • An important stand-alone component of many applications (spelling correction, information retrieval) • Or simply a link in a chain of further linguistic analysis
Transitions • c:c a:a t:t +N:ε +PL:s • c:c means read a c on one tape and write a c on the other • +N:ε means read a +N symbol on one tape and write nothing on the other • +PL:s means read +PL and write an s
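The transition labels above can be written down directly as a table of arcs. This is a minimal sketch for the single word "cat", reading the lexical tape and writing the surface tape. The state numbers, the `generate` function, and the `+SG` arc are assumptions added for illustration and do not come from the slides.

```python
EPS = ""  # epsilon: write nothing on the output tape

# (state, lexical symbol) -> (surface symbol, next state)
ARCS = {
    (0, "c"): ("c", 1),      # c:c
    (1, "a"): ("a", 2),      # a:a
    (2, "t"): ("t", 3),      # t:t
    (3, "+N"): (EPS, 4),     # +N:ε
    (4, "+PL"): ("s", 5),    # +PL:s
    (4, "+SG"): (EPS, 5),    # hypothetical singular arc, not on the slide
}
FINAL = {5}

def generate(lexical_symbols):
    """Map a lexical symbol sequence to a surface string, or None if rejected."""
    state, output = 0, []
    for symbol in lexical_symbols:
        if (state, symbol) not in ARCS:
            return None
        written, state = ARCS[(state, symbol)]
        output.append(written)
    return "".join(output) if state in FINAL else None

print(generate(["c", "a", "t", "+N", "+PL"]))
print(generate(["c", "a", "t", "+N", "+SG"]))
```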
Ambiguity • Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state • It didn’t matter which path was actually traversed • In FSTs the path to an accept state does matter, since different paths represent different parses and different outputs will result
Ambiguity • What’s the right parse (segmentation) for • Unionizable • Union-ize-able • Un-ion-ize-able • Each represents a valid path through the morphology machine.
Ambiguity • There are a number of ways to deal with this problem • Simply take the first output found • Find all the possible outputs (all paths) and return them all (without choosing) • Bias the search so that only one or a few likely paths are explored
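The "find all outputs" strategy can be sketched with a simple exhaustive segmenter. The morph inventory below is a made-up toy (including "iz" as the spelling of -ize before a vowel), and the segmenter ignores ordering constraints between morphemes, so it is a sketch of the ambiguity only, not of a real morphology machine. The final "bias" step (prefer the parse with fewest morphemes) is likewise an assumed heuristic, not one prescribed by the slides.

```python
# Hypothetical inventory: surface spelling -> lexical morpheme.
MORPHS = {
    "un": "un-",
    "union": "union",
    "ion": "ion",
    "iz": "-ize",     # -ize loses its e before -able
    "able": "-able",
}

def segmentations(surface):
    """Return every way to split the string into known morphs."""
    if not surface:
        return [[]]
    results = []
    for i in range(1, len(surface) + 1):
        piece = surface[:i]
        if piece in MORPHS:
            for rest in segmentations(surface[i:]):
                results.append([MORPHS[piece]] + rest)
    return results

all_parses = segmentations("unionizable")   # return every path
first_parse = all_parses[0]                 # simply take the first one found
preferred = min(all_parses, key=len)        # assumed bias: fewest morphemes

for parse in all_parses:
    print(" + ".join(parse))
```

With this toy inventory the search finds both Un-ion-ize-able and Union-ize-able, and the assumed bias selects the latter.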
The Gory Details • Of course, it’s not as easy as • “cat +N +PL” <-> “cats” • As we saw earlier there are geese, mice and oxen • But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes • Fox and Foxes vs. Cat and Cats
Multi-Tape Machines • To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next • So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
Multi-Level Tape Machines • We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
Intermediate to Surface • The “add an e” rule, as in fox^s# --> foxes#
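The e-insertion rule can be approximated with a regular-expression rewrite over the intermediate tape, where ^ marks a morpheme boundary and # marks the end of the word. The slide only shows the fox case; the set of triggering endings used here (x, s, z, ch, sh) and the function name `intermediate_to_surface` are assumptions for this sketch. A real system would compile the rule into a transducer instead of using `re.sub`.

```python
import re

def intermediate_to_surface(form):
    """Apply e-insertion, then erase any remaining morpheme boundaries."""
    # Insert e in place of ^ when it follows x, s, z, ch, or sh
    # and is directly followed by the plural s at the word end.
    form = re.sub(r"(x|s|z|ch|sh)\^(?=s#)", r"\1e", form)
    # Boundaries that did not trigger the rule are simply deleted.
    return form.replace("^", "")

for form in ["fox^s#", "cat^s#", "church^s#"]:
    print(form, "-->", intermediate_to_surface(form))
```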
Foxes • [Figure note: this arrow should point straight down]
Notes • The transducers may be run in the other direction too (examples in lecture) • The transducers are cascaded: The output of one layer serves as the input to the next
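The cascade can be sketched as two functions composed in sequence: lexical to intermediate, then intermediate to surface. All names here (`STEMS`, `lexical_to_intermediate`, `apply_spelling_rules`, `generate`, `parse`) and the `+SG` tag are assumptions. The reverse direction is emulated by generate-and-test over a toy lexicon, whereas a true FST would simply be run with its tapes swapped.

```python
import re

STEMS = {"cat", "fox"}                 # hypothetical toy lexicon
TAGS = ("+N +SG", "+N +PL")            # +SG is an assumed tag

def lexical_to_intermediate(lexical):
    """'fox +N +PL' -> 'fox^s#'; returns None for unknown input."""
    stem, _, tags = lexical.partition(" ")
    if stem not in STEMS or tags not in TAGS:
        return None
    return stem + ("^s#" if tags == "+N +PL" else "#")

def apply_spelling_rules(intermediate):
    """'fox^s#' -> 'foxes': e-insertion, then drop boundary symbols."""
    form = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    return form.replace("^", "").replace("#", "")

def generate(lexical):
    """Cascade: the output of the first layer is the input to the second."""
    intermediate = lexical_to_intermediate(lexical)
    return None if intermediate is None else apply_spelling_rules(intermediate)

def parse(surface):
    """Other direction, emulated by testing every candidate analysis."""
    candidates = [f"{stem} {tags}" for stem in sorted(STEMS) for tags in TAGS]
    return [c for c in candidates if generate(c) == surface]

print(generate("fox +N +PL"))
print(parse("foxes"))
print(parse("foxs"))
```

Unlike the plain recognizer with a bare "stem + s" arc, this cascade produces "foxes" and finds no analysis for "foxs".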
Overall Scheme • We aren’t covering the overall scheme in any more detail than this