CSA3050: NL Algorithms Introduction to English Morphology Finite State Transducers CSA3050: NLP Algorithms
Acknowledgement
For further details see Jurafsky & Martin, Ch. 3.
Morphology
• Morphology is the study of how word parts combine to form whole words.
• Several different dimensions:
  • Orthographic: rules for combining strings of characters.
  • Syntactic: effect on syntactic category.
  • Semantic: effect on meaning.
Examples of Morphological Processes
• Affixation
  • prefix
  • suffix
  • circumfix: German ge + stem + t, e.g. sagen, gesagt
  • infix: un-bloody-likely
• Vowel change: swim/swam
• Consonant change: send/sent
Inflectional/Derivational Morphology
• Inflectional
  • +s plural, +ed past
  • category preserving
  • productive: always applies (esp. to new words, e.g. fax)
  • systematic: same semantic effect
• Derivational
  • +ment
  • category changing: escape + ment
  • not completely productive: *detractment
  • not completely systematic: catchment
English Inflectional Morphology
• Applies to nouns, verbs and adjectives only
• Number of inflections relatively small
• Nouns
  • Plural, Possessive
• Verbs
  • Verb forms
• Adjectives
  • Comparison
Noun Inflections
Regular Verb Inflections
Irregular Verb Inflections
Morphological Parsing
Input word: cats → Morphological Parser → Output analysis: cat + PL
• Output is a string of morphemes
• Reversibility?
Morphological Parsing: Examples
Morphemes
• The morpheme is a theoretical construct ...
• but it has a practical use
• Choice of morpheme vocabulary: theoretical and practical motivation
• Distinction between an underlying morpheme and its realisation.
• A string of morphemes could be turned into another representation later
Morphological Parsing Requires
• Lexicon: list of stems and affixes + related information (e.g. syntactic category)
• Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem, not before).
• Correspondences between input and output strings
• Spelling rules: city + s → cities
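The spelling-rule component can be illustrated with a minimal sketch of the city + s → cities correspondence. The function name `attach_plural` and the exact condition (a stem-final y preceded by a consonant) are assumptions made for illustration; the slides only give the single example.

```python
VOWELS = set("aeiou")

def attach_plural(stem: str) -> str:
    """Join a noun stem with the plural morpheme +s, applying one spelling rule."""
    # Assumed rule: stem-final 'y' after a consonant is realised as 'ie' before +s.
    if len(stem) >= 2 and stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"
    # Otherwise the morphemes are simply concatenated.
    return stem + "s"

print(attach_plural("city"))
print(attach_plural("cat"))
```

A full system would state such rules as transducers rather than as string tests, so that they compose with the lexicon; the function above only shows the input/output behaviour of one rule.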
Lexicon
• Lexicon is generally divided into sublexicons
  • Stem Lexicon
    • Noun Stems
    • Verb Stems
    • etc.
  • Suffix Lexicon
  • Prefix Lexicon
• Can all be represented as FSAs
FSA for Sublexicon Fragment (figure; arc labels: o, t, h, e, s, a, e, i, t, s)
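One way to see how a sublexicon becomes an FSA is to build a letter trie: each arc carries one letter and a state is final when a complete word ends there. The sketch below is an illustration under assumptions: the word list is hypothetical (it is not recovered from the figure), and `build_fsa` and `accepts` are names invented here.

```python
def build_fsa(words):
    """Build a trie-shaped FSA. States are integers; 0 is the initial state."""
    transitions = {}   # (state, letter) -> next state
    finals = set()
    next_state = 1
    for word in words:
        state = 0
        for letter in word:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)   # a word of the sublexicon ends here
    return transitions, finals

def accepts(fsa, word):
    """Return True if the FSA reaches a final state after reading the whole word."""
    transitions, finals = fsa
    state = 0
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:   # no arc for this letter
            return False
    return state in finals

# Hypothetical sublexicon fragment.
fsa = build_fsa(["the", "this", "that", "these", "those"])
print(accepts(fsa, "this"), accepts(fsa, "th"))
```

A trie shares prefixes only; a minimised FSA would also share common suffixes, which is what makes the representation compact for large lexicons.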
FSA for Morphotactics for Noun Inflection
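A morphotactic FSA can be sketched with arcs labelled by morpheme classes rather than letters. The states, class names and sample words below are assumptions in the style of the Jurafsky & Martin treatment, not a transcription of the slide's figure: regular nouns may take the plural +s, while irregular singular and plural forms take no further suffix.

```python
# Hypothetical sublexicons, one per morpheme class.
LEXICON = {
    "reg-noun": {"cat", "dog"},
    "irreg-sg-noun": {"goose", "mouse"},
    "irreg-pl-noun": {"geese", "mice"},
    "plural": {"s"},
}

# Morphotactics: arcs are labelled with morpheme classes.
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural"): "q2",
}
FINALS = {"q1", "q2"}

def well_formed(morphemes):
    """Check that a sequence of morphemes respects the ordering constraints."""
    state = "q0"
    for m in morphemes:
        for cls, members in LEXICON.items():
            if m in members and (state, cls) in ARCS:
                state = ARCS[(state, cls)]
                break
        else:
            return False   # no arc allows this morpheme here
    return state in FINALS

print(well_formed(["cat", "s"]), well_formed(["s", "cat"]))
```

The ordering constraint that +s follows the stem is encoded purely by where the `plural` arc sits in the machine.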
Morphotactics for Verb Inflection
Input/Output Correspondences
• Problem: how to specify the correspondence between the input word and the output analysis.
• Given: both input and output are strings.
• Two-level morphology (Koskenniemi 1983) proposes
  • Surface Tape (words)
  • Lexical Tape (concatenation of morphemes)
2 Level Model
The automaton used to perform the mapping between these levels is the finite state transducer (FST).
Basic FS Transducer
• Each transition of a transducer is labelled with a pair of symbols
• Input symbols are matched against the lower-side symbols on transitions.
• If analysis succeeds, return the string of upper-side symbols
(Figure: an arc labelled with an output symbol on the upper side and an input symbol on the lower side.)
Morphological Analysis
Upper side: C A T +N +PL
Lower side: C A T S (with ε opposite an upper-side symbol that has no surface letter)
{ ("CATS", "CAT+N+PL"), ("CAT", "CAT+N+SG") }
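The relation { ("CATS", "CAT+N+PL"), ("CAT", "CAT+N+SG") } can be realised by a small transducer. The sketch below is one possible arc layout, chosen for illustration: the pairing of +N with ε and of +PL with S is an assumption, as are the state numbers and the names `analyse` and `generate`. Analysis matches the word against the lower side and collects the upper side; running the same arcs the other way gives generation.

```python
EPS = ""   # the empty string stands for epsilon

# Each arc: (source state, upper symbol, lower symbol, target state).
ARCS = [
    (0, "C", "C", 1),
    (1, "A", "A", 2),
    (2, "T", "T", 3),
    (3, "+N", EPS, 4),
    (4, "+PL", "S", 5),
    (4, "+SG", EPS, 5),
]
FINALS = {5}

def _run(text, side_in, side_out, state=0, pos=0, out=""):
    """Follow every path whose side_in symbols spell text; collect side_out strings."""
    results = []
    if pos == len(text) and state in FINALS:
        results.append(out)
    for arc in ARCS:
        if arc[0] != state:
            continue
        sym_in, sym_out, target = arc[side_in], arc[side_out], arc[3]
        # An epsilon symbol ("") matches without consuming any input.
        if text.startswith(sym_in, pos):
            results += _run(text, side_in, side_out, target,
                            pos + len(sym_in), out + sym_out)
    return results

def analyse(surface):
    """Match against the lower side, return upper-side strings."""
    return _run(surface, 2, 1)

def generate(lexical):
    """Match against the upper side, return lower-side strings."""
    return _run(lexical, 1, 2)

print(analyse("CATS"))          # ['CAT+N+PL']
print(analyse("CAT"))           # ['CAT+N+SG']
print(generate("CAT+N+PL"))     # ['CATS']
```

Because the machine only pairs symbols, nothing in it is specific to one direction, which is why the same arc table serves both functions. The recursion terminates here because the arc table has no ε loops.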
FST Formal Definition
• States, initial state, final states: same as FSA
• Alphabets I and O are the input and output alphabets, not necessarily disjoint.
• FST alphabet Σ ⊆ I × O
• Transition function δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters the complex symbol i:o.
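The definition maps directly onto a small data structure. The sketch below is an illustration only: the class name `FST`, its field names and the tiny example machine are assumptions, and ε is not modelled. It stores δ as a dictionary keyed by a state and a complex symbol, and checks that the symbols used on transitions are drawn from I × O.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class FST:
    states: set
    initial: str
    finals: set
    inputs: set     # alphabet I
    outputs: set    # alphabet O
    delta: dict     # (q, (i, o)) -> q'

    def sigma(self):
        """Complex symbols i:o that actually label some transition."""
        return {pair for (_, pair) in self.delta}

    def is_well_formed(self):
        """Check that the symbols in use form a subset of I x O."""
        return self.sigma() <= set(product(self.inputs, self.outputs))

    def step(self, q, i, o):
        """Return the state reached from q on complex symbol i:o, or None."""
        return self.delta.get((q, (i, o)))

# Hypothetical two-state machine with a single arc a:c.
tiny = FST(
    states={"q0", "q1"},
    initial="q0",
    finals={"q1"},
    inputs={"a", "b"},
    outputs={"a", "c"},
    delta={("q0", ("a", "c")): "q1"},
)
print(tiny.is_well_formed(), tiny.step("q0", "a", "c"))
```

Treating i:o as a single symbol is what lets the transition function keep the same shape as an FSA's.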
FST Alphabet Example
I × O (rows: symbols of I; columns: symbols of O)
        c      a      t      ε
a       a:c    a:a    a:t    a:ε
c       c:c    c:a    c:t    c:ε
'       ':c    ':a    ':t    ':ε
t       t:c    t:a    t:t    t:ε
Σ is drawn from these pairs.
Summary
• Morphological processing can be handled by finite state machinery
• Finite State Transducers are formally very similar to Finite State Automata.
• They are formally equivalent to regular relations, i.e. sets of pairs of strings drawn from regular languages.