CMSC 723 / LING 645: Intro to Computational Linguistics • September 15, 2004 (Dorr): More about FSAs, Finite State Morphology (J&M 3) • Prof. Bonnie J. Dorr • Dr. Christof Monz • TA: Adam Lee
More about FSAs • Transducers • Equivalence of DFSAs and NFSAs • Recognition as search: depth-first, breadth-first
Breadth-first Recognition of “baaa!” • [Figure: breadth-first search trace; slide annotation: “should be q2”]
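Recognition as search can be sketched in a few lines. The block below is a minimal illustration, not the course's code: it assumes the textbook sheep-language NFSA for /baa+!/ (states 0 to 4, with a nondeterministic choice on "a" in state 2), and the names `TRANSITIONS`, `FINAL_STATES`, and `recognize_bfs` are made up for this example.

```python
from collections import deque

# Assumed encoding of the sheep-language NFSA for /baa+!/:
# (state, symbol) -> set of possible next states.
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # the nondeterministic choice
    (3, "!"): {4},
}
FINAL_STATES = {4}


def recognize_bfs(tape, transitions=TRANSITIONS, finals=FINAL_STATES, start=0):
    """Return True if some path through the NFSA accepts the whole tape."""
    # Each search state pairs a machine state with a position on the tape.
    agenda = deque([(start, 0)])
    while agenda:
        # popleft() makes the agenda a queue (breadth-first);
        # pop() would make it a stack (depth-first).
        state, pos = agenda.popleft()
        if pos == len(tape):
            if state in finals:
                return True
            continue
        for nxt in sorted(transitions.get((state, tape[pos]), ())):
            agenda.append((nxt, pos + 1))
    return False
```

For example, `recognize_bfs("baaa!")` explores both branches out of state 2 level by level and accepts, while `recognize_bfs("ba!")` runs out of search states and rejects.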
Regular languages • Regular languages are characterized by FSAs • For every NFSA, there is an equivalent DFSA. • Regular languages are closed under concatenation, Kleene closure, union.
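The claim that every NFSA has an equivalent DFSA is usually shown with the subset construction. The following is a rough sketch under two assumptions: the NFSA has no epsilon arcs, and it is the sheep-language machine for /baa+!/ used as a running example. The function names are illustrative only.

```python
from collections import deque

# Assumed example NFSA for /baa+!/ (no epsilon arcs).
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
NFSA_FINALS = {4}
ALPHABET = ["a", "b", "!"]


def determinize(transitions, start, finals, alphabet):
    """Subset construction: each DFSA state is a frozenset of NFSA states."""
    start_set = frozenset([start])
    dfsa = {}                 # (state set, symbol) -> state set
    dfsa_finals = set()
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current & finals:  # final if it contains any NFSA final state
            dfsa_finals.add(current)
        for sym in alphabet:
            target = frozenset(
                nxt for q in current for nxt in transitions.get((q, sym), ())
            )
            if not target:
                continue      # a missing arc acts as an implicit dead state
            dfsa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfsa, start_set, dfsa_finals


def accepts(dfsa, start_set, dfsa_finals, tape):
    """Deterministic recognition: exactly one path, no search needed."""
    state = start_set
    for sym in tape:
        state = dfsa.get((state, sym))
        if state is None:
            return False
    return state in dfsa_finals


DFSA, DFSA_START, DFSA_FINALS = determinize(NFSA, 0, NFSA_FINALS, ALPHABET)
```

The price of determinism is that the number of state sets can grow exponentially in the worst case; for this small machine the DFSA has the same number of reachable states as the NFSA.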
Morphology • Definitions and Problems • What is Morphology? • Topology of Morphologies • Approaches to Computational Morphology • Lexicons and Rules • Computational Morphology Approaches
Morphology • The study of the way words are built up from smaller meaning units called morphemes • Abstract versus Realized: HOP +PAST → hop +ed → hopped → /hapt/
Phonology and Morphology • Phonology vs. Orthography • Historical spelling • night, nite • attention, mission, fish • Script Limitations • Spoken English has 14 vowels • heed hid hayed head had hoed hood who’d hide how’d taught Tut toy enough • English alphabet has 5 vowel letters • Use vowel combinations: far fair fare • Consonantal doubling (hopping vs. hoping)
Syntax and Morphology • Phrase-level agreement • Subject-Verb • John studies hard (STUDY+3SG) • Noun-Adjective • Las vacas hermosas • Sub-word phrasal structures • שבספרינו • ש+ב+ספר+ים+נו • That+in+book+PL+Poss:1PL • Which are in our books • [Figure labels on the segmented word: conj, prep, noun, article, plural, poss]
Topology of Morphologies • Concatenative vs. Templatic • Derivational vs. Inflectional • Regular vs. Irregular
Concatenative Morphology • Morpheme+Morpheme+Morpheme+… • Stems: also called lemma, base form, root, lexeme • hope+ing → hoping, hop+ing → hopping • Affixes • Prefixes: Antidisestablishmentarianism • Suffixes: Antidisestablishmentarianism • Infixes: hingi (borrow) – humingi (borrower) in Tagalog • Circumfixes: sagen (say) – gesagt (said) in German • Agglutinative Languages • uygarlaştıramadıklarımızdanmışsınızcasına • uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına • Behaving as if you are among those whom we could not cause to become civilized
Templatic Morphology • Roots and Patterns • Root: K T B (Arabic ك ت ب, Hebrew כ ת ב) • Arabic pattern مَ ? ? و ? + root → مكتوب maktuub “written” • Hebrew pattern ? ? ו ? + root → כתוב ktuuv “written”
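Root-and-pattern combination can be illustrated by filling the "?" slots of a pattern with the root consonants in order. This is only a sketch over Latin transliterations; the function name `interdigitate` and the transliterated pattern strings are assumptions made for the example, and phonological alternations (such as the b/v alternation in Hebrew ktuuv) are not modeled.

```python
def interdigitate(root, pattern, slot="?"):
    """Fill each slot in the pattern with the next consonant of the root."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern must have one slot per root consonant")
    consonants = iter(root)
    # Non-slot characters (vowels, prefixes) are copied through unchanged.
    return "".join(next(consonants) if ch == slot else ch for ch in pattern)


# Transliterated Arabic pattern from the slide: ma ? ? uu ?
WRITTEN = interdigitate("ktb", "ma??uu?")
```

With the root "ktb", the pattern "ma??uu?" gives "maktuub", and other patterns over the same root (for instance "?aa?i?") give other members of the same word family.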
Templatic Morphology: Root Meaning • KTB: writing “stuff” • write: كتب, כתב • book: كتاب • writer: كاتب, כתב • letter: מכתב, مكتوب • library: مكتبة • office: مكتب • spelling: כתיב • address: כתובת
Derivational vs. Inflectional • Word Classes • Parts of speech: noun, verb, adjectives, etc. • Word class dictates how a word combines with morphemes to form new words
Derivational morphology • Nominalization: computerization, appointee, killer, fuzziness • Formation of adjectives: computational, clueless, embraceable • CatVar: Categorial Variation Database http://clipdemos.umiacs.umd.edu/catvar/
Inflectional morphology • Adds: Tense, number, person, mood, aspect • Word class doesn’t change • Word serves new grammatical role • Five verb forms in English • Other languages have many more
Nouns and Verbs (in English) • Nouns have simple inflectional morphology • cat • cat+s, cat+’s • Verbs have more complex morphology
Regulars and Irregulars • Nouns • Cat/Cats • Mouse/Mice, Ox/Oxen, Goose/Geese • Verbs • Walk/Walked • Go/Went, Fly/Flew
Computational Morphology • Finite State Morphology • Finite State Transducers (FST) • Input/Output • Analysis/Generation
Computational Morphology • WORD → STEM (+FEATURES)* • cats → cat +N +PL • cat → cat +N +SG • cities → city +N +PL • geese → goose +N +PL • ducks → (duck +N +PL) or (duck +V +3SG) • merging → merge +V +PRES-PART • caught → (catch +V +PAST-PART) or (catch +V +PAST)
Building a Morphological Parser • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only
Lexicon-only Morphology • The lexicon lists all surface level and lexical level pairs • No rules …? • Analysis/Generation is easy • Very large for English • What about Arabic or Turkish? • Chinese? • Sample entries (surface form, lexical form, features): • acclaim acclaim $N$ • acclaim acclaim $V+0$ • acclaimed acclaim $V+ed$ • acclaimed acclaim $V+en$ • acclaiming acclaim $V+ing$ • acclaims acclaim $N+s$ • acclaims acclaim $V+s$ • acclamation acclamation $N$ • acclamations acclamation $N+s$ • acclimate acclimate $V+0$ • acclimated acclimate $V+ed$ • acclimated acclimate $V+en$ • acclimates acclimate $V+s$ • acclimating acclimate $V+ing$
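The lexicon-only approach amounts to two table lookups, which is why analysis and generation are easy and why the table gets very large. A minimal sketch using a few of the sample entries follows; the names `ENTRIES`, `analyze`, and `generate` are invented for the illustration.

```python
from collections import defaultdict

# A few (surface form, lexical form, features) entries from the sample lexicon.
ENTRIES = [
    ("acclaim", "acclaim", "N"),
    ("acclaim", "acclaim", "V+0"),
    ("acclaimed", "acclaim", "V+ed"),
    ("acclaimed", "acclaim", "V+en"),
    ("acclaiming", "acclaim", "V+ing"),
    ("acclaims", "acclaim", "N+s"),
    ("acclaims", "acclaim", "V+s"),
]

ANALYSES = defaultdict(list)   # surface -> [(lexical form, features), ...]
SURFACES = {}                  # (lexical form, features) -> surface
for surface, lexical, feats in ENTRIES:
    ANALYSES[surface].append((lexical, feats))
    SURFACES[(lexical, feats)] = surface


def analyze(surface):
    """Analysis: surface form -> all listed (lexical form, features) pairs."""
    return list(ANALYSES.get(surface, []))


def generate(lexical, feats):
    """Generation: (lexical form, features) -> surface form, or None if unlisted."""
    return SURFACES.get((lexical, feats))
```

An unlisted word such as "acclaimer" simply gets no analysis, which is the main weakness of having no rules: every form of every word has to be enumerated.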
Lexicon and Rules: FSA Inflectional Noun Morphology • English Noun Lexicon • English Noun Rule
Using FSAs for Recognition: English Nouns and their Inflection
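Since the slide's figures did not survive, here is one possible sketch of an FSA recognizer for English noun inflection. It assumes the usual textbook arrangement (regular nouns may take plural -s; irregular singular and plural forms are listed separately) and a tiny made-up lexicon; spelling changes such as fox/foxes are deliberately not handled.

```python
# Hypothetical mini-lexicon, grouped by the arc labels of the FSA.
LEXICON = {
    "reg-noun": {"cat", "dog", "duck"},
    "irreg-sg-noun": {"goose", "mouse", "ox"},
    "irreg-pl-noun": {"geese", "mice", "oxen"},
    "plural": {"s"},
}

# Assumed FSA: state -> [(arc label, next state)]
ARCS = {
    0: [("reg-noun", 1), ("irreg-sg-noun", 2), ("irreg-pl-noun", 2)],
    1: [("plural", 2)],
}
FINALS = {1, 2}


def recognize(word, state=0):
    """Return True if the word can be read as a path through the noun FSA."""
    if not word:
        return state in FINALS
    for label, nxt in ARCS.get(state, []):
        for morph in LEXICON[label]:
            # Follow the arc if one of its morphemes is a prefix of the input.
            if word.startswith(morph) and recognize(word[len(morph):], nxt):
                return True
    return False
```

The machine accepts "cats" and "geese" but rejects "gooses" and "mices", because the irregular arcs lead straight to a state with no plural arc. It only answers yes or no; producing the stem and features requires a transducer.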
Morphological Parsing • Finite-state automata (FSA) • Recognizer • One-level morphology • Finite-state transducers (FST) • Two-level morphology • PC-Kimmo (Koskenniemi 83) • input-output pair
Terminology for PC-Kimmo • Upper = lexical tape • Lower = surface tape • Characters correspond to pairs, written a:b • If “a:a”, write “a” for shorthand • Two-level lexical entries • # = word boundary • ^ = morpheme boundary • Other = “any feasible pair that is not in this transducer” • Final states indicated with “:” and non-final states indicated with “.”
Four-Fold View of FSTs • As a recognizer • As a generator • As a translator • As a set relater
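The four views can all be read off one function that enumerates the pairs an FST accepts, optionally constrained on either tape. The sketch below uses a made-up toy transducer for cat / cats with feature symbols on the upper tape; it is not PC-Kimmo, and every name in it is an assumption made for the example.

```python
EPS = ""  # the empty string stands for epsilon (nothing on that tape)

# Hypothetical toy FST: state -> [(upper symbol, lower symbol, next state)]
ARCS = {
    0: [("c", "c", 1)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("+N", EPS, 4)],
    4: [("+SG", EPS, 5), ("+PL", "s", 5)],
}
FINALS = {5}


def _step(tape, symbol):
    """Try to consume symbol from a tape; None means the tape is unconstrained."""
    if symbol == EPS or tape is None:
        return True, tape
    if tape and tape[0] == symbol:
        return True, tape[1:]
    return False, tape


def pairs(upper=None, lower=None, state=0, u_out=(), l_out=()):
    """Yield each accepted (upper, lower) pair consistent with the given tapes."""
    if state in FINALS and not upper and not lower:
        yield u_out, l_out
    for u, l, nxt in ARCS.get(state, []):
        ok_u, rest_u = _step(upper, u)
        ok_l, rest_l = _step(lower, l)
        if ok_u and ok_l:
            yield from pairs(rest_u, rest_l, nxt,
                             u_out + ((u,) if u else ()),
                             l_out + ((l,) if l else ()))


def recognizes(lexical, surface):   # recognizer: is this pair in the relation?
    return any(True for _ in pairs(tuple(lexical), tuple(surface)))


def analyze(surface):               # translator, lower tape -> upper tape
    return ["".join(u) for u, _ in pairs(lower=tuple(surface))]


def generate(lexical):              # translator, upper tape -> lower tape
    return ["".join(l) for _, l in pairs(upper=tuple(lexical))]


def relation():                     # generator / set relater: all pairs
    return [("".join(u), "".join(l)) for u, l in pairs()]
```

Calling `pairs()` with no constraints only terminates here because the toy machine has no cycles; a transducer with loops relates infinitely many pairs and would need a bound on the enumeration.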
Chomsky and Halle Notation • ε → e / {x, s, z} ^ __ s #
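Read as a rewrite rule, the notation says: insert e after a morpheme-final x, s, or z when the next morpheme is a word-final s. The sketch below applies that rule with a regular expression to an intermediate form that uses ^ for morpheme boundaries and # for the word boundary. It is plain string rewriting rather than a compiled two-level transducer, and `e_insertion` is a name chosen for the example.

```python
import re


def e_insertion(intermediate):
    """Apply e-insertion, then erase the boundary symbols to get the surface form."""
    # Context: x, s, or z, then a morpheme boundary, then word-final "s#".
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "").replace("#", "")
```

For instance, `e_insertion("fox^s#")` gives "foxes", while `e_insertion("cat^s#")` gives "cats" because the left context does not match.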
Sample Run KIMMO DEMO
FSTs and ambiguity • Parse Example 1: unionizable • union +ize +able • un+ ion +ize +able • Parse Example 2: assess • assess_V • ass_N + ess_N • Parse Example 3: tender • tender_AJ • ten_Num + d_AJ + er_CMP
What to do about Global Ambiguity? • Accept first successful structure • Run parser through all possible paths • Bias the search in some manner
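The three options can be compared on the "assess" and "tender" examples with a simple exhaustive segmenter. This is a sketch under simplifying assumptions: morphemes are plain substrings with the category labels from the examples, there are no spelling rules, and "fewest morphemes" is just one possible bias chosen for illustration.

```python
# Morphemes and category labels taken from the parse examples.
MORPHEMES = {
    "assess": "V", "ass": "N", "ess": "N",
    "tender": "AJ", "ten": "Num", "d": "AJ", "er": "CMP",
}


def parses(word):
    """Lazily yield every segmentation of word into known morphemes."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):   # shorter prefixes are tried first
        piece = word[:end]
        if piece in MORPHEMES:
            for rest in parses(word[end:]):
                yield [(piece, MORPHEMES[piece])] + rest


def first_parse(word):
    """Option 1: accept the first successful structure."""
    return next(parses(word), None)


def all_parses(word):
    """Option 2: run the parser through all possible paths."""
    return list(parses(word))


def preferred_parse(word):
    """Option 3: bias the search, here by preferring the fewest morphemes."""
    return min(parses(word), key=len, default=None)
```

Because shorter prefixes are tried first, `first_parse("assess")` returns the ass + ess reading, which shows how "accept first" depends on search order; `preferred_parse("assess")` returns the single-morpheme verb.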
Computational Morphology • The Rules and the Lexicon • General versus Specific • Regular versus Irregular • Accuracy, speed, space • The Morphology of a language • Approaches • Lexicon only • Lexicon and Rules • Finite-state Automata • Finite-state Transducers • Rules only (next time!!)