Morphology & Finite-State Transducers
• Morphology: the study of the constituents of words
  • Word = {a set of morphemes, combined in language-dependent ways}
  • morpheme: the smallest meaning-bearing unit
  • e.g., books = book + s, cats = cat + s
• Classes of Morphemes
  • stem (root)
  • affixes (詞綴)
• Morphological Parsing (or Analysis):
  • breaking down surface forms (or input forms) into stem and affixes
  • e.g., foxes = “fox” + “-es” (+N, +PL)
  • stemming: mapping a surface form to its stem (extracting the stem from the surface form)
• Morphological Generation:
  • generating surface forms from a stem and morphological features
Jing-Shin Chang
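The parsing and stemming definitions above can be sketched in a few lines of Python. This is only an illustrative toy, not the method the slides go on to describe: the lexicon `STEMS` and the function names `parse_noun` and `stem_of` are assumptions introduced here, and only the two plural suffixes from the examples are handled.

```python
# Toy stem lexicon (hypothetical; chosen to match the slide examples).
STEMS = {"fox": "N", "cat": "N", "book": "N"}

def parse_noun(surface):
    """Return a list of (stem, features) analyses for a noun surface form."""
    analyses = []
    # A bare stem is analysed as the stem itself.
    if surface in STEMS:
        analyses.append((surface, ["+N"]))
    # Try stripping a plural suffix, longest first, and check the lexicon.
    for suffix in ("es", "s"):
        if surface.endswith(suffix):
            stem = surface[: -len(suffix)]
            if stem in STEMS:
                analyses.append((stem, ["+N", "+PL"]))
                break
    return analyses

def stem_of(surface):
    """Stemming: keep only the stem of the first analysis, if any."""
    analyses = parse_noun(surface)
    return analyses[0][0] if analyses else None

print(parse_noun("foxes"))  # fox with features +N +PL
print(stem_of("cats"))      # cat
```

Because the sketch strips suffixes without checking spelling rules, it would also accept an ill-formed input such as "cates"; ruling that out is the job of the orthographic rules.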
Morphology & Finite-State Transducers
• Applications:
  • spell checking, tokenization for parsing
• Knowledge for Morphological Analysis
  • morphological rules (morphotactics): the constituents of words and their order
  • spelling rules (orthographic rules): spelling changes
  • Dictionary/Lexicon:
    • list of stems and affixes
    • stems of regular words (plus irregular variants) as indexing keys
    • it is not efficient to enumerate all morphological variants
    • some morphemes are productive: they can be applied to all words, including new words, so the resulting forms cannot all be listed
    • morphological variants depend on spelling as well as pronunciation
    • morphologically complex languages (e.g., Turkish) may have a large number of morphological variants
Morphology & Finite-State Transducers
• Models for morphological analysis/generation
  • generate-and-test: enumerate all possibilities and test them against constraints
  • FSA / two-level FST model: model the lexicon, morphological rules, and orthographic rules as finite-state automata or transducers
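To make the transducer idea concrete, here is a minimal sketch of a single deterministic finite-state transducer that maps a lexical string such as f o x +N +PL to the surface form "foxes". It is a simplification made for illustration, not the two-level model itself, which combines a lexicon with separate rule transducers. The state names, the `+SG` feature, and the three-letter sibilant set are assumptions introduced here.

```python
SIBILANTS = set("sxz")  # simplified: ignores stems ending in "ch" or "sh"

def step(state, symbol):
    """Transition function: return (output, next_state), or None if no arc."""
    # Letters are copied to the surface; the state remembers a final sibilant.
    if state in ("q0", "q_sib") and len(symbol) == 1 and symbol.isalpha():
        return symbol, ("q_sib" if symbol in SIBILANTS else "q0")
    # +N produces no surface output but carries the sibilant information on.
    if symbol == "+N" and state in ("q0", "q_sib"):
        return "", ("qN" if state == "q0" else "qN_sib")
    # +PL surfaces as "s", or as "es" after a sibilant.
    if symbol == "+PL" and state in ("qN", "qN_sib"):
        return ("s" if state == "qN" else "es"), "qF"
    # +SG (an assumed feature) surfaces as nothing.
    if symbol == "+SG" and state in ("qN", "qN_sib"):
        return "", "qF"
    return None

def generate(lexical):
    """Run the transducer over lexical symbols; None if it does not accept."""
    state, out = "q0", []
    for symbol in lexical:
        arc = step(state, symbol)
        if arc is None:
            return None
        output, state = arc
        out.append(output)
    return "".join(out) if state == "qF" else None

print(generate(list("fox") + ["+N", "+PL"]))  # foxes
print(generate(list("cat") + ["+N", "+PL"]))  # cats
```

The transition function is written as code rather than as an explicit table to keep the sketch short; the same arcs could be stored in a dictionary keyed by (state, symbol).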
English Morphology
• Morphology:
  • the study of the way words are built up from smaller meaning-bearing units (morphemes)
  • morpheme: the minimal meaning-bearing unit in a language
• Classes of Morphemes
  • stem (root): the main morpheme of the word, supplying the main meaning
  • affixes (詞綴): add additional meanings
• Affixes:
  • prefixes: un-happy
  • suffixes: eat-s
  • infixes: inserted inside the stem
    • Philippine language Tagalog: hingi (“borrow”) => h-um-ingi (agent of borrow)
  • circumfixes: attached around the stem
    • German: sagen (“to say”) => ge-sag-t (“said”, past participle)
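The four affix types differ only in where the affix attaches to the stem, which a few string operations can illustrate. The function names and the explicit `position` argument for the infix are assumptions made for this sketch; real infixation rules determine the position from the phonology of the stem.

```python
def add_prefix(stem, affix):
    """Prefix: the affix precedes the stem."""
    return affix + stem

def add_suffix(stem, affix):
    """Suffix: the affix follows the stem."""
    return stem + affix

def add_infix(stem, affix, position):
    """Infix: the affix is inserted inside the stem at the given index."""
    return stem[:position] + affix + stem[position:]

def add_circumfix(stem, before, after):
    """Circumfix: one part precedes the stem and the other follows it."""
    return before + stem + after

print(add_prefix("happy", "un"))        # unhappy
print(add_suffix("eat", "s"))           # eats
print(add_infix("hingi", "um", 1))      # humingi
print(add_circumfix("sag", "ge", "t"))  # gesagt
```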
English Morphology
• Affixes:
  • concatenative: prefixes & suffixes
  • non-concatenative: infixes & templatic morphology
• Templatic: root-and-pattern
  • found in Semitic languages such as Arabic and Hebrew
  • Hebrew: lmd (“learn”, “study”) (tri-consonantal root)
    • active voice template CaCaC => lamad (‘he studied’)
    • intensive template CiCeC => limed (‘he taught’)
    • intensive passive template CuCaC => lumad (‘he was taught’)
• Multiple affixes: un-believ-abl-y
• Agglutinative languages:
  • languages that tend to string affixes together (Turkish, Japanese, Korean)
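Root-and-pattern morphology can be sketched as filling the C slots of a template with the consonants of the root, in order. The function name `apply_template` is an assumption introduced here; the root and templates are the Hebrew examples from the slide.

```python
def apply_template(root, template):
    """Fill each 'C' slot in the template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template slots and root consonants do not match")
    consonants = iter(root)
    # Copy vowels as they are; replace every C with the next root consonant.
    return "".join(next(consonants) if ch == "C" else ch for ch in template)

for template in ("CaCaC", "CiCeC", "CuCaC"):
    print(template, "=>", apply_template("lmd", template))
# CaCaC => lamad
# CiCeC => limed
# CuCaC => lumad
```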
English Morphology
• Inflection:
  • stem + morphemes => same class
  • e.g., book + s => books (same core meaning, same part of speech (詞類))
• Derivation:
  • stem + morphemes => different class
  • e.g., computerize + ation => computerization [verb => noun]
English Morphology
• Inflectional Morphology
  • only nouns, verbs, adjectives, and adverbs can be inflected
  • Noun: plural, possessive
    • Regular: plural (+s / +es / -y+ies), possessive (+’s, +s’)
    • Irregular: ox => oxen, mouse => mice
  • Verb (main, modal auxiliary, primary (be)):
    • Forms: stem (present / infinitive), -s (present, 3rd person singular), -ing (gerund / present participle), -ed (past / past participle / perfect)
    • Regular: +s / +es / -y+ies; -e+ing / +ing / +.ing (consonant doubling); +d / +ed / +.ed (consonant doubling)
    • Irregular: e.g., eat => ate, eaten (+en), catch => caught
    • Consonant doubling: short vowel + single consonant => double the consonant
    • -c => -ck (picnicked)
  • Adjective/Adverb: comparative/superlative
    • happy => happier, happiest, happily
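The regular spelling patterns listed above can be written as a small rule cascade, with the irregular forms handled by table lookup first. This is a rough sketch under stated assumptions: the function names are introduced here, the consonant-doubling test looks only at letters (consonant + vowel + consonant) and ignores stress, so it over-applies to words such as "visit", and the verb rule that turns a final consonant + y into -ied is standard English spelling rather than something stated on the slide.

```python
VOWELS = set("aeiou")
IRREGULAR_PLURAL = {"ox": "oxen", "mouse": "mice"}   # from the slide
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}   # from the slide

def plural(noun):
    """Noun plural: irregular lookup, then +es, -y+ies, or +s."""
    if noun in IRREGULAR_PLURAL:
        return IRREGULAR_PLURAL[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(noun) > 1 and noun[-1] == "y" and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

def doubles_final_consonant(verb):
    """Letter-based approximation of 'short vowel + single consonant'."""
    if len(verb) < 3:
        return False
    c1, v, c2 = verb[-3], verb[-2], verb[-1]
    return (c1 not in VOWELS and v in VOWELS
            and c2 not in VOWELS and c2 not in "wxy")

def ing_form(verb):
    """-ing form: -c => -ck, drop a silent e, double the consonant, or +ing."""
    if verb.endswith("c"):
        return verb + "king"
    if verb.endswith("e") and len(verb) > 2 and verb[-2] not in VOWELS:
        return verb[:-1] + "ing"
    if doubles_final_consonant(verb):
        return verb + verb[-1] + "ing"
    return verb + "ing"

def past_form(verb):
    """Past form: irregular lookup, then -c => -ck, +d, -y+ied, doubling, +ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("c"):
        return verb + "ked"
    if verb.endswith("e"):
        return verb + "d"
    if len(verb) > 1 and verb[-1] == "y" and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"
    if doubles_final_consonant(verb):
        return verb + verb[-1] + "ed"
    return verb + "ed"

print(plural("fox"), plural("baby"), plural("ox"))  # foxes babies oxen
print(ing_form("make"), ing_form("stop"))           # making stopping
print(past_form("picnic"), past_form("eat"))        # picnicked ate
```

Checking the irregular tables before the rules matters: without the lookup, "ox" would match the +es rule and come out as "oxes".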
English Morphology
• Derivational Morphology
  • usually results in a different class
  • the POS of the derived word must be computed from the POS of the root and the affixes attached to it
  • Nominalization: V/A => N
    • computerize => computerization
    • more examples …
  • N/V => A
    • computation => computational
    • more examples …
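The part-of-speech conversion described above can be sketched as a table that records, for each derivational suffix, the class it attaches to and the class it produces. The table `DERIVATIONS` and the function `derive_pos` are assumptions introduced for illustration; the entries cover only the suffixes that appear in the slide examples, with one class per suffix.

```python
# suffix -> (POS it attaches to, POS of the result); illustrative entries only.
DERIVATIONS = {
    "-ize": ("N", "V"),    # computer (N) => computerize (V)
    "-ation": ("V", "N"),  # computerize (V) => computerization (N)
    "-al": ("N", "A"),     # computation (N) => computational (A)
}

def derive_pos(root_pos, suffixes):
    """Apply suffixes in order, tracking the part of speech after each step."""
    pos = root_pos
    for suffix in suffixes:
        required, result = DERIVATIONS[suffix]
        if pos != required:
            raise ValueError(f"{suffix} cannot attach to a word of class {pos}")
        pos = result
    return pos

print(derive_pos("N", ["-ize", "-ation"]))  # N  (computer => computerization)
print(derive_pos("V", ["-ation", "-al"]))   # A  (compute => computational)
```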
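Chinese Morphology
• Chinese Morphemes
  • hard to distinguish from characters, words, and compound words
  • free morphemes
  • bound morphemes
• Examples
  • 副-總統 (vice-president), 前-妻 (ex-wife), 非-經濟(因素) (non-economic (factors))
  • 學生-們 (student + plural marker)
  • 哈日-族 (“Japan-fan tribe”), 銀髮-族 (“silver-haired tribe”, i.e., senior citizens)
  • 工業-化 (industrialize), 綠-化 (“turn green”), 藍-化 (“turn blue”), 腐-化 (decay, become corrupt), 石-化 (petrify; also short for petrochemical), 神-化 (deify)
  • 公務-員 (civil servant), 業務-員 (sales representative), 推銷-員 (salesperson), 運動-員 (athlete)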