COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging Morphology
Overview
• Morphology
• Stemming
• Word Classes
• POS Tagging
• (Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen Ch. 2, 3)
Morphology
Morphemes and Words
• Morpheme = "minimal meaning-bearing unit in a language"
• Combine morphemes to create words
• Inflection
  • combination of a word stem with a grammatical morpheme
  • same word class, e.g. clean (verb), clean-ing (verb)
• Derivation
  • combination of a word stem with a grammatical morpheme
  • yields different word class, e.g. clean (verb), clean-ing (noun)
• Compounding
  • combination of multiple word stems
• Cliticization
  • combination of a word stem with a clitic
  • different words from different syntactic categories, e.g. I’ve = I + have
Inflectional Morphology
word stem + grammatical morpheme, e.g. cat + s
only for nouns, verbs, and some adjectives
• Nouns
  • plural:
    regular: +s, +es
    irregular: mouse - mice; ox - oxen
    rules for exceptions: e.g. -y -> -ies, as in butterfly - butterflies
  • possessive: +'s, +'
• Verbs
  • main verbs (sleep, eat, walk)
  • modal verbs (can, will, should)
  • primary verbs (be, have, do)
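The noun plural rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pluralize` and the tiny exception table `IRREGULAR_PLURALS` are hypothetical, and the sibilant rule for +es (fox - foxes) is a common textbook simplification rather than something spelled out on the slide.

```python
import re

# Hypothetical, deliberately tiny exception list; a real lexicon is far larger.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}


def pluralize(noun: str) -> str:
    """Return the plural of an English noun using a few illustrative rules."""
    # Irregular forms are looked up, not computed.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Exception rule: consonant + y -> ies (butterfly -> butterflies).
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    # Assumed rule: sibilant endings take +es (fox -> foxes).
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"
    # Regular case: +s (cat -> cats).
    return noun + "s"


for word in ["cat", "butterfly", "mouse", "ox", "fox"]:
    print(word, "->", pluralize(word))
```

The order of the checks matters: the exception table is consulted first, so that irregular nouns never reach the regular rules.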
Inflectional Morphology (verbs)
Verb inflections apply only to main verbs (sleep, eat, walk) and primary verbs (be, have, do).

Morphological form      Regularly inflected form
stem                    walk      merge     try       map
-s form                 walks     merges    tries     maps
-ing participle         walking   merging   trying    mapping
past; -ed participle    walked    merged    tried     mapped

Morphological form      Irregularly inflected form
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut
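As a rough sketch, the two tables can be generated from the stem plus a handful of spelling rules. Everything here is an assumption made for illustration: the names `paradigm`, `s_form`, `ing_form`, `past_forms` and the `IRREGULAR` table are hypothetical, and the consonant-doubling test only handles one-syllable stems such as map. It is tuned to the verbs in the tables, not to English in general.

```python
import re

# Hypothetical mini-lexicon: stem -> (past, past participle) for irregular verbs.
IRREGULAR = {
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "cut": ("cut", "cut"),
}


def _doubles_final(stem: str) -> bool:
    """Simplified test: consonants + one vowel + one consonant (map, cut)."""
    return re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem) is not None


def s_form(stem: str) -> str:
    if re.search(r"[^aeiou]y$", stem):        # try -> tries
        return stem[:-1] + "ies"
    if re.search(r"(s|x|z|ch|sh)$", stem):    # catch -> catches
        return stem + "es"
    return stem + "s"                         # walk -> walks


def ing_form(stem: str) -> str:
    if len(stem) > 2 and stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"              # merge -> merging
    if _doubles_final(stem):
        return stem + stem[-1] + "ing"        # map -> mapping
    return stem + "ing"                       # walk -> walking


def past_forms(stem: str) -> tuple:
    if stem in IRREGULAR:                     # eat -> ate, eaten
        return IRREGULAR[stem]
    if stem.endswith("e"):
        form = stem + "d"                     # merge -> merged
    elif re.search(r"[^aeiou]y$", stem):
        form = stem[:-1] + "ied"              # try -> tried
    elif _doubles_final(stem):
        form = stem + stem[-1] + "ed"         # map -> mapped
    else:
        form = stem + "ed"                    # walk -> walked
    return (form, form)


def paradigm(stem: str) -> dict:
    """Collect the forms of one verb, mirroring a column of the tables."""
    past, participle = past_forms(stem)
    return {
        "stem": stem,
        "-s form": s_form(stem),
        "-ing participle": ing_form(stem),
        "past": past,
        "-ed participle": participle,
    }


for verb in ["walk", "merge", "try", "map", "eat", "catch", "cut"]:
    print(paradigm(verb))
```

Note that only the past and -ed participle need a lookup for the irregular verbs shown; their -s and -ing forms follow the regular spelling rules.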
Inflectional and Derivational Morphology (adjectives)
Adjective inflections and derivations:
• prefix un-      unhappy      adjective, negation
• suffix -ly      happily      adverb, mode
• suffix -er      happier      adjective, comparative
• suffix -est     happiest     adjective, superlative
• suffix -ness    happiness    noun
plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation with un- for big.
Inflectional Morphology
Noun Inflections
Verb Inflections
Derivational Morphology
Noun Derivation
Adjective Derivation
Clitics
Verb Clitics
Methods, Algorithms
Stemming
• Stemming algorithms strip off word affixes
• yield stem only, no additional information (like plural, 3rd person etc.)
• used, e.g. in web search engines
• famous stemming algorithm: the Porter stemmer
Stemming Methods
• Rule-based stemming
• Example rules:
  • ATIONAL → ATE, e.g. relational → relate
  • ING → ε (delete the suffix) if the stem contains a vowel, e.g. motoring → motor
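The two example rules can be written down directly. The sketch below is not the Porter stemmer, which applies many such rules in ordered steps with further conditions; it only shows the shape of a suffix-rewriting rule. The function name `stem_rules` is hypothetical, and treating only a, e, i, o, u as vowels is a simplifying assumption.

```python
import re


def stem_rules(word: str) -> str:
    """Apply the two illustrative rules; return the word unchanged otherwise."""
    # Rule 1: ATIONAL -> ATE (relational -> relate).
    if word.endswith("ational"):
        return word[: -len("ational")] + "ate"
    # Rule 2: ING -> (empty), but only if the remaining stem contains a vowel.
    if word.endswith("ing"):
        stem = word[: -len("ing")]
        if re.search(r"[aeiou]", stem):
            return stem
    return word


for w in ["relational", "motoring", "sing"]:
    print(w, "->", stem_rules(w))
```

The vowel condition is what keeps short words such as sing intact: stripping -ing would leave "s", which contains no vowel.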
Stemming Problems
Tokenization, Word Segmentation
• Tokenization or word segmentation
  • separate out “words” (lexical entries) from running text
  • expand abbreviated terms
    • E.g. I’m into I am, it’s into it is
  • collect tokens forming single lexical entry
    • E.g. New York marked as one single entry
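A minimal sketch of these three steps (split, expand, collect) might look as follows. The tables `CONTRACTIONS` and `MULTIWORD`, the function name `tokenize`, and the choice of joining multiword entries with an underscore are all assumptions for illustration. Following the slide, it's is always expanded to it is, although it can also stand for it has.

```python
import re

# Hypothetical lookup tables; a real system would store many more entries.
CONTRACTIONS = {"i'm": ["I", "am"], "it's": ["it", "is"], "i've": ["I", "have"]}
MULTIWORD = {("New", "York"): "New_York"}


def tokenize(text: str) -> list:
    # Normalize typographic apostrophes so the lookup table matches.
    text = text.replace("\u2019", "'")
    # Step 1: separate words (optionally with a clitic), numbers, punctuation.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", text)
    # Step 2: expand abbreviated terms (I'm -> I am).
    expanded = []
    for tok in raw:
        expanded.extend(CONTRACTIONS.get(tok.lower(), [tok]))
    # Step 3: collect adjacent tokens that form a single lexical entry.
    tokens = []
    i = 0
    while i < len(expanded):
        pair = tuple(expanded[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(MULTIWORD[pair])
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens


print(tokenize("I'm in New York."))
```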
Tokenization, Word Segmentation
• Finite state transducer (FST)
  • Modifies input string (rules)
  • Recognizes (stored) abbreviations and composite words
  • See Fig. 3.22 in Jurafsky, Ch. 3
• More of an issue in languages like Chinese
Lemmatization
• Lemmatization maps words with the same root but different surface appearances onto the same lexeme
• e.g. buys, bought, buying -> buy
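One simple way to picture lemmatization is a lookup for irregular forms combined with suffix stripping that is checked against a lexicon. The sketch below assumes exactly that design; `LEXICON`, `IRREGULAR_LEMMAS` and `lemmatize` are hypothetical names, and the candidate repairs (restore e, undo doubling, restore y) are simplifications.

```python
# Hypothetical resources: known lemmas and a table of irregular forms.
LEXICON = {"buy", "walk", "map", "merge", "try", "eat"}
IRREGULAR_LEMMAS = {"bought": "buy", "ate": "eat", "eaten": "eat"}


def lemmatize(word: str) -> str:
    """Map a surface form to its lemma; return it unchanged if unknown."""
    w = word.lower()
    if w in LEXICON:
        return w
    if w in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[w]
    for suffix in ("ing", "ed", "es", "s"):
        if not w.endswith(suffix):
            continue
        base = w[: -len(suffix)]
        # Candidate stems after undoing common spelling changes.
        candidates = [base, base + "e"]            # buying -> buy, merged -> merge
        if len(base) > 1 and base[-1] == base[-2]:
            candidates.append(base[:-1])           # mapping -> map
        if base.endswith("i"):
            candidates.append(base[:-1] + "y")     # tries -> try
        for cand in candidates:
            if cand in LEXICON:
                return cand
    return w


for form in ["buys", "bought", "buying"]:
    print(form, "->", lemmatize(form))
```

Unlike a stemmer, this only returns forms that exist in the lexicon, which is why bought can be mapped to buy even though no suffix rule relates them.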
Morphological Processing
Word Recognition
• Spelling Errors
  • Mark non-words based on dictionary/lexicon
  • Use “minimum edit distance”
    • Dynamic programming
    • Table-based
    • Transform operations
      • deletion, substitution, insertion
    • Calculate minimum path
• Morphological Parser = FST
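The table-based dynamic programming computation of minimum edit distance can be sketched as below. The function names and the `sub_cost` parameter are assumptions for illustration; with `sub_cost=1` this is the Levenshtein distance, while `sub_cost=2` treats a substitution as a deletion plus an insertion. The helper `closest_word` shows one possible use for spelling correction against a lexicon.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum number of weighted edits turning source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of transforming source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution
            )
    return dist[n][m]


def closest_word(word: str, lexicon: list) -> str:
    """Suggest the lexicon entry with the smallest edit distance."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))


print(min_edit_distance("intention", "execution"))
print(closest_word("recogniton", ["recognition", "recreation"]))
```

Each table cell depends only on its left, upper, and upper-left neighbours, so the table is filled in O(n·m) time.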
Morphological Processing
• Knowledge
  • lexical entry: stem plus possible prefixes, suffixes plus word classes, e.g. endings for verb forms (see tables above)
  • rules: how to combine stem and affixes, e.g. add s to form plural of noun as in dogs
  • orthographic rules: spelling, e.g. double consonant as in mapping
• Processing: Finite State Transducers
  • take information above and analyze word token / generate word form
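To make the three knowledge sources concrete, here is a small analysis sketch that combines a lexicon, affix rules, and one orthographic rule. It is not a finite state transducer, only a plain lookup procedure standing in for one. The names `LEXICON`, `SUFFIX_RULES`, `candidate_stems`, `parse` and the feature labels (PL, 3SG, and so on) are hypothetical.

```python
# Lexicon (assumed): stem -> word class.
LEXICON = {"dog": "N", "fox": "N", "map": "V", "walk": "V"}

# Rules (assumed): (suffix, word class it attaches to, feature it contributes).
SUFFIX_RULES = [
    ("s", "N", "PL"),
    ("es", "N", "PL"),
    ("s", "V", "3SG"),
    ("ing", "V", "PRES-PART"),
    ("ed", "V", "PAST"),
]


def candidate_stems(base: str):
    """Orthographic rule, undone: a doubled final consonant (mapp -> map)."""
    yield base
    if len(base) > 2 and base[-1] == base[-2]:
        yield base[:-1]


def parse(word: str) -> list:
    """Return all (stem, word class, feature) analyses licensed by the rules."""
    analyses = []
    if word in LEXICON:
        analyses.append((word, LEXICON[word], "STEM"))
    for suffix, word_class, feature in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        for stem in candidate_stems(word[: -len(suffix)]):
            if LEXICON.get(stem) == word_class:
                analyses.append((stem, word_class, feature))
    return analyses


for token in ["dogs", "foxes", "mapping", "walked"]:
    print(token, "->", parse(token))
```

A transducer would encode the same knowledge as states and arcs and could run in both directions, analysing dogs into dog + N + PL or generating dogs from that analysis. The sketch only covers the analysis direction.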
Fig. 3.3 FSA for verb inflection.
Fig. 3.4 Simple FSA for adjective inflection.
Fig. 3.5 More detailed FSA for adjective inflection.
Fig. 3.7 Compiled FSA for noun inflection.
Fig. 3.12 Lexical and intermediate tape of a FS Transducer.
Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.
Word Classes and POS Tagging
Word Classes
Sort words into categories according to:
• morphological properties
  Which types of morphological forms do they take?
  e.g. form plural: noun + s; 3rd person: verb + s
• distributional properties
  What other words or phrases can occur nearby?
  e.g. possessive pronoun before noun
• semantic coherence
  Classify according to similar semantic type.
  e.g. nouns refer to object-like entities
Open vs. Closed Word Classes
Open Class Types
The set of words in these classes can change over time with the development of the language, e.g. spaghetti and download.
Open Class Types: nouns, verbs, adjectives, adverbs
Open vs. Closed Word Classes
Closed Class Types
The set of words in these classes is largely fixed and hardly ever changes within a language.
Closed Class Types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals
Open Class Words: Nouns
Nouns denote objects, concepts, entities, events
Proper Nouns
Names for specific individual objects, entities
e.g. the Eiffel Tower, Dr. Kemke
Common Nouns
Names for categories, classes, abstracts, events
e.g. fruit, banana, table, freedom, sleep, race, ...
Count Nouns
enumerable entities, e.g. two bananas
Mass Nouns
not countable items, e.g. water, salt, freedom
Open Class Words: Verbs
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
several morphological forms, e.g.
non-3rd person - eat, sleep
3rd person - eats, sleeps
progressive / present participle / gerundive - eating, sleeping
past participle - eaten, slept
simple past - ate, slept
Open Class Words: Verbs (2)
non-3rd person      eat       I eat. We eat. They eat.
3rd person          eats      He eats. She eats. It eats.
progressive         eating    He is eating. He will be eating. He has been eating.
  e.g. present participle     He is eating.
       gerundive              Eating scorpions [NP] is common in China.
       use as adjective       Eating children [NP] are common at McDonalds.
past participle     eaten     He has eaten the scorpion. The scorpion was eaten.
simple past         ate       He ate the scorpion.
Verb Forms 1 - The five verb forms
Fig. 2.6. The five verb forms. (Allen, 1995, p. 28)
Verb Forms 2 - The basic tenses
Fig. 2.7. The basic tenses. (Allen, 1995, p. 29)
Verb Forms 3 - The progressive tenses
Fig. 2.8. The progressive tenses. (Allen, 1995, p. 29)
Verb Tense Chart. From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm
Open Class Words: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content
most languages have concepts for
colour - white, green, ...
age - young, old, ...
value - good, bad, ...
not all languages have adjectives as a separate class
Open Class Words: Adverbs 1
Adverbs denote modifications of actions (verbs) or qualities (adjectives)
e.g. walk slowly or heavily drunk
Directional or Locational Adverbs
specify direction or location
e.g. go home, stay here
Open Class Words: Adverbs 2
Degree Adverbs
specify extent of process, action, property
e.g. extremely slow, very modest
Manner Adverbs
specify manner of action or process
e.g. walk slowly, run fast
Temporal Adverbs
specify time of event or action
e.g. yesterday, Monday
Closed Word Classes
Closed Class Types:
Prepositions: on, under, over, at, from, to, with, ...
Determiners: a, an, the, ...
Pronouns: he, she, it, his, her, who, I, ...
Conjunctions: and, or, as, if, when, ...
Auxiliary verbs: can, may, should, are, ...
Particles: up, down, on, off, in, out, ...
Numerals: one, two, three, ..., first, second, ...
Closed Word Class: Prepositions
Prepositions occur before noun phrases; describe relations; often spatial or temporal relations
e.g. on the table (spatial)
     in two hours (temporal)
Closed Word Class: Pronouns
Pronouns refer to entities, events, relations etc.
Personal Pronouns
refer to persons or entities, e.g. you, he, it, ...
Possessive Pronouns
possession or relation between person and object, e.g. his, her, my, its, ...
Wh-Pronouns
reference in question or back reference, e.g. Who did this ..., Frieda, who is 80 years old ...
Closed Word Class: Conjunctions
Conjunctions join phrases or sentences; semantics is varied and complex
Coordinating Conjunction
Join two phrases or sentences on the same level through conjunctions like and, or, but, ...
e.g. He takes a cat and a dog.
     He takes a dog and she takes a cat.
Subordinating Conjunction
Connect embedded phrases through e.g. that
e.g. He thinks that the cat is nicer than the dog.
Closed Word Class: Auxiliary Verbs
Auxiliary Verbs
Mark semantic features of main verb. Often describe tense and modality aspects. Semantics is difficult.
Tense
addition expressing present, past or future, ...
e.g. He will take the cat home.
Aspect
addition expressing completion of action
e.g. He is taking the cat home. (incomplete)
Mood
addition expressing necessity or possibility of action
e.g. He can take the cat home. (possible)
Closed Word Class: Copula, Modal Verbs
Copula (be, do, have) and Modal Verbs (can, should, ...) are subclasses of Auxiliary Verbs.
Describe state, process, or tense / modality of action.
Semantics: difficult (e.g. modal logic)
State / Process: be and do
e.g. He is at home. He does nothing.
Tense: have
e.g. He has taken the cat home.
Modality: can, ought to, should, must
e.g. He can take the cat home. (possibility)
Tagsets and POS Tagging