Morphology: Words and their Parts • CS 4705
Basic Uses of Morphology • The study of how words are composed from smaller, meaning-bearing units (morphemes) • Applications: • Spelling correction: referece • Hyphenation algorithms: refer-ence • Part-of-speech analysis: googler • Text-to-speech: grapheme-to-phoneme conversion • hothouse (/T/ or /D/)
Speech recognition: phoneme-to-grapheme conversion • Amusing poetry and artificial languages in standardized tests • ‘Twas brillig and the slithy toves… • Muggles moogled migwiches
What is a word? • In formal languages, words are arbitrary strings • In natural languages, words are made up of meaningful subunits called morphemes • Allows for productivity: googled, texted • Abstract concepts denoting entities or relationships in the world • Roots + • Syntactic or grammatical elements • Realizations of morphemes: morphs • Door realizes door; take and took realize take
Allomorphs are classes of related morphs that realize a given morpheme • Allomorphs of s include en, men, es in English • Take and took are allomorphs of take • Sum: Morpheme [s] is realized by an allomorph class that includes the related morphs {en, men, es} • Syntactic or grammatical morphemes can convey many things • In Italian, mark nouns for gender and number:
        Singular    Plural
Masc    pomodoro    pomodori
Fem     cipolla     cipolle
• pomodor-, cipoll-: stems, may or may not occur on their own as words • Stem may not occur as a word: derivative/deriv • Base form (lemma) occurs as word: derivative/derive • Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too
What useful information does morphology give us? • Different things in different languages • Spanish: hablo, hablaré / English: I speak, I will speak • English: book, books / Japanese: hon, hon • Languages differ in how they encode morphological information • Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme • Agglutinative languages (e.g. Finnish, Turkish) build words from prefixes and suffixes added to a stem (like beads on a string) – each feature is realized by a single affix, e.g. Finnish
epäjärjestelmällistyttämättömyydellänsäkäänköhän ‘Wonder if he can also ... with his capability of not causing things to be unsystematic’ • Inflectional languages (e.g. English) merge different features into a single affix (e.g. ‘s’ in likes indicates both person and tense); and the same feature can be realized by different affixes • Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic:
Aliikusersuillammassuaanerartassagaluarpaalli.
aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
'However, they will say that he is a great entertainer, but ...'
• So... different languages may require very different morphological analyzers
Morphology Can Help Define Word Classes • AKA morphological classes, parts-of-speech • Closed vs. open (function vs. content) class words • Pronoun, preposition, conjunction, determiner,… • Noun, verb, adverb, adjective,… • Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…very basic semantics
(English) Inflectional Morphology • Word stem + grammatical morpheme → different forms of same word • Usually produces word of same class • Usually serves a syntactic or grammatical function (e.g. agreement) • like → likes or liked • bird → birds • Nominal morphology • Plural forms • s or es • Irregular forms (goose/geese)
Mass vs. count nouns (fish/fish(es), email or emails?) • Possessives (cat’s, cats’) • Verbal inflection • Main verbs (sleep, like, fear) relatively regular • -s, ing, ed • And productive: emailed, instant-messaged, faxed, homered • But some are not: • eat/ate/eaten, catch/caught/caught • Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive • Be: am/is/are/were/was/been/being • Irregular verbs few (~250) but frequently occurring
Particles occur in only one form: in English • Prepositions: to, from • Adverbs: happily, quickly • Conjunctions: but, and • Articles: the, a, an • Japanese? • So….English inflectional morphology is fairly easy to model….with some special cases...
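A minimal sketch of how the "regular rule plus special cases" picture of English inflection could be modeled. The function names and the tiny exception tables are illustrative assumptions, not part of the course materials, and spelling changes such as consonant doubling (stop/stopped) or y/ies are deliberately left out.

```python
# Hypothetical, hand-picked exception tables; a real lexicon would be far larger.
IRREGULAR_PLURALS = {"goose": "geese", "fish": "fish"}
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}


def pluralize(noun: str) -> str:
    """Return a plural form: look up irregulars first, then add -es or -s."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Sibilant-final stems take -es (fox -> foxes, church -> churches).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def past_tense(verb: str) -> str:
    """Return a past-tense form: look up irregulars first, then add -d or -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Stems that already end in e only need -d (like -> liked).
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


if __name__ == "__main__":
    for noun in ["bird", "fox", "goose"]:
        print(noun, "->", pluralize(noun))
    for verb in ["email", "fax", "like", "eat"]:
        print(verb, "->", past_tense(verb))
```

Because the regular rule is the fallback, newly coined verbs such as emailed or homered are handled without being listed, which is one way to picture productivity.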
Derivational Morphology • Word stem + syntactic/grammatical morpheme → new words • Usually produces word of different class • Incomplete process: derivational morphs cannot be applied to just any member of a class • Verbs → nouns • -ize verbs → -ation nouns • generalize, realize → generalization, realization • synthesize but no synthesization
Verbs, nouns → adjectives • embrace, pity → embraceable, pitiable • care, wit → careless, witless • Adjective → adverb • happy → happily • Process selective in unpredictable ways • Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness • Meanings of derived terms harder to predict by rule • clueless, careless, nerveless, sleepless
Derivation can be applied recursively: • Hospital → hospitalize → hospitalization → prehospitalization … • Morphological analysis identifies concatenative processes as well as morphemes: [pre[[[hospital]ize]ation]] • But there are bracketing paradoxes: unhappier • [un[happier]]: not happier • [[unhappy]er]: more unhappy
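The bracket notation can be produced mechanically by applying affixes one at a time, innermost first. The sketch below is an illustration only: the `bracket` function and its step format are assumptions made for this example, and it ignores spelling changes (it brackets happy + er rather than producing happier).

```python
def bracket(root, steps):
    """Wrap a root in one bracket pair per affix, applying steps in order.

    `steps` is a list of (kind, affix) pairs, where kind is "prefix" or "suffix".
    """
    structure = f"[{root}]"
    for kind, affix in steps:
        if kind == "suffix":
            structure = f"[{structure}{affix}]"
        elif kind == "prefix":
            structure = f"[{affix}{structure}]"
        else:
            raise ValueError(f"unknown affix kind: {kind}")
    return structure


if __name__ == "__main__":
    # Recursive derivation: hospital -> hospitalize -> hospitalization -> pre...
    print(bracket("hospital", [("suffix", "ize"), ("suffix", "ation"), ("prefix", "pre")]))
    # The two competing analyses of "unhappier" differ only in attachment order.
    print(bracket("happy", [("suffix", "er"), ("prefix", "un")]))  # "not happier"
    print(bracket("happy", [("prefix", "un"), ("suffix", "er")]))  # "more unhappy"
```

The order of the steps is the whole analysis here: the same three morphs yield different structures, which is what makes the unhappier case a paradox rather than a simple ambiguity in segmentation.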
Compounding • Two base forms join to form a new word • Bedtime, Wienerschnitzel, Rotwein • Careful? Compound or derivation?
Affixes can be attached to stems in different ways • Prefixation • Immaterial • Suffixation: more common across languages than prefixation • Trying • Circumfixation: combine prefixation and suffixation • Gesagt
Infixation • English: Absobl**dylutely • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) → kumilad (to be red))
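As a rough illustration of why infixation is harder to model than prefixation or suffixation, the sketch below places the infix inside the stem, immediately before its first vowel. That placement rule is an assumption generalized from the single kilad/kumilad example above, and the function name `infix_um` is made up for this sketch; it is not a description of Bontoc grammar.

```python
VOWELS = set("aeiou")


def infix_um(stem, infix="um"):
    """Insert the infix before the first vowel of the stem (assumed rule)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + infix + stem[i:]
    # No vowel found: fall back to treating the infix as a prefix.
    return infix + stem


if __name__ == "__main__":
    print(infix_um("kilad"))  # kumilad
```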
Concatenative vs. Non-concatenative Morphology • Semitic root-and-pattern morphology • Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/) • Vowel pattern conveys voice and aspect • Derivational template (binyan) identifies word class
Template    Vowel Pattern           Gloss
            active      passive
CVCVC       katab       kutib       write
CVCCVC      kattab      kuttib      cause to write
CVVCVC      ka:tab      ku:tib      correspond
tVCVVCVC    taka:tab    tuku:tib    write each other
nCVVCVC     nka:tab     nku:tib     subscribe
CtVCVC      ktatab      ktutib      write
stVCCVC     staktab     stuktib     dictate
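Root-and-pattern morphology can be pictured as filling slots in a template. The sketch below is a simplified illustration under stated assumptions: `interdigitate` is a made-up helper, upper-case C and V are slots, any other character in the template is copied as literal affixal material, and gemination (kattab) and long vowels (ka:tab) are not handled.

```python
def interdigitate(root, template, vowels):
    """Fill C slots with root consonants and V slots with vowels, left to right."""
    consonants = iter(root)
    melody = iter(vowels)
    out = []
    for slot in template:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            out.append(next(melody))
        else:
            # Literal template material such as t, n, or s is copied through.
            out.append(slot)
    return "".join(out)


if __name__ == "__main__":
    # Active uses the vowel melody a-a, passive uses u-i (as in the table above).
    for template in ["CVCVC", "CtVCVC", "stVCCVC"]:
        active = interdigitate("ktb", template, "aa")
        passive = interdigitate("ktb", template, "ui")
        print(template, active, passive)
```

Keeping the root, the template, and the vowel melody as three separate inputs mirrors the three-way division of labor described on the previous slide: the root carries the basic meaning, the vowels carry voice and aspect, and the template identifies the derivational class.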
Morphotactics • What are the ‘rules’ for constructing a word in a given language? • Pseudo-intellectual vs. *intellectual-pseudo • Rational-ize vs *ize-rational • Cretin-ous vs. *cretin-ly vs. *cretin-acious • Possible ‘rules’ • Suffixes are suffixes and prefixes are prefixes • Certain affixes attach to certain types of stems (nouns, verbs, etc.) • Certain stems can/cannot take certain affixes
Semantics: In English, un- cannot attach to adjectives that already have a negative connotation: • Unhappy vs. *unsad • Unhealthy vs. *unsick • Unclean vs. *undirty • Phonology: In English, -er cannot attach to words of more than two syllables • great, greater • Happy, happier • Competent, *competenter • Elegant, *eleganter • Unruly, ?unrulier
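The phonological constraint on -er can be approximated in code. This is a hedged sketch only: counting runs of vowel letters is a crude stand-in for real syllable counting, the fallback to "more" plus the adjective is standard English usage rather than something stated on the slide, and the function names are invented for the example.

```python
import re


def count_syllables(word):
    """Approximate the syllable count as the number of vowel-letter runs."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def comparative(adjective):
    """Attach -er only to adjectives of at most two (approximate) syllables."""
    if count_syllables(adjective) > 2:
        return f"more {adjective}"
    # Consonant + y becomes -ier (happy -> happier).
    if adjective.endswith("y") and len(adjective) > 1 and adjective[-2] not in "aeiou":
        return adjective[:-1] + "ier"
    # Stems ending in e only need -r.
    if adjective.endswith("e"):
        return adjective + "r"
    return adjective + "er"


if __name__ == "__main__":
    for adjective in ["great", "happy", "competent", "elegant", "unruly"]:
        print(adjective, "->", comparative(adjective))
```

Under this approximation unruly counts as three syllables and is sent to the periphrastic form, even though ?unrulier is only marginal rather than clearly bad, which shows how a hard rule can miss gradient judgments.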
Morphological Parsing • These regularities enable us to create software to parse words into their component parts • Known words and new ones (e.g. Pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)
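A toy affix-stripping parser gives a feel for what such software does. Everything in this sketch is an assumption made for illustration: the three-word stem lexicon, the affix lists, and the single spelling rule (restoring a dropped final e). It has no morphotactic constraints, so a realistic analyzer would need far more than this.

```python
# Hypothetical toy lexicon and affix inventory, chosen to cover the slide examples.
STEMS = {"hospital", "columbian", "walk"}
PREFIXES = ["pre"]
SUFFIXES = ["ation", "ize", "ing", "ed", "s"]


def parse(word):
    """Return a list of morphs for `word`, or None if no analysis is found."""
    word = word.lower()
    if word in STEMS:
        return [word]
    # Try peeling off a prefix and parsing what is left.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse(word[len(prefix):])
            if rest is not None:
                return [prefix + "-"] + rest
    # Try peeling off a suffix; also try restoring a dropped final e
    # (hospitaliz + ation comes from hospitalize + ation).
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[:-len(suffix)]
            for candidate in (base, base + "e"):
                rest = parse(candidate)
                if rest is not None:
                    return rest + ["-" + suffix]
    return None


if __name__ == "__main__":
    for word in ["prehospitalization", "Columbianize", "Columbianization", "walked"]:
        print(word, "->", parse(word))
```

Because the parser works from stems and affixes rather than a full word list, it can analyze a new word such as Columbianization as long as its parts are known.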
Morphological Representations: Evidence from Human Performance • Hypotheses: • Full listing hypothesis: words listed • Minimum redundancy hypothesis: morphemes listed • Experimental evidence: • Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither • Regularly inflected forms (e.g. cars) prime the stem (car), but derived forms do not (e.g. management does not prime manage)
But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart) • Speech errors suggest affixes must be represented separately in the mental lexicon • ‘easy enoughly’ for ‘easily enough’
Summing Up • Different languages have different morphological systems • If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word • Morphological regularities provide basis for building (automatic) morphological analyzers • Next time: Read Ch 3.2-3.6 • HW1 will be assigned (check the course syllabus and courseworks)
Announcements • HW1 will now be due 9/25/07 • WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to hila@cs.columbia.edu)