Word morphology Teaching computers to read
Research papers • Book cites: “Viewing Morphology as an Inference Process” by Krovetz, SIGIR 1993 • This is cited by: “Guessing Morphology from Terms and Corpora” by Jacquemin, SIGIR 1997 • When are different words the same word?
Porter stemming Multi-step process to remove word suffixes • Stem • Stems • Stemmed • Stemming • -ology • -ize • -ship
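The suffix-removal idea on this slide can be illustrated with a toy stripper. This is a minimal sketch, not the actual Porter algorithm: the real stemmer applies several ordered rule steps with conditions on the remaining stem, while this version makes a single pass. The suffix list, the `MIN_STEM` threshold, and the function name `strip_suffix` are assumptions made for illustration, and derivational suffixes such as -ology, -ize, and -ship are not handled.

```python
# Toy suffix stripper: a single-pass illustration, NOT the real Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # assumed, checked in this order
MIN_STEM = 3                         # assumed minimum length of the remaining stem


def strip_suffix(word: str) -> str:
    """Remove one inflectional suffix and undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        if len(stem) < MIN_STEM:
            continue
        # "stemm" -> "stem": collapse a doubled final consonant,
        # but keep doubles such as "ll", "ss", "zz" and doubled vowels.
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
            stem = stem[:-1]
        return stem
    return word


for w in ["Stem", "Stems", "Stemmed", "Stemming"]:
    print(w, "->", strip_suffix(w))
```

All four words from the slide reduce to the same stem, which is the conflation a stemmer is meant to achieve.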
Stemming problems Derivation – Meaning • Doe • Donut • Paste • Pastafarian Inflection – Syntax • Do • Doing • Done • Past (n) • Past (v)
Inflectional stemming Inflectional suffixes are safe to remove… usually • Plural: -s, -es, -ies (57%) • Tense: -ed (22%) • Aspect: -ing (21%)
Derivational stemming Words that change meaning if they are stemmed • Appreciate vs. Appreciation • Two-thirds of derivational variants appear in the dictionary • Krovetz’s solution is to leave dictionary words alone
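The "leave dictionary words alone" rule can be sketched as a dictionary-guarded stemmer. This is an illustration of the idea on the slide, not Krovetz's actual stemmer: the tiny `DICTIONARY`, the ending table, and the name `guarded_stem` are all hypothetical.

```python
# Dictionary-guarded stemming: a sketch of the idea, NOT Krovetz's implementation.
DICTIONARY = {"appreciate", "appreciation"}  # hypothetical, tiny lexicon

# Assumed inflectional endings, each with the replacements to try after removal.
ENDINGS = (
    ("ies", ("y",)),
    ("es", ("", "e")),
    ("s", ("",)),
    ("ed", ("", "e")),
    ("ing", ("", "e")),
)


def guarded_stem(word: str, dictionary=DICTIONARY) -> str:
    """Strip an inflectional ending only if the result is a dictionary word."""
    word = word.lower()
    if word in dictionary:
        return word  # dictionary words are left alone
    for ending, replacements in ENDINGS:
        if word.endswith(ending):
            base = word[: -len(ending)]
            for replacement in replacements:
                candidate = base + replacement
                if candidate in dictionary:
                    return candidate
    return word  # nothing safe to remove


print(guarded_stem("appreciation"))   # kept: it is already a dictionary word
print(guarded_stem("appreciated"))    # inflected form, reduced to its base
print(guarded_stem("appreciations"))  # plural, reduced to the dictionary noun
```

Because both "appreciate" and "appreciation" are in the lexicon, they stay distinct, while their inflected forms are still folded back onto them.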
Inferring morphology • Jacquemin asserts morphology can be derived from the corpus • Word truncation • Multi-word term conflation • Classification & filtering • Clustering
Statistical weighting • Different segments of terms are given different statistical weights
Word classification • Jacquemin’s algorithm tolerates errors in conflation • Errors are filtered statistically • Rare and domain-specific terms are conflated • Gene rearrangement / Genetic rearrangement • Artificial ventilation / Artificially ventilated • North Africa / Northern Africa • Cirrhosis / Cirrhosia • Pulsating flow / Pulsatile flow • The algorithm acts like a snap-to-grid for text
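The term pairs above can be reproduced with a rough sketch of truncation-based conflation: each word of a multi-word term is cut to a fixed-length prefix, and terms that share the same sequence of prefixes are grouped together. This is only an approximation of the word-truncation step, not Jacquemin's published algorithm. The prefix length of 4 and the names `truncation_key` and `conflate` are assumptions, and the statistical filtering and clustering stages are omitted, so spurious matches are to be expected on real corpora.

```python
from collections import defaultdict

PREFIX_LEN = 4  # assumed truncation length; not taken from the paper


def truncation_key(term: str, prefix_len: int = PREFIX_LEN) -> tuple:
    """Truncate every word of a term to its first prefix_len characters."""
    return tuple(word[:prefix_len] for word in term.lower().split())


def conflate(terms, prefix_len: int = PREFIX_LEN) -> list:
    """Group terms whose truncated word sequences are identical."""
    groups = defaultdict(list)
    for term in terms:
        groups[truncation_key(term, prefix_len)].append(term)
    # Keep only groups in which at least two terms were conflated.
    return [group for group in groups.values() if len(group) > 1]


terms = [
    "Gene rearrangement", "Genetic rearrangement",
    "Artificial ventilation", "Artificially ventilated",
    "North Africa", "Northern Africa",
    "Cirrhosis", "Cirrhosia",
    "Pulsating flow", "Pulsatile flow",
]
for group in conflate(terms):
    print(" / ".join(group))
```

A short prefix pulls many variants onto the same key, which is the snap-to-grid effect. It also over-conflates unrelated terms, which is why a statistical filtering stage follows in the approach described above.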