A memory-based learning-plus-inference approach to morphological analysis

This presentation describes a memory-based learning plus inference approach to morphological analysis: local decisions (segmentation, spelling changes, part-of-speech labelling) are learned as classification tasks, and a global inference step coordinates them over the whole word. Experiments on English, Dutch, and Arabic are presented.

Presentation Transcript


  1. A memory-based learning-plus-inference approach to morphological analysis Antal van den Bosch With Walter Daelemans, Ton Weijters, Erwin Marsi, Abdelhadi Soudi, and Sander Canisius ILK / Language and Information Sciences Dept. Tilburg University, The Netherlands FLaVoR Workshop, 17 November 2006, Leuven

  2. Learning plus inference • Paradigmatic solution to natural language processing tasks • Decomposition: • The disambiguation of local, elemental ambiguities in context • A holistic, global coordination of local decisions over the entire sequence

  3. Learning plus inference • Example: grapheme-phoneme conversion • Local decisions • The mapping of a vowel letter in context to a vowel phoneme with primary stress • Global coordination • Making sure that there is only one primary stress

  4. Learning plus inference • Example: dependency parsing • Local decisions • The relation between a noun and a verb is of the “subject” type • Global coordination • The verb only has one subject relation

  5. Learning plus inference • Example: named entity recognition • Local decisions • A name that can be a location or a person is a location in this context • Global coordination • Everywhere in the text this name always refers to the location

  6. Learning plus inference • Local decision making by learning • All NLP decisions can be recast as classification tasks • (Daelemans, 1996: segmentation or identification) • Global coordination by inference • Given local proposals that may conflict, find the best overall solution • (e.g. minimizing conflict, or adhering to language model) • Collins and colleagues; Manning and Klein and colleagues; Dan Roth & colleagues; Marquez and Carreras; etc.

  7. L+I and morphology • Segmentation boundaries, spelling changes, and PoS tagging recast as classification • Global inference checks for • Noun stem followed by noun inflection • Infix in a noun-noun compound is surrounded by two nouns • Etc.

  8. Talk overview • English morphological segmentation • Easy learning • Inference not really needed • Dutch morphological analysis • Learning operations rather than simple decisions • Reasonably complex inference • Arabic morphological analysis • Learning as an attempt at lowering the massive ambiguity • Inference as an attempt to separate the chaff from the grain

  9. English segmentation • (Van den Bosch, Daelemans, Weijters, NeMLaP 1996) • Morphological segmentation as classification • Versus traditional approach: • E.g. MITalk’s DECOMP, analysing scarcity: • First analysis: scar|city - both stems found in morpheme lexicon, and validated as a possible analysis • Second analysis: scarc|ity - stem scarce found due to application of e-deletion rule; suffix -ity found; validated as a possible analysis • Cost-based heuristic prefers stem|derivation over stem|stem • Ingredients: morpheme lexicons, finite state analysis validator, spelling changing rules, cost heuristics • Validator, rules, and cost heuristics are costly knowledge-based resources

  10. English segmentation • Segmentations as local decisions • To segment or not to segment • If segment, identify start (or end) of • Stem • Affixes • Inflectional morpheme

  11. English segmentation • Three tasks: given a letter in context, is it the start of • a segment or not • a derivational morpheme (stem or affix), inflection, or not • a stem, a stress-affecting affix, a stress-neutral affix, an inflection, or not

  12. English segmentation

  13. Local classification • Memory-based learning • k-nearest neighbor classification • (Daelemans & Van den Bosch, 2005) • E.g. instance #9: m a l i t i e ? • Nearest neighbors: a lot of evidence for “2”:
  Instance         Class   Distance   Clones
  m a l i t i e    2       0          2x
  t a l i t i e    2       1          3x
  u a l i t i e    2       1          2x
  i a l i t i e    2       1          11x
  g a l i t i e    2       1          2x
  n a l i t i e    2       1          7x
  r a l i t i e    2       1          5x
  c a l i t i e    2       1          7x
  p a l i t i e    2       1          2x
  h a l i t i c    s       2          1x
  …
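A minimal sketch of the k-nearest-neighbour lookup behind this example, assuming the overlap distance defined on the next slide; the stored windows and labels are a small subset of the table above, and all function and variable names are illustrative, not the paper's implementation.

```python
from collections import Counter

# Minimal k-NN sketch with an overlap distance, as used in memory-based learning.
# Each stored instance is a fixed-width letter window plus its class label.
memory = [
    (("m", "a", "l", "i", "t", "i", "e"), "2"),
    (("t", "a", "l", "i", "t", "i", "e"), "2"),
    (("h", "a", "l", "i", "t", "i", "c"), "s"),
]

def overlap_distance(x, y, weights=None):
    """Sum the weights of the mismatching feature positions."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def knn_classify(query, memory, k=1):
    """Return the majority class among the k nearest stored instances."""
    ranked = sorted(memory, key=lambda item: overlap_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(("m", "a", "l", "i", "t", "i", "e"), memory, k=3))  # -> "2"
```

With the full memory of the slide, the many distance-1 neighbours labelled “2” outvote the single distance-2 neighbour labelled “s”, which is the point of the example.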

  14. Memory-based learning • Similarity function: • X and Y are instances • n is the number of features • xi is the value of the ith feature of X • wi is the weight of the ith feature
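The formula itself did not survive the transcript; assuming the standard weighted overlap metric used in memory-based learning (Daelemans & Van den Bosch, 2005), the distance between instances X and Y is

```latex
\Delta(X, Y) = \sum_{i=1}^{n} w_i \, \delta(x_i, y_i),
\qquad
\delta(x_i, y_i) =
\begin{cases}
0 & \text{if } x_i = y_i \\
1 & \text{otherwise}
\end{cases}
```

where the feature weights w_i are commonly information-gain or gain-ratio weights; the "similarity function components" on the next slide presumably refer to these.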

  15. Similarity function components

  16. Generalizing lexicon • A memory-based morphological analyzer is • A lexicon: 100% accurate reconstruction of all examples in training material • At the same time, capable of processing unseen words • In essence, unseen words are the only problem remaining • CELEX Dutch has +300k words; average coverage of text is 90%-95% • Evaluation should focus solely on unseen words • So, a held-out test from CELEX is fairly representative of unseen words

  17. Experiments • CELEX English • 65,558 segmented words • 573,544 instances • 10-fold cross-validation • Measuring accuracy • M1: 88.0% correct test words • M2: 85.6% correct test words • M3: 82.4% correct test words

  18. Add inference • (Van den Bosch and Canisius, SIGPHON 2006) • Original approach: only learning • Now: inference • Constraint satisfaction inference • Based on Van den Bosch and Daelemans (CoNLL 2005) trigram prediction

  19. Constraint satisfaction inference • Predict trigrams, and use them as completely as possible • Formulate the inference procedure as a constraint satisfaction problem • Constraint satisfaction • Assigning values to a number of variables while satisfying certain predefined constraints • Constraint satisfaction for inference • Each token maps to a variable, the domain of which corresponds to the three candidate labels • Constraints are derived from the predicted trigrams

  20. Constraint satisfaction inference
  Input letter → predicted output trigram (word hand):
  (1) h → _ h {
  (2) a → h { n
  (3) n → { n t
  (4) d → n d _
  Trigram constraints: h,a,n → h,{,n; a,n,d → {,n,t
  Bigram constraints: h,a → h,{; h,a → h,{; a,n → {,n; a,n → {,n; n,d → n,t; n,d → n,d
  Unigram constraints: h → h; h → h; a → {; a → {; a → {; n → n; n → n; n → n; d → t; d → d

  24. Constraint satisfaction inference
  Input letter → predicted output trigram (word hand):
  (1) h → _ h {
  (2) a → h { n
  (3) n → { n t
  (4) d → n d _
  Trigram constraints: h,a,n → h,{,n; a,n,d → {,n,t
  Bigram constraints: h,a → h,{; h,a → h,{; a,n → {,n; a,n → {,n; n,d → n,t; n,d → n,d
  Unigram constraints: h → h; h → h; a → {; a → {; a → {; n → n; n → n; n → n; d → t; d → d
  Conflicting constraints: n,d → n,t vs. n,d → n,d; d → t vs. d → d
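A hedged sketch of how the constraints above can be derived from the per-letter trigram predictions; the word, predictions, and resulting constraint counts match the example on the slide above, while the function and variable names are mine.

```python
# Sketch: turn per-letter trigram predictions into n-gram constraints,
# as in the "hand" example above. "_" marks positions outside the word.
word = ["h", "a", "n", "d"]
predictions = [("_", "h", "{"), ("h", "{", "n"), ("{", "n", "t"), ("n", "d", "_")]

def derive_constraints(word, predictions):
    trigram, bigram, unigram = [], [], []
    for i, (prev, cur, nxt) in enumerate(predictions):
        # unigram constraints: each predicted label constrains one position
        if i - 1 >= 0:
            unigram.append((i - 1, prev))
        unigram.append((i, cur))
        if i + 1 < len(word):
            unigram.append((i + 1, nxt))
        # bigram constraints: adjacent label pairs that fall inside the word
        if i - 1 >= 0:
            bigram.append(((i - 1, i), (prev, cur)))
        if i + 1 < len(word):
            bigram.append(((i, i + 1), (cur, nxt)))
        # trigram constraints: the full prediction, when it fits inside the word
        if i - 1 >= 0 and i + 1 < len(word):
            trigram.append(((i - 1, i, i + 1), (prev, cur, nxt)))
    return trigram, bigram, unigram

tri, bi, uni = derive_constraints(word, predictions)
print(len(tri), len(bi), len(uni))  # -> 2 6 10, matching the lists above
```

Each prediction contributes one unigram constraint per covered position, one bigram constraint per adjacent pair inside the word, and one trigram constraint when the whole prediction falls inside the word; conflicts arise where overlapping predictions disagree.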

  25. Weighted constraint satisfaction • Extension of constraint satisfaction to deal with overconstrainedness • Each constraint has a weight associated with it • Optimal solution assigns those values to the variables that optimise the sum of weights of the constraints that are satisfied • For constraint satisfaction inference, a constraint's weight should reflect the classifier's confidence in its correctness
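A minimal sketch of the weighted optimisation, continuing the "hand" example; only the constraints touching the contested final position are listed, the numeric weights are illustrative placeholders (in the setting above they would come from classifier confidence), and all names are mine.

```python
from itertools import product

# Sketch of weighted constraint satisfaction inference: choose one candidate
# label per position so that the summed weight of satisfied constraints is
# maximal. Brute force is fine here because the domains are tiny.
candidates = [{"h"}, {"{"}, {"n"}, {"d", "t"}]   # per-position label domains
constraints = [
    # (positions, required labels, weight)
    ((3,), ("t",), 1.0),                 # unigram d -> t
    ((3,), ("d",), 1.0),                 # unigram d -> d
    ((2, 3), ("n", "t"), 1.5),           # bigram n,d -> n,t
    ((2, 3), ("n", "d"), 1.5),           # bigram n,d -> n,d
    ((1, 2, 3), ("{", "n", "t"), 2.0),   # trigram a,n,d -> {,n,t
]

def best_assignment(candidates, constraints):
    best, best_score = None, float("-inf")
    for labels in product(*candidates):  # enumerate all label assignments
        score = sum(w for pos, req, w in constraints
                    if tuple(labels[p] for p in pos) == req)
        if score > best_score:
            best, best_score = labels, score
    return best, best_score

print(best_assignment(candidates, constraints))  # -> (('h', '{', 'n', 't'), 4.5)
```

With these toy weights the trigram constraint wins and the final position resolves to t; under different weights the same procedure could just as well resolve it to d, which is exactly what the confidence weighting is meant to arbitrate.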

  26. Example instances
  Left context    Focus   Right context    Uni   Tri
  _ _ _ _ _       a       b n o r m        2     -20
  _ _ _ _ a       b       n o r m a        0     20s
  _ _ _ a b       n       o r m a l        s     0s0
  _ _ a b n       o       r m a l i        0     s00
  _ a b n o       r       m a l i t        0     000
  a b n o r       m       a l i t i        0     000
  b n o r m       a       l i t i e        0     000
  n o r m a       l       i t i e s        0     001
  o r m a l       i       t i e s _        1     010
  r m a l i       t       i e s _ _        0     100
  m a l i t       i       e s _ _ _        0     000
  a l i t i       e       s _ _ _ _        0     00i
  l i t i e       s       _ _ _ _ _        i     0i-
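A small sketch that reproduces the instance table above from the word and its per-letter classes; the 5-1-5 window width and the class strings are taken from the table, while the function name and padding conventions are mine.

```python
# Sketch: build fixed-width windowed instances with unigram and trigram
# classes, reproducing the table above for the word "abnormalities".
def make_instances(letters, classes, left=5, right=5):
    pad = ["_"] * max(left, right)
    padded = pad + list(letters) + pad
    padded_cls = ["-"] + list(classes) + ["-"]   # "-" marks outside the word
    instances = []
    for i in range(len(letters)):
        j = i + len(pad)
        window = padded[j - left:j + right + 1]
        uni = classes[i]
        tri = padded_cls[i] + classes[i] + padded_cls[i + 2]
        instances.append((window, uni, tri))
    return instances

word = "abnormalities"
classes = ["2", "0", "s", "0", "0", "0", "0", "0", "1", "0", "0", "0", "i"]
for window, uni, tri in make_instances(word, classes):
    print(" ".join(window), uni, tri)
```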

  27. Results • Only learning: • M3: 82.4% correct unseen words • Learning + CSI: • M3: 85.4% correct unseen words • Mild effect.

  28. Dutch morphological analysis • (Van den Bosch & Daelemans, 1999; Van den Bosch & Canisius, 2006) • Task expanded to • Spelling changes • Part-of-speech tagging • Analysis generation • Dutch is mildly productive • Compounding • A bit more inflection than in English • Infixes, diminutives, …

  29. Dutch morphological analysis
  Left context    Focus   Right context    Uni      Tri
  _ _ _ _ _       a       b n o r m        A        -A0
  _ _ _ _ a       b       n o r m a        0        A00
  _ _ _ a b       n       o r m a l        0        000
  _ _ a b n       o       r m a l i        0        000
  _ a b n o       r       m a l i t        0        000
  a b n o r       m       a l i t e        0        000
  b n o r m       a       l i t e i        0        00+Da
  n o r m a       l       i t e i t        +Da      0+DaA_->N
  o r m a l       i       t e i t e        A_->N    +DaA_->N0
  r m a l i       t       e i t e n        0        A_->N00
  m a l i t       e       i t e n _        0        000
  a l i t e       i       t e n _ _        0        000
  l i t e i       t       e n _ _ _        0        00plural
  i t e i t       e       n _ _ _ _        plural   0plural0
  t e i t e       n       _ _ _ _ _        0        plural0-

  30. Spelling changes • Deletion, insertion, replacement • Instance rows from the previous slide:
  b n o r m       a       l i t e i        0
  n o r m a       l       i t e i t        +Da
  o r m a l       i       t e i t e        A_->N
  • abnormaliteiten analyzed as [[abnormaal]A iteit]N [en]plural • Root form has double a, wordform drops one a
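As a rough illustration only: one way to decode these classes back into the analysis above, under my reading of "+Da" as "restore the dropped a before this letter" and of the non-zero labels as segment starts. This interpretation and all names are mine, not the paper's exact class encoding.

```python
# Rough illustration only: decode per-letter classes of "abnormaliteiten"
# into segments plus one spelling repair. The reading of "+Da" as "insert
# the dropped 'a' before this letter" is an assumption, not the paper's spec.
word = "abnormaliteiten"
classes = ["A", "0", "0", "0", "0", "0", "0", "+Da", "A_->N",
           "0", "0", "0", "0", "plural", "0"]

segments, current = [], ""
for letter, cls in zip(word, classes):
    if cls in ("A", "A_->N", "plural") and current:
        segments.append(current)        # a non-0 label opens a new segment
        current = ""
    if cls == "+Da":
        current += "a"                  # spelling change: restore dropped 'a'
    current += letter
segments.append(current)

print(segments)  # -> ['abnormaal', 'iteit', 'en']
```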

  31. Part-of-speech • Selection processes in derivation:
  n o r m a       l       i t e i t        +Da
  o r m a l       i       t e i t e        A_->N
  r m a l i       t       e i t e n        0
  • Stem abnormaal is an adjective • Affix -iteit seeks an adjective to its left to turn it into a noun

  32. Experiments • CELEX Dutch: • 336,698 words • 3,209,090 instances • 10-fold cross validation • Learning only: 41.3% correct unseen words • With CSI: 51.9% correct unseen words • Useful improvement

  33. Arabic analysis (Marsi, Van den Bosch, and Soudi, 2005)

  34. Arabic analysis

  35. Arabic analysis

  36. Arabic analysis

  37. Arabic analysis

  38. Arabic analysis • Problem of undergeneration and overgeneration of analyses • Undergeneration: at k=1, • 7 out of 10 analyses of unknown words are correct, but • 4 out of 5 of the real analyses are not generated • Overgeneration: at k=10, • Only 3 out of 5 are missed, but • Half of the generated analyses are incorrect • Harmony at k=3 (F-score 0.42)
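Assuming the standard balanced F-score over generated versus true analyses, the trade-off quoted above is measured as

```latex
F = \frac{2 \cdot P \cdot R}{P + R}
```

with precision P the fraction of generated analyses that are correct and recall R the fraction of true analyses that are actually generated.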

  39. Discussion (1) • Memory-based morphological analysis • Lexicon and analyzer in one • Extremely simple algorithm • Unseen words are the remaining problem • Learning: local classifications • From simple boundary decisions • To complex operations • And trigrams • Inference: • More complex morphologies need more inference effort

  40. Discussion (2) • Ceiling not reached yet; good solutions still wanted • Particularly for unknown words with unknown stems • Also, recent work by De Pauw! • External evaluation needed • Integration with part-of-speech tagging (software packages forthcoming) • Effect on IR, IE, QA • Effect in ASR

  41. Thank you. http://ilk.uvt.nl Antal.vdnBosch@uvt.nl
