Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang

A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

Presentation Transcript


  1. A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union ICT Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter: Sitong Yang

  2. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  3. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  4. Introduction • Motivation • Main idea

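  5. Motivation • Frequent history: "potentially", seen with continuations such as large, dangerous, serious • Rare history: "hypothetically", seldom seen, so the model knows little about what follows it (large? dangerous? serious?) • How can a language model transfer what it knows about the frequent history to the rare one? • Idea: use morphology (both words end in -ly)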

  8. Main idea • Goal • perplexity reduction (PD) for a large number of languages • Features • Morphology • Shape features • Parameters • frequency threshold θ • number of suffixes used φ • morphological segmentation algorithm

  9. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  10. Modeling of morphology and shape • Morphology • Shape features • Similarity measure

  11. Morphology • Automatic suffix identification algorithms: Reports, Morfessor and Frequency • Parameter φ: the φ most frequent suffixes are used
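A frequency-based suffix selector can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's Frequency algorithm: candidate suffixes are assumed to be the last 1 to `max_len` characters of each word type, a word must keep at least `min_stem` characters as its stem, and the φ most frequent candidates are kept. The names `top_suffixes`, `longest_suffix`, `max_len` and `min_stem` are hypothetical.

```python
from collections import Counter


def top_suffixes(vocabulary, phi, max_len=4, min_stem=2):
    """Return the phi most frequent word-final strings (candidate suffixes).

    Assumption: candidates are the last 1..max_len characters of each word
    type, and the word must keep at least min_stem characters as a stem.
    """
    counts = Counter()
    for word in vocabulary:
        for k in range(1, max_len + 1):
            if len(word) - k >= min_stem:
                counts[word[-k:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]


def longest_suffix(word, suffixes):
    """Map a word to the longest selected suffix it ends with ('' if none)."""
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    return max(matches, key=len, default="")
```

Raising φ lets more (and longer) suffixes survive, which is the knob the slides vary; the word-to-suffix mapping then serves as the morphological part of a word's class.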

  12. Shape features • capitalization • special characters • word length
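The sketch below shows one hypothetical way to turn shape features into a class label and combine it with a suffix and the frequency threshold θ. The concrete features (a capitalization pattern, digit and special-character flags, a two-way length bucket) and the rule that words with frequency ≥ θ keep their own singleton class are assumptions for illustration; the actual shape features are those of Müller and Schütze (2011). All function names are hypothetical.

```python
from collections import Counter


def shape_signature(word):
    """Hypothetical coarse shape features of a word."""
    if word.isupper():
        cap = "ALLCAPS"
    elif word[:1].isupper():
        cap = "Cap"
    else:
        cap = "lower"
    has_digit = any(ch.isdigit() for ch in word)
    has_special = any(not ch.isalnum() for ch in word)
    length = "short" if len(word) <= 3 else "long"
    return (cap, has_digit, has_special, length)


def assign_classes(token_counts, suffixes, theta):
    """Frequent words keep a singleton class; rare words share a (suffix, shape) class."""
    classes = {}
    for word, count in token_counts.items():
        if count >= theta:
            classes[word] = ("WORD", word)  # assumed: frequent word is its own class
        else:
            matches = [s for s in suffixes
                       if word.lower().endswith(s) and len(word) > len(s)]
            suffix = max(matches, key=len, default="")
            classes[word] = ("CLASS", suffix, shape_signature(word))
    return classes
```

With this design, two rare words share statistics only when both their suffix and their shape agree, so "Walked" and "jumped" land in different classes because of capitalization.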

  13. Similarity measure • The similarity measure and the details of the shape features follow prior work (Müller and Schütze, 2011).

  14. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  15. Experimental Setup • Baseline • Morphological class language model • Distributional class language model • Corpus

  16. Experimental Setup • Experiments: • SRILM, Kneser-Ney (KN) smoothing, generic class implementation, optimal interpolation parameters • Baseline • modified KN model

  17. Morphological class language model • Class-based language model: • Word emission probability:
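The word emission probability and the class transition can be illustrated with the standard class-based bigram factorization, P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)), with emission P(w | c) = N(w) / N(c). This is an assumed textbook form, since the slide's equations are not in the transcript; the paper's model uses longer histories and KN smoothing, which this unsmoothed maximum-likelihood sketch does not. `ClassBigramLM` is a hypothetical name.

```python
from collections import Counter


class ClassBigramLM:
    """Maximum-likelihood class-based bigram model (no smoothing)."""

    def __init__(self, tokens, word_to_class):
        self.cls = word_to_class
        classes = [word_to_class[w] for w in tokens]
        self.word_counts = Counter(tokens)
        self.class_counts = Counter(classes)
        self.class_bigrams = Counter(zip(classes, classes[1:]))
        # How often each class occurs as the left element of a bigram.
        self.history_counts = Counter(classes[:-1])

    def emission(self, word):
        """P(w | c(w)) = N(w) / N(c(w))."""
        return self.word_counts[word] / self.class_counts[self.cls[word]]

    def transition(self, prev_class, cls):
        """P(c | c_prev) = N(c_prev, c) / N(c_prev as history)."""
        denom = self.history_counts[prev_class]
        return self.class_bigrams[(prev_class, cls)] / denom if denom else 0.0

    def prob(self, prev_word, word):
        """P(w | prev) = P(c(w) | c(prev)) * P(w | c(w))."""
        return self.transition(self.cls[prev_word], self.cls[word]) * self.emission(word)
```

Because a rare word borrows its transition statistics from its whole class, it receives a sensible probability even when the word bigram itself was never observed.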

  18. Morphological class language model • Final model PM interpolates PC with a modified KN model: • Unknown-word estimation:
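A minimal sketch of linear interpolation, P_M = λ·P_C + (1 - λ)·P_KN, with λ chosen by grid search to minimize validation perplexity. The single global λ, the grid, and all function names are assumptions; the paper's interpolation weights and its unknown-word estimate are not reproduced here. P_KN is assumed to be strictly positive, as a smoothed model's probabilities are, so the grid stops below λ = 1.

```python
import math


def interpolate(p_class, p_kn, lam):
    """P_M = lam * P_C + (1 - lam) * P_KN for one event."""
    return lam * p_class + (1.0 - lam) * p_kn


def perplexity(probs):
    """Perplexity of a sequence given its per-token probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def tune_lambda(pairs, grid=None):
    """Pick the lambda that minimizes validation perplexity.

    pairs: list of (P_C(w|h), P_KN(w|h)) for each validation token.
    The default grid 0.0, 0.05, ..., 0.95 excludes 1.0 so that a zero
    class-model probability never yields log(0).
    """
    grid = grid or [i / 20 for i in range(20)]
    return min(grid, key=lambda lam: perplexity(
        [interpolate(pc, pk, lam) for pc, pk in pairs]))
```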

  19. Morphological class language model • Modified class model PC'

  20. Distributional class language model • PD has the same form as PM • The only difference: the classes are morphological for PM and distributional for PD • Whole-context distributional vector space model
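A distributional class model needs a vector for each word built from the contexts it occurs in. The sketch below is a simplification, not the whole-context model on the slide: it counts only the immediate left and right neighbours of each target word and compares words by cosine similarity. Restricting `targets` to words below the frequency threshold θ mirrors the point that DM clusters only less frequent words. The clustering step itself (for example k-means over these vectors) is omitted, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, targets):
    """Sparse left/right-neighbour count vectors for each target word."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if w not in targets:
            continue
        if i > 0:
            vecs[w]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vecs[w]["R:" + tokens[i + 1]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```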

  21. Corpus • training set (80%) • validation set (10%) • test set (10%)
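An 80/10/10 split can be written as below. Whether the corpus was split contiguously or after shuffling is not stated on the slide; this sketch assumes a contiguous split in document order, and `split_corpus` is a hypothetical helper.

```python
def split_corpus(sentences, train=0.8, valid=0.1):
    """Split sentences, in order, into training/validation/test portions."""
    n = len(sentences)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (sentences[:n_train],
            sentences[n_train:n_train + n_valid],
            sentences[n_train + n_valid:])
```

The test portion takes whatever remains, so rounding never drops sentences.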

  22. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  23. Results and Discussion • Morphological model vs. Distributional model • Sensitivity analysis of parameters

  24. Morphological model vs. Distributional model • MM: the more morphologically complex the language, the larger the perplexity reduction and the larger φ • MM yields considerable perplexity reductions of 3% to 11% • Frequency performs surprisingly well • DM is better than MM in only 4 cases • DM: clustering is restricted to less frequent words
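Perplexity and relative perplexity reduction, the quantity behind the 3% to 11% figures, can be computed as below. Defining the reduction relative to the baseline perplexity is the usual convention and is assumed here; the function names are hypothetical. As a made-up arithmetic example, a baseline perplexity of 200 that drops to 180 is a 10% reduction.

```python
import math


def perplexity(log_probs):
    """Perplexity from natural-log per-token probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def perplexity_reduction(pp_baseline, pp_model):
    """Relative reduction of the model over the baseline, in percent."""
    return 100.0 * (pp_baseline - pp_model) / pp_baseline
```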

  25. Morphological model vs. Distributional model

  26. Sensitivity analysis of parameters • For each parameter: its best and worst values and the difference in perplexity improvement between the two • θ • strong influence on perplexity reduction • positively correlated with morphological complexity • φ and segmentation algorithms • negligible effect • Frequency performs best
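The best-versus-worst comparison can be sketched as follows. The assumption is that each value of a parameter is scored by its mean perplexity reduction over all runs that use it; the run format (dictionaries with a `pr` key) and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sensitivity(runs, parameter):
    """Best value, worst value, and their spread in mean perplexity reduction.

    runs: list of dicts such as {"theta": ..., "phi": ..., "algo": ..., "pr": ...}
    """
    by_value = defaultdict(list)
    for run in runs:
        by_value[run[parameter]].append(run["pr"])
    means = {value: mean(prs) for value, prs in by_value.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return best, worst, means[best] - means[worst]
```

A large spread (as reported for θ) marks a parameter worth tuning; a small one (as for φ and the segmentation algorithm) means a default is usually fine.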

  27. Sensitivity analysis of parameters

  28. Outline • Introduction • Modeling of morphology and shape • Experimental Setup • Results and Discussion • Conclusion

  29. Conclusion • Features: morphology and shape • Result: perplexity reductions of 3% to 11% • Parameters: • θ: considerable influence • φ and segmentation algorithms: small effect

  30. Future Work • A model that interpolates KN, morphological class model and distributional class model.
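The proposed future model can be written as a three-way linear interpolation. The notation below is hypothetical, since the slide gives no formula: P_C^morph and P_C^dist stand for the morphological and distributional class models, and the weights are assumed to be non-negative and to sum to one, as in standard linear interpolation.

```latex
P(w \mid h) = \lambda_{\mathrm{KN}}\, P_{\mathrm{KN}}(w \mid h)
            + \lambda_{\mathrm{morph}}\, P_{C}^{\mathrm{morph}}(w \mid h)
            + \lambda_{\mathrm{dist}}\, P_{C}^{\mathrm{dist}}(w \mid h),
\qquad \lambda_{\mathrm{KN}} + \lambda_{\mathrm{morph}} + \lambda_{\mathrm{dist}} = 1,
\quad \lambda_{\mathrm{KN}}, \lambda_{\mathrm{morph}}, \lambda_{\mathrm{dist}} \ge 0
```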

  31. My thought • Language modeling for minority languages

  32. ICT Q&A?

  33. ICT Thank you!