A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union ICT Thomas Müller, Hinrich Schütze and Helmut Schmid ACL June 3-8, 2012 Reporter: Sitong Yang
Outline Introduction Modeling of morphology and shape Experimental Setup Results and Discussion Conclusion
Introduction • Motivation • Main idea
Motivation • Rare history: "hypothetically" + large / dangerous / serious • Frequent history: "potentially" + large / dangerous / serious • How can a language model transfer what it knows about the frequent history to the rare one? • Morphology
Main idea • Goal • perplexity reduction (PD) for a large number of languages • Features • Morphology • Shape features • Parameters • frequency threshold θ • number of suffixes used φ • morphological segmentation algorithms
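The goal above is stated in terms of perplexity reduction. As a minimal sketch of how that quantity is usually computed, assuming per-token natural-log probabilities are available (the function names `perplexity` and `perplexity_reduction` are illustrative, not taken from the slides):

```python
import math


def perplexity(log_probs):
    """Perplexity of a token sequence given its natural-log probabilities."""
    if not log_probs:
        raise ValueError("need at least one token")
    # exp of the negative average log-probability
    return math.exp(-sum(log_probs) / len(log_probs))


def perplexity_reduction(pp_baseline, pp_model):
    """Relative perplexity reduction of a model against a baseline (a fraction)."""
    return (pp_baseline - pp_model) / pp_baseline
```

A reduction of 0.05 under this definition corresponds to a 5% lower perplexity than the baseline.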
Modeling of morphology and shape • Morphology • Shape features • Similarity measure
Morphology • Automatic suffix identification algorithms: Reports, Morfessor and Frequency • Parameter: the φ most frequent suffixes
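The slide does not spell out how the Frequency method picks suffixes. One plausible reading, sketched here under that assumption, is to count word-final character sequences over the vocabulary and keep the φ most frequent ones. The names `most_frequent_suffixes` and `suffix_feature` and the parameters `max_len` and `min_stem` are hypothetical.

```python
from collections import Counter


def most_frequent_suffixes(vocabulary, phi, max_len=5, min_stem=2):
    """Return the phi most frequent word-final character sequences.

    Assumption: each word type votes once for every suffix of length
    1..max_len that leaves at least min_stem characters of stem.
    """
    counts = Counter()
    for word in sorted(set(vocabulary)):
        for n in range(1, max_len + 1):
            if len(word) - n >= min_stem:
                counts[word[-n:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]


def suffix_feature(word, suffixes):
    """Longest known suffix that properly ends the word, or None."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return suffix
    return None
```

With such a feature, "hypothetically" and "potentially" would both map to a suffix like "ly", which is the kind of link the motivation slide asks for.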
Shape features • capitalization • special characters • word length
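The three shape feature groups above could be extracted roughly as follows. This is only an illustrative sketch: the concrete feature names and the length cap of 10 are assumptions, and the exact feature set used by the authors is the one defined in their prior work.

```python
def shape_features(word):
    """Hypothetical shape features: capitalization, special characters, length."""
    return {
        # capitalization
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        # special characters
        "has_digit": any(ch.isdigit() for ch in word),
        "has_hyphen": "-" in word,
        "has_other_special": any(not ch.isalnum() and ch != "-" for ch in word),
        # word length, capped so that very long words share one value
        "length_bucket": min(len(word), 10),
    }
```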
Similarity measure • The similarity measure and the details of the shape features follow prior work (Müller and Schütze, 2011).
Experimental Setup • Baseline • Morphological class language model • Distributional class language model • Corpus
Experimental Setup • Experiments: • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters • Baseline • modified KN model
Morphological class language model Class-based language model: Word emission probability:
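The formulas on this slide did not survive extraction. For orientation only, the standard form of a class-based N-gram model is given here, where g(w) denotes the class of word w and N(w) its training count. Both symbols are notation assumed for this sketch, and the paper's exact equations may differ, for example in how unknown words are handled.

```latex
% Standard class-based N-gram factorization (assumed notation)
P_C\left(w_i \mid w_{i-N+1}^{i-1}\right)
  = P\left(g(w_i) \mid g(w_{i-N+1}) \dots g(w_{i-1})\right)
    \cdot P\left(w_i \mid g(w_i)\right)

% Maximum-likelihood word emission probability for a word w in class c
P(w \mid c) = \frac{N(w)}{\sum_{w' \in c} N(w')}
```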
Morphological class language model Final model PM interpolates PC with a modified KN model: Unknown word estimation:
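Interpolating a class model with a KN model can be sketched as a linear mixture whose weight is tuned on held-out data. The single global weight `lam`, the grid search, and the function names are simplifying assumptions for illustration. The slides do not show the actual weighting scheme.

```python
import math


def interpolate(p_class, p_kn, lam):
    """Linear interpolation of a class-model and a KN-model probability."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_class + (1.0 - lam) * p_kn


def best_lambda(pairs, grid=None):
    """Pick the weight that minimizes negative log-likelihood on held-out tokens.

    pairs: list of (p_class, p_kn) probabilities, one pair per held-out token.
    """
    if grid is None:
        grid = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0

    def neg_log_lik(lam):
        total = 0.0
        for p_class, p_kn in pairs:
            p = interpolate(p_class, p_kn, lam)
            if p <= 0.0:
                return math.inf
            total -= math.log(p)
        return total

    return min(grid, key=neg_log_lik)
```

Minimizing negative log-likelihood on the validation set is equivalent to minimizing validation perplexity, since the token count is fixed.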
Morphological class language model • Modified class model PC'
Distributional class language model • PD has the same form as PM • The difference is that the classes are morphological for PM and distributional for PD • Whole-context distributional vector space model
Corpus • training set (80%) • validation set (10%) • test set (10%)
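An 80/10/10 split like the one above can be sketched as follows. Whether the original split was contiguous or shuffled is not stated on the slide; this sketch assumes a contiguous split, and `split_corpus` is a hypothetical helper.

```python
def split_corpus(sentences, train_pct=80, valid_pct=10):
    """Split a list of sentences into contiguous train/validation/test parts."""
    if train_pct < 0 or valid_pct < 0 or train_pct + valid_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    n = len(sentences)
    # integer arithmetic avoids floating-point rounding surprises
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    train = sentences[:n_train]
    valid = sentences[n_train:n_train + n_valid]
    test = sentences[n_train + n_valid:]
    return train, valid, test
```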
Results and Discussion • Morphological model vs. Distributional model • Sensitivity analysis of parameters
Morphological model vs. Distributional model • MM: the more morphologically complex the language, the larger the perplexity reduction and the larger the φ • MM: considerable perplexity reductions of 3%-11% • Frequency performs surprisingly well • Only in 4 cases is DM better than MM • DM: clustering restricted to less frequent words
Sensitivity analysis of parameters • Best and worst values of each parameter and the difference in perplexity improvement between the two • θ • strong influence on PD • positively correlated with morphological complexity • φ and segmentation algorithms • negligible effect • Frequency performs best
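The best-versus-worst comparison described above can be sketched as a small helper. It takes a mapping from each tried parameter value to the perplexity reduction obtained with it. The function name `sensitivity` is hypothetical, and the numbers in the usage lines are made-up placeholders, not results from the paper.

```python
def sensitivity(results):
    """Return (best value, worst value, gap) for one parameter.

    results: dict mapping a parameter value to the perplexity reduction
    obtained with it (larger is better).
    """
    if not results:
        raise ValueError("need at least one parameter value")
    best = max(results, key=results.get)
    worst = min(results, key=results.get)
    return best, worst, results[best] - results[worst]


# Placeholder values only, chosen to show the call; not taken from the paper.
best, worst, gap = sensitivity({"setting_a": 1.0, "setting_b": 3.0, "setting_c": 2.0})
```

A large gap marks a parameter worth tuning per language, while a gap near zero marks one that can be fixed.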
Conclusion • Features: morphology and shape features • Result: perplexity reductions of 3%-11% • Parameters: • θ: considerable influence • φ and segmentation algorithms: small effect
Future Work • A model that interpolates KN, morphological class model and distributional class model.
my thought • Minority language model
ICT Q&A?
ICT Thank you!