
Semantic n-gram language modeling with the latent maximum entropy principle


Presentation Transcript


  1. Semantic n-gram language modeling with the latent maximum entropy principle. Shaojun Wang, Dale Schuurmans, Fuchun Peng, and Yunxin Zhao. NTNU Speech Lab

  2. Reference • Shaojun Wang, Dale Schuurmans, and Yunxin Zhao, “The Latent Maximum Entropy Principle”

  3. Introduction • There are various kinds of language models that can be used to capture different aspects of natural language • Several techniques for combining language models have been investigated

  4. Latent Maximum Entropy • Given features f_1, …, f_N over an observed variable x and a latent variable y, select the joint model p(x, y) maximizing the entropy H(p) = -\sum_{x,y} p(x,y) \log p(x,y) subject to \sum_{x,y} p(x,y) f_i(x,y) = \sum_x \tilde{p}(x) \sum_y p(y \mid x) f_i(x,y), i = 1, …, N, where \tilde{p}(x) is the empirical distribution of the observed data • Regularized LME principle: allow the model expectation to differ from the latent-completed empirical expectation by a penalized slack, maximizing H(p) - U(c) subject to \sum_{x,y} p(x,y) f_i(x,y) - \sum_x \tilde{p}(x) \sum_y p(y \mid x) f_i(x,y) = c_i, i = 1, …, N
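
Some context from the cited LME paper, not text on this slide: feasible solutions are sought within the log-linear family, which turns the constrained entropy problem into fitting the weights \lambda:

    p_\lambda(x, y) = \frac{1}{Z_\lambda} \exp\Big( \sum_{i=1}^{N} \lambda_i f_i(x, y) \Big), \qquad Z_\lambda = \sum_{x, y} \exp\Big( \sum_{i=1}^{N} \lambda_i f_i(x, y) \Big)

Unlike ordinary maximum entropy, the constraints depend on p itself through p(y | x), so the problem is no longer convex and feasible \lambda are found iteratively, which is what the EM-IS algorithm on the next slide does.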

  5. EM-IS algorithm
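
A minimal runnable sketch of the EM-IS idea, under a toy setup of my own (small discrete x and y, random binary features); it illustrates the scheme from the cited paper, not the authors' implementation, and uses a GIS-style update where the paper uses IIS. The E-step computes latent-completed feature targets with the current p(y | x); the inner scaling loop then moves the model expectations toward those fixed targets:

    import numpy as np

    rng = np.random.default_rng(0)
    nx, ny, nfeat = 4, 3, 6                      # toy sizes: |X|, |Y|, number of features
    f = rng.integers(0, 2, size=(nfeat, nx, ny)).astype(float)  # features f_i(x, y)
    p_emp = np.full(nx, 1.0 / nx)                # empirical distribution p~(x)
    lam = np.zeros(nfeat)                        # log-linear weights lambda_i

    def joint(lam):
        # p_lambda(x, y) proportional to exp(sum_i lambda_i * f_i(x, y))
        logits = np.einsum("ixy,i->xy", f, lam)
        p = np.exp(logits - logits.max())
        return p / p.sum()

    C = f.sum(axis=0).max()                      # GIS constant (a common simplification)

    for em_iter in range(5):                     # outer EM loop
        p = joint(lam)
        p_y_given_x = p / p.sum(axis=1, keepdims=True)
        # E-step: targets  e~_i = sum_x p~(x) sum_y p(y | x) f_i(x, y)
        target = np.einsum("ixy,x,xy->i", f, p_emp, p_y_given_x)
        for is_iter in range(20):                # inner iterative-scaling loop
            model = np.einsum("ixy,xy->i", f, joint(lam))   # model expectation E_p[f_i]
            lam += np.log((target + 1e-12) / (model + 1e-12)) / C

Because the targets stay fixed during the inner loop, each outer pass is an ordinary maximum entropy fit; the 5 x 20 loop structure mirrors the EM and IIS iteration counts reported on the experiments slide.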

  6. Combining N-gram and PLSA Models • X = (W2, W1, W0, D, T2, T1, T0), where W2, W1, W0 are the two history words and the current word, D is the document, and T2, T1, T0 are the latent topic variables attached to the three word positions
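
A hedged reconstruction of the combined model family (the slide's formula did not survive extraction): with trigram indicator features plus PLSA-style document-topic and topic-word indicator features, the log-linear joint over X would read

    p_\lambda(x) = \frac{1}{Z_\lambda} \exp\Big( \sum_i \lambda_i\, f_i(w_2, w_1, w_0) + \sum_j \mu_j\, g_j(d, t_0) + \sum_k \nu_k\, h_k(t_0, w_0) \Big)

with analogous document-topic and topic-word terms for the (t_1, w_1) and (t_2, w_2) positions; the exact feature inventory here is my assumption, not read off the slide.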

  7. Tri-gram constraints
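
The equations on this slide were lost in extraction; as a hedged reconstruction, trigram constraints are indicator features whose model expectation is matched to the empirical trigram frequency:

    f_{abc}(x) = \mathbf{1}[\, w_2 = a,\ w_1 = b,\ w_0 = c \,], \qquad \sum_x p(x)\, f_{abc}(x) = \tilde{p}(a, b, c)

Since these features depend only on observed words, the latent completion on the right-hand side of the LME constraint reduces to the plain empirical relative frequency.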

  8. PLSA constraints
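
Likewise reconstructed rather than read off the slide: PLSA constraints would use document-topic and topic-word indicator features, and because the topics are latent, the empirical side keeps the posterior completion from the LME constraint:

    g_{qt}(x) = \mathbf{1}[\, d = q,\ t_0 = t \,], \qquad \sum_x p(x)\, g_{qt}(x) = \sum_{x_{\mathrm{obs}}} \tilde{p}(x_{\mathrm{obs}}) \sum_{x_{\mathrm{hid}}} p(x_{\mathrm{hid}} \mid x_{\mathrm{obs}})\, g_{qt}(x)

with the analogous constraint for topic-word features h_{tv}(x) = \mathbf{1}[\, t_0 = t,\ w_0 = v \,].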

  9. Efficient Feature Expectation and Inference • Normalization constant: Z_\lambda = \sum_x \exp\big( \sum_i \lambda_i f_i(x) \big) • Feature expectations: E_{p_\lambda}[f_i] = \sum_x p_\lambda(x)\, f_i(x)
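
A hedged illustration of the efficiency idea, in a single-topic toy simplification of my own: because the exponent splits into clique potentials, Z_\lambda and the marginals needed for feature expectations can be computed by pushing sums inward instead of enumerating the full joint:

    import numpy as np

    rng = np.random.default_rng(1)
    V, D, T = 50, 10, 5                                   # toy vocab/document/topic sizes
    psi_tri = np.exp(0.01 * rng.normal(size=(V, V, V)))   # clique potential over (w2, w1, w0)
    psi_dt  = np.exp(0.01 * rng.normal(size=(D, T)))      # clique potential over (d, t0)
    psi_tw  = np.exp(0.01 * rng.normal(size=(T, V)))      # clique potential over (t0, w0)

    # Z = sum over (w2, w1, w0, d, t0) of psi_tri * psi_dt * psi_tw, summed inside-out:
    m_t  = psi_dt.sum(axis=0)               # eliminate d:  shape (T,)
    m_w0 = m_t @ psi_tw                     # eliminate t0: shape (V,)
    Z = np.einsum("abc,c->", psi_tri, m_w0)

    # Trigram marginal, which yields the trigram feature expectations directly:
    p_tri = np.einsum("abc,c->abc", psi_tri, m_w0) / Z

The same elimination ordering, run with different variables left free, gives the document-topic and topic-word expectations without ever materializing the full (w2, w1, w0, d, t0) table.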

  10. Computation in Testing
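
The body of this slide was also lost; a hedged sketch of what test-time computation involves: the next-word probability is a conditional of the joint, with the latent document and topic variables summed out,

    p(w_0 \mid w_2, w_1) = \frac{\sum_{d, t_2, t_1, t_0} p_\lambda(w_2, w_1, w_0, d, t_2, t_1, t_0)}{\sum_{w} \sum_{d, t_2, t_1, t_0} p_\lambda(w_2, w_1, w, d, t_2, t_1, t_0)}

computed with the same push-the-sums-inward ordering as on the previous slide.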

  11. Experimental results • Training: NAB corpus, 87,000 documents, 38 million words, 1987-1989 • Testing: 325,000 words from 1989 • EM loop = 5, IIS loop = 20, |T| = 125

  12. Conclusion • LME provides a general statistical framework for incorporating arbitrary aspects of natural language into a parametric model
