
Statistical language modeling combining n-gram and dependency grammar






Presentation Transcript


  1. Eran Chinthaka, Ikhyun Park Statistical language modeling combining n-gram and dependency grammar

  2. Introduction • Statistical language models and ngrams • Problems with ngram models • Data sparseness • Long dependencies • Proposed Solution • Use a hybrid model of ngram and dependency grammar for language model

  3. Process • [Diagram: the Evaluator scores Test Data (Good and Bad) using Optimal Parameters and outputs Perplexity]

  4. System Architecture • [Diagram: system architecture; Training Data feeds the model-building pipeline]

  5. Experimental Setup • Data • Brown Corpus • 28,671 training sentences • 9,557 development sentences • 9,556 test sentences • Tools • Smoother and Language Model Builder • CMU-Cambridge Statistical Language Modeling Toolkit v2 (http://www.speech.cs.cmu.edu/SLM/toolkit.html) • Dependency Parser • Stanford parser (http://nlp.stanford.edu/software/lex-parser.shtml)

  6. Sentence Evaluation • Ngram Score • Dependency Score • Combined Score
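The slides do not spell out how the Combined Score is formed. A common choice, and a plausible reading given the "Optimal Parameters" of slide 3, is a log-linear interpolation of the two models' log-probabilities with a weight tuned on development data. A minimal sketch, where `lam` is a hypothetical interpolation weight, not a value from the slides:

```python
def combined_score(ngram_logprob, dep_logprob, lam=0.5):
    """Log-linear interpolation of the n-gram and dependency scores.

    `lam` is a hypothetical weight, assumed to be tuned on the
    development set; the slides do not give the exact formula.
    """
    return lam * ngram_logprob + (1.0 - lam) * dep_logprob
```

With `lam = 1.0` the score reduces to the pure n-gram model, so the baseline is a special case of the combined model.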

  7. Smoothing – Absolute Discounting • Ngram Language Model • [Equations lost in transcript: piecewise absolute-discounted n-gram probabilities with back-off cases]
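The piecewise equations on this slide did not survive the transcript. The standard absolute-discounting form subtracts a constant d from each seen bigram count and redistributes the freed mass over the unigram distribution. A sketch of that textbook form (not necessarily the authors' exact formulation):

```python
from collections import Counter

def absolute_discount_bigram(bigram_counts, unigram_counts, d=0.75):
    """Build P(w | h) with interpolated absolute discounting:
    P(w|h) = max(c(h,w) - d, 0)/c(h) + alpha(h) * P_uni(w),
    where alpha(h) = d * (distinct continuations of h) / c(h)."""
    total = sum(unigram_counts.values())
    p_uni = {w: c / total for w, c in unigram_counts.items()}

    hist = Counter()   # total count of each history h
    types = Counter()  # number of distinct continuations of h
    for (h, w), c in bigram_counts.items():
        hist[h] += c
        types[h] += 1

    def prob(h, w):
        if hist[h] == 0:                   # unseen history: pure back-off
            return p_uni.get(w, 0.0)
        alpha = d * types[h] / hist[h]     # mass freed by discounting
        return (max(bigram_counts.get((h, w), 0) - d, 0.0) / hist[h]
                + alpha * p_uni.get(w, 0.0))

    return prob
```

Because the discounted mass is exactly the weight given to the unigram back-off, the resulting conditional distribution still sums to one over the vocabulary.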

  8. Smoothing – Absolute Discounting • Dependency Language Model • [Equations lost in transcript: absolute-discounted dependency probabilities with a back-off case]
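Given a dependency parse (here, from the Stanford parser), the dependency score of a sentence can be taken as the sum of log-probabilities over its head→dependent arcs under the smoothed dependency model. A sketch under that assumption; `dep_prob` stands in for whichever smoothed conditional the slides' lost equations defined:

```python
import math

def dependency_logprob(arcs, dep_prob):
    """Sum of log P(dependent | head) over a sentence's arcs.

    `arcs` is a list of (head, dependent) pairs from the parser;
    `dep_prob` is any smoothed conditional model (an assumption,
    not the authors' confirmed scoring function).
    """
    return sum(math.log(dep_prob(head, dep)) for head, dep in arcs)
```

Unlike the n-gram score, this sum ignores word order within the sentence, which is what lets the dependency model capture long-distance head-dependent relations.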

  9. Assessment • Perplexity (Ngram only) • Perplexity (Combined) → inappropriate
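For reference, perplexity is the exponentiated average negative log-probability per token; a minimal sketch:

```python
import math

def perplexity(token_logprobs):
    """PP = exp(-(1/N) * sum_i log p_i), with natural-log inputs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

This is only well-defined for a properly normalized probability model, which is presumably why the slide marks it inappropriate for the combined score.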

  10. Assessment • Classification of sentences (good vs. bad) • Bad sentence generation: shuffle good sentences • E.g.: The election will be Dec. 4 from 8 a.m. to 8 p.m. . → The election will be 8 8 from 4 a.m. to Dec. p.m. . • Shuffle degree = 7 (number of lost bigrams)
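The shuffle degree counts how many of the original sentence's bigrams are lost after shuffling. A sketch that treats bigrams as a set (an assumption about how repeats are handled, but one that reproduces the slide's count of 7 for the example above):

```python
def shuffle_degree(original_tokens, shuffled_tokens):
    """Number of the original sentence's bigrams that no longer
    appear in the shuffled sentence ('lost bigrams')."""
    orig = set(zip(original_tokens, original_tokens[1:]))
    shuf = set(zip(shuffled_tokens, shuffled_tokens[1:]))
    return len(orig - shuf)
```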

  11. Results • Distribution of Sentences • [Charts: score distributions for the Ngram and Dependency models; Avg. Shuffle: 12.357225]

  12. Results • Classification (ngram vs. ngram+dep) • [Chart: False Reject vs. False Accept curves for ngram and ngram+dep.; Avg. Shuffle: 12.357225; NOT improved]

  13. Discussion • Why no improvement? • Insufficient feature exploration • Statistical nature of the dependency parser • Any ideas?

  14. Thank You !!
