Eran Chinthaka, Ikhyun Park Statistical language modeling combining n-gram and dependency grammar
Introduction • Statistical language models and ngrams • Problems with ngram models: data sparseness, long dependencies • Proposed solution: use a hybrid model of ngram and dependency grammar for language modeling
System Architecture • Training Data • Process • Evaluator • Test Data (Good and Bad) • Optimal Parameters • Perplexity
Experimental Setup • Data: Brown Corpus (28671 training sentences, 9557 development sentences, 9556 test sentences) • Tools • Smoother and language model builder: CMU-Cambridge Statistical Language Modeling Toolkit v2 (http://www.speech.cs.cmu.edu/SLM/toolkit.html) • Dependency parser: Stanford parser (http://nlp.stanford.edu/software/lex-parser.shtml)
Sentence Evaluation • Ngram Score • Dependency Score • Combined Score
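The slide text does not preserve the scoring formulas. As a rough sketch only, assuming each model yields a sentence log-probability and that the combined score is a weighted sum whose weight is tuned on the development sentences (the name `combined_score` and the parameter `weight` are hypothetical, not from the slides), the combination could look like this:

```python
def combined_score(ngram_logprob, dep_logprob, weight=0.5):
    """Weighted sum of two sentence log-scores.

    ASSUMPTION: linear interpolation is an illustrative choice; the
    slides do not state how the two scores are actually combined.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * ngram_logprob + (1.0 - weight) * dep_logprob
```

With `weight=1.0` this reduces to the ngram score alone, which gives a natural baseline when searching for the optimal parameters.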
Smoothing – Absolute Discounting • Ngram Language Model • [Equations: piecewise (if/else) definitions, not recoverable from the slide text]
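Since the slide's equations are lost, the following is a minimal sketch of the textbook interpolated form of absolute discounting for a bigram model, not necessarily the variant the authors used. The function name, the default discount of 0.75, and the toy sentences are assumptions for illustration.

```python
from collections import Counter

def train_bigram_absolute_discounting(sentences, discount=0.75):
    """Return (prob, vocab) for an interpolated absolute-discounting bigram model.

    ASSUMPTION: standard interpolated form with a unigram lower-order
    model; the slides' own piecewise definition is not recoverable.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    context_totals = Counter()  # c(v): how often v occurs as a context
    followers = Counter()       # number of distinct words seen after v
    for (v, w), c in bigrams.items():
        context_totals[v] += c
        followers[v] += 1

    def prob(w, v):
        """P(w | v): discounted bigram estimate plus redistributed mass."""
        p_uni = unigrams[w] / total
        c_v = context_totals[v]
        if c_v == 0:
            return p_uni  # unseen context: fall back to the unigram model
        discounted = max(bigrams[(v, w)] - discount, 0.0) / c_v
        backoff_mass = discount * followers[v] / c_v
        return discounted + backoff_mass * p_uni

    return prob, set(unigrams)

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
prob, vocab = train_bigram_absolute_discounting(sentences)
```

Subtracting a fixed discount from every seen bigram count frees probability mass, which is then spread over all words in proportion to their unigram probability, so unseen bigrams receive a nonzero estimate.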
Smoothing – Absolute Discounting • Dependency Language Model • [Equation: piecewise (if/else) definition, not recoverable from the slide text]
Assessment • Perplexity (Ngram only) • Perplexity (Combined): inappropriate
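For reference, perplexity is the exponential of the negative average log-probability per token. A minimal sketch, assuming natural-log token probabilities are already available from the model (the function name is hypothetical):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A model that assigns every token probability 0.25 has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.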
Assessment • Classification of sentences (good vs bad) • Bad sentence generation: shuffle good sentences • E.g. good: The election will be Dec. 4 from 8 a.m. to 8 p.m. . • shuffled: The election will be 8 8 from 4 a.m. to Dec. p.m. . • Shuffle degree = 7 (number of lost bigrams)
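The shuffle degree of the example can be checked with a short sketch. It assumes whitespace tokenization and counts bigrams as a multiset; the helper names are hypothetical.

```python
from collections import Counter

def bigram_counts(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def shuffle_degree(original, shuffled):
    """Number of bigrams of the original sentence missing from the shuffled one."""
    lost = bigram_counts(original) - bigram_counts(shuffled)  # keeps positive counts only
    return sum(lost.values())

good = "The election will be Dec. 4 from 8 a.m. to 8 p.m. .".split()
bad = "The election will be 8 8 from 4 a.m. to Dec. p.m. .".split()
print(shuffle_degree(good, bad))  # 7, matching the slide's example
```

Of the 12 bigrams in the good sentence, only five survive the shuffle ("The election", "election will", "will be", "a.m. to", "p.m. ."), which leaves 7 lost.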
Results • Distribution of Sentences (Avg. Shuffle: 12.357225) • [Figure panels: Ngram, Dependency]
Results (Avg. Shuffle: 12.357225) • Classification (ngram vs. ngram+dep): NOT improved • [Figure: False Reject vs. False Accept; curves: ngram, ngram+dep.]
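The false reject and false accept rates plotted here can be computed as in the sketch below. It assumes a sentence is classified as good when its score reaches a threshold; the function name and the threshold rule are illustrative assumptions, not taken from the slides.

```python
def error_rates(good_scores, bad_scores, threshold):
    """Return (false_reject_rate, false_accept_rate).

    ASSUMPTION: a sentence is accepted as good when score >= threshold.
    """
    false_reject = sum(s < threshold for s in good_scores) / len(good_scores)
    false_accept = sum(s >= threshold for s in bad_scores) / len(bad_scores)
    return false_reject, false_accept
```

Sweeping the threshold over the observed score range traces out one curve per model, which allows the ngram and ngram+dep. classifiers to be compared at matching error rates.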
Discussion • Why no improvement? • Insufficient feature exploration • Statistical nature of dependency parser • Any ideas?