Statistical Inference: n-gram Models over Sparse Data
Function of statistical NLP
• Take some data (generated according to some unknown probability distribution) and then make inferences about that distribution.
• Example: we might look at many prepositional phrase attachments in a corpus and use them to try to predict prepositional phrase attachments for English in general.
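The prepositional phrase attachment example can be sketched as relative-frequency estimation: count how often each preposition attaches to the verb or to the noun, then predict the more frequent attachment for new cases. This is a minimal sketch under assumptions not in the slides: the tiny labeled dataset is invented for illustration, the model conditions only on the preposition, and unseen prepositions fall back to noun attachment by an arbitrary choice.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from any real corpus): (verb, noun, preposition, attachment),
# where attachment is "V" if the PP attaches to the verb and "N" if it attaches to the noun.
observations = [
    ("eat", "pizza", "with", "V"),
    ("see", "man", "with", "N"),
    ("eat", "pizza", "with", "V"),
    ("buy", "book", "about", "N"),
    ("read", "book", "about", "N"),
    ("put", "cup", "on", "V"),
    ("put", "book", "on", "V"),
    ("see", "cat", "on", "N"),
]

def estimate_attachment(data):
    """Estimate P(attachment | preposition) by relative frequency."""
    counts = defaultdict(Counter)
    for _verb, _noun, prep, attach in data:
        counts[prep][attach] += 1
    return {
        prep: {a: c / sum(ctr.values()) for a, c in ctr.items()}
        for prep, ctr in counts.items()
    }

def predict(model, prep, default="N"):
    """Return the more probable attachment; `default` (an arbitrary assumption) for unseen prepositions."""
    dist = model.get(prep)
    if not dist:
        return default
    return max(dist, key=dist.get)

model = estimate_attachment(observations)
print(model["with"])            # roughly {'V': 0.67, 'N': 0.33}
print(predict(model, "about"))  # N
```

The inference step is the generalization: the estimates come from a sample, but `predict` applies them to attachments that were never observed.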
We will examine the classic task of language modelling (also known as the Shannon game), where the problem is to predict the next word given the previous words.
• Importance: language models are used in speech recognition, optical character recognition, statistical machine translation (SMT), spelling correction, and handwriting recognition.
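A Shannon-game player can be sketched as a function that, given the words so far, guesses the most likely next word from counts gathered on training text. This is a minimal sketch assuming a bigram history (only the single previous word is used), a hypothetical three-sentence corpus, and a hypothetical `<s>` sentence-start marker; none of these come from the slides.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count (previous word, next word) pairs; "<s>" is a hypothetical sentence-start marker."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        history = "<s>"
        for word in tokens:
            counts[history][word] += 1
            history = word
    return counts

def guess_next(counts, previous_word):
    """Shannon-game guess: the most frequent follower of previous_word, or None if unseen."""
    followers = counts.get(previous_word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy training corpus.
corpus = [
    "i want to eat".split(),
    "i want to sleep".split(),
    "i need to eat".split(),
]
counts = train_bigrams(corpus)
print(guess_next(counts, "want"))  # to
print(guess_next(counts, "to"))    # eat
```

Longer histories (trigrams and beyond) make better guesses but are seen less often in training data, which is why sparse data becomes the central problem.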
Uses
• Word sense disambiguation
• Probabilistic parsing
Language modelling toolkit: http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html
• Preprocess the corpus into plain ASCII text files.
• See also the Google Books Ngram Viewer: http://books.google.com/ngrams
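Preprocessing a corpus into ASCII text might look like the sketch below. The specific steps (Unicode NFKD decomposition to strip accents, lowercasing, a simple regex tokenizer, one tokenized sentence per line) are assumptions for illustration; they are not the input format required by the toolkit linked above, whose documentation should be checked.

```python
import re
import unicodedata

def to_ascii_tokens(line):
    """Lowercase ASCII word tokens: accents are stripped ('café' -> 'cafe'), other non-ASCII dropped."""
    decomposed = unicodedata.normalize("NFKD", line)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Assumed tokenization: runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", ascii_text.lower())

def preprocess(lines):
    """Return one space-separated tokenized sentence per non-empty input line."""
    out = []
    for line in lines:
        tokens = to_ascii_tokens(line)
        if tokens:
            out.append(" ".join(tokens))
    return out

raw = ["The café's menu: naïve pricing!", "", "Come across as..."]
print(preprocess(raw))  # ["the cafe's menu naive pricing", 'come across as']
```

The resulting lines can then be written to a plain text file, e.g. with `open(path, "w", encoding="ascii")`, where `path` is whatever file name the downstream tool expects.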
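MLE
• Example: "come across" occurs 10 times; 8 of these are followed by "as", once by "more" and once by "a".
• The maximum likelihood estimate (MLE) of the next word is its relative frequency after the history:
$$P_{MLE}(w_n \mid w_1 \cdots w_{n-1}) = \frac{C(w_1 \cdots w_n)}{C(w_1 \cdots w_{n-1})}$$
• So $P_{MLE}(\text{as} \mid \text{come across}) = 8/10 = 0.8$, $P_{MLE}(\text{more} \mid \text{come across}) = 1/10 = 0.1$ and $P_{MLE}(\text{a} \mid \text{come across}) = 1/10 = 0.1$.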
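The MLE formula can be computed directly from n-gram counts. This is a minimal sketch: the token stream is a hypothetical corpus constructed only to reproduce the "come across" counts above, and returning 0.0 for an unseen history is a convention chosen here (the MLE is undefined in that case).

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def p_mle(tokens, history, word):
    """P_MLE(word | history) = C(history + word) / C(history), for a non-empty history."""
    history = tuple(history)
    if not history:
        raise ValueError("history must contain at least one word")
    n = len(history) + 1
    c_history = ngram_counts(tokens, n - 1)[history]
    if c_history == 0:
        return 0.0  # unseen history: MLE undefined; 0.0 by convention in this sketch
    return ngram_counts(tokens, n)[history + (word,)] / c_history

# Hypothetical corpus: "come across" 10 times, followed by "as" 8 times, "more" once, "a" once.
tokens = []
for follower in ["as"] * 8 + ["more", "a"]:
    tokens += ["come", "across", follower, "."]

print(p_mle(tokens, ("come", "across"), "as"))    # 0.8
print(p_mle(tokens, ("come", "across"), "more"))  # 0.1
print(p_mle(tokens, ("come", "across"), "the"))   # 0.0
```

The last line shows the weakness of the MLE on sparse data: any continuation not seen in training, however plausible, gets probability zero.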