Statistical Language Modelling Part I – Observable Models Simon Lucas
Summary • Applications • The fundamentals • Observable v. hidden (latent) models • N-gram and scanning n-tuple models • Incremental classifiers and LOO optimisation • Evaluation methods • Results • Conclusions and further work
Statistical Language Models • Compute p(x|M) – the probability of a sequence x given the model M • Java interface:

public interface LanguageModel {
    public void train(SequenceDataset sd);
    public double p(int[] seq);
}
Sequence Dataset

public interface SequenceDataset {
    public int nSymbols();
    public int nSequences();
    public int[] getSequence(int i);
}
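The two interfaces above can be wired together with a trivial in-memory implementation. This is a sketch of my own (the class name ArraySequenceDataset is not from the talk), assuming symbols are ints in the range 0..nSymbols-1:

```java
// Interface from the slides.
interface SequenceDataset {
    int nSymbols();
    int nSequences();
    int[] getSequence(int i);
}

// Hypothetical array-backed implementation, not from the talk:
// holds the sequences directly in memory.
class ArraySequenceDataset implements SequenceDataset {
    private final int[][] sequences;
    private final int nSymbols;

    ArraySequenceDataset(int[][] sequences, int nSymbols) {
        this.sequences = sequences;
        this.nSymbols = nSymbols;
    }

    public int nSymbols()        { return nSymbols; }
    public int nSequences()      { return sequences.length; }
    public int[] getSequence(int i) { return sequences[i]; }
}
```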
Evaluating Language Models • Standard: • test-set perplexity • Preferred (by me!): • Recognition accuracy • Dictionary extrapolation • Perplexity assumes all models are playing by the same rules • The other measures make no such assumptions
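For reference, test-set perplexity can be computed from per-sequence log-probabilities as exp(-(1/N) Σ log p), where N is the total number of symbols in the test set. A minimal sketch; the helper name and the per-symbol normalisation convention are my assumptions:

```java
// Hedged sketch: test-set perplexity from per-sequence log-probabilities.
// PP = exp(-(1/N) * sum of log p), N = total symbols in the test set.
class Perplexity {
    static double perplexity(double[] logProbs, int totalSymbols) {
        double sum = 0;
        for (double lp : logProbs) sum += lp;   // total log-probability
        return Math.exp(-sum / totalSymbols);   // geometric-mean inverse prob.
    }
}
```

For example, a single two-symbol sequence assigned probability 0.25 gives perplexity 2, i.e. the model is as uncertain as a fair coin per symbol.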
Distributed Mode Evaluation • Use Algoval evaluation server • Currently: http://ace.essex.ac.uk • Download the developer pack • Configure model – or write your own • Specify test parameters • Run tests • View results immediately on web site!
Sequence Recognition • Given a statistical language model • Can easily deploy it for sequence recognition • Build a model for each class • Assign pattern to class with highest posterior • Better still – return the vector of posteriors for soft recognition • Interesting to try these models against simple nearest-neighbour classifiers using LD (Levenshtein distance) and WLD (weighted Levenshtein distance)
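The recipe above (one model per class, assign to the class with the highest posterior, or return the whole posterior vector) can be sketched against the LanguageModel interface from earlier, here trimmed to just p(). The recognizer class and the explicit priors are my own framing, not from the talk:

```java
// Trimmed version of the slide's interface (train() omitted for the sketch).
interface LanguageModel {
    double p(int[] seq);   // likelihood p(x | M)
}

// Hedged sketch: one LanguageModel per class; posterior for class c is
// proportional to prior(c) * p(seq | model c), normalised over classes.
class BayesSequenceRecognizer {
    private final LanguageModel[] models;
    private final double[] priors;

    BayesSequenceRecognizer(LanguageModel[] models, double[] priors) {
        this.models = models;
        this.priors = priors;
    }

    // Soft recognition: return the full vector of posteriors.
    double[] posteriors(int[] seq) {
        double[] post = new double[models.length];
        double total = 0;
        for (int c = 0; c < models.length; c++) {
            post[c] = priors[c] * models[c].p(seq);
            total += post[c];
        }
        for (int c = 0; c < post.length; c++) post[c] /= total;
        return post;
    }

    // Hard recognition: class with the highest posterior.
    int classify(int[] seq) {
        double[] post = posteriors(seq);
        int best = 0;
        for (int c = 1; c < post.length; c++)
            if (post[c] > post[best]) best = c;
        return best;
    }
}
```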
SN-Tuple Method: Current Status for OCR • Actively being researched at IBM TJ Watson • See Ratzlaff, Proc. ICDAR 2001, pages 18–22 (on djvu.com – note: NOT dejavu.com!) • Concludes: “the sn-tuple is a viable method for on-line handwriting recognition”
App2: Contextual OCR

1. ACHROIA
2. ACHROEA
3. ASEMIA
4. ASEMEIA
5. ACHAEA
6. ACODIA
7. ACHORIA
8. ACHYRA
9. ACRAEA
10. ACHIRIA
Dictionary Extrapolation • Previous slide showed how well we can do with noisy images, with the aid of dictionary context • BUT: suppose the dictionary only has 50% coverage • Need a trainable model that can extrapolate from the given data • How to evaluate such a model?
Left Out Rank Estimate • For each word in the dictionary • Create a new dictionary with that word left out • Create a set of neighbouring words to the left out word • Get model to evaluate likelihood of each neighbouring word and the left out word • Return a rank-based score between 1.0 and 0.0 (from top to bottom of list)
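The steps above can be sketched as a single scoring function. Neighbour generation is left abstract; the Scorer interface and the linear rank-to-score mapping (1.0 at the top of the list, 0.0 at the bottom) are my own assumptions about the details:

```java
// Hedged sketch of the left-out rank estimate. Assumes neighbours is
// non-empty; ties are scored in favour of the left-out word.
class LeftOutRank {
    interface Scorer { double p(String word); }   // model likelihood, abstract here

    // Rank the left-out word among its neighbours; map rank -> [0, 1].
    static double rankScore(Scorer model, String leftOut, String[] neighbours) {
        double pLeftOut = model.p(leftOut);
        int rank = 0;                        // neighbours scored strictly higher
        for (String n : neighbours)
            if (model.p(n) > pLeftOut) rank++;
        // rank 0 (top of list) -> 1.0; rank == neighbours.length (bottom) -> 0.0
        return 1.0 - (double) rank / neighbours.length;
    }
}
```

Averaging this score over every word in the dictionary gives a single extrapolation figure without needing a separate held-out set.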
Example Data: 22 Human Chromosomes

Chromosome 10:
/ 1802 3 10 55 19 / A=A=a===B==a====D==d====D==e======B==b====B==b====A=a=a
/ 3843 84 10 55 18 / A=B===a==A==a==D==d=====D==d======C==b===A===c====A=a=a
/ 7231 158 10 55 20 / A===B==a==C==a==A==c===D===d======C===b==B===d===A=a==a
/ 787 15 10 55 18 / A==B==a=A===a===B===b===D===e====A===a==A==a=Aa=A==a==a
/ 2459 60 10 54 19 / A=B=aB==a=A==a==C===c==C====d=====C==b===A===c===A=a=a
/ 3290 21 10 54 19 / A==B==a==A==a====B==b==D====c=====B==b==A==c====A=a==a
/ 5591 122 10 54 17 / A=A=a==A==a====A==a====E===d====B====b==A===b==A===a=a

Chromosome 15:
/ 1447 5 15 43 10 / AA=a======D==b=======C==d=====A==a==A==b==a
/ 2120 32 15 43 11 / B=a=====E===c==A=a===C==d====Aa=A=a==A=b==a
/ 2759 16 15 43 9 / A=A=====D====aA=a==A===c=======A=a=A===c==a
N-gram Recognizers • Bigram
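A bigram recognizer of the kind named above can be sketched as a count table: p(seq) is the product of p(next | prev) over the sequence. The add-one smoothing and the explicit start state are my assumptions, not details from the talk:

```java
// Hedged sketch: bigram language model over symbols 0..nSymbols-1,
// with add-one smoothing so unseen transitions get non-zero probability.
class BigramModel {
    private final int n;            // alphabet size
    private final int[][] counts;   // counts[prev][next]; row n is the start state
    private final int[] totals;     // row totals for normalisation

    BigramModel(int nSymbols) {
        n = nSymbols;
        counts = new int[n + 1][n];
        totals = new int[n + 1];
    }

    void train(int[] seq) {
        int prev = n;               // start state
        for (int s : seq) {
            counts[prev][s]++;
            totals[prev]++;
            prev = s;
        }
    }

    // Log-likelihood of a sequence under the smoothed bigram estimates.
    double logP(int[] seq) {
        double lp = 0;
        int prev = n;
        for (int s : seq) {
            lp += Math.log((counts[prev][s] + 1.0) / (totals[prev] + n));
            prev = s;
        }
        return lp;
    }
}
```

Working in log space avoids underflow on long sequences, which matters for the chromosome strings above.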
Leave One Out Error • Generally a good estimate of test-set error • Especially fast to compute for incremental classifiers (O(n)) • As opposed to O(n²) for non-incremental classifiers
Incremental Classifiers • Can learn new patterns on demand without access to rest of training set • Can ‘forget’ or unlearn patterns on demand also • Incremental: n-gram, n-tuple, nearest neighbour (memory or counting methods) • Non-incremental: MLP, HMM, (SVM?) (latent variable re-estimation methods)
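Why counting methods are incremental can be shown concretely: learning adds counts and unlearning subtracts the very same counts, with no access to the rest of the training set. That is what makes O(n) leave-one-out possible: unlearn a pattern, score it, relearn it. A minimal unigram sketch of my own, not from the talk:

```java
// Hedged sketch: an incremental counting model. learn() and unlearn()
// are exact inverses, touching only the counts for the given pattern.
class IncrementalUnigram {
    private final int[] counts;
    private int total;

    IncrementalUnigram(int nSymbols) { counts = new int[nSymbols]; }

    void learn(int[] seq)   { for (int s : seq) { counts[s]++; total++; } }
    void unlearn(int[] seq) { for (int s : seq) { counts[s]--; total--; } }

    double p(int symbol) { return total == 0 ? 0 : (double) counts[symbol] / total; }
}
```

A latent-variable model such as an MLP or HMM has no such inverse: removing one pattern's influence would require re-estimating all the parameters.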
Statistical Model Servers • Server model of statistical models • Each server supports a range of models • Each model can have many instances • Each instance can be invoked for training or estimation • Now we can independently evaluate the service, not just the model!
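The server idea above (a server supporting several model types, each with many independently invokable instances) can be sketched as a registry of model factories handing out instance handles. All names here are my own assumptions, not the Algoval API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: a server holds named model factories; each newInstance()
// call creates a fresh model instance and returns an integer handle that
// clients later use for training or estimation requests.
class ModelServer {
    interface LanguageModel {
        void train(int[][] sequences);
        double p(int[] seq);
    }
    interface ModelFactory { LanguageModel create(); }

    private final Map<String, ModelFactory> factories = new HashMap<>();
    private final Map<Integer, LanguageModel> instances = new HashMap<>();
    private int nextId = 0;

    void register(String name, ModelFactory f) { factories.put(name, f); }

    // Create a fresh instance of a supported model type; return its handle.
    int newInstance(String name) {
        instances.put(nextId, factories.get(name).create());
        return nextId++;
    }

    LanguageModel get(int id) { return instances.get(id); }
}
```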
Results • Bioinformatics • Dictionary modelling
Statistical Language Modelling Part II • Ensembles of observable models • Latent variable models • HMM • SCFG • Category n-gram • Other applications: Robot Sensors?