Statistical Language Modelling Part I – Observable Models Simon Lucas
Summary • Applications • The fundamentals • Observable v. hidden (latent) models • N-gram and scanning n-tuple models • Incremental classifiers and LOO optimisation • Evaluation methods • Results • Conclusions and further work
Statistical Language Models • Compute p(x|M) – the probability of a sequence x given the model M • Java interface:

public interface LanguageModel {
    public void train(SequenceDataset sd);
    public double p(int[] seq);
}
Sequence Dataset

public interface SequenceDataset {
    public int nSymbols();
    public int nSequences();
    public int[] getSequence(int i);
}
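The two interfaces above can be wired together with a trivial in-memory implementation. This is a sketch of my own (the class name ArraySequenceDataset is not from the talk), assuming symbols are ints in the range 0..nSymbols-1:

```java
// Interface from the slides.
interface SequenceDataset {
    int nSymbols();
    int nSequences();
    int[] getSequence(int i);
}

// Hypothetical array-backed implementation, not from the talk:
// holds the sequences directly in memory.
class ArraySequenceDataset implements SequenceDataset {
    private final int[][] sequences;
    private final int nSymbols;

    ArraySequenceDataset(int[][] sequences, int nSymbols) {
        this.sequences = sequences;
        this.nSymbols = nSymbols;
    }

    public int nSymbols()        { return nSymbols; }
    public int nSequences()      { return sequences.length; }
    public int[] getSequence(int i) { return sequences[i]; }
}
```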
Evaluating Language Models • Standard: • test-set perplexity • Preferred (by me!): • Recognition accuracy • Dictionary extrapolation • Perplexity assumes all models are playing by the same rules • The other measures make no such assumptions
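For reference, test-set perplexity can be computed from per-sequence log-probabilities as exp(-(1/N) Σ log p), where N is the total number of symbols in the test set. A minimal sketch; the helper name and the per-symbol normalisation convention are my assumptions:

```java
// Hedged sketch: test-set perplexity from per-sequence log-probabilities.
// PP = exp(-(1/N) * sum of log p), N = total symbols in the test set.
class Perplexity {
    static double perplexity(double[] logProbs, int totalSymbols) {
        double sum = 0;
        for (double lp : logProbs) sum += lp;   // total log-probability
        return Math.exp(-sum / totalSymbols);   // geometric-mean inverse prob.
    }
}
```

For example, a single two-symbol sequence assigned probability 0.25 gives perplexity 2, i.e. the model is as uncertain as a fair coin per symbol.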
Distributed Mode Evaluation • Use Algoval evaluation server • Currently: http://ace.essex.ac.uk • Download the developer pack • Configure model – or write your own • Specify test parameters • Run tests • View results immediately on web site!
Sequence Recognition • Given a statistical language model • Can easily deploy it for sequence recognition • Build a model for each class • Assign pattern to class with highest posterior • Better still – return the vector of posteriors for soft recognition • Interesting to try these models against simple nearest-neighbour classifiers using LD (Levenshtein distance) and WLD (weighted Levenshtein distance)
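The recipe above (one model per class, assign to the class with the highest posterior, or return the whole posterior vector) can be sketched against the LanguageModel interface from earlier, here trimmed to just p(). The recognizer class and the explicit priors are my own framing, not from the talk:

```java
// Trimmed version of the slide's interface (train() omitted for the sketch).
interface LanguageModel {
    double p(int[] seq);   // likelihood p(x | M)
}

// Hedged sketch: one LanguageModel per class; posterior for class c is
// proportional to prior(c) * p(seq | model c), normalised over classes.
class BayesSequenceRecognizer {
    private final LanguageModel[] models;
    private final double[] priors;

    BayesSequenceRecognizer(LanguageModel[] models, double[] priors) {
        this.models = models;
        this.priors = priors;
    }

    // Soft recognition: return the full vector of posteriors.
    double[] posteriors(int[] seq) {
        double[] post = new double[models.length];
        double total = 0;
        for (int c = 0; c < models.length; c++) {
            post[c] = priors[c] * models[c].p(seq);
            total += post[c];
        }
        for (int c = 0; c < post.length; c++) post[c] /= total;
        return post;
    }

    // Hard recognition: class with the highest posterior.
    int classify(int[] seq) {
        double[] post = posteriors(seq);
        int best = 0;
        for (int c = 1; c < post.length; c++)
            if (post[c] > post[best]) best = c;
        return best;
    }
}
```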
SN-Tuple Method: Current Status for OCR • Actively being researched at IBM TJ Watson • See Ratzlaff, Proc. ICDAR 2001, pages 18–22 (on djvu.com – note: NOT dejavu.com!) • Concludes: “the sn-tuple is a viable method for on-line handwriting recognition”
App2: Contextual OCR

1. ACHROIA
2. ACHROEA
3. ASEMIA
4. ASEMEIA
5. ACHAEA
6. ACODIA
7. ACHORIA
8. ACHYRA
9. ACRAEA
10. ACHIRIA
Dictionary Extrapolation • Previous slide showed how well we can do with noisy images, with the aid of dictionary context • BUT: suppose the dictionary only has 50% coverage • Need a trainable model that can extrapolate from the given data • How to evaluate such a model?
Left Out Rank Estimate • For each word in the dictionary • Create a new dictionary with that word left out • Create a set of neighbouring words to the left out word • Get model to evaluate likelihood of each neighbouring word and the left out word • Return a rank-based score between 1.0 and 0.0 (from top to bottom of list)
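The steps above can be sketched as a single scoring function. Neighbour generation is left abstract; the Scorer interface and the linear rank-to-score mapping (1.0 at the top of the list, 0.0 at the bottom) are my own assumptions about the details:

```java
// Hedged sketch of the left-out rank estimate. Assumes neighbours is
// non-empty; ties are scored in favour of the left-out word.
class LeftOutRank {
    interface Scorer { double p(String word); }   // model likelihood, abstract here

    // Rank the left-out word among its neighbours; map rank -> [0, 1].
    static double rankScore(Scorer model, String leftOut, String[] neighbours) {
        double pLeftOut = model.p(leftOut);
        int rank = 0;                        // neighbours scored strictly higher
        for (String n : neighbours)
            if (model.p(n) > pLeftOut) rank++;
        // rank 0 (top of list) -> 1.0; rank == neighbours.length (bottom) -> 0.0
        return 1.0 - (double) rank / neighbours.length;
    }
}
```

Averaging this score over every word in the dictionary gives a single extrapolation figure without needing a separate held-out set.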
Example Data: 22 Human Chromosomes

Chromosome 10:
/ 1802 3 10 55 19 / A=A=a===B==a====D==d====D==e======B==b====B==b====A=a=a
/ 3843 84 10 55 18 / A=B===a==A==a==D==d=====D==d======C==b===A===c====A=a=a
/ 7231 158 10 55 20 / A===B==a==C==a==A==c===D===d======C===b==B===d===A=a==a
/ 787 15 10 55 18 / A==B==a=A===a===B===b===D===e====A===a==A==a=Aa=A==a==a
/ 2459 60 10 54 19 / A=B=aB==a=A==a==C===c==C====d=====C==b===A===c===A=a=a
/ 3290 21 10 54 19 / A==B==a==A==a====B==b==D====c=====B==b==A==c====A=a==a
/ 5591 122 10 54 17 / A=A=a==A==a====A==a====E===d====B====b==A===b==A===a=a

Chromosome 15:
/ 1447 5 15 43 10 / AA=a======D==b=======C==d=====A==a==A==b==a
/ 2120 32 15 43 11 / B=a=====E===c==A=a===C==d====Aa=A=a==A=b==a
/ 2759 16 15 43 9 / A=A=====D====aA=a==A===c=======A=a=A===c==a
N-gram Recognizers • Bigram
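A bigram recognizer of the kind named above can be sketched as a count table: p(seq) is the product of p(next | prev) over the sequence. The add-one smoothing and the explicit start state are my assumptions, not details from the talk:

```java
// Hedged sketch: bigram language model over symbols 0..nSymbols-1,
// with add-one smoothing so unseen transitions get non-zero probability.
class BigramModel {
    private final int n;            // alphabet size
    private final int[][] counts;   // counts[prev][next]; row n is the start state
    private final int[] totals;     // row totals for normalisation

    BigramModel(int nSymbols) {
        n = nSymbols;
        counts = new int[n + 1][n];
        totals = new int[n + 1];
    }

    void train(int[] seq) {
        int prev = n;               // start state
        for (int s : seq) {
            counts[prev][s]++;
            totals[prev]++;
            prev = s;
        }
    }

    // Log-likelihood of a sequence under the smoothed bigram estimates.
    double logP(int[] seq) {
        double lp = 0;
        int prev = n;
        for (int s : seq) {
            lp += Math.log((counts[prev][s] + 1.0) / (totals[prev] + n));
            prev = s;
        }
        return lp;
    }
}
```

Working in log space avoids underflow on long sequences, which matters for the chromosome strings above.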
Leave One Out Error • Generally a good estimate of test-set error • Especially fast to compute for incremental classifiers (O(n)) • As opposed to O(n²) for non-incremental classifiers
Incremental Classifiers • Can learn new patterns on demand without access to rest of training set • Can ‘forget’ or unlearn patterns on demand also • Incremental: n-gram, n-tuple, nearest neighbour (memory or counting methods) • Non-incremental: MLP, HMM, (SVM?) (latent variable re-estimation methods)
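Why counting methods are incremental can be shown concretely: learning adds counts and unlearning subtracts the very same counts, with no access to the rest of the training set. That is what makes O(n) leave-one-out possible: unlearn a pattern, score it, relearn it. A minimal unigram sketch of my own, not from the talk:

```java
// Hedged sketch: an incremental counting model. learn() and unlearn()
// are exact inverses, touching only the counts for the given pattern.
class IncrementalUnigram {
    private final int[] counts;
    private int total;

    IncrementalUnigram(int nSymbols) { counts = new int[nSymbols]; }

    void learn(int[] seq)   { for (int s : seq) { counts[s]++; total++; } }
    void unlearn(int[] seq) { for (int s : seq) { counts[s]--; total--; } }

    double p(int symbol) { return total == 0 ? 0 : (double) counts[symbol] / total; }
}
```

A latent-variable model such as an MLP or HMM has no such inverse: removing one pattern's influence would require re-estimating all the parameters.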
Statistical Model Servers • Server model of statistical models • Each server supports a range of models • Each model can have many instances • Each instance can be invoked for training or estimation • Now we can independently evaluate the service, not just the model!
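The server idea above (a server supporting several model types, each with many independently invokable instances) can be sketched as a registry of model factories handing out instance handles. All names here are my own assumptions, not the Algoval API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: a server holds named model factories; each newInstance()
// call creates a fresh model instance and returns an integer handle that
// clients later use for training or estimation requests.
class ModelServer {
    interface LanguageModel {
        void train(int[][] sequences);
        double p(int[] seq);
    }
    interface ModelFactory { LanguageModel create(); }

    private final Map<String, ModelFactory> factories = new HashMap<>();
    private final Map<Integer, LanguageModel> instances = new HashMap<>();
    private int nextId = 0;

    void register(String name, ModelFactory f) { factories.put(name, f); }

    // Create a fresh instance of a supported model type; return its handle.
    int newInstance(String name) {
        instances.put(nextId, factories.get(name).create());
        return nextId++;
    }

    LanguageModel get(int id) { return instances.get(id); }
}
```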
Results • Bioinformatics • Dictionary modelling
Statistical Language Modelling Part II • Ensembles of observable models • Latent variable models • HMM • SCFG • Category n-gram • Other applications: Robot Sensors?