Large Vocabulary Unconstrained Handwriting Recognition J Subrahmonia Pen Technologies IBM T J Watson Research Center
Pen Technologies • Pen-based interfaces in mobile computing
Mathematical Formulation • H : Handwriting evidence on the basis of which a recognizer will make its decision • H = {h1, h2, h3, h4, …, hm} • W : Word string from a large vocabulary • W = {w1, w2, w3, w4, …, wn} • Recognizer : Ŵ = argmax_W p(W|H) = argmax_W p(H|W) p(W)
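The decision rule above can be sketched in a few lines. `channel_prob` (the handwriting model) and `prior_prob` (the language model) are hypothetical stand-ins, and the scores below are invented for illustration:

```python
# Hedged sketch of the MAP decision rule: W* = argmax_W p(H|W) * p(W).
# The model functions and all numbers here are made up.

def decode(H, vocabulary, channel_prob, prior_prob):
    """Return the word string W maximizing p(H|W) * p(W)."""
    return max(vocabulary, key=lambda W: channel_prob(H, W) * prior_prob(W))

# Toy usage with invented scores:
vocab = ["cat", "car"]
channel = lambda H, W: {"cat": 0.2, "car": 0.5}[W]   # stand-in for p(H|W)
prior = lambda W: {"cat": 0.7, "car": 0.3}[W]        # stand-in for p(W)
best = decode("pen strokes", vocab, channel, prior)  # 0.5*0.3 > 0.2*0.7, so "car"
```

Note that the language model prior can overturn the channel score: a word the handwriting model slightly prefers can lose to a word the language model considers far more likely.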
Mathematical Formulation • [Diagram: source-channel view — a SOURCE generates the word string W; a noisy CHANNEL transforms it into the handwriting evidence H]
Source Channel Model • [Diagram: WRITER (source) → DIGITIZER → FEATURE EXTRACTOR (channel) → H → DECODER → recognized word string]
Source Channel Model • Handwriting modeling : HMMs • Language modeling • Search strategy
Hidden Markov Models • Memoryless model → (add memory) → Markov model → (hide something) → Hidden Markov model • Memoryless model → (hide something) → Mixture model → (add memory) → Hidden Markov model • Alan B Poritz : "Hidden Markov Models : A Guided Tour", ICASSP 1988
Memoryless Model • COIN : Heads (1) with probability p, Tails (0) with probability 1-p • Flip the coin 10 times (an IID random sequence) • Sequence : 1 0 1 0 0 0 1 1 1 1 • Probability = p·(1-p)·p·(1-p)·(1-p)·(1-p)·p·p·p·p = p^6 (1-p)^4
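The IID computation is easy to check mechanically; the sequence has six heads and four tails, so the probability is p^6 (1-p)^4. A minimal sketch (the helper name is mine):

```python
def iid_prob(seq, p):
    """Probability of an IID coin sequence: p^(#heads) * (1-p)^(#tails)."""
    heads = seq.count(1)
    return p ** heads * (1 - p) ** (len(seq) - heads)

seq = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]   # six heads, four tails
prob = iid_prob(seq, 0.7)              # equals 0.7**6 * 0.3**4
```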
Add Memory – Markov Model • Two coins : COIN 1 : p(1) = 0.9, p(0) = 0.1; COIN 2 : p(1) = 0.1, p(0) = 0.9 • Experiment : flip COIN 1 and note the outcome; thereafter, if the last outcome was heads flip COIN 1, else flip COIN 2 • Sequence 1 1 0 0 : probability = 0.9 × 0.9 × 0.1 × 0.9 • Sequence 1 0 1 0 : probability = 0.9 × 0.1 × 0.1 × 0.1
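The two-coin experiment can be sketched directly from the rules above (function and table names are mine):

```python
# Two-coin Markov model from the slide: after a head flip COIN 1,
# after a tail flip COIN 2. The first flip always uses COIN 1.
P1 = {1: 0.9, 0: 0.1}   # COIN 1
P2 = {1: 0.1, 0: 0.9}   # COIN 2

def markov_prob(seq):
    """Probability of an outcome sequence under the two-coin Markov model."""
    prob = P1[seq[0]]                      # first flip: COIN 1
    for prev, cur in zip(seq, seq[1:]):
        coin = P1 if prev == 1 else P2     # the past outcome selects the coin
        prob *= coin[cur]
    return prob
```

Running it reproduces the slide's numbers: 1 1 0 0 gives 0.9 × 0.9 × 0.1 × 0.9, while 1 0 1 0 gives 0.9 × 0.1 × 0.1 × 0.1, showing how the past outcome changes the probability of the next one.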
State Sequence Representation • [Diagram: two states. State 1 outputs 1 with probability 0.9 on its self-loop and 0 with probability 0.1 on the arc 1 → 2; state 2 outputs 0 with probability 0.9 on its self-loop and 1 with probability 0.1 on the arc 2 → 1] • Each observed output sequence corresponds to a unique state sequence
Hide the states => Hidden Markov Model • [Diagram: two hidden states s1 and s2, each able to output both symbols, so the observed output no longer determines the state sequence]
Why use Hidden Markov Models Instead of Non-hidden? • Hidden Markov models can be smaller – fewer parameters to estimate • States may be truly hidden • Position of the hand • Positions of the articulators
Summary of HMM Basics • We are interested in assigning probabilities p(H) to feature sequences • Memoryless model : this model has no memory of the past • Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE – an equivalence class of pasts that influence the future • Hide the states : HMM
Hidden Markov Models • Given an observed sequence H • Compute p(H) for decoding • Find the most likely state sequence for a given Markov model (Viterbi algorithm) • Estimate the parameters of the Markov source (training)
Compute p(H) • Model : three states s1, s2, s3, with arcs (transition probability; output probabilities) :
  s1 → s1 : 0.5; p(a) = 0.8, p(b) = 0.2
  s1 → s2 : 0.3; p(a) = 0.7, p(b) = 0.3
  s1 → s2 : 0.2 (null arc, produces no output)
  s2 → s2 : 0.4; p(a) = 0.5, p(b) = 0.5
  s2 → s3 : 0.5; p(a) = 0.3, p(b) = 0.7
  s2 → s3 : 0.1 (null arc, produces no output)
Compute p(H) – contd. • Compute p(H) where H = a a b b • Enumerate all ways of producing h1 = a :
  s1 → s1 : 0.5 × 0.8 = 0.40
  s1 → s2 : 0.3 × 0.7 = 0.21
  s1 → s2 (null) → s2 : 0.2 × 0.4 × 0.5 = 0.04
  s1 → s2 (null) → s3 : 0.2 × 0.5 × 0.3 = 0.03
Compute p(H) – contd. • Enumerate all ways of producing h1 = a, h2 = a : [Diagram: the enumeration tree over states s1, s2, s3 grows with every output symbol, so brute-force enumeration is exponential in the length of H]
Compute p(H) • Can save computation by merging paths that reach the same state at the same time [Diagram: the enumeration tree collapsed into a lattice over s1, s2, s3]
Compute p(H) • Trellis diagram : columns for the prefixes 0 (nothing produced), a, aa, aab, aabb; rows for states s1, s2, s3; each arc carries its transition × output probability
Basic Recursion • Prob(node) = Σ over predecessors [ Prob(predecessor) × Prob(predecessor → node) ] • Boundary condition : Prob(s1, 0) = 1 • Trellis probabilities for H = a a b b :

         0      a       aa      aab     aabb
  s1     1.0    0.4     .16     .016    .0016
  s2     0.2    0.33    .182    .054    .01256
  s3     0.02   0.063   .0677   .0691   .020156

• p(aabb) = .020156 (the value at the final state s3 after the last output)
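The recursion can be checked mechanically. Below is a hedged sketch of the forward pass for the model reconstructed from the trellis numbers (arc emissions, with the two null arcs handled by a push in topological state order); the data structures and names are mine:

```python
# Forward algorithm for the slide's 3-state model, null transitions included.
# Each arc is (src, dst, trans_prob, out_probs); out_probs is None for a
# null (non-emitting) arc.
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]   # topological order, so nulls propagate correctly

def forward(H):
    """Return the trellis column of forward probabilities after producing H."""
    alpha = {s: 0.0 for s in STATES}
    alpha["s1"] = 1.0                      # boundary condition Prob(s1, 0) = 1
    def push_nulls(col):
        for src, dst, p, out in ARCS:      # arcs listed in topological order
            if out is None:
                col[dst] += col[src] * p
        return col
    alpha = push_nulls(alpha)
    for h in H:
        nxt = {s: 0.0 for s in STATES}
        for src, dst, p, out in ARCS:
            if out is not None:
                nxt[dst] += alpha[src] * p * out[h]
        alpha = push_nulls(nxt)
    return alpha
```

Evaluating `forward("aabb")["s3"]` reproduces the trellis total of .020156, and the intermediate columns match the table above.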
Find Most Likely Path for aabb – Dynamic Programming (Viterbi) • MaxProb(node) = max over predecessors [ MaxProb(predecessor) × Prob(predecessor → node) ] • Trellis of best-path scores for H = a a b b :

         0      a      aa      aab     aabb
  s1     1.0    0.4    .16     .016    .0016
  s2     0.2    0.21   .084    .0168   .00336
  s3     0.02   .03    .0315   .0294   .00588

• Best complete path score = .00588; backtracking the maximizing predecessors recovers the most likely state sequence
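The Viterbi recursion is the forward pass with the sum replaced by a max. A hedged, self-contained sketch on the same reconstructed model (arc list and names are mine; backpointers are omitted, so this returns only the best-path score per state):

```python
# Viterbi scores for the slide's 3-state model with null transitions.
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),               # null arc, no output
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),               # null arc, no output
]
STATES = ["s1", "s2", "s3"]   # topological order for null propagation

def viterbi_score(H):
    """Best single-path score for each state after producing H (max, not sum)."""
    v = {s: 0.0 for s in STATES}
    v["s1"] = 1.0
    def push_nulls(col):
        for src, dst, p, out in ARCS:
            if out is None:
                col[dst] = max(col[dst], col[src] * p)
        return col
    v = push_nulls(v)
    for h in H:
        nxt = {s: 0.0 for s in STATES}
        for src, dst, p, out in ARCS:
            if out is not None:
                nxt[dst] = max(nxt[dst], v[src] * p * out[h])
        v = push_nulls(nxt)
    return v
```

`viterbi_score("aabb")["s3"]` gives .00588, matching the bottom-right trellis entry; storing the argmax predecessor at each node would let us backtrack the most likely state sequence.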
Training HMM parameters • Initial guess : transition probabilities 1/3, 1/3, 1/3 and 1/2, 1/2; output probabilities p(a) = p(b) = 1/2 on every arc • H = a b a a • The seven complete paths have probabilities .000385, .000578, .000868, .001157, .002604, .001736, .001302 • p(H) = sum over all paths = .008632
Training HMM parameters • A posteriori probability of path i : p(path i | H) = p(path i) / p(H) • For the seven paths above : .045, .067, .134, .100, .201, .150, .301
Training HMM parameters .46 .60 .64 .36 .71 .29 .68 .32 .40 .34 .60 .40 .20 0.00108 0.00129 0.00404 0.00212 0.00253 0.00791 0.00537 Keep on repeating : 600 iterations : p(H) = .037037037 Another initial parameter set : p(H) = 0.0625
Training HMM parameters • Converges to a local maximum • There are at least 7 local maxima • The final solution depends on the starting point • The speed of convergence depends on the starting point
Training HMM parameters : Forward Backward algorithm • Improves on the path-enumeration algorithm by using the trellis • Reduces the computation from exponential to linear in the length of the output sequence
Forward Backward Algorithm • [Diagram: trellis showing, for time j, the forward paths entering a transition and the backward paths leaving it]
Forward Backward Algorithm • γj(s → s′) = probability that hj is produced by the transition s → s′ while the complete output is H : γj(s → s′) = αj(s) · p(s → s′) · p(hj | s → s′) · βj(s′) • αj(s) = probability of being in state s and having produced the output h1, …, hj−1 • βj(s′) = probability of being in state s′ and producing the output hj+1, …, hm
Forward Backward Algorithm • Transition count : c(s → s′) = Σj γj(s → s′) / p(H) – the expected number of times the arc s → s′ is taken given the observed output H • Normalizing these counts gives the re-estimated transition and output probabilities
Training HMM parameters • Guess initial values for all parameters • Compute forward and backward pass probabilities • Compute counts • Re-estimate probabilities • Known variously as the Baum-Welch, Baum-Eagon, forward-backward, or E-M algorithm
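The training recipe above can be sketched end to end. This is a hedged sketch of one Baum-Welch step for a state-emission HMM without null transitions (the slides use arc emissions; the counting idea is the same), and all model numbers are made-up initial guesses:

```python
def forward_backward(obs, A, B, pi):
    """Compute forward (alpha) and backward (beta) probabilities and p(H)."""
    T, N = len(obs), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t-1][i] * A[i][j] for i in range(N))
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))
    return alpha, beta, sum(alpha[T-1])

def reestimate(obs, A, B, pi):
    """One E-M step: accumulate posterior counts, then renormalize."""
    T, N, M = len(obs), len(pi), len(B[0])
    alpha, beta, pH = forward_backward(obs, A, B, pi)
    gamma = [[alpha[t][i] * beta[t][i] / pH for i in range(N)] for t in range(T)]
    xi = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / pH
               for t in range(T - 1)) for j in range(N)] for i in range(N)]
    newA = [[xi[i][j] / sum(xi[i]) for j in range(N)] for i in range(N)]
    newB = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
             sum(gamma[t][i] for t in range(T)) for k in range(M)] for i in range(N)]
    return newA, newB, gamma[0], pH

# Toy run on H = a b a a (a=0, b=1), starting from invented parameters.
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
obs = [0, 1, 0, 0]
likelihoods = []
for _ in range(5):
    A, B, pi, pH = reestimate(obs, A, B, pi)
    likelihoods.append(pH)
```

As the convergence slides note, each iteration can only increase (or hold) p(H), so `likelihoods` is non-decreasing, but the fixed point reached depends on the starting parameters.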