
Large Vocabulary Unconstrained Handwriting Recognition

Large Vocabulary Unconstrained Handwriting Recognition. J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center. Topics: pen technologies; pen-based interfaces in mobile computing; mathematical formulation, where H is the handwriting evidence on the basis of which a recognizer makes its decision.


Presentation Transcript


  1. Large Vocabulary Unconstrained Handwriting Recognition • J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center

  2. Pen Technologies • Pen-based interfaces in mobile computing

  3. Mathematical Formulation • H: handwriting evidence on the basis of which a recognizer makes its decision • H = {h1, h2, h3, h4, …, hm} • W: word string from a large vocabulary • W = {w1, w2, w3, w4, …, wn} • Recognizer: $\hat{W} = \arg\max_{W} p(W \mid H)$

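  4. Mathematical Formulation • By Bayes' rule, and because p(H) does not depend on W: $\hat{W} = \arg\max_{W} p(W \mid H) = \arg\max_{W} p(H \mid W)\, p(W)$ • CHANNEL: p(H | W) • SOURCE: p(W)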
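A minimal sketch of the source-channel decision rule, assuming a tiny closed vocabulary and made-up channel and source probabilities for one fixed handwriting sample H (the numbers and the names `CHANNEL`, `SOURCE`, and `decode` are hypothetical, not from the slides). Scores are combined in log space, which is the usual way to avoid underflow when many probabilities are multiplied.

```python
import math

# Hypothetical toy scores for one fixed handwriting sample H (not from the slides).
CHANNEL = {"cat": 0.02, "cot": 0.03, "cut": 0.001}  # p(H | W), e.g. from handwriting HMMs
SOURCE = {"cat": 0.5, "cot": 0.05, "cut": 0.45}     # p(W), e.g. from a language model


def decode(channel, source):
    """Return the W that maximizes log p(H | W) + log p(W) over the vocabulary."""
    return max(channel, key=lambda w: math.log(channel[w]) + math.log(source[w]))


best = decode(CHANNEL, SOURCE)
print(best)  # cat: the source prior outweighs the channel's preference for "cot"
```

The channel alone would pick "cot"; multiplying by the source probability changes the decision, which is why both models appear in the argmax.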

  5. Source Channel Model • W → WRITER → DIGITIZER → FEATURE EXTRACTOR → H → DECODER • The writer, digitizer, and feature extractor together form the CHANNEL

  6. Source Channel Model • Handwriting modeling (channel, p(H | W)): HMMs • Language modeling (source, p(W)) • Search strategy

  7. Hidden Markov Models • Memoryless model → (add memory) → Markov model → (hide something) → hidden Markov model • Memoryless model → (hide something) → mixture model → (add memory) → hidden Markov model • Reference: Alan B. Poritz, "Hidden Markov Models: A Guided Tour," ICASSP 1988

  8. Memoryless Model • Coin: heads (1) with probability p, tails (0) with probability 1-p • Flip the coin 10 times (an IID random sequence) • Sequence: 1 0 1 0 0 0 1 1 1 1 • Probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 (1-p)^4
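Because the flips are independent, the probability of a sequence is just the product of per-flip probabilities, so only the counts of 1s and 0s matter. A minimal sketch, assuming a hypothetical bias p = 0.6 (the slide leaves p symbolic):

```python
import math


def iid_probability(seq, p):
    """Probability of a 0/1 sequence when every flip is independently 1 with probability p."""
    prob = 1.0
    for x in seq:
        prob *= p if x == 1 else 1 - p
    return prob


seq = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
p = 0.6  # hypothetical bias; the slide leaves p symbolic
print(math.isclose(iid_probability(seq, p), p**6 * (1 - p) ** 4))  # True
```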

  9. Add Memory – Markov Model • 2 coins: COIN 1 => p(1) = 0.9, p(0) = 0.1; COIN 2 => p(1) = 0.1, p(0) = 0.9 • Experiment: flip COIN 1 and note the outcome; then repeatedly: if (outcome = head) flip COIN 1, else flip COIN 2 • Sequence 1100: Probability = 0.9*0.9*0.1*0.9 = 0.0729 • Sequence 1010: Probability = 0.9*0.1*0.1*0.1 = 0.0009
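The two-coin experiment can be sketched as follows; the coin currently in use is the state, and each outcome selects the next coin. `P_HEADS` and `markov_probability` are illustrative names, while the probabilities are the slide's.

```python
# Probability of heads (1) for each coin, as on the slide.
P_HEADS = {1: 0.9, 2: 0.1}


def markov_probability(seq):
    """Probability of a 0/1 sequence under the two-coin experiment.

    The first flip uses COIN 1; after a head the next flip uses COIN 1,
    after a tail it uses COIN 2. The coin in use plays the role of the state.
    """
    coin = 1
    prob = 1.0
    for x in seq:
        heads = P_HEADS[coin]
        prob *= heads if x == 1 else 1 - heads
        coin = 1 if x == 1 else 2  # the outcome selects the next coin
    return prob


print(round(markov_probability([1, 1, 0, 0]), 6))  # 0.0729
print(round(markov_probability([1, 0, 1, 0]), 6))  # 0.0009
```

Unlike the memoryless coin, two sequences with the same number of 1s and 0s can now have very different probabilities.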

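  10. State Sequence Representation • State 1 (flip COIN 1): output 1 with probability 0.9 and stay in state 1; output 0 with probability 0.1 and move to state 2 • State 2 (flip COIN 2): output 0 with probability 0.9 and stay in state 2; output 1 with probability 0.1 and move to state 1 • Observed output sequence => unique state sequence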

  11. Hide the states => Hidden Markov Model • [Figure: two-state model with states s1 and s2; arcs labeled with probabilities 0.9 and 0.1.]

  12. Why Use Hidden Markov Models Instead of Non-hidden Ones? • Hidden Markov models can be smaller: fewer parameters to estimate • The states may be truly hidden, e.g. • the position of the hand (handwriting) • the positions of the articulators (speech)

  13. Summary of HMM Basics • We are interested in assigning probabilities p(H) to feature sequences • Memoryless model: this model has no memory of the past • Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE: an equivalence class of the past that influences the future • Hide the states: HMM

  14. Hidden Markov Models • Given an observed sequence H: • compute p(H) (needed for decoding) • find the most likely state sequence for a given Markov model (Viterbi algorithm) • estimate the parameters of the Markov source (training)

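  15. Compute p(H) • Example model: states s1, s2, s3 (s3 is the final state), outputs a and b
     • s1 → s1: probability 0.5, p(a) = 0.8, p(b) = 0.2
     • s1 → s2: probability 0.3, p(a) = 0.7, p(b) = 0.3
     • s1 → s2: null transition (no output), probability 0.2
     • s2 → s2: probability 0.4, p(a) = 0.5, p(b) = 0.5
     • s2 → s3: probability 0.5, p(a) = 0.3, p(b) = 0.7
     • s2 → s3: null transition (no output), probability 0.1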

  16. Compute p(H) – contd. • Compute p(H) where H = a a b b • Enumerate all ways of producing h1 = a:
     • s1 → s1: 0.5x0.8 = 0.40
     • s1 → s2: 0.3x0.7 = 0.21
     • s1 → s2 (null, 0.2), then s2 → s2: 0.2 x 0.4x0.5 = 0.04
     • s1 → s2 (null, 0.2), then s2 → s3: 0.2 x 0.5x0.3 = 0.03

  17. Compute p(H) – contd. • Enumerate all ways of producing h1 = a, h2 = a • [Figure: the tree of the previous slide extended by one more a from every reached state, with the same branch labels 0.5x0.8, 0.3x0.7, 0.2, 0.4x0.5, 0.5x0.3.]
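Carrying the enumeration through all of H = aabb can be sketched as a depth-first search over arcs. This is a minimal sketch under an assumed encoding of the slide 15 model: each arc is `(source, destination, transition probability, output distribution)`, with `None` marking a null arc; `ARCS` and `enumerate_paths` are hypothetical names. It requires that null arcs never form a cycle, which holds for this left-to-right model.

```python
# Example model from slide 15 (hypothetical encoding):
# (source, destination, transition probability, output distribution or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
INITIAL, FINAL = "s1", "s3"


def enumerate_paths(h, state=INITIAL, prob=1.0, path=()):
    """Yield (arc sequence, probability) for every path that produces h and ends in FINAL."""
    if not h and state == FINAL:
        yield path, prob
    for arc in ARCS:
        src, dst, p, out = arc
        if src != state:
            continue
        if out is None:  # null arc: moves without producing output
            yield from enumerate_paths(h, dst, prob * p, path + (arc,))
        elif h:  # emitting arc: produces the next symbol h[0]
            yield from enumerate_paths(h[1:], dst, prob * p * out[h[0]], path + (arc,))


paths = list(enumerate_paths("aabb"))
print(len(paths), round(sum(prob for _, prob in paths), 6))  # 16 0.020156
```

Every path is evaluated from scratch here; the trellis on the following slides shares the common prefixes instead.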

  18. Compute p(H) • Can save computation by combining paths that reach the same state after producing the same outputs • [Figure: the enumeration tree with identical states merged.]

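  19. Compute p(H) • Trellis diagram: columns 0, a, aa, aab, aabb (outputs produced so far); rows s1, s2, s3 • Arcs between adjacent columns: s1 → s1 (.5x.8 for a, .5x.2 for b), s1 → s2 (.3x.7 for a, .3x.3 for b), s2 → s2 (.4x.5), s2 → s3 (.5x.3 for a, .5x.7 for b) • Null arcs within a column: s1 → s2 (.2), s2 → s3 (.1)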

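  20. Basic Recursion • Prob(node) = sum over predecessors of (Prob(predecessor) x Prob(predecessor → node)) • Boundary condition: Prob(s1, 0) = 1 • Trellis values for H = aabb (columns 0, a, aa, aab, aabb):
     s1: 1.0, 0.4, .16, .016, .0016 (each step is the s1 self-loop: x0.4 for a, x0.1 for b)
     s2: 0.2, 0.33, .182, .054, .01256
     s3: 0.02, 0.063, .0677, .0691, .020156
     • Contributions to s2 (from s1 producing the symbol + from s2 producing the symbol + from s1 by the null transition): a: .21 + .04 + .08 = .33; aa: .084 + .066 + .032 = .182; aab: .0144 + .0364 + .0032 = .054; aabb: .00144 + .0108 + .00032 = .01256 (at column 0, only the null transition: .2)
     • Contributions to s3 (from s2 producing the symbol + from s2 by the null transition): a: .03 + .033 = .063; aa: .0495 + .0182 = .0677; aab: .0637 + .0054 = .0691; aabb: .0189 + .001256 = .020156 (at column 0: .02)
     • p(H) = p(aabb) = .020156, the value at s3 in the last column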

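  21. More Formally – Forward Algorithm • $\alpha_j(s) = \sum_{t:\, s' \to s} \alpha_{j-1}(s')\, p(t)\, q(h_j \mid t) + \sum_{n:\, s' \to s} \alpha_j(s')\, p(n)$, where $\alpha_j(s)$ is the probability of being in state s after producing h1, …, hj; t ranges over output-producing transitions into s, with transition probability p(t) and output probability q(hj | t); n ranges over null transitions into s • $\alpha_0(s_1) = 1$ for the initial state (other states at j = 0 are reached only through null transitions)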
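The trellis recursion of slides 20 and 21 can be sketched as below, using an assumed arc encoding of the slide 15 model (null arcs have output distribution `None`). The names `ARCS`, `STATES`, and `forward` are hypothetical. The sketch assumes null arcs only go forward in the listed state order, so processing states in that order gives every node its final value before its null arcs are followed.

```python
# Example model from slide 15 (hypothetical encoding):
# (source, destination, transition probability, output distribution or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]  # null arcs only go forward in this order
INITIAL, FINAL = "s1", "s3"


def forward(h):
    """Return alpha, where alpha[j][s] = probability of producing h[:j] and being in state s."""
    alpha = [dict.fromkeys(STATES, 0.0) for _ in range(len(h) + 1)]
    alpha[0][INITIAL] = 1.0  # boundary condition
    for j in range(len(h) + 1):
        if j > 0:  # arcs that produce h_j (= h[j-1]) move one column right
            for src, dst, p, out in ARCS:
                if out is not None:
                    alpha[j][dst] += alpha[j - 1][src] * p * out[h[j - 1]]
        for s in STATES:  # null arcs stay in the same column
            for src, dst, p, out in ARCS:
                if src == s and out is None:
                    alpha[j][dst] += alpha[j][s] * p
    return alpha


alpha = forward("aabb")
for s in STATES:
    print(s, [round(column[s], 6) for column in alpha])
print("p(H) =", round(alpha[-1][FINAL], 6))  # p(H) = 0.020156
```

The work per column is proportional to the number of arcs, so the total cost grows linearly with the length of H.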

  22. Find Most Likely Path for aabb – Dynamic Programming or Viterbi • MaxProb(node) = MAX over predecessors of (MaxProb(predecessor) x Prob(predecessor → node)) • Node values (columns 0, a, aa, aab, aabb):
     s1: 1.0, 0.4, .16, .016, .0016
     s2: 0.2, .21, .084, .0168, .00336
     s3: 0.02, .03, .0315, .0294, .00588
     • Candidates for s2 (from s1 producing the symbol, from s2 producing the symbol, from s1 by the null transition): a: .21, .04, .08; aa: .084, .042, .032; aab: .0144, .0168, .0032; aabb: .00144, .00336, .00032
     • Candidates for s3 (from s2 producing the symbol, from s2 by the null transition): a: .03, .021; aa: .0315, .0084; aab: .0294, .00168; aabb: .00588, .000336
     • Most likely path: s1 →(a) s1 →(a) s2 →(b) s2 →(b) s3, with probability .00588
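The Viterbi recursion is the forward recursion with the sum replaced by a max, plus back pointers to recover the path. A minimal sketch, assuming the same hypothetical arc encoding of the slide 15 model (`ARCS`, `STATES`, `viterbi`, and the inner helper `relax` are illustrative names) and that null arcs only go forward in state order:

```python
# Example model from slide 15 (hypothetical encoding):
# (source, destination, transition probability, output distribution or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]  # null arcs only go forward in this order
INITIAL, FINAL = "s1", "s3"


def viterbi(h):
    """Return (probability, arcs) of the most likely path that produces h and ends in FINAL."""
    # best[j][s] = (max path probability into node (j, s), back pointer)
    best = [{s: (0.0, None) for s in STATES} for _ in range(len(h) + 1)]
    best[0][INITIAL] = (1.0, None)

    def relax(j, dst, cand, back):
        if cand > best[j][dst][0]:
            best[j][dst] = (cand, back)

    for j in range(len(h) + 1):
        if j > 0:  # arcs that produce h[j-1]
            for arc in ARCS:
                src, dst, p, out = arc
                if out is not None:
                    relax(j, dst, best[j - 1][src][0] * p * out[h[j - 1]], (j - 1, src, arc))
        for s in STATES:  # null arcs, in forward state order
            for arc in ARCS:
                src, dst, p, out = arc
                if src == s and out is None:
                    relax(j, dst, best[j][s][0] * p, (j, src, arc))

    prob, back = best[len(h)][FINAL]
    arcs = []
    while back is not None:  # follow back pointers to the start
        j, src, arc = back
        arcs.append(arc)
        back = best[j][src][1]
    return prob, arcs[::-1]


prob, arcs = viterbi("aabb")
print(round(prob, 6))  # 0.00588
print([(src, dst) for src, dst, _, _ in arcs])  # [('s1', 's1'), ('s1', 's2'), ('s2', 's2'), ('s2', 's3')]
```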

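  23. Training HMM parameters • Training sequence H = abaa • [Figure: initial model with transition probabilities of 1/3 and 1/2 and output probabilities p(a), p(b) of 1/2.] • Probabilities of the seven paths that produce H: .000385, .000578, .000868, .001157, .002604, .001736, .001302 • p(H) = .008632 (the sum over all paths)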

  24. Training HMM parameters • A posteriori probability of path i = p(path i) / p(H) • Values for the seven paths: .045, .067, .134, .100, .201, .150, .301
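The a posteriori path probabilities are each path's share of p(H). A minimal sketch using the rounded path probabilities listed on slide 23 (the variable names are illustrative); because the inputs are rounded, their sum is .00863 rather than the slide's .008632, and the posteriors agree with the slide's values to within about .001.

```python
# Path probabilities for H = abaa as listed on slide 23 (rounded on the slide).
path_probs = [0.000385, 0.000578, 0.000868, 0.001157, 0.002604, 0.001736, 0.001302]

p_H = sum(path_probs)  # total over all paths that produce H
posteriors = [p / p_H for p in path_probs]  # P(path i | H) = p(path i) / p(H)

print(round(p_H, 6))  # 0.00863
print([round(q, 3) for q in posteriors])
```

These posteriors are the weights with which each path's transitions and outputs are counted when the parameters are re-estimated.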

  25. Training HMM parameters

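  26. Training HMM parameters • [Figure: the model after re-estimation, with probabilities .46, .60, .64, .36, .71, .29, .68, .32, .40, .34, .60, .40, .20.] • New path probabilities: 0.00108, 0.00129, 0.00404, 0.00212, 0.00253, 0.00791, 0.00537 • Keep on repeating: after 600 iterations, p(H) = .037037037 • From another initial parameter set: p(H) = 0.0625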

  27. Training HMM parameters • Converges to a local maximum • There are at least 7 local maxima • The final solution depends on the starting point • The speed of convergence depends on the starting point

  28. Training HMM parameters: Forward Backward Algorithm • Improves on the enumeration algorithm by using the trellis • Reduces the computation from exponential to linear in the length of H

  29. Forward Backward Algorithm • [Figure: trellis highlighting a transition taken at time j.]

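  30. Forward Backward Algorithm • For a transition t from state s to state s' with probability p(t) that produces hj with probability q(hj | t): $P(h_j \text{ produced by } t,\ H) = \alpha_{j-1}(s)\, p(t)\, q(h_j \mid t)\, \beta_j(s')$ = probability that hj is produced by t and the complete output is H • $\alpha_{j-1}(s)$ = probability of being in state s and having produced the output h1, …, hj-1 • $\beta_j(s')$ = probability of producing the output hj+1, …, hm starting from state s'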

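  31. Forward Backward Algorithm • Transition count: the expected number of times transition t (from s to s', probability p(t), output probability q(hj | t)) is used in producing H is $c(t) = \frac{1}{p(H)} \sum_{j=1}^{m} \alpha_{j-1}(s)\, p(t)\, q(h_j \mid t)\, \beta_j(s')$, with α and β the forward and backward probabilities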

  32. Training HMM parameters • Guess initial values for all parameters • Compute forward and backward pass probabilities • Compute the expected counts • Re-estimate the probabilities from the counts, and repeat • Also known as BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, or E-M
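The four training steps can be sketched for the slide 15 model and the single training sequence aabb. This is a minimal sketch under stated assumptions: the arc encoding (`None` output for null arcs), the names `forward`, `backward`, and `reestimate`, and the guards against zero counts are all illustrative, and null arcs are assumed to go only forward in state order. Each step re-estimates a transition probability as its expected count divided by the total expected count out of its source state, and an output probability as the expected count of that symbol on the arc divided by the arc's expected count.

```python
from collections import defaultdict

# Example model from slide 15 (hypothetical encoding):
# (source, destination, transition probability, output distribution or None for a null arc)
ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s1", "s2", 0.2, None),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
    ("s2", "s3", 0.1, None),
]
STATES = ["s1", "s2", "s3"]  # null arcs only go forward in this order
INITIAL, FINAL = "s1", "s3"


def forward(arcs, h):
    """alpha[j][s]: probability of producing h[:j] and being in state s."""
    alpha = [dict.fromkeys(STATES, 0.0) for _ in range(len(h) + 1)]
    alpha[0][INITIAL] = 1.0
    for j in range(len(h) + 1):
        for src, dst, p, out in arcs:
            if out is not None and j > 0:
                alpha[j][dst] += alpha[j - 1][src] * p * out[h[j - 1]]
        for s in STATES:
            for src, dst, p, out in arcs:
                if src == s and out is None:
                    alpha[j][dst] += alpha[j][s] * p
    return alpha


def backward(arcs, h):
    """beta[j][s]: probability of producing h[j:] from state s and ending in FINAL."""
    m = len(h)
    beta = [dict.fromkeys(STATES, 0.0) for _ in range(m + 1)]
    beta[m][FINAL] = 1.0
    for j in range(m, -1, -1):
        for src, dst, p, out in arcs:
            if out is not None and j < m:
                beta[j][src] += p * out[h[j]] * beta[j + 1][dst]
        for s in reversed(STATES):  # later states first, so beta[j][dst] is complete
            for src, dst, p, out in arcs:
                if src == s and out is None:
                    beta[j][src] += p * beta[j][dst]
    return beta


def reestimate(arcs, h):
    """One forward-backward step; returns (new arcs, p(H) under the old arcs)."""
    alpha, beta = forward(arcs, h), backward(arcs, h)
    p_h = alpha[len(h)][FINAL]
    counts = [0.0] * len(arcs)  # expected number of uses of each arc
    out_counts = [defaultdict(float) for _ in arcs]  # expected uses per output symbol
    for i, (src, dst, p, out) in enumerate(arcs):
        if out is None:
            for j in range(len(h) + 1):
                counts[i] += alpha[j][src] * p * beta[j][dst] / p_h
        else:
            for j in range(1, len(h) + 1):
                c = alpha[j - 1][src] * p * out[h[j - 1]] * beta[j][dst] / p_h
                counts[i] += c
                out_counts[i][h[j - 1]] += c
    totals = defaultdict(float)
    for (src, _, _, _), c in zip(arcs, counts):
        totals[src] += c
    new_arcs = []
    for i, (src, dst, p, out) in enumerate(arcs):
        new_p = counts[i] / totals[src] if totals[src] > 0 else p
        if out is not None and counts[i] > 0:
            out = {sym: out_counts[i][sym] / counts[i] for sym in out}
        new_arcs.append((src, dst, new_p, out))
    return new_arcs, p_h


arcs, history = ARCS, []
for _ in range(5):
    arcs, p_h = reestimate(arcs, "aabb")
    history.append(p_h)
print([round(x, 5) for x in history])  # p(H) does not decrease from one step to the next
```

As slide 27 notes, such iterations climb to a local maximum, so a different starting point can end at a different solution.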
