Hidden Markov Modelling and Handwriting Recognition Csink László 2009
Types of Handwriting 1
1. BLOCK PRINTING
2. GUIDED CURSIVE HANDWRITING
Types of Handwriting 2
3. UNCONSTRAINED CURSIVE HANDWRITING
Clearly faster, but less legible, than 1 or 2.
• ONLINE recognition for 3: some systems have been developed.
• OFFLINE recognition for 3: much research has been done, but a lot remains to do.
• Suen: "no simple scheme is likely to achieve high recognition and reliability rates, not to mention human performance"
Introduction to Hidden Markov Modelling (HMM): a simple example 1
Suppose we want to determine the average annual temperature at a specific location over a series of years, in a past era for which no measurements are available. We assume that only two kinds of years exist, hot (H) and cold (C). We know that the probability of a cold year following a hot one is 0.3, and the probability of a cold year following a cold one is 0.6. The probabilities of a hot year following a hot or a cold one are therefore 0.7 and 0.4, respectively. We assume that these probabilities do not change over the years. With rows and columns both ordered (H, C), the data can be written as
$$A = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$
Note that every row of this matrix sums to 1 (it is a row stochastic matrix). The transition process described by the matrix is a MARKOV PROCESS, because the next state depends only on the previous one.
Introduction to HMM: a simple example 2
• We also suppose that there is a known correlation between the size of tree growth rings and temperature. We consider only 3 different ring sizes: Small (S), Medium (M) and Large (L). In each year the following probabilistic relationship holds between the states H and C (rows) and the ring sizes S, M and L (columns):
$$B = \begin{pmatrix} 0.1 & 0.4 & 0.5 \\ 0.7 & 0.2 & 0.1 \end{pmatrix}$$
Note that every row of this matrix also sums to 1 (it is a row stochastic matrix as well).
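Introduction to HMM: a simple example 3
• Since the past temperatures are unknown, that is, the past states are hidden, the above model is called a Hidden Markov Model (HMM).
State transition matrix (Markov):
$$A = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$
Observation matrix:
$$B = \begin{pmatrix} 0.1 & 0.4 & 0.5 \\ 0.7 & 0.2 & 0.1 \end{pmatrix}$$
Initial state distribution (we assume this is also known):
$$\pi = \begin{pmatrix} 0.6 & 0.4 \end{pmatrix}$$
A, B and π are all row stochastic.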
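The model can be written down directly as nested lists. The following is a minimal sketch, assuming the values implied by the stated transition probabilities and by the factors of the P(HHCC) product later in the slides; the index convention (H=0, C=1; S=0, M=1, L=2) and the helper name `is_row_stochastic` are choices made for this illustration only.

```python
import math

# Assumed index convention for this sketch: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3],
     [0.4, 0.6]]          # state transition matrix
B = [[0.1, 0.4, 0.5],
     [0.7, 0.2, 0.1]]     # observation matrix
PI = [0.6, 0.4]           # initial state distribution


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        min(row) >= 0 and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in matrix
    )


# PI is a single row, so it is wrapped in a list before checking.
model_ok = is_row_stochastic(A) and is_row_stochastic(B) and is_row_stochastic([PI])
print(model_ok)
```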
Introduction to HMM: a simple example 4
• Denote the rings S, M and L by 0, 1 and 2, respectively. Assume that in a four-year period we observe O = (0, 1, 0, 2). We want to determine the most likely state sequence of the Markov process given the observations O.
• Dynamic Programming: the most likely sequence is the one with the highest probability among all possible state sequences of length four.
• HMM solution: the most likely sequence is the one that maximizes the expected number of correct states.
• These two solutions do not necessarily coincide!
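Introduction to HMM: a simple example 5
Notations
T = length of the observation sequence
N = number of states in the model
M = number of observation symbols
$Q = \{q_0, q_1, \dots, q_{N-1}\}$ = the states of the Markov process
$V = \{0, 1, \dots, M-1\}$ = the set of possible observations
A = state transition matrix
B = observation matrix
π = initial state distribution
$O = (O_0, O_1, \dots, O_{T-1})$ = observation sequence
In the previous example, T=4, N=2, M=3, Q={H, C}, V={0(=S), 1(=M), 2(=L)}, O=(0,1,0,2).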
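State Sequence Probability
• Consider a state sequence of length four, $X = (x_0, x_1, x_2, x_3)$, with observations $O = (O_0, O_1, O_2, O_3)$.
• Denote by $\pi_{x_0}$ the probability of starting in state $x_0$. $b_{x_0}(O_0)$ is the probability of initially observing $O_0$, and $a_{x_0,x_1}$ is the probability of transiting from state $x_0$ to state $x_1$. Continuing in this way, we see that the probability of the state sequence X above is
$$P(X) = \pi_{x_0} b_{x_0}(O_0)\, a_{x_0,x_1} b_{x_1}(O_1)\, a_{x_1,x_2} b_{x_2}(O_2)\, a_{x_2,x_3} b_{x_3}(O_3)$$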
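Probability of Sequence (H,H,C,C)
$$A = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}, \quad B = \begin{pmatrix} 0.1 & 0.4 & 0.5 \\ 0.7 & 0.2 & 0.1 \end{pmatrix}$$
With O = (0, 1, 0, 2):
$$P(HHCC) = 0.6\,(0.1)\,(0.7)\,(0.4)\,(0.3)\,(0.7)\,(0.6)\,(0.1) \approx 0.000212$$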
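The product above can be reproduced with a few lines of code. This is a sketch under the assumption that A, B and π take the values implied by the factors of the product; the function name `sequence_probability` and the index convention are illustrative only.

```python
# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]
STATE = {"H": 0, "C": 1}


def sequence_probability(states, observations):
    """Probability of a state string such as "HHCC" together with the observations."""
    x = [STATE[s] for s in states]
    # Start: pi_{x0} * b_{x0}(O_0)
    p = PI[x[0]] * B[x[0]][observations[0]]
    # Each further step contributes a_{x(t-1),x(t)} * b_{x(t)}(O_t)
    for t in range(1, len(x)):
        p *= A[x[t - 1]][x[t]] * B[x[t]][observations[t]]
    return p


p_hhcc = sequence_probability("HHCC", [0, 1, 0, 2])
print(f"{p_hhcc:.6f}")  # 0.000212
```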
Finding the Best Solution in the DP Sense
We compute the probabilities of all 16 state sequences (spreadsheet rows 2 to 17, column B) in the same way as we computed row 5 (HHCC) on the previous slide. Writing =B2/B$18 into C2, where B18 holds the column sum, and copying the formula downwards, we get the normalized probabilities. Using the EXCEL functions INDEX and MATCH (the Hungarian-language name of MATCH is HOL.VAN),
=INDEX(A2:A17; MATCH(MAX(B2:B17);B2:B17;0))
[ =INDEX(A2:A17; HOL.VAN(MAX(B2:B17);B2:B17;0)) ]
we find that the sequence with the highest probability is CCCH. This gives the best solution in the Dynamic Programming (DP) sense.
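The spreadsheet computation can also be sketched in code: enumerate all 16 state sequences, normalize, and pick the maximum. The matrices are the values implied by the example; the variable names (`table`, `normalized`, `best`) are illustrative and not part of the original slides.

```python
from itertools import product

# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]
NAMES = "HC"
obs = [0, 1, 0, 2]


def joint(x):
    """Probability of the state index sequence x together with obs."""
    p = PI[x[0]] * B[x[0]][obs[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1]][x[t]] * B[x[t]][obs[t]]
    return p


# One entry per state sequence, like spreadsheet rows 2 to 17.
table = {
    "".join(NAMES[i] for i in x): joint(x)
    for x in product(range(2), repeat=len(obs))
}
total = sum(table.values())                       # plays the role of cell B18
normalized = {s: p / total for s, p in table.items()}
best = max(table, key=table.get)                  # INDEX/MATCH/MAX equivalent
print(best, round(table[best], 6))  # CCCH 0.002822
```

Enumeration is only practical for tiny examples, since the number of sequences grows as N^T.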
The HMM prob matrix
Using the EXCEL functions MID [KÖZÉP] and SUMIF [SZUMHA], we produced columns D, E, F and G to show the 1st, 2nd, 3rd and 4th state of each sequence. Summing the normalized probabilities over the rows where the given column shows the state "H", we get the first row of the HMM prob matrix. The second row of the HMM prob matrix is computed similarly, using "C" instead of "H".
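The same matrix can be sketched by brute force: for each position, add up the normalized probabilities of all sequences that have a given state there. The matrices are again the values implied by the example, and `hmm_matrix` and `hmm_best` are names invented for this illustration.

```python
from itertools import product

# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]
NAMES = "HC"
obs = [0, 1, 0, 2]


def joint(x):
    """Probability of the state index sequence x together with obs."""
    p = PI[x[0]] * B[x[0]][obs[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1]][x[t]] * B[x[t]][obs[t]]
    return p


seqs = list(product(range(2), repeat=len(obs)))
probs = [joint(x) for x in seqs]
total = sum(probs)

# hmm_matrix[i][t]: probability that the state at position t is i, given obs
# (the SUMIF step, applied to the normalized probabilities).
hmm_matrix = [
    [sum(p for x, p in zip(seqs, probs) if x[t] == i) / total
     for t in range(len(obs))]
    for i in range(2)
]

# HMM solution: take the more probable state at each position separately.
hmm_best = "".join(
    NAMES[max(range(2), key=lambda i: hmm_matrix[i][t])]
    for t in range(len(obs))
)
print(hmm_best)  # CHCH
```

Under these assumed values the HMM solution differs from the DP solution CCCH, which illustrates that the two criteria need not coincide.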
Three Problems
Problem 1: Given the model λ=(A,B,π) and a sequence of observations O, find P(O|λ). In other words, we want to determine the likelihood of the observed sequence O, given the model.
Problem 2: Given the model λ=(A,B,π) and a sequence of observations O, find an optimal state sequence for the underlying Markov process. In other words, we want to uncover the hidden part of the Hidden Markov Model.
Problem 3: Given an observation sequence O and dimensions N and M, find the model λ=(A,B,π) that maximizes the probability of O. This can be viewed as training the model to best fit the observed data.
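Solution to Problem 1
Let λ=(A,B,π) be a given model and let $O = (O_0, O_1, \dots, O_{T-1})$ be a sequence of observations. We want to find P(O|λ). Let $X = (x_0, x_1, \dots, x_{T-1})$ be a state sequence. Then by the definition of B we have
$$P(O \mid X, \lambda) = b_{x_0}(O_0)\, b_{x_1}(O_1) \cdots b_{x_{T-1}}(O_{T-1})$$
and by the definition of π and A we have
$$P(X \mid \lambda) = \pi_{x_0}\, a_{x_0,x_1}\, a_{x_1,x_2} \cdots a_{x_{T-2},x_{T-1}}$$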
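By summing over all possible state sequences we get
$$P(O \mid \lambda) = \sum_X P(O \mid X, \lambda)\, P(X \mid \lambda) = \sum_X \pi_{x_0} b_{x_0}(O_0)\, a_{x_0,x_1} b_{x_1}(O_1) \cdots a_{x_{T-2},x_{T-1}} b_{x_{T-1}}(O_{T-1})$$
As the state sequence and the observation sequence both have length T, this sum has $N^T$ terms, and each term is a product of 2T factors, so the direct computation needs about $2T \times N^T$ multiplications. Fortunately, there exists a much faster algorithm as well.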
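The Forward α-pass Algorithm
$\alpha_t(i) = P(O_0, O_1, \dots, O_t,\ x_t = q_i \mid \lambda)$ is the probability of the partial observation sequence up to time t together with the underlying Markov process being in state $q_i$ at time t.
Let $\alpha_0(i) = \pi_i b_i(O_0)$ for i = 0, 1, …, N-1.
For t = 1, 2, …, T-1 and i = 0, 1, …, N-1 compute
$$\alpha_t(i) = \left[ \sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji} \right] b_i(O_t)$$
Then $P(O \mid \lambda) = \sum_{i=0}^{N-1} \alpha_{T-1}(i)$.
We have to compute α T×N times, and each α needs about N multiplications, so this method needs about $T \times N^2$ multiplications.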
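A minimal sketch of the α-pass, run on the tree-ring example with the matrix values implied there. The function name `forward` and the list-of-lists layout are choices made for this illustration.

```python
def forward(A, B, pi, obs):
    """Return alpha as a list over time; alpha[t][i] is alpha_t(i)."""
    n = len(pi)
    # Initialization: alpha_0(i) = pi_i * b_i(O_0)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # Recursion: alpha_t(i) = [sum_j alpha_{t-1}(j) * a_ji] * b_i(O_t)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
            for i in range(n)
        ])
    return alpha


# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]

alpha = forward(A, B, PI, [0, 1, 0, 2])
likelihood = sum(alpha[-1])      # P(O | lambda)
print(f"{likelihood:.6f}")  # 0.009630
```

The result agrees with the sum over all 16 state sequences, but is obtained with far fewer multiplications.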
Solution to Problem 2
Given the model λ=(A,B,π) and a sequence of observations O, our goal is to find the most likely state sequence in the HMM sense, i.e. the one that maximizes the expected number of correct states. First we define the backward algorithm, called the β-pass.
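The slides do not show the β-pass formulas here, so the following is a sketch of the standard textbook formulation rather than the author's exact presentation: β at the last time step is 1, earlier values are obtained by the recursion β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), and the per-position state probabilities are α_t(i) β_t(i) / P(O|λ). All function names are illustrative.

```python
def forward(A, B, pi, obs):
    """alpha[t][i]: probability of O_0..O_t and being in state i at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
                      for i in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i]: probability of O_{t+1}..O_{T-1} given state i at time t."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]          # beta_{T-1}(i) = 1
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def state_posteriors(A, B, pi, obs):
    """gamma[t][i]: probability of being in state i at time t, given obs."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    return [[a * b / p_obs for a, b in zip(alpha_t, beta_t)]
            for alpha_t, beta_t in zip(alpha, beta)]


# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]

gamma = state_posteriors(A, B, PI, [0, 1, 0, 2])
# Most likely state at each position, taken independently.
hmm_path = "".join("HC"[max(range(2), key=lambda i: g[i])] for g in gamma)
print(hmm_path)  # CHCH
```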
Example (1996): HMM-based Handwritten Symbol Recognition
• Input: a sequence of strokes captured during writing. A stroke is the writing from pen down to pen up, recorded as a sequence of (x,y)-coordinates corresponding to pen positions.
• Slant correction: try to find a near-vertical part in each stroke and rotate the whole stroke so that this part becomes vertical.
Normalization of Strokes
• Normalization: determine the x-length of each stroke. Denote by x10 the threshold below which 10 % of the strokes fall with respect to x-length, and by x90 the threshold above which 10 % of the strokes fall. Then compute the average of the x-lengths of all strokes that lie between the two thresholds, and denote it by x’.
• Perform the above operations with respect to y-length too; compute y’.
• Then normalize all strokes to x’ and y’.
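The normalization steps above can be sketched as a trimmed mean followed by a rescaling. Several details are assumptions of this sketch and not stated in the slides: a stroke's x-length is taken as the width of its bounding box, the 10 % cut is made by count on the sorted lengths, and "normalize to x’ and y’" is read as dividing coordinates by x’ and y’.

```python
def trimmed_mean(lengths, trim=0.10):
    """Mean of the lengths after dropping the lowest and highest `trim` fraction."""
    ordered = sorted(lengths)
    k = int(len(ordered) * trim)              # number of values cut at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def extent(stroke, axis):
    """Bounding-box size of a stroke along axis 0 (x) or 1 (y)."""
    values = [point[axis] for point in stroke]
    return max(values) - min(values)


def normalize_strokes(strokes):
    """Scale all strokes by the reference sizes x' and y' (assumed interpretation)."""
    x_ref = trimmed_mean([extent(s, 0) for s in strokes]) or 1.0  # avoid dividing by 0
    y_ref = trimmed_mean([extent(s, 1) for s in strokes]) or 1.0
    return [[(x / x_ref, y / y_ref) for x, y in s] for s in strokes]


# Hypothetical input: two strokes given as lists of (x, y) pen positions.
strokes = [[(0, 0), (2, 4)], [(1, 1), (3, 1)]]
normalized = normalize_strokes(strokes)
print(normalized[0])  # [(0.0, 0.0), (1.0, 2.0)]
```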
The Online Temporal Feature Vector
• Introduce a hidden stroke between the pen-up position of a stroke and the pen-down position of the next stroke (we assume that the strokes are ordered in time).
• The unified sequence of strokes and hidden strokes is resampled at equispaced points along the trajectory, retaining the temporal order. For each point, the feature vector consists of the local position, the sine and cosine of the angle between the x-axis and the vector connecting the current point and the origin, and a flag telling whether the point belongs to a stroke or a hidden stroke.
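The construction above can be sketched as follows. The step size, the function names, and the convention that a sample lying exactly at the end of a hidden stroke is flagged as hidden are assumptions of this illustration, not details taken from the 1996 system.

```python
import math


def unified_path(strokes):
    """Chain all strokes; hidden[k] flags the segment points[k] -> points[k+1]."""
    points, hidden = [], []
    for stroke in strokes:
        if points:
            hidden.append(True)           # pen-up jump from the previous stroke
        for k, point in enumerate(stroke):
            if k > 0:
                hidden.append(False)      # ordinary pen-down segment
            points.append(point)
    return points, hidden


def resample(points, hidden, step):
    """Sample the path every `step` units of arc length, in temporal order."""
    samples = [(points[0], False)]        # the first point is always pen-down
    travelled = 0.0                       # arc length since the last sample
    for (x0, y0), (x1, y1), hid in zip(points, points[1:], hidden):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = step - travelled            # offset of the next sample on this segment
        while pos <= seg:
            r = pos / seg
            samples.append(((x0 + r * (x1 - x0), y0 + r * (y1 - y0)), hid))
            pos += step
        travelled = seg - (pos - step)
    return samples


def feature_vector(point, hid):
    """(x, y, sin, cos, hidden flag); the angle is that of the vector origin -> point."""
    x, y = point
    angle = math.atan2(y, x)
    return (x, y, math.sin(angle), math.cos(angle), 1 if hid else 0)


# Hypothetical input: two horizontal strokes, one above the other.
strokes = [[(0, 0), (2, 0)], [(2, 2), (4, 2)]]
points, hidden = unified_path(strokes)
features = [feature_vector(p, h) for p, h in resample(points, hidden, step=1.0)]
flags = [f[4] for f in features]
print(flags)  # [0, 0, 0, 1, 1, 0, 0]
```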
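HMM Topology
• For each symbol $S_i$ of the alphabet $\{S_1, S_2, \dots, S_K\}$ an HMM $\lambda_i$ is generated. Each HMM is a left-to-right model: for its states, $P(s_j \mid s_i) = 0$ whenever j<i or j>i+2, so a state can only be kept or left for one of the next two states.
• The question is: how can we generate an HMM? The answer is given by the solution to Problem 3.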
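A transition matrix with this banded structure can be sketched as below. The uniform starting values inside the band and the function name are assumptions of this illustration; the slides only specify which entries must be zero.

```python
def left_to_right_transitions(n_states, max_jump=2):
    """Row stochastic matrix with a[i][j] = 0 unless i <= j <= i + max_jump."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # Allowed targets: stay in i, or move forward by at most max_jump states.
        allowed = range(i, min(i + max_jump, n_states - 1) + 1)
        for j in allowed:
            A[i][j] = 1.0 / len(allowed)  # uniform start, to be refined by training
    return A


A = left_to_right_transitions(4)
for row in A:
    print([round(a, 3) for a in row])
```

Entries that start at zero stay at zero under the re-estimation of Problem 3, so the topology is preserved during training.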
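Solution of Problem 3
Now we want to adjust the model parameters to best fit the observations. The sizes N (number of states) and M (number of observation symbols) are fixed, but A, B and π are free; we only have to take care that they remain row stochastic. For t = 0, 1, …, T-2 and i, j in {0, 1, …, N-1}, define the probability of being in state $q_i$ at time t and transiting to state $q_j$ at time t+1:
$$\gamma_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$
where α comes from the forward α-pass and β from the backward β-pass.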
The Iteration
1. Initialize λ=(A,B,π) with a best guess, or choose random values such that $\pi_i \approx 1/N$, $a_{ij} \approx 1/N$ and $b_j(k) \approx 1/M$; π, A and B must be row stochastic.
2. Compute $\alpha_t(i)$, $\beta_t(i)$, $\gamma_t(i,j)$ and $\gamma_t(i) = \sum_j \gamma_t(i,j)$.
3. Re-estimate the model λ=(A,B,π).
4. If P(O|λ) increases, GOTO 2 (the increase may be compared against a threshold, or a maximum number of iterations may be set).
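The iteration can be sketched as follows, using the standard Baum-Welch re-estimation formulas (π_i = γ_0(i), a_ij as the ratio of expected transitions to expected visits, b_j(k) as the ratio of expected emissions of k to expected visits). The slides do not show these formulas here, so this is an assumption-laden illustration without scaling, suitable only for short sequences; all function names are invented for the example.

```python
def forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
                      for i in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def reestimate(A, B, pi, obs):
    """Steps 2 and 3: compute the gammas, then return the re-estimated model."""
    n, m, T = len(A), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    digamma = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
                 for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_A, new_B = [], []
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T - 1))
        # Keep the old row if state i is never visited (guard against 0/0).
        new_A.append([sum(d[i][j] for d in digamma) / visits for j in range(n)]
                     if visits > 0 else A[i][:])
        total = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / total
                      for k in range(m)] if total > 0 else B[i][:])
    return new_A, new_B, gamma[0][:]


def baum_welch(A, B, pi, obs, max_iters=50, tol=1e-12):
    """Step 4: repeat while P(O | model) increases by more than tol."""
    best = sum(forward(A, B, pi, obs)[-1])
    for _ in range(max_iters):
        new_A, new_B, new_pi = reestimate(A, B, pi, obs)
        new_like = sum(forward(new_A, new_B, new_pi, obs)[-1])
        if new_like - best <= tol:
            break
        A, B, pi, best = new_A, new_B, new_pi, new_like
    return A, B, pi, best


# Starting point: the tree-ring model, used here only as an initial guess.
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI0 = [0.6, 0.4]
obs = [0, 1, 0, 2]

initial = sum(forward(A0, B0, PI0, obs)[-1])
A1, B1, pi1, trained = baum_welch(A0, B0, PI0, obs, max_iters=5)
print(trained > initial)  # True
```

With only four observations the trained model overfits heavily; the example is meant to show the mechanics of the loop, not a realistic training run.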
Practical Considerations
• Be aware that $\alpha_t(i)$ tends to 0 as t increases. Therefore, a direct implementation of the above formulas may lead to underflow for long observation sequences.
• Details and pseudocode may be found here: http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf
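One common remedy, sketched here as an assumption rather than as the method of the slides, is to rescale α at every time step so that it sums to 1 and to accumulate the logarithms of the scaling factors, which yields log P(O|λ) instead of P(O|λ). The function name `log_likelihood` is illustrative.

```python
import math


def log_likelihood(A, B, pi, obs):
    """log P(O | model) computed with a scaled forward pass."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)                 # zero only if obs is impossible
        alpha = [a / scale for a in alpha] # keep alpha in a safe numeric range
        log_p += math.log(scale)           # product of scales equals P(O | model)
    return log_p


# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]

short = log_likelihood(A, B, PI, [0, 1, 0, 2])
long_run = log_likelihood(A, B, PI, [0, 1, 0, 2] * 500)   # T = 2000
print(round(math.exp(short), 6))  # 0.00963
print(math.isfinite(long_run))    # True
```

For the 2000-step sequence the unscaled probability is far below the smallest positive double, so the plain α-pass would return 0, while the logarithm stays finite.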
Another example (2004): Writer Identification Using HMM Recognizers
• Writer identification is the task of determining the author of a handwriting sample from a set of writers.
• Writer verification is the task of determining whether a given text has been written by a certain person.
• If the text is predefined, it is text dependent verification; otherwise it is text independent verification.
• Writer verification may be done online or offline.
• It is generally believed that text independent verification is more difficult than text dependent verification.
• For each writer, an individual HMM-based handwriting recognition system is trained using only data from that writer. Thus from n writers we get n different HMMs.
• Given an arbitrary line of text as input, each HMM recognizer outputs a recognition result together with a recognition score.
• It is assumed that
  • correctly recognized words have a higher score than incorrectly recognized words;
  • the recognition rate on input from the writer the system was trained on is higher than on input from other writers.
• The scores produced by the different HMMs can be used to decide who has written the input text line.
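The decision rule described above can be sketched as picking the writer whose recognizer gives the highest score. Everything concrete here is hypothetical: the recognizers are stand-in functions, and the writer names and score values are made up purely to show the control flow.

```python
def identify_writer(text_line, recognizers):
    """Run every writer-specific recognizer and return the best-scoring writer.

    recognizers: dict mapping writer name -> callable(text_line) returning
    (transcription, score). This interface is an assumption of the sketch.
    """
    scores = {writer: recognize(text_line)[1]
              for writer, recognize in recognizers.items()}
    best_writer = max(scores, key=scores.get)
    return best_writer, scores


def make_stub(score):
    """Hypothetical stand-in for a trained HMM recognizer."""
    return lambda text_line: ("<transcription>", score)


# Made-up scores (for example log-likelihoods); higher means a better match.
recognizers = {
    "writer_a": make_stub(-41.7),
    "writer_b": make_stub(-35.2),
    "writer_c": make_stub(-48.9),
}

winner, scores = identify_writer("<text line image>", recognizers)
print(winner)  # writer_b
```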
• After preprocessing (slant, skew, baseline location, height), a sliding window of one-pixel width is shifted from left to right over the text line.
• The features are: number of black pixels in the window, center of gravity, second order moment, position and contour direction of the upper- and lowermost pixels, number of black-to-white transitions in the window, distance between the upper- and lowermost pixels.
• Normalization may reduce individuality; on the other hand, it supports recognition, which is important for the verification project.
• For each upper- and lowercase character an individual HMM is built.
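Most of the listed window features can be sketched for a binary image stored as rows of 0/1 values (1 = black). The exact definitions used in the 2004 system are not given in the slides, so the ones below (for example, the second order moment as the mean of y² over black pixels) are assumptions, and the contour direction features are left out because they need neighbouring columns.

```python
def column_features(image, x):
    """Features of the one-pixel-wide window at column x; y grows downwards."""
    column = [row[x] for row in image]
    ys = [y for y, pixel in enumerate(column) if pixel]
    if not ys:                            # empty window: no black pixels
        return {"count": 0, "center": None, "moment": None,
                "upper": None, "lower": None, "transitions": 0, "extent": 0}
    # Black-to-white transitions when scanning the column from top to bottom.
    transitions = sum(1 for a, b in zip(column, column[1:]) if a and not b)
    return {
        "count": len(ys),                             # number of black pixels
        "center": sum(ys) / len(ys),                  # center of gravity
        "moment": sum(y * y for y in ys) / len(ys),   # second order moment (assumed form)
        "upper": ys[0],                               # position of the uppermost pixel
        "lower": ys[-1],                              # position of the lowermost pixel
        "transitions": transitions,
        "extent": ys[-1] - ys[0],                     # upper-to-lower distance
    }


# Hypothetical 5 x 2 binary image.
image = [
    [0, 1],
    [1, 1],
    [1, 0],
    [0, 0],
    [1, 0],
]
# Slide the window from left to right: one feature dict per column.
sequence = [column_features(image, x) for x in range(len(image[0]))]
print(sequence[0]["count"], sequence[0]["extent"])  # 3 3
```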
Related Concepts
• The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path.
• The Baum–Welch algorithm is used to find the unknown parameters of an HMM. It makes use of the forward-backward algorithm used above.
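A compact sketch of the Viterbi algorithm, applied to the tree-ring example with the matrix values implied there. It works on plain probabilities, which is fine for four observations; for long sequences one would typically switch to logarithms. The function name and return format are choices made for this illustration.

```python
def viterbi(A, B, pi, obs):
    """Return (most likely state index sequence, its probability)."""
    n = len(pi)
    # delta[i]: probability of the best path that ends in state i at the current time
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []                                   # back[t][i]: best predecessor of i
    for o in obs[1:]:
        prev = [max(range(n), key=lambda j: delta[j] * A[j][i]) for i in range(n)]
        delta = [delta[prev[i]] * A[prev[i]][i] * B[i][o] for i in range(n)]
        back.append(prev)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Assumed index convention: states H=0, C=1; rings S=0, M=1, L=2.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
PI = [0.6, 0.4]

path, best_prob = viterbi(A, B, PI, [0, 1, 0, 2])
print("".join("HC"[s] for s in path))  # CCCH
```

The result matches the best solution in the DP sense found earlier by enumerating all 16 sequences.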
HMM-based Speech Recognition
• Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. Reason: a speech signal can be thought of as the output of a Markov model.
• For further reference consult Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
• http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf