
Hidden Markov Models in Practice


Presentation Transcript


  1. Hidden Markov Models in Practice Based on materials from Jacky Birrell, Tomáš Kolský

  2. Outline • Markov Chains • Extension to HMM • Urn and Ball • Modelling BLAST Similarity Matches • Information Sources

  3. Markov Chain Example • Based on the weather today what will it be tomorrow? • Assuming only four possible weather states • Sunny • Cloudy • Rainy • Snowing

  4. Markov Chain Structure • Each state is an observable event • At each time interval the state changes to another state or stays the same (qt ∈ {S1, S2, S3, S4}) • States: S1 (Sunny), S2 (Cloudy), S3 (Rainy), S4 (Snowing)

  5. Markov Chain Structure • Not all changes are necessarily possible • Can’t change between sunny and snowing • Can’t snow after it has rained

  6. Markov Chain Assumptions • The Markov assumption • The next state depends only on the current state • Tomorrow’s weather depends only on today’s • The stationarity assumption • Transition probabilities are independent of the time the transition takes place

  7. Markov Chain Transition Probabilities • Transition probability matrix: • aij = P(qt + 1 = Sj | qt = Si)

  8. Markov Chain Transition Probabilities • Probabilities for tomorrow’s weather based on today’s weather
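The transition matrix aij can be made concrete with a short Python sketch. The probabilities below are illustrative placeholders (the slide’s actual matrix is not reproduced in the transcript), but the zero entries follow the constraints from slide 5: no sunny↔snowing transition and no rain→snow.

```python
import random

# Illustrative transition matrix for the four weather states; the numbers
# are placeholders, not the slide's actual values. Zero entries encode the
# slide-5 constraints: no sunny<->snowing transition, no rain->snow.
states = ["Sunny", "Cloudy", "Rainy", "Snowing"]
A = [
    [0.6, 0.3, 0.1, 0.0],   # from Sunny
    [0.3, 0.3, 0.3, 0.1],   # from Cloudy
    [0.1, 0.4, 0.5, 0.0],   # from Rainy (can't snow after rain)
    [0.0, 0.3, 0.2, 0.5],   # from Snowing
]

def next_state(i, rng=random):
    """Sample tomorrow's state index j with probability A[i][j]."""
    r, acc = rng.random(), 0.0
    for j, p in enumerate(A[i]):
        acc += p
        if r < acc:
            return j
    return len(A[i]) - 1
```

Each row of A is the conditional distribution over tomorrow’s weather given today’s, so every row must sum to 1.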

  9. Extension to HMM • “Doubly embedded stochastic process with an underlying … process that is not observable…, but can only be observed through another” (Rabiner, 1989) • States are hidden • Observable events are linked to states • Each state has observation probabilities that determine the observable event

  10. HMM Weather Example • Predicting weather based on today’s • BUT the visible weather is determined by unseen meteorological conditions • Classified as: • Good • Variable • Bad

  11. HMM Model – Markov States • aij = P(qt + 1 = Sj | qt = Si) • qt is the current state • Hidden states: Good, Variable, Bad

  12. HMM Model – Markov States • States are hidden • e.g. you are stuck in a windowless room and cannot observe the weather directly

  13. HMM Model – Linked Events • bj(k) = P(vt = Ok | qt = Sj) • vt is the current observed event • bj(k) = P(get observation vt given that the state is Sj) • Each state can emit any of the events Sunny, Cloudy, Rainy, Snowy

  14. HMM Observation Probabilities • Observation probability matrix:

  15. HMM Observation Probabilities • Observation probability matrix:

  16. HMM Assumptions: dependence • The Markov assumption • The next state depends only on the current state • The stationarity assumption • Transition probabilities are independent of the time the transition takes place • The output independence assumption • Observations are independent of previous observations

  17. Markov vs. HMM: seen and unseen sequences • A Markov chain has an observed state sequence • S1, S2, S3, S4, S5, S6, … • An HMM has an unseen state sequence • S1, S2, S3, S4, S5, S6, … • And an observed event sequence • O1, O2, O3, O4, O5, O6, … • The HMM’s unseen state sequence can only be inferred from the observed event sequence

  18. HMM Notation • N states, S1 to SN, with qt at time t • M observations, O1 to OM, with vt at time t • A = {aij} is the matrix of transition probabilities • aij = P(qt + 1 = Sj | qt = Si) • B = {bj(k)} is the matrix of observation probabilities • bj(k) = P(vt = Ok | qt = Sj) • π = {πi} is the vector of initial state probabilities • πi = P(q1 = Si)
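This notation maps directly onto plain Python containers. A minimal sketch, with placeholder numbers (N = 3, M = 4 chosen arbitrarily, not from the slides):

```python
# The (pi, A, B) triple from the notation slide as plain Python lists.
# All numbers are illustrative placeholders.
N, M = 3, 4                       # N hidden states, M observation symbols
pi = [0.5, 0.3, 0.2]              # pi_i = P(q1 = S_i)
A = [[0.7, 0.2, 0.1],             # a_ij = P(q_{t+1} = S_j | q_t = S_i)
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]
B = [[0.5, 0.2, 0.2, 0.1],        # b_j(k) = P(v_t = O_k | q_t = S_j)
     [0.1, 0.4, 0.4, 0.1],
     [0.25, 0.25, 0.25, 0.25]]

# Sanity checks: pi, each row of A, and each row of B are distributions.
assert abs(sum(pi) - 1) < 1e-9
assert all(abs(sum(row) - 1) < 1e-9 for row in A + B)
```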

  19. Urn & Ball – An Example • N large urns with M colored balls in each • Urns are the states and balls are the observable events • A probability matrix governs changing between urns • Each urn has observation probabilities that determine which ball is chosen • Think of one person A who only sees the balls and another person B who knows the rules and draws the balls. What can person A learn from the sequence of balls seen?

  20. Urn & Ball – An Example • Each urn j has its own observation probabilities: P(red) = bj(1), P(blue) = bj(2), P(green) = bj(3), P(purple) = bj(4), … • So Urn 1 uses b1(k), Urn 2 uses b2(k), Urn 3 uses b3(k)

  21. Urn & Ball – An Example • An initial probability determines the first urn • At each time interval: • A transition probability determines the urn • An observation probability determines the ball • The ball’s color is added to the observed event sequence and the ball is returned to the urn • The transition probability depends on the previous urn

  22. Example Sequence Creation Using Urn & Ball • From π, 1st urn = Urn 1 • Using b1(k), 1st ball = Red • From a1j, 2nd urn = Urn 3, etc. • Get observation sequence • Red, Blue, Purple, Yellow, Blue, Blue • From state sequence • Urn 1, Urn 3, Urn 3, Urn 1, Urn 2, Urn 1 • Typical applications of this model: • Generate sequences • Guess the state sequence by observing the ball sequence
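The generation procedure above can be sketched in a few lines of Python. The urn and color probabilities here are made up for illustration (three urns, four colors), not taken from the slides:

```python
import random

# Sketch of the urn-and-ball generation procedure. All probabilities
# are illustrative placeholders.
colours = ["Red", "Blue", "Green", "Purple"]
pi = [0.6, 0.2, 0.2]                          # initial urn choice
A = [[0.2, 0.5, 0.3],                         # urn-to-urn transitions
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
B = [[0.5, 0.3, 0.1, 0.1],                    # ball colour probs per urn
     [0.1, 0.5, 0.3, 0.1],
     [0.2, 0.2, 0.2, 0.4]]

def draw(dist, rng):
    """Pick an index according to the probability distribution dist."""
    return rng.choices(range(len(dist)), weights=dist)[0]

def generate(T, seed=0):
    """Generate T steps: returns the (hidden) urn and (observed) ball sequences."""
    rng = random.Random(seed)
    urn = draw(pi, rng)                       # initial probability -> first urn
    urns, balls = [], []
    for _ in range(T):
        urns.append(urn + 1)                  # record 1-based urn number
        balls.append(colours[draw(B[urn], rng)])  # ball drawn, then returned
        urn = draw(A[urn], rng)               # transition to the next urn
    return urns, balls

urns, balls = generate(6)
```

Only `balls` would be visible to person A; recovering `urns` from it is the state-estimation problem solved later by Viterbi alignment.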

  23. EXAMPLE: HMM for Detecting ‘Clumpiness’ • Model the distribution of similarity matches seen in a BLAST alignment (query, match line, subject):
  TTEKYKGGSSTLVVGKQLLLENYPLGKSLKNPYLRALSTKLNGGLRSIT
  T Y+ GS+TLV+ + Y G S+++ AL++KL + +
  TVRLYRDGSNTLVLSGEFHDSTYSHGSSVQSVIRTALTSKLPNAVNGLY
  • Converted to binary (1 where the match line shows an identity or similarity): • 1000110111111100100000100101111000011111100010010

  24. Clumpy • 1000110111111100100000100101111000011111100010010 • Example clumps: 1101111111, 111111 • Clumpy state = more likely to emit a 1, but still emits a few 0s

  25. Unclumpy • 1000110111111100100000100101111000011111100010010 • Example dispersed regions: 0010000010010, 00010010 • Unclumpy state = more likely to emit dispersed 1s through mainly 0s

  26. BLAST Similarity HMM • Two states: S1 = Clumpy (C), S2 = Unclumpy (U) • Transitions: aCC, aCU, aUC, aUU • Emissions: P(1) = bC(1) and P(0) = bC(0) in the Clumpy state; P(1) = bU(1) and P(0) = bU(0) in the Unclumpy state

  27. Adjusting Parameters • Different settings of the probabilities produce different levels of clumpiness • A high probability of staying in a clump and emitting a 1 gives an observation sequence with long runs of 1s: 1111111111111011111011110111111111011
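A minimal generator for this two-state model shows the effect of adjusting the parameters. The particular probability values below are illustrative, not fitted to real BLAST output:

```python
import random

# Sketch of the two-state clumpy/unclumpy binary HMM from the slides.
# All probability values are illustrative placeholders.
def generate_binary(a_CC, a_UU, b_C1, b_U1, T, seed=0):
    """a_CC, a_UU: self-transition probs; b_C1, b_U1: P(emit '1') per state."""
    rng = random.Random(seed)
    state = "C"                                 # start in the Clumpy state
    out = []
    for _ in range(T):
        p1 = b_C1 if state == "C" else b_U1
        out.append("1" if rng.random() < p1 else "0")
        stay = a_CC if state == "C" else a_UU
        if rng.random() >= stay:                # leave the current state
            state = "U" if state == "C" else "C"
    return "".join(out)

# High a_CC and b_C1 give long runs of 1s, as on the slide.
clumpy = generate_binary(0.95, 0.8, 0.9, 0.1, 40)
```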

  28. Various tools for DNA and speech analysis based on HMMs exist on the WWW

  29. Application • Want to rank BLAST results by clumpiness • Need measures of clumpiness • Test the measures using the HMM model • The model creates a synthetic data set progressing from clumpy to unclumpy sequences • Could also use the HMM itself as a measure • This is a circular problem: the model comes from the measure and the measure from the model

  30. Circular Problem: Model ↔ Measure

  31. Applications in speech recognition • Hidden Markov Model of speech • State transitions and alignment probabilities • Searching all possible alignments • Dynamic Programming • Viterbi Alignment • Isolated Word Recognition

  32. Isolated Word Recognizer • Preprocessor • Identifies start/end of each word • Converts speech into a sequence of feature vectors at intervals of around 10 ms. • Hypothesis Generator • Try each possible word in turn • Language Model • Estimate the probability of the word • Can depend on the preceding words • Acoustic Model • Calculate the probability density that an observed sequence of feature vectors corresponds to the chosen word

  33. Speech Production Model • Each phoneme in a word corresponds to a number of model states. • Each model state represents a distinct sound with its own acoustic spectrum. • For each state, we store the mean value and variance for each of F features. • When saying a word, the speaker stays in each state for one or more frames and then goes on to the next state. • The time in each state varies according to how fast he/she is speaking. • Some speech sounds last longer than others

  34. State Transitions • With an exit probability of p per frame, a state has an average duration of 1/p frames

  35. Alignment Probabilities

  36. Hidden Markov Model • To calculate the probability density (pd) that an observation matches a particular word with a given alignment, we multiply together: • the probability of the alignment • the output pds of each frame • Try this for every possible alignment of every possible word sequence and choose the one with the highest probability. • Hidden Markov Model => the correct alignment is hidden: we can’t observe it directly. • We talk of the probability density of the model “generating” the observed frame sequence.

  37. Hidden Markov Model Parameters • A Hidden Markov Model for a word must specify the following parameters for state s: • The mean and variance for each of the F elements of the parameter vector: μs and σs². • These allow us to calculate ds(x): the output probability density of input frame x in state s. • The transition probabilities as,j to every possible successor state. • as,j is often zero for all j except j=s and j=s+1; it is then called a left-to-right, no-skips model. • For a Hidden Markov Model with S states we therefore have around (2F+1)S parameters. • A typical word might have S=15 and F=39, giving about 1200 parameters in all.
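Given per-state means and variances, the output density ds(x) for a state is usually taken as a product of F univariate Gaussians, computed in log form to avoid underflow. A minimal sketch (the feature values are placeholders):

```python
import math

def log_output_density(x, mean, var):
    """log d_s(x) for a diagonal-Gaussian state: the sum of F univariate
    Gaussian log densities, one per feature element."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Illustrative F=3 feature vector against a zero-mean, unit-variance state.
lp = log_output_density([0.1, -0.2, 0.3], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

The parameter count on the slide follows directly: F means + F variances + roughly one free transition probability per state gives about (2F+1) parameters per state, so (2×39+1)×15 ≈ 1200.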

  38. How to Calculate: Minimum Cost Paths • Suppose we want to find the cheapest path through a toll road system: Start → Finish • In each circle we will enter the lowest cost of a journey from Start • Begin by putting 0 in the Start circle • Now put 2, 3, 4 in the 2nd column circles and mark all three segments in bold. • This shows the lowest cost to each of these circles and the route by which you reach them. • Put 6, 3, 6 in the 3rd column circles and, in each case, mark the best segment from the previous column in bold. • Put 5, 4 and 6 in the 4th column. • Put 7 in the Finish circle. • We can trace bold segments backwards from Finish to find the best overall path. • This is dynamic programming.

  39. Dynamic Programming • This technique for finding the minimum cost path through a graph is known as dynamic programming. • Three conditions must be true: • 1. All paths through the graph must go from left to right. • 2. The cost of each segment of a path must be fixed in advance: it must not depend on which of the other segments are included in the route. • 3. The total cost of a path must just be the sum of each of its segment costs. • Dynamic programming is guaranteed to find the path with minimum cost. • We can also find the maximum cost path in the same way: • in this case the “costs” are usually called “utilities” instead. • We can use Dynamic Programming to find the “best” alignment of a sequence of feature vectors with a word’s model. • “best” means the alignment with the highest production probability density.
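The column-by-column procedure from the toll-road slide can be sketched as a short function. The graph here is a generic layered left-to-right graph with made-up segment costs (the slide’s figure is not reproduced in the transcript):

```python
# Sketch of minimum-cost-path dynamic programming on a layered graph.
# costs[k][i][j] is the cost of the segment from node i in column k to
# node j in column k+1; all paths run left to right, as slide 39 requires.
def min_cost_path(costs):
    """Return (minimum total cost, list of node indices per column)."""
    best = [0.0]                       # lowest cost to each node in column 0
    back = []                          # backpointers for each later column
    for layer in costs:
        nxt = [min(best[i] + layer[i][j] for i in range(len(best)))
               for j in range(len(layer[0]))]
        ptr = [min(range(len(best)), key=lambda i: best[i] + layer[i][j])
               for j in range(len(layer[0]))]
        best, back = nxt, back + [ptr]
    # Trace the backpointers from the cheapest final node, as on the slide.
    j = min(range(len(best)), key=lambda j: best[j])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return min(best), list(reversed(path))
```

For example, with one middle column of two nodes, `min_cost_path([[[1, 5]], [[2], [1]]])` picks the Start → node 0 → Finish route at total cost 3.0.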

  40. Alignment Graph • We can draw an alignment graph (also called an alignment lattice): • The columns correspond to speech frames • The rows correspond to model states • Each possible path from Start to Finish corresponds to an alignment of the speech frames with the model. • All valid paths pass through each column in turn. • In going to the next column, a path is restricted to the state transitions allowed by the state diagram • In the example above, a path must either remain in the same state or else go on to the next state.

  41. Segment Utilities • The total path utility equals the pd that the model generates the observed sequence of feature vectors with this particular alignment

  42. Dynamic Programming Step

  43. Viterbi Alignment
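The Viterbi alignment named on this slide can be sketched in log-domain Python for a discrete-observation HMM using the (π, A, B) notation from slide 18; in the speech model, B would be replaced by the per-state Gaussian output densities ds(x). All probability values in the usage example are illustrative:

```python
import math

# Log-domain Viterbi: the maximum-probability path through the alignment
# lattice, with backpointers traced from the end, as on the DP slides.
def viterbi(obs, pi, A, B):
    """obs: observation symbol indices. Returns (best log-prob, state path)."""
    N = len(pi)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    psi = []                                   # backpointers per time step
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: prev[i] + log(A[i][j]))
            delta.append(prev[i_best] + log(A[i_best][j]) + log(B[j][o]))
            ptr.append(i_best)
        psi.append(ptr)
    j = max(range(N), key=lambda i: delta[i])  # best final state
    path = [j]
    for ptr in reversed(psi):                  # trace backpointers
        j = ptr[j]
        path.append(j)
    return max(delta), list(reversed(path))

# Toy left-to-right model: must start in state 0, state 1 is absorbing.
lp, path = viterbi([0, 1, 1],
                   [1.0, 0.0],
                   [[0.5, 0.5], [0.0, 1.0]],
                   [[0.9, 0.1], [0.1, 0.9]])
```

In the toy run, the best path is [0, 1, 1]: the model emits the first symbol in state 0, then moves to state 1 for the remaining two.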

  44. Isolated Word Recognition • Requires the speaker to insert a gap between words • Used for budget systems with little CPU power • Recognition: • Extract a word-long segment of speech, s, from the input signal. Convert it into a sequence of frames. • Calculate pr(w|s) ∝ pr(w) × pr(s|w) for each possible word, w, in the recognition vocabulary. • pr(w) is the prior probability of the word: get this from word frequencies or word-pair frequencies (e.g. “minister” often follows “prime”). • pr(s|w) is obtained by using the Viterbi alignment algorithm to find the log probability density of the best alignment of s with the model for w. • Choose the word with the highest probability • A separate Hidden Markov Model is needed for each word in the vocabulary.

  45. Applications of HMM • Pattern recognition • Genes • Human emotions, persons, gestures • Spoken natural language recognition, at many levels • Hand-written natural language recognition • Text generation based on a writer’s style • Recognition of authorship

  46. Literature • University of Leeds (applets): http://www.scs.leeds.ac.uk/scs-only/teaching-materials/HiddenMarkovModels/html_dev/main.html • Tapas Kanungo (writing recognition): http://www.cfar.umd.edu/~kanungo/

  47. Literature • Rabiner, L. R. 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286. • Krogh, A. et al. 1994. J. Mol. Biol., Vol. 235, pp. 1501-1531. • Krogh, A. 1998. In Salzberg et al. (eds), Computational Methods in Molecular Biology, Chap. 4. • Warakagoda, N. D. MSc thesis, online: http://jedlik.phy.bme.hu/~gerjanos/HMM/hoved.html
