480 likes | 701 Views
Hidden Markov Models in Practice. Based on materials from Jacky Birrell, Tomáš Kolský. Outline. Markov Chains Extension to HMM Urn and Ball Modelling BLAST Similarity Matches Information Sources. Markov Chain Example. Based on the weather today what will it be tomorrow ?
E N D
Hidden Markov Models in Practice Based on materials from Jacky Birrell, Tomáš Kolský
Outline • Markov Chains • Extension to HMM • Urn and Ball • Modelling BLAST Similarity Matches • Information Sources
Markov Chain Example • Based on the weather today what will it be tomorrow? • Assuming only four possible weather states • Sunny • Cloudy • Rainy • Snowing
Markov Chain Structure • Each state is an observable event • At each time interval the state changes to another or same state (qt {S1, S2, S3, S4}) State S1 State S2 (Cloudy) (Sunny) (Rainy) (Snowing) State S3 State S4
Markov Chain Structure • Not necessarily all changes are possible • Can’t change between sunny and snowing • Can’t snow after it has rained Sunny Cloudy Rainy Snowy
Markov Chain Assumptions • The Markov assumption • Next state only dependents on current state • Tomorrows weather depends only on today's • The stationarity assumption • Transition probabilities are independent of the timethe transition takes place
Markov Chain Transition Probabilities • Transition probability matrix: • aij = P(qt + 1 = Sj | qt = Si)
Markov Chain Transition Probabilities • Probabilities for tomorrow’s weather based on today’s weather
Extension to HMM “Doubly embeddedstochastic process with an underlying … process that is not observable…, but can only be observed through another” Rabiner 89’ • States hidden • Observable events linked to states • Each state has observation probabilities to determine the observable event
HMM Weather Example • Predicting weather based on today’s • BUTvisible weather determined by unseen meteorological conditions • Classified as: • Good • Variable • Bad
HMM Model – Markov States • aij = P(qt + 1 = Sj | qt = Si) • qt is the current state Variable Good Bad
HMM Model – Markov States • States hidden • e.g. stuck in a windowless room Variable Good Bad
HMM Model – Linked Events Sunny, Cloudy, Rainy, Snowy • bj(k) = P(vt = Ok | qt = Sj) • vt is the current observed event • bj(k) = P(get observation vt given that state is Sj) Sunny, Cloudy, Rainy, Snowy Sunny, Cloudy, Rainy, Snowy
HMM Observation Probabilities • Observation probability matrix:
HMM Observation Probabilities • Observation probability matrix:
HMM Assumptions: dependence • The Markov Assumption • Next state only dependenton current state • The stationarity assumption • Transition probabilities independent of the time the transition takes place • The output independence assumption • Observations independent of previous observations
Markov HMM seen and unseen sequences • Markov has an observed state sequence • S1, S2, S3, S4, S5, S6, … • HMM has an unseen state sequence • S1, S2, S3, S4, S5, S6, … • And an observed event sequence • O1, O2, O3, O4, O5, O6, … • HMM unseen state sequence can only be implied from the observed event sequence
HMM Notation • N states, S1 to SN, with qt at time t • M observations, O1 to OM,with vtat time t • A = {aij} is the transition probabilities • aij = P(qt + 1 = Sj | qt = Si) • B = {bj(k)} is the observation probabilities • bj(k) = P(vt = Ok | qt = Sj) • π = {πi} is the initial state probabilities • πi = P(q1 = Si)
Urn & Ball – An Example • N large urns with Mcolored balls in each • Urns are the states and balls are the observable events • Probability matrix for changing between urns • Each urn has observation probabilities to determine which ball is chosen You can think that there is one person A who just sees the balls and another person B who knows the rules and draws the balls. What person B can learn from the sequence of balls seen?
Urn & Ball – An Example P(red) = b1(1) P(red) = b2(1) P(red) = b3(1) P(blue) = b1(2) P(blue) = b2(2) P(blue) = b3(2) P(green) = b1(3) P(green) = b2(3) P(green) = b3(3) P(purple) = b1(4) P(purple) = b2(4) P(purple) = b3(4) … … … Urn 1Urn 2Urn 3 • N large urns with Mcolored balls in each • Urns are the states and balls are the observable events • Probability matrix for changing between urns • Each urn has observation probabilities to determine which ball is chosen
Urn & Ball – An Example • Initial probability to determine first urn • At each time interval: • Transition probability determines the urn • Observation probability determines the ball • Ball color added to observed event sequence and returned to urn • Transition probability dependent on previous urn
Example Sequence Creation Using Urn Ball • From , 1st urn = Urn 1 • Using b1(k), 1st ball = Red • From a1j, 2nd urn = Urn 3 etc… • Get observation sequence • Red, Blue, Purple, Yellow, Blue, Blue • From state sequence • Urn1, Urn 3, Urn 3, Urn 1, Urn, 2, Urn 1 Urn 1Urn 2Urn 3 P(red) = b1(1) P(red) = b2(1) P(red) = b3(1) P(blue) = b1(2) P(blue) = b2(2) P(blue) = b3(2) P(green) = b1(3) P(green) = b2(3) P(green) = b3(3) P(purple) = b1(4) P(purple) = b2(4) P(purple) = b3(4) … … … Typical Applications of this Model: • Generate sequences • Guess state sequence (state) by observing balls sequence
EXAMPLE: HMM for Detecting ‘Clumpiness’ • Model distribution of similarity matches seen in BLAST TTEKYKGGSSTLVVGKQLLLENYPLGKSLKNPYLRALSTKLNGGLRSIT T Y+ GS+TLV+ + Y G S+++ AL++KL + + TVRLYRDGSNTLVLSGEFHDSTYSHGSSVQSVIRTALTSKLPNAVNGLY • Converted to binary: • 1000110111111100100000100101111000011111100010010
Clumpy 1000110111111100100000100101111000011111100010010 1101111111 111111 • Clumpy State = more likely to get a 1 but still gets a few 0s
Unclumpy 1000110111111100100000100101111000011111100010010 0010000010010 00010010 • Unclumpy state = more likely to get dispersed 1s through mainly 0s
BLAST Similarity HMM aUU aCC P(1) = bC(1) P(1) = bU(1) P(0) = bC(0) P(0) = bU(0) aCU S1 Clumpy S2 Unclumpy aUC
Adjusting Parameters • Differing levels of the probabilities will produce differing levels of clumpiness • High probability of staying in a clump and producing a 1 = observation sequence of lots of 1s: 1111111111111011111011110111111111011
Application • Want to rank BLAST results by clumpiness • Need measures of clumpiness • Test measures using HMM model • Creates a synthetic data-set progressing from clumpy to unclumpy sequences • Could also use HMM as a measure • This is a Circular problem: model from measure and measure from model
Circular Problem Model Measure
Applications in speech recognition • Hidden Markov Model of speech • State transitions and alignment probabilities • Searching all possible alignments • Dynamic Programming • Viterbi Alignment • Isolated Word Recognition
Isolated Word Recognizer • Preprocessor • Identifies start/end of each word • Converts speech into a sequence of feature vectors at intervals of around 10 ms. • Hypothesis Generator • Try each possible word in turn • Language Model • Estimate the probability of the word • Can depend on the preceding words • Acoustic Model • Calculate the probability density that an observed sequence of feature vectors corresponds to the chosen word
Speech Production Model • Each phoneme in a word corresponds to number of model states . • Each model state represents a distinct sound with its own acoustic spectrum. • For each state, we store the mean value and variancefor each of F features. • When saying a word, the speaker stays in each state for one or more frames and then goes on to the next state. • The time in each state will vary according to how fast he/she is speaking. • Some speech sounds last longer than others
State Transitions Average duration of 1/p frames
Hidden Markov Model • To calculate the probability density (pd) that observation matches a particular word with a given alignment, we multiply together: • the probability of the alignment • the output pds of each frame • Try this for every possiblealignment of every possibleword sequence and choose the one with highest probability. • Hidden Markov Model => the correct alignment is hidden: we can’t observe it directly. • We talk of the probability density ofthe model “generating” the observed frames sequence.
Hidden Markov Model Parameters • A Hidden Markov Model for a word must specify the following parameters for state s: • The mean and variance for each of the F elements of the parameter vector: µs and s 2. • These allow us to calculate ds(x): the output probability density of input frame x in state s. • The transition probabilitiesas,jto every possible successor state. • as,jis often zero for all j except j=s and j=s+1 it is then called a left-to-right, no skips model. • For a Hidden Markov Model with S states we therefore have around (2F+1)S parameters. • A typical word might have S=15 and F=39 giving 1200 parameters in all.
How to calculate : Minimum Cost Paths • Suppose we want to find the cheapest path through a toll road system: Start ->¨ Finish • In each circle we will enter the lowest cost of a journey from Start • Begin by putting 0 in the Start circle • Now put 2, 3, 4 in the 2nd column circles and mark all three segments in bold. • This shows the lowest cost to each of these circles and the route by which you go. • Put 6, 3, 6 in the 3rd column circles and, in each case, mark the best segment from the previous column in bold. • Put 5, 4 and 6 in the 4th column. • Put 7 in the Finish circle. • We can trace bold segments backwards from Finish to find the best overall path. Dynamic programming
Dynamic Programming • This technique for finding the minimum cost path through a graph is known as dynamic programming. • Three conditions must be true: • 1. All paths through the graph must go from left to right. • 2. The cost of each segment of a path must be fixed in advance: it must not depend on which of the other segments are included in the route. • 3. The total cost of a path must just be the sum of each of its segment costs. • Dynamic programming is guaranteed to find the path with minimum cost. • We can also find the maximum cost path in the same way: • in this case the “costs” are usually called “utilities” instead. • We can use Dynamic Programming to find the “best” alignment of a sequence of feature vectors with a word’s model. • “best” means the alignment with the highest production probability density.
Alignment Graph • We can draw an alignment graph (called also alignment lattice): • The columns correspond to speech frames • The rows correspond to model states • Each possible path from Start to Finish corresponds to an alignment of the speech frames with the model. • All valid paths pass through each column in turn. • In going to the next column, a path is restricted to the state transitions allowed by the state diagram • In the above example a path must either remain in the same state or else go on to the next state.
Segment Utilities This equals pd that the model generates the observed sequence of feature vectors with this particular alignment
Isolated Word Recognition • Requires the speaker to insert a gapbetween each words • Used for budget systems with little CPU power • Recognition: • Extract a word-long segment of speech, s, from the input signal. Convert it into a sequence of frames. • Calculate pr(w|s) = pr(w) × pr(s|w) for each possible word, w, in the recognition vocabulary. • pr(w) is the prior probability of the word: get this from word frequencies or word-pair frequencies (e.g. “minister” often follows “prime”). • pr(s|w) is obtained by using the Viterbi alignment algorithm to find the log probability density of the best alignment of s with the model for w. • Choose the word with the highest probability • Need to create a separate Hidden Markov model for each word in the vocabulary.
Applications of HMM • pattern recognition • genes • Human emotions, persons, gestures, • Spoken natural language recognition, many levels • Hand-written natural language recognition • Text generation based on writer’s style • Recognition of authorship
Literature • University Leeds (applets): • http://www.scs.leeds.ac.uk/scs-only/teaching-materials/HiddenMarkovModels/html_dev/main.html • Tapas Kanungo (writing recognition)http://www.cfar.umd.edu/~kanungo/
Literature • Rabiner, L. R. 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286. • Krogh et al. 1994. J. Mol. Biol. Vol 235, pp.1501-1531. • Krogh, A. 1998. Salzberg et al In Computational Methods in Molecular Biology. Chap 4. • Warakagoda, N. D et al MSc thesis online http://jedlik.phy.bme.hu/~gerjanos/HMM/hoved.html