Class Hidden Markov Models
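Markov Chains
• First order Markov process
• set of states, $S_i$
• transition probabilities, $a_{ij} = P(s_t = S_j \mid s_{t-1} = S_i)$
• the probability of each state depends only on the previous state: $P(s_t \mid s_{t-1}, \ldots, s_0) = P(s_t \mid s_{t-1})$
• trivial constraints (transitions must be true probabilities): $a_{ij} \ge 0$ and $\sum_j a_{ij} = 1$
[Figure: Markov chain over the nucleotide states A, C, G, T; transitions out of A labeled $a_{AA}$, $a_{AC}$, $a_{AG}$, $a_{AT}$]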
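Markov Chains
• Probabilities
• for a sequence of observed states $O = (s_0, s_1, \ldots, s_T)$, in general
  $P(O) = P(s_T \mid s_{T-1}, \ldots, s_0)\, P(s_{T-1} \mid s_{T-2}, \ldots, s_0) \cdots P(s_1 \mid s_0)\, P(s_0)$
• for the Markov chain
  $P(O) = P(s_0) \prod_{t=1}^{T} a_{s_{t-1} s_t}$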
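A minimal sketch of the Markov chain formula above, assuming a made-up nucleotide chain (the `p0` and `a` values are illustrative, not estimated from data). It works in log space so that long sequences do not underflow; `chain_log_prob` is a hypothetical helper name.

```python
import math

# Hypothetical chain: initial probabilities P(s0) and transition probabilities
# a[i][j] = P(s_t = j | s_{t-1} = i). Illustrative numbers only; rows sum to 1.
p0 = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
a = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.3, "T": 0.1},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.2, "C": 0.3, "G": 0.4, "T": 0.1},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def chain_log_prob(seq, p0, a):
    """log P(O) = log P(s0) + sum over t of log a[s_{t-1}][s_t]."""
    logp = math.log(p0[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(a[prev][cur])
    return logp

# P("ACG") = P(A) * a_AC * a_CG = 0.25 * 0.2 * 0.4, approximately 0.02
print(math.exp(chain_log_prob("ACG", p0, a)))
```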
Markov Chains
• What is it good for?
• Model comparison – for a given model (set of transition probabilities), what is the probability of seeing the observed data?
• The best model (maximum likelihood model) can be found by simply counting the observed transitions in a sufficiently large set of data: $a_{ij} = n_{ij} / \sum_{j'} n_{ij'}$, where $n_{ij}$ is the number of observed $i \to j$ transitions
• Generating random sequences according to specific background models
• Birth/death processes – probability of extinction
• Branching processes (trees)
• Markov chain Monte Carlo (MCMC)
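The counting estimate and the use of a chain as a background sequence generator can be sketched as follows. The helper names `estimate_transitions` and `generate`, the toy training sequences, and the choice to leave rows with no observations uniform are all assumptions for illustration.

```python
import random
from collections import defaultdict

def estimate_transitions(seqs, alphabet="ACGT"):
    """Maximum likelihood a_ij: count observed i->j transitions, normalize each row."""
    counts = {i: defaultdict(int) for i in alphabet}
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    a = {}
    for i in alphabet:
        total = sum(counts[i].values())
        # Assumption: a row with no observed transitions is left uniform.
        a[i] = {j: (counts[i][j] / total if total else 1 / len(alphabet)) for j in alphabet}
    return a

def generate(a, start, length, rng=random):
    """Sample a random sequence from the background chain with transitions a."""
    seq = [start]
    for _ in range(length - 1):
        row = a[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(seq)

a_hat = estimate_transitions(["ACGTACGT", "AACCGGTT"])
print(a_hat["A"]["C"])  # 3 of the 4 transitions out of A go to C
sample = generate(a_hat, "A", 20, random.Random(0))
print(sample)
```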
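Hidden Markov Model
• More complicated model
• let each state emit a character, $c_k$, according to a set of emission probabilities, $b_{ik} = P(c_k \mid S_i)$
• for a set of observed characters $O = (o_0, \ldots, o_T)$ and states $S = (s_0, \ldots, s_T)$, writing $b_{s_t}(o_t)$ for the probability that state $s_t$ emits $o_t$:
  $P(O, S) = P(s_0)\, b_{s_0}(o_0) \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(o_t)$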
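To make the joint probability concrete, the sketch below evaluates $\log P(O, S)$ for a given state path. It assumes a two-state model whose emissions are the S0/S1 values from the exon/intron example in these notes, with hypothetical initial (`omega`) and transition (`A`) probabilities; `joint_log_prob` is a hypothetical name.

```python
import math

# Two-state model. Emissions are the S0/S1 values from the exon/intron example;
# the initial (omega) and transition (A) probabilities are hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def joint_log_prob(obs, path, omega, A, B):
    """log P(O, S | theta) for observed characters obs and state indices path."""
    logp = math.log(omega[path[0]]) + math.log(B[path[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(A[path[t - 1]][path[t]]) + math.log(B[path[t]][obs[t]])
    return logp

# The labeled prefix of the example: C G C T T A with states S0 S0 S0 S1 S1 S1
print(joint_log_prob("CGCTTA", [0, 0, 0, 1, 1, 1], omega, A, B))
```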
Hidden Markov Model
• Two state model for sequences
• for example, S0 might be exon and S1 intron
• what if you have a set of observed characters, but you want to know the state (or the most likely state) at each position?
• the state information is the hidden part of the hidden Markov model
• emission probabilities:
  S0: P(A) = 0.148, P(C) = 0.334, P(G) = 0.365, P(T) = 0.154
  S1: P(A) = 0.262, P(C) = 0.247, P(G) = 0.238, P(T) = 0.253
• example observed sequence: CGCTTAGCTATCGCATTCGGCTACGATCTAGCTACGTAGCTATGCCGATGCATTATTACGCGATCTCGATCG
• hidden states for the first six characters, C G C T T A: S0 S0 S0 S1 S1 S1
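One naive way to guess the hidden state, shown here only for contrast with the decoding methods that follow, is to score each character by the log-odds of the two emission distributions and ignore transitions entirely. The sketch uses the S0/S1 emission values of this example and the first 20 characters of the example sequence; `log_odds` is a hypothetical name.

```python
import math

# Emission probabilities of the two-state example (S0, S1).
B0 = {"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154}
B1 = {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}

def log_odds(seq):
    """Per-character log2(b_0(c) / b_1(c)); > 0 favours S0, < 0 favours S1."""
    return [math.log2(B0[c] / B1[c]) for c in seq]

seq = "CGCTTAGCTATCGCATTCGG"
scores = log_odds(seq)
# Naive guess per position, ignoring the transition probabilities.
naive = "".join("0" if s > 0 else "1" for s in scores)
print(naive)
```

With these emission values every C and G scores in favour of S0 and every A and T in favour of S1, so the first six positions come out as S0 S0 S0 S1 S1 S1, matching the example labels; proper decoding also weighs the transition probabilities.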
Hidden Markov Model
• Rabiner's three basic problems
• what is the probability of an observed sequence, O, given a model? (evaluation)
• joint probability of observations and state sequence – $P(O, S \mid \theta)$
• useful for the same purposes as the Markov chain probability (e.g., model comparison)
• what is the optimal sequence of states that "explains" the observed data? (decoding)
• optimality criterion?
• how can one adjust the model parameters to maximize the probability of the observed data given the model? (learning)
Hidden Markov Model
• Evaluation
• for a model with parameters $\theta$, what is $P(O \mid \theta)$, or more simply $P(O)$?
• model parameters $\theta = (Q, A, B, \omega)$:
  Q, a set of states
  A, the state transition matrix
  B, the emission probabilities
  ω, an initial probability distribution
• observations $O = (o_0, \ldots, o_T)$
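Hidden Markov Model
• Evaluation
• Assume the observations are independent given the states: $P(O \mid \pi, \theta) = \prod_{t=0}^{T} b_{\pi_t}(o_t)$
• The probability of a particular state sequence (or path), π, is $P(\pi \mid \theta) = \omega_{\pi_0} \prod_{t=1}^{T} a_{\pi_{t-1} \pi_t}$, so $P(O \mid \theta) = \sum_{\pi} P(O \mid \pi, \theta)\, P(\pi \mid \theta)$
• There are $N^T$ state sequences and O(T) calculations per sequence, so the brute force complexity is $O(T N^T)$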
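A direct, deliberately inefficient sketch of this sum enumerates every state path with `itertools.product`. It reuses the two-state model (example emissions, hypothetical `omega` and `A`) and is only practical for very short sequences, since the number of paths grows as $N$ raised to the sequence length.

```python
import itertools

# Two-state model: emissions from the example, omega and A hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def brute_force_likelihood(obs, omega, A, B):
    """P(O | theta) as an explicit sum of P(O, path | theta) over every state path."""
    N = len(omega)
    total = 0.0
    # N ** len(obs) paths, each costing O(len(obs)) multiplications.
    for path in itertools.product(range(N), repeat=len(obs)):
        p = omega[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

print(brute_force_likelihood("CGCTTA", omega, A, B))  # sums 2**6 = 64 paths
```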
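Hidden Markov Model
• Forward algorithm
• $\alpha_t(i) = P(o_0, \ldots, o_t, s_t = S_i \mid \theta)$ is the probability of observing the partial sequence $(o_0, \ldots, o_t)$ and being in state $S_i$ at time t, with $\alpha_0(i) = \omega_i\, b_i(o_0)$
• recursion: $\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1})$
• termination: $P(O \mid \theta) = \sum_{i=1}^{N} \alpha_T(i)$
• complexity $O(N^2 T)$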
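A minimal pure-Python sketch of the forward recursion, using the same two-state model (example emissions, hypothetical `omega` and `A`). It works with raw probabilities, which underflow on long sequences; scaling or log-space arithmetic (see Mann 2006 in the references) would be needed in practice.

```python
# Two-state model: emissions from the example, omega and A hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def forward(obs, omega, A, B):
    """Return (alpha, P(O | theta)), where alpha[t][i] = P(o_0..o_t, s_t = S_i)."""
    N = len(omega)
    alpha = [[omega[i] * B[i][obs[0]] for i in range(N)]]  # initialization
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t): O(N^2) work per step
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha, sum(alpha[-1])  # termination

alpha, prob = forward("CGCTTA", omega, A, B)
print(prob)
```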
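Hidden Markov Model
• Backward algorithm
• almost the same as the forward algorithm
• $\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid s_t = S_i, \theta)$ is the probability of observing the partial sequence $(o_{t+1}, \ldots, o_T)$ given state $S_i$ at time t, with initial condition $\beta_T(i) = 1$
• recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$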
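The backward recursion can be sketched the same way, again assuming the two-state model with hypothetical `omega` and `A`. The last step recovers $P(O \mid \theta) = \sum_i \omega_i\, b_i(o_0)\, \beta_0(i)$, which should agree with the forward result.

```python
# Two-state model: emissions from the example, omega and A hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def backward(obs, omega, A, B):
    """Return (beta, P(O | theta)), where beta[t][i] = P(o_{t+1}..o_T | s_t = S_i)."""
    N, T = len(omega), len(obs)
    beta = [[1.0] * N]  # initial condition beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[0]  # beta_{t+1}
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    prob = sum(omega[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return beta, prob

beta, prob = backward("CGCTTA", omega, A, B)
print(prob)
```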
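Hidden Markov Model
• Problem 2: Decoding
• what is the optimal sequence of states that "explains" the observed data?
• optimality criteria
• the path π that maximizes the expected number of correct individual states, i.e., the path where the states are individually most likely: $\pi_t = \arg\max_i P(s_t = S_i \mid O, \theta)$
• the most probable single path: maximize $P(\pi \mid O, \theta)$, or equivalently $P(\pi, O \mid \theta)$
• Viterbi algorithm
• the optimal path is the path that maximizes $P(\pi, O \mid \theta)$; let $\delta_t(i) = \max_{\pi_0, \ldots, \pi_{t-1}} P(\pi_0, \ldots, \pi_{t-1}, \pi_t = S_i, o_0, \ldots, o_t \mid \theta)$ be the probability of the highest probability path ending in state i at time t, with $\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(o_{t+1})$
• keep track of the argument that maximizes at each position t, $\psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$, and trace the path back from $\arg\max_i \delta_T(i)$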
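A log-space sketch of Viterbi with back pointers ($\psi$) follows; sums of logs replace products to avoid underflow. The model is the same two-state example with hypothetical `omega` and `A`, and `viterbi` is a hypothetical helper name.

```python
import math

# Two-state model: emissions from the example, omega and A hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def viterbi(obs, omega, A, B):
    """Return (most probable state path, log P(path, O | theta))."""
    N = len(omega)
    delta = [math.log(omega[i]) + math.log(B[i][obs[0]]) for i in range(N)]
    back = []  # back[t-1][j] = best predecessor of state j at time t (psi)
    for t in range(1, len(obs)):
        ptr, new = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + math.log(A[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + math.log(A[best_i][j]) + math.log(B[j][obs[t]]))
        back.append(ptr)
        delta = new
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):  # trace back from the final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

path, logp = viterbi("CGCTTAGCTATCGCATTCGG", omega, A, B)
print(path, logp)
```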
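Hidden Markov Model
• Problem 3: Learning
• find the parameters, θ, that maximize $P(O \mid \theta)$
• No analytical solution; requires an iterative solution (Baum-Welch algorithm)
  1. choose an initial model θ; repeat
  2. compute new parameters θ' based on θ and the observations O
  3. if the improvement $\log P(O \mid \theta') - \log P(O \mid \theta)$ is below a chosen threshold, stop
  4. else, accept θ' (set θ = θ'), goto 2.
• With the Baum-Welch algorithm, the likelihood is proven to be greater than or equal at each step: $P(O \mid \theta') \ge P(O \mid \theta)$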
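Hidden Markov Model
• Training
• need to update the transition probabilities. The probability of being in state i at time t and state j at time t+1, given the observed sequence O, is
  $\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \theta)}$
• the probability of being in state i at time t, given the observed sequence O, is
  $\gamma_t(i) = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \theta)}$, or in terms of ξ, $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$ (for t < T)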
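The sketch below computes $\xi_t(i,j)$ and $\gamma_t(i)$ from the forward and backward tables and then uses $\gamma$ for the first decoding criterion (individually most likely states). It assumes the same two-state model with hypothetical `omega` and `A` and unscaled probabilities, so it is only suitable for short sequences; `posteriors` is a hypothetical name.

```python
# Two-state model: emissions from the example, omega and A hypothetical.
omega = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"A": 0.148, "C": 0.334, "G": 0.365, "T": 0.154},
     {"A": 0.262, "C": 0.247, "G": 0.238, "T": 0.253}]

def forward(obs, omega, A, B):
    N = len(omega)
    alpha = [[omega[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs, A, B):
    N = len(A)
    beta = [[1.0] * N]  # beta_T(i) = 1
    for o in reversed(obs[1:]):  # o = o_{t+1} while computing beta_t
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def posteriors(obs, omega, A, B):
    """Return gamma[t][i] and xi[t][i][j] (t < T), both conditioned on O."""
    alpha, beta = forward(obs, omega, A, B), backward(obs, A, B)
    N, T = len(omega), len(obs)
    prob = sum(alpha[-1])  # P(O | theta)
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    return gamma, xi

obs = "CGCTTAGCTATCGCATTCGG"
gamma, xi = posteriors(obs, omega, A, B)
# Posterior decoding: the individually most likely state at each position.
decoded = [max(range(len(g)), key=lambda i: g[i]) for g in gamma]
print("".join(map(str, decoded)))
```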
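Hidden Markov Model
• Training
• Derived quantities
  expected number of times state i is used: $\sum_{t=0}^{T} \gamma_t(i)$
  expected number of transitions from state i to state j: $\sum_{t=0}^{T-1} \xi_t(i, j)$
• BW parameter updates
  probability of starting in state i: $\bar{\omega}_i = \gamma_0(i)$
  $\bar{a}_{ij} = \dfrac{\sum_{t=0}^{T-1} \xi_t(i, j)}{\sum_{t=0}^{T-1} \gamma_t(i)}$
  $\bar{b}_i(k) = \dfrac{\sum_{t:\, o_t = c_k} \gamma_t(i)}{\sum_{t=0}^{T} \gamma_t(i)}$
• where the sum in the numerator of $\bar{b}_i(k)$ runs only over the positions t at which $o_t = c_k$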
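A compact sketch of the Baum-Welch loop, combining the $\xi$/$\gamma$ computation with the updates above and stopping once the log-likelihood improves by less than a tolerance. The initial guess (`omega0`, `A0`, `B0`), the tolerance, and the iteration cap are assumptions; the training sequence is the example sequence from these notes. No scaling is used, which is acceptable for this 72-character sequence but not for long ones.

```python
import math

# Hypothetical initial guess for a two-state model (not taken from the notes).
omega0 = [0.5, 0.5]
A0 = [[0.8, 0.2],
      [0.3, 0.7]]
B0 = [{"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
      {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}]

def forward(obs, omega, A, B):
    N = len(omega)
    alpha = [[omega[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs, A, B):
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs, omega, A, B):
    """One re-estimation; also returns log P(O | current theta)."""
    N, T = len(omega), len(obs)
    alpha, beta = forward(obs, omega, A, B), backward(obs, A, B)
    prob = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_omega = gamma[0][:]
    new_A = [[sum(x[i][j] for x in xi) / sum(g[i] for g in gamma[:-1])
              for j in range(N)] for i in range(N)]
    new_B = [{k: sum(g[i] for g, o in zip(gamma, obs) if o == k) / sum(g[i] for g in gamma)
              for k in B[i]} for i in range(N)]
    return new_omega, new_A, new_B, math.log(prob)

def baum_welch(obs, omega, A, B, tol=1e-6, max_iter=200):
    """Repeat re-estimation until the log-likelihood improves by less than tol."""
    history = []
    for _ in range(max_iter):
        omega, A, B, loglik = baum_welch_step(obs, omega, A, B)
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return omega, A, B, history

obs = "CGCTTAGCTATCGCATTCGGCTACGATCTAGCTACGTAGCTATGCCGATGCATTATTACGCGATCTCGATCG"
omega_hat, A_hat, B_hat, history = baum_welch(obs, omega0, A0, B0)
print(len(history), history[0], history[-1])
```

`history` records $\log P(O \mid \theta)$ before each update, so its entries should never decrease (up to rounding), which is the property stated on the learning slide.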
Class Hidden Markov Model
• Why?
• The basic HMM has no way to include the known states of classified training data in the optimization; it is generally trained in an unsupervised fashion.
• HMMs are not discriminative models
• How do you discriminate?
• Separate the training data into classes and train one model per class
• for any set of observations, compare the probabilities of the observed data given each model and choose the best model (using a likelihood ratio, for example)
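The per-class approach can be sketched as a log-space forward pass per model, followed by picking the model with the highest likelihood; for two classes this is the same as checking the sign of the log-likelihood ratio. Both models (`gc_model`, `at_model`) are hypothetical, with made-up emission values, and the helper names are illustrative.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_likelihood(obs, model):
    """log P(O | theta) via the forward algorithm carried out in log space."""
    omega, A, B = model
    N = len(omega)
    la = [math.log(omega[i] * B[i][obs[0]]) for i in range(N)]
    for o in obs[1:]:
        la = [logsumexp([la[i] + math.log(A[i][j]) for i in range(N)]) + math.log(B[j][o])
              for j in range(N)]
    return logsumexp(la)

# Hypothetical class models: same transitions, different emission preferences.
A = [[0.9, 0.1], [0.1, 0.9]]
gc_model = ([0.5, 0.5], A, [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
                            {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}])
at_model = ([0.5, 0.5], A, [{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
                            {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}])
models = {"GC-rich": gc_model, "AT-rich": at_model}

def classify(obs, models):
    """Return the class whose model gives the observations the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))

obs = "GCGCGGCC"
print(log_likelihood(obs, gc_model) - log_likelihood(obs, at_model))  # log-likelihood ratio
print(classify(obs, models))  # GC-rich
```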
Class Hidden Markov Model
• Hidden Markov Models for Labeled Sequences, Anders Krogh, ISMB 1994
• Assume you have a sequence of observed symbols, $(s_0, s_1, \ldots, s_T)$, and a sequence of (observed) classes, $(c_0, c_1, \ldots, c_T)$
• At each time t, a state emits an observed symbol and an observed class (label). The classes are treated similarly to the emission probabilities: $b_i(a, x)$ is the probability that state i emits character a and class label x
• Most often a state will emit a single class; i.e., for two class labels x and y, a state i that emits only x has $P(x \mid S_i) = 1$ and $P(y \mid S_i) = 0$, while a state j that emits only y has $P(x \mid S_j) = 0$ and $P(y \mid S_j) = 1$
• What is the probability of the class labels given this model?
  $P(c \mid s, \theta) = \dfrac{P(s, c \mid \theta)}{P(s \mid \theta)}$
• $P(s \mid \theta)$ is the basic HMM calculation described earlier, solved using the forward/backward algorithm
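For the special case where each state emits exactly one class label, $P(s, c \mid \theta)$ can be computed with the ordinary forward pass by zeroing $\alpha_t(i)$ for every state whose label differs from $c_t$. A minimal sketch, assuming a hypothetical three-state model in which states 0 and 1 emit label x and state 2 emits label y; `forward_prob` is an illustrative name.

```python
# Hypothetical three-state class HMM: states 0 and 1 emit class "x",
# state 2 emits class "y" (each state emits its single label with probability 1).
omega = [0.4, 0.3, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.2, 0.6, 0.2],
     [0.25, 0.25, 0.5]]
B = [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
     {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}]
state_label = "xxy"

def forward_prob(obs, omega, A, B, state_label, labels=None):
    """P(s | theta) if labels is None; otherwise P(s, c | theta), computed by
    zeroing alpha for states whose class label differs from labels[t]."""
    N = len(omega)

    def allowed(i, t):
        return labels is None or state_label[i] == labels[t]

    alpha = [omega[i] * B[i][obs[0]] if allowed(i, 0) else 0.0 for i in range(N)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 if allowed(j, t) else 0.0
                 for j in range(N)]
    return sum(alpha)

s, c = "CGTA", "xxyy"
p_sc = forward_prob(s, omega, A, B, state_label, c)  # P(s, c | theta)
p_s = forward_prob(s, omega, A, B, state_label)      # P(s | theta)
print(p_sc / p_s)                                    # P(c | s, theta)
```

Because every state emits exactly one label, summing $P(s, c \mid \theta)$ over all possible labelings recovers $P(s \mid \theta)$.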
Class Hidden Markov Model
• In the general case, multiple paths through the model can give the same labeling. In the special case where each state emits only a single class label, you can use Viterbi to obtain a class labeling of sequence s from its most probable path
• Maximum likelihood parameter estimates: $\hat{\theta} = \arg\max_{\theta} P(c \mid s, \theta)$
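Class Hidden Markov Model
• Calculating the gradient of $\log P(s, c \mid \theta)$ with respect to a model parameter $\theta_k$ (a transition or emission probability):
  $\dfrac{\partial \log P(s, c \mid \theta)}{\partial \theta_k} = \dfrac{m_k}{\theta_k}$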
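Class Hidden Markov Model
• Calculating the gradient of $\log P(s \mid \theta)$
• basically the same as the gradient of $\log P(s, c \mid \theta)$, but taken over all paths rather than only the permissible ones:
  $\dfrac{\partial \log P(s \mid \theta)}{\partial \theta_k} = \dfrac{n_k}{\theta_k}$
• these two gradients combine into the gradient of the overall likelihood, which is then optimized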
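Class Hidden Markov Model
• Overall gradient of the likelihood $\ell = \log P(c \mid s, \theta) = \log P(s, c \mid \theta) - \log P(s \mid \theta)$:
  $\dfrac{\partial \ell}{\partial \theta_k} = \dfrac{m_k - n_k}{\theta_k}$
• Intuitively
• Permissible paths: the state paths given in φ, i.e., the paths consistent with the class labels
• $m_k$ is the expected number of times $\theta_k$ is used in permissible paths
• Slightly modified forward/backward: $\alpha_t$ and $\beta_t$ are zero except along the permissible paths
• $n_k$ is the expected number of times $\theta_k$ is used in all possible paths
• Use the standard forward/backward algorithm
• Setting the gradient to zero under the normalization constraint gives $\theta_k \propto m_k - n_k$, which can be negative, so a straightforward application of EM (Baum-Welch) is likely to give negative probabilities (eq. 16)
• Krogh proposes an iterative method that sums over multiple training sequences, indexed by μ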
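A sketch of the gradient $(m_k - n_k)/\theta_k$ restricted to the transition probabilities: $m_{ij}$ comes from the clamped forward/backward pass and $n_{ij}$ from the unclamped one. The three-state model and the labeling are hypothetical; the initial probabilities, emissions, and state labels are held fixed as module-level constants, and the last lines compare one entry against a central finite difference (treating $a_{ij}$ as a free parameter, without renormalizing). This is not Krogh's full update procedure, only the gradient it is built on.

```python
import math

# Hypothetical three-state class HMM (states 0, 1 emit "x"; state 2 emits "y").
# Only the transition matrix A is differentiated; OMEGA, B and LABEL stay fixed.
OMEGA = [0.4, 0.3, 0.3]
B = [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
     {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}]
LABEL = "xxy"
A = [[0.6, 0.2, 0.2],
     [0.2, 0.6, 0.2],
     [0.25, 0.25, 0.5]]

def forward_backward(obs, A, labels=None):
    """Alpha, beta and total probability. With labels, alpha and beta are zero
    off the permissible paths, so the total is P(s, c); otherwise it is P(s)."""
    N, T = len(OMEGA), len(obs)
    ok = [[labels is None or LABEL[i] == labels[t] for i in range(N)] for t in range(T)]
    alpha = [[OMEGA[i] * B[i][obs[0]] * ok[0][i] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]] * ok[t][j]
                      for j in range(N)])
    beta = [[float(ok[T - 1][i]) for i in range(N)]]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N)) * ok[t][i]
                        for i in range(N)])
    return alpha, beta, sum(alpha[-1])

def expected_transitions(obs, A, labels=None):
    """Expected i->j transition counts: m_ij with labels, n_ij without."""
    alpha, beta, prob = forward_backward(obs, A, labels)
    N = len(A)
    return [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                 for t in range(len(obs) - 1)) / prob
             for j in range(N)] for i in range(N)]

def transition_gradient(obs, labels, A):
    """d log P(c | s) / d a_ij = (m_ij - n_ij) / a_ij."""
    m = expected_transitions(obs, A, labels)
    n = expected_transitions(obs, A)
    return [[(m[i][j] - n[i][j]) / A[i][j] for j in range(len(A))] for i in range(len(A))]

def log_cond_likelihood(obs, labels, A):
    """log P(c | s) = log P(s, c) - log P(s)."""
    return math.log(forward_backward(obs, A, labels)[2]) - math.log(forward_backward(obs, A)[2])

s, c = "CGTAGC", "xxyyxx"
grad = transition_gradient(s, c, A)

# Central finite difference on a_01, treated as a free parameter (no renormalization).
h = 1e-6
A_plus = [row[:] for row in A]
A_plus[0][1] += h
A_minus = [row[:] for row in A]
A_minus[0][1] -= h
fd = (log_cond_likelihood(s, c, A_plus) - log_cond_likelihood(s, c, A_minus)) / (2 * h)
print(grad[0][1], fd)
```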
References
• To learn about HMMs
• Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257-286, 1989.
  The standard starting point; everything comes back to here, including the basic nomenclature and the definition of the problems.
• Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
  Detailed applications to sequence alignment, phylogenetic trees, RNA folding, and other biological models.
• Mann T. Numerically stable hidden Markov model implementation. 2006. http://bozeman.genome.washington.edu/compbio/mbt599_2006/hmm_scaling_revised.pdf
  Detailed pseudocode for implementing HMM calculations in log space to avoid numerical underflow; very useful if you are writing your own code.
• De Fonzo V, Aluffi-Pentini F, Parisi V. Hidden Markov Models in Bioinformatics. Current Bioinformatics 2, 49-61, 2007.
  A more recent review that gives a clear outline of the algorithms and many references to recent applications in computational biology.
• Kanungo T. Hidden Markov Models. http://www.kanungo.com/software/hmmtut.pdf
• Kanungo T. UMDHMM: hidden Markov model toolkit. In "Extended finite state models of language", Kornai A (ed). Cambridge University Press, 1999. Software download: http://www.kanungo.com/software/software.html#umdhmm