Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability Chapter 3.3-3.7 S. Maarschalkerweerd & A. Tjhang
Overview last lecture • Hidden Markov Models • Different algorithms: • Viterbi • Forward • Backward
Overview today • Parameter estimation for HMMs • Baum-Welch algorithm • HMM model structure • More complex Markov chains • Numerical stability of HMM algorithms
Specifying a HMM model • Most difficult problem using HMMs is specifying the model • Design of the structure • Assignment of parameter values
Parameter estimation for HMMs • Estimate the transition and emission probabilities akl and ek(b) • Two ways of learning: • Estimation when the state sequence is known • Estimation when the paths are unknown • Assume that we have a set of example sequences (training sequences x1, …, xn)
Parameter estimation for HMMs • Assume that the training sequences x1, …, xn are independent, so
P(x1, …, xn | θ) = ∏j=1..n P(xj | θ)
• Since log(ab) = log a + log b, the log likelihood decomposes into a sum:
log P(x1, …, xn | θ) = Σj=1..n log P(xj | θ)
Estimation when state sequence is known • Easier than estimation when the paths are unknown • Count occurrences in the training data (plus pseudocounts r): • Akl = number of transitions k to l in the training data + rkl • Ek(b) = number of emissions of b from k in the training data + rk(b) • The ML estimates are then akl = Akl / Σl' Akl' and ek(b) = Ek(b) / Σb' Ek(b') (see the sketch below)
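As a minimal sketch (not from the original slides, all names illustrative), this counting estimator can be written in Python, assuming each training sequence comes with its known state path and a single constant pseudocount r:

```python
def estimate_known_paths(sequences, paths, states, alphabet, r=1.0):
    """ML estimation of transition/emission probabilities when the
    state path of every training sequence is known (plus pseudocount r)."""
    A = {k: {l: r for l in states} for k in states}      # A_kl counts
    E = {k: {b: r for b in alphabet} for k in states}    # E_k(b) counts

    for x, pi in zip(sequences, paths):
        for i, (state, symbol) in enumerate(zip(pi, x)):
            E[state][symbol] += 1
            if i + 1 < len(pi):
                A[state][pi[i + 1]] += 1

    # Normalise counts into probabilities: a_kl = A_kl / sum_l' A_kl'
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```

A call such as `a, e = estimate_known_paths(seqs, paths, states="+-", alphabet="ACGT")` would return the estimated transition and emission tables for a two-state labelling.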
Estimation when paths are unknown • More complex than when the paths are known • We can’t use maximum likelihood estimators • Instead, an iterative algorithm is used: Baum-Welch
The Baum-Welch algorithm • We don’t know the real values of Akl and Ek(b) • Estimate Akl and Ek(b) • Update akl and ek(b) • Repeat with the new model parameters akl and ek(b)
Baum-Welch algorithm • The probability that akl is used at position i of sequence x:
P(πi = k, πi+1 = l | x, θ) = fk(i) akl el(xi+1) bl(i+1) / P(x)
with fk(i) the forward value and bl(i+1) the backward value
• Summing over all positions i and all training sequences xj gives the expected counts:
Akl = Σj 1/P(xj) Σi fkj(i) akl el(xi+1j) blj(i+1)
Ek(b) = Σj 1/P(xj) Σ{i: xij = b} fkj(i) bkj(i)
Baum-Welch algorithm • Now that we have estimated Akl and Ek(b), use the maximum likelihood estimators to compute akl and ek(b) • We use these values to re-estimate Akl and Ek(b) in the next iteration • Continue iterating until the change is very small or the maximum number of iterations is exceeded
Baum-Welch algorithm
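The slide above summarised the algorithm; below is a minimal Python sketch (not from the slides, names illustrative) of one Baum-Welch iteration for a single training sequence, assuming dense NumPy arrays a (transitions), e (emissions) and pi0 (initial distribution):

```python
import numpy as np

def baum_welch_step(x, a, e, pi0):
    """One Baum-Welch iteration: E-step (expected counts A_kl, E_k(b))
    followed by the M-step (re-normalised parameters).
    x   : sequence of symbol indices, length L
    a   : (K, K) transition matrix a[k, l]
    e   : (K, M) emission matrix e[k, b]
    pi0 : (K,) initial state distribution
    """
    K, L = a.shape[0], len(x)

    # Forward: f[i, k] = P(x_1..x_i, pi_i = k)
    f = np.zeros((L, K))
    f[0] = pi0 * e[:, x[0]]
    for i in range(1, L):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)

    # Backward: b[i, k] = P(x_{i+1}..x_L | pi_i = k)
    b = np.zeros((L, K))
    b[L - 1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])

    px = f[L - 1].sum()  # P(x | theta)

    # E-step: expected transition counts A_kl and emission counts E_k(b)
    A = np.zeros_like(a)
    for i in range(L - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
    E = np.zeros_like(e)
    for i in range(L):
        E[:, x[i]] += f[i] * b[i] / px

    # M-step: maximum likelihood re-estimation from the expected counts
    a_new = A / A.sum(axis=1, keepdims=True)
    e_new = E / E.sum(axis=1, keepdims=True)
    return a_new, e_new, np.log(px)
```

Iterating this step until the returned log P(x | θ) stops improving (or a maximum number of iterations is reached) gives the loop described on the previous slides; pseudocounts can be added to A and E before normalising.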
Example
Drawbacks • ML estimators: • Vulnerable to overfitting if there is not enough data • Estimates can be undefined for transitions or emissions that never occur in the training set (use pseudocounts) • Baum-Welch: • May converge to a local maximum instead of the global maximum, depending on the starting values of the parameters • This problem gets worse for large HMMs
Modelling of labelled sequences • Only the '--' and '++' counts are calculated • Better than using ML estimators when many different classes are present
Specifying a HMM model • Most difficult problem using HMMs is specifying the model • Design of the structure • Assignment of parameter values
Design of the structure • Design: how to connect states by transitions • A good HMM is based on knowledge about the problem under investigation • Local maxima are the biggest disadvantage of fully connected models • Deleting a transition from the model amounts to setting its probability to zero, so Baum-Welch still works
Example 1 • Geometric distribution: a single state with self-transition probability p and exit probability 1-p • The length ℓ of the stretch generated by this state is geometrically distributed: P(ℓ) = p^(ℓ-1) (1-p)
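A small sanity check (not from the slides) of the geometric length distribution produced by a single self-looping state:

```python
import random

def dwell_time(p):
    """Number of symbols emitted before leaving a state with
    self-transition probability p (exit probability 1 - p)."""
    length = 1
    while random.random() < p:
        length += 1
    return length

p = 0.8
samples = [dwell_time(p) for _ in range(100_000)]
for l in range(1, 6):
    empirical = samples.count(l) / len(samples)
    exact = p ** (l - 1) * (1 - p)
    print(f"length {l}: empirical {empirical:.3f}  vs  p^(l-1)(1-p) = {exact:.3f}")
```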
Example 2 • Modelling a length distribution between 2 and 10
Example 3
Silent states • States that do not emit symbols, such as the begin state B • Silent states can also be used in other places in an HMM
Silent states: example
Silent states • Advantage: • Fewer transition probabilities need to be estimated • Drawback: • Limits the possibilities of defining a model
More complex Markov chains • So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol • More complex: • High order Markov chains • Inhomogeneous Markov chains
High order Markov chains • An nth order Markov process: the probability of a symbol in a sequence depends on the previous n symbols • An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because P(xi | xi-n … xi-1) = P(xi-n+1 … xi | xi-n … xi-1)
Example • A second order Markov chain with two different symbols {A, B} • This can be translated into a first order Markov chain over 2-tuples {AA, AB, BA, BB} • Sometimes the framework of the high order model is more convenient
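A small illustration (not from the slides, transition numbers are arbitrary examples) of this equivalence: score a sequence with a second order chain directly, and with the equivalent first order chain over 2-tuples built from it.

```python
from itertools import product

# Second order chain over {A, B}: p2[(x_{i-2}, x_{i-1})][x_i]
# (arbitrary example numbers; each row sums to 1)
p2 = {
    ("A", "A"): {"A": 0.7, "B": 0.3},
    ("A", "B"): {"A": 0.4, "B": 0.6},
    ("B", "A"): {"A": 0.5, "B": 0.5},
    ("B", "B"): {"A": 0.2, "B": 0.8},
}

# Equivalent first order chain over 2-tuples: transitions are only
# allowed between overlapping tuples (u, v) -> (v, w)
tuples = list(product("AB", repeat=2))
p1 = {u: {v: 0.0 for v in tuples} for u in tuples}
for (u, v) in tuples:
    for w in "AB":
        p1[(u, v)][(v, w)] = p2[(u, v)][w]

def score_second_order(x):
    p = 1.0
    for i in range(2, len(x)):
        p *= p2[(x[i - 2], x[i - 1])][x[i]]
    return p

def score_first_order(x):
    p = 1.0
    for i in range(2, len(x)):
        p *= p1[(x[i - 2], x[i - 1])][(x[i - 1], x[i])]
    return p

x = list("ABBABAAB")
print(score_second_order(x), score_first_order(x))  # identical values
```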
Finding prokaryotic genes • Gene candidates in DNA: a sequence of nucleotide triplets (codons) consisting of a start codon, a number of non-stop codons and a stop codon: an open reading frame (ORF) • An ORF can be either a gene or a non-coding ORF (NORF)
Finding prokaryotic genes • Experiment: • DNA from the bacterium E. coli • Dataset contains 1100 genes (900 used for training, 200 for testing) • Two models: • Normal model with first order Markov chains • Also first order Markov chains, but with codons instead of nucleotides as symbols
Finding prokaryotic genes • Outcomes:
Inhomogeneous Markov chains • Use the position information within the codon • Three transition models a1, a2 and a3, one for each codon position 1, 2, 3 • Example for the sequence CATGCA (codon positions 1 2 3 1 2 3):
Homogeneous: P(CATGCA) = P(C) aCA aAT aTG aGC aCA
Inhomogeneous: P(CATGCA) = P(C) a2CA a3AT a1TG a2GC a3CA
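A minimal sketch (not from the slides) of scoring a sequence with such an inhomogeneous chain; the three transition tables are placeholders supplied by the caller and are indexed by the codon position of the symbol being generated:

```python
def score_inhomogeneous(x, p_start, a_by_position):
    """P(x) under an inhomogeneous first order chain where the transition
    table used for x[i] depends on the codon position (i % 3) + 1.
    a_by_position: {1: a1, 2: a2, 3: a3}, each a dict a[prev][next]."""
    p = p_start[x[0]]
    for i in range(1, len(x)):
        position = i % 3 + 1          # codon position of x[i] (1, 2 or 3)
        a = a_by_position[position]
        p *= a[x[i - 1]][x[i]]
    return p
```

With this indexing, P("CATGCA") comes out as P(C) a2CA a3AT a1TG a2GC a3CA, matching the example above.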
Numerical Stability of HMM algorithms • Multiplying many probabilities can cause numerical problems: • Underflow errors • Wrong numbers are calculated • Solutions: • Log transformation • Scaling of probabilities
The log transformation • Compute with log probabilities • log10(10^-100000) = -100000, so the underflow problem is essentially solved • A sum operation is often faster than a product operation • In the Viterbi algorithm the recursion becomes: Vl(i+1) = log el(xi+1) + maxk ( Vk(i) + log akl )
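A compact log-space Viterbi sketch (illustrative, not from the slides), assuming NumPy arrays in which zero probabilities appear as -inf:

```python
import numpy as np

def viterbi_log(x, log_a, log_e, log_pi0):
    """Most probable state path using log probabilities.
    x       : sequence of symbol indices, length L
    log_a   : (K, K) log transition matrix
    log_e   : (K, M) log emission matrix
    log_pi0 : (K,) log initial state distribution
    """
    K, L = log_a.shape[0], len(x)
    V = np.full((L, K), -np.inf)          # V[i, k] = best log prob ending in k
    ptr = np.zeros((L, K), dtype=int)     # traceback pointers

    V[0] = log_pi0 + log_e[:, x[0]]
    for i in range(1, L):
        scores = V[i - 1][:, None] + log_a           # (K_prev, K_next)
        ptr[i] = scores.argmax(axis=0)
        V[i] = log_e[:, x[i]] + scores.max(axis=0)   # sums replace products

    # Traceback of the optimal path
    path = [int(V[L - 1].argmax())]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], float(V[L - 1].max())
```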
Scaling of probabilities • Scale the f and b variables • Forward variable: for each position i a scaling variable si is defined • New f variables: f̃k(i) = fk(i) / (s1 · … · si), with si chosen so that Σk f̃k(i) = 1 • New forward recursion: f̃l(i+1) = (1/si+1) el(xi+1) Σk f̃k(i) akl
Scaling of probabilities • Backward variable: scaling has to use the same numbers si as the forward variable • New backward recursion: b̃k(i) = (1/si+1) Σl akl el(xi+1) b̃l(i+1) • This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)
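A minimal sketch (assumptions: NumPy, scaling factors chosen so that each scaled forward vector sums to 1, names illustrative) of the scaled forward and backward recursions:

```python
import numpy as np

def scaled_forward_backward(x, a, e, pi0):
    """Forward/backward with per-position scaling factors s[i], so that
    sum_k f[i, k] == 1 and log P(x) = sum_i log s[i]."""
    K, L = a.shape[0], len(x)
    f = np.zeros((L, K))
    b = np.zeros((L, K))
    s = np.zeros(L)

    # Scaled forward recursion
    f[0] = pi0 * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, L):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)
        s[i] = f[i].sum()
        f[i] /= s[i]

    # Scaled backward recursion, reusing the same scaling factors s[i]
    b[L - 1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = (a @ (e[:, x[i + 1]] * b[i + 1])) / s[i + 1]

    log_px = np.log(s).sum()               # log P(x | theta), no underflow
    posterior = f * b                      # P(pi_i = k | x), rows sum to 1
    return f, b, s, log_px, posterior
```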
Summary • Hidden Markov Models • Parameter estimation • State sequence known • State sequence unknown • Model structure • Silent states • More complex Markov chains • Numerical stability