
Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability




  1. Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability Chapter 3.3-3.7 S. Maarschalkerweerd & A. Tjhang

  2. Overview last lecture • Hidden Markov Models • Different algorithms: • Viterbi • Forward • Backward

  3. Overview today • Parameter estimation for HMMs • Baum-Welch algorithm • HMM model structure • More complex Markov chains • Numerical stability of HMM algorithms

  4. Specifying a HMM model • Most difficult problem using HMMs is specifying the model • Design of the structure • Assignment of parameter values


  6. Parameter estimation for HMMs • Estimate the transition and emission probabilities a_kl and e_k(b) • Two ways of learning: • Estimation when the state sequence is known • Estimation when the paths are unknown • Assume that we have a set of example sequences (training sequences x^1, …, x^n)

  7. Parameter estimation for HMMs • Assume that x^1, …, x^n are independent • So P(x^1, …, x^n | θ) = ∏_{j=1}^{n} P(x^j | θ) • Since log(ab) = log a + log b, the log likelihood is the sum Σ_{j=1}^{n} log P(x^j | θ)

  8. Estimation when state sequence is known • Easier than estimation when paths are unknown • A_kl = number of transitions from k to l in the training data + pseudocount r_kl • E_k(b) = number of emissions of b from k in the training data + pseudocount r_k(b)
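With known paths, estimation reduces to counting. A minimal sketch in Python (the training paths, state names and pseudocount value below are made up for illustration):

```python
def estimate_transitions(paths, states, pseudocount=1.0):
    """Maximum likelihood transition estimates with pseudocounts:
    a_kl = A_kl / sum over l' of A_kl', where A_kl is the number of
    k -> l transitions in the training paths plus a pseudocount r_kl."""
    A = {k: {l: pseudocount for l in states} for k in states}
    for path in paths:
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
    return {k: {l: A[k][l] / sum(A[k].values()) for l in states}
            for k in states}

# Made-up training paths over a two-state model ('F' fair, 'L' loaded)
paths = ["FFFLLF", "FLLLLF"]
a = estimate_transitions(paths, ["F", "L"])
```

The emission probabilities e_k(b) are estimated the same way, from counts of symbol b emitted in state k plus pseudocounts.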

  9. Estimation when paths are unknown • More complex than when paths are known • We can’t use maximum likelihood estimators • Instead, an iterative algorithm is used • Baum-Welch

  10. The Baum-Welch algorithm • We don’t know the real values of A_kl and E_k(b) • Estimate A_kl and E_k(b) • Update a_kl and e_k(b) • Repeat with the new model parameters a_kl and e_k(b)

  11. Baum-Welch algorithm • Forward value • Backward value

  12. Baum-Welch algorithm • Now that we have estimated A_kl and E_k(b), use maximum likelihood estimators to compute a_kl and e_k(b) • We use these values to estimate A_kl and E_k(b) in the next iteration • Continue iterating until the change is very small or a maximum number of iterations is exceeded
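One such iteration can be sketched with plain forward/backward recursions. This is a minimal NumPy version under assumptions of our own: a uniform initial distribution, no explicit end state, and no scaling, so it is only safe for short sequences:

```python
import numpy as np

def forward(a, e, x):
    """f[i, k] = P(x_1..x_i, state_i = k), uniform initial distribution."""
    n, K = len(x), a.shape[0]
    f = np.zeros((n, K))
    f[0] = e[:, x[0]] / K
    for i in range(1, n):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)
    return f

def backward(a, e, x):
    """b[i, k] = P(x_{i+1}..x_n | state_i = k)."""
    n, K = len(x), a.shape[0]
    b = np.ones((n, K))
    for i in range(n - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def baum_welch_step(a, e, seqs):
    """One iteration: expected counts A_kl, E_k(b), then ML re-estimation."""
    A, E = np.zeros_like(a), np.zeros_like(e)
    for x in seqs:
        f, b = forward(a, e, x), backward(a, e, x)
        px = f[-1].sum()                      # P(x | current model)
        for i in range(len(x) - 1):
            # A_kl += f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1) / P(x)
            A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
        for i, sym in enumerate(x):
            E[:, sym] += f[i] * b[i] / px     # posterior emission counts
    return A / A.sum(1, keepdims=True), E / E.sum(1, keepdims=True)

# Made-up starting guesses for a two-state, two-symbol model
a0 = np.array([[0.7, 0.3], [0.4, 0.6]])
e0 = np.array([[0.9, 0.1], [0.2, 0.8]])
seqs = [[0, 0, 1, 1, 0], [1, 1, 0, 1, 1]]
a1, e1 = baum_welch_step(a0, e0, seqs)
```

Because this is an EM step, the likelihood of the training sequences never decreases from one iteration to the next.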

  13. Baum-Welch algorithm

  14. Example

  15. (figure)

  16. Drawbacks • ML estimators: • Vulnerable to overfitting if there is not enough data • Estimates can be undefined for states or transitions never seen in the training set (use pseudocounts) • Baum-Welch: • A local maximum instead of the global maximum can be found, depending on the starting values of the parameters • This problem gets worse for large HMMs

  17. Modelling of labelled sequences • Only the transitions consistent with the labels (the ‘−−’ and ‘++’ transitions) are counted • Better than using ML estimators when many different classes are present

  18. Specifying a HMM model • Most difficult problem using HMMs is specifying the model • Design of the structure • Assignment of parameter values

  19. Design of the structure • Design: how to connect states by transitions • A good HMM is based on knowledge about the problem under investigation • Local maxima are the biggest disadvantage of fully connected models • Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero

  20. Example 1 • Geometric length distribution: a state with self-transition probability p and exit probability 1−p
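A state that loops on itself with probability p and exits with probability 1−p stays for ℓ steps with probability p^(ℓ−1)(1−p), a geometric distribution with mean 1/(1−p). A small numeric check, with a made-up value of p:

```python
# Run-length distribution of a state with self-loop probability p:
# P(L = l) = p**(l-1) * (1 - p), mean length 1/(1 - p).
p = 0.8
lengths = range(1, 200)                       # truncation; tail mass is tiny
probs = [p**(l - 1) * (1 - p) for l in lengths]
mean = sum(l * pr for l, pr in zip(lengths, probs))
# With p = 0.8 the mean run length is 1 / (1 - 0.8) = 5
```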

  21. Example 2 • Model distribution of length between 2 and 10

  22. Example 3

  23. B  Silent states • States that do not emit symbols • Also in other places in HMM S. Maarschalkerweerd & A. Tjhang

  24. Silent states • Example

  25. Silent states • Advantage: • Fewer transition probabilities need to be estimated • Drawback: • Limits the possibilities for defining a model

  26. More complex Markov chains • So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol • More complex: • Higher-order Markov chains • Inhomogeneous Markov chains

  27. Higher-order Markov chains • In an nth-order Markov process, the probability of a symbol in a sequence depends on the previous n symbols • An nth-order Markov chain over some alphabet A is equivalent to a first-order Markov chain over the alphabet A^n of n-tuples, because P(AB|B) = P(A|B)

  28. Example • A second-order Markov chain with two different symbols {A, B} • This can be translated into a first-order Markov chain over the 2-tuples {AA, AB, BA, BB} • Sometimes the framework of the higher-order model is more convenient
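The translation to tuples can be sketched directly (the helper name below is our own):

```python
def to_first_order(seq, n=2):
    """View a sequence over alphabet A as a sequence of overlapping
    n-tuples, i.e. as a first-order chain over the alphabet A**n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

# "ABBAB" becomes the 2-tuple chain AB -> BB -> BA -> AB
tuples = to_first_order("ABBAB")
```

Note that consecutive tuples overlap in n−1 symbols; this overlap constraint is what makes the two chains equivalent.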

  29. Finding prokaryotic genes • Gene candidates in DNA: a sequence of nucleotide triplets: a start codon, a number of non-stop codons, and a stop codon: an open reading frame (ORF) • An ORF can be either a gene or a non-coding ORF (NORF)

  30. Finding prokaryotic genes • Experiment: • DNA from the bacterium E. coli • Dataset contains 1100 genes (900 used for training, 200 for testing) • Two models: • A normal model with first-order Markov chains over nucleotides • Also first-order Markov chains, but with codons instead of nucleotides as symbols

  31. Finding prokaryotic genes • Outcomes:

  32. Inhomogeneous Markov chains • Use the position information within the codon: three models, for codon positions 1, 2 and 3 • Example for the sequence CATGCA: • Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA • Inhomogeneous: P(C) a²_CA a³_AT a¹_TG a²_GC a³_CA (the superscript is the codon position of the emitted symbol)
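Scoring the CATGCA example with three position-specific transition matrices can be sketched as follows; all probability values below are made up for illustration:

```python
import math

# Hypothetical position-specific transition probabilities: a[p]["XY"] is
# the probability of Y following X when Y sits at codon position p.
a = {
    1: {"TG": 0.2},
    2: {"CA": 0.3, "GC": 0.5},
    3: {"AT": 0.4, "CA": 0.1},
}

def log_score(seq, p_init, a):
    """Log probability of seq under the inhomogeneous chain, cycling
    through the three position-specific matrices."""
    logp = math.log(p_init)
    for i in range(1, len(seq)):
        pos = (i % 3) + 1                     # codon position of seq[i]
        logp += math.log(a[pos][seq[i - 1] + seq[i]])
    return logp

# Matches the slide: P(C) a2_CA a3_AT a1_TG a2_GC a3_CA
ls = log_score("CATGCA", 0.25, a)
```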

  33. Numerical Stability of HMM algorithms • Multiplying many probabilities can cause numerical problems: • Underflow errors • Wrong numbers are calculated • Solutions: • Log transformation • Scaling of probabilities

  34. The log transformation • Compute log probabilities • log10(10^-100000) = -100000, so the underflow problem is essentially solved • The sum operation is often faster than the product operation • In the Viterbi algorithm the recursion becomes V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl)
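A quick illustration of why the transformation is needed: multiplying small probabilities quickly underflows double precision, while summing their logs stays well within range.

```python
import math

p = 1e-30
naive = 1.0
for _ in range(20):
    naive *= p          # underflows to exactly 0.0 after ~11 factors

# In log space the same product is an ordinary sum, about -1381.55
logp = sum(math.log(p) for _ in range(20))
```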

  35. Scaling of probabilities • Scale the f and b variables • Forward variable: • For each i a scaling variable s_i is defined so that the scaled variables sum to one • New f variables: f'_l(i) = f_l(i) / (s_1 … s_i) • New forward recursion: f'_l(i+1) = (1 / s_{i+1}) e_l(x_{i+1}) Σ_k f'_k(i) a_kl

  36. Scaling of probabilities • Backward variable: • Scaling has to use the same numbers s_i as the forward variable • New backward recursion: b'_k(i) = (1 / s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b'_l(i+1) • This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)
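The scaled forward recursion can be sketched in NumPy as follows; the model parameters below are made up, and the initial distribution is passed in explicitly (an assumption of ours, since the slides do not fix one):

```python
import numpy as np

def scaled_forward(a, e, x, init):
    """Forward recursion with scaling: at each position i, divide the
    f-variables by their sum s_i so they stay in a safe numeric range;
    log P(x) is then recovered as the sum of the log s_i."""
    n, K = len(x), a.shape[0]
    f = np.zeros((n, K))
    s = np.zeros(n)
    f[0] = init * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, n):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)
        s[i] = f[i].sum()
        f[i] /= s[i]
    return f, np.log(s).sum()

# Toy two-state model with made-up parameters
a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = np.array([[0.5, 0.5], [0.1, 0.9]])
init = np.array([0.5, 0.5])
x = [0, 1, 1, 0]
f, logp = scaled_forward(a, e, x, init)
```

On a short sequence this agrees exactly with the unscaled forward algorithm, but unlike the unscaled version it also works on sequences of thousands of symbols.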

  37. Summary • Hidden Markov Models • Parameter estimation • State sequence known • State sequence unknown • Model structure • Silent states • More complex Markov chains • Numerical stability
