
Hidden Markov Model



Presentation Transcript


  1. Hidden Markov Model 11/28/07

  2. Bayes Rule The posterior distribution is p(k | x) = p(x | k) π_k / Σ_l p(x | l) π_l. Select the class k with the largest posterior probability; this rule minimizes the average misclassification rate. The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior. The decision boundary between classes k and l is the set of x where p(k | x) = p(l | x).

  3. Naïve Bayes approximation • When x is high dimensional, it is difficult to estimate the class-conditional density p(x | k).

  4. Naïve Bayes Classifier • When x is high dimensional, it is difficult to estimate p(x | k). • But if we assume the components of x are independent given the class, p(x | k) ≈ Π_j p(x_j | k), then each factor is a 1-D estimation problem.

  5. Naïve Bayes Classifier • Usually the independence assumption is not valid. • But sometimes the NBC can still be a good classifier. • Simple models often do not perform badly in practice.
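
The idea above can be sketched in a few lines. A minimal sketch, assuming Gaussian class-conditional densities for each coordinate (the slides do not fix a particular density); the class name and the small smoothing constant are illustrative.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes: treat each coordinate of x as independent within each class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == k) for k in self.classes_])
        # Per-class, per-coordinate means and variances: a 1-D estimate per feature.
        self.means_ = np.array([X[y == k].mean(axis=0) for k in self.classes_])
        self.vars_ = np.array([X[y == k].var(axis=0) + 1e-9 for k in self.classes_])
        return self

    def predict(self, X):
        n, K = X.shape[0], len(self.classes_)
        log_post = np.zeros((n, K))
        for idx in range(K):
            # log p(x | k) = sum_j log N(x_j; mean_jk, var_jk)  (independence assumption)
            log_like = -0.5 * (np.log(2 * np.pi * self.vars_[idx])
                               + (X - self.means_[idx]) ** 2 / self.vars_[idx]).sum(axis=1)
            log_post[:, idx] = np.log(self.priors_[idx]) + log_like
        # Bayes rule: select the class k with the largest posterior.
        return self.classes_[np.argmax(log_post, axis=1)]
```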

  6. Hidden Markov Model

  7. A coin toss example Scenario: You are betting with your friend using a coin toss, and you observe the sequence (H, T, T, H, …).

  8. A coin toss example Scenario: You are betting with your friend using a coin toss, and you observe (H, T, T, H, …). But your friend is cheating: he occasionally switches from a fair coin to a biased coin, and of course the switch happens under the table! [diagram: two states, Fair and Biased, with switches between them]

  9. A coin toss example This is what is really happening: (H, T, H, T, H, H, H, H, T, H, H, T, …). Of course you cannot see which coin produced each toss. So how can you tell that your friend is cheating?

  10. Hidden Markov Model [graphical model: hidden states (the coin) emitting observed variables (H or T)]

  11. Markov Property The hidden states form a Markov chain: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1}). [graphical model: hidden states (the coin) and observed variables (H or T)]

  12. Markov Property The chain is specified by the transition probabilities a_jk = p(x_t = k | x_{t-1} = j) and the prior distribution p(x_1). [diagram: two states, Fair and Biased, with transitions between them]

  13. Observation independence Given the hidden state, the observation is independent of all other variables; the emission probability is p(y_t | x_t). [graphical model: hidden states (the coin) emitting observed variables (H or T)]

  14. Model parameters • A = (a_ij): transition matrix • p(y_t | x_t): emission probability • p(x_1): prior distribution
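
For the coin example, the three parameter sets can be written down directly. A small sketch; the specific numbers (switching rates, a biased coin with P(H) = 0.9) are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

states = ["Fair", "Biased"]
symbols = ["H", "T"]

# Prior distribution p(x1): assume the game starts with the fair coin most of the time.
prior = np.array([0.95, 0.05])

# Transition matrix A = (a_ij), a_ij = p(x_t = j | x_{t-1} = i): switches are rare.
A = np.array([[0.95, 0.05],    # Fair   -> Fair, Fair   -> Biased
              [0.10, 0.90]])   # Biased -> Fair, Biased -> Biased

# Emission probabilities p(y_t | x_t); columns are H, T.
E = np.array([[0.5, 0.5],      # fair coin
              [0.9, 0.1]])     # biased coin, assumed to favour heads
```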

  15. Model inference • Infer the hidden states when the model parameters are known. • Infer both the states and the model parameters when neither is known.

  16. Viterbi algorithm [trellis diagram: states 1-4 (rows) across times t-1, t, t+1 (columns)]

  17. Viterbi algorithm • Most probable path: the state sequence that maximizes the joint probability p(x_1, ..., x_L, y_1, ..., y_L). [trellis diagram: states 1-4 across times t-1, t, t+1]

  18. Viterbi algorithm • Most probable path (continued). [trellis diagram: states 1-4 across times t-1, t, t+1]

  19. Viterbi algorithm • Most probable path: by the Markov property, the best path reaching a state at time t extends one of the best paths at time t-1. Therefore, the path can be found iteratively. [trellis diagram: states 1-4 across times t-1, t, t+1]

  20. Viterbi algorithm • Most probable path: let v_k(i) be the probability of the most probable path ending in state k at position i. Then v_l(i) = e_l(y_i) max_k [ v_k(i-1) a_kl ]. [trellis diagram: states 1-4 across times t-1, t, t+1]

  21. Viterbi algorithm • Initialization (i = 0): v_0(0) = 1, v_k(0) = 0 for k > 0 (state 0 is a silent begin state with a_0k = p(x_1 = k)). • Recursion (i = 1, ..., L): v_l(i) = e_l(y_i) max_k [ v_k(i-1) a_kl ]; ptr_i(l) = argmax_k [ v_k(i-1) a_kl ]. • Termination: p(y, x*) = max_k v_k(L); x*_L = argmax_k v_k(L). • Traceback (i = L, ..., 1): x*_{i-1} = ptr_i(x*_i).
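
A minimal NumPy sketch of the recursion above, run in log space to avoid underflow; it reuses the illustrative coin parameters from the earlier snippet.

```python
import numpy as np

def viterbi(obs, prior, A, E):
    """obs: sequence of symbol indices; returns the most probable state path."""
    L, K = len(obs), len(prior)
    v = np.full((L, K), -np.inf)        # v[i, k]: log prob of best path ending in k at i
    ptr = np.zeros((L, K), dtype=int)   # traceback pointers

    # Initialization (the prior plays the role of the begin-state transitions).
    v[0] = np.log(prior) + np.log(E[:, obs[0]])

    # Recursion: v_l(i) = e_l(y_i) * max_k v_k(i-1) a_kl, computed in logs.
    for i in range(1, L):
        for l in range(K):
            scores = v[i - 1] + np.log(A[:, l])
            ptr[i, l] = np.argmax(scores)
            v[i, l] = scores[ptr[i, l]] + np.log(E[l, obs[i]])

    # Termination and traceback.
    path = np.zeros(L, dtype=int)
    path[-1] = np.argmax(v[-1])
    for i in range(L - 1, 0, -1):
        path[i - 1] = ptr[i, path[i]]
    return path
```

On the coin example, `viterbi(obs, prior, A, E)` with tosses encoded as 0 = H, 1 = T returns, for each toss, the decoded state index (0 = Fair, 1 = Biased).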

  22. Advantage of Viterbi path • It identifies the most probable path very efficiently. • The most probable path is legitimate, i.e., it is realizable by the HMM process.

  23. Issue with Viterbi path • The most probable path does not predict the confidence level of a state estimate. • The most probable path may not be much more probable than other paths.

  24. Posterior distribution Estimate p(x_i = k | y_1, ..., y_L). Strategy: write p(x_i = k, y_1, ..., y_L) = f_k(i) b_k(i), where f_k(i) = p(y_1, ..., y_i, x_i = k) and b_k(i) = p(y_{i+1}, ..., y_L | x_i = k). This is done by a forward-backward algorithm.

  25. Forward-backward algorithm Estimate the forward variable f_k(i) = p(y_1, ..., y_i, x_i = k).

  26. Forward algorithm Estimate f_k(i). • Initialization: f_k(1) = p(x_1 = k) e_k(y_1). • Recursion: f_l(i) = e_l(y_i) Σ_k f_k(i-1) a_kl. • Termination: p(y_1, ..., y_L) = Σ_k f_k(L).

  27. Backward algorithm Estimate the backward variable b_k(i) = p(y_{i+1}, ..., y_L | x_i = k).

  28. Backward algorithm Estimate b_k(i). • Initialization: b_k(L) = 1. • Recursion: b_k(i) = Σ_l a_kl e_l(y_{i+1}) b_l(i+1). • Termination: p(y_1, ..., y_L) = Σ_l p(x_1 = l) e_l(y_1) b_l(1).
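
A minimal sketch of the combined forward-backward computation of p(x_i = k | y_1, ..., y_L); it uses per-position rescaling, one common way to keep the recursions numerically stable, rather than log space.

```python
import numpy as np

def forward_backward(obs, prior, A, E):
    """Return the posterior p(x_i = k | y_1..y_L) for every position i and state k."""
    L, K = len(obs), len(prior)
    f = np.zeros((L, K))    # scaled forward variables f_k(i)
    b = np.zeros((L, K))    # scaled backward variables b_k(i)
    scale = np.zeros(L)

    # Forward: f_l(i) = e_l(y_i) * sum_k f_k(i-1) a_kl, rescaled at every position.
    f[0] = prior * E[:, obs[0]]
    scale[0] = f[0].sum()
    f[0] /= scale[0]
    for i in range(1, L):
        f[i] = E[:, obs[i]] * (f[i - 1] @ A)
        scale[i] = f[i].sum()
        f[i] /= scale[i]

    # Backward: b_k(i) = sum_l a_kl e_l(y_{i+1}) b_l(i+1), with the same scaling factors.
    b[-1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, obs[i + 1]] * b[i + 1]) / scale[i + 1]

    # Posterior: p(x_i = k | y) is proportional to f_k(i) b_k(i).
    posterior = f * b
    return posterior / posterior.sum(axis=1, keepdims=True)
```

Plotting `forward_backward(obs, prior, A, E)[:, 0]` along the toss sequence gives a curve like the P(fair) plot on the next two slides.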

  29. Probability of fair coin [plot: posterior probability P(fair) along the toss sequence, between 0 and 1]

  30. Probability of fair coin [plot: posterior probability P(fair) along the toss sequence, continued]

  31. Posterior distribution • The posterior distribution predicts the confidence level of a state estimate. • The posterior distribution combines information from all paths. But... • The path of pointwise most probable states may not be legitimate.

  32. Estimating parameters when the state sequence is known Given the state sequence {x_i}, define A_jk = number of transitions from j to k and E_k(b) = number of emissions of symbol b from state k. The maximum likelihood estimates of the parameters are a_jk = A_jk / Σ_k' A_jk' and e_k(b) = E_k(b) / Σ_b' E_k(b').
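
When the state path is observed, these maximum likelihood estimates are just normalized counts; a small sketch, assuming states and symbols are encoded as integer indices.

```python
import numpy as np

def ml_estimates(states, obs, K, M):
    """states: known state path; obs: emitted symbols; K states, M symbols."""
    A_counts = np.zeros((K, K))    # A_jk: number of j -> k transitions
    E_counts = np.zeros((K, M))    # E_k(b): number of emissions of b from state k
    for j, k in zip(states[:-1], states[1:]):
        A_counts[j, k] += 1
    for k, b in zip(states, obs):
        E_counts[k, b] += 1
    # a_jk = A_jk / sum_k' A_jk',  e_k(b) = E_k(b) / sum_b' E_k(b')
    return (A_counts / A_counts.sum(axis=1, keepdims=True),
            E_counts / E_counts.sum(axis=1, keepdims=True))
```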

  33. Infer hidden states together with model parameters • Viterbi training • Baum-Welch

  34. Viterbi training Main idea: use an iterative procedure. • Estimate the states for fixed parameters using the Viterbi algorithm. • Estimate the model parameters for the fixed states. • Repeat until convergence.
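
A sketch of the resulting loop, reusing the `viterbi` and `ml_estimates` sketches above; the prior is kept fixed for simplicity, and in practice pseudocounts are added to the counts so that states missing from the decoded path do not produce empty rows.

```python
def viterbi_training(obs, prior, A, E, n_iter=20):
    """Alternate between decoding states and re-estimating parameters from them."""
    K, M = E.shape
    for _ in range(n_iter):
        path = viterbi(obs, prior, A, E)        # step 1: states for fixed parameters
        A, E = ml_estimates(path, obs, K, M)    # step 2: parameters for fixed states
    return A, E
```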

  35. Baum-Welch algorithm • Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions A_kl and emissions E_k(b).

  36. Baum-Welch algorithm • The expected counts are computed from the forward and backward variables: A_kl = Σ_i f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / p(y) and E_k(b) = Σ_{i : y_i = b} f_k(i) b_k(i) / p(y).
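
A sketch of one Baum-Welch (EM) iteration that accumulates these expected counts from the forward and backward variables and then renormalizes; it uses unscaled probabilities for clarity, so it is only suitable for short sequences, and the prior is left unchanged.

```python
import numpy as np

def baum_welch_step(obs, prior, A, E):
    """One EM iteration: expected counts A_kl and E_k(b), then re-normalize."""
    obs = np.asarray(obs)
    L, K, M = len(obs), len(prior), E.shape[1]

    # Unscaled forward and backward variables (use scaling or logs for long sequences).
    f = np.zeros((L, K)); b = np.zeros((L, K))
    f[0] = prior * E[:, obs[0]]
    for i in range(1, L):
        f[i] = E[:, obs[i]] * (f[i - 1] @ A)
    b[-1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, obs[i + 1]] * b[i + 1])
    Py = f[-1].sum()    # total probability p(y)

    # Expected transitions: A_kl = sum_i f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / p(y).
    A_exp = np.zeros((K, K))
    for i in range(L - 1):
        A_exp += np.outer(f[i], E[:, obs[i + 1]] * b[i + 1]) * A / Py

    # Expected emissions: E_k(b) = sum over positions with y_i = b of f_k(i) b_k(i) / p(y).
    post = f * b / Py
    E_exp = np.zeros((K, M))
    for sym in range(M):
        E_exp[:, sym] = post[obs == sym].sum(axis=0)

    # M-step: renormalize the expected counts.
    return (A_exp / A_exp.sum(axis=1, keepdims=True),
            E_exp / E_exp.sum(axis=1, keepdims=True))
```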

  37. Baum-Welch is a special case of the EM algorithm • Given an estimate of the parameters θ^t, try to find a better θ. • Choose θ to maximize Q(θ | θ^t) = Σ_x p(x | y, θ^t) log p(x, y | θ), where the sum is over the hidden state sequences x.

  38. Baum-Welch is a special case of the EM algorithm • E-step: calculate the Q function Q(θ | θ^t). • M-step: maximize Q(θ | θ^t) with respect to θ.

  39. Issue with EM • EM only finds local maxima. • Solutions: • Run EM multiple times starting from different initial guesses. • Use a more sophisticated algorithm such as MCMC.

  40. Dynamic Bayesian Network Kevin Murphy

  41. Software • Kevin Murphy’s Bayes Net Toolbox for Matlab http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

  42. Applications Copy number changes (Yi Li)

  43. Applications Protein-binding sites

  44. Applications Sequence alignment www.biocentral.com

  45. Reading list • Hastie et al. (2001) The Elements of Statistical Learning, pp. 184-185. • Durbin et al. (1998) Biological Sequence Analysis, Chapter 3.
