
Hidden Markov Models


Presentation Transcript


  1. Hidden Markov Models
  • Markov assumption:
    • The current state is conditionally independent of all the other past states given the state in the previous time step
    • The evidence at time t depends only on the state at time t
  • Transition model: P(Xt | X0:t-1) = P(Xt | Xt-1)
  • Observation model: P(Et | X0:t, E1:t-1) = P(Et | Xt)
  [Figure: HMM graphical model: a chain X0 → X1 → X2 → … → Xt-1 → Xt, with each state Xi emitting an observation Ei]

  2. An example HMM
  • States: X = {home, office, cafe}
  • Observations: E = {sms, facebook, email}
  Slide credit: Andy White
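The deck never writes out the full transition and observation tables for this example, so the sketch below fills them in with made-up numbers; only a handful of entries (the 0.6 / 0.2 / 0.8 transitions into office and P(email | office) = 0.8 used in the filtering example on slides 9-14) actually appear later in the deck. The same dict layout is assumed by the other sketches below.

```python
# Illustrative parameters for the home/office/cafe HMM from the slides.
# Only a few entries are taken from the deck (the transitions into "office"
# and P(email | office) from the filtering example); the rest are made-up
# values chosen so that every row sums to 1.

states = ["home", "office", "cafe"]
observations = ["sms", "facebook", "email"]

# P(X_0): assumed uniform prior over the initial state
prior = {"home": 1 / 3, "office": 1 / 3, "cafe": 1 / 3}

# Transition model P(X_t | X_t-1): transition[from_state][to_state]
transition = {
    "home":   {"home": 0.3, "office": 0.6, "cafe": 0.1},
    "office": {"home": 0.4, "office": 0.2, "cafe": 0.4},
    "cafe":   {"home": 0.1, "office": 0.8, "cafe": 0.1},
}

# Observation model P(E_t | X_t): emission[state][observation]
emission = {
    "home":   {"sms": 0.4, "facebook": 0.4, "email": 0.2},
    "office": {"sms": 0.1, "facebook": 0.1, "email": 0.8},
    "cafe":   {"sms": 0.3, "facebook": 0.5, "email": 0.2},
}
```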

  3. The Joint Distribution
  • Transition model: P(Xt | Xt-1)
  • Observation model: P(Et | Xt)
  • How do we compute the full joint P(X0:t, E1:t)?
  [Figure: HMM graphical model]
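Writing out the answer to the slide's question: the chain structure of the model gives the standard factorization of the full joint into the prior, one transition term, and one observation term per time step.

```latex
P(X_{0:t}, E_{1:t}) \;=\; P(X_0)\,\prod_{i=1}^{t} P(X_i \mid X_{i-1})\, P(E_i \mid X_i)
```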

  4. HMM inference tasks
  • Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t?
  [Figure: HMM graphical model, with Xt marked as the query variable and E1 … Et as the evidence variables]

  5. HMM inference tasks
  • Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t?
  • Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t?
  [Figure: HMM graphical model, with Xk marked inside the chain]

  6. HMM inference tasks
  • Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t?
  • Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t?
  • Evaluation: compute the probability of a given observation sequence e1:t
  [Figure: HMM graphical model, with Xk marked inside the chain]

  7. HMM inference tasks
  • Filtering: what is the distribution over the current state Xt given all the evidence so far, e1:t?
  • Smoothing: what is the distribution of some state Xk given the entire observation sequence e1:t?
  • Evaluation: compute the probability of a given observation sequence e1:t
  • Decoding: what is the most likely state sequence X0:t given the observation sequence e1:t?
  [Figure: HMM graphical model, with Xk marked inside the chain]

  8. Filtering
  • Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t)
  • Recursive formulation: suppose we know P(Xt-1 | e1:t-1)
  [Figure: HMM graphical model, with Xt marked as the query variable and E1 … Et as the evidence variables]

  9. Filtering
  • Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t)
  • Recursive formulation: suppose we know P(Xt-1 | e1:t-1)
  Worked example (et-1 = Facebook):

    P(Xt-1 | e1:t-1):        Home 0.6    Office 0.3    Cafe 0.1
    P(Xt = Office | Xt-1):   from Home 0.6    from Office 0.2    from Cafe 0.8

    What is P(Xt = Office | e1:t-1)?
    0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5

  12. Filtering
  • Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t)
  • Recursive formulation: suppose we know P(Xt-1 | e1:t-1)
  Worked example (et-1 = Facebook, et = Email):

    P(Xt-1 | e1:t-1):        Home 0.6    Office 0.3    Cafe 0.1
    P(Xt = Office | Xt-1):   from Home 0.6    from Office 0.2    from Cafe 0.8
    P(et = Email | Xt = Office) = 0.8

    Prediction: P(Xt = Office | e1:t-1) = 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
    What is P(Xt = Office | e1:t)?

  14. Filtering
  • Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t)
  • Recursive formulation: suppose we know P(Xt-1 | e1:t-1)
  Worked example (et-1 = Facebook, et = Email):

    Prediction: P(Xt = Office | e1:t-1) = 0.6 * 0.6 + 0.2 * 0.3 + 0.8 * 0.1 = 0.5
    Correction: P(Xt = Office | e1:t-1) * P(et | Xt = Office) = 0.5 * 0.8 = 0.4
    Note: must also compute this value for Home and Cafe, and renormalize to sum to 1
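As a sanity check, the arithmetic in the worked example on slides 9-14 takes only a few lines; the sketch below uses just the numbers that appear on those slides.

```python
# Reproduces the worked filtering step from slides 9-14.

# Filtered distribution at time t-1, P(X_t-1 | e_1:t-1)
belief_prev = {"home": 0.6, "office": 0.3, "cafe": 0.1}

# Transition probabilities into Office, P(X_t = office | X_t-1)
p_office_given_prev = {"home": 0.6, "office": 0.2, "cafe": 0.8}

# Prediction step: P(X_t = office | e_1:t-1)
predicted_office = sum(p_office_given_prev[s] * belief_prev[s] for s in belief_prev)
print(predicted_office)      # 0.5

# Correction step: weight by the new evidence e_t = email, P(email | office) = 0.8
unnormalized_office = predicted_office * 0.8
print(unnormalized_office)   # 0.4, before renormalizing over Home, Office, Cafe
```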

  15. Filtering: The Forward Algorithm
  • Task: compute the probability distribution over the current state given all the evidence so far: P(Xt | e1:t)
  • Recursive formulation: suppose we know P(Xt-1 | e1:t-1)
    • Base case: priors P(X0)
    • Prediction: propagate belief from Xt-1 to Xt
    • Correction: weight by evidence et
    • Renormalize so that all P(Xt = x | e1:t) sum to 1

  16. Filtering: The Forward Algorithm
  [Figure: trellis over times 0, …, t-1, t with nodes Home, Office, Cafe in each column; the column at time 0 holds the priors, and evidence et-1, et is attached to the later columns]
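A minimal Python sketch of the forward algorithm described on slides 15-16. The parameter dicts are assumed to follow the layout of the illustrative tables after slide 2; the usage line at the end refers to those names. Each loop iteration performs exactly the prediction, correction, and renormalization steps listed on slide 15.

```python
# A minimal forward-algorithm (filtering) sketch.

def forward(prior, transition, emission, evidence):
    """Return [P(X_1 | e_1), ..., P(X_t | e_1:t)] as a list of dicts."""
    belief = dict(prior)                 # base case: P(X_0)
    beliefs = []
    for e in evidence:
        # Prediction: P(X_k | e_1:k-1) = sum_x P(X_k | X_k-1 = x) P(X_k-1 = x | e_1:k-1)
        predicted = {s: sum(transition[x][s] * belief[x] for x in belief)
                     for s in belief}
        # Correction: weight by the evidence e_k, then renormalize to sum to 1
        unnorm = {s: emission[s][e] * predicted[s] for s in predicted}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
        beliefs.append(belief)
    return beliefs

# Example (with the tables from the sketch after slide 2):
# forward(prior, transition, emission, ["facebook", "email"])
```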

  17. Evaluation
  • Compute the probability of the observation sequence: P(e1:t)
  • Recursive formulation: suppose we know P(e1:t-1)

  18. Evaluation
  • Compute the probability of the observation sequence: P(e1:t)
  • Recursive formulation: suppose we know P(e1:t-1)
  [Formula, with its two factors labeled "recursion" and "filtering"]
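The formula on slide 18 appears to have been an image; one standard way to write the recursion it labels, consistent with the filtering step above, is:

```latex
P(e_{1:t}) \;=\; \underbrace{P(e_{1:t-1})}_{\text{recursion}}
\sum_{x_t} P(e_t \mid x_t)\,\underbrace{P(x_t \mid e_{1:t-1})}_{\text{filtering}},
\qquad
P(x_t \mid e_{1:t-1}) \;=\; \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1} \mid e_{1:t-1})
```

In the forward() sketch above, the per-step normalizer z equals P(e_k | e_1:k-1), so P(e_1:t) is simply the product of these normalizers.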

  19. Smoothing
  • What is the distribution of some state Xk given the entire observation sequence e1:t?
  [Figure: HMM graphical model, with Xk marked inside the chain]

  20. Smoothing
  • What is the distribution of some state Xk given the entire observation sequence e1:t?
  • Solution: the forward-backward algorithm
    • Forward message: P(Xk | e1:k)
    • Backward message: P(ek+1:t | Xk)
  [Figure: trellis over times 0, …, k, …, t with nodes Home, Office, Cafe in each column; the forward message is computed up to time k and the backward message from time t back to k]
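A sketch of the backward pass and of forward-backward smoothing, paired with the forward() sketch above and assuming the same parameter layout.

```python
# A minimal forward-backward (smoothing) sketch.

def backward(transition, emission, evidence):
    """Return [b_1, ..., b_t], where b_k[x] = P(e_k+1:t | X_k = x)."""
    t = len(evidence)
    states = list(transition)
    b = {s: 1.0 for s in states}                   # base case: b_t(x) = 1
    messages = [b]
    for k in range(t - 1, 0, -1):                  # fold in e_k+1, ..., e_t
        e_next = evidence[k]
        b = {s: sum(transition[s][x] * emission[x][e_next] * messages[0][x]
                    for x in states)
             for s in states}
        messages.insert(0, b)
    return messages

def smooth(prior, transition, emission, evidence):
    """Return [P(X_1 | e_1:t), ..., P(X_t | e_1:t)]."""
    fwd = forward(prior, transition, emission, evidence)
    bwd = backward(transition, emission, evidence)
    smoothed = []
    for f, b in zip(fwd, bwd):
        # P(X_k | e_1:t) is proportional to forward(k) * backward(k)
        unnorm = {s: f[s] * b[s] for s in f}
        z = sum(unnorm.values())
        smoothed.append({s: p / z for s, p in unnorm.items()})
    return smoothed
```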

  21. Decoding: Viterbi Algorithm
  • Task: given observation sequence e1:t, compute the most likely state sequence x0:t
  [Figure: HMM graphical model]

  22. Decoding: Viterbi Algorithm
  • Task: given observation sequence e1:t, compute the most likely state sequence x0:t
  • The most likely path that ends in a particular state xt consists of the most likely path to some state xt-1 followed by the transition to xt
  [Figure: trellis over times 0, …, t-1, t, highlighting a path into xt-1 followed by the transition xt-1 → xt]

  23. Decoding: Viterbi Algorithm
  • Let mt(xt) denote the probability of the most likely path that ends in xt:
  [Figure: trellis over times 0, …, t-1, t, showing mt-1(xt-1) at each state xt-1 and the transition probability P(xt | xt-1) into xt]
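The recursion the slide sets up is the standard one: mt(xt) = P(et | xt) * max over xt-1 of P(xt | xt-1) mt-1(xt-1), with back-pointers recording which xt-1 achieved the max. Below is a sketch using the same assumed parameter layout as above; for brevity it maximizes X0 out of the first step and reports only x1, …, xt.

```python
# A minimal Viterbi sketch.  m[k][x] is the probability of the most likely
# state sequence that ends in state x at step k and explains e_1, ..., e_k.

def viterbi(prior, transition, emission, evidence):
    """Return a most likely state sequence x_1, ..., x_t."""
    states = list(prior)
    # m_1(x_1) = P(e_1 | x_1) * max over x_0 of P(x_0) P(x_1 | x_0)
    m = [{s: emission[s][evidence[0]] *
             max(prior[x0] * transition[x0][s] for x0 in states)
          for s in states}]
    back = []                                   # back-pointers for path recovery
    for e in evidence[1:]:
        prev = m[-1]
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for s at this step
            best = max(states, key=lambda x: prev[x] * transition[x][s])
            pointers[s] = best
            scores[s] = prev[best] * transition[best][s] * emission[s][e]
        m.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: m[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Example (with the tables from the sketch after slide 2):
# viterbi(prior, transition, emission, ["facebook", "email", "sms"])
```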

  24. Learning
  • Given: a training sample of observation sequences
  • Goal: compute model parameters
    • Transition probabilities P(Xt | Xt-1)
    • Observation probabilities P(Et | Xt)
  • What if we had complete data, i.e., e1:t and x0:t?
  • Then we could estimate all the parameters by relative frequencies:
    P(Xt = b | Xt-1 = a) ≈ (# of times state b follows state a) / (total # of transitions from state a)
    P(E = e | X = a) ≈ (# of times e is emitted from state a) / (total # of emissions from state a)
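A sketch of the complete-data case on slide 24, assuming each training example is a pair (state sequence x0 … xt, observation sequence e1 … et); the estimates are exactly the relative frequencies above.

```python
# Relative-frequency estimation from fully observed sequences.
from collections import Counter

def estimate_parameters(examples):
    """examples: iterable of (state_seq, obs_seq) with state_seq = [x_0, ..., x_t]
    and obs_seq = [e_1, ..., e_t].  Returns flat transition and emission tables."""
    trans_counts = Counter()   # (a, b) -> number of times b follows a
    from_counts = Counter()    # a -> total transitions out of a
    emit_counts = Counter()    # (a, e) -> number of times a emits e
    state_counts = Counter()   # a -> total emissions from a
    for state_seq, obs_seq in examples:
        for a, b in zip(state_seq, state_seq[1:]):
            trans_counts[(a, b)] += 1
            from_counts[a] += 1
        for a, e in zip(state_seq[1:], obs_seq):   # x_k emits e_k for k >= 1
            emit_counts[(a, e)] += 1
            state_counts[a] += 1
    transition = {(a, b): c / from_counts[a] for (a, b), c in trans_counts.items()}
    emission = {(a, e): c / state_counts[a] for (a, e), c in emit_counts.items()}
    return transition, emission
```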

  25. Learning
  • Given: a training sample of observation sequences
  • Goal: compute model parameters
    • Transition probabilities P(Xt | Xt-1)
    • Observation probabilities P(Et | Xt)
  • What if we had complete data, i.e., e1:t and x0:t?
    • Then we could estimate all the parameters by relative frequencies
  • What if we knew the model parameters?
    • Then we could use inference to find the posterior distribution of the hidden states given the observations

  26. Learning
  • Given: a training sample of observation sequences
  • Goal: compute model parameters
    • Transition probabilities P(Xt | Xt-1)
    • Observation probabilities P(Et | Xt)
  • The EM (expectation-maximization) algorithm, starting with a random initialization of the parameters:
    • E-step: find the posterior distribution of the hidden variables given the observations and the current parameter estimate
    • M-step: re-estimate parameter values given the expected values of the hidden variables
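A compact single-sequence Baum-Welch (EM) sketch built on the forward() and backward() sketches above. To stay short it keeps the prior fixed and only assigns emission probability to symbols that occur in the training sequence; the M-step mirrors slide 24, with expected counts in place of observed counts.

```python
def em_step(prior, transition, emission, evidence):
    """One EM iteration on a single observation sequence e_1, ..., e_t.
    Returns re-estimated (transition, emission) tables; the prior is left as-is."""
    states = list(prior)
    t = len(evidence)
    fwd = forward(prior, transition, emission, evidence)    # P(X_k | e_1:k)
    bwd = backward(transition, emission, evidence)          # P(e_k+1:t | X_k)

    # E-step: state marginals gamma_k(x) = P(X_k = x | e_1:t)
    gamma = []
    for f, b in zip(fwd, bwd):
        unnorm = {s: f[s] * b[s] for s in states}
        z = sum(unnorm.values())
        gamma.append({s: p / z for s, p in unnorm.items()})

    # E-step: pairwise posteriors xi_k(a, b) = P(X_k = a, X_k+1 = b | e_1:t)
    xi = []
    for k in range(t - 1):
        e_next = evidence[k + 1]
        unnorm = {(a, b): fwd[k][a] * transition[a][b] * emission[b][e_next] * bwd[k + 1][b]
                  for a in states for b in states}
        z = sum(unnorm.values())
        xi.append({pair: p / z for pair, p in unnorm.items()})

    # M-step: slide 24's relative frequencies, using expected counts
    new_transition = {
        a: {b: sum(x[(a, b)] for x in xi) /
               sum(x[(a, c)] for x in xi for c in states)
            for b in states}
        for a in states}
    new_emission = {
        a: {e: sum(g[a] for g, ev in zip(gamma, evidence) if ev == e) /
               sum(g[a] for g in gamma)
            for e in set(evidence)}
        for a in states}
    return new_transition, new_emission
```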
