Advanced Artificial Intelligence


Presentation Transcript


  1. Advanced Artificial Intelligence. Part II: Statistical NLP. Hidden Markov Models. Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme. Most slides taken (or adapted) from David Meir Blei; figures from Manning and Schuetze and from Rabiner

  2. Contents • Markov Models • Hidden Markov Models • Three problems - three algorithms • Decoding • Viterbi • Baum-Welch • Next chapter: application to part-of-speech tagging (POS tagging). Largely chapter 9 of Statistical NLP, Manning and Schuetze, or Rabiner, A tutorial on HMMs and selected applications in Speech Recognition, Proc. IEEE

  3. Motivations and Applications • Part-of-speech tagging / sequence tagging • The representative put chairs on the table • AT NN VBD NNS IN AT NN • AT JJ NN VBZ IN AT NN • Some tags: AT: article, NN: singular or mass noun, VBD: verb, past tense, NNS: plural noun, VBZ: verb, 3rd person singular present, IN: preposition, JJ: adjective

  4. Bioinformatics • Durbin et al. Biological Sequence Analysis, Cambridge University Press. • Several applications, e.g. proteins • From primary structure ATCPLELLLD • Infer secondary structure HHHBBBBBC..

  5. Other Applications • Speech recognition: from acoustic signals, infer the sentence • Robotics: from sensory readings, infer the trajectory / location …

  6. What is a (Visible) Markov Model? Graphical model (can be interpreted as a Bayesian net). Circles indicate states; arrows indicate probabilistic dependencies between states. A state depends only on the previous state: “The past is independent of the future given the present.” Recall from the introduction to N-grams!

  7. Markov Model Formalization [figure: chain of states S → S → S → S → S] • {S, P, A} • S : {s1 … sN} are the values for the hidden states • Limited horizon (Markov assumption) • Time invariant (stationary) • Transition matrix A
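The two properties named on this slide can be written as follows (following the standard formulation in Manning and Schuetze, ch. 9):

    % Limited horizon (Markov assumption)
    P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)

    % Time invariance (stationarity): the transition probabilities do not depend on t
    P(X_{t+1} = s_k \mid X_t = s_i) = P(X_2 = s_k \mid X_1 = s_i) = a_{ik}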

  8. Markov Model Formalization [figure: chain of states S → S → S → S → S with transition matrix A on each arc] • {S, P, A} • S : {s1 … sN} are the values for the hidden states • P = {pi} are the initial state probabilities • A = {aij} are the state transition probabilities

  9. What is the probability of a sequence of states?
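Using the chain rule and the Markov assumption, the answer works out to:

    P(X_1, \ldots, X_T)
      = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_T \mid X_{T-1})
      = \pi_{X_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}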

  10. What is an HMM? Graphical model. Circles indicate states; arrows indicate probabilistic dependencies between states. HMM = Hidden Markov Model

  11. What is an HMM? Green circles are hidden states, dependent only on the previous state.

  12. What is an HMM? Purple nodes are observed states, dependent only on their corresponding hidden state. The past is independent of the future given the present.

  13. HMM Formalism [figure: chain of hidden states S with an observation K emitted at each step] • {S, K, P, A, B} • S : {s1 … sN} are the values for the hidden states • K : {k1 … kM} are the values for the observations

  14. HMM Formalism [figure: transitions A between hidden states S, emissions B from states S to observations K] • {S, K, P, A, B} • P = {pi} are the initial state probabilities • A = {aij} are the state transition probabilities • B = {bik} are the observation (emission) probabilities • Note: sometimes one uses B = {bijk}; the output then depends on the previous state / transition as well
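With these parameters, the joint probability of a hidden state sequence X and an observation sequence O factorises as below (state-emission form; in the arc-emission variant with b_{ijk}, the emission term conditions on the transition instead):

    P(X, O \mid \mu) = \pi_{X_1}\, b_{X_1 o_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}\, b_{X_{t+1} o_{t+1}}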

  15. The crazy soft drink machine • Fig 9.2 of Manning and Schuetze

  16. Probability of {lem, ice}? • Sum over all paths taken through the HMM • Start in CP • 1 x 0.3 x 0.7 x 0.1 + • 1 x 0.3 x 0.3 x 0.7
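Carrying the sum through (with the Fig 9.2 parameters, the first term corresponds to the hidden path CP → CP and the second to CP → IP):

    1 \times 0.3 \times 0.7 \times 0.1 \;+\; 1 \times 0.3 \times 0.3 \times 0.7 \;=\; 0.021 + 0.063 \;=\; 0.084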

  17. HMMs and Bayesian Nets (1) [trellis figure: hidden states x1 … xT, observations o1 … oT]

  18. HMM and Bayesian Nets (2) [trellis figure: hidden states x1 … xT, observations o1 … oT] The future is conditionally independent of the past given the present state, because of d-separation: “The past is independent of the future given the present.”

  19. Inference in an HMM • Compute the probability of a given observation sequence • Given an observation sequence, compute the most likely hidden state sequence • Given an observation sequence and a set of possible models, which model most closely fits the data?

  20. Decoding [figure: observations o1 … oT] Given an observation sequence and a model, compute the probability of the observation sequence

  21.–25. Decoding [derivation shown only as trellis figures in the original: hidden states x1 … xT, observations o1 … oT]
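In the notation above, these slides presumably spell out the direct computation of the observation probability as a sum over all hidden state sequences:

    P(O \mid \mu) = \sum_{X} P(O \mid X, \mu)\, P(X \mid \mu)
                  = \sum_{X_1 \cdots X_T} \pi_{X_1}\, b_{X_1 o_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}\, b_{X_{t+1} o_{t+1}}

Evaluating this directly requires work on the order of N^T paths, which motivates the dynamic-programming procedures that follow.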

  26. Dynamic Programming

  27. Forward Procedure [trellis figure: hidden states x1 … xT, observations o1 … oT] • The special structure gives us an efficient solution using dynamic programming. • Intuition: the probability of the first t observations is the same for all possible t+1-length state sequences. • Define the forward variable as below.
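A standard way to write the definition and the recursion it supports (Rabiner's convention):

    \alpha_i(t) = P(o_1 \cdots o_t,\; X_t = s_i \mid \mu)

    \alpha_i(1) = \pi_i\, b_{i o_1}
    \alpha_j(t+1) = \Big[ \sum_{i=1}^{N} \alpha_i(t)\, a_{ij} \Big]\, b_{j o_{t+1}}
    P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)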

  28.–35. Forward Procedure [step-by-step derivation shown only as trellis figures in the original: hidden states x1 … xT, observations o1 … oT]
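A minimal NumPy sketch of the forward procedure just outlined; the function name, argument layout, and conventions (rows of A index the current state, B indexes states by rows and observation symbols by columns) are illustrative choices, not taken from the slides.

    import numpy as np

    def forward(pi, A, B, obs):
        """P(o_1 ... o_T | model), computed in O(N^2 T) time.

        pi  : (N,)   initial state probabilities
        A   : (N, N) A[i, j] = P(next state j | current state i)
        B   : (N, M) B[i, k] = P(observing symbol k | state i)
        obs : list of observation indices o_1 ... o_T
        """
        alpha = pi * B[:, obs[0]]            # alpha_i(1) = pi_i * b_i(o_1)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # alpha_j(t+1) = (sum_i alpha_i(t) a_ij) * b_j(o_{t+1})
        return alpha.sum()                   # P(O | model) = sum_i alpha_i(T)

With the crazy soft drink machine of Fig 9.2 (starting in CP with probability 1) and the observation sequence (lem, ice), this reproduces the 0.084 computed on slide 16.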

  36. Dynamic Programming

  37. Backward Procedure [trellis figure: hidden states x1 … xT, observations o1 … oT] Probability of the rest of the observation sequence given the state at time t
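The backward variable and its recursion, written to match the forward convention above:

    \beta_i(t) = P(o_{t+1} \cdots o_T \mid X_t = s_i, \mu)

    \beta_i(T) = 1
    \beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)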

  38. Decoding Solution [trellis figure: hidden states x1 … xT, observations o1 … oT] • Forward procedure • Backward procedure • Combination (see below)
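The three expressions the slide combines all evaluate the same observation probability:

    P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T)                                   % forward
                  = \sum_{i=1}^{N} \pi_i\, b_{i o_1}\, \beta_i(1)                % backward
                  = \sum_{i=1}^{N} \alpha_i(t)\, \beta_i(t), \quad 1 \le t \le T % combination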

  39. Best State Sequence • Find the state sequence that best explains the observations • Two approaches: • Individually most likely states • Most likely sequence (Viterbi)

  40. Best State Sequence (1)
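The individually-most-likely-states approach picks, at each time step, the state with the highest posterior probability given the whole observation sequence:

    \gamma_i(t) = P(X_t = s_i \mid O, \mu)
                = \frac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)},
    \qquad \hat{X}_t = \arg\max_{1 \le i \le N} \gamma_i(t)

This maximizes the expected number of individually correct states, but the resulting sequence can have zero joint probability (e.g., it may contain an impossible transition), which is why the Viterbi alternative follows.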

  41. Best State Sequence (2) • Find the state sequence that best explains the observations • Viterbi algorithm

  42. Viterbi Algorithm [trellis figure: states x1 … x(t-1), then state j at time t] The state sequence which maximizes the probability of seeing the observations to time t-1, landing in state j, and seeing the observation at time t

  43. Viterbi Algorithm [trellis figure: hidden states x1 … xT, observations o1 … oT] Recursive computation
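Written out, the Viterbi variable from the previous slide and its recursive computation are:

    \delta_j(t) = \max_{X_1 \cdots X_{t-1}} P(X_1 \cdots X_{t-1},\, o_1 \cdots o_{t-1},\, X_t = s_j,\, o_t \mid \mu)

    \delta_j(1) = \pi_j\, b_{j o_1}
    \delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}
    \psi_j(t+1) = \arg\max_{1 \le i \le N} \delta_i(t)\, a_{ij}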

  44. Viterbi Algorithm [trellis figure: hidden states x1 … xT, observations o1 … oT] Compute the most likely state sequence by working backwards
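A minimal NumPy sketch of Viterbi decoding with the backtrace, matching the recursion above; names and conventions are illustrative (same layout of pi, A, B as in the forward sketch):

    import numpy as np

    def viterbi(pi, A, B, obs):
        """Most likely hidden state sequence (as indices) and its probability."""
        T, N = len(obs), len(pi)
        delta = pi * B[:, obs[0]]                    # delta_i(1) = pi_i * b_i(o_1)
        psi = np.zeros((T, N), dtype=int)            # backpointers
        for t in range(1, T):
            scores = delta[:, None] * A              # scores[i, j] = delta_i(t-1) * a_ij
            psi[t] = scores.argmax(axis=0)           # best predecessor of each state j
            delta = scores.max(axis=0) * B[:, obs[t]]
        states = [int(delta.argmax())]               # best final state
        for t in range(T - 1, 0, -1):                # follow backpointers back to time 1
            states.append(int(psi[t, states[-1]]))
        return states[::-1], float(delta.max())

In practice both this and the forward procedure are usually run in log space to avoid numerical underflow on long observation sequences.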

  45. HMMs and Bayesian Nets (1) [trellis figure: hidden states x1 … xT, observations o1 … oT]

  46. HMM and Bayesian Nets (2) [trellis figure: hidden states x1 … xT, observations o1 … oT] The future is conditionally independent of the past given the present state, because of d-separation: “The past is independent of the future given the present.”

  47. Inference in an HMM • Compute the probability of a given observation sequence • Given an observation sequence, compute the most likely hidden state sequence • Given an observation sequence and a set of possible models, which model most closely fits the data?

  48. Dynamic Programming
