
Hidden Markov Models


Presentation Transcript


  1. Hidden Markov Models Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001

  2. Administration Exams need a swift kick. Form project groups next week. Project due on the last day. BUT! There will be milestones. In 2 weeks, synonyms via web. 3 weeks, synonyms via wordnet.

  3. Shannon Game Again Recall: Sue swallowed the large green __. OK: pepper, frog, pea, pill. Not OK: beige, running, very. Parts of speech could help: noun verbed det adj adj noun.

  4. POS Language Model How could we define a probabilistic model over sequences of parts of speech? How could we use this to define a model over word sequences?

  5. Hidden Markov Model Idea: We have states with a simple transition rule (Markov model). We observe a probabilistic function of the states. Therefore, the states are not what we see…

  6. HMM Example Browsing the web, the connection is either up or down; we observe a response or no response. (Diagram: two states, UP and DOWN. UP emits Response 0.7, No response 0.3; DOWN emits Response 0.0, No response 1.0. Transitions: UP→UP 0.9, UP→DOWN 0.1, DOWN→UP 0.2, DOWN→DOWN 0.8.)

  7. HMM Definition N states, M observations. p(s): probability that the starting state is s. p(s,s’): probability of an s to s’ transition. b(s, k): probability of observation k from state s. k0 k1 … kl: the observation sequence.
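
To make the definition concrete, here is a minimal Python sketch of the UP/DOWN example from slide 6 written in the p(s), p(s,s'), b(s,k) notation of slide 7. The dictionary names p_start, p_trans, and b_emit are mine, not the slides', and the start distribution (UP with probability 1) is an assumption read off the leading 1.0 factor in slide 9's calculation.

```python
# The UP/DOWN web-connection HMM from slide 6, in the notation of slide 7.
# States: "U" (up), "D" (down); observations: "R" (response), "N" (no response).
p_start = {"U": 1.0, "D": 0.0}                # p(s): assumed from slide 9's leading 1.0
p_trans = {("U", "U"): 0.9, ("U", "D"): 0.1,  # p(s, s')
           ("D", "U"): 0.2, ("D", "D"): 0.8}
b_emit  = {("U", "R"): 0.7, ("U", "N"): 0.3,  # b(s, k)
           ("D", "R"): 0.0, ("D", "N"): 1.0}
```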

  8. HMM Problems Probability of a state sequence. Probability of an observation sequence. Most likely state sequence for an observation sequence. Most likely model given an observation sequence. (Same UP/DOWN diagram as slide 6.)

  9. States Example (Using the UP/DOWN HMM from slide 6.) Probability of U U D D U U U is 1.0 × 0.9 × 0.1 × 0.8 × 0.2 × 0.9 × 0.9 ≈ .01166

  10. Derivation Pr(s0…sl) = Pr(s0) Pr(s1 | s0) Pr(s2 | s0 s1)… Pr(sl | s0 s1 s2 … sl-1) = Pr(s0) Pr(s1 | s0) Pr(s2 | s1)… Pr(sl | sl-1) = p(s0) p(s0, s1) p(s1, s2)… p(sl-1, sl)
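
A small sketch of slide 10's product formula; state_seq_prob is a name I've made up, and it assumes the p_start and p_trans dictionaries from the sketch after slide 7.

```python
# Pr(s0 ... sl) = p(s0) * p(s0, s1) * p(s1, s2) * ... * p(s_{l-1}, s_l)
def state_seq_prob(states, p_start, p_trans):
    prob = p_start[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= p_trans[(prev, cur)]
    return prob

# state_seq_prob("UUDDUUU", p_start, p_trans) gives about .01166, as on slide 9.
```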

  11. States/Obs. Example (Same UP/DOWN HMM.) Probability of R R N N N R N given U U D D U U U is 0.7 × 0.7 × 1.0 × 1.0 × 0.3 × 0.7 × 0.3 = .03087

  12. Derivation Pr(k0…kl | s0…sl) = Pr(k0|s0…sl) Pr(k1 | s0…sl, k0) Pr(k2 | s0…sl, k0, k1) … Pr(kl | s0…sl, k0…kl-1) = Pr(k0|s0) Pr(k1 | s1) Pr(k2 | s2) … Pr(kl | sl) = b(s0,k0) b(s1,k1) b(s2,k2) … b(sl,kl)
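
The same idea for slide 12's emission product, again as a hypothetical helper that assumes the b_emit dictionary from the slide-7 sketch.

```python
# Pr(k0 ... kl | s0 ... sl) = b(s0, k0) * b(s1, k1) * ... * b(sl, kl)
def obs_given_states_prob(obs, states, b_emit):
    prob = 1.0
    for s, k in zip(states, obs):
        prob *= b_emit[(s, k)]
    return prob

# obs_given_states_prob("RRNNNRN", "UUDDUUU", b_emit) gives .03087, as on slide 11.
```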

  13. Observation Example (Same UP/DOWN HMM.) Probability of: R R N N N R N = sum{s0…s6} Pr(s0…s6) Pr(R R N N N R N | s0…s6)

  14. Derivation Pr(k0…kl) = sum{s0…sl} Pr(s0…sl) Pr(k0…kl | s0…sl) = sum{s0…sl} Pr(s0) Pr(k0 | s0) Pr(s1 | s0) Pr(k1 | s1) Pr(s2 | s1) Pr(k2 | s2) … Pr(sl | sl-1) Pr(kl | sl) = sum{s0…sl} p(s0) b(s0,k0) p(s0, s1) b(s1,k1) p(s1, s2) b(s2,k2) … p(sl-1, sl) b(sl,kl)
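
Here is a brute-force sketch of slide 14's sum, enumerating every state sequence explicitly. It is only meant for checking tiny examples; the function name and argument layout are mine, and it reuses the dictionaries from the slide-7 sketch.

```python
from itertools import product

# Pr(k0 ... kl) = sum over all state sequences of Pr(states) * Pr(obs | states)
def obs_prob_bruteforce(obs, states, p_start, p_trans, b_emit):
    total = 0.0
    for seq in product(states, repeat=len(obs)):
        prob = p_start[seq[0]] * b_emit[(seq[0], obs[0])]
        for t in range(1, len(obs)):
            prob *= p_trans[(seq[t - 1], seq[t])] * b_emit[(seq[t], obs[t])]
        total += prob
    return total

# obs_prob_bruteforce("NNN", ["U", "D"], p_start, p_trans, b_emit)
# expands to exactly the eight-term sum shown on slide 17.
```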

  15. Uh Oh, Better Get a Grows exponentially with l: there are N^(l+1) state sequences to sum over. How do we do this more efficiently? a(i,t): probability of seeing the first t observations and ending up in state i: Pr(k0…kt, st=i). a(i,0) = p(s0) b(k0, si). a(i,t) = sum_j a(j,t-1) p(sj,si) b(kt, si). Return Pr(k0…kl) = sum_j a(j,l).

  16. Partial Derivation a(i,t) = Pr(k0…kt, st=i) = sum_j Pr(k0…kt-1, st-1=j) Pr(k0…kt, st=i | k0…kt-1, st-1=j) = sum_j a(j,t-1) Pr(k0…kt-1 | k0…kt-1, st-1=j) Pr(st=i, kt | k0…kt-1, st-1=j) = sum_j a(j,t-1) Pr(st=i | k0…kt-1, st-1=j) Pr(kt | st=i, k0…kt-1, st-1=j) = sum_j a(j,t-1) Pr(st=i | st-1=j) Pr(kt | st=i) = sum_j a(j,t-1) p(sj,si) b(kt, si)
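
A sketch of the forward computation from slides 15-16. I read p(s0) b(k0, si) in the base case as "start probability of state i times its emission probability for k0"; the function name is mine and the model dictionaries are the ones from the slide-7 sketch.

```python
# Forward algorithm: a(i, t) = Pr(k0 ... kt, s_t = i), built left to right,
# so the work is polynomial in the sequence length instead of exponential.
def obs_prob_forward(obs, states, p_start, p_trans, b_emit):
    a = {i: p_start[i] * b_emit[(i, obs[0])] for i in states}        # a(i, 0)
    for t in range(1, len(obs)):
        a = {i: sum(a[j] * p_trans[(j, i)] for j in states) * b_emit[(i, obs[t])]
             for i in states}                                        # a(i, t)
    return sum(a.values())                                           # sum_j a(j, l)

# Should agree with obs_prob_bruteforce on small cases such as "NNN" or "RRNNNRN".
```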

  17. (Using the UP/DOWN HMM.) Pr(N N N) = Pr(UUU) Pr(N N N | UUU) + Pr(UUD) Pr(N N N | UUD) + Pr(UDU) Pr(N N N | UDU) + Pr(UDD) Pr(N N N | UDD) + Pr(DUU) Pr(N N N | DUU) + Pr(DUD) Pr(N N N | DUD) + Pr(DDU) Pr(N N N | DDU) + Pr(DDD) Pr(N N N | DDD)

  18. Pr(N N N) (Worked out on the slide using the UP/DOWN HMM diagram.)

  19. POS Tagging Simple tagging model says part of speech depends only on previous part of speech, word depends only on part of speech. So, HMM state is previous part of speech, observation is word. What are the probabilities and where could they come from?

  20. Shannon Game If we have a POS-based language model, how do we compute probabilities? Pr(Sue ate the small blue candy.) = sum_tags Pr(sent | tags) Pr(tags)

  21. Best Tag Sequence Useful problem to solve: Given a sentence, find the most likely sequence of POS tags. Sue saw the fly . noun verb det noun . max_tags Pr(tags | sent) = max_tags Pr(sent | tags) Pr(tags) / Pr(sent) = c max_tags Pr(sent | tags) Pr(tags)

  22. Viterbi d(i,t): probability of the most likely state sequence that ends in state i when seeing the first t observations: max{s0…st-1} Pr(k0…kt, s0…st-1, st=i). d(i,0) = p(s0) b(k0, si). d(i,t) = max_j d(j,t-1) p(sj,si) b(kt, si). Return max_j d(j,l) (trace it back).
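
A sketch of Viterbi as described on slide 22, with back-pointers for the "trace it back" step. The function name and the back-pointer bookkeeping are my own choices; the model dictionaries are the ones from the slide-7 sketch.

```python
# d(i, t): probability of the most likely state sequence that ends in state i
# after the first t observations; back-pointers let us recover that sequence.
def viterbi(obs, states, p_start, p_trans, b_emit):
    d = {i: p_start[i] * b_emit[(i, obs[0])] for i in states}        # d(i, 0)
    back = []
    for t in range(1, len(obs)):
        new_d, ptr = {}, {}
        for i in states:
            best_j = max(states, key=lambda j: d[j] * p_trans[(j, i)])
            ptr[i] = best_j
            new_d[i] = d[best_j] * p_trans[(best_j, i)] * b_emit[(i, obs[t])]
        d = new_d
        back.append(ptr)
    last = max(states, key=lambda i: d[i])                           # best final state
    path = [last]
    for ptr in reversed(back):                                       # trace it back
        path.append(ptr[path[-1]])
    return list(reversed(path)), d[last]

# viterbi("NNN", ["U", "D"], p_start, p_trans, b_emit) returns the most likely
# state sequence for N N N and its joint probability.
```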

  23. (Using the UP/DOWN HMM.) max Pr(s*) Pr(NNN | s*) = max { Pr(UUU) Pr(N N N | UUU), Pr(UUD) Pr(N N N | UUD), Pr(UDU) Pr(N N N | UDU), Pr(UDD) Pr(N N N | UDD), Pr(DUU) Pr(N N N | DUU), Pr(DUD) Pr(N N N | DUD), Pr(DDU) Pr(N N N | DDU), Pr(DDD) Pr(N N N | DDD) }

  24. max Pr(s*) Pr(NNN | s*) (Worked out on the slide using the UP/DOWN HMM diagram.)
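
For comparison with slide 23, here is a brute-force maximization over all state sequences; it mirrors the earlier brute-force sum, just with max instead of sum, and the names are again mine.

```python
from itertools import product

# max over state sequences of Pr(states) * Pr(obs | states)
def best_seq_bruteforce(obs, states, p_start, p_trans, b_emit):
    def joint(seq):
        prob = p_start[seq[0]] * b_emit[(seq[0], obs[0])]
        for t in range(1, len(obs)):
            prob *= p_trans[(seq[t - 1], seq[t])] * b_emit[(seq[t], obs[t])]
        return prob
    return max(product(states, repeat=len(obs)), key=joint)

# best_seq_bruteforce("NNN", ["U", "D"], p_start, p_trans, b_emit)
# should pick out the same sequence that viterbi returns on this model.
```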

  25. PCFG An analogy… Finite-state Automata (FSAs): Hidden Markov Models (HMMs) :: Context-free Grammars (CFGs): Probabilistic context-free grammars (PCFGs)

  26. Grammars &amp; Recursion S → 1.0 NP VP; NP → 0.45 NP PP, 0.55 N; PP → 1.0 P NP; VP → 0.3 VP PP, 0.7 V NP; N → 0.4 mice, 0.25 hats, 0.35 pimples; P → 1.0 with; V → 1.0 wore
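
One way to hold slide 26's grammar in Python, with each left-hand side mapping to its (probability, right-hand side) alternatives. The dictionary layout and the "mice wore hats" example tree are my own illustration; the standard PCFG convention that a tree's probability is the product of its rule probabilities is assumed here (the slides use it implicitly on slide 27).

```python
# The PCFG from slide 26: LHS -> list of (probability, RHS symbols).
pcfg = {
    "S":  [(1.0,  ["NP", "VP"])],
    "NP": [(0.45, ["NP", "PP"]), (0.55, ["N"])],
    "PP": [(1.0,  ["P", "NP"])],
    "VP": [(0.3,  ["VP", "PP"]), (0.7, ["V", "NP"])],
    "N":  [(0.4,  ["mice"]), (0.25, ["hats"]), (0.35, ["pimples"])],
    "P":  [(1.0,  ["with"])],
    "V":  [(1.0,  ["wore"])],
}

# A parse tree's probability is the product of the rules it uses.
# For example, a tree for "mice wore hats" uses
#   S -> NP VP (1.0), NP -> N (0.55), N -> mice (0.4),
#   VP -> V NP (0.7), V -> wore (1.0), NP -> N (0.55), N -> hats (0.25),
# which multiplies out to about 0.0212.
```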

  27. Computing with PCFGs Can compute Pr(sent) and also max_tree Pr(tree | sent) using algorithms like the ones we discussed for HMMs. Still polynomial time.

  28. What to Learn HMM Definition. Computing probability of state and observation sequences. Part of Speech Tagging. Viterbi: most likely state sequence. PCFG Definition.

  29. Homework 7 (due 11/21) 1. Give a maximization scheme for filling in the two blanks in a sentence like “I hate it when ___ goes ___ like that.” Be somewhat rigorous to make the TA’s job easier. 2. Derive that (a) Pr(k0…kl) = sum_j a(j,l), and (b) a(i,0) = p(s0) b(k0, si). 3. Email me your project group of three or four.

  30. Homework 8 (due 11/28) • Write a program that decides if a pair of words are synonyms using the web. I’ll send you the list, you send me the answers. • more soon

  31. Homework 9 (due 12/5) • Write a program that decides if a pair of words are synonyms using wordnet. I’ll send you the list, you send me the answers. • more soon
