
FSA and HMM



Presentation Transcript


  1. FSA and HMM LING 572 Fei Xia 1/5/06

  2. Outline • FSA • HMM • Relation between FSA and HMM

  3. FSA

  4. Definition of FSA An FSA is a tuple (Q, Σ, I, F, δ): • Q: a finite set of states • Σ: a finite set of input symbols • I: the set of initial states, I ⊆ Q • F: the set of final states, F ⊆ Q • δ ⊆ Q × Σ × Q: the transition relation between states.

  5. An example of FSA [figure: a two-state automaton with states q0 and q1 and arcs labeled a and b]
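
A minimal sketch of simulating such an FSA in code. The specific arcs (a from q0 to q1, b as a self-loop on q1) are an assumed reading of the figure, chosen to match the PFA example on slide 15.

```python
# A tiny nondeterministic FSA simulator mirroring the (Q, Σ, I, F, δ)
# definition above. The arc placement is an assumed reading of the figure.
DELTA = {("q0", "a"): {"q1"}, ("q1", "b"): {"q1"}}  # (state, symbol) -> next states
INITIAL = {"q0"}
FINAL = {"q1"}

def accepts(string):
    """True iff some path from an initial to a final state spells `string`."""
    current = set(INITIAL)
    for symbol in string:
        current = {q2 for q1 in current for q2 in DELTA.get((q1, symbol), set())}
    return bool(current & FINAL)

print(accepts("abb"))  # True: q0 -a-> q1 -b-> q1 -b-> q1
print(accepts("ba"))   # False: no arc labeled b leaves q0
```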

  6. Definition of FST An FST is a tuple (Q, Σ, Γ, I, F, δ): • Q: a finite set of states • Σ: a finite set of input symbols • Γ: a finite set of output symbols • I: the set of initial states • F: the set of final states • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states. → An FSA can be seen as a special case of an FST.

  7. The extended transition relation δ* is the smallest set such that • (q, ε, ε, q) ∈ δ* for every q ∈ Q • if (q, x, y, q′) ∈ δ* and (q′, a, b, q″) ∈ δ, then (q, xa, yb, q″) ∈ δ* • T transduces a string x into a string y if there exists a path from an initial state to a final state whose input is x and whose output is y: ∃ qs ∈ I, qf ∈ F such that (qs, x, y, qf) ∈ δ*

  8. An example of FST [figure: a two-state transducer with states q0 and q1 and arcs labeled a:x and b:y]
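
A minimal sketch of nondeterministic transduction for this machine; as with the FSA sketch above, the arc placement (a:x from q0 to q1, b:y as a self-loop on q1) is an assumption about the figure.

```python
# A tiny FST simulator: DELTA maps (state, input symbol) to a set of
# (output string, next state) pairs. Arcs are an assumed reading of the figure.
DELTA = {("q0", "a"): {("x", "q1")}, ("q1", "b"): {("y", "q1")}}
INITIAL, FINAL = {"q0"}, {"q1"}

def transduce(string):
    """Return every output y such that the FST transduces `string` into y."""
    paths = {(q, "") for q in INITIAL}  # (state, output produced so far)
    for symbol in string:
        paths = {(q2, out + o)
                 for q1, out in paths
                 for o, q2 in DELTA.get((q1, symbol), set())}
    return {out for q, out in paths if q in FINAL}

print(transduce("abb"))  # {'xyy'}
```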

  9. Operations on FSTs • Union: (x, y) ∈ T1 ∪ T2 iff (x, y) ∈ T1 or (x, y) ∈ T2 • Concatenation: (x1x2, y1y2) ∈ T1 T2 iff (x1, y1) ∈ T1 and (x2, y2) ∈ T2 • Composition: (x, z) ∈ T1 ∘ T2 iff (x, y) ∈ T1 and (y, z) ∈ T2 for some y

  10. An example of the composition operation [figure: the slide-8 transducer (arcs a:x and b:y) composed with a one-state transducer q0 with self-loops x:ε and y:z]
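
Viewed as string relations, composition is just a relational join. A sketch over finite sets of (input, output) pairs rather than full automata (the general construction on automata needs a product of state sets and ε-handling):

```python
def compose(t1, t2):
    """(x, z) is in the composition iff some y has (x, y) in t1 and (y, z) in t2."""
    return {(x, z) for x, y1 in t1 for y2, z in t2 if y1 == y2}

# A few of the pairs generated by the two machines in the figure (truncated):
t1 = {("a", "x"), ("ab", "xy"), ("abb", "xyy")}  # slide-8 FST: a:x, b:y
t2 = {("x", ""), ("xy", "z"), ("xyy", "zz")}     # second FST: x:ε, y:z
print(compose(t1, t2))  # {('a', ''), ('ab', 'z'), ('abb', 'zz')}
```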

  11. Probabilistic finite-state automata (PFA) • Informally, in a PFA, each arc is associated with a probability. • The probability of a path is the product of the probabilities of the arcs on the path. • The probability of a string x is the sum of the probabilities of all the paths for x. • Tasks: • Given a string x, find the best path for x. • Given a string x, find the probability of x in a PFA. • Find the string with the highest probability in a PFA • …

  12. Formal definition of PFA A PFA is a tuple (Q, Σ, δ, I, F, P): • Q: a finite set of N states • Σ: a finite set of input symbols • I: Q → R+ (initial-state probabilities) • F: Q → R+ (final-state probabilities) • δ ⊆ Q × Σ × Q: the transition relation between states • P: δ → R+ (transition probabilities)

  13. Constraints on the functions: Σq∈Q I(q) = 1, and for every state q: F(q) + Σa∈Σ, q′∈Q P(q, a, q′) = 1. Probability of a string x = a1…an: P(x) = Σ over paths q0, q1, …, qn of I(q0) · Πi P(qi−1, ai, qi) · F(qn)

  14. Consistency of a PFA Let A be a PFA. • Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A. • Def: a valid path in A is a path for some string x with probability greater than 0. • Def: A is called consistent if Σx P(x | A) = 1. • Def: a state of a PFA is useful if it appears in at least one valid path. • Proposition: a PFA is consistent if all its states are useful. → Q1 of Hw1

  15. An example of PFA [figure: state q0 with F(q0)=0 and state q1 with F(q1)=0.2; an arc a with probability 1 from q0 to q1 and a self-loop b with probability 0.8 on q1] I(q0)=1.0, I(q1)=0.0. P(ab^n) = 0.2 · 0.8^n
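
A minimal sketch of computing P(x) with a forward-style dynamic program that sums over all paths; the machine is hard-coded from the figure.

```python
# The slide's PFA: I(q0)=1, F(q1)=0.2, arc (q0, a, q1) with prob 1.0,
# and a self-loop (q1, b, q1) with prob 0.8.
I = {"q0": 1.0, "q1": 0.0}
F = {"q0": 0.0, "q1": 0.2}
P = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}

def string_prob(x):
    """P(x) = sum over paths of I(q0) * (product of arc probs) * F(qn)."""
    forward = dict(I)  # probability mass currently at each state
    for symbol in x:
        nxt = dict.fromkeys(I, 0.0)
        for (q1, a, q2), p in P.items():
            if a == symbol:
                nxt[q2] += forward[q1] * p
        forward = nxt
    return sum(forward[q] * F[q] for q in forward)

print(string_prob("abb"))  # 0.2 * 0.8**2 = 0.128
```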

  16. Weighted finite-state automata (WFA) • Each arc is associated with a weight. • The “sum” and “multiplication” operations can be given other meanings (e.g., max and +), so probabilities are just one choice of weights.

  17. HMM

  18. Two types of HMMs • State-emission HMM (Moore machine): • The emission probability depends only on the state (the from-state or the to-state). • Arc-emission HMM (Mealy machine): • The emission probability depends on the (from-state, to-state) pair.

  19. State-emission HMM [figure: states s1, s2, …, sN, each emitting output symbols such as w1, w3, w4, w5] • Two kinds of parameters: • Transition probability: P(sj | si) • Output (emission) probability: P(wk | si) • → # of parameters: O(NM + N²)

  20. Arc-emission HMM [figure: states s1, s2, …, sN with output symbols such as w1, …, w5 emitted on the arcs between states] Same kinds of parameters, but the emission probabilities depend on both states: P(wk, sj | si) → # of parameters: O(N²M + N²).

  21. Are the two types of HMMs equivalent? • For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2). • The reverse is also true. → Q3 and Q4 of hw1.

  22. Definition of arc-emission HMM • An HMM is a tuple (S, Σ, π, A, B): • A set of states S={s1, s2, …, sN}. • A set of output symbols Σ={w1, …, wM}. • Initial state probabilities π={πi}: πi = P(X1 = si) • State transition prob: A={aij}, aij = P(Xt+1 = sj | Xt = si). • Symbol emission prob: B={bijk}, bijk = P(Ot = wk | Xt = si, Xt+1 = sj) • State sequence: X1,n = X1 … Xn • Output sequence: O1,n = o1 … on

  23. Constraints Σi πi = 1; for every i: Σj aij = 1; for every i and j: Σk bijk = 1. For any integer n and any HMM: ΣO1,n P(O1,n | HMM) = 1. → Q2 of hw1.
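
A sketch that spells out these constraints on a toy arc-emission HMM and samples an output sequence from it; the two-state parameters are invented for illustration.

```python
import math, random

# Toy arc-emission HMM: two states, two output symbols (invented values).
PI = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {("s1", "s1"): {"w1": 1.0}, ("s1", "s2"): {"w2": 1.0},
     ("s2", "s1"): {"w1": 0.5, "w2": 0.5}, ("s2", "s2"): {"w2": 1.0}}

# The constraints from this slide.
assert math.isclose(sum(PI.values()), 1.0)                          # sum_i pi_i = 1
assert all(math.isclose(sum(r.values()), 1.0) for r in A.values())  # sum_j a_ij = 1
assert all(math.isclose(sum(d.values()), 1.0) for d in B.values())  # sum_k b_ijk = 1

def generate(n):
    """Sample O1..On: each emission depends on the (from-state, to-state) pair."""
    pick = lambda dist: random.choices(list(dist), weights=dist.values())[0]
    x, outputs = pick(PI), []
    for _ in range(n):
        nxt = pick(A[x])
        outputs.append(pick(B[(x, nxt)]))
        x = nxt
    return outputs

print(generate(5))  # e.g. ['w2', 'w2', 'w1', 'w2', 'w2']
```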

  24. Properties of HMM • Limited horizon: P(Xt+1 = sj | X1, …, Xt) = P(Xt+1 = sj | Xt) • Time invariance: the probabilities do not change over time: P(Xt+1 = sj | Xt = si) is the same for all t • The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequences generate a particular output.

  25. Applications of HMM • N-gram POS tagging • Bigram tagger: oi is a word, and si is a POS tag. • Trigram tagger: oi is a word, and si is ?? • Other tagging problems: • Word segmentation • Chunking • NE tagging • Punctuation prediction • … • Other applications: ASR, …

  26. Three fundamental questions for HMMs • Finding the probability of an observation • Finding the best state sequence • Training: estimating parameters

  27. (1) Finding the probability of the observation Forward probability: the probability of producing O1,t−1 while ending up in state si: αi(t) = P(O1,t−1, Xt = si)

  28. Calculating forward probability Initialization: αi(1) = πi Induction: αj(t+1) = Σi αi(t) · aij · bij(ot), where bij(ot) is bijk for the symbol wk = ot. Then P(O1,T) = Σi αi(T+1).
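
A minimal sketch of this recurrence, reusing the invented two-state toy model from the constraints sketch:

```python
# Same toy arc-emission HMM as in the constraints sketch above.
PI = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {("s1", "s1"): {"w1": 1.0}, ("s1", "s2"): {"w2": 1.0},
     ("s2", "s1"): {"w1": 0.5, "w2": 0.5}, ("s2", "s2"): {"w2": 1.0}}

def forward_prob(obs):
    """P(obs): alpha_i(1) = pi_i; alpha_j(t+1) = sum_i alpha_i(t)*a_ij*b_ij(o_t)."""
    alpha = dict(PI)
    for o in obs:
        alpha = {sj: sum(alpha[si] * A[si][sj] * B[(si, sj)].get(o, 0.0)
                         for si in alpha)
                 for sj in PI}
    return sum(alpha.values())  # P(O1,T) = sum_i alpha_i(T+1)

print(forward_prob(["w2", "w2"]))  # 0.525: sums over all state paths
```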

  29. (2) Finding the best state sequence [figure: a trellis with hidden states X1, X2, …, XT, XT+1 and observations o1, o2, …, oT] • Given the observation O1,T = o1…oT, find the state sequence X1,T+1 = X1 … XT+1 that maximizes P(X1,T+1 | O1,T). → Viterbi algorithm

  30. Viterbi algorithm The probability of the best path that produces O1,t−1 while ending up in state si: δi(t) = maxX1,t−1 P(X1,t−1, O1,t−1, Xt = si) Initialization: δi(1) = πi Induction: δj(t+1) = maxi δi(t) · aij · bij(ot), keeping a backpointer to the maximizing i. Modify it to allow epsilon emission: Q5 of hw1.
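
A minimal sketch of the recurrence with backpointer recovery, again on the invented two-state toy model:

```python
# Same toy arc-emission HMM as in the earlier sketches.
PI = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {("s1", "s1"): {"w1": 1.0}, ("s1", "s2"): {"w2": 1.0},
     ("s2", "s1"): {"w1": 0.5, "w2": 0.5}, ("s2", "s2"): {"w2": 1.0}}

def viterbi(obs):
    """Best state sequence X1..X(T+1): delta_j(t+1) = max_i delta_i(t)*a_ij*b_ij(o_t)."""
    delta, backptrs = dict(PI), []
    for o in obs:
        new_delta, bp = {}, {}
        for sj in PI:
            score = lambda si: delta[si] * A[si][sj] * B[(si, sj)].get(o, 0.0)
            best = max(delta, key=score)
            new_delta[sj], bp[sj] = score(best), best
        delta = new_delta
        backptrs.append(bp)
    path = [max(delta, key=delta.get)]  # best final state
    for bp in reversed(backptrs):       # follow backpointers
        path.append(bp[path[-1]])
    return list(reversed(path)), max(delta.values())

print(viterbi(["w2", "w2"]))  # (['s1', 's2', 's2'], 0.35)
```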

  31. Summary of HMM • Two types of HMMs: state-emission and arc-emission • Properties: Markov assumption • Applications: POS tagging, etc. • Finding the probability of an observation: forward probability • Decoding: Viterbi decoding

  32. Relation between FSA and HMM

  33. Relation between WFA and HMM • An HMM can be seen as a special type of WFA. • Given an HMM, how to build an equivalent WFA?

  34. Converting HMM into WFA Given an HMM, build a WFA such that for any input sequence O, P(O|HMM) = P(O|WFA). • Build a WFA: add a final state and arcs to it • Show that there is a one-to-one mapping between the paths in the HMM and the paths in the WFA • Prove that the probabilities in the HMM and in the WFA are identical.

  35. HMM → WFA Need to create a new state (the final state) and add edges to it. → The WFA is not a PFA.
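
A sketch of the construction, under the assumption (consistent with this slide) that every HMM state gets a weight-1 ε-edge to the new final state qf; the names and representation are illustrative, not Carmel's format.

```python
def hmm_to_wfa(pi, A, B):
    """Build a WFA from an arc-emission HMM: keep the HMM's states, give
    arc (si, wk, sj) the weight a_ij * b_ijk, and add a new final state
    "qf" reachable from every state by a weight-1 epsilon edge."""
    arcs = {}
    for si in A:
        for sj in A[si]:
            for wk, b in B[(si, sj)].items():
                arcs[(si, wk, sj)] = A[si][sj] * b
        arcs[(si, "", "qf")] = 1.0  # epsilon edge into the final state
    return dict(pi), arcs, "qf"

# Each state keeps total outgoing weight 1 over its HMM arcs AND gains a
# weight-1 edge to qf, so the PFA constraint F(q) + sum of outgoing arc
# probs = 1 cannot hold: the result is a WFA but not a PFA, as the slide says.
```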

  36. A slightly different definition of HMM • An HMM is a tuple (S, Σ, π, A, B, qf): • A set of states S={s1, s2, …, sN}. • A set of output symbols Σ={w1, …, wM}. • Initial state probabilities π={πi} • State transition prob: A={aij}. • Symbol emission prob: B={bijk} • qf is the final state: there are no outgoing edges from qf

  37. Constraints For any HMM (under this new definition): Σx P(x | HMM) = 1, where the sum is over strings of all lengths.

  38. HMM → PFA

  39. PFA → HMM → Need to add a new final state and edges to it

  40. Project: Part 1 • Learn to use Carmel (a WFST package) • Use Carmel as an HMM Viterbi decoder for a trigram POS tagger. • The instructions will be handed out on 1/12, and the project is due on 1/19.

  41. Summary • FSA • HMM • Relation between FSA and HMM • HMM (the common def) is a special case of WFA • HMM (a different def) is equivalent to PFA.
