Dynamic Programming Viterbi
Today • Take a look at dynamic programming algorithms • Walk through some examples • Simple chart parse • Viterbi example (from speech) • Viterbi example (from POS tagging)
Dynamic Programming • Definition: Algorithmic approach for solving optimization problems by caching sub-problem solutions rather than recomputing them. • Typical device used is a chart to hold the various sub-problem solutions. • Common implementation: Charts used in parsing (based on the Earley algorithm)
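The definition above can be illustrated with a small sketch. Edit distance is not one of the examples in these slides; it is used here only as an assumed stand-in for "an optimization problem solved by filling a chart". The function name `edit_distance` and the unit costs are illustrative choices.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # chart[i][j] caches the answer to the sub-problem a[:i] -> b[:j],
    # so each sub-problem is solved exactly once.
    chart = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        chart[i][0] = i  # delete all i characters of a[:i]
    for j in range(len(b) + 1):
        chart[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            chart[i][j] = min(
                chart[i - 1][j] + 1,                 # deletion
                chart[i][j - 1] + 1,                 # insertion
                chart[i - 1][j - 1] + substitution,  # match or substitution
            )
    return chart[-1][-1]


print(edit_distance("kitten", "sitting"))
```

Each cell is computed from three already-filled neighbours, which is the "cache rather than recompute" idea in its simplest form.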
BU Chart Parser Walk Through • Sample Sentence: The large can can hold the water. • Sample Grammar: 1. S → NP VP 2. NP → D ADJ N 3. NP → D N 4. NP → ADJ N 5. VP → AUX VP 6. VP → V NP
Chart Parser • Allows one to preserve the results of the parse so far in a chart • Partial results and all previous results are kept so that work is not repeated
The large can can hold the water. • Chart, positions 1 to 3: constituents D1, ADJ1 • Active arcs: NP → D ° ADJ N; NP → D ADJ ° N; NP → D ° N; NP → ADJ ° N
The large can can hold the water. • Chart, positions 1 to 4: constituents D1, ADJ1, N1, AUX1, V1, NP1, NP2 • Arcs: NP → D ° ADJ N; NP → D ADJ N °; NP → D ADJ ° N; NP → D ° N; NP → ADJ ° N; NP → ADJ N °; S → NP ° VP; S → NP ° VP
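The walk-through can be approximated in code. The sketch below is a simplified bottom-up chart recognizer, not the active-arc parser from the slides: it records only completed constituents per span and does not keep dotted rules. The grammar is the sample grammar; the `LEXICON` (which words carry which categories) is an assumption, since the slides do not list it, and the names `chart_parse` and `rhs_covers` are hypothetical.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "ADJ", "N")),
    ("NP", ("D", "N")),
    ("NP", ("ADJ", "N")),
    ("VP", ("AUX", "VP")),
    ("VP", ("V", "NP")),
]

# Assumed lexicon: not given in the slides.
LEXICON = {
    "the": {"D"},
    "large": {"ADJ"},
    "can": {"N", "AUX", "V"},
    "hold": {"N", "V"},
    "water": {"N", "V"},
}


def rhs_covers(chart, rhs, start, end):
    """True if the symbols in rhs can tile the span start..end using chart entries."""
    first, rest = rhs[0], rhs[1:]
    if not rest:
        return first in chart.get((start, end), set())
    # Leave at least one word for each remaining symbol.
    for mid in range(start + 1, end - len(rest) + 1):
        if first in chart.get((start, mid), set()) and rhs_covers(chart, rest, mid, end):
            return True
    return False


def chart_parse(words):
    """Return a chart mapping (start, end) spans to the categories found there."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON[word.lower()])
    # Shorter spans are filled first, so longer spans reuse them without re-parsing.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            found = chart.setdefault((start, end), set())
            for lhs, rhs in GRAMMAR:
                if rhs_covers(chart, rhs, start, end):
                    found.add(lhs)
    return chart


chart = chart_parse("The large can can hold the water".split())
print("S" in chart[(0, 7)])
```

Spans here are 0-based, so the slide's positions 1 to 4 correspond to the span (0, 3). Under the assumed lexicon, both "the large can" and "large can" are recorded as NP, which matches NP1 and NP2 in the chart above.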
Viterbi Example See word models, p. 245
Viterbi example • Input is: [aa n iy dh ax] • “I need the…” • Actually dialectal/fast speech variant, “Ah nee the” • [a ni ðə]
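A smaller example • [Figure: HMM with states start, JJ, RB, end; JJ and RB each emit “very” or “quick”. Probabilities labelled in the figure: 0.1, 0.9, 0.8, 0.2, 0.7, 1, 1, 0.1, 0.3, 0.9] • What is the best sequence of states for the input string “very very quick”? • Computing all possible paths and picking the one with maximum probability takes time exponential in the length of the input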
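To make the exponential cost concrete, here is a brute-force sketch that scores every possible tag sequence. The probabilities in `TRANS` and `EMIT` are hypothetical values chosen for illustration; they are not the values from the figure, and the end state is left out for brevity.

```python
from itertools import product

TAGS = ["RB", "JJ"]

# Hypothetical transition probabilities P(tag | previous tag).
TRANS = {
    ("START", "RB"): 0.6, ("START", "JJ"): 0.4,
    ("RB", "RB"): 0.3, ("RB", "JJ"): 0.7,
    ("JJ", "RB"): 0.2, ("JJ", "JJ"): 0.8,
}

# Hypothetical emission probabilities P(word | tag).
EMIT = {
    ("RB", "very"): 0.9, ("RB", "quick"): 0.1,
    ("JJ", "very"): 0.2, ("JJ", "quick"): 0.8,
}


def path_probability(words, path):
    """Joint probability of the words and one complete tag sequence."""
    prob = 1.0
    prev = "START"
    for word, tag in zip(words, path):
        prob *= TRANS[(prev, tag)] * EMIT[(tag, word)]
        prev = tag
    return prob


def brute_force_best_path(words):
    """Score all len(TAGS) ** len(words) tag sequences and keep the best one."""
    candidates = list(product(TAGS, repeat=len(words)))
    best = max(candidates, key=lambda path: path_probability(words, path))
    return best, path_probability(words, best), len(candidates)


best, prob, count = brute_force_best_path(["very", "very", "quick"])
print(best, prob, count)
```

With 2 tags and 3 words there are only 8 candidates, but the count is T to the power n, so it doubles with every extra word in this toy model.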
Implementation
// Initialization: assume a dummy period before the first word
viterbi[1, PERIOD] := 1.0   // all other tags start at 0
for i := 1 to n step 1 do   // all input words (presumably in a sentence)
  for all tags tj do   // for all possible tags
    // max probability of being in state j (tag j) at word i+1 (path probability matrix)
    viterbi[i+1, tj] := max_{1≤k≤T} ( viterbi[i, tk] × P(w_{i+1} | tj) × P(tj | tk) )
    // most likely state (tag) at word i given that we are in state j at word i+1;
    // in other words, state[i+1, tj] keeps a pointer to the state that got us here
    state[i+1, tj] := argmax_{1≤k≤T} ( viterbi[i, tk] × P(w_{i+1} | tj) × P(tj | tk) )
  end
end
// Termination and path read-out
bestPath_{n+1} := argmax_{1≤j≤T} viterbi[n+1, j]
for j := n to 1 step -1 do   // for all input words, last to first
  bestPath_j := state[j+1, bestPath_{j+1}]
end
P(bestPath_1, …, bestPath_{n+1}) := max_{1≤j≤T} viterbi[n+1, j]
// P(w_{i+1} | tj) is the emission probability; P(tj | tk) is the state transition probability
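The pseudocode can be turned into a short runnable sketch. This is one possible rendering, with assumptions: a `START` pseudo-state plays the role of the dummy period, there is no explicit end state, indices are 0-based, and the probabilities in the usage example are hypothetical rather than taken from the slides.

```python
def viterbi(words, tags, trans, emit, start="START"):
    """Return (best tag path, its probability) for a first-order HMM.

    trans[(prev_tag, tag)] is the state transition probability and
    emit[(tag, word)] is the emission probability; missing entries count as 0.
    """
    # best[i][t]: probability of the best path that ends in tag t at word i.
    # back[i][t]: the tag at word i-1 on that path (the back-pointer).
    best = [{} for _ in words]
    back = [{} for _ in words]
    for t in tags:
        best[0][t] = trans.get((start, t), 0.0) * emit.get((t, words[0]), 0.0)
        back[0][t] = None
    for i in range(1, len(words)):
        for t in tags:
            # The emission term does not depend on the previous tag,
            # so it can be applied after choosing the best predecessor.
            prev = max(tags, key=lambda k: best[i - 1][k] * trans.get((k, t), 0.0))
            best[i][t] = (best[i - 1][prev] * trans.get((prev, t), 0.0)
                          * emit.get((t, words[i]), 0.0))
            back[i][t] = prev
    # Termination: pick the best final tag, then follow back-pointers.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model for the input "very very quick".
tags = ["RB", "JJ"]
trans = {("START", "RB"): 0.6, ("START", "JJ"): 0.4,
         ("RB", "RB"): 0.3, ("RB", "JJ"): 0.7,
         ("JJ", "RB"): 0.2, ("JJ", "JJ"): 0.8}
emit = {("RB", "very"): 0.9, ("RB", "quick"): 0.1,
        ("JJ", "very"): 0.2, ("JJ", "quick"): 0.8}
path, prob = viterbi(["very", "very", "quick"], tags, trans, emit)
print(path, prob)
```

The two nested loops over tags give a running time proportional to n × T², compared with T to the power n for enumerating every path. For long inputs, a practical implementation would usually sum log probabilities instead of multiplying raw probabilities, to avoid numerical underflow.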