Dynamic Programming Viterbi


Today
• Take a look at dynamic programming algorithms
• Walk through some examples
  • Simple chart parse
  • Viterbi example (from speech)
  • Viterbi example (from POS tagging)

Dynamic Programming
• Definition: An algorithmic approach to solving optimization problems by caching sub-problem solutions rather than recomputing them.
• The typical device is a chart that holds the various sub-problem solutions.
• Common implementation: charts used in parsing (based on the Earley algorithm)
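As a minimal sketch of the definition above, the following fills a chart for minimum edit distance. The problem choice and the name `edit_distance` are illustrative assumptions, not taken from the slides; the point is only that each cell is computed once from previously stored cells.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions (cost 1 each)."""
    n, m = len(source), len(target)
    # chart[i][j] holds the cost of turning source[:i] into target[:j]
    chart = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        chart[i][0] = i              # delete all i characters
    for j in range(1, m + 1):
        chart[0][j] = j              # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            # Reuse the three neighbouring sub-problem solutions already in the chart
            chart[i][j] = min(
                chart[i - 1][j] + 1,                 # deletion
                chart[i][j - 1] + 1,                 # insertion
                chart[i - 1][j - 1] + substitution,  # match or substitution
            )
    return chart[n][m]
```

A naive recursive version would recompute the same prefixes many times; the chart reduces the work to one step per cell.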

BU Chart Parser Walk Through
Sample Sentence: The large can can hold the water.
Sample Grammar:
1. S → NP VP
2. NP → D ADJ N
3. NP → D N
4. NP → ADJ N
5. VP → AUX VP
6. VP → V NP

Chart Parser
• Allows one to preserve the results of the parse so far in a chart
• Partial results and all previous results are kept so that work is not repeated

1. S  NP VP 2. NP  D ADJ N 3. NP  D N 4. NP  ADJ N 5. VP  AUX VP 6. VP  V NP The large can can hold the water. D1 ADJ1 1 2 3 NP  D ° ADJ N NP  D ADJ ° N NP  D ° N NP  ADJ ° N

1. S  NP VP 2. NP  D ADJ N 3. NP  D N 4. NP  ADJ N 5. VP  AUX VP 6. VP  V NP The large can can hold the water. NP2 NP1 D1 ADJ1 N1 AUX1 - V1 1 2 3 4 NP  D ° ADJ N NP  D ADJ N ° NP  D ADJ ° N NP  D ° N NP  ADJ ° N NP  ADJ N ° S  NP ° VP S  NP ° VP

Viterbi Example See word models, p. 245


Viterbi example
• Input is: [aa n iy dh ax]
• “I need the…”
• Actually a dialectal/fast-speech variant, “Ah nee the”
• [a ni ðə]

[aa n iy dh ax]


A smaller example
[Figure: HMM with states start, RB, JJ, and end; RB and JJ each emit "very" and "quick"; arcs and emissions are labeled with probabilities.]
• What is the best sequence of states for the input string “very very quick”?
• Computing all possible paths and finding the one with maximum probability is exponential in the length of the input.
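To make the exponential claim concrete, here is a small sketch that counts the state sequences a brute-force search would score and compares that with a rough count of the inner-loop steps in a chart-based search. The function names are hypothetical, and the step count (words × states²) is the usual back-of-the-envelope figure rather than an exact operation count.

```python
from itertools import product

def count_paths(num_states, num_words):
    """Number of distinct state sequences a brute-force search must score."""
    return sum(1 for _ in product(range(num_states), repeat=num_words))

def viterbi_steps(num_states, num_words):
    """Rough number of (previous state, current state) pairs examined with a chart."""
    return num_words * num_states ** 2

# Two states (RB, JJ): the gap widens quickly as the input grows
for length in (3, 10, 15):
    print(length, count_paths(2, length), viterbi_steps(2, length))
```

For very short inputs the two counts are comparable, but the brute-force count doubles with every added word while the chart-based count grows linearly.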


Implementation

// Initialize: probability 1.0 at the start, assuming a dummy period
viterbi[1, PERIOD] := 1.0
for i := 1 to n step 1 do          // all input words (presumably in a sentence)
    for all tags t_j do            // all possible tags
        // max probability of being in state j (tag j) at word i+1 (path probability matrix)
        viterbi[i+1, t_j] := max_{1≤k≤T} ( viterbi[i, t_k] × P(w_{i+1} | t_j) × P(t_j | t_k) )
        // most likely state (tag) at word i given that we're in state j at word i+1;
        // in other words, state[i+1, t_j] keeps a pointer to the state that got us here
        state[i+1, t_j] := argmax_{1≤k≤T} ( viterbi[i, t_k] × P(w_{i+1} | t_j) × P(t_j | t_k) )
    end
end
// Termination and path read-out
bestPath_{n+1} := argmax_{1≤j≤T} viterbi[n+1, j]
for j := n to 1 step -1 do         // all input words
    bestPath_j := state[j+1, bestPath_{j+1}]
end
P(bestPath_1, …, bestPath_n) := max_{1≤j≤T} viterbi[n+1, j]

P(w_{i+1} | t_j): emission probability
P(t_j | t_k): state transition probability
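The pseudocode can be sketched as runnable Python along the following lines. This version uses explicit start and end probabilities instead of a dummy period, and every number in the model is an illustrative assumption, not a value read off the slide's figure. The names `viterbi`, `start_p`, `trans_p`, `emit_p`, and `end_p` are likewise hypothetical.

```python
def viterbi(words, states, start_p, trans_p, emit_p, end_p):
    """Return (best tag sequence, its probability) for a list of words."""
    # best[i][s]: probability of the best path that ends in state s at word i
    best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]  # back[i][s]: previous state on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            # The emission term is the same for every predecessor,
            # so the argmax only needs path probability times transition
            prev = max(states, key=lambda k: best[i - 1][k] * trans_p[k][s])
            best[i][s] = (best[i - 1][prev] * trans_p[prev][s]
                          * emit_p[s].get(words[i], 0.0))
            back[i][s] = prev
    # Termination: include the transition into the end state
    last = max(states, key=lambda s: best[-1][s] * end_p[s])
    prob = best[-1][last] * end_p[last]
    # Path read-out: follow the back pointers from the last word to the first
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, prob

# Illustrative two-tag model (all values assumed for this sketch)
states = ["RB", "JJ"]
start_p = {"RB": 0.8, "JJ": 0.2}
trans_p = {"RB": {"RB": 0.3, "JJ": 0.6}, "JJ": {"RB": 0.1, "JJ": 0.2}}
end_p = {"RB": 0.1, "JJ": 0.7}
emit_p = {"RB": {"very": 0.9, "quick": 0.1}, "JJ": {"very": 0.1, "quick": 0.9}}

path, prob = viterbi(["very", "very", "quick"], states,
                     start_p, trans_p, emit_p, end_p)
print(path, prob)
```

Under these assumed probabilities the best sequence for "very very quick" is RB RB JJ. Long inputs would normally be handled with log probabilities to avoid underflow; that is left out here to stay close to the pseudocode.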
