HMM Algorithms. Linguistics 570, Lecture #5. HW #1.
HMM Algorithms Linguistics 570, Lecture #5
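Where we left off • HMMs • A set of states $Q = q_1 \dots q_N$ • A set of observations $O = o_1 \dots o_T$ • Transition probability matrix $A$, where $a_{ij} = P(q_j \mid q_i)$ • Emission probability matrix $B$, where $b_i(o_t) = P(o_t \mid q_i)$ • An initial probability distribution $\pi$, where $\pi_i = P(q_i \mid \text{START})$ • A set of final/accepting states $F$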
Answer 3 questions • [If we already know the parameters:] 1. How likely is this data given our model? 2. Which states most likely generated that data? • [If we don’t know the parameters:] 3. Which parameters make this data most likely? • Dynamic Programming is crucial to all of these. DP = find the optimal answer (max / arg max)
An aside: Why “dynamic programming”? • “programming” ≠ “computer programming” • Think television programming: put programs in a schedule, maximize viewership / ad revenue • This programming is optimization/solving, often max / arg max • Linear programming: optimize a linear function • Quadratic programming: optimize a quadratic function • Dynamic programming: optimize by divide and conquer, reusing the solutions to shared subproblems
Quick Math Review • Take a function $f(x)$ • What is $\max_x f(x)$? • What is $\arg\max_x f(x)$? • How do I turn a max into a min?
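The three review questions can be tried out on a concrete function. The sketch below is only an illustration: the function `f` and the integer grid `xs` are assumptions made up for this example. It shows the difference between max (the best value) and arg max (the input that achieves it), and one common way to turn a max into a min by negating the function.

```python
def f(x):
    # Hypothetical function for illustration: a parabola peaking at x = 2.
    return -(x - 2) ** 2 + 3

xs = range(-5, 6)  # hypothetical search grid of integer inputs

best_value = max(f(x) for x in xs)   # max: the largest value f takes
best_x = max(xs, key=f)              # arg max: the input that gives that value

# Turning a max into a min: max f(x) equals -min(-f(x)).
max_via_min = -min(-f(x) for x in xs)

print(best_value, best_x, max_via_min)
```

The same identity works for arg max: the x that minimizes -f(x) is the x that maximizes f(x).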
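Fibonacci in Python
    #!/usr/bin/python
    def fib(n):
        # base cases: fib(0) = fib(1) = 1
        if n == 0: return 1
        if n == 1: return 1
        # each call spawns two more calls, so the work grows exponentially
        return fib(n-1) + fib(n-2)
    print(fib(40))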
Runtime
    cquirk@patas:~ $ time ./fib.py
    165580141
    real 1m20.136s
    user 1m20.076s
    sys 0m0.038s
Dynamic programming solution
    #!/usr/bin/python
    def fib(n):
        # x[i] holds fib(i); seed the two base cases
        x = [1, 1]
        # fill the table bottom-up, so each value is computed once
        for i in range(2, n + 1):
            x.append(x[i-2] + x[i-1])
        return x[n]
    print(fib(40))

    cquirk@patas:~ $ time ./fib_dp.py
    165580141
    real 0m0.020s
    user 0m0.013s
    sys 0m0.007s
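Memoized solution
    #!/usr/bin/python
    x = [1, 1]  # cache: x[i] holds fib(i) for every i computed so far
    def fib(n):
        # answer from the cache when this subproblem was already solved
        if n < len(x): return x[n]
        f = fib(n-1) + fib(n-2)
        x.append(f)  # fib(n-1) filled the cache up to n-1, so f lands at index n
        return f
    print(fib(40))

    cquirk@patas:~ $ time ./fib_memo.py
    165580141
    real 0m0.020s
    user 0m0.013s
    sys 0m0.007s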
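The gap between 80 seconds and 0.02 seconds comes down to how many times the function body runs. As a rough sketch, the code below counts calls for the naive recursion and for a memoized version built with the standard library's `functools.lru_cache`. The names `fib_naive`, `fib_cached`, and `calls` are made up for this illustration, and n = 20 is used so the naive version finishes quickly.

```python
from functools import lru_cache

calls = {"naive": 0, "cached": 0}  # how often each function body runs

def fib_naive(n):
    calls["naive"] += 1
    if n == 0 or n == 1:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)  # remember every result, keyed by the argument n
def fib_cached(n):
    calls["cached"] += 1
    if n == 0 or n == 1:
        return 1
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_naive(20), calls["naive"])
print(fib_cached(20), calls["cached"])
```

With the cache, the body runs once per distinct value of n; without it, the same subproblems are recomputed over and over.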
DP: String Edit Distance • Classic problem: String Edit Distance (Levenshtein Distance) • Given two strings s and t • What is the minimal sequence of insertions, deletions, and substitutions that turns s into t? • Copy a character for free • Inserting a character drops my score by 1 • Deleting a character drops my score by 1 • (With only these operations, a substitution is scored as a deletion plus an insertion)
Recursive solution
    int dist(string s, string t) {
        if (len(s) == 0 || len(t) == 0)
            return len(s) + len(t)
        edits = infinity
        if (last(s) == last(t))
            edits = dist(prefix(s), prefix(t))       // COPY
        edits = min(edits, 1 + dist(prefix(s), t))   // DEL
        edits = min(edits, 1 + dist(s, prefix(t)))   // INS
        return edits
    }
Recursive solution, step by step • If one of the strings is empty, we have to use all insertions or all deletions • If the last character of each string is the same, copy • Try deleting the last character of the source • Try inserting the last character of the target • Whichever way resulted in the minimal number of edits: that’s our distance • This algorithm works, but it wastes a huge amount of computation!
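To see the wasted computation directly, here is a minimal runnable Python sketch of the recursive pseudocode. The `calls` counter is an addition for illustration and is not part of the slide's algorithm; `s[:-1]` plays the role of `prefix(s)` and `s[-1]` the role of `last(s)`.

```python
from collections import Counter

calls = Counter()  # illustration only: how often each (s, t) pair is requested

def dist(s, t):
    calls[(s, t)] += 1
    # If either string is empty, the rest is all insertions or all deletions.
    if len(s) == 0 or len(t) == 0:
        return len(s) + len(t)
    edits = float("inf")
    if s[-1] == t[-1]:
        edits = dist(s[:-1], t[:-1])            # COPY (free)
    edits = min(edits, 1 + dist(s[:-1], t))     # DEL last char of source
    edits = min(edits, 1 + dist(s, t[:-1]))     # INS last char of target
    return edits

print(dist("chi", "kit"))
repeated = {pair: n for pair, n in calls.items() if n > 1}
print(repeated)  # subproblems that were solved more than once
```

Even on two three-letter strings, several subproblems are requested more than once; on longer strings the repetition grows exponentially.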
Recursive solution • Say we want to compute dist(chi, kit)
Recursive solution • We need to compute distances of substrings… [Call tree nodes: dist(chi, kit); below it dist(chi, ki), dist(ch, kit), dist(ch, ki)]
Recursive solution • So first, let’s take the middle arrow (recursive call): dist(ch, ki)
Recursive solution • It does a bunch of computation to figure out its distance [Call tree nodes below dist(ch, ki): dist(c, ki), dist(ch, ), dist(c, k), dist(, ki), dist(c, ), dist(, k), dist(,)]
Recursive solution • Now take another arrow… [Call tree nodes: dist(chi, kit); dist(chi, ki), dist(ch, kit); dist(chi, k), dist(ch, ki), dist(ch, ki)]
Recursive solution • It *also* asks for dist(ch, ki), but we just computed that! [Call tree nodes: dist(chi, kit); dist(chi, ki), dist(ch, kit); dist(chi, k), dist(ch, ki), dist(ch, ki)] • Let’s save a bunch of work and just compute these things once! • This is Dynamic Programming: fill out a data structure that holds the solutions to all the subproblems • This is called the CHART
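One way to fill out such a chart is bottom-up, so that every prefix pair is solved exactly once. The sketch below assumes the same costs as the recursive version (copy is free, insert and delete cost 1 each); the name `dist_chart` and the list-of-lists layout are choices made for this illustration.

```python
def dist_chart(s, t):
    # chart[i][j] = edit distance between the prefixes s[:i] and t[:j]
    chart = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        chart[i][0] = i              # delete all i characters of s[:i]
    for j in range(len(t) + 1):
        chart[0][j] = j              # insert all j characters of t[:j]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            best = min(1 + chart[i - 1][j],      # DEL
                       1 + chart[i][j - 1])      # INS
            if s[i - 1] == t[j - 1]:
                best = min(best, chart[i - 1][j - 1])  # COPY (free)
            chart[i][j] = best
    return chart[len(s)][len(t)]

print(dist_chart("chi", "kit"))
```

The chart has (len(s) + 1) × (len(t) + 1) cells and each is filled in constant time, so the running time is proportional to the product of the string lengths instead of exponential.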
Back to our HMM • Let’s answer that first question: • Given an observation sequence $O = o_1 \dots o_T$ • Model over obs & state seq: $P(O, Q)$ • What is $P(O)$? • Remember from last week: $P(O) = \sum_{Q} P(O, Q)$ • We could enumerate all state sequences, but how many are there? $N^T$ ($N$ tags, $T$ words) • With 40 POS tags, a 10-word sentence has $40^{10}$ = 10,485,760,000,000,000 possible state sequences!
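The count of state sequences is easy to check in code. This small sketch uses a hypothetical three-tag set to enumerate sequences explicitly for a tiny case, then computes the count for the 40-tag, 10-word case, which is far too large to enumerate; the function name `count_state_sequences` is made up for the example.

```python
from itertools import product

def count_state_sequences(num_tags, num_words):
    # One independent choice of tag per word: num_tags ** num_words sequences.
    return num_tags ** num_words

# Explicit enumeration is only feasible for tiny cases (hypothetical tag set).
tags = ["N", "V", "P"]
sequences = list(product(tags, repeat=4))
print(len(sequences), count_state_sequences(3, 4))

# The case from the slide: 40 POS tags, 10 words.
print(count_state_sequences(40, 10))
```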
Forward Algorithm • Dynamic programming solution: • Tabulates intermediate results as it computes the probability of the sequence • Folds summation over paths into a forward trellis • Cell $\alpha_t(j)$ = probability of being in state $j$ after seeing the first $t$ observations • Computed by summing over all paths to this cell • Assume $q_0$ and $q_F$ are distinguished, non-emitting start and final states
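The trellis computation can be sketched in a few lines of Python. Everything about the model below is an assumption for illustration: the two states, the two-word vocabulary, and all probabilities are invented, and the non-emitting start and final states are represented by the `START` and `END` tables. The brute-force function enumerates every state sequence and is included only to check that the trellis gives the same total.

```python
from itertools import product

# Hypothetical toy HMM; all numbers are made up for illustration.
STATES = ["N", "V"]
START = {"N": 0.6, "V": 0.4}                      # P(state | start state)
TRANS = {"N": {"N": 0.3, "V": 0.5},               # P(next state | state)
         "V": {"N": 0.6, "V": 0.2}}
END = {"N": 0.2, "V": 0.2}                        # P(final state | state)
EMIT = {"N": {"time": 0.5, "flies": 0.5},         # P(word | state)
        "V": {"time": 0.3, "flies": 0.7}}

def forward(obs):
    # alpha[j] = probability of the observations so far, ending in state j
    alpha = {j: START[j] * EMIT[j][obs[0]] for j in STATES}
    for word in obs[1:]:
        # sum over all paths into each cell, then emit the current word
        alpha = {j: sum(alpha[i] * TRANS[i][j] for i in STATES) * EMIT[j][word]
                 for j in STATES}
    # transition into the non-emitting final state
    return sum(alpha[i] * END[i] for i in STATES)

def brute_force(obs):
    # Sum P(obs, path) over every possible state sequence (exponential).
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, word in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][word]
        total += p * END[path[-1]]
    return total

obs = ["time", "flies", "time"]
print(forward(obs), brute_force(obs))
```

The forward pass does on the order of N² work per word, while the brute-force sum touches all N^T paths; the two agree up to floating-point rounding.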
[Trellis figure: forward algorithm on the sentence “time flies like an arrow”. Cells in the “time” column: 0.05, 0.01, 0, 0.]
One cell in the “flies” column: $0.05 \times 0.2 \times 0.1 + 0.01 \times 0.4 \times 0.1 = 0.001 + 0.0004 = 0.0014$
Another cell in the “flies” column: $0.05 \times 0.7 \times 0.1 + 0.01 \times 0.1 \times 0.1 = 0.0035 + 0.0001 = 0.0036$
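The two trellis cells can be reproduced with a short sketch. The numbers are the ones in the slide's arithmetic; reading the middle factor of each term as a transition probability and the last factor as an emission probability is an assumption, as are the names used below. The two cells with value 0 in the “time” column contribute nothing to the sums and are left out.

```python
# Non-zero cells of the "time" column, as given on the slide.
prev_column = [0.05, 0.01]

def trellis_cell(prev, trans_in, emit):
    # Sum over incoming paths (previous cell * transition), then emit.
    # Treating the factors as transition/emission is an assumption.
    return sum(p * a for p, a in zip(prev, trans_in)) * emit

cell_a = trellis_cell(prev_column, [0.2, 0.4], 0.1)
cell_b = trellis_cell(prev_column, [0.7, 0.1], 0.1)

# Rounded for display; the unrounded floats carry tiny rounding error.
print(round(cell_a, 4), round(cell_b, 4))
```

Each cell only needs the previous column, which is what lets the algorithm move left to right through the sentence one word at a time.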