Dynamic Programming


Definition (1) • Divide the problem into a tree of sub-problems • Conquer each problem in the tree in post-order, recording problem/solution pairs in a table • Combine the solution to each child problem to solve each parent problem using any applicable problem/solution pairs in the table. Sound familiar at all?

Definition (2) • Algorithmic Pattern (iterative):
    Solution DP( Problem p )
        tree ← p.divide()
        list ← tree.postOrder()
        for each problem, pi, in list
            children ← pi.children()
            solution ← ∅
            for each problem, cij, in children
                if cache( cij ) not empty then
                    sij ← cached solution
                else
                    sij ← cij.solve()
                    cache ← cache + <cij, sij>
                end if
                solution ← solution + sij
            end for
            cache ← cache + <pi, solution>
        end for
        return cache( p )
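As a rough illustration of the cache-check-then-combine idea in the pattern above, here is a minimal Java sketch using Fibonacci (listed under Well-Known Uses). It is a recursive, memoized variant rather than the iterative post-order loop, and the class name `MemoFib` and the `HashMap` cache are assumptions made for this example only.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: record problem/solution pairs in a table and reuse them. */
final class MemoFib {
    // Table of problem/solution pairs: n -> fib(n).
    private static final Map<Integer, Long> cache = new HashMap<>();

    static long fib(int n) {
        if (n < 2) {
            return n;                              // base cases: fib(0) = 0, fib(1) = 1
        }
        Long cached = cache.get(n);
        if (cached != null) {
            return cached;                         // reuse a recorded solution
        }
        long solution = fib(n - 1) + fib(n - 2);   // combine the child solutions
        cache.put(n, solution);                    // record the new pair
        return solution;
    }

    public static void main(String[] args) {
        System.out.println(fib(50));               // prints 12586269025
    }
}
```

Each value is computed once and then read from the table, so the number of additions grows linearly in n instead of exponentially.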

Applicability • Use the dynamic programming algorithmic pattern when ALL of the following are true: • The problem lends itself to division into sub-problems of the same type • The sub-problems have considerable overlap in their solution • An acceptable solution to the problem can be constructed from acceptable solutions to sub-problems • Extra memory is readily available

Well-Known Uses • Academic: “n” Coins • Mathematics: Fibonacci sequence, Matrix Multiplication • Graphs: Shortest Path • String: Edit distance (similarity), String alignment

Example: NCoins (1) • How many coins will be given in change for a purchase of up to $1.00? • Build a table starting at the base case • Work from the bottom sub-problems to the top sub-problems • This will work as long as, when we need to compute the value for N, we already have the following values in our table: N-1, N-5, N-10, N-12, N-25 • [Table: MinCoins]
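The bottom-up table could be built along these lines. This is only a sketch: the coin set {1, 5, 10, 12, 25} is an assumption read off the N-1, N-5, N-10, N-12, N-25 lookups above, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch: minCoins[n] is the fewest coins that add up to n cents. */
final class MinCoinsTable {
    static int[] build(int[] coins, int maxAmount) {
        int[] minCoins = new int[maxAmount + 1];
        Arrays.fill(minCoins, Integer.MAX_VALUE);   // MAX_VALUE marks "not yet reachable"
        minCoins[0] = 0;                            // base case: no coins for 0 cents
        for (int n = 1; n <= maxAmount; n++) {
            for (int coin : coins) {
                // Look up the already-computed entry for n - coin.
                if (coin <= n && minCoins[n - coin] != Integer.MAX_VALUE) {
                    minCoins[n] = Math.min(minCoins[n], minCoins[n - coin] + 1);
                }
            }
        }
        return minCoins;
    }

    public static void main(String[] args) {
        // Assumed coin set, including the unusual 12-cent coin.
        int[] table = build(new int[] {1, 5, 10, 12, 25}, 100);
        System.out.println(table[29]);              // prints 3 (12 + 12 + 5)
    }
}
```

Because the table is filled in increasing order of n, every entry n - coin is already present when entry n is computed.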

Example: NCoins (2) • However, this table gives only the minimum number of coins needed • This is the thing being optimized • We usually also want the actual coins used • To do this we build a second table as we are building the first table • The second table contains, at each step, the coin that produced the optimal number of coins

Example: NCoins (3) • We can then recursively reconstruct the list of coins used to produce the minimum number of coins • We stop when we reach a base case (0 in this example) • Note that the second table is constructed during the construction of the first table, but is never used to determine values in the first table • And the first table is never used during the recursive reconstruction of the coin list • [Tables: MinCoins, CoinUsed]
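A possible sketch of the two-table idea follows. The names `CoinChange`, `minCoins`, and `coinUsed` are hypothetical, the coin set passed in is assumed to include a 1-cent coin, and the reconstruction is written as a loop instead of the recursion mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build MinCoins and CoinUsed together, then walk CoinUsed back to 0. */
final class CoinChange {
    static List<Integer> change(int[] coins, int amount) {
        int[] minCoins = new int[amount + 1];   // first table: optimal counts
        int[] coinUsed = new int[amount + 1];   // second table: the winning coin
        for (int n = 1; n <= amount; n++) {
            minCoins[n] = Integer.MAX_VALUE;
            for (int coin : coins) {
                if (coin <= n && minCoins[n - coin] != Integer.MAX_VALUE
                        && minCoins[n - coin] + 1 < minCoins[n]) {
                    minCoins[n] = minCoins[n - coin] + 1;
                    coinUsed[n] = coin;         // written here, never read while filling minCoins
                }
            }
        }
        if (minCoins[amount] == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("amount cannot be made with these coins");
        }
        // Reconstruction reads only coinUsed and stops at the base case 0.
        List<Integer> result = new ArrayList<>();
        for (int n = amount; n > 0; n -= coinUsed[n]) {
            result.add(coinUsed[n]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(change(new int[] {1, 5, 10, 12, 25}, 29));
    }
}
```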

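Example: Matrix Multiplication (1) • [Figure: a 2x3 matrix times a 3x3 matrix gives a 2x3 matrix; row 1 of the first matrix and col 2 of the second are highlighted] • The entry at row 1, col 2 of the result is row 1 of the first matrix times col 2 of the second: 1x2 + 2x5 + 3x8 = 2 + 10 + 24 = 36 • Rows come from the first matrix • Columns come from the second matrix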

Example: Matrix Multiplication (2) • “Inner” dimensions must match • The result has the “outer” dimensions • Examples: • 2x3 X 3x3 = 2x3 • 3x4 X 4x5 = 3x5 • 2x3 X 4x3 = cannot multiply • Question: Does AB = BA? • Hint: let A be 2x3 and B be 3x2

Example: Matrix Multiplication (3)
    // Multiplies two 4x4 matrices whose elements are stored row by row in the array m.
    public static Matrix mult(Matrix m1, Matrix m2) {
        Matrix result = new Matrix();
        for (int i = 0; i < 4; i++) {            // row of the result
            for (int j = 0; j < 4; j++) {        // column of the result
                double total = 0.0;
                for (int k = 0; k < 4; k++) {    // walk row i of m1 and column j of m2
                    total += (m1.m[i * 4 + k] * m2.m[k * 4 + j]);
                }
                result.m[i * 4 + j] = total;
            }
        }
        return result;
    }
• How many multiplications of matrix elements are performed?
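To make the counting question concrete, here is a sketch that generalizes the loop to an n x m matrix times an m x p matrix and counts the scalar multiplications, which come to n*m*p (so 4*4*4 = 64 for the 4x4 case). The class `MatMulCount` and the use of `double[][]` are assumptions for this example, and the two sample matrices are made up so that row 1 is (1, 2, 3) and col 2 is (2, 5, 8), as in the earlier worked entry.

```java
/** Sketch: general matrix product that counts scalar multiplications. */
final class MatMulCount {
    /** Scalar multiplications performed by the most recent call to multiply. */
    static long multiplications;

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        if (a[0].length != m) {
            throw new IllegalArgumentException("inner dimensions must match");
        }
        multiplications = 0;
        double[][] c = new double[n][p];         // result has the outer dimensions
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                double total = 0.0;
                for (int k = 0; k < m; k++) {
                    total += a[i][k] * b[k][j];
                    multiplications++;
                }
                c[i][j] = total;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};              // 2x3 (assumed values)
        double[][] b = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};   // 3x3 (assumed values)
        double[][] c = multiply(a, b);
        System.out.println(c[0][1]);             // prints 36.0
        System.out.println(multiplications);     // prints 18, i.e. 2 * 3 * 3
    }
}
```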

Example: Matrix Multiplication (4) • Given the following matrices: • A is 20x2 • B is 2x30 • C is 30x12 • D is 12x8 • Specifically, how many multiplications does it take to compute A x B x C x D? • The first thing you should ask is “can it even be done?”

Example: Matrix Multiplication (5) • Note that matrix multiplication is an associative operation, meaning that the way we group (parenthesize) the product doesn’t change the result, as long as the left-to-right order of the matrices is kept • A(B(C D)) or (A B)(C D) or A((B C)D) or ((A B)C)D or (A(B C))D • However, each of these has a different number of multiplications: • A(B(CD)) = (30x12x8)+(2x30x8)+(20x2x8)=3,680 • (AB)(CD) = (20x2x30)+(30x12x8)+(20x30x8)=8,880 • A((BC)D) = (2x30x12)+(2x12x8)+(20x2x8)=1,232 • ((AB)C)D = (20x2x30)+(20x30x12)+(20x12x8)=10,320 • (A(BC))D = (2x30x12)+(20x2x12)+(20x12x8)=3,120 • Obviously, there is an optimal solution • How do we solve for it?

Example: Matrix Multiplication (6) • At the top level, given 4 matrices, there are 3 ways of parenthesizing this set into 2 subsets: • (A1) (A2 A3 A4) or (A1 A2) (A3 A4) or (A1 A2 A3) (A4) • The best way of parenthesizing this set of 4 is given by the minimum, over these 3 splits, of: • Best(firstSet) + Best(secondSet) + amount to multiply the resulting 2 matrices • This is simply a recursive definition of the problem

Example: Matrix Multiplication (7) • As an example: • A1 (5x2), A2 (2x3), A3 (3x4), A4 (4x6), A5 (6x7), A6 (7x8) • Ai has dimensions $d_{i-1} \times d_i$, so $d_0, \ldots, d_6$ = 5, 2, 3, 4, 6, 7, 8 • There are 5 possible ways to split this expression at the top level, each one costing: • $\mathrm{Best}(1,k) + \mathrm{Best}(k+1,6) + d_0 d_k d_6$ for $k \in [1..5]$ • We need to take the min of these: • $\mathrm{Best}(1,6) = \min_{k \in [1..5]} \big(\mathrm{Best}(1,k) + \mathrm{Best}(k+1,6) + d_0 d_k d_6\big)$

Example: Matrix Multiplication (8) • There was nothing in the previous work that forced the first matrix to be A1 and the last to be A6 • Thus, we can generalize this to be: • $\mathrm{Best}(i,j) = \min_{k \in [i..(j-1)]} \big(\mathrm{Best}(i,k) + \mathrm{Best}(k+1,j) + d_{i-1} d_k d_j\big)$ • $\mathrm{Best}(i,i) = 0$ // Base case

Example: Matrix Multiplication (9) • We could develop a Divide-and-Conquer approach to solving this problem:
    // d holds the dimensions d0..dn, so that Ai is d[i-1] x d[i]
    public int best(int i, int j) {
        int result;
        if (i == j) {
            result = 0;                      // base case: a single matrix
        } else {
            int min = Integer.MAX_VALUE;
            for (int k = i; k < j; k++) {    // try every split point
                int next = best(i, k) + best(k + 1, j) + d[i - 1] * d[k] * d[j];
                min = Math.min(min, next);
            }
            result = min;
        }
        return result;
    }

Example: Matrix Multiplication (10) • This approach will compute the correct answer, but it has tons of repeated work: • Best(2, 5) takes the min of • Best(2,2) + Best(3,5) and • Best(2,3) + Best(4,5) and • Best(2,4) + Best(5,5) • But then Best(2,4) needs: • Best(2,2) + Best(3,4) and • Best(2,3) + Best(4,4) • You can see the repeated work: Best(2,2) and Best(2,3) are computed again inside Best(2,4), and this is just the tip of the iceberg • It turns out that this is an exponential algorithm because of the repeated work
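One way to see the repeated work is to count the calls. The sketch below reuses the six-matrix dimensions from the earlier slide (5, 2, 3, 4, 6, 7, 8) and adds a call counter; the class name `NaiveChainCount` and the `calls` field are assumptions for this example.

```java
/** Sketch: the plain recursive Best(i, j) with a counter for the number of calls. */
final class NaiveChainCount {
    // d[i-1] x d[i] are the dimensions of Ai.
    static final int[] d = {5, 2, 3, 4, 6, 7, 8};
    static long calls = 0;

    static int best(int i, int j) {
        calls++;
        if (i == j) {
            return 0;                        // base case: a single matrix
        }
        int min = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {        // every split point is solved from scratch
            int next = best(i, k) + best(k + 1, j) + d[i - 1] * d[k] * d[j];
            min = Math.min(min, next);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(best(1, 6));      // prints 348
        System.out.println(calls);           // prints 243
    }
}
```

For a chain of n matrices this recursion makes 3^(n-1) calls (243 for n = 6), even though there are only n(n+1)/2 = 21 distinct Best(i, j) values to compute.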

Example: Matrix Multiplication (11) • So we can try Dynamic Programming • We start with the base cases • $\mathrm{Best}(i,i) = 0$ for $i \in [1..n]$ • Then we can use the recursive part to generate the rest of the Best values from the bottom up • The question is what values need to be previously computed in order to solve for a particular Best(i,j) • This is always the question in dynamic programming!

Example: Matrix Multiplication (12) • Each number in the 2D table represents the min # of mults from Ai to Aj • We are trying to get a value for the entire thing, A1 to A6, so we want the value in the upper right corner, Best(1,6); only the upper right triangle of the table is used • No values in the bottom left triangle because they are not possible (i > j) • Start with filling in the base cases (when i==j), the diagonal • [Table: Best(i, j), rows i, columns j]

Example: Matrix Multiplication (13) • What can we fill in next? • Best(1,2) requires values for Best(1,1) and Best(2,2) • We have those values • So Best(1,2) = Best(1,1) + Best(2,2) + d0*d1*d2 = 0 + 0 + (5*2*3) = 30 • Recurrence: $\mathrm{Best}(i,j) = \min_{k \in [i..(j-1)]} \big(\mathrm{Best}(i,k) + \mathrm{Best}(k+1,j) + d_{i-1} d_k d_j\big)$ • A1 (5x2), A2 (2x3), A3 (3x4), A4 (4x6), A5 (6x7), A6 (7x8) • [Table: Best(i, j), rows i, columns j]

Example: Matrix Multiplication (14) • Similar for the other values along that diagonal • Recurrence: $\mathrm{Best}(i,j) = \min_{k \in [i..(j-1)]} \big(\mathrm{Best}(i,k) + \mathrm{Best}(k+1,j) + d_{i-1} d_k d_j\big)$ • A1 (5x2), A2 (2x3), A3 (3x4), A4 (4x6), A5 (6x7), A6 (7x8) • [Table: Best(i, j), rows i, columns j]

Example: Matrix Multiplication (15) • Now we have enough values to fill in the next diagonal • Best(1,3) is the min of 2 possible values: A1 (A2 A3) and (A1 A2) A3 • So for each value in the table, we need the values to its left and below it to be previously computed • Recurrence: $\mathrm{Best}(i,j) = \min_{k \in [i..(j-1)]} \big(\mathrm{Best}(i,k) + \mathrm{Best}(k+1,j) + d_{i-1} d_k d_j\big)$ • A1 (5x2), A2 (2x3), A3 (3x4), A4 (4x6), A5 (6x7), A6 (7x8) • [Table: Best(i, j), rows i, columns j]

Example: Matrix Multiplication (16) • Filling in the next diagonal • Recurrence: $\mathrm{Best}(i,j) = \min_{k \in [i..(j-1)]} \big(\mathrm{Best}(i,k) + \mathrm{Best}(k+1,j) + d_{i-1} d_k d_j\big)$ • A1 (5x2), A2 (2x3), A3 (3x4), A4 (4x6), A5 (6x7), A6 (7x8) • [Table: Best(i, j), rows i, columns j]

Example: Matrix Multiplication (17) • Add the other diagonals • Note that for Best(1,6) there were 5 different possible ways to split at the top level • (A1) (A2 A3 A4 A5 A6) • (A1 A2) (A3 A4 A5 A6) • (A1 A2 A3) (A4 A5 A6) • (A1 A2 A3 A4) (A5 A6) • (A1 A2 A3 A4 A5) (A6) • [Table: Best(i, j), rows i, columns j]
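The diagonal-by-diagonal fill could be sketched as follows. The class name `MatrixChain` and the 1-indexed table layout are assumptions for this example; the dimensions are the ones used in the slides.

```java
/** Sketch: fill Best(i, j) one diagonal at a time, shortest chains first. */
final class MatrixChain {
    /** d has length n + 1; Ai is d[i-1] x d[i]. Row and column 0 are unused. */
    static int[][] bestTable(int[] d) {
        int n = d.length - 1;
        int[][] best = new int[n + 1][n + 1];     // diagonal (i == j) stays 0: base case
        for (int len = 2; len <= n; len++) {      // len = number of matrices in the chain
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                best[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    // best[i][k] is to the left, best[k+1][j] is below: both already filled.
                    int cost = best[i][k] + best[k + 1][j] + d[i - 1] * d[k] * d[j];
                    best[i][j] = Math.min(best[i][j], cost);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] best = bestTable(new int[] {5, 2, 3, 4, 6, 7, 8});
        System.out.println(best[1][2]);           // prints 30
        System.out.println(best[1][6]);           // prints 348
    }
}
```

The three nested loops give O(n^3) work in place of the exponential recursion.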

Example: Matrix Multiplication (18) • The Best table we just built tells us that the optimal number of multiplications is 348 • But it doesn’t tell us the parenthesization that achieves this optimum • Much like the first table in NCoins told us the optimal number of coins to use, but not which ones they were • We need a second table in order to determine the optimal parenthesization of the matrices • We will store the “winner” (the best split point k) at each stage • Much like the second table we needed in NCoins

Example: Matrix Multiplication (19)

Example: Matrix Multiplication (20) • A1 A2 A3 A4 A5 A6 • (A1) (A2 A3 A4 A5 A6) • (A1) ((A2 A3 A4 A5) (A6)) • (A1) (((A2 A3 A4)(A5)) (A6)) • (A1) ((((A2 A3)(A4))(A5)) (A6)) • (A1) (((((A2)(A3))(A4))(A5)) (A6))
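The “winner” table and the step-by-step expansion above might look like this in code. It is a sketch only: the names `MatrixChainOrder`, `split`, and `build` are hypothetical, and the output leaves single matrices unparenthesized.

```java
/** Sketch: record the winning split point k for each (i, j), then expand recursively. */
final class MatrixChainOrder {
    static String parenthesize(int[] d) {
        int n = d.length - 1;
        int[][] best = new int[n + 1][n + 1];
        int[][] split = new int[n + 1][n + 1];    // second table: the "winner" k
        for (int len = 2; len <= n; len++) {
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                best[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    int cost = best[i][k] + best[k + 1][j] + d[i - 1] * d[k] * d[j];
                    if (cost < best[i][j]) {
                        best[i][j] = cost;
                        split[i][j] = k;          // remember where the best split was
                    }
                }
            }
        }
        return build(split, 1, n);
    }

    // Reads only the split table, much like CoinUsed in the NCoins example.
    private static String build(int[][] split, int i, int j) {
        if (i == j) {
            return "A" + i;
        }
        int k = split[i][j];
        return "(" + build(split, i, k) + " " + build(split, k + 1, j) + ")";
    }

    public static void main(String[] args) {
        // prints (A1 ((((A2 A3) A4) A5) A6))
        System.out.println(parenthesize(new int[] {5, 2, 3, 4, 6, 7, 8}));
    }
}
```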

Example: Optimal String Alignment (1) • Images taken from http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html • We want to find the optimal (best) alignment between two strings, where matches count as 1, mismatches count as 0, and “gaps” count as 0 • Example: • G A A T T C A G T T A • G G A T C G A • One optimal alignment (Score == 6): • G _ A A T T C A G T T A • G G _ A _ T C _ G _ _ A • Notice that every alignment can start in exactly one of three ways: • Two non-gap characters • Non-gap above, gap below • Gap above, non-gap below

Example: Optimal String Alignment (2) • Notice that in an optimal alignment of $(s_1, s_2, \ldots, s_n)$ with $(t_1, t_2, \ldots, t_m)$ one of these must be true: • $S_{2..n}$ must be optimally aligned with $T_{2..m}$ • $S_{2..n}$ must be optimally aligned with $T_{1..m}$ • $S_{1..n}$ must be optimally aligned with $T_{2..m}$ • The score of each is then • $\mathrm{Score}(S_{2..n}, T_{2..m}) + \mathrm{Score}(s_1, t_1)$ • $\mathrm{Score}(S_{2..n}, T_{1..m}) + \mathrm{Score}(s_1, \text{gap})$ • $\mathrm{Score}(S_{1..n}, T_{2..m}) + \mathrm{Score}(t_1, \text{gap})$ • We want to select the maximum of these; since gaps score 0, this gives • $\mathrm{Score}(S_{1..n}, T_{1..m}) = \max\big(\mathrm{Score}(S_{2..n}, T_{2..m}) + \mathrm{Score}(s_1, t_1),\ \mathrm{Score}(S_{2..n}, T_{1..m}),\ \mathrm{Score}(S_{1..n}, T_{2..m})\big)$ • Base case given by the scoring function • This only gives the score of the optimal alignment, not the actual alignment

Example: Optimal String Alignment (3) • Let’s build a matrix, M, of values such that $M_{i,j}$ is the optimal score for $S_{1..i}$ aligned with $T_{1..j}$ • If we have this, $M_{n,m}$ is the overall optimal score • M has a row 0 and a column 0 to allow for a beginning gap • Notice that by our recurrence relation, applied to prefixes instead of suffixes, • $M_{i,j} = \max\big(M_{i-1,j-1} + \mathrm{Score}_{i,j},\ M_{i,j-1},\ M_{i-1,j}\big)$
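A minimal sketch of filling M for the scoring used here (match = 1, mismatch = 0, gap = 0) is shown below. The class name `AlignmentScore` is an assumption, and the two strings are the ones from the example slide.

```java
/** Sketch: M[i][j] is the best score for aligning the first i chars of s with the first j chars of t. */
final class AlignmentScore {
    static int[][] scoreMatrix(String s, String t) {
        int n = s.length(), m = t.length();
        // Row 0 and column 0 stand for the empty prefix; they stay 0 because gaps score 0.
        int[][] M = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int score = (s.charAt(i - 1) == t.charAt(j - 1)) ? 1 : 0;
                M[i][j] = Math.max(M[i - 1][j - 1] + score,          // pair s_i with t_j
                          Math.max(M[i][j - 1], M[i - 1][j]));       // or put a gap in one string
            }
        }
        return M;
    }

    public static void main(String[] args) {
        String s = "GAATTCAGTTA";
        String t = "GGATCGA";
        int[][] M = scoreMatrix(s, t);
        System.out.println(M[s.length()][t.length()]);   // prints 6
    }
}
```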

Example: Optimal String Alignment (4) • Still need to find the actual alignment that results in the maximum score • Can be done by tracing back from the optimal value to find where it must have originated, with each step back representing an earlier optimal alignment • G _ A A T T C A G T T A • G G _ A _ T C _ G _ _ A
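The traceback could be sketched as below. Several alignments can share the optimal score, so this version may print a different (equally good) alignment from the one on the slide; the class name `AlignmentTraceback` and the tie-breaking order (diagonal first) are assumptions.

```java
/** Sketch: rebuild one optimal alignment by walking back from M[n][m]. */
final class AlignmentTraceback {
    static String[] align(String s, String t) {
        int n = s.length(), m = t.length();
        int[][] M = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int score = (s.charAt(i - 1) == t.charAt(j - 1)) ? 1 : 0;
                M[i][j] = Math.max(M[i - 1][j - 1] + score,
                          Math.max(M[i][j - 1], M[i - 1][j]));
            }
        }
        StringBuilder top = new StringBuilder();
        StringBuilder bottom = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            int score = (s.charAt(i - 1) == t.charAt(j - 1)) ? 1 : 0;
            if (M[i][j] == M[i - 1][j - 1] + score) {      // came from the diagonal
                top.append(s.charAt(--i));
                bottom.append(t.charAt(--j));
            } else if (M[i][j] == M[i - 1][j]) {           // came from above: gap below
                top.append(s.charAt(--i));
                bottom.append('_');
            } else {                                       // came from the left: gap above
                top.append('_');
                bottom.append(t.charAt(--j));
            }
        }
        while (i > 0) { top.append(s.charAt(--i)); bottom.append('_'); }
        while (j > 0) { top.append('_'); bottom.append(t.charAt(--j)); }
        // The columns were collected back to front, so reverse both rows.
        return new String[] {top.reverse().toString(), bottom.reverse().toString()};
    }

    public static void main(String[] args) {
        String[] rows = align("GAATTCAGTTA", "GGATCGA");
        System.out.println(rows[0]);
        System.out.println(rows[1]);
    }
}
```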

Principle of Optimality (1) • Although, at this point, it may seem that any optimization problem can be solved using Dynamic Programming, this is not the case • The principle of optimality must apply to the given problem • The principle of optimality is said to apply to a problem if an optimal solution to an instance of the problem always contains optimal solutions to all sub-instances

Principle of Optimality (2) • For example: Shortest Path Problem • If the optimal path from Vi → Vj includes Vk, then the sub-paths from Vi → Vk and Vk → Vj must also be optimal • These were the sub-instances we used to form the solution for the large instance • The Principle of Optimality is sometimes difficult to prove, but one must prove it in order to prove that a Dynamic Programming solution will work • Sometimes finding a counter-example is easier

Principle of Optimality (3) • Consider the Longest Path Problem • Restrict to simple paths • No cycles • The longest path from V1 to V4 is V1 → V3 → V2 → V4 • However, the sub-path V1 → V3 is not the optimal (longest) path from V1 to V3 • V1 → V3 = 1, but V1 → V2 → V3 = 4 • Thus, the principle of optimality doesn’t hold for the longest path problem • [Figure: weighted graph on V1, V2, V3, V4 with edge weights 1, 1, 3, 2, 4]

Example: Edit Distance (1) • Edit distance is a measure of how far a particular word is away from another word • That is, the number of character edits one needs to make the two words match • A single “character edit” consists of either: • Insertion of a single character • sort → sport (insertion of p) • Deletion of a single character • sport → sort (deletion of p) • Changing a single character • computer → commuter (change p to m) • Can anyone think of an application for this idea?

Example: Edit Distance (2) • Note that there are many sequences of edits that can change one word into another • We want the optimal (minimal in this case) • And since we are going to accomplish this using Dynamic Programming the first thing we will need is a recursive definition of the problem • Edist(str1, str2) • Here is the recursive definition: • Edist(ε, ε) = 0 • Edist(str, ε) = Edist(ε, str) = | str | • Edist(str1+ch1, str2+ch2) = Min( Edist(str1, str2) + (0 if ch1==ch2, 1 otherwise), Edist(str1+ch1, str2) + 1, Edist(str1, str2+ch2) + 1 )
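The recursive definition can be transcribed almost line for line. This sketch is the plain recursion with no table (so it repeats work), and the class name `EditDistanceRecursive` and the local names `change`, `insert`, and `delete` are assumptions for this example.

```java
/** Sketch: direct transcription of the recursive Edist definition (no caching). */
final class EditDistanceRecursive {
    static int edist(String a, String b) {
        if (a.isEmpty()) {
            return b.length();               // Edist(ε, str) = |str|
        }
        if (b.isEmpty()) {
            return a.length();               // Edist(str, ε) = |str|
        }
        String str1 = a.substring(0, a.length() - 1);   // a without its last char ch1
        String str2 = b.substring(0, b.length() - 1);   // b without its last char ch2
        char ch1 = a.charAt(a.length() - 1);
        char ch2 = b.charAt(b.length() - 1);
        int change = edist(str1, str2) + (ch1 == ch2 ? 0 : 1);
        int insert = edist(a, str2) + 1;     // Edist(str1+ch1, str2) + 1
        int delete = edist(str1, b) + 1;     // Edist(str1, str2+ch2) + 1
        return Math.min(change, Math.min(insert, delete));
    }

    public static void main(String[] args) {
        System.out.println(edist("sort", "sport"));         // prints 1
        System.out.println(edist("computer", "commuter"));  // prints 1
    }
}
```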

Example: Edit Distance (3) • Note that in every recursive case we recurse with at least one string shorter by 1 character • Both are shorter in the change case • That means it will keep calling itself until at least 1 string reaches the empty string, at which time our base case kicks in • However, implementing this with a recursive divide and conquer approach would lead to an exponential running time because of all the repeated work • So we try a Dynamic Programming approach instead • Start by filling in a table with the base case information • Fill in the rest of the table bottom up until you reach your goal solution • We will again have a 2D table • The dimensions will be the length of the first string +1 by the length of the second string +1 • The +1s are there because we need a spot in the table for ε

Example: Edit Distance (4) • First fill in the base cases • Our goal is to find the edit distance for the entire string “cake” to the entire string “cat” • So the number we want to find is in the lower right corner

Example: Edit Distance (5) • The 3 recursive cases are: • Edist(i, j) uses • Edist(i-1, j-1) • Edist(i, j-1) • Edist(i-1, j) • Where i represents a substring of “cake” from character 1 up to character i (1 indexed) • And j is similar, but for “cat” • So this tells us that for each cell, we need the values from the cells to the upper left, to the left, and above • Recursive definition: • Edist(ε, ε) = 0 • Edist(str, ε) = Edist(ε, str) = | str | • Edist(str1+ch1, str2+ch2) = Min( Edist(str1, str2) + (0 if ch1==ch2, 1 otherwise), Edist(str1+ch1, str2) + 1, Edist(str1, str2+ch2) + 1 )

Example: Edit Distance (6) • So we can fill in the table row by row and we end up with the given table
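The row-by-row fill could be sketched as follows, assuming “cake” indexes the rows and “cat” the columns; the class name `EditDistanceTable` is hypothetical.

```java
/** Sketch: e[i][j] is the edit distance between the first i chars of a and the first j chars of b. */
final class EditDistanceTable {
    static int[][] table(String a, String b) {
        int[][] e = new int[a.length() + 1][b.length() + 1];   // +1 for the empty string ε
        for (int i = 0; i <= a.length(); i++) {
            e[i][0] = i;                     // base case: Edist(str, ε) = |str|
        }
        for (int j = 0; j <= b.length(); j++) {
            e[0][j] = j;                     // base case: Edist(ε, str) = |str|
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int change = e[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int fromLeft = e[i][j - 1] + 1;
                int fromAbove = e[i - 1][j] + 1;
                e[i][j] = Math.min(change, Math.min(fromLeft, fromAbove));
            }
        }
        return e;
    }

    public static void main(String[] args) {
        int[][] e = table("cake", "cat");
        System.out.println(e[4][3]);         // lower right corner: prints 2
    }
}
```

Each cell reads only its upper-left, left, and upper neighbours, so filling row by row guarantees those values exist.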

Example: Edit Distance (7) • And, again, we have a table which tells us the optimal number of edits, but not the sequence of edits required to actually change one into the other • The importance of the edit sequence depends on the application • For spell checking this isn’t important • For DNA alignment this will be important, and we will need to build a second table to keep track of this information so we can trace back to determine the actual alignment
