Dynamic Programming

Development of a Dynamic Programming Algorithm • Characterize the structure of a solution • Recursively define the value of a solution • Compute the value of a solution in a bottom-up fashion • Construct a solution from computed information

Definition • Algorithmic Pattern: (example - Trees)
    Solution DP( Tree t )
        // Compute the base cases (leaves)
        for all leaves l of tree t
            table[l] = process(l);
        // Compute the recursive cases bottom-up (every child before its parent)
        for each non-leaf node n of tree t
            value = 0
            for each child c of n
                value += table[c]
            table[n] = value + process(n);
        return table[root]
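A minimal Java sketch of the tree pattern above, assuming a hypothetical `TreeNode` class and a hypothetical `process(n)` that returns 1 for every node, so `table[n]` ends up holding the size of n's subtree. The bottom-up order comes from walking a breadth-first listing backwards, which guarantees every child is in the table before its parent.

```java
import java.util.*;

// Hypothetical node type for this sketch; the slides do not define one.
class TreeNode {
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(TreeNode... kids) {
        children.addAll(Arrays.asList(kids));
    }
}

class TreeDP {
    // Assumed process(n): every node contributes 1, so table[n] is the size of n's subtree.
    static int process(TreeNode n) {
        return 1;
    }

    static int solve(TreeNode root) {
        // Breadth-first order lists every parent before its children...
        List<TreeNode> order = new ArrayList<>();
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode n = queue.poll();
            order.add(n);
            queue.addAll(n.children);
        }
        // ...so walking it backwards fills the table bottom-up (children first).
        Map<TreeNode, Integer> table = new HashMap<>();
        for (int idx = order.size() - 1; idx >= 0; idx--) {
            TreeNode n = order.get(idx);
            int value = 0;                 // a leaf has no children, so table[l] = process(l)
            for (TreeNode c : n.children) {
                value += table.get(c);     // already filled in
            }
            table.put(n, value + process(n));
        }
        return table.get(root);
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(new TreeNode(new TreeNode(), new TreeNode()), new TreeNode());
        System.out.println(solve(root)); // 5 nodes in total
    }
}
```

Leaves need no separate loop here because a node with no children gets `0 + process(l)`, which is exactly the base case on the slide.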

Applicability • Use the dynamic programming algorithmic pattern when ALL of the following are true: • The problem lends itself to division into sub-problems of the same type • The sub-problems have considerable overlap in their solution • An acceptable solution to the problem can be constructed from acceptable solutions to sub-problems • Extra memory is readily available

Well-Known Uses • Academic • “n” Coins • Mathematics • Fibonacci sequence • Binomial Coefficient • Matrix Multiplication • Graphs • Shortest Path • Binary Search Trees • Traveling Salesman • String • Edit distance (similarity) • String alignment

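Example: NCoins • What is the minimum number of coins needed to make change for an amount up to $1.00, using coins worth 1, 5, 10, 12, and 25 cents? • Structure of Optimal Solution: The set of coins can be broken into 2 piles, each of which is optimal for its own amount • Recursive Definition • MinCoins(0) = 0 • MinCoins(N) = 1 + Min(MinCoins(N - c)) over every coin c ≤ N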

Example: NCoins • Build a table starting at the base case • Work from the bottom sub-problems to the top sub-problems • This works as long as, when we compute the value for N, the table already holds the values for N-1, N-5, N-10, N-12, and N-25 (whichever are not negative) • (Table: MinCoins)
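A minimal Java sketch of the MinCoins table, assuming the coin set 1, 5, 10, 12, 25 implied by the N-1 … N-25 dependencies above. Amounts are filled in increasing order, so every `minCoins[n - c]` the loop reads is already in the table. The class and method names are illustrative only.

```java
class MinCoinsTable {
    // Coin values from the slide's dependencies (N-1, N-5, N-10, N-12, N-25).
    static final int[] COINS = {1, 5, 10, 12, 25};

    // minCoins[n] = fewest coins that sum to exactly n cents, for n = 0..max.
    static int[] buildTable(int max) {
        int[] minCoins = new int[max + 1];     // minCoins[0] = 0 is the base case
        for (int n = 1; n <= max; n++) {       // bottom-up: smaller amounts first
            int best = Integer.MAX_VALUE;
            for (int c : COINS) {
                if (c <= n) {                  // minCoins[n - c] is already in the table
                    best = Math.min(best, minCoins[n - c] + 1);
                }
            }
            minCoins[n] = best;                // always finite because a 1-cent coin exists
        }
        return minCoins;
    }

    public static void main(String[] args) {
        int[] table = buildTable(100);
        System.out.println(table[15] + " " + table[100]); // prints "2 4"
    }
}
```

The 12-cent coin is what makes the table worthwhile: always grabbing the largest coin for 15 cents gives 12 + 1 + 1 + 1 (4 coins), while the table finds 10 + 5 (2 coins).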

Example: NCoins • However, this table gives only the minimum number of coins needed • This is the quantity being optimized • We also want the actual coins used • To get them, we build a second table while we are building the first table • At each amount, the second table records the coin that produced the optimal number of coins

Example: NCoins • We can then recursively reconstruct the list of coins used to produce the minimum number of coins • We stop when we reach a base case (0 in this example) • The second table is built alongside the first table, but is never used to determine values in the first table • And the first table is never used while reconstructing the coin list • (Tables: MinCoins, CoinUsed)
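A minimal Java sketch of the two-table idea, again assuming the coins 1, 5, 10, 12, 25. `coinUsed` is written while `minCoins` is filled but never read during that fill; the walk back to 0 then reads only `coinUsed`. Names are illustrative, not from the slides.

```java
import java.util.*;

class CoinReconstruction {
    static final int[] COINS = {1, 5, 10, 12, 25};

    // Returns one optimal list of coins for n cents.
    static List<Integer> coinsFor(int n) {
        int[] minCoins = new int[n + 1];   // first table (minCoins[0] = 0)
        int[] coinUsed = new int[n + 1];   // second table: the coin that won at each amount
        for (int a = 1; a <= n; a++) {
            minCoins[a] = Integer.MAX_VALUE;
            for (int c : COINS) {
                if (c <= a && minCoins[a - c] + 1 < minCoins[a]) {
                    minCoins[a] = minCoins[a - c] + 1;
                    coinUsed[a] = c;       // recorded only; never read while filling minCoins
                }
            }
        }
        // Walk back through coinUsed until the base case 0; minCoins is not consulted.
        List<Integer> coins = new ArrayList<>();
        for (int a = n; a > 0; a -= coinUsed[a]) {
            coins.add(coinUsed[a]);
        }
        return coins;
    }

    public static void main(String[] args) {
        System.out.println(coinsFor(15)); // [5, 10]
    }
}
```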

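Example: Matrix Multiplication • A 2x3 matrix times a 3x3 matrix gives a 2x3 result • The entry in row 1, col 2 of the result combines row 1 of the first matrix with col 2 of the second: 1x2 + 2x5 + 3x8 = 2 + 10 + 24 = 36 • rows from the first matrix • columns from the second matrix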

Example: Matrix Multiplication • “inner” dimensions must match • result is “outer” dimension • Examples: • (2 x 3) times (3 x 3) = 2 x 3 • (3 x 4) times (4 x 5) = 3 x 5 • (2 x 3) times (4 x 3) = cannot multiply • Question: Does AB = BA? • Hint: let A be a 2 x 3 matrix and B be a 3 x 2 matrix

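Example: Matrix Multiplication
    public static Matrix mult(Matrix m1, Matrix m2) {
        // inner dimensions must match: m1 is p x q, so m2 must be q x r
        if (m1.numCol() != m2.numRow())
            throw new IllegalArgumentException("inner dimensions must match");
        // result is p x r (assumes Matrix has a (rows, cols) constructor)
        Matrix result = new Matrix(m1.numRow(), m2.numCol());
        for (int i = 0; i < m1.numRow(); i++) {
            for (int j = 0; j < m2.numCol(); j++) {
                double total = 0.0;
                for (int k = 0; k < m1.numCol(); k++) {
                    total += m1.m[i][k] * m2.m[k][j];   // row i of m1 times column j of m2
                }
                result.m[i][j] = total;
            }
        }
        return result;
    }
• How many multiplications of matrix elements are performed?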
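A self-contained Java sketch of the same triple loop with a counter on the inner multiplication, using plain `double[][]` arrays in place of the slide's `Matrix` class (an assumption made so the code runs on its own). The sample entries are hypothetical, chosen so row 1 of the first matrix is (1, 2, 3) and column 2 of the second is (2, 5, 8), matching the 36 computed earlier.

```java
class MultCount {
    static long multiplications = 0;   // element multiplications performed so far

    // Triple-loop product of a (p x q) and b (q x r).
    static double[][] mult(double[][] a, double[][] b) {
        int p = a.length, q = b.length, r = b[0].length;
        if (a[0].length != q) {
            throw new IllegalArgumentException("inner dimensions must match");
        }
        double[][] result = new double[p][r];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < r; j++) {
                double total = 0.0;
                for (int k = 0; k < q; k++) {
                    total += a[i][k] * b[k][j];
                    multiplications++;   // executed p * r * q times in total
                }
                result[i][j] = total;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] c = mult(a, b);
        System.out.println(c[0][1] + " after " + multiplications + " multiplications"); // 36.0 after 18 multiplications
    }
}
```

A p x q matrix times a q x r matrix therefore costs p·q·r element multiplications, which is the cost used for every product in the chain examples that follow.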

Example: Matrix Multiplication • Given the following matrices: • A is 20  2 • B is 2  30 • C is 30  12 • D is 12  8 • Specifically, how many multiplications does it take to compute A  B  C  D ? • First thing you should ask is “can it even be done”?

Example: Matrix Multiplication • Matrix multiplication is associative: the way we group (parenthesize) the products does not change the result, although the order of the matrices themselves cannot change • A(B(C D)) or (A B)(C D) or A((B C)D) or ((A B)C)D or (A(B C))D • However, each of these takes a different number of multiplications: • A(B(CD)) = (30 x 12 x 8) + (2 x 30 x 8) + (20 x 2 x 8) = 3,680 • (AB)(CD) = (20 x 2 x 30) + (30 x 12 x 8) + (20 x 30 x 8) = 8,880 • A((BC)D) = (2 x 30 x 12) + (2 x 12 x 8) + (20 x 2 x 8) = 1,232 • ((AB)C)D = (20 x 2 x 30) + (20 x 30 x 12) + (20 x 12 x 8) = 10,320 • (A(BC))D = (2 x 30 x 12) + (20 x 2 x 12) + (20 x 12 x 8) = 3,120 • Obviously, there is an optimal solution • A((BC)D) = (2 x 30 x 12) + (2 x 12 x 8) + (20 x 2 x 8) = 1,232 • How do we figure out that this one is the best?

Example: Matrix Multiplication • At the top level, given 4 matrices, there are 3 ways of splitting this set into 2 subsets: • (A1) (A2 A3 A4) or (A1 A2) (A3 A4) or (A1 A2 A3) (A4) • The best way of parenthesizing this set of 4 is the minimum, over these splits, of: • Best(firstSet) + Best(secondSet) + the cost of multiplying the resulting 2 matrices • This is simply a recursive definition of the problem

Example: Matrix Multiplication • As an example: • A1 A2 A3 A4 A5 A6 • 5  2 2  3 3  4 4  6 6  7 7  8 • d0 d1 d1 d2 d2 d3 d3 d4 d4 d5 d5 d6 • There are 5 possible ways to parenthesize this expression, each one defined as: • Best(1, k) + Best(k+1, 6) + d0dkd6 for k  [1, 5] • We need to take the min of these: • Best(1, 6) = Min(Best(1, k) + Best(k+1, 6) + d0dkd6 ) for k  [1, 5]

Example: Matrix Multiplication • There was nothing in the previous work that forced the first matrix to be A1 and the last to be A6 • Thus, we can generalize this to be: • Best(i, j) = Min(Best(i, k) + Best(k+1, j) + d[i-1]*d[k]*d[j]) for k ∈ [i, j-1] • Best(i, i) = 0 // Base case

Example: Matrix Multiplication • We could develop a Divide-and-Conquer approach to solving this problem:
    public int best(int i, int j) {
        int result;
        if (i == j) {
            result = 0;                       // base case: a single matrix needs no multiplications
        } else {
            int min = Integer.MAX_VALUE;
            for (int k = i; k < j; k++) {     // try every split (Ai..Ak)(Ak+1..Aj)
                int next = best(i, k) + best(k + 1, j) + d[i - 1] * d[k] * d[j];
                min = Math.min(min, next);
            }
            result = min;
        }
        return result;
    }

Example: Matrix Multiplication • This approach computes the correct answer, but it does tons of repeated work: • Best(2, 5) takes the min of • Best(2,2) + Best(3,5) and • Best(2,3) + Best(4,5) and • Best(2,4) + Best(5,5) • But then Best(2,4) needs: • Best(2,2) + Best(3,4) and • Best(2,3) + Best(4,4) • Best(2,2) and Best(2,3) are already being computed twice, and this is just the tip of the iceberg • It turns out that this algorithm takes exponential time because of the repeated work
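To make the repeated work visible, here is a runnable Java version of the divide-and-conquer `best(i, j)` with a hypothetical `calls` counter added. It assumes `d` holds the dimensions so that Ai is d[i-1] x d[i]. Each chain of length L makes 1 + 2·(calls for all shorter lengths) calls, which works out to 3^(L-1): 27 calls for 4 matrices and 243 for 6.

```java
class NaiveChain {
    static int[] d;          // Ai is d[i-1] x d[i]
    static long calls = 0;   // how many times best() is entered

    static int best(int i, int j) {
        calls++;
        if (i == j) {
            return 0;                                  // base case
        }
        int min = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {                  // recompute both halves for every split
            min = Math.min(min, best(i, k) + best(k + 1, j) + d[i - 1] * d[k] * d[j]);
        }
        return min;
    }

    public static void main(String[] args) {
        d = new int[] {5, 2, 3, 4, 6, 7, 8};           // A1..A6 from the example
        System.out.println(best(1, 6) + " using " + calls + " calls"); // 348 using 243 calls
    }
}
```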

Example: Matrix Multiplication • So we can try Dynamic Programming • We start with the base cases • Best(i,i) = 0 for i ∈ [1, n] • Then we can use the recursive part to generate the rest of the Best values from the bottom up • The question is which values must already be computed in order to solve for a particular Best(i,j) • This is always the question in dynamic programming!

Example: Matrix Multiplication • Each number in the 2D table represents the minimum number of multiplications needed to compute Ai ... Aj • We are trying to get a value for the entire chain A1 to A6, so the value we want is Best(1,6), in the upper right corner of the table • There are no values in the bottom left triangle because Best(i, j) with i > j is not a possible sub-problem • Start by filling in the base cases (when i == j): the diagonal

Example: Matrix Multiplication • What can we fill in next? • Best(1,2) requires values for Best(1,1) and Best(2,2) • We have those values • So Best(1,2) = Best(1,1) + Best(2,2) + d[0]*d[1]*d[2] = 0 + 0 + (5*2*3) = 30 • A1 A2 A3 A4 A5 A6 • 5 x 2, 2 x 3, 3 x 4, 4 x 6, 6 x 7, 7 x 8 • d[0..6] = 5, 2, 3, 4, 6, 7, 8 • Best(i, j) = Min(Best(i, k) + Best(k+1, j) + d[i-1]*d[k]*d[j]) for k ∈ [i, j-1]

Example: Matrix Multiplication • The other values along that diagonal are filled in the same way: • Best(2,3) = 2*3*4 = 24 • Best(3,4) = 3*4*6 = 72 • Best(4,5) = 4*6*7 = 168 • Best(5,6) = 6*7*8 = 336

Example: Matrix Multiplication • Now we have enough values to fill in the next diagonal • Best(1,3) is the min of 2 possible values: A1 (A2 A3) or (A1 A2) A3 • Best(1,3) = Min(0 + 24 + 5*2*4, 30 + 0 + 5*3*4) = Min(64, 90) = 64 • So for each value in the table, we need the values to its left and below it to be previously computed • Filling in the next diagonal

Example: Matrix Multiplication • Add the other diagonals • For Best(1,6) there were 5 possible top-level splits: • (A1) (A2 A3 A4 A5 A6) • (A1 A2) (A3 A4 A5 A6) • (A1 A2 A3) (A4 A5 A6) • (A1 A2 A3 A4) (A5 A6) • (A1 A2 A3 A4 A5) (A6)

Example: Matrix Multiplication • The Best table we just built tells us that the optimal number of multiplications is 348 • But it doesn’t tell us how to parenthesize the chain to achieve this optimum • Much like the first table in NCoins told us the optimal number of coins to use, but not which ones they were • We need a second table in order to determine the optimal factorization of the matrices • We will store the “winner” (the best split point k) at each stage • Much like the second table we needed in NCoins

Example: Matrix Multiplication • Recovering the optimal factorization one split at a time: • A1 A2 A3 A4 A5 A6 • (A1) (A2 A3 A4 A5 A6) • (A1) ((A2 A3 A4 A5) (A6)) • (A1) (((A2 A3 A4) (A5)) (A6)) • (A1) ((((A2 A3) (A4)) (A5)) (A6)) • (A1) (((((A2) (A3)) (A4)) (A5)) (A6))

Example: Matrix Multiplication
    int[][] M;   // M is the optimal table: M[i][j] = min multiplications for Ai..Aj
    int[][] P;   // P tells you where to break: the best k for Ai..Aj
    public int minMult(int n, int[] d) {
        M = new int[n + 1][n + 1];                   // 1-indexed, so allocate n + 1
        P = new int[n + 1][n + 1];
        for (int i = 1; i <= n; i++) {               // base case: middle diagonal
            M[i][i] = 0;
        }
        for (int dia = 1; dia < n; dia++) {          // iterate through each diagonal
            for (int i = 1; i <= n - dia; i++) {     // fill in M & P
                int j = i + dia;
                int minM = Integer.MAX_VALUE;
                int p = -1;
                for (int k = i; k < j; k++) {        // find M[i][j] = min ...
                    int Mij = M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j];
                    if (Mij < minM) {
                        minM = Mij;
                        p = k;
                    }
                }
                M[i][j] = minM;
                P[i][j] = p;
            }
        }
        return M[1][n];
    }
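A self-contained Java sketch that pairs the bottom-up `minMult` fill with a hypothetical `paren(i, j)` helper that reads the split table P recursively, the same outside-in reconstruction shown for A1..A6. Static fields are an assumption made so the example runs on its own.

```java
class ChainOrder {
    static int[][] M, P;   // M: min multiplications, P: best split k (both 1-indexed)

    static int minMult(int n, int[] d) {
        M = new int[n + 1][n + 1];           // diagonal M[i][i] = 0 by default (base case)
        P = new int[n + 1][n + 1];
        for (int dia = 1; dia < n; dia++) {  // one diagonal at a time
            for (int i = 1; i <= n - dia; i++) {
                int j = i + dia;
                M[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    int cost = M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j];
                    if (cost < M[i][j]) {
                        M[i][j] = cost;
                        P[i][j] = k;         // remember the winning split
                    }
                }
            }
        }
        return M[1][n];
    }

    // Rebuild the parenthesization of Ai..Aj from P.
    static String paren(int i, int j) {
        if (i == j) {
            return "A" + i;
        }
        int k = P[i][j];
        return "(" + paren(i, k) + " " + paren(k + 1, j) + ")";
    }

    public static void main(String[] args) {
        System.out.println(minMult(6, new int[] {5, 2, 3, 4, 6, 7, 8})); // 348
        System.out.println(paren(1, 6)); // (A1 ((((A2 A3) A4) A5) A6))
    }
}
```

Run on the four-matrix example (d = 20, 2, 30, 12, 8), the same code reports 1232 and `(A1 ((A2 A3) A4))`, which is A((BC)D).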

Example: Edit Distance • Edit distance is a measure of how far one word is from another • It is the minimum number of character edits needed to make the two words match • A single “character edit” consists of either: • Insertion of a single character • sort → sport (insertion of p) • Deletion of a single character • sport → sort (deletion of p) • Changing a single character • computer → commuter (change p to m) • Can anyone think of an application for this idea?

Example: Edit Distance • There are many sequences of edits that can change one word into another • We want the optimal (minimal in this case) • And since we are going to accomplish this using Dynamic Programming the first thing we will need is a recursive definition of the problem • Edist(str1, str2) • Here is the recursive definition: • Edist(ε, ε) = 0 • Edist(str, ε) = Edist(ε, str) = | str | • Edist(str1+ch1, str2+ch2) = Min( Edist(str1, str2) + (0 if ch1==ch2, 1 otherwise), Edist(str1+ch1, str2) + 1, Edist(str1, str2+ch2) + 1 )

Example: Edit Distance • In every recursive case, we recurse with at least one string shorter by 1 character • Both are shorter in the change case • That means it will keep calling itself until at least 1 string reaches the empty string at which time our base case kicks in • However, implementing this with a recursive divide and conquer approach would lead to an exponential running time because of all the repeated work • So we try a Dynamic Programming approach instead • Start by filling in a table with the base case information • Fill in the rest of the table bottom up until you reach your goal solution • We will again have a 2D table • The dimensions will be the length of the first string +1 by the length of the second string +1 • The +1s are there because we need a spot in the table for ε

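Example: Edit Distance • First fill in the base cases: the ε row holds 0, 1, 2, 3 (the lengths of the prefixes of “cat”) and the ε column holds 0, 1, 2, 3, 4 (the lengths of the prefixes of “cake”) • Our goal is to find the edit distance from the entire string “cake” to the entire string “cat” • So the number we want to find is in the lower right corner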

Example: Edit Distance • The 3 recursive cases are: • Edist(i, j) uses • Edist(i-1, j-1) • Edist(i, j-1) • Edist(i-1, j) • Where i stands for the prefix of “cake” made of characters 1 through i (1-indexed) • And j is similar, but for “cat” • So for each cell, we need the values from the cells to the upper left, to the left, and above

Example: Edit Distance • Fill in the table row by row, left to right, so the cells to the upper left, to the left, and above are always ready

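Example: Edit Distance • Until we end up with the final table (rows: prefixes of “cake”, columns: prefixes of “cat”):
          ε   c   a   t
      ε   0   1   2   3
      c   1   0   1   2
      a   2   1   0   1
      k   3   2   1   1
      e   4   3   2   2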
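A minimal Java sketch of the bottom-up edit-distance table, following the recursive definition: row i and column j stand for the first i characters of the first string and the first j characters of the second. The class and method names are illustrative, not from the slides.

```java
class EditDistance {
    // e[i][j] = edit distance between the first i chars of s and the first j chars of t.
    static int[][] buildTable(String s, String t) {
        int[][] e = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) e[i][0] = i;   // Edist(str, ε) = |str|
        for (int j = 0; j <= t.length(); j++) e[0][j] = j;   // Edist(ε, str) = |str|
        for (int i = 1; i <= s.length(); i++) {              // row by row
            for (int j = 1; j <= t.length(); j++) {          // left to right
                int change = e[i - 1][j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                int delete = e[i - 1][j] + 1;                // drop s's i-th character
                int insert = e[i][j - 1] + 1;                // add t's j-th character
                e[i][j] = Math.min(change, Math.min(delete, insert));
            }
        }
        return e;
    }

    static int edist(String s, String t) {
        return buildTable(s, t)[s.length()][t.length()];    // lower right corner
    }

    public static void main(String[] args) {
        System.out.println(edist("cake", "cat")); // 2
    }
}
```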

Example: Edit Distance • We still need to find the actual sequence of edits that gives the minimum cost • We can do this without creating a new table • Trace back from the optimal value to find where it must have originated • Edit CAKE to CAT (steps found while tracing back from the lower right corner): • Change E to T • Delete K • A remains • C remains
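A minimal Java sketch of the trace-back, assuming the same table as above. From each cell it checks, in this order, whether the value could have come from the upper left (keep or change), from above (delete), or from the left (insert), so ties are broken in that order. Steps are returned in the order the trace finds them, last characters first, which matches the CAKE to CAT list.

```java
import java.util.*;

class EditTrace {
    static List<String> trace(String s, String t) {
        // Fill the table exactly as before.
        int[][] e = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) e[i][0] = i;
        for (int j = 0; j <= t.length(); j++) e[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int diff = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                e[i][j] = Math.min(e[i - 1][j - 1] + diff, Math.min(e[i - 1][j] + 1, e[i][j - 1] + 1));
            }
        }
        // Walk back from the lower right corner to (0, 0); no second table needed.
        List<String> steps = new ArrayList<>();
        int i = s.length(), j = t.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                char a = s.charAt(i - 1), b = t.charAt(j - 1);
                if (e[i][j] == e[i - 1][j - 1] + (a == b ? 0 : 1)) {   // came from the upper left
                    steps.add(a == b ? a + " remains" : "change " + a + " to " + b);
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && e[i][j] == e[i - 1][j] + 1) {                // came from above
                steps.add("delete " + s.charAt(i - 1));
                i--;
            } else {                                                   // came from the left
                steps.add("insert " + t.charAt(j - 1));
                j--;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(trace("CAKE", "CAT")); // [change E to T, delete K, A remains, C remains]
    }
}
```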

Example: Edit Distance • Pseudocode: • Your Turn! • Recursive Definition • Edist(ε, ε) = 0 • Edist(str, ε) = Edist(ε, str) = | str | • Edist(str1+ch1, str2+ch2) = Min( Edist(str1, str2) + (0 if ch1==ch2, 1 otherwise), Edist(str1+ch1, str2) + 1, Edist(str1, str2+ch2) + 1 ) • Table