15-211 Fundamental Data Structures and Algorithms

15-211Fundamental Data Structures and Algorithms Dynamic Programming Part 2 Ananda Guna March 20, 2003

Two main methods • To keep track of sub-instances • Memoizing : Store Partial Results in a hashtable for later reuse • Easier to keep track of sub-instances that are solved • A top-down approach • Dynamic Programming: Construct a table of Partial results • A bottom-up approach • More later..

Memoization • Name coined by Donald Michie, Univ of Edinburgh (1960s) • Fib is the Perfect Example • Useful for • game searches • evaluation functions • web caching

Dynamic programming • Underlying idea: • Use memoization so that overlap can be exploited. • As smaller subproblems are solved, solving the larger subproblems might get easier. • Can sometimes reduce seemingly exponential problems to polynomial time.

Using dynamic programming • Key ingredients: • Simple subproblems. • Problem can be broken into subproblems, typically with solutions that are easy to store in a table/array. • Subproblem optimization. • Optimal solution is composed of optimal subproblem solutions. • Subproblem overlap. • Optimal solutions to separate subproblems can have subproblems in common.

Homework Example - Again • Consider a homework assignment with different parts • Each part will have a point value and a time to do it. • You have 15 hours • Which parts should you do to maximize the points?

The Big question? • How to figure out the next row from the previous ones? • valueArray[i][t] = MAX { valueArray[i-1][t] OR // don't use item i { valueArray[i-1][t - time[i]] + value[i] //use item i.

Complete the table valueArray: t 0 1 2 3 4 5 6 7 8 9 15 +------------------------------------------------------ i :0| 0........... |------------------------------------------------------ 1| 0 0 0 7 7 |------------------------------------------------------ 2| 0 0 0 7 9 9 |------------------------------------------------------ 3| 0 0 5 7 9 12 14 |------------------------------------------------------ 4| 0 0 5 7 9 12 14 16 ...| • What is the solution if i = 7 & t = 15?

The recursive Program ComputeValue(N,T) // T = time left, N = # items still to choose from { if (T <= 0 || N = 0) return 0; if (time[N] > T) return ComputeValue(N-1,T); // can't use Nth item return max(value[N] + ComputeValue(N-1, T - Time[N]), ComputeValue(N-1, T)); } • What is the running time of this algorithm? This is exponetial

Using Memoizing ComputeValue(N,T) // T = time left, N = # items still to choose from { if (T <= 0 || N = 0) return 0; if (arr[N][T] != unknown) return arr[N][T]; // OK, we haven't computed it yet. Compute and store. if (time[N] > T) arr[N][T] = ComputeValue(N-1,T); else arr[N][T] = max(value[N] + ComputeValue(N-1, T - Time[N]), ComputeValue(N-1, T)); return arr[N][T]; } • What is the running time of this memoized version? • This is O(N*T)

Longest Common Subsequence Problem

CGATAATTGAGA A subsequence AAAG String subsequences • Consider DNA sequences. • Strings over the alphabet {A,C,G,T}. • Given a string X = x0x1x2…xn-1: • a subsequence of X is any string that is of the form xi1xi2…xik where ij<ij+1. • Example:

LCS = 6 LCS problem • The longest common subsequence problem: • Given two strings X and Y, find the longest string S that is a subsequence of both X and Y. CGATAATTGAGA GTTCCTAATA

A brute-force approach • A brute-force algorithm: • for each subsequence S of X: • determine whether S is a subsequence of Y. • if yes, then remember if it is the longest encountered so far. • Return the longest subsequence found. • Requires O(m2n) time! • There are 2n different subsequences of X. • Determining whether S is a subsequence of Y requires m time.

01234567891011 Y = CGATAATTGAGA GTTCCTAATA X = 0123456789 DP ingredients • Simple subproblems? • (that are easy to store in a table?) • Yes! • Let L(i,j) = LCS for the first i+1 letters of X and first j+1 letters of Y. • L(-1,j) = L(i,-1) = 0. L(8,9) = 5

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 Storing solutions in a table Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A memo table X 5

DP ingredients, cont’d • Subproblem optimization? • Yes! Optimal solution for L(i,j) is a combination of optimal solutions for L(k,l) for some k,l such that ki and lj. • Let X = x0x1x2…xn-1 and Y = y0y1y2…ym-1.

01234567891011 Y = CGATAATTGAGA L(8,10) = 5 GTTCCTAATA X = 0123456789 DP ingredients, cont’d • Subproblem optimization, Case 1: Example: Suppose i=9 and j=11 • If xi=yj, then • L(i,j) = L(i-1,j-1) + 1

012345678910 Y = CGATAATTGAG L(9,9) = 6 L(8,10) = 5 GTTCCTAATA X = 0123456789 Ingredients, cont’d • Subproblem optimization, Case 2: Example: Suppose i=9 and j=10 L(i,j) = max(L(i-1,j),L(i,j-1)).

DP ingredients, cont’d • Subproblem optimization? • Yes! Optimal solution for L(i,j) is a combination of optimal solutions for L(k,l) for some k,l such that ki and lj. • Let X = x0x1x2…xn-1 and Y = y0y1y2…ym-1. • Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1. • Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1)).

DP ingredients, cont’d • Subproblem overlap? • Yes! Solutions for L(i-1,j-1) can be re-used in computing L(i,j), L(i-1,j), and L(i,j-1).

We have the ingredients • So, we have all of the ingredients for an application of dynamic programming. • We will start with L(0,0) and fill out a memoization table of optimal solutions to all subproblems.

The problem 01234567891011 Y = CGATAATTGAGA GTTCCTAATA X = 0123456789

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Algorithm initialization Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 X

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Computing L(0,j) Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 X Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1 Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1))

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Computing L(1,j) Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 2 2 2 2 2 2 2 2 X Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1 Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1))

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Computing L(2,j) Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 2 2 2 2 2 2 2 2 0 1 1 2 2 2 3 3 3 3 3 3 X Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1 Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1))

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Computing L(3,j) Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 2 2 2 2 2 2 2 2 0 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 3 3 3 3 3 X Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1 Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1))

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 The complete table Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 2 2 2 2 2 2 2 2 0 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 4 4 4 4 4 1 1 2 2 3 3 3 4 4 5 5 5 1 1 2 2 3 4 4 4 4 5 5 6 1 1 2 3 3 4 5 5 5 5 5 6 1 1 2 3 4 4 5 5 5 6 6 6 X Case 1: If xi=yj, then • L(i,j) = L(i-1,j-1) + 1 Case 2: If xiyj, then • L(i,j) = max(L(i-1,j),L(i,j-1))

L -1 0 1 2 3 4 5 6 7 8 9 1011 -1 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 Reading the subsequence Y 01234567891011 Y = CGATAATTGAGA X = GTTCCTAATA 0 0 0 0 0 0 0 0 0 0 0 0123456789 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 2 2 2 2 2 2 2 2 0 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 3 3 3 3 3 1 1 1 2 2 2 3 4 4 4 4 4 1 1 2 2 3 3 3 4 4 5 5 5 1 1 2 2 3 4 4 4 4 5 5 6 1 1 2 3 3 4 5 5 5 5 5 6 1 1 2 3 4 4 5 5 5 6 6 6 X

Matrix Multiplication

Matrix Multiplication • If A is a mxp matrix and B is a pxn matrix, then AB is a mxn matrix. • If C = AB, then C is given by • C[i][j] =  A[i][k]*B[k][j] • O(m*n*p) algorithm. • There are speed up algorithms but we will skip them for now… i=1..m, j=1..n k=1…p

Matrix multiplication • Four matrices. Compute ABCD. • A: 50  10 B: 10  40 • C: 40  30 D: 30  5 • Note that matrix multiplication is associative, not commutative • Can compute the product in many ways • (((AB)C)D) (A((BC)D)) • ((AB)(CD)) (A(B(CD))) etc.

Question? • How many ways can we associate (ABCD)? • We can choose the shape of the tree (associativity) , but not the leaf order(commutativity). • Alternatively, how many full binary trees with n leaves are there? • Given by recursive formula P(1) = 1 • P(n) = P(i).P(n-i) (i=1..n-1) • It can be shown that P(n)=(n-3/2 4n)  A D B C A((BC)D)

Matrix multiplication • Cost of multiplication • M1: p x q M2: q x r • Naïve algorithm requires pqr multiplications • Example costs • (((AB)C)D) • 50x10x40 + 50x40x30 + 50x30x5 = 87,500 • (A(B(CD))) • 40x30x5 + 10x40x5 + 50x10x5 = 10,500

Problem formulation • Let Ai,j = AiAj (ij) • Let c(i,j) = cost of optimal solution for Ai,j • Let each Ai have dimension didi+1 • Then:

Example Suppose 5 matrices, with dimensions d = (10, 5, 1, 10, 2, 10) Then

c 1 2 3 4 5 1 2 3 4 5 0 0 0 Exercise: Fill in the rest! 0 0 Susceptible to dynamic programming? 50 d = (10, 5, 1, 10, 2, 10) 50 20 200

Dynamic programming solution public static void optMat (int[] d, long[][] c) { int n = d.length - 1; for (int left = 1; left <= n; left++) c[left][left] = 0; for (int k = 1; k < n; k++) // k is (right-left). for (int left = 1; left <= n - k; left++) { // For each position int right = left + k; c[left][right] = INFINITY; for (int i = left; i < right; i++) { long cost = c[left][i] + c[i+1][right] + d[left-1] * d[i] * d[right]; if(cost < c[left][right]) c[left][right] = cost; } } } public static final long INFINITY = Long.MAX_VALUE;

Optimal Binary Search Trees

Optimal BST’s • Given a BST and probabilities for each node, find the BST that minimizes the cost Example:

Possible BST This is clearly not the optimal cost

DP Solution • Optimal sub-problems giving the solution to the problem • That is, the optimal tree composed of root and at most two optimal sub-trees (left and right) • It turns out that a recurrence formula for the optimal tree that contains nodes ai..aj given by • Where pij=pi+pi+1 +….+pj • If you simplify an create a table for T(i,j), then T(1,n) is what we are looking for. • More in 15-451

After Spring Break • We will talk about Graphs • Read Chapter 14 • Work on Homework 5 • HAVE A GOOD BREAK

15-211 Fundamental Data Structures and Algorithms