A simple scheduling problem • Consider the following problem: • given a processor and a set of n tasks, each with a known time requirement, • determine how to schedule the tasks so that the sum of the completion times is minimized. • If the tasks all have different owners, this minimizes the sum of the owners' waiting times • and thus also the average of these times
A greedy solution algorithm • The optimal solution is to schedule the tasks in order of their time requirements, from shorter to longer. • That is, we always choose the shortest unconsidered item to perform next • This is a greedy algorithm • we don't consider whether scheduling a different task might help us in the long run
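As a sketch of this greedy rule (illustrative Python, not from Weiss; the task times below are made up), sort the tasks by time requirement and accumulate completion times:

```python
# Minimal sketch: schedule tasks shortest-first and report the
# resulting sum of completion times (illustrative, not Weiss's code).

def schedule_shortest_first(times):
    """Return (order, total): task indices shortest-first and the
    sum of completion times under that order."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    finished = 0       # completion time of the task just scheduled
    total = 0          # running sum of completion times
    for i in order:
        finished += times[i]
        total += finished
    return order, total

# Example: three tasks needing 3, 1, and 2 time units.
# Shortest-first gives completion times 1, 3, 6, so the sum is 10.
print(schedule_shortest_first([3, 1, 2]))   # ([1, 2, 0], 10)
```

Any other order puts some longer task before a shorter one and, by the swap argument on the next slide, can only increase the total.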
Correctness of the algorithm • Suppose that a schedule has a longer task L scheduled before a shorter task S. • Then swapping S and L will • move the completion time of every task between them earlier, • reduce the sum of S's and L's completion times, • and not affect any other task • So this change reduces the sum of the completion times • and therefore the schedule can't have been optimal
Correctness of greedy algorithms • Correctness proofs of other greedy algorithms are more complicated, but not greatly so. • Typical complications are • There isn't usually a simple notion like "unsorted" to describe the concept of "not constructed by the greedy algorithm" • There may be several optimal solutions
Structure of correctness proofs • In general, correctness proofs work like so: • Let G be the solution obtained by the greedy algorithm. • Let O be an optimal solution. • If O = G, we're done • If O ≠ G, then there's another optimal solution O' that is closer to G than O is. Continue modifying O until it equals G
Formal structure of correctness proofs • Formally, the proofs are minimal counterexample proofs • We let O be the optimal solution that's closest to G • The trick is to define "closer to" appropriately. • Also, we do need to make sure that any optimal solution can be changed to G in a finite number of steps.
Extensions of the scheduling problem • A similar algorithm works if there are P processors available • One might modify this version of the problem to ask how to minimize the overall completion time • In this case, the problem is NP-complete!
Minimum-length encoding • If an m-bit character code like ASCII or Unicode is used to encode a string of length n, the encoding will have length mn. • It's possible to get a shorter expected length by using a variable-length character code. • The trick is to give the more frequent characters shorter codes.
The Huffman algorithm • Suppose that the probability of the ith character ci is known (or estimated) to be pi • Suppose also that the jth character in the string is independent of all the others • Then there's a greedy algorithm that takes {pi} and constructs a character code with the minimum expected encoding length
Huffman output as a tree • We may represent a variable-length character code as a binary tree T, as in Figure 10.11 of Weiss • this representation automatically tells where one character code ends and the next begins • The quantity to be minimized is c(T) = Σ pidi • where di is the depth of ci in the tree • The Huffman algorithm doesn't assume the pi are probabilities (i.e., that their sum is 1)
The Huffman algorithm – details • The Huffman algorithm maintains a set of binary trees, each with a weight • Initially the trees have size 1 with roots ci and weights pi • Until the set has size 1, the algorithm • takes the two trees of least weight pj and pk, • installs them as subtrees of a new tree with weight pj + pk, and • replaces the two trees by the new tree
The Huffman algorithm – summary • The algorithm's output is the final remaining tree (whose weight is Σ pi, and whose cost as a code tree is c(T) = Σ pidi) • The algorithm makes C-1 passes if there are C characters • If a heap is used, the cost per iteration is O(log C), for an overall cost of O(C log C) • the time to build the initial heap can be ignored • Note that C is quite small compared to the length of the typical encoded string • so a naïve minimization algorithm isn't very costly
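A minimal sketch of the merge loop using Python's heapq (an assumed implementation choice, not Weiss's code); it returns only the cost Σ pidi rather than the tree itself:

```python
# Greedy merge loop on a heap of weights; returns sum of p_i * d_i.
import heapq

def huffman_cost(weights):
    if len(weights) < 2:
        return 0
    heap = list(weights)
    heapq.heapify(heap)                  # building the initial heap is cheap
    cost = 0
    while len(heap) > 1:                 # C - 1 merge passes
        pj = heapq.heappop(heap)         # two trees of least weight
        pk = heapq.heappop(heap)
        cost += pj + pk                  # each merge deepens both subtrees by 1
        heapq.heappush(heap, pj + pk)    # replace them by the merged tree
    return cost

# Example with four characters; the weights may be counts rather than
# probabilities, as noted above.
print(huffman_cost([1, 2, 3, 4]))   # 19
```

Building the actual code tree only requires keeping a node object alongside each weight in the heap.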
Correctness of the Huffman algorithm – overview • Let N be the smallest input size for which the algorithm fails, and let T be an optimal tree of N leaves for an instance where it fails • We may construct a tree T’ for an instance of size N-1 by • swapping the two leaves of least weight pj and pk with two siblings at the lowest level • replacing the new parent of these leaves with a leaf of weight pj + pk
Contradiction to Huffman failure • If H and H' are the Huffman trees for the old and new instances, we have c(T') = c(T)-pj-pk < c(H)-pj-pk = c(H') ≤ c(T') • the first equality comes from T's optimality, and the definition of c • the first inequality from the nonoptimality of H • the second equality from the way Huffman works • the second inequality from the minimality of N • So c(T') < c(T') • contradicting the assumption that Huffman fails
Recurrences in divide-and-conquer • We’ve seen that the time complexity of divide-and-conquer algorithms can typically be expressed by a recurrence • e.g., T(n) = 2T(n/2) + cn for mergesort • There’s a master theorem that gives closed form solutions for many such recurrences • it’s given in Weiss as Theorem 10.6, p. 449
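For reference, a standard statement of the theorem (consistent with how it is applied on the next slide; see Weiss, Theorem 10.6, for the precise hypotheses):

```latex
% Master theorem for T(n) = a T(n/b) + \Theta(n^k), with a \ge 1, b > 1, k \ge 0
T(n) = a\,T(n/b) + \Theta(n^k)
\quad\Longrightarrow\quad
T(n) =
\begin{cases}
O\!\left(n^{\log_b a}\right) & \text{if } a > b^k,\\[2pt]
O\!\left(n^{k}\log n\right)  & \text{if } a = b^k,\\[2pt]
O\!\left(n^{k}\right)        & \text{if } a < b^k.
\end{cases}
```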
Recurrences in divide-and-conquer • For T(n) = 2T(n/2) + cn • we have a=2, b=2, and k=1 • and thus T(n) is O(n log n), since a = b^k • For binary search in an array, we have • T(n) = T(n/2) + c, and thus a=1, b=2, and k=0 • so T(n) is O(log n), since a = b^k • For Sec. 7.7.6 selection, if all splits are even • T(n) = T(n/2) + cn, and thus a=1, b=2, and k=1 • so T(n) is O(n), since a < b^k
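A small helper (illustrative, not from Weiss) that mechanically applies the three cases to the examples above:

```python
from math import log

def master_theorem(a, b, k):
    """Classify T(n) = a*T(n/b) + Theta(n^k) by the master theorem."""
    if a > b ** k:
        return f"O(n^{log(a, b):.3g})"      # exponent is log_b(a)
    if a == b ** k:
        return f"O(n^{k} log n)"
    return f"O(n^{k})"

print(master_theorem(2, 2, 1))   # mergesort:          O(n^1 log n)
print(master_theorem(1, 2, 0))   # binary search:      O(n^0 log n), i.e. O(log n)
print(master_theorem(1, 2, 1))   # selection, even splits: O(n^1)
```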
Multiplying integers • To multiply two n-digit numbers, the conventional multiplication algorithm uses n² multiplications • This number can be reduced by a divide-and-conquer strategy • at the cost of extra (linear-time) additions
Divide-and-conquer multiplication • Splitting the digits of each factor evenly corresponds to writing the factors as (a + b∙10^(n/2)) and (c + d∙10^(n/2)) • where a, b, c, and d all have length n/2 • and the product is ac + [ad+bc]∙10^(n/2) + bd∙10^n • Our recurrence is then T(n) = 4T(n/2) + cn • since there are 4 subproblems, and some addition • The master theorem gives that T(n) = O(n²) • which is not an asymptotic improvement
A trick • We can get away with only 3 subproblems • namely ac, bd, and (a+b)(c+d) • The desired product is • ac + [(a+b)(c+d)-ac-bd]∙10^(n/2) + bd∙10^n • Our recurrence is then T(n) = 3T(n/2) + cn • with 3 subproblems, addition, and subtraction • The master theorem gives that T(n) = O(n^lg 3) • where lg 3 = log₂3 ≈ 1.59 • which is an asymptotic improvement
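A sketch of this three-subproblem scheme (Karatsuba multiplication) for nonnegative integers in base 10; the code is illustrative, not Weiss's:

```python
# Three-multiplication trick: split each factor as a + b*10^half,
# recurse on ac, bd, and (a+b)(c+d) (illustrative sketch).

def karatsuba(x, y):
    if x < 10 or y < 10:                 # one-digit factor: multiply directly
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    shift = 10 ** half
    b, a = divmod(x, shift)              # x = a + b*10^half
    d, c = divmod(y, shift)              # y = c + d*10^half
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    cross = karatsuba(a + b, c + d) - ac - bd        # = ad + bc
    return ac + cross * shift + bd * shift * shift   # bd carries 10^(2*half)

print(karatsuba(1234, 5678) == 1234 * 5678)   # True
```

The same splitting works in base 2, or with the variable x for polynomials, as the next slide notes.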
A technique* • This approach may be used for binary multiplication • with 2 replacing 10 • And polynomial multiplication • with the variable x replacing 10 • And matrix multiplication • with 7 replacing the 8 of ordinary divide & conquer • giving an exponent of log₂7 rather than 3 = log₂8 • the 7 subproblems were hard to find • and aren't very useful in practice * a trick that works more than once
Divide & conquer algorithms • Relatively easy to design • Usually possible • especially when processing recursive structures • Relatively easy to prove correct • Analysis is often straightforward • using special techniques, or Master Theorem 10.6 • Often rather efficient • especially for processing recursive structures • but sometimes subproblems recur frequently
Divide & conquer algorithms – good examples • Good examples • binary search • mergesort • Strassen’s matrix multiplication
Dynamic programming algorithms • Often easy to design • usually implemented bottom up • Usually possible • although not always better than divide & conquer • Analysis is often straightforward • time complexity is often that of constructing a table
Dynamic programming algorithms • Not always highly efficient • except in the case of repeated subproblems • when they can be more efficient than divide-and-conquer versions • e.g., Fibonacci numbers & binomial coefficients • Good examples • Floyd's algorithm (all-pairs shortest paths) • Optimal binary search tree
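A quick illustration of the repeated-subproblem point with Fibonacci numbers (a minimal sketch, not from the slides): the bottom-up table computes each value once, whereas the naive recursion recomputes the same subproblems exponentially often.

```python
# Bottom-up dynamic programming for Fibonacci numbers: fill a table
# of size n+1, so each subproblem is solved exactly once.

def fib_table(n):
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]   # reuse earlier entries
    return table[n]

print(fib_table(10))   # 55
```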
Greedy algorithms • Often easy to design • Often aren't available • Often difficult to prove correct, except perhaps by minimal counterexample, i.e. • find the optimal solution closest to the greedy solution (e.g., with the latest first difference) • show that if it differs from the greedy solution, it can be modified to look more like that solution, while remaining optimal (a contradiction)
Greedy algorithms • Usually easy to analyze • Usually efficient • Good examples • Dijkstra's algorithm (single-source shortest paths) • Prim's and Kruskal's algorithms (minimum-cost spanning trees) • Huffman’s algorithm (minimum-length encoding)
Randomized algorithms • Not always easy to design • Not always helpful • Not always correct • and probability of correctness often hard to find • Analysis of average-case behavior is often awkward • this is generally the relevant metric • Often more efficient for a given problem than other algorithms • Good examples • quicksort (with randomly chosen pivot)
Branch-and-bound algorithms (backtracking) • Usually easy to design • Usually available • Easy to prove correct • Usually easy to analyze in worst-case • but often average-case behavior is most relevant • this is usually not easy to analyze • Often inefficient
A family of dynamic programming examples • Dynamic programming algorithms consider all possible subproblems • whether or not they arise from the main problem • Often this is wasteful • It’s less so if only a few subproblems exist • This holds when optimizing over a sequence • where subproblems correspond to subsequences • and there are only quadratically many of them • which may occur many times as subproblems
Optimizing over a sequence • Examples include finding the optimal … • BST with a given (inorder) traversal • the expected search time is to be minimized • Order of a chain of identical operations • the cost of performance is to be minimized • Parse tree for a given sentence • the “optimal” parse may mean the intended parse • or it may be enough to find any parse • the CYK algorithm is often presented in CS 154
The principle of optimality • Dynamic programming solutions depend on a principle of optimality • This principle states that solutions of subproblems of an optimization problem are acceptable iff they’re optimal • If it holds, then we need only solve subproblems recursively, without worrying about how their solutions interact
Ordering matrix multiplications • Matrix products can take greatly varying times to compute, depending on the ordering • Here we measure the cost in terms of the number of element multiplications • so if A is p by q and B is q by r, the cost of finding AB is pqr • If p=2, q=5, r = 10, and C is 10 by 20 • finding ABC left to right costs 100 + 400 = 500 • finding ABC right to left costs 1000 + 200 = 1200
The overall cost of multiplication • Given a sequence (Ak) of matrices: • Performing the ith multiplication last gives a total cost M1,n equal to • the cost of multiplying A1 through Ai, plus • the cost of multiplying Ai+1 through An, plus • the cost of multiplying these two products
A recurrence for the cost • More generally • indices needn't start at 1 and needn't end at n • we want the best value of i • Using Weiss's notation for dimensions, this gives his recurrence in mid-page 467: • ML,R = min { ML,i + Mi+1,R + cL-1∙ci∙cR } • where the minimum is taken over L ≤ i < R • So a 2-dimensional table is needed
Final details of the algorithm • Initially, we have that Mk,k = 0 • ML,R is computed in increasing order of R-L • The final cost is given in M1,n • The time required is O(n³) -- in fact, Θ(n³) • O(n) per entry for O(n²) entries • A tree giving the order of multiplication may be found from the parallel table lastChange • whose value for L and R is the i giving the minimum value of ML,R
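A sketch of this table computation (illustrative Python; variable names are not Weiss's). The dimension list follows the cL-1-by-cL convention, and the demo uses the dimensions from Weiss's example on the next slide:

```python
# dims[L-1] x dims[L] is the size of the Lth matrix, matching c_{L-1}, c_L above.

def matrix_chain(dims):
    n = len(dims) - 1
    INF = float("inf")
    M = [[0] * (n + 1) for _ in range(n + 1)]       # M[L][R], 1-based
    last = [[0] * (n + 1) for _ in range(n + 1)]    # i giving the minimum (lastChange)
    for span in range(1, n):                        # increasing order of R - L
        for L in range(1, n - span + 1):
            R = L + span
            M[L][R] = INF
            for i in range(L, R):                   # split after the ith matrix
                cost = M[L][i] + M[i + 1][R] + dims[L - 1] * dims[i] * dims[R]
                if cost < M[L][R]:
                    M[L][R], last[L][R] = cost, i
    return M, last

# Weiss's example: matrices of sizes 50x10, 10x40, 40x30, 30x5.
M, last = matrix_chain([50, 10, 40, 30, 5])
print(M[1][4], last[1][4])   # 10500 1  (multiply A1 by the product of A2..A4)
```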
The table for Weiss's example • Weiss's example, p. 466, tabulates the values ML,R for these matrices • Recall that the dimensions are • 50 by 10, 10 by 40, 40 by 30, and 30 by 5 • The upper-right entries are found by comparing • 27000 vs 80000 (for M1,3), 8000 vs 13500 (for M2,4), and 10500 vs 36000 vs 34500 (for M1,4)
Optimal BSTs • To measure the expected search time in a BST T, we need to know (or estimate) • the probability of successful search for each key, • and the probability of unsuccessful search between any two adjacent keys • Weiss assumes this latter probability to be 0 • as will we
Computing the cost of an optimal BST • Then the cost to be minimized is Σ pi(1+di) • where the probability of key i being sought is pi • and its depth in the tree is di • This cost may be computed recursively in terms of the costs of T's two subtrees • The difference between their sum and the cost of T is just Σ pi (taken over all keys in T) • since the keys in the LST and RST increase their depth by 1, and the root enters at depth 0
A recurrence for the cost of an optimal BST • Let CL,R be the cost of the optimal BST containing keys L through R • Then CL,R = min { CL,i-1 + Ci+1,R + Σ pk } • where k and i run from L through R • note that the sum is independent of i • We initialize the 2D table with CL,L = pL • or better, CL,L-1 = 0 -- such entries are needed • The overall cost is found in C1,n • The time required is again Θ(n³)
The table for a version of Weiss's example • Suppose we restrict Weiss's example, p. 469, to the first two and last two keys, doubling all probabilities • Then the probabilities, in order, are 0.44, 0.36, 0.04, 0.16, and the final entry C1,4 of the resulting cost table is • 1.68 = min{1+0+0.8, 1+0.44+0.24, 1+1.16+0.16, 1+1.28+0}
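A sketch of this computation in Python (illustrative, not Weiss's code): it fills the C table from the recurrence above and reproduces the 1.68 value for the restricted example, up to floating-point rounding.

```python
# Optimal-BST cost table, following the recurrence on the earlier slide.

def optimal_bst_cost(p):
    n = len(p)                                # keys 1..n, with p[k-1] = p_k
    # C[L][R]; the C[L][L-1] = 0 entries are needed, hence the extra row/column
    C = [[0.0] * (n + 2) for _ in range(n + 2)]
    for L in range(1, n + 1):
        C[L][L] = p[L - 1]
    for span in range(1, n):                  # increasing order of R - L
        for L in range(1, n - span + 1):
            R = L + span
            weight = sum(p[L - 1:R])          # sum of p_k for k = L..R
            C[L][R] = weight + min(C[L][i - 1] + C[i + 1][R]
                                   for i in range(L, R + 1))
    return C[1][n]

# Probabilities from the restricted example above.
print(f"{optimal_bst_cost([0.44, 0.36, 0.04, 0.16]):.2f}")   # 1.68
```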