§1 Greedy Algorithms

2. Huffman Codes – for file compression

〖Example〗 Suppose our text is a string of length 1000 that comprises the characters a, u, x, and z. Then it will take 8000 bits to store the string as 1000 one-byte characters.

We may encode the symbols as a = 00, u = 01, x = 10, z = 11. For example, aaaxuaxz is encoded as 0000001001001011. Then the space taken by the string with length 1000 will be 2000 bits + space for the code table. /* log C bits are needed in a standard encoding, where C is the size of the character set */ Notice that we have only 4 distinct characters in that string, hence we need only 2 bits to identify them.

frequency ::= number of occurrences of a symbol. In the string aaaxuaxz, f(a) = 4, f(u) = 1, f(x) = 2, f(z) = 1.

The size of the coded string can be reduced using variable-length codes, for example, a = 0, u = 110, x = 10, z = 111, which encodes aaaxuaxz as 00010110010111.

Note: If all the characters occur with the same frequency, then there are not likely to be any savings.

1/17
§1 Greedy Algorithms

[Figure: the original code a = 00, u = 01, x = 10, z = 11 represented in a binary tree /* trie */]

Cost( aaaxuaxz = 0000001001001011 ) = 2·4 + 2·1 + 2·2 + 2·1 = 16

Find the full binary tree of minimum total cost where all characters are contained in the leaves. If character Ci is at depth di and occurs fi times, then the cost of the code = Σ di·fi .

[Figure: the optimal code a = 0, u = 110, x = 10, z = 111 represented in a binary tree]

Cost( aaaxuaxz = 00010110010111 ) = 1·4 + 3·1 + 2·2 + 3·1 = 14

Now, with a = 0, u = 110, x = 10, z = 111 and the string 00010110010111, can you decode it? The answer is aaaxuaxz. What makes this decoding method work? The trick is: no code is a prefix of another. All nodes either are leaves or have two children. Any sequence of bits can always be decoded unambiguously if the characters are placed only at the leaves of a full tree – such a code is called a prefix code.

2/17
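For concreteness, here is a minimal C sketch (not part of the original slides) of prefix-code decoding by walking the trie: start at the root, follow one edge per bit, and emit a character each time a leaf is reached. The node layout and the helpers NewNode, Insert, and Decode are illustrative, and the input is assumed to be a valid encoding.

    #include <stdio.h>
    #include <stdlib.h>

    /* A trie node: characters live only at the leaves, so a node is a leaf
       exactly when both children are NULL. */
    typedef struct TrieNode {
        struct TrieNode *child[2];   /* child[0] for bit '0', child[1] for bit '1' */
        char symbol;                 /* valid only at a leaf */
    } TrieNode;

    static TrieNode *NewNode(void)
    {
        return calloc(1, sizeof(TrieNode));
    }

    /* Insert one codeword (e.g. "110" -> 'u') into the trie. */
    static void Insert(TrieNode *root, const char *code, char symbol)
    {
        for (; *code; code++)
        {
            int bit = *code - '0';
            if (root->child[bit] == NULL)
                root->child[bit] = NewNode();
            root = root->child[bit];
        }
        root->symbol = symbol;
    }

    /* Decode a bit string by walking from the root; every time a leaf is
       reached, emit its symbol and restart from the root. */
    static void Decode(const TrieNode *root, const char *bits)
    {
        const TrieNode *t = root;
        for (; *bits; bits++)
        {
            t = t->child[*bits - '0'];
            if (t->child[0] == NULL && t->child[1] == NULL)
            {
                putchar(t->symbol);
                t = root;
            }
        }
        putchar('\n');
    }

    int main(void)
    {
        TrieNode *root = NewNode();
        Insert(root, "0",   'a');
        Insert(root, "10",  'x');
        Insert(root, "110", 'u');
        Insert(root, "111", 'z');
        Decode(root, "00010110010111");   /* prints aaaxuaxz */
        return 0;
    }

Because no codeword is a prefix of another, the walk can never emit a character too early.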
§1 Greedy Algorithms

Huffman's Algorithm (1952)

void Huffman ( PriorityQueue heap[ ], int C )
{   consider the C characters as C single-node binary trees,
    and initialize them into a min heap;
    for ( i = 1; i < C; i++ ) {
        create a new node;
        /* be greedy here */
        delete root from min heap and attach it to left_child of node;
        delete root from min heap and attach it to right_child of node;
        weight of node = sum of weights of its children;
        /* weight of a tree = sum of the frequencies of its leaves */
        insert node into min heap;
    }
}

T = O( C log C )

3/17
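As a concrete illustration (not part of the original slides), here is a small runnable C sketch of the same greedy construction, applied to the a/u/x/z frequencies from slide 2. To keep it short, the two lightest trees are found by a linear scan instead of a min heap, so this version is O( C^2 ) rather than O( C log C ); the array layout and the helper names BuildHuffman, SmallestRoot, and PrintCodes are illustrative.

    #include <stdio.h>

    #define MAXC 64

    /* Nodes 0 .. C-1 are the leaves (one per character); internal nodes are
       appended after them, so at most 2C - 1 nodes are ever needed. */
    static int  weight[2 * MAXC];
    static int  lchild[2 * MAXC], rchild[2 * MAXC];
    static char symbol[MAXC];

    /* Index of the smallest-weight tree among the live roots. */
    static int SmallestRoot(int nnodes, const int live[])
    {
        int i, best = -1;
        for (i = 0; i < nnodes; i++)
            if (live[i] && (best < 0 || weight[i] < weight[best]))
                best = i;
        return best;
    }

    /* Build the Huffman tree and return the index of its root.  A min heap
       would make this O(C log C); the linear scans here are O(C^2). */
    static int BuildHuffman(int C)
    {
        int live[2 * MAXC] = { 0 };
        int i, a, b, nnodes = C;

        for (i = 0; i < C; i++)
            live[i] = 1;
        for (i = 1; i < C; i++)            /* C - 1 merges, as in the pseudocode */
        {
            a = SmallestRoot(nnodes, live);  live[a] = 0;
            b = SmallestRoot(nnodes, live);  live[b] = 0;
            lchild[nnodes] = a;
            rchild[nnodes] = b;
            weight[nnodes] = weight[a] + weight[b];  /* weight = sum of children */
            live[nnodes] = 1;
            nnodes++;
        }
        return nnodes - 1;
    }

    /* Walk the tree and print the codeword of every leaf. */
    static void PrintCodes(int t, int C, char code[], int depth)
    {
        if (t < C)                          /* a leaf */
        {
            code[depth] = '\0';
            printf("%c : %s\n", symbol[t], code);
            return;
        }
        code[depth] = '0'; PrintCodes(lchild[t], C, code, depth + 1);
        code[depth] = '1'; PrintCodes(rchild[t], C, code, depth + 1);
    }

    int main(void)
    {
        char code[MAXC];
        symbol[0] = 'a'; weight[0] = 4;
        symbol[1] = 'u'; weight[1] = 1;
        symbol[2] = 'x'; weight[2] = 2;
        symbol[3] = 'z'; weight[3] = 1;
        PrintCodes(BuildHuffman(4), 4, code, 0);
        return 0;
    }

With f(a) = 4, f(u) = 1, f(x) = 2, f(z) = 1 this prints exactly the variable-length code used earlier: a = 0, x = 10, u = 110, z = 111.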
§1 Greedy Algorithms

〖Example〗

  Ci :  a    e    i    s    t    sp   nl
  fi :  10   15   12   3    4    13   1

[Figure: the step-by-step greedy merging of the seven single-node trees; the finished Huffman tree has total weight 58]

The resulting code:

  a  : 111
  e  : 10
  i  : 00
  s  : 11011
  t  : 1100
  sp : 01
  nl : 11010

Cost = 3·10 + 2·15 + 2·12 + 5·3 + 4·4 + 2·13 + 5·1 = 146

4/17
§1 Greedy Algorithms

3. Approximate Bin Packing

The Knapsack Problem

A knapsack with a capacity M is to be packed. Given N items, each item i has a weight wi and a profit pi. If xi is the percentage of item i being packed, then the packed profit will be pi·xi.

An optimal packing is a feasible one with maximum profit. That is, we are supposed to find the values of xi such that Σ pi·xi obtains its maximum under the constraints Σ wi·xi ≤ M and 0 ≤ xi ≤ 1.

Sunny Cup 2004: http://acm.zju.edu.cn/show_problem.php?pid=2109

〖Example〗 n = 3, M = 20, (p1, p2, p3) = (25, 24, 15), (w1, w2, w3) = (18, 15, 10)

Q: What must we do in each stage?
A: Pack one item into the knapsack.

Q: On which criterion shall we be greedy?
  maximum profit?
  minimum weight?
  maximum profit density pi / wi  –  this gives ( x1, x2, x3 ) = ( 0, 1, 1/2 ) with P = 31.5

5/17
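A minimal C sketch (not part of the original slides) of the profit-density greedy for the fractional knapsack: sort by pi / wi in non-increasing order, take whole items while they fit, and take a fraction of the first item that does not. The struct layout and the names FractionalKnapsack and ByDensityDesc are illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        double w;   /* weight */
        double p;   /* profit */
    } Item;

    /* Sort items by profit density p/w, largest first. */
    static int ByDensityDesc(const void *a, const void *b)
    {
        const Item *x = a, *y = b;
        double dx = x->p / x->w, dy = y->p / y->w;
        return (dx < dy) - (dx > dy);
    }

    /* Greedy fractional knapsack: fill with the densest items first,
       taking a fraction of the last item that still fits. */
    static double FractionalKnapsack(Item item[], int n, double M)
    {
        double profit = 0.0, room = M;
        int i;
        qsort(item, n, sizeof(Item), ByDensityDesc);
        for (i = 0; i < n && room > 0; i++)
        {
            double take = item[i].w < room ? item[i].w : room;  /* whole item or a fraction */
            profit += item[i].p * (take / item[i].w);
            room   -= take;
        }
        return profit;
    }

    int main(void)
    {
        Item item[3] = { {18, 25}, {15, 24}, {10, 15} };   /* the slide's example */
        printf("P = %.1f\n", FractionalKnapsack(item, 3, 20.0));  /* prints P = 31.5 */
        return 0;
    }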
§1 Greedy Algorithms

The Bin Packing Problem

Given N items of sizes S1, S2, …, SN, such that 0 < Si ≤ 1 for all 1 ≤ i ≤ N, pack these items in the fewest number of bins, each of which has unit capacity. This problem is NP-hard.

〖Example〗 N = 7; Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8

[Figure: an optimal packing uses 3 bins: { 0.8, 0.2 }, { 0.7, 0.3 }, { 0.5, 0.4, 0.1 }]

6/17
§1 Greedy Algorithms

On-line Algorithms

Place an item before processing the next one, and the decision can NOT be changed later.

〖Example〗 Si = 0.4, 0.4, 0.6, 0.6

[Figure: the optimal packing of 0.4, 0.4, 0.6, 0.6 uses two bins, { 0.4, 0.6 } and { 0.4, 0.6 }]

You never know when the input might end. Hence an on-line algorithm cannot always give an optimal solution.

【Theorem】There are inputs that force any on-line bin-packing algorithm to use at least 4/3 the optimal number of bins.

7/17
§1 Greedy Algorithms

Next Fit

void NextFit ( )
{   read item1;
    while ( read item2 ) {
        if ( item2 can be packed in the same bin as item1 )
            place item2 in the bin;
        else
            create a new bin for item2;
        item1 = item2;
    } /* end-while */
}

【Theorem】Let M be the optimal number of bins required to pack a list I of items. Then next fit never uses more than 2M bins. There exist sequences such that next fit uses 2M – 2 bins.

8/17
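A runnable C version of next fit (an illustrative sketch, not the textbook's code): only the most recently opened bin is remembered, so each item is handled in O(1).

    #include <stdio.h>

    /* Next fit: keep only the most recently opened bin; if the new item does
       not fit there, close that bin and open a new one. */
    static int NextFit(const double size[], int n)
    {
        int bins = 0, i;
        double room = 0.0;          /* remaining capacity of the current bin */
        for (i = 0; i < n; i++)
        {
            if (size[i] > room)     /* does not fit: open a new bin */
            {
                bins++;
                room = 1.0;
            }
            room -= size[i];
        }
        return bins;
    }

    int main(void)
    {
        double s[] = { 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 };
        printf("next fit uses %d bins\n", NextFit(s, 7));   /* 5 bins for this sequence */
        return 0;
    }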
§1 Greedy Algorithms

First Fit

void FirstFit ( )
{   while ( read item ) {
        scan for the first bin that is large enough for item;
        if ( found )
            place item in that bin;
        else
            create a new bin for item;
    } /* end-while */
}

Can be implemented in O( N log N ).

【Theorem】Let M be the optimal number of bins required to pack a list I of items. Then first fit never uses more than 17M / 10 bins. There exist sequences such that first fit uses 17(M – 1) / 10 bins.

Best Fit

Place a new item in the tightest spot among all bins. T = O( N log N ) and bin no. < 1.7M.

9/17
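A simple C sketch of first fit (illustrative, not the textbook's code). This version scans the open bins linearly, which is O( N^2 ); the O( N log N ) bound quoted above requires keeping the bins in a balanced search structure instead of this scan.

    #include <stdio.h>

    #define MAXBINS 1000

    /* First fit: place the item in the first open bin that still has room. */
    static int FirstFit(const double size[], int n)
    {
        double room[MAXBINS];       /* remaining capacity of each open bin */
        int bins = 0, i, j;
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < bins; j++)
                if (size[i] <= room[j])
                    break;          /* first bin that is large enough */
            if (j == bins)          /* none found: open a new bin */
                room[bins++] = 1.0;
            room[j] -= size[i];
        }
        return bins;
    }

    int main(void)
    {
        double s[] = { 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 };
        printf("first fit uses %d bins\n", FirstFit(s, 7));   /* 4 bins for this sequence */
        return 0;
    }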
§1 Greedy Algorithms

〖Example〗 Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8

[Figure: the packings produced by next fit (5 bins), first fit (4 bins), and best fit (4 bins) for this sequence]

〖Example〗 Si = 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε, 1/7+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/3+ε, 1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε, 1/2+ε, where ε = 0.001. The optimal solution requires 6 bins. However, all three on-line algorithms require 10 bins.

10/17
§1 Greedy Algorithms

Off-line Algorithms

View the entire item list before producing an answer.

Trouble-maker: the large items.
Solution: Sort the items into a non-increasing sequence of sizes, then apply first (or best) fit – first (or best) fit decreasing.

〖Example〗 Si = 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 is sorted into 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1.

[Figure: first fit decreasing packs this sequence into 3 bins: { 0.8, 0.2 }, { 0.7, 0.3 }, { 0.5, 0.4, 0.1 }]

【Theorem】Let M be the optimal number of bins required to pack a list I of items. Then first fit decreasing never uses more than 11M / 9 + 4 bins. There exist sequences such that first fit decreasing uses 11M / 9 bins.

Simple greedy heuristics can give good results.

11/17
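First fit decreasing is then just a sort followed by first fit. A C sketch under the same illustrative assumptions as above; a small tolerance EPS is used so that exact fits such as 0.8 + 0.2 = 1.0 are not lost to floating-point round-off.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXBINS 1000
    #define EPS 1e-9            /* tolerance for exact fits under round-off */

    static int DescDouble(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x < y) - (x > y);           /* largest first */
    }

    /* First fit decreasing: sort into non-increasing order, then run first fit. */
    static int FirstFitDecreasing(double size[], int n)
    {
        double room[MAXBINS];
        int bins = 0, i, j;

        qsort(size, n, sizeof(double), DescDouble);
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < bins; j++)
                if (size[i] <= room[j] + EPS)
                    break;                  /* first bin with enough room */
            if (j == bins)
                room[bins++] = 1.0;         /* open a new bin */
            room[j] -= size[i];
        }
        return bins;
    }

    int main(void)
    {
        double s[] = { 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 };
        printf("first fit decreasing uses %d bins\n", FirstFitDecreasing(s, 7));  /* 3 bins */
        return 0;
    }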
§2 Divide and Conquer

Divide: Smaller problems are solved recursively (except base cases).
Conquer: The solution to the original problem is then formed from the solutions to the subproblems.

Cases solved by divide and conquer:
  The maximum subsequence sum – the O( N log N ) solution
  Tree traversals – O( N )
  Mergesort and quicksort – O( N log N )

Note: Divide and conquer makes at least two recursive calls and the subproblems are disjoint.

12/17
§2 Divide and Conquer

1. Running Time of Divide and Conquer Algorithms

【Theorem】The solution to the equation T(N) = aT(N/b) + Θ(N^k log^p N), where a ≥ 1, b > 1, and p ≥ 0, is

  T(N) = O( N^(log_b a) )         if a > b^k
  T(N) = O( N^k log^(p+1) N )     if a = b^k
  T(N) = O( N^k log^p N )         if a < b^k

〖Example〗 Mergesort has a = b = 2, p = 0 and k = 1, so a = b^k and T = O( N log N ).

〖Example〗 Divide with a = 3 and b = 2 for each recursion; conquer with O( N ), that is, k = 1 and p = 0. Then a > b^k and T = O( N^(log_2 3) ) = O( N^1.59 ). If the conquer step takes O( N^2 ) instead, then k = 2, a < b^k, and T = O( N^2 ).

13/17
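To see where the a = b^k case comes from, here is a sketch of unrolling the mergesort recurrence, assuming the usual form T(N) = 2T(N/2) + cN with T(1) = c:

    \begin{aligned}
    T(N) &= 2\,T(N/2) + cN \\
         &= 4\,T(N/4) + 2cN \\
         &\;\;\vdots \\
         &= 2^{k}\,T(N/2^{k}) + k\,cN \\
         &= cN + cN\log_{2}N \qquad \text{at } k = \log_{2}N,
    \end{aligned}

so T(N) = O( N log N ), which is exactly the a = b^k, k = 1, p = 0 case of the theorem.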
§2 Divide and Conquer

2. Closest Points Problem

Given N points in a plane, find the closest pair of points. (If two points have the same position, then that pair is the closest, with distance 0.)

Simple Exhaustive Search: check N( N – 1 ) / 2 pairs of points. T = O( N^2 ).

Divide and Conquer – similar to the maximum subsequence sum problem

〖Example〗 Sort according to x-coordinates and divide; conquer by forming a solution from left, right, and cross.

14/17
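A C sketch of the exhaustive search (illustrative; the sample points are made up for the demo), checking all N(N – 1)/2 pairs:

    #include <stdio.h>
    #include <math.h>

    typedef struct { double x, y; } Point;

    static double Dist(Point a, Point b)
    {
        return sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }

    /* Exhaustive search: check all N(N-1)/2 pairs, so T = O(N^2). */
    static double ClosestPairBruteForce(const Point p[], int n)
    {
        double best = HUGE_VAL;
        int i, j;
        for (i = 0; i < n; i++)
            for (j = i + 1; j < n; j++)
                if (Dist(p[i], p[j]) < best)
                    best = Dist(p[i], p[j]);
        return best;
    }

    int main(void)
    {
        Point p[] = { {0, 0}, {3, 4}, {1, 1}, {5, 2} };
        printf("closest distance = %g\n", ClosestPairBruteForce(p, 4));  /* about 1.414 */
        return 0;
    }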
§2 Divide and Conquer

Just like finding the max subsequence sum, we have a = b = 2 … It is so simple, and we clearly have an O( N log N ) algorithm.

But is it really so? How about k? Can you find the cross distance in linear time? Only if that step is O( N ), that is, k = 1, is the total time really O( N log N ).

15/17
§2 Divide and Conquer

Let δ = the smaller of the closest distances found in the left and right halves. A closer cross pair, if any, must lie within the δ-strip on either side of the dividing line.

/* points are all in the strip */
for ( i = 0; i < NumPointsInStrip; i++ )
    for ( j = i + 1; j < NumPointsInStrip; j++ )
        if ( Dist( Pi, Pj ) < δ )
            δ = Dist( Pi, Pj );

The worst case: NumPointsInStrip = N, so this scan is O( N^2 ).

/* points are all in the strip */
/* and sorted by y coordinates  */
for ( i = 0; i < NumPointsInStrip; i++ )
    for ( j = i + 1; j < NumPointsInStrip; j++ )
        if ( Dist_y( Pi, Pj ) > δ )
            break;
        else if ( Dist( Pi, Pj ) < δ )
            δ = Dist( Pi, Pj );

In the worst case, for any pi, at most 7 points pj are considered, so Textra = O( N ).

16/17
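A C sketch of the improved strip scan (illustrative names; it assumes the strip points have already been collected and sorted by y-coordinate, and that delta is the best distance found in the two halves):

    #include <stdio.h>
    #include <math.h>

    typedef struct { double x, y; } Point;

    static double Dist(Point a, Point b)
    {
        return sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }

    /* Scan the strip points (already sorted by y) and shrink delta whenever a
       closer cross pair is found.  The break makes the inner loop look at only
       a constant number of points for each i, so the scan is linear. */
    static double StripScan(const Point strip[], int NumPointsInStrip, double delta)
    {
        int i, j;
        for (i = 0; i < NumPointsInStrip; i++)
            for (j = i + 1; j < NumPointsInStrip; j++)
            {
                if (strip[j].y - strip[i].y > delta)   /* Dist_y( Pi, Pj ) > delta */
                    break;
                if (Dist(strip[i], strip[j]) < delta)
                    delta = Dist(strip[i], strip[j]);
            }
        return delta;
    }

    int main(void)
    {
        /* a toy strip, sorted by y; delta = best distance from the two halves */
        Point strip[] = { { 0.0, 0.0 }, { 0.4, 0.5 }, { -0.3, 2.0 } };
        printf("delta = %g\n", StripScan(strip, 3, 1.0));   /* about 0.64 */
        return 0;
    }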
§2 Divide and Conquer

Note: Sorting by y-coordinates in each recursive call gives O( N log N ) extra work instead of O( N ). Solution: please read the last paragraph on p.374.

3. The Selection Problem – self-study the O( N ) algorithm.

4. Big Integer Multiplication and Matrix Multiplication – self-study; only of theoretical interest.

17/17