CS 575 Design and Analysis of Computer Algorithms Professor Michal Cutler Lecture # 8 September 22, 2005. This class. Lower bound on number of comparisons done by comparison based sorts Linear sorts Pigeonhole sort O(n +k) Counting sort O(n + k) Bucket sort O(n)
CS 575 Design and Analysis of Computer Algorithms, Professor Michal Cutler, Lecture # 8, September 22, 2005
This class • Lower bound on the number of comparisons done by comparison-based sorts • Linear sorts • Pigeonhole sort O(n + k) • Counting sort O(n + k) • Bucket sort O(n) • Radix sort O((b/r)(n + 2^r)) • Randomized select for order statistics
Sort • Sort algorithms (heapsort, mergesort) run in Θ(n lg n) in the worst case. • CAN WE DO BETTER? • No: if the sort is a comparison sort (based only on comparisons of keys) • Yes: if we allow arithmetic operations or take advantage of additional restrictions on the keys
Sort We will show: 1) How to represent the execution of each comparison-sort algorithm with a decision tree in which we: • Model only the comparisons • Ignore all other aspects of the algorithm 2) Why a decision tree for a correct sort algorithm has at least n! leaves 3) That the depth of the decision tree is Ω(n lg n) • Analysis assumes all keys are distinct
Decision Trees • A decision tree is a binary tree that shows the execution of a comparison based algorithm on all possible inputs of a given size. • Each internal node contains the pair of elements which are compared (<, or <=) • Each leaf contains an output. • Each branch is labeled by the result of the comparison (<= or >, yes or no)
Decision tree for sortThree
Input: a, b, c
    if a < b
        if b < c then a,b,c
        else if a < c then a,c,b
        else c,a,b
    else if b < c
        if a < c then b,a,c
        else b,c,a
    else c,b,a
Tree: the root compares a<b. Yes branch: b<c (yes: a,b,c; no: a<c, with yes: a,c,b and no: c,a,b). No branch: b<c (yes: a<c, with yes: b,a,c and no: b,c,a; no: c,b,a).
• 3! = 6 leaves representing the 6 permutations of 3 distinct numbers • 2 paths with 2 comparisons • 4 paths with 3 comparisons • 5 comparison nodes in total
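As a sanity check on the tree above, here is a minimal Python sketch that follows the same branches and also reports how many comparisons were made on the path taken. The function name `sort_three` and the returned comparison count are illustrative additions, not part of the slides.

```python
from itertools import permutations

def sort_three(a, b, c):
    """Sort three distinct keys by walking the decision tree.

    Returns (sorted_tuple, comparisons_on_this_path).
    """
    if a < b:
        if b < c:
            return (a, b, c), 2      # a < b < c
        if a < c:
            return (a, c, b), 3      # a < c <= b
        return (c, a, b), 3          # c <= a < b
    if b < c:
        if a < c:
            return (b, a, c), 3      # b <= a < c
        return (b, c, a), 3          # b < c <= a
    return (c, b, a), 2              # c <= b <= a

# Run every permutation of three distinct keys through the tree.
results = [sort_three(*p) for p in permutations((1, 2, 3))]
path_lengths = sorted(count for _, count in results)
```

Each of the 3! = 6 inputs reaches a different leaf, and the path lengths come out as two leaves at depth 2 and four at depth 3.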
Exchange Sort
    for (i = 1; i ≤ n − 1; i++)
        for (j = i + 1; j ≤ n; j++)
            if (S[j] < S[i])
                swap(S[i], S[j])
• At end of i = 1: S[1] = min S[i] over 1 ≤ i ≤ n • At end of i = 2: S[2] = min S[i] over 2 ≤ i ≤ n • At end of i = 3: S[3] = min S[i] over 3 ≤ i ≤ n • At end of i = n−1: S[n−1] = min S[i] over n−1 ≤ i ≤ n
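The loop above translates almost directly into Python. This is a sketch with 0-based indexing; the comparison counter is an illustrative addition that makes the n(n−1)/2 comparisons visible.

```python
def exchange_sort(keys):
    """Exchange sort on a copy of keys; returns (sorted_list, comparisons)."""
    s = list(keys)
    n = len(s)
    comparisons = 0
    for i in range(n - 1):           # slide: i = 1 .. n-1
        for j in range(i + 1, n):    # slide: j = i+1 .. n
            comparisons += 1
            if s[j] < s[i]:
                s[i], s[j] = s[j], s[i]
        # Invariant: s[i] now holds the minimum of s[i..n-1].
    return s, comparisons

sorted_list, used = exchange_sort([7, 3, 5])
```

Exchange sort always performs the same number of comparisons for a given n, regardless of the input order, so every input of size 3 uses exactly 3.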
Decision Tree for Exchange Sort for n = 3 • Example: (a, b, c) = (7, 3, 5) • [Figure: decision tree. The root compares s[2] < s[1] (i = 1), the second level compares s[3] < s[1], and the third level compares s[3] < s[2]. The example follows the states 7 3 5, 3 7 5, 3 7 5, 3 5 7.] • For clarity we show the swaps and the current state of the list • Every path has 3 comparisons • Total 7 comparisons • 8 leaves; (c,b,a) and (c,a,b) appear twice.
A decision tree for sort has depth Ω(n lg n) • Assume the depth of the tree is d (i.e., there are d comparisons on the longest path from the root to a leaf). • A binary tree of depth d has at most 2^d leaves: l ≤ 2^d. • A decision tree for a correct algorithm must have at least n! leaves (outputs): l ≥ n!. • Thus n! ≤ 2^d. • Taking lg of both sides we get d ≥ lg(n!). • It can be shown that lg(n!) = Θ(n lg n).
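The last step is stated without proof. One standard way to see the lower-bound half, which is all the argument needs, is to keep only the largest half of the terms in the sum. The upper bound is included only for completeness.

```latex
\begin{align*}
\lg(n!) &= \sum_{i=1}^{n} \lg i
         \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \lg i
         \;\ge\; \frac{n}{2}\,\lg\frac{n}{2}
         \;=\; \Omega(n \lg n), \\
\lg(n!) &= \sum_{i=1}^{n} \lg i \;\le\; n \lg n \;=\; O(n \lg n).
\end{align*}
```

The middle inequality uses that there are at least n/2 terms from ⌈n/2⌉ to n, each at least lg(n/2).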
Pigeonhole sort • Problem: sort n keys in the range 1 to k • Main idea: 1) Count the number of keys of value i, maintaining the counts in an auxiliary array C 2) Use the counts to overwrite the input • Example: Input A = 2 2 1 4 3 2 1 • After step 1): Aux C[1..4] = 2 3 1 1 • After step 2): Output A = 1 1 2 2 2 3 4 • Analysis?
Pigeonhole Sort
Pigeonhole-Sort(A, k)
    for i ← 1 to k                  // initialize C
        C[i] ← 0
    for j ← 1 to length[A]          // count keys
        C[A[j]] ← C[A[j]] + 1
    q ← 1
    for j ← 1 to k                  // rewrite A
        while C[j] > 0
            A[q] ← j
            C[j] ← C[j] − 1
            q ← q + 1
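A minimal Python sketch of the pseudocode above, assuming integer keys in 1..k. It uses 0-based list positions for A and leaves index 0 of the count array unused so that `count[v]` lines up with key value v.

```python
def pigeonhole_sort(a, k):
    """Sort a list of integer keys in the range 1..k in place; O(n + k)."""
    count = [0] * (k + 1)            # count[v] = number of keys equal to v
    for key in a:                    # step 1: count keys
        count[key] += 1
    q = 0
    for value in range(1, k + 1):    # step 2: overwrite the input
        for _ in range(count[value]):
            a[q] = value
            q += 1
    return a

example = pigeonhole_sort([2, 2, 1, 4, 3, 2, 1], 4)
```

Because the output is rebuilt from the counts alone, this only works for bare keys; records that carry extra data need counting sort.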
Counting sort • Problem: Sort n records stored in A[1..n] • Each record contains a key and data • All keys are in the range of 1 to k
Counting sort • Main idea: • Count in C the number of records with key = i, for i = 1,…, k. • Use the counts in C to compute the offset in sorted B of the records with key i, for i = 1,…, k. • Copy A into sorted B using and updating (decrementing) the computed offsets. To make the sort stable we start at the last position of A.
Counting sort • Additional Space • The sorted list is stored in B • Additional array C of size k • Note: Pigeonhole sort does not require array B
How shall we compute the offsets? • Assume C[1] = 3 (then the 3 records with key = 1 should be stored in positions 1, 2, 3 of the sorted array B). We keep the offset for key 1 at 3. • Let C[2] = 2 (then the 2 records with key = 2 should be stored in positions 4, 5 of B). • We compute the offset for key 2 to be (C[2] + offset for key 1) = 2 + 3 = 5 • In general, the offset for key i is (C[i] + offset for key i−1).
Counting Sort
Counting-Sort(A, B, k)
    for i ← 1 to k                      // initialize C
        C[i] ← 0
    for j ← 1 to length[A]              // count keys
        C[A[j]] ← C[A[j]] + 1
    for i ← 2 to k                      // compute offsets
        C[i] ← C[i] + C[i−1]
    for j ← length[A] downto 1          // copy
        B[C[A[j]]] ← A[j]
        C[A[j]] ← C[A[j]] − 1           // update offset
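Counting sort: example
Original list A (positions 1 to 11): 3 Clinton, 4 Smith, 1 Xu, 2 Adams, 3 Dunn, 4 Yi, 2 Baum, 1 Fu, 3 Gold, 1 Lu, 1 Land
C (keys 1 to 4), initialized: 0 0 0 0
C, final counts: 4 2 3 2
C, "offsets": 4 6 9 11
Sorted list B after copying the last three records of A: 1 Land at position 4, 1 Lu at position 3, 3 Gold at position 9. The offsets are then 2 6 8 11 (key 1: 4, then 3, then 2; key 3: 9, then 8).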
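The following Python sketch runs counting sort on the records from this example. Representing a record as a `(key, data)` tuple is an assumption made for illustration; positions in B are 1-based in the slides, so the code subtracts 1 when indexing.

```python
def counting_sort(records, k):
    """Stable counting sort of (key, data) records with 1 <= key <= k."""
    c = [0] * (k + 1)                    # c[0] is unused
    for key, _ in records:               # count keys
        c[key] += 1
    for i in range(2, k + 1):            # offsets: c[i] = last position of key i
        c[i] += c[i - 1]
    b = [None] * len(records)
    for record in reversed(records):     # start at the last position: stable
        key = record[0]
        b[c[key] - 1] = record           # slide positions are 1-based
        c[key] -= 1                      # update offset
    return b

a = [(3, "Clinton"), (4, "Smith"), (1, "Xu"), (2, "Adams"), (3, "Dunn"),
     (4, "Yi"), (2, "Baum"), (1, "Fu"), (3, "Gold"), (1, "Lu"), (1, "Land")]
b = counting_sort(a, 4)
```

Records with equal keys keep their original relative order: Xu, Fu, Lu, Land for key 1, and so on.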
Analysis: • O(k + n) time • What if k = O(n)? • Requires k + n extra storage. • Stable sort: preserves the original order of equal keys. • Is counting sort stable? • Is counting sort good for sorting records with 32-bit keys?
Radix sort • Radix sort is used in card sorting machines
Hollerith’s punched cards • Hollerith devised what was to become the computer punched card • Each card has 12 rows and 80 columns • Each column represents a single alphanumeric character or symbol. • The card punching machine punches holes in some of the 12 positions of each column
IBM card punching machine Card punching machine
Hollerith’s tabulating machines • As the cards were fed through a "tabulating machine," pins passed through the positions where holes were punched, completing an electrical circuit that registered a value. • The 1880 census in the U.S. took seven years to complete • With Hollerith's "tabulating machines" the 1890 census took the Census Bureau six weeks • Through mergers, Hollerith's company eventually became IBM
Card sorting machine IBM’s card sorting machine
Radix sort • Main idea • Break the key into a “digit” representation: key = i_d, i_(d−1), …, i_2, i_1 • A "digit" can be a number in any base, a character, etc. • Radix sort: for i = 1 to d, sort “digit” i using a stable sort • Analysis: Θ(d · (stable sort time)), where d is the number of “digits”
Radix sort • Which stable sort? • Since the range of values of a digit is small, the best stable sort to use is counting sort. • When counting sort is used the time complexity is Θ(d(n + k)), where k is the range of a "digit". • When k = O(n), the time is Θ(dn)
Radix sort: example
Input list (positions 1 to 8):  178 139 326 572 294 321 910 368
After sorting on digit 1:       910 321 572 294 326 178 368 139
After sorting on digit 2:       910 321 326 139 368 572 178 294
After sorting on digit 3:       139 178 294 321 326 368 572 910 (sorted list)
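The passes in this example can be reproduced with a short Python sketch. For brevity it uses a stable distribution into one list per digit value instead of counting sort as the per-digit stable sort; that substitution, the `base` parameter, and the recorded `passes` list are illustrative choices, not something the slides prescribe.

```python
def radix_sort(keys, d, base=10):
    """LSD radix sort of non-negative integers with at most d digits.

    Returns (sorted_keys, passes), where passes[i] is the list after
    sorting on digit i + 1 (digit 1 is the least significant).
    """
    passes = []
    for i in range(d):
        buckets = [[] for _ in range(base)]
        for key in keys:                       # appending preserves order: stable
            buckets[(key // base**i) % base].append(key)
        keys = [key for bucket in buckets for key in bucket]
        passes.append(keys)
    return keys, passes

result, passes = radix_sort([178, 139, 326, 572, 294, 321, 910, 368], d=3)
```

Stability is what makes this work: after the pass on digit 2, keys with equal tens digits (572 and 178, for instance) are still in the order established by the pass on digit 1.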
Lemma 8.4 • Given n b-bit numbers and any positive integer r ≤ b, radix sort correctly sorts these numbers in Θ((b/r)(n + 2^r)) time • Proof • Divide each number into b/r “digits”. • Each “digit” has r bits and a range of 0 to 2^r − 1. • Radix sort executes b/r counting sorts. • Each counting sort takes Θ(n + 2^r) time. • So the total is Θ((b/r)(n + 2^r)).
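To get a feel for how the choice of r affects the bound in the lemma, the sketch below simply evaluates (b/r)(n + 2^r) for a few values of r. The function name `radix_cost`, the rounding up of b/r to a whole number of digits, and the sample values of n and b are all assumptions made for illustration; the numbers are operation counts from the formula, not measured running times.

```python
def radix_cost(n, b, r):
    """Evaluate (b/r) * (n + 2^r), rounding b/r up to a whole number of digits."""
    digits = -(-b // r)                  # ceiling of b / r
    return digits * (n + 2**r)

n, b = 1_000_000, 32                     # hypothetical: one million 32-bit keys
costs = {r: radix_cost(n, b, r) for r in (1, 4, 8, 16, 32)}
best_r = min(costs, key=costs.get)
```

With these sample values the formula is smallest at r = 16. Taking r = 32 corresponds to a single counting sort over the whole 32-bit key, where the 2^r term dominates.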
Bucket sort • Assumption: keys are distributed uniformly in the interval [0, 1) • Main idea • The n records are distributed into n buckets (O(n)): insert A[i] into the list B[⌊n·A[i]⌋] • Buckets are sorted with insertion sort • Buckets are combined (O(n))
BUCKET-SORT(A)
    n ← length[A]
    for i ← 1 to n
        insert A[i] into list B[⌊n·A[i]⌋]
    for i ← 0 to n − 1
        sort list B[i] with insertion sort
    concatenate the lists B[0], B[1], …, B[n−1] together in order
Bucket sort: example (n = 10)
A (positions 1 to 10): .78 .17 .39 .26 .72 .94 .21 .12 .23 .68
Step 1, distribute: each key A[i] is placed in the list B[⌊10·A[i]⌋]
Step 2, sort each list:
B[0]: empty
B[1]: .12 .17
B[2]: .21 .23 .26
B[3]: .39
B[4]: empty
B[5]: empty
B[6]: .68
B[7]: .72 .78
B[8]: empty
B[9]: .94
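A minimal Python sketch of bucket sort on the example keys, assuming every key lies in [0, 1). The `min(n - 1, ...)` guard is an addition that protects against floating-point rounding pushing n·x up to n; it is not part of the slide pseudocode.

```python
def insertion_sort(lst):
    """Sort lst in place with insertion sort."""
    for i in range(1, len(lst)):
        x = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > x:
            lst[j + 1] = lst[j]
            j -= 1
        lst[j + 1] = x

def bucket_sort(a):
    """Bucket sort for keys in [0, 1); returns a new sorted list."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:                                   # distribute: O(n)
        buckets[min(n - 1, int(n * x))].append(x)
    for bucket in buckets:                        # sort each bucket
        insertion_sort(bucket)
    return [x for bucket in buckets for x in bucket]   # concatenate: O(n)

keys = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
sorted_keys = bucket_sort(keys)
```

Correctness does not depend on the uniformity assumption; only the expected O(n) running time does.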
Analysis • Let n_i be the number of elements in B[i] • Let x_ij = 1 if A[j] falls in bucket i, and 0 otherwise
Analysis: E(n_i^2) = 2 − 1/n • What is the worst case run time of bucket sort?
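The slide states E(n_i^2) = 2 − 1/n without the intermediate steps. A sketch of the usual derivation follows, assuming the keys are independent and uniform, so that each x_ij equals 1 with probability 1/n and x_ij, x_ik are independent for j ≠ k.

```latex
\begin{align*}
n_i &= \sum_{j=1}^{n} x_{ij}, \qquad \Pr[x_{ij} = 1] = \frac{1}{n}, \\
E[n_i^2] &= \sum_{j=1}^{n} E[x_{ij}^2]
          + \sum_{j=1}^{n} \sum_{k \ne j} E[x_{ij}\, x_{ik}] \\
         &= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2}
          \;=\; 2 - \frac{1}{n}, \\
E[T(n)] &= \Theta(n) + \sum_{i=0}^{n-1} O\!\left(E[n_i^2]\right)
          \;=\; \Theta(n) + n \cdot O\!\left(2 - \frac{1}{n}\right)
          \;=\; \Theta(n).
\end{align*}
```

The last line uses the fact that insertion sort on a bucket of n_i elements takes O(n_i^2) time.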
Median and Order Statistics • ith order statistic • Find the ith smallest of n given elements (called the element with rank i). • If i = 1, find the minimum element, • if i = n, find the maximum, and • if i = ⌊(n+1)/2⌋ and/or i = ⌈(n+1)/2⌉, we find the median element. • The problem is called selection.
The selection problem • Solution 1: Sort the array and then return the ith element of the sorted array. The worst case time is O(n lg n). • Solution 2: Use randomized partition to find if the ith element is the pivot, or resides in one of the partitions • In algorithm on next slide 0 < i <= r-p+1 • Select(A, 1, n, 5) to find 5th smallest number
Randomized select RANDOMIZED-SELECT(L, p, r, i) • if p = r return L[p] //0<i<=1 • q <- Randomized-Partition(L, p, r) • k <- q – p + 1 //found kth element L[q] • if i = k return L[q] • if i < k //ith smallest in left partition • return RANDOMIZED-SELECT(L, p, q-1, i) • else //find (i-k) smallest in right partition • return RANDOMIZED-SELECT(L, q+1, r, i-k)
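A Python sketch of RANDOMIZED-SELECT follows. The slides do not spell out Randomized-Partition, so the version below (pick a random pivot, then a Lomuto-style partition) is an assumption. Indices p and r are 0-based and inclusive, while the rank i stays 1-based as in the pseudocode.

```python
import random

def randomized_partition(lst, p, r):
    """Partition lst[p..r] around a random pivot; return the pivot's index."""
    k = random.randint(p, r)
    lst[k], lst[r] = lst[r], lst[k]      # move the pivot to the end
    pivot = lst[r]
    i = p - 1
    for j in range(p, r):
        if lst[j] <= pivot:
            i += 1
            lst[i], lst[j] = lst[j], lst[i]
    lst[i + 1], lst[r] = lst[r], lst[i + 1]
    return i + 1

def randomized_select(lst, p, r, i):
    """Return the i-th smallest element (1 <= i <= r-p+1) of lst[p..r]."""
    if p == r:
        return lst[p]
    q = randomized_partition(lst, p, r)
    k = q - p + 1                        # lst[q] is the k-th smallest of lst[p..r]
    if i == k:
        return lst[q]
    if i < k:                            # i-th smallest is in the left part
        return randomized_select(lst, p, q - 1, i)
    return randomized_select(lst, q + 1, r, i - k)

data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
fifth = randomized_select(list(data), 0, len(data) - 1, 5)
```

The call passes a copy of the list because partitioning rearranges the elements in place.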
Randomized select • Assume: partition divides the array into two parts, and the larger part (where we find our element) is always 9/10 of the original array. • We get T(n) ≤ T(9n/10) + Θ(n). Case 3 of the Master theorem applies: n^(log_(10/9) 1) = n^0 = 1, Θ(n)/1 = Θ(n) = Ω(n^1), and the regularity condition 9n/10 ≤ cn holds for c = 9/10 and n ≥ 1. Thus T(n) = Θ(n) • For any fixed proportion of n (99n/100, 999n/1000) the asymptotic time is the same.
Randomized select • In the worst case we get T(n) = T(n−1) + Θ(n) = Θ(n^2) • After partition, the ith element is either: • the one at k, or • in an array of size k−1, • or in an array of size n−k, for k = 1,…, n • To get an upper bound on the average case time, assume that the ith element always falls in the larger partition
Randomized selection • Assume the linear function f(n) described by the O(n) term satisfies f(n) ≤ an for n ≥ 0
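The recurrence and the substitution step are not spelled out on the slides. The following is a sketch of the usual textbook argument, assuming the recurrence E[T(n)] ≤ (2/n) Σ E[T(k)] + an over k = ⌊n/2⌋, …, n−1 (which follows from always charging the larger partition) and the inductive hypothesis E[T(k)] ≤ ck for k < n.

```latex
\begin{align*}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} c\,k \;+\; an \\
        &= \frac{2c}{n} \left( \frac{(n-1)\,n}{2}
           - \frac{(\lfloor n/2 \rfloor - 1)\,\lfloor n/2 \rfloor}{2} \right) + an \\
        &\le \frac{2c}{n} \left( \frac{(n-1)\,n}{2}
           - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
        &= c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \\
        &\le cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right).
\end{align*}
```

The bound E[T(n)] ≤ cn therefore holds whenever cn/4 − c/2 − an ≥ 0, that is, whenever cn/4 − an ≥ c/2.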
E(T(n)) = O(n) • cn/4 − an ≥ c/2 • n(c − 4a) ≥ 2c • Let c > 4a, so n ≥ 2c/(c − 4a) • We have proved E(T(n)) ≤ cn if n ≥ 2c/(c − 4a) and c > 4a. • Assume that T(n) = O(1) for n < 2c/(c − 4a) • We have E(T(n)) = O(n)
Next class • Linear select