Selection -- Medians and Order Statistics (Chap. 9) • The ith order statistic of n elements $S=\{a_1, a_2, \dots, a_n\}$: the ith smallest element • Also called the selection problem • Minimum and maximum • Median, lower median, upper median • Selection in expected/average linear time • Selection in worst-case linear time
$O(n \lg n)$ Algorithm • Suppose the n elements are sorted by an $O(n \lg n)$ algorithm, e.g., MERGE-SORT • Minimum: the first element • Maximum: the last element • The ith order statistic: the ith element • Median: • If n is odd, the $((n+1)/2)$th element. • If n is even, • the $\lfloor (n+1)/2 \rfloor = (n/2)$th element is the lower median • the $\lceil (n+1)/2 \rceil = (n/2+1)$th element is the upper median • Each selection then takes $O(1)$ time, so the total is $O(n \lg n)$. • Can we do better?
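The sort-then-index approach above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' code: Python's built-in `sorted` (Timsort, a merge-sort variant with $O(n \lg n)$ worst case) stands in for MERGE-SORT, and the function names `select_by_sorting` and `medians` are hypothetical.

```python
def select_by_sorting(a, i):
    """Return the ith smallest element (1-indexed) of a in O(n lg n) time."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    s = sorted(a)      # O(n lg n); stands in for MERGE-SORT
    return s[i - 1]    # the ith element of the sorted order, O(1)


def medians(a):
    """Return (lower median, upper median); they coincide when n is odd."""
    s = sorted(a)
    n = len(s)
    lower = s[(n + 1) // 2 - 1]   # floor((n+1)/2)th element
    upper = s[(n + 2) // 2 - 1]   # ceil((n+1)/2)th element
    return lower, upper
```

For example, `medians([7, 2, 9, 4])` gives `(4, 7)`, while for an odd-length input both entries are the same element.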
Selection in Expected Linear Time $O(n)$ • Select the ith smallest element • A divide-and-conquer algorithm: RANDOMIZED-SELECT • Like quicksort, it partitions the input array recursively • Unlike quicksort, which recurses on both sides of the partition, it works on only one side. • This is called prune-and-search: prune one side and search only the other. • (Please review or read quicksort in Chapter 7.)
RANDOMIZED-SELECT(A,p,r,i) • if p = r then return A[p] • q ← RANDOMIZED-PARTITION(A,p,r) • // q satisfies A[p..q−1] ≤ A[q] ≤ A[q+1..r] • k ← q − p + 1 • if i = k then return A[q] • else if i < k • then return RANDOMIZED-SELECT(A,p,q−1,i) • else return RANDOMIZED-SELECT(A,q+1,r,i−k)
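A runnable Python sketch of RANDOMIZED-SELECT follows, assuming RANDOMIZED-PARTITION is the Lomuto-style partition from Chapter 7 with a uniformly random pivot swapped into A[r]. Indices are 0-based but p and r stay inclusive as in the pseudocode, and the tail recursion is written as a loop; `randomized_partition` and `randomized_select` are illustrative names.

```python
import random


def randomized_partition(A, p, r):
    """Lomuto partition of A[p..r] around a random pivot.

    Returns q such that A[p..q-1] <= A[q] <= A[q+1..r].
    """
    k = random.randint(p, r)        # randint is inclusive on both ends
    A[k], A[r] = A[r], A[k]         # move the random pivot to the end
    pivot = A[r]
    s = p - 1                       # A[p..s] holds elements <= pivot
    for j in range(p, r):
        if A[j] <= pivot:
            s += 1
            A[s], A[j] = A[j], A[s]
    A[s + 1], A[r] = A[r], A[s + 1] # place the pivot between the two sides
    return s + 1


def randomized_select(A, p, r, i):
    """Return the ith smallest (1-indexed) element of A[p..r]; permutes A."""
    while True:
        if p == r:
            return A[p]
        q = randomized_partition(A, p, r)
        k = q - p + 1               # number of elements in A[p..q]
        if i == k:
            return A[q]
        elif i < k:
            r = q - 1               # search only the low side
        else:
            p, i = q + 1, i - k     # search only the high side
```

Because the array is permuted in place, pass a copy when the original order matters, e.g. `randomized_select(data[:], 0, len(data) - 1, 3)`.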
Analysis of RANDOMIZED-SELECT • The worst-case running time is $\Theta(n^2)$. Why? It may be unlucky and always partition into the pivot A[q], an empty side, and a side holding all the remaining elements. Each partitioning of m elements takes $\Theta(m)$ time, for $m = n, n-1, \dots, 2$, so the total is $\Theta(n) + \Theta(n-1) + \dots + \Theta(2) = \Theta(n(n+1)/2 - 1) = \Theta(n^2)$. Moreover, because the pivot is chosen at random, no particular input elicits the worst-case behavior. On average, however, it performs well: using probabilistic analysis with random variables, one can prove that the expected running time is $O(n)$ (ref. to page 187). Can we do better, achieving $O(n)$ in the worst case?
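To make the $\Theta(n^2)$ worst case concrete, the sketch below removes the randomness: it always partitions around the last element (Lomuto scheme), so on already-sorted input with i = 1 every partition leaves one side empty. The name `lomuto_select_count` and the comparison counter are hypothetical additions for illustration. On sorted input of size n it makes $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$ comparisons.

```python
def lomuto_select_count(A, i):
    """Select the ith smallest element using the LAST element as pivot.

    Deterministic (no random pivot), so sorted input triggers the worst case.
    Returns (value, number of element comparisons made by all partitions).
    """
    A = list(A)
    p, r, comparisons = 0, len(A) - 1, 0
    while p < r:
        pivot, s = A[r], p - 1
        for j in range(p, r):
            comparisons += 1          # one comparison per element of A[p..r-1]
            if A[j] <= pivot:
                s += 1
                A[s], A[j] = A[j], A[s]
        q = s + 1
        A[q], A[r] = A[r], A[q]
        k = q - p + 1
        if i == k:
            return A[q], comparisons
        if i < k:
            r = q - 1
        else:
            p, i = q + 1, i - k
    return A[p], comparisons
```

`lomuto_select_count(range(100), 1)` returns `(0, 4950)`, i.e. $100 \cdot 99 / 2$ comparisons. RANDOMIZED-SELECT runs into this pattern only with small probability.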
Selection in Worst-Case Linear Time $O(n)$ • Select the ith smallest element of $S=\{a_1, a_2, \dots, a_n\}$ • Use the so-called prune-and-search technique: • Let $x \in S$, and partition S into three subsets • $S_1=\{a_j \mid a_j < x\}$, $S_2=\{a_j \mid a_j = x\}$, $S_3=\{a_j \mid a_j > x\}$ • If $|S_1| \ge i$, search for the ith smallest element in $S_1$ recursively (prune $S_2$ and $S_3$ away) • Else if $|S_1|+|S_2| \ge i$, return x (it is the ith smallest element) • Else search for the $(i-(|S_1|+|S_2|))$th smallest element in $S_3$ recursively (prune $S_1$ and $S_2$ away) • The question is how to select x so that $|S_1|$ and $|S_3|$ are nearly equal.
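The three-way prune-and-search step can be sketched independently of how x is chosen. In this minimal Python version the pivot rule is passed in as a function; `prune_and_search` and `choose_pivot` are hypothetical names, and the only assumption on `choose_pivot` is that it returns an element of the current set (so $S_2$ is never empty and the search always shrinks).

```python
def prune_and_search(S, i, choose_pivot):
    """Return the ith smallest (1-indexed) element of S.

    choose_pivot(S) must return some element x of S (hypothetical hook).
    """
    if not 1 <= i <= len(S):
        raise ValueError("i must satisfy 1 <= i <= len(S)")
    while True:
        x = choose_pivot(S)
        S1 = [a for a in S if a < x]
        S2 = [a for a in S if a == x]
        S3 = [a for a in S if a > x]
        if i <= len(S1):
            S = S1                      # prune S2 and S3 away
        elif i <= len(S1) + len(S2):
            return x                    # x is the ith smallest element
        else:
            i -= len(S1) + len(S2)      # prune S1 and S2 away
            S = S3
```

With a naive rule such as `lambda s: s[0]` the result is still correct, but $|S_1|$ and $|S_3|$ can be very unbalanced; the choice of x is exactly what the next slides address.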
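The Way to Select x • Divide the elements into $\lceil n/5 \rceil$ groups of 5 elements each (the last group may have fewer). • Find the median of each group. • Find the median x of the group medians. • At least half of the group medians are $\ge x$; excluding the group containing x and the possibly incomplete last group, at least $\lceil \frac{1}{2}\lceil n/5 \rceil \rceil - 2$ groups each contribute 3 elements that are $\ge x$, so at least $3n/10 - 6$ elements are $\ge x$. • Symmetrically, at least $3n/10 - 6$ elements are $\le x$.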
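The pivot rule on this slide can be sketched as follows. For illustration only, the median of the medians is found here by sorting the medians (SELECT finds it recursively instead), and lower medians are used for both the groups and the medians; the name `median_of_medians_pivot` is hypothetical. The function returns only x, so the bound $3n/10 - 6$ can be checked directly by counting.

```python
def median_of_medians_pivot(S):
    """Return x = median of the medians of groups of 5 (lower medians)."""
    group_medians = []
    for j in range(0, len(S), 5):
        group = sorted(S[j:j + 5])                 # last group may have < 5
        group_medians.append(group[(len(group) - 1) // 2])
    group_medians.sort()                           # stand-in for recursive SELECT
    return group_medians[(len(group_medians) - 1) // 2]
```

For `list(range(25))` the group medians are 2, 7, 12, 17, 22 and x = 12. Counting `sum(a >= x for a in S)` and `sum(a <= x for a in S)` on any input should give at least `3 * len(S) / 10 - 6` each.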
SELECT (ith Element in n Elements) • Step 1: Divide the n elements into $\lceil n/5 \rceil$ groups of 5 elements. • Step 2: Find the median of each group. • Step 3: Use SELECT recursively to find the median x of the $\lceil n/5 \rceil$ medians. • Step 4: Partition the n elements around x into $S_1$, $S_2$, and $S_3$. • Step 5: If $|S_1| \ge i$, search for the ith smallest element in $S_1$ recursively; else if $|S_1|+|S_2| \ge i$, return x (the ith smallest element); else search for the $(i-(|S_1|+|S_2|))$th smallest element in $S_3$ recursively.
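Putting the five steps together gives the following Python sketch of SELECT. It is an illustration under stated assumptions, not the slides' code: lower medians are used throughout, the base case simply sorts inputs of at most 5 elements (the analysis uses a larger cutoff such as 140, but any cutoff gives a correct result), and the name `select` is illustrative.

```python
def select(S, i):
    """Return the ith smallest (1-indexed) element of S in worst-case O(n) time.

    Follows steps 1-5 of SELECT; S itself is not modified.
    """
    if not 1 <= i <= len(S):
        raise ValueError("i must satisfy 1 <= i <= len(S)")
    if len(S) <= 5:
        return sorted(S)[i - 1]        # constant-size base case
    # Steps 1-2: groups of 5 and the lower median of each group.
    medians = []
    for j in range(0, len(S), 5):
        group = sorted(S[j:j + 5])
        medians.append(group[(len(group) - 1) // 2])
    # Step 3: recursively find the (lower) median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: three-way partition around x.
    S1 = [a for a in S if a < x]
    S2 = [a for a in S if a == x]
    S3 = [a for a in S if a > x]
    # Step 5: recurse into at most one side, pruning the others.
    if i <= len(S1):
        return select(S1, i)
    if i <= len(S1) + len(S2):
        return x
    return select(S3, i - len(S1) - len(S2))
```

Building new lists for $S_1$, $S_2$, $S_3$ keeps the sketch close to the set notation; an in-place partition would save memory without changing the asymptotic bound.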
Analysis of SELECT (cont.) • Steps 1, 2, and 4 take $O(n)$ time. • Step 3 takes $T(\lceil n/5 \rceil)$ time. • Now consider step 5: • At least half of the medians found in step 2 are $\ge x$, so at least $\lceil \frac{1}{2}\lceil n/5 \rceil \rceil - 2$ groups contribute 3 elements that are $\ge x$, i.e., at least $3(\lceil \frac{1}{2}\lceil n/5 \rceil \rceil - 2) \ge 3n/10 - 6$ elements are $\ge x$. • Similarly, the number of elements $\le x$ is also at least $3n/10 - 6$. • Thus $|S_1|$ is at most $7n/10 + 6$, and similarly for $|S_3|$. • Thus step 5 calls SELECT recursively on at most $7n/10 + 6$ elements. • The recurrence is: • $T(n) = O(1)$ if $n <$ some threshold (e.g., 140) • $T(n) = T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n)$ if $n \ge$ that threshold (e.g., 140)
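Solve the Recurrence by Substitution • Suppose $T(n) \le cn$ for some constant c, and let $an$ bound the $O(n)$ term. • $T(n) \le c\lceil n/5 \rceil + c(7n/10 + 6) + an$ • $\le cn/5 + c + 7cn/10 + 6c + an$ • $= 9cn/10 + an + 7c$ • $= cn + (-cn/10 + an + 7c)$, • which is at most $cn$ if $-cn/10 + an + 7c \le 0$, • i.e., $c \ge 10a\,(n/(n-70))$ when $n > 70$. • Since $n/(n-70) \le 2$ for $n \ge 140$, choosing the threshold 140 makes $c \ge 20a$ sufficient. • Note: the threshold need not be 140; any integer > 70 works (with a correspondingly larger c).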
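The substitution argument can be sanity-checked numerically (this is a check, not a proof). The sketch assumes a = 1, charges $an$ below the cutoff 140 (at most $139a$, so $O(1)$), takes the floor in the $7n/10 + 6$ term, and computes $\lceil n/5 \rceil$ with integer arithmetic; the names `T`, `A`, and `CUTOFF` are hypothetical.

```python
from functools import lru_cache

A = 1          # assumed constant a in the O(n) term an
CUTOFF = 140   # threshold from the slide: recurse only when n >= 140


@lru_cache(maxsize=None)
def T(n: int) -> int:
    """T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + a*n for n >= CUTOFF.

    Below the cutoff the cost a*n is at most 139a, i.e. O(1).
    """
    if n < CUTOFF:
        return A * n
    return T((n + 4) // 5) + T(7 * n // 10 + 6) + A * n


# c >= 10a * n / (n - 70) evaluated at the threshold n = 140 gives c >= 20a.
C = 10 * A * CUTOFF / (CUTOFF - 70)
```

Here `T(140)` equals `T(28) + T(104) + 140 = 272`, and `C` evaluates to 20.0, matching $c \ge 20a$; checking `T(n) <= 20 * A * n` for every n up to 10,000 agrees with the bound $T(n) \le cn$.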
Summary • Bucket sort, counting sort, radix sort: • Their running times • Modifications • The ith order statistic of n elements $S=\{a_1, a_2, \dots, a_n\}$: the ith smallest element • Minimum and maximum • Median, lower median, upper median • Selection in expected/average linear time • Worst-case running time • Prune-and-search • Selection in worst-case linear time • Why group size 5?