CS 3343: Analysis of Algorithms Lecture 14: Order Statistics
Order statistics
• The ith order statistic in a set of n elements is the ith smallest element.
• The minimum is thus the 1st order statistic.
• The maximum is the nth order statistic.
• The median is the ⌊(n+1)/2⌋th order statistic.
• If n is even, there are 2 medians: the (n/2)th (lower) and the (n/2 + 1)th (upper) order statistics.
• How can we calculate order statistics?
• What is the running time?
Order statistics – selection problem
• Select the ith smallest of n elements.
• Naive algorithm: sort, then return the ith element.
• Worst-case running time Θ(n log n) using merge sort or heapsort (not quicksort, whose worst case is Θ(n²)).
• We will show:
  • A practical randomized algorithm with Θ(n) expected running time.
  • A cool algorithm of theoretical interest only, with Θ(n) worst-case running time.
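The naive algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `select_by_sorting` is hypothetical, and Python's built-in `sorted` stands in for merge sort or heapsort (it also runs in O(n log n) worst-case time).

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy."""
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= len(items)")
    return sorted(items)[i - 1]


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(select_by_sorting(data, 1))          # minimum
print(select_by_sorting(data, 6))          # 6th smallest
print(select_by_sorting(data, len(data)))  # maximum
```

Sorting does more work than selection needs, since it orders every element; the algorithms that follow avoid that.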
Recall: Quicksort
[Figure: array A[p..q] after Partition, with elements ≤ x on the left, the pivot x at index r (rank k), and elements ≥ x on the right.]
• The function Partition gives us the rank k of the pivot.
• If we are lucky, k = i. Done!
• If not, we at least get a smaller subarray to work with:
  • k > i: the ith smallest is in the left subarray.
  • k < i: the ith smallest is in the right subarray.
• Divide and conquer:
  • If we are lucky, k is close to n/2, or the desired element is in the smaller subarray.
  • If unlucky, the desired element is in the larger subarray (possible size n – 1).
Randomized divide-and-conquer algorithm
[Figure: A[p..q] partitioned into elements ≤ A[r], the pivot A[r] at rank k, and elements ≥ A[r].]
RAND-SELECT(A, p, q, i)    ⊳ ith smallest of A[p..q]
    if p = q & i > 1 then error!
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1    ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)
Randomized Partition
• Randomly choose an element as the pivot.
• Every time we need to do a partition, throw a die to decide which element to use as the pivot.
• Each of the n elements has probability 1/n of being selected.
Rand-Partition(A, p, q){
    d = random();                      // draw a random number in [0, 1)
    index = p + floor((q-p+1) * d);    // p <= index <= q
    swap(A[p], A[index]);
    return Partition(A, p, q);         // now use A[p] as pivot; return its final index
}
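RAND-SELECT and Rand-Partition can be combined into a runnable Python sketch. The slides do not specify how Partition works, so the single-scan partition around the first element used here is an assumption. `random.randint(p, q)` replaces the `floor((q-p+1) * d)` computation, and indices are 0-based while the rank `i` stays 1-based.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p  # a[p+1..i] holds the elements <= pivot seen so far
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]  # move the pivot between the two parts
    return i


def rand_partition(a, p, q):
    """Pick a uniformly random pivot in a[p..q], then partition around it."""
    index = random.randint(p, q)  # inclusive on both ends
    a[p], a[index] = a[index], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a."""
    if not 1 <= i <= q - p + 1:
        raise ValueError("i is out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(rand_select(list(data), 0, len(data) - 1, 6))  # 6th smallest
```

The sketch recurses exactly as the pseudocode does. Because each call makes at most one recursive call in tail position, it could also be written as a loop to avoid deep recursion on unlucky inputs.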
Example
Select the i = 6th smallest:
    7 10 5 8 11 3 2 13    (pivot = 7, i = 6)
Partition:
    3 2 5 7 11 8 10 13    (k = 4)
Select the 6 – 4 = 2nd smallest recursively in the right subarray 11 8 10 13.
Complete example: select the 6th smallest element.
    7 10 5 8 11 3 2 13     i = 6
    3 2 5 7 11 8 10 13     k = 4, i = 6 – 4 = 2
    10 8 11 13             k = 3, i = 2 < k
    8 10                   k = 2, i = 2 = k
    10                     (answer)
Note: here we always used the first element as the pivot to do the partition (instead of Rand-Partition).
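The trace above can be reproduced in Python. The slides do not say which partition routine produced these arrays; the two-pointer partition below is an assumption, chosen because it yields the same intermediate arrays on this input. The names `partition_first` and `select_first_pivot` are hypothetical.

```python
def partition_first(a, p, q):
    """Two-pointer partition of a[p..q] around the pivot a[p].

    Returns the pivot's final index.
    """
    pivot = a[p]
    i, j = p + 1, q
    while True:
        while i <= j and a[i] <= pivot:  # skip elements already on the left
            i += 1
        while i <= j and a[j] > pivot:   # skip elements already on the right
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]          # fix an out-of-place pair
    a[p], a[j] = a[j], a[p]              # put the pivot between the two parts
    return j


def select_first_pivot(a, p, q, i):
    """Select the i-th smallest of a[p..q], always using a[p] as the pivot.

    Returns the selected value and the list of pivot ranks k seen on the way.
    """
    if not 1 <= i <= q - p + 1:
        raise ValueError("i is out of range for a[p..q]")
    ranks = []
    while True:
        r = partition_first(a, p, q)
        k = r - p + 1
        ranks.append(k)
        if i == k:
            return a[r], ranks
        if i < k:
            q = r - 1
        else:
            p, i = r + 1, i - k


value, ranks = select_first_pivot([7, 10, 5, 8, 11, 3, 2, 13], 0, 7, 6)
print(value, ranks)  # 10 [4, 3, 2]
```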
Intuition for analysis
(All our analyses today assume that all elements are distinct.)
Lucky: T(n) = T(9n/10) + Θ(n) = Θ(n)    (master method, CASE 3)
Unlucky: T(n) = T(n – 1) + Θ(n) = Θ(n²)    (arithmetic series)
Worse than sorting!
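Both recurrences can also be unrolled directly. The sketch below makes simplifying assumptions that are not in the slides: the Θ(n) term is taken to be exactly cn for a constant c > 0, and the base cases cost O(1).

```latex
\begin{align*}
\text{Lucky:}\quad
T(n) &= T\!\left(\tfrac{9n}{10}\right) + cn
      \le cn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j} + O(1)
      = 10cn + O(1) = \Theta(n), \\
\text{Unlucky:}\quad
T(n) &= T(n-1) + cn
      = c \sum_{j=1}^{n} j + O(1)
      = \frac{c\,n(n+1)}{2} + O(1) = \Theta(n^2).
\end{align*}
```

In the lucky case the per-level work shrinks geometrically, so the root dominates. In the unlucky case it shrinks by only one element per level, which gives the arithmetic series.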
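Running time of randomized selection
• For an upper bound, assume the ith element always falls in the larger side of the partition.
• The expected running time is an average over all cases:

\[
T(n) \le
\begin{cases}
T(\max(0,\, n-1)) + n & \text{if } 0 : n-1 \text{ split}, \\
T(\max(1,\, n-2)) + n & \text{if } 1 : n-2 \text{ split}, \\
\quad\vdots \\
T(\max(n-1,\, 0)) + n & \text{if } n-1 : 0 \text{ split}.
\end{cases}
\]

Expectation: each of the n splits occurs with probability 1/n, so

\[
T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k,\, n-k-1)) + n .
\]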
Substitution method
Want to show T(n) = O(n), so we need to prove T(n) ≤ cn for n > n₀.
Assume: T(k) ≤ ck for all k < n.
The inductive step gives T(n) ≤ cn if c ≥ 4.
Therefore, T(n) = O(n).
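The inductive step can be written out as follows. This is a reconstruction under stated assumptions, not necessarily the slide's exact derivation. It takes the recurrence in the form T(n) ≤ (2/n) Σ T(k) + n with k running from ⌊n/2⌋ to n – 1, since each subproblem size in that range arises from at most two of the n equally likely splits. It also uses the bound Σ k ≤ 3n²/8 over the same range.

```latex
\begin{align*}
T(n) &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + n \\
     &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + n
        && \text{(inductive hypothesis)} \\
     &\le \frac{2c}{n} \cdot \frac{3n^2}{8} + n \\
     &= \frac{3cn}{4} + n \\
     &= cn - \left(\frac{cn}{4} - n\right) \\
     &\le cn && \text{if } c \ge 4 .
\end{align*}
```

The last line holds because cn/4 – n ≥ 0 exactly when c ≥ 4, which matches the condition stated above.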
Summary of randomized selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But the worst case is very bad: Θ(n²).
Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.
Worst-case linear-time selection
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part
(Step 4 is the same as in RAND-SELECT.)
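The four steps of SELECT can be sketched in Python as below. This is a minimal illustration rather than a tuned implementation. It copies sublists instead of partitioning in place, sorts each 5-element group to find its median, and falls back to sorting when at most 5 elements remain. The slides assume distinct elements; the `equal` count, which lets the sketch tolerate duplicates, is an addition not found in the slides.

```python
def select(items, i):
    """Return the i-th smallest (1-indexed) element of items."""
    n = len(items)
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= len(items)")
    if n <= 5:
        return sorted(items)[i - 1]

    # Step 1: medians of the floor(n/5) full groups of 5.
    medians = [sorted(items[j:j + 5])[2] for j in range(0, 5 * (n // 5), 5)]

    # Step 2: recursively select the (lower) median of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around the pivot x.
    lower = [v for v in items if v < x]
    upper = [v for v in items if v > x]
    equal = n - len(lower) - len(upper)  # 1 when all elements are distinct

    # Step 4: return x or recurse into the side that contains the answer.
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal:
        return x
    return select(upper, i - len(lower) - equal)


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(select(data, 6))  # 6th smallest
```

With distinct elements, `len(lower) + 1` is the rank k of the pivot, so the three branches correspond to i < k, i = k, and i > k in the pseudocode.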
Choosing the pivot
[Figure: elements in groups of 5, ordered from lesser to greater, with the pivot x marked.]
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
Analysis
[Figure: elements in groups of 5, ordered from lesser to greater, with the pivot x marked.]
(Assume all elements are distinct.)
• At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.
Analysis
[Figure: the possible positions for the pivot, with 3⌊n/10⌋ marked at each end.]
Need "at most" for worst-case runtime:
• At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n – 3⌊n/10⌋ elements are > x.
• At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n – 3⌊n/10⌋ elements are < x.
• The recursive call to SELECT in Step 4 is executed on at most n – 3⌊n/10⌋ elements.
Analysis
• Use the fact that ⌊a/b⌋ > a/b – 1.
• n – 3⌊n/10⌋ < n – 3(n/10 – 1) = 7n/10 + 3 ≤ 3n/4 if n ≥ 60.
• The recursive call to SELECT in Step 4 is executed on at most 7n/10 + 3 elements.
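The chain of inequalities can be sanity-checked numerically. The short script below is only a spot check over a finite range, not a proof. It uses integer arithmetic to avoid rounding, and the helper names are hypothetical.

```python
def max_subproblem_size(n):
    """Upper bound on the size of the Step 4 subproblem: n - 3*floor(n/10)."""
    return n - 3 * (n // 10)


def bounds_hold(n):
    """Check n - 3*floor(n/10) < 7n/10 + 3 <= 3n/4 without using floats."""
    size = max_subproblem_size(n)
    below_linear = 10 * size < 7 * n + 30               # size < 7n/10 + 3
    below_three_quarters = 4 * (7 * n + 30) <= 30 * n   # 7n/10 + 3 <= 3n/4
    return below_linear and below_three_quarters


print(all(bounds_hold(n) for n in range(60, 10001)))  # True
print(bounds_hold(59))                                # False
```

The second line shows that the threshold n ≥ 60 is tight for the 3n/4 bound: at n = 59, 7n/10 + 3 = 44.3 exceeds 3n/4 = 44.25.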
Developing the recurrence
SELECT(i, n)    ⊳ total cost T(n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    ⊳ Θ(n)
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.    ⊳ T(n/5)
3. Partition around the pivot x. Let k = rank(x).    ⊳ Θ(n)
4. if i = k then return x    ⊳ T(7n/10 + 3)
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part
T(n) = T(n/5) + T(7n/10 + 3) + Θ(n)
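Solving the recurrence
T(n) ≤ T(n/5) + T(7n/10 + 3) + n    (writing the Θ(n) term as n)
Assumption: T(k) ≤ ck for all k < n.

\[
\begin{aligned}
T(n) &\le \frac{cn}{5} + c\left(\frac{7n}{10} + 3\right) + n \\
     &\le \frac{cn}{5} + \frac{3cn}{4} + n && \text{if } n \ge 60 \\
     &= \frac{19cn}{20} + n \\
     &= cn - \left(\frac{cn}{20} - n\right) \\
     &\le cn && \text{if } c \ge 20 \text{ and } n \ge 60 .
\end{aligned}
\]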
Conclusions
• Since the work at each level of recursion is at most a constant fraction (19/20) of the work at the level above, the work per level is a geometric series dominated by the linear work at the root.
• In practice, this algorithm runs slowly, because the constant in front of n is large.
• The randomized algorithm is far more practical.
Exercise: Try to divide into groups of 3 or 7.
Exercise: Think about an application in sorting.