
CS 3343: Analysis of Algorithms

CS 3343: Analysis of Algorithms. Lecture 14: Order Statistics. The ith order statistic in a set of n elements is the ith smallest element. The minimum is thus the 1st order statistic, the maximum is the nth order statistic, and the median is the n/2 order statistic.

elsa

Presentation Transcript


  1. CS 3343: Analysis of Algorithms Lecture 14: Order Statistics

  2. Order statistics
     • The ith order statistic in a set of n elements is the ith smallest element
     • The minimum is thus the 1st order statistic
     • The maximum is the nth order statistic
     • The median is the n/2 order statistic
     • If n is even, there are 2 medians
     • How can we calculate order statistics?
     • What is the running time?

  3. Order statistics – selection problem
     • Select the ith smallest of n elements
     • Naive algorithm: Sort.
     • Worst-case running time Θ(n log n) using merge sort or heapsort (not quicksort).
     • We will show:
       • A practical randomized algorithm with Θ(n) expected running time
       • A cool algorithm of theoretical interest only, with Θ(n) worst-case running time
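The naive algorithm on this slide can be sketched in a few lines of Python. The function name `select_by_sorting` and the sample data are illustrative assumptions, not part of the lecture. Python's built-in `sorted` has an O(n log n) worst case, so it stands in for merge sort or heapsort here.

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy first."""
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(items)[i - 1]


data = [7, 10, 5, 8, 11, 3, 2, 13]           # sample data, chosen for illustration
print(select_by_sorting(data, 1))            # minimum: 2
print(select_by_sorting(data, len(data)))    # maximum: 13
print(select_by_sorting(data, 6))            # 6th smallest: 10
```

Sorting does more work than selection needs, because it orders every element when only one rank is requested.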

  4. Recall: Quicksort
     [Figure: subarray A[p..q] partitioned around the pivot x at index r; elements ≤ x lie to its left, elements ≥ x to its right, and k is the rank of x.]
     • The function Partition gives us the rank k of the pivot
     • If we are lucky, k = i. Done!
     • If not, we at least get a smaller subarray to work with:
       • k > i: the ith smallest is in the left subarray
       • k < i: the ith smallest is in the right subarray
     • Divide and conquer
     • If we are lucky, k is close to n/2, or the desired element is in the smaller subarray
     • If unlucky, the desired element is in the larger subarray (possible size n - 1)

  5. Randomized divide-and-conquer algorithm
     [Figure: A[p..q] partitioned around A[r]; elements ≤ A[r] to the left, elements ≥ A[r] to the right; k is the rank of A[r].]
     RAND-SELECT(A, p, q, i)            ⊳ ith smallest of A[p..q]
        if p = q & i > 1 then error!
        r ← RAND-PARTITION(A, p, q)
        k ← r - p + 1                   ⊳ k = rank(A[r])
        if i = k then return A[r]
        if i < k
           then return RAND-SELECT(A, p, r - 1, i)
           else return RAND-SELECT(A, r + 1, q, i - k)

  6. Randomized Partition
     • Randomly choose an element as the pivot
     • Every time we need to do a partition, throw a die to decide which element to use as the pivot
     • Each element has probability 1/n of being selected
     Rand-Partition(A, p, q) {
        d = random();                     // draw a random number d with 0 <= d < 1
        index = p + floor((q-p+1) * d);   // p <= index <= q
        swap(A[p], A[index]);
        return Partition(A, p, q);        // now use A[p] as pivot; return its final index
     }
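RAND-SELECT and Rand-Partition can be sketched together in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: indices are 0-based, `partition` is one common first-element-pivot scheme (the slides do not spell out Partition), and `random.randint` replaces the `random()` and `floor` computation.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] (inclusive) around the pivot a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def rand_partition(a, p, q):
    """Swap a uniformly random element of a[p..q] into position p, then partition."""
    index = random.randint(p, q)  # inclusive on both ends, so p <= index <= q
    a[p], a[index] = a[index], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a in place."""
    if not 1 <= i <= q - p + 1:
        raise ValueError("i is out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(rand_select(data[:], 0, len(data) - 1, 6))  # prints 10
```

The pivot choice is random, so the sequence of recursive calls varies between runs, but the returned value does not.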

  7. Example
     Select the i = 6th smallest:
        7 10 5 8 11 3 2 13        (pivot: 7, i = 6)
     Partition:
        3 2 5 7 11 8 10 13        (k = 4)
     Select the 6 - 4 = 2nd smallest recursively.

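  8. Complete example: select the 6th smallest element.
        7 10 5 8 11 3 2 13        input, i = 6
        3 2 5 7 11 8 10 13        after partitioning around 7: k = 4, so recurse on the right part with i = 6 - 4 = 2
        10 8 11 13                after partitioning 11 8 10 13 around 11: k = 3, i = 2 < k, so recurse on the left part
        8 10                      after partitioning 10 8 around 10: k = 2, i = 2 = k
        10                        result
     Note: here we always used the first element as the pivot to do the partition (instead of rand-partition).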

  9. Intuition for analysis
     (All our analyses today assume that all elements are distinct.)
     • Lucky: T(n) = T(9n/10) + Θ(n) = Θ(n)   (master theorem, CASE 3)
     • Unlucky: T(n) = T(n - 1) + Θ(n) = Θ(n²)   (arithmetic series). Worse than sorting!
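Both recurrences can be unrolled directly, which shows where the arithmetic series and the linear bound come from. The sketch below assumes the Θ(n) term is exactly cn for some constant c > 0 and that T(0) = 0. These assumptions are for illustration and are not stated in the slides.

```latex
% Assumption: the \Theta(n) term is written as cn with c > 0, and T(0) = 0.
\begin{align*}
\text{Unlucky:}\quad T(n) &= T(n-1) + cn = c\sum_{j=1}^{n} j = \frac{c\,n(n+1)}{2} = \Theta(n^2) \\
\text{Lucky:}\quad   T(n) &= T(9n/10) + cn \le cn\sum_{j=0}^{\infty}\left(\tfrac{9}{10}\right)^{j} = 10\,cn
\end{align*}
```

In the lucky case T(n) ≥ cn also holds, so the bound is tight and T(n) = Θ(n).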

  10. Running time of randomized selection
      • For an upper bound, assume the ith element always falls in the larger side of the partition
      • The expected running time is an average over all cases (expectation):
        T(n) ≤ T(max(0, n-1)) + n    if 0 : n-1 split,
               T(max(1, n-2)) + n    if 1 : n-2 split,
               ⋮
               T(max(n-1, 0)) + n    if n-1 : 0 split
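Averaging the cases gives a single recurrence. The expression below is a reconstruction, since the transcript does not preserve the slide's formula. It assumes each of the n splits occurs with probability 1/n (every element is equally likely to be the pivot and all elements are distinct) and that T is nonnegative.

```latex
\begin{align*}
T(n) &\le \frac{1}{n}\sum_{k=0}^{n-1}\bigl(T(\max(k,\,n-k-1)) + n\bigr) \\
     &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} T(k) + n
\end{align*}
```

The second line holds because each value of max(k, n-k-1) between ⌊n/2⌋ and n-1 appears at most twice as k ranges over 0, ..., n-1.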

  11. Substitution method
      Want to show T(n) = O(n), so we need to prove T(n) ≤ cn for n > n0.
      Assume: T(k) ≤ ck for all k < n.
      The inductive step T(n) ≤ cn goes through if c ≥ 4.
      Therefore, T(n) = O(n).
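The inductive step is not preserved in the transcript. One derivation that matches the stated condition c ≥ 4 is sketched below. It starts from the bound T(n) ≤ (2/n) Σ T(k) + n over k = ⌊n/2⌋, ..., n-1, which is assumed here as the standard form of the expected-time recurrence for randomized selection.

```latex
\begin{align*}
T(n) &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} ck + n
        && \text{(inductive hypothesis } T(k) \le ck\text{)} \\
     &\le \frac{2c}{n}\cdot\frac{3n^{2}}{8} + n
        && \text{(since } \textstyle\sum_{k=\lfloor n/2\rfloor}^{n-1} k \le \tfrac{3}{8}n^{2}\text{)} \\
     &= \frac{3cn}{4} + n \\
     &= cn - \left(\frac{cn}{4} - n\right) \\
     &\le cn && \text{if } c \ge 4
\end{align*}
```

The last step needs cn/4 - n ≥ 0, which is exactly the condition c ≥ 4.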

  12. Summary of randomized selection
      • Works fast: linear expected time.
      • Excellent algorithm in practice.
      • But the worst case is very bad: Θ(n²).
      Q. Is there an algorithm that runs in linear time in the worst case?
      A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
      IDEA: Generate a good pivot recursively.

  13. Worst-case linear-time selection
      SELECT(i, n)
        1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
        2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
        3. Partition around the pivot x. Let k = rank(x).
        4. if i = k then return x
           elseif i < k
              then recursively SELECT the ith smallest element in the lower part
              else recursively SELECT the (i-k)th smallest element in the upper part
      (Steps 3 and 4: same as RAND-SELECT)
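The SELECT procedure can be sketched in Python as below. This is a sketch under assumptions rather than the lecture's implementation: it works on a list of distinct values (as the slides assume), builds new lists instead of partitioning in place, sorts inputs of at most 5 elements directly, and uses the lower median of the group medians as the pivot.

```python
def select(items, i):
    """Return the i-th smallest (1-indexed) element of a list of distinct items."""
    n = len(items)
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    if n <= 5:
        # Small input: sort directly (the "by rote" base case).
        return sorted(items)[i - 1]

    # Step 1: medians of the floor(n/5) full groups of 5.
    medians = [sorted(items[j:j + 5])[2] for j in range(0, n - n % 5, 5)]

    # Step 2: recursively pick the median of the group medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x; k is the rank of x.
    lower = [v for v in items if v < x]
    upper = [v for v in items if v > x]
    k = len(lower) + 1

    # Step 4: return x, or recurse into the side that contains the answer.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


print(select([7, 10, 5, 8, 11, 3, 2, 13], 6))  # prints 10
```

Building `lower` and `upper` as new lists keeps the sketch short. An in-place partition, as in RAND-SELECT, would avoid the extra memory.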

  14. Choosing the pivot

  15. Choosing the pivot
      • Divide the n elements into groups of 5.

  16. Choosing the pivot
      [Figure: the groups of 5, with a legend marking lesser and greater elements.]
      • Divide the n elements into groups of 5. Find the median of each 5-element group by rote.

  17. Choosing the pivot
      [Figure: the groups of 5 with the pivot x marked, and a legend marking lesser and greater elements.]
      • Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
      • Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.

  18. Analysis
      [Figure: the groups of 5 with the pivot x marked, and a legend marking lesser and greater elements.]
      • At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.

  19. Analysis
      • At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
      • Therefore, at least 3⌊n/10⌋ elements are ≤ x. (Assume all elements are distinct.)

  20. Analysis
      • At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
      • Therefore, at least 3⌊n/10⌋ elements are ≤ x.
      • Similarly, at least 3⌊n/10⌋ elements are ≥ x.
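The counting argument can be checked empirically. The sketch below makes several assumptions: the helper names are hypothetical, the input has at least 5 distinct elements, and the median of medians is found by sorting rather than by a recursive call, since only the pivot's position is being tested.

```python
def median_of_medians(items):
    """Return the lower median of the medians of the full groups of 5."""
    n = len(items)
    medians = sorted(sorted(items[j:j + 5])[2] for j in range(0, n - n % 5, 5))
    return medians[(len(medians) - 1) // 2]


def check_pivot_bound(items):
    """Check that at least 3*floor(n/10) items are <= x and at least as many are >= x."""
    n = len(items)
    x = median_of_medians(items)
    at_most_x = sum(1 for v in items if v <= x)
    at_least_x = sum(1 for v in items if v >= x)
    bound = 3 * (n // 10)
    return at_most_x >= bound and at_least_x >= bound


data = [(37 * j) % 101 for j in range(101)]  # a fixed permutation of 0..100
print(all(check_pivot_bound(data[:n]) for n in range(5, 102)))  # True
```

Each group median that is ≤ x brings along the two smaller elements of its group, which is where the factor of 3 comes from.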

  21. Analysis
      [Figure: 3⌊n/10⌋ elements marked at each end; the possible positions for the pivot lie in between.]
      Need “at most” for worst-case runtime:
      • At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n - 3⌊n/10⌋ elements are > x
      • At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n - 3⌊n/10⌋ elements are < x
      • The recursive call to SELECT in Step 4 is executed recursively on at most n - 3⌊n/10⌋ elements.

  22. Analysis
      • Use the fact that ⌊a/b⌋ > a/b - 1
      • n - 3⌊n/10⌋ < n - 3(n/10 - 1) = 7n/10 + 3
      • 7n/10 + 3 ≤ 3n/4 if n ≥ 60
      • The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10 + 3 elements.
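The two inequalities on this slide are easy to check numerically. The helper names below are hypothetical, and the comparisons are scaled to integers so that no floating-point rounding is involved.

```python
def step4_size_bound(n):
    """Largest possible input to the Step 4 recursive call: n - 3*floor(n/10)."""
    return n - 3 * (n // 10)


def bounds_hold(n):
    """Check n - 3*floor(n/10) < 7n/10 + 3 <= 3n/4 using integer arithmetic only."""
    size = step4_size_bound(n)
    first = 10 * size < 7 * n + 30    # size < 7n/10 + 3, scaled by 10
    second = 14 * n + 60 <= 15 * n    # 7n/10 + 3 <= 3n/4, scaled by 20
    return first and second


print(step4_size_bound(100))                          # 100 - 3*10 = 70
print(all(bounds_hold(n) for n in range(60, 10000)))  # True
print(bounds_hold(59))                                # False: the 3n/4 bound needs n >= 60
```

The second inequality simplifies to n ≥ 60, which is why that threshold appears in the analysis.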

  23. Developing the recurrence
      SELECT(i, n)   [total cost: T(n)]
        1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.   [cost: Θ(n)]
        2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.   [cost: T(n/5)]
        3. Partition around the pivot x. Let k = rank(x).   [cost: Θ(n)]
        4. if i = k then return x
           elseif i < k
              then recursively SELECT the ith smallest element in the lower part
              else recursively SELECT the (i-k)th smallest element in the upper part   [cost: T(7n/10 + 3)]
      Adding the four costs: T(n) ≤ T(n/5) + T(7n/10 + 3) + Θ(n)

  24. Solving the recurrence
      Assumption: T(k) ≤ ck for all k < n.
      • The subproblem bound 7n/10 + 3 ≤ 3n/4 applies if n ≥ 60.
      • The inductive step T(n) ≤ cn holds if c ≥ 20 and n ≥ 60.
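The algebra behind these two conditions is not preserved in the transcript. A derivation consistent with them is sketched below. It assumes the recurrence T(n) ≤ T(n/5) + T(7n/10 + 3) + Θ(n) and takes the Θ(n) term to be at most n, which is an assumption suggested by the threshold c ≥ 20.

```latex
% Assumption: the \Theta(n) term is bounded by n.
\begin{align*}
T(n) &\le \frac{cn}{5} + c\left(\frac{7n}{10} + 3\right) + n
        && \text{(inductive hypothesis)} \\
     &\le \frac{cn}{5} + \frac{3cn}{4} + n
        && \text{if } n \ge 60 \\
     &= \frac{19cn}{20} + n \\
     &= cn - \left(\frac{cn}{20} - n\right) \\
     &\le cn && \text{if } c \ge 20 \text{ and } n \ge 60
\end{align*}
```

The factor 19/20 is the sum 1/5 + 3/4 of the two subproblem fractions. Because it is below 1, the total work stays linear.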

  25. Conclusions
      • Since the work at each level of recursion is basically a constant fraction (19/20) smaller, the work per level is a geometric series dominated by the linear work at the root.
      • In practice, this algorithm runs slowly, because the constant in front of n is large.
      • The randomized algorithm is far more practical.
      Exercise: Try to divide into groups of 3 or 7.
      Exercise: Think about an application in sorting.
