
CS200: Algorithm Analysis

Explore the median and randomized divide-and-conquer algorithms for finding the ith smallest of n elements. Learn Lomuto's partition pseudocode and the RandomSelect approach. Discover the analysis methods, including the Master Method and the iteration method, used for the different cases. Understand the average-, best-, and worst-case scenarios. Delve into the selection and pivoting strategies that ensure efficient runtime. Gain insight into the practical applications and complexities of order-selection algorithms.


Presentation Transcript


  1. CS200: Algorithm Analysis

  2. MEDIAN ORDER STATISTICS. Order-statistics problem: find the ith smallest of n elements (the element with rank i). If i = 1, it is the minimum; if i = n, the maximum; if i is the floor or ceiling of n/2, the median. A simple solution is to sort the elements and index into the ith position, but the runtime of this brute-force algorithm is ______________? It is possible to do better with a randomized divide-and-conquer algorithm:

  3. Partition the elements: Lomuto's Partition Pseudocode
     Partition(A, p, r)            (* the pivot is A[r] *)
         x = A[r]
         i = p - 1
         for j = p to r - 1 do
             if A[j] <= x then
                 i = i + 1
                 swap(A[i], A[j])
         swap(A[i+1], A[r])
         return i + 1              (* new pivot index *)
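A direct Python rendering of the same routine may help make the index arithmetic concrete (a minimal sketch; here p and r are inclusive 0-based indices, a small shift from the slide's 1-based pseudocode):

    def partition(A, p, r):
        """Lomuto partition of A[p..r] (inclusive) around the pivot A[r].
        Returns the pivot's final index q, with A[p..q-1] <= A[q] <= A[q+1..r]."""
        x = A[r]                          # pivot value
        i = p - 1                         # right edge of the "<= pivot" region
        for j in range(p, r):             # j runs over p .. r-1
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]   # grow the "<= pivot" region
        A[i + 1], A[r] = A[r], A[i + 1]   # drop the pivot between the two regions
        return i + 1                      # new pivot index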

  4. Selection of the ith element:
     RandomSelect(A, p, r, i)
         if p = r then
             return A[p]                          // a single element must be the ith
         q = RandomPartition(A, p, r)             // A[p..r] is now ordered around the pivot A[q]
         k = q - p + 1                            // size of the lower part, pivot included
         if k = i then
             return A[q]                          // the pivot is the ith element
         else if i < k then
             return RandomSelect(A, p, q-1, i)    // the ith element is in the lower part
         else
             return RandomSelect(A, q+1, r, i-k)  // the ith element is in the upper part
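The same selection routine as a Python sketch (it relies on the partition sketch above and on random_partition, whose pseudocode is on slide 7 and whose Python sketch follows it; p and r are inclusive 0-based indices and i is a 1-based rank within A[p..r]):

    def random_select(A, p, r, i):
        """Return the ith smallest element of A[p..r]; i is a 1-based rank."""
        if p == r:                               # a single element must be the ith
            return A[p]
        q = random_partition(A, p, r)            # A[p..r] is now ordered around A[q]
        k = q - p + 1                            # size of the lower part, pivot included
        if i == k:
            return A[q]                          # the pivot is the ith element
        elif i < k:
            return random_select(A, p, q - 1, i)        # recurse into the lower part
        else:
            return random_select(A, q + 1, r, i - k)    # recurse into the upper part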

  5. How partition works

  6. Small trace. The pivot is 6 and the array values are 2, 3, 5, 6, 8, 10, 11, 13, initially in some arbitrary order. After the randomized partition, every value ≤ 6 lies to the left of the pivot 6 and every value > 6 lies to its right.
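Concretely, if the arbitrary initial order happens to be [8, 2, 11, 5, 13, 3, 10, 6] (pivot 6 in the last slot), the partition sketch above gives:

    A = [8, 2, 11, 5, 13, 3, 10, 6]
    q = partition(A, 0, len(A) - 1)
    print(q, A)   # 3 [2, 5, 3, 6, 13, 11, 10, 8]: values <= 6 left of the pivot, > 6 right of it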

  7. Ensure average-case behavior:
     RandomPartition(A, p, r)
         i = Random(p, r)
         swap(A[i], A[r])
         return Partition(A, p, r)
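In Python, a sketch plus a small usage example on the slide-6 array (partition and random_select are the sketches above; the slide's Random(p, r) maps to random.randint, which is inclusive on both ends):

    import random

    def random_partition(A, p, r):
        """Swap a uniformly random element of A[p..r] into the pivot slot, then partition."""
        i = random.randint(p, r)       # Random(p, r), inclusive on both ends
        A[i], A[r] = A[r], A[i]
        return partition(A, p, r)

    A = [8, 2, 11, 5, 13, 3, 10, 6]
    print(random_select(A, 0, len(A) - 1, 4))   # 4th smallest of {2,3,5,6,8,10,11,13} -> 6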

  8. Best case: the algorithm always partitions around the element with rank i. Assume a 9-to-1 split on each partition; then T(n) = ? Use the Master Method for the solution. Which case? Notice that the assumption of a 9-to-1 split makes no difference: say the split is 99-to-1; then T(n) = ? The Master Method gives the same result, with 100/99 as the base of the logarithm. Why does this hold no matter what constant-fraction split the partition produces? Case 3: T(n) = Θ(n), as worked out below.
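Worked out, the 9-to-1 assumption gives the recurrence below, which falls into Case 3 of the Master Method:

    T(n) = T(9n/10) + Θ(n)
    a = 1, b = 10/9, so n^(log_b a) = n^0 = 1, while f(n) = Θ(n) is polynomially larger
    (and the regularity condition a·f(n/b) = 9n/10 ≤ c·n holds), so Case 3 applies: T(n) = Θ(n).
    A 99-to-1 split only changes the base of the logarithm: T(n) = T(99n/100) + Θ(n) is still Case 3, still Θ(n).

Any fixed constant-fraction split behaves the same way, which is why the exact split does not matter.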

  9. Worst case: the algorithm always partitions around the largest remaining element, so the recursion shrinks by only one element per call; then T(n) = ? • Use the iteration method for the solution (expanded below).
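Expanding by iteration (each partition costs Θ(n) and removes only the pivot):

    T(n) = T(n-1) + Θ(n)
         = T(n-2) + Θ(n-1) + Θ(n)
         = ...
         = Θ(1) + Θ(2) + ... + Θ(n)
         = Θ(n²)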

  10. Average case: assume the ith element always falls in the larger part of the partition (this gives an upper bound on the average case). Assume all splits are equally likely, so the partition produces lower parts of every size 0 .. n-1 with equal probability. Then
      T(n) ≤ (1/n) Σ_{k=0}^{n-1} T(max(k, n-1-k)) + Θ(n)
      E[T(n)] ≤ (2/n) Σ_{k=⎣n/2⎦}^{n-1} T(k) + Θ(n)
      Can someone explain the simplification?
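One way to see it: max(k, n-1-k) always lies in the upper half {⎣n/2⎦, ..., n-1}, and each value m in that range arises for at most two values of k (namely k = m and k = n-1-m), so each T(k) with k ≥ ⎣n/2⎦ is counted at most twice:

    (1/n) Σ_{k=0}^{n-1} T(max(k, n-1-k))  ≤  (2/n) Σ_{k=⎣n/2⎦}^{n-1} T(k)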

  11. Solve the recurrence above using the substitution method. Guess that the solution is T(n) ≤ cn. Assume T(k) ≤ ck for all k < n, so that E[T(n)] ≤ (2/n) Σ_{k=⎣n/2⎦}^{n-1} ck + Θ(n), and prove T(n) ≤ cn by induction (a sketch follows).
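One way to carry out the induction step (a sketch; the constants are not tight):

    E[T(n)] ≤ (2/n) Σ_{k=⎣n/2⎦}^{n-1} ck + Θ(n)
            = (2c/n) · ( (n-1)n/2 − (⎣n/2⎦-1)⎣n/2⎦/2 ) + Θ(n)
            ≤ (2c/n) · (3n²/8) + Θ(n)
            = 3cn/4 + Θ(n)
            = cn − ( cn/4 − Θ(n) )
            ≤ cn,   once c is chosen large enough that cn/4 dominates the hidden Θ(n) term.

Hence E[T(n)] = O(n), and since every element must be examined at least once, E[T(n)] = Θ(n).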

  12. Expectation of MOS Continued

  13. Expectation of MOS Continued

  14. Expectation of MOS Continued

  15. Summary
      • Order selection (order statistics)
      • Know the algorithm and its runtime
      • The average-case analysis is ugly (but much easier than for Quicksort)

  16. Order Selection. The randomized algorithm works fast: linear expected time. It is an excellent algorithm in practice. But its worst case is very bad: Θ(n²). • Q. Is there an algorithm that runs in linear time in the worst case? • A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973]. IDEA: generate a good pivot recursively.

  17. 1. Select pivot: divide the n elements into groups of 5.

  18. 1 (cont.). Select pivot - Θ(n): find the median of each 5-element group by rote. How?

  19. By rote - Θ(1): brute force, but the fastest known method. Let A = 7 5 3 1 6.
      • Sort the first two pairs. [2 comparisons] → 5 7 1 3 6
      • Order the pairs w.r.t. their larger element. [1 comparison] → 1 3 5 7 6
      • Call the result [a, b, c, d, e]; we know a < b < d and c < d. There are at least 3 elements less than d, so d is at least the 4th element of the sorted array and cannot be the median (the 3rd element); drop d, keeping a b c e → 1 3 5 6.
      • Compare c and e; WLOG c < e (a < b is known already). [1 comparison]
      • Order the pairs (a, b) and (c, e) w.r.t. their larger element. [1 comparison] Now WLOG a < b < e and c < e.
      • Compare c and b; the greater of the two is the median. [1 comparison]
      • Hence the total number of comparisons is 6.
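The same six-comparison scheme written out as a Python sketch (it assumes the five values are distinct; median_of_five is just an illustrative name):

    def median_of_five(a, b, c, d, e):
        """Median of five distinct values using at most 6 comparisons."""
        # Sort the first two pairs.                                   [2 comparisons]
        if a > b:
            a, b = b, a
        if c > d:
            c, d = d, c
        # Order the pairs by their larger element, so that b < d.     [1 comparison]
        if b > d:
            a, b, c, d = c, d, a, b
        # Now a < b < d and c < d: d has three smaller elements, so it cannot be
        # the median; drop d and keep a, b, c, e.
        if c > e:                                                   # [1 comparison]
            c, e = e, c
        # Order the pairs (a, b) and (c, e) by their larger element.  [1 comparison]
        if b > e:
            a, b, c, e = c, e, a, b
        # Now a < b < e and c < e; the larger of b and c is the third smallest of
        # the original five, i.e. the median.                         [1 comparison]
        return b if b > c else c

    # Example from the slide: median_of_five(7, 5, 3, 1, 6) -> 5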

  20. 2. Select final pivot – T(n/5). Recursively SELECT the median x of the ⎣n/5⎦ group medians to be the pivot: just call the selection algorithm recursively on the list of group medians. Ex.: n = 125 gives 125/5 = 25 medians; recurse on those 25 medians, which yields 25/5 = 5 medians; then apply the base case.

  21. Analysis of how A is split by the partitioning – T(3n/4). At least half of the group medians are ≤ x, i.e. at least ⎣⎣n/5⎦/2⎦ = ⎣n/10⎦ group medians. Therefore at least 3⎣n/10⎦ elements are ≤ x. Similarly, at least 3⎣n/10⎦ elements are ≥ x. Where does the 3 come from? Each group whose median is ≤ x contributes at least 3 elements that are ≤ x: the median itself and the two group elements below it.

  22. Analysis. For n ≥ 50 we have 3⎣n/10⎦ ≥ n/4, so the number of elements ≥ x, and likewise the number ≤ x, is at least ¼ of the array. Since we must be prepared to recurse on the larger side, for n ≥ 50 the recursive call to SELECT is executed on at most 3n/4 elements. • The recurrence for the running time can therefore assume the recursive call takes time T(3n/4) in the worst case. • For n < 50, the worst-case time is T(n) = Θ(1).
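Putting slides 17-22 together, here is a minimal Python sketch of the deterministic SELECT (the name select, the cutoff of 50, and the list-copying three-way partition are illustrative simplifications; a real implementation would partition in place as on slide 3):

    def select(A, i):
        """Return the ith smallest element of the list A; i is a 1-based rank."""
        if len(A) < 50:                      # base case: Θ(1) for constant-size input
            return sorted(A)[i - 1]
        # 1. Divide into groups of 5 and take each group's median
        #    (sorting a constant-size group is Θ(1); slide 19's by-rote method saves comparisons).
        groups = [A[j:j + 5] for j in range(0, len(A), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        # 2. Recursively select the median x of the group medians as the pivot.
        x = select(medians, (len(medians) + 1) // 2)
        # 3. Partition around x (a three-way split, which also tolerates duplicates).
        lower = [a for a in A if a < x]
        equal = [a for a in A if a == x]
        upper = [a for a in A if a > x]
        # 4. Recurse only into the side that contains rank i (at most ~3n/4 elements).
        if i <= len(lower):
            return select(lower, i)
        elif i <= len(lower) + len(equal):
            return x
        else:
            return select(upper, i - len(lower) - len(equal))

    # e.g. select(list(range(1000, 0, -1)), 500) -> 500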

  23. Solve Recurrence by Substitution
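A sketch of that substitution, using the pieces from the previous slides (floors and ceilings are ignored):

    T(n) ≤ T(n/5) + T(3n/4) + Θ(n)  for n ≥ 50,    T(n) = Θ(1) for n < 50
    Guess T(n) ≤ cn and substitute:
        T(n) ≤ cn/5 + 3cn/4 + Θ(n) = 19cn/20 + Θ(n) = cn − ( cn/20 − Θ(n) ) ≤ cn
    once c is chosen large enough that cn/20 dominates the hidden Θ(n) term. Hence T(n) = Θ(n) in the worst case.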

  24. Conclusion. Since the subproblem at each level of recursion is smaller by a constant fraction, the work per level forms a geometric series dominated by the linear work at the root. In practice, however, this algorithm runs slowly because the constant in front of n is large; the randomized algorithm is far more practical.
