CS200: Algorithm Analysis


MEDIAN ORDER STATISTICS
Order Statistics Problem: find the ith smallest of n elements (the element with rank = i).
• If i = 1, then it is the minimum.
• If i = n, then it is the maximum.
• If i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉, then it is the (lower or upper) median.
A simple solution is to sort the elements and index into the ith element, but what is the runtime of this brute-force algorithm? It is possible to do better with a randomized divide-and-conquer algorithm:
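As a baseline, here is a minimal Python sketch of the brute-force approach (the name `select_by_sorting` is hypothetical). Python's built-in `sorted` is a comparison sort, so this costs O(n lg n) time plus O(n) extra space for the sorted copy.

```python
def select_by_sorting(a, i):
    """Brute-force order statistic: the i-th smallest (1-indexed) element of a."""
    if not 1 <= i <= len(a):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    # Sorting dominates the cost; indexing the sorted copy is O(1).
    return sorted(a)[i - 1]


data = [7, 5, 3, 1, 6]
n = len(data)
print(select_by_sorting(data, 1))             # minimum: 1
print(select_by_sorting(data, n))             # maximum: 7
print(select_by_sorting(data, (n + 1) // 2))  # (lower) median: 5
```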

Partition the elements: Lomuto's partition pseudocode

Partition(A, p, r)   (* pivot is A[r] *)
    x = A[r]
    i = p - 1
    for j = p to r-1 do
        if A[j] <= x then
            i = i + 1
            swap(A[i], A[j])
    swap(A[i+1], A[r])
    return i + 1   (* new pivot index *)
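A direct Python translation of Partition, as a sketch using 0-based list indices with p and r inclusive, as in the pseudocode. The invariant in the docstring is what RandomSelect relies on.

```python
def partition(a, p, r):
    """Lomuto partition of a[p..r] (inclusive) around the pivot x = a[r].

    Afterwards a[p..q-1] <= x, a[q] == x, and a[q+1..r] > x; returns q.
    """
    x = a[r]
    i = p - 1                        # a[p..i] holds the elements known to be <= x
    for j in range(p, r):            # j = p .. r-1, as in the pseudocode
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # move the pivot between the two parts
    return i + 1                     # the pivot's new index


data = [3, 8, 2, 5, 1, 4, 7, 6]
q = partition(data, 0, len(data) - 1)
print(q, data)  # 5 [3, 2, 5, 1, 4, 6, 7, 8]
```

The elements left of index 5 are all ≤ 6 and those to its right are all > 6; neither side is sorted, which is all that selection needs.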

Selection of ith element:

RandomSelect(A, p, r, i)
    if p = r then
        return A[p]                          // single element must be the ith
    q = RandomPartition(A, p, r)             // A[p..r] is now ordered around q
    k = q - p + 1                            // size of lower part + 1 (pivot)
    if i = k then
        return A[q]                          (* pivot is the ith element *)
    else if i < k then
        return RandomSelect(A, p, q-1, i)    // ith is in the lower part
    else
        return RandomSelect(A, q+1, r, i-k)  // (i-k)th of the upper part
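A minimal runnable Python sketch of RandomSelect. It carries its own copies of Partition and RandomPartition so that it runs standalone; Python's `random.randint(p, r)`, which is inclusive at both ends, plays the role of Random(p, r). Array indices are 0-based, while the rank i stays 1-based as on the slide.

```python
import random


def partition(a, p, r):
    """Lomuto partition of a[p..r] around a[r]; returns the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def random_partition(a, p, r):
    """Swap a uniformly random element of a[p..r] into position r, then partition."""
    s = random.randint(p, r)  # inclusive bounds, like Random(p, r)
    a[s], a[r] = a[r], a[s]
    return partition(a, p, r)


def random_select(a, p, r, i):
    """Return the i-th smallest (1-indexed) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]                      # single element must be the i-th
    q = random_partition(a, p, r)
    k = q - p + 1                        # size of lower part + 1 (pivot)
    if i == k:
        return a[q]                      # pivot is the i-th element
    elif i < k:
        return random_select(a, p, q - 1, i)
    else:
        return random_select(a, q + 1, r, i - k)


data = [13, 2, 11, 6, 8, 3, 10, 5]
print(random_select(data, 0, len(data) - 1, 4))  # 4th smallest: 6
```

The answer does not depend on the random choices; only the running time does.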

How partition works

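Small trace: the pivot is 6, and the array values are 2, 3, 5, 6, 8, 10, 11, 13 in some arbitrary order. After the randomized Partition, the values ≤ 6 (2, 3, 5) sit to the left of 6 in some order, and the values > 6 (8, 10, 11, 13) sit to its right in some order.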
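To make the trace concrete, here is a small Python sketch that records the array after every swap. The ordering below is one hypothetical "arbitrary order", with the pivot 6 already in the last slot, as RandomPartition arranges before calling Partition; `partition_trace` is an illustrative name.

```python
def partition_trace(a, p, r):
    """Lomuto partition of a[p..r] around a[r], recording the array after every swap."""
    steps = []
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
            steps.append((f"j={j}: swap A[{i}], A[{j}]", a[:]))
    a[i + 1], a[r] = a[r], a[i + 1]
    steps.append((f"final: swap A[{i + 1}], A[{r}]", a[:]))
    return i + 1, steps


data = [10, 2, 13, 5, 8, 3, 11, 6]  # hypothetical arbitrary order; pivot 6 is last
q, steps = partition_trace(data, 0, len(data) - 1)
for label, snapshot in steps:
    print(label, snapshot)
print("pivot index:", q)
```

For this input the final array is [2, 5, 3, 6, 8, 13, 11, 10] with the pivot at index 3: the lower part {2, 5, 3} and the upper part {8, 13, 11, 10} are each left unsorted.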

Ensure average-case behavior:

RandomPartition(A, p, r)
    i = Random(p, r)
    swap(A[i], A[r])
    return Partition(A, p, r)
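A small experiment, sketched in Python, shows what the random swap buys. `count_select_comparisons` (a hypothetical helper) runs a loop form of RandomSelect (the recursion is a tail call) and counts the `A[j] <= x` tests. On sorted input 1..n with i = 1, always using A[r] as the pivot means always partitioning around the largest remaining element, the worst case analyzed below, which costs n(n−1)/2 comparisons; with a seeded random generator (the seed 42 is arbitrary) the count drops sharply.

```python
import random


def count_select_comparisons(a, i, rng=None):
    """Run (Random)Select on a copy of a for rank i (1-indexed); return (answer, comparisons).

    With rng=None the pivot is always A[r] (plain Partition); otherwise a random
    element is swapped into A[r] first, as RandomPartition does.
    """
    a = list(a)
    p, r = 0, len(a) - 1
    comparisons = 0
    while p < r:
        if rng is not None:
            s = rng.randint(p, r)
            a[s], a[r] = a[r], a[s]
        x, store = a[r], p - 1
        for j in range(p, r):
            comparisons += 1          # one A[j] <= x test
            if a[j] <= x:
                store += 1
                a[store], a[j] = a[j], a[store]
        q = store + 1
        a[q], a[r] = a[r], a[q]
        k = q - p + 1                 # size of lower part + 1 (pivot)
        if i == k:
            return a[q], comparisons
        if i < k:
            r = q - 1                 # continue in the lower part
        else:
            p, i = q + 1, i - k       # continue in the upper part
    return a[p], comparisons


n = 1000
sorted_input = list(range(1, n + 1))
print(count_select_comparisons(sorted_input, 1))                     # (1, 499500)
print(count_select_comparisons(sorted_input, 1, random.Random(42)))  # far fewer comparisons
```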

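Best case: always partition around the element with rank = i. Assume a 9-to-1 split on each partition; then $T(n) = T(9n/10) + \Theta(n)$. Use the Master Method for the solution. Which case? Here $a = 1$ and $b = 10/9$, so $n^{\log_b a} = n^0 = 1$, while $f(n) = \Theta(n)$ is polynomially larger and satisfies the regularity condition (with $f(n) = cn$, $a f(n/b) = \tfrac{9}{10} f(n)$): Case 3, $T(n) = \Theta(n)$. Realize that the assumption of a 9-to-1 split makes no difference. Let's say the split is 99 to 1; then $T(n) = T(99n/100) + \Theta(n)$, and the Master Method gives the same result with 100/99 as the base. Why does this hold no matter what split occurs in the Partition? Because $a = 1$, $\log_b a = 0$ for every constant $b > 1$, so Case 3 applies to any split in which the larger side is a constant fraction of n.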
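The same bound can be checked by iterating the recurrence directly, writing the Θ(n) partitioning cost as cn for an assumed constant c > 0. The per-level costs form a geometric series, so only the constant depends on the split:

```latex
\begin{aligned}
\text{(9 to 1)}\quad T(n) &\le cn + c\tfrac{9}{10}n + c\left(\tfrac{9}{10}\right)^{2} n + \cdots
   \le cn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j}
   = \frac{cn}{1 - 9/10} = 10\,cn = \Theta(n), \\
\text{(99 to 1)}\quad T(n) &\le cn \sum_{j=0}^{\infty} \left(\tfrac{99}{100}\right)^{j}
   = \frac{cn}{1 - 99/100} = 100\,cn = \Theta(n).
\end{aligned}
```

In general, if the larger side never exceeds αn for a constant α < 1, the series gives T(n) ≤ cn/(1 − α).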

Worst case: always partition around the largest remaining element in the partition; then $T(n) = T(n-1) + \Theta(n)$.
• Use the iteration method for the solution.
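Iterating the worst-case recurrence $T(n) = T(n-1) + \Theta(n)$, with the Θ(n) partitioning cost written as cn for an assumed constant c > 0:

```latex
\begin{aligned}
T(n) &= T(n-1) + cn \\
     &= T(n-2) + c(n-1) + cn \\
     &\;\;\vdots \\
     &= T(1) + c\sum_{k=2}^{n} k
      = \Theta(1) + c\left(\frac{n(n+1)}{2} - 1\right) = \Theta(n^2).
\end{aligned}
```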

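Average case: assume the ith element always falls in the larger partition (this gives an upper bound on the average case). Assume all partitions are equally likely, i.e., the lower part may have any size k from 0 to n−1, each with probability 1/n. Then

$$E[T(n)] \le \frac{1}{n}\sum_{k=0}^{n-1} T(\max(k,\, n-1-k)) + \Theta(n)$$

$$E[T(n)] \le \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + \Theta(n)$$

Why the simplification works: as k runs from 0 to n−1, max(k, n−1−k) only takes values from ⌊n/2⌋ to n−1, and each such value appears at most twice (once for k and once for n−1−k). So the first sum is at most twice the sum of T(k) over the upper half.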
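The counting argument behind the simplification can be checked mechanically. This Python sketch (the helper names `max_split_counts` and `simplification_holds` are hypothetical) tallies how often each value max(k, n−1−k) occurs; since every T(k) ≥ 0, counting each upper-half term twice can only make the sum larger.

```python
from collections import Counter


def max_split_counts(n):
    """Multiplicity of each value max(k, n-1-k) as k runs over 0..n-1."""
    return Counter(max(k, n - 1 - k) for k in range(n))


def simplification_holds(n):
    """Every value lies in [floor(n/2), n-1] and appears at most twice."""
    return all(n // 2 <= v <= n - 1 and c <= 2 for v, c in max_split_counts(n).items())


print(sorted(max_split_counts(5).items()))  # [(2, 1), (3, 2), (4, 2)]
print(sorted(max_split_counts(6).items()))  # [(3, 2), (4, 2), (5, 2)]
print(all(simplification_holds(n) for n in range(1, 1000)))  # True
```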

Solve the recurrence above by the substitution method. Guess the solution is $T(n) \le cn$. Assume $T(k) \le ck$ for all $k < n$, so that $E[T(n)] \le \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + \Theta(n)$. Prove $T(n) \le cn$ by induction. Go over the solution.
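One way to finish the induction step, following the standard textbook argument and assuming the Θ(n) term is at most an for some constant a > 0 (a is an assumption of this sketch):

```latex
\begin{aligned}
E[T(n)] &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} ck + an
         = \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2\rfloor - 1} k\right) + an \\
        &= \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(\lfloor n/2\rfloor - 1)\lfloor n/2\rfloor}{2}\right) + an \\
        &\le \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2}\right) + an \\
        &= \frac{2c}{n}\left(\frac{3n^2}{8} + \frac{n}{4} - 1\right) + an
         = \frac{3cn}{4} + \frac{c}{2} - \frac{2c}{n} + an \\
        &\le cn - \left(\frac{cn}{4} - \frac{c}{2} - an\right).
\end{aligned}
```

The third line uses ⌊n/2⌋ ≥ n/2 − 1 (valid for n ≥ 2). The bracket is nonnegative when n(c/4 − a) ≥ c/2; choosing c > 4a, this holds for all n ≥ 2c/(c − 4a), and smaller n are covered by the base case, so E[T(n)] ≤ cn.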

[Slides: Expectation of MOS (median order statistics), continued]

Summary
• Order selection (order statistics)
• Know the algorithm and its runtime
• Ugly analysis of the average case (but much easier than for Quicksort)

Order Selection
Works fast: linear expected time. Excellent algorithm in practice. But the worst case is very bad: Θ(n²).
• Q. Is there an algorithm that runs in linear time in the worst case?
• A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.

1. Select pivot: divide the n elements into groups of 5.

1 (cont.). Select pivot: Θ(n). Find the median of each 5-element group by rote. How?

By rote: Θ(1). Brute force, but the fastest known method. Let A = 7 5 3 1 6.
• Sort the first two pairs. [2 comparisons] 5 7 1 3
• Order the pairs w.r.t. their larger element. [1 comparison] 1 3 5 7 6
• Call the result [a, b, c, d, e]; we know a < b < d and c < d. Since three elements are less than d, d is at least the 4th smallest, so it can't be the median (i.e., the 3rd element of the sorted array). Discard d, leaving a b c e = 1 3 5 6.
• Pair the leftover c with e: WLOG say c < e (a < b is known already). [1 comparison] 1 3 5 6
• Order the pairs w.r.t. their larger element. [1 comparison]
• WLOG a < b < e and c < e, so e is the largest of the remaining four and can't be the median either. Compare c and b; the greater one is the median. [1 comparison] Here c = 5 > b = 3, so the median is 5.
• Hence, the total number of comparisons is 6.
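A Python sketch of the six-comparison rote method, assuming five distinct values. Every comparison goes through a local `less` helper (a hypothetical name) so the count can be checked, and the tuple reassignments mirror the WLOG relabeling steps above.

```python
from itertools import permutations


def median_of_5(values):
    """Return (median, comparisons_used) for five distinct values."""
    a, b, c, d, e = values
    count = 0

    def less(x, y):
        nonlocal count
        count += 1
        return x < y

    # Sort the first two pairs so that a < b and c < d (2 comparisons).
    if less(b, a):
        a, b = b, a
    if less(d, c):
        c, d = d, c
    # Order the pairs by their larger element so that b < d (1 comparison).
    if less(d, b):
        a, b, c, d = c, d, a, b
    # Now a < b < d and c < d: d exceeds three elements, so discard it.
    # Pair the leftover c with e so that c < e (1 comparison).
    if less(e, c):
        c, e = e, c
    # Order pairs (a, b) and (c, e) by their larger element so that b < e (1 comparison).
    if less(e, b):
        a, b, c, e = c, e, a, b
    # Now e exceeds a, b, c, so a, b, c are the three smallest of the five:
    # the median is the larger of b and c (1 comparison).
    median = c if less(b, c) else b
    return median, count


print(median_of_5([7, 5, 3, 1, 6]))  # (5, 6)
print(all(median_of_5(p) == (5, 6) for p in permutations([1, 3, 5, 6, 7])))  # True
```

Checking all 120 orderings confirms that the method always finds the median with exactly six comparisons.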

2. Select final pivot: T(n/5). Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot. Just recursively call the same algorithm on the group medians, which it again splits into blocks of 5. Ex.: n = 125 gives 125/5 = 25 medians; recursing on the 25 medians gives 25/5 = 5 medians; apply the base case.
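A compact Python sketch of the whole worst-case linear SELECT, with some simplifications. It builds new lists instead of partitioning in place, keeps the last (possibly short) group instead of ignoring it, uses a three-way split so that duplicate keys are handled, and sorts each group with `sorted` instead of the six-comparison rote method. The cutoff of 50 follows the n < 50 base case on these slides, and `select` is a hypothetical name.

```python
import random


def select(a, i):
    """Return the i-th smallest (1-indexed) element of list a using median-of-medians pivots."""
    if len(a) < 50:
        # Base case (n < 50): constant-size input, so sorting it costs Θ(1).
        return sorted(a)[i - 1]
    # Step 1: groups of 5 (the last may be shorter); take each group's (lower) median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 2: recursively SELECT the (lower) median x of the group medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: split around x; three-way so that copies of x are handled.
    lower = [v for v in a if v < x]
    equal_count = sum(1 for v in a if v == x)
    upper = [v for v in a if v > x]
    # Step 4: recurse only into the side that contains rank i.
    if i <= len(lower):
        return select(lower, i)
    if i <= len(lower) + equal_count:
        return x
    return select(upper, i - len(lower) - equal_count)


print(select(list(range(1000, 0, -1)), 250))  # 250
rng = random.Random(3)
data = [rng.randint(0, 99) for _ in range(500)]
print(select(data, 250) == sorted(data)[249])  # True
```

Because x is itself an element of the input, both `lower` and `upper` are strictly smaller than `a`, so the recursion always terminates.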

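Analysis of how A is split via partitioning: T(3n/4). At least half of the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians. Therefore, at least 3⌊n/10⌋ elements are ≤ x. Similarly, at least 3⌊n/10⌋ elements are ≥ x. Where does the 3 come from? Each group whose median is ≤ x contributes three elements ≤ x: the median itself and the two smaller elements of its group (symmetrically for ≥ x).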
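An empirical check of the 3⌊n/10⌋ claim, as a Python sketch. It assumes distinct keys and, as on the slide, uses only the ⌊n/5⌋ complete groups of 5; x is taken as the lower median of the group medians. `pivot_counts` is a hypothetical helper that sorts instead of recursing, since only the counts matter here.

```python
import random


def pivot_counts(a):
    """Return (x, #elements <= x, #elements >= x) for the median-of-medians pivot x."""
    full = len(a) // 5                       # only complete groups of 5, as on the slide
    medians = [sorted(a[5 * g:5 * g + 5])[2] for g in range(full)]
    x = sorted(medians)[(full - 1) // 2]     # lower median of the group medians
    return x, sum(v <= x for v in a), sum(v >= x for v in a)


rng = random.Random(0)
for n in (10, 57, 125, 1000):
    data = rng.sample(range(10 * n), n)      # n distinct keys
    x, le, ge = pivot_counts(data)
    print(n, 3 * (n // 10), le, ge)          # le and ge are both at least 3*floor(n/10)
```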

Analysis: For n ≥ 50, we have 3⌊n/10⌋ ≥ n/4. So at least n/4 elements are ≤ x and at least n/4 elements are ≥ x. But we recurse on only one side, and in the worst case it is the larger one, which therefore has at most 3n/4 elements.
• For n ≥ 50, the recursive call to SELECT is executed on ≤ 3n/4 elements.
• The recurrence for the running time can therefore assume the recursive call takes time T(3n/4) in the worst case.
• For n < 50, the worst-case time is T(n) = Θ(1).
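Why 50? Writing n = 10q + r with 0 ≤ r ≤ 9, the inequality 3q ≥ n/4 becomes 12q ≥ 10q + r, i.e. 2q ≥ r. For this to hold for every r up to 9 we need q ≥ 5, that is, n ≥ 50. A short Python sketch confirms the threshold using exact integer arithmetic (`quarter_bound_holds` is a hypothetical name):

```python
def quarter_bound_holds(n):
    """Check 3*floor(n/10) >= n/4 exactly, using integers: 12*floor(n/10) >= n."""
    return 12 * (n // 10) >= n


failures = [n for n in range(1, 100_000) if not quarter_bound_holds(n)]
print(max(failures))  # 49: the bound holds for every n >= 50, but not at n = 49
print(all(n - 3 * (n // 10) <= 3 * n / 4 for n in range(50, 100_000)))  # True
```

The second check restates the same fact from the recursion's point of view: after discarding at least 3⌊n/10⌋ elements, at most 3n/4 remain.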

Solve Recurrence by Substitution
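Putting the pieces together (Θ(n) for the group medians and the partition, T(n/5) for selecting the pivot, T(3n/4) for the recursive call), the worst-case recurrence for n ≥ 50 is T(n) ≤ T(n/5) + T(3n/4) + Θ(n). Here is a sketch of the substitution step, assuming T(k) ≤ ck for all k < n, writing the Θ(n) term as at most an for an assumed constant a, and ignoring floors as the slides do:

```latex
\begin{aligned}
T(n) &\le c\,\frac{n}{5} + c\,\frac{3n}{4} + an \\
     &= \frac{19}{20}\,cn + an \\
     &= cn - \left(\frac{1}{20}\,cn - an\right) \\
     &\le cn \quad \text{whenever } c \ge 20a .
\end{aligned}
```

The base cases n < 50 cost Θ(1) and are covered by taking c large enough, so T(n) = O(n). Since the algorithm reads every element, T(n) = Θ(n).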

Conclusion: Since the total size of the subproblems shrinks by a constant factor at each level of recursion (n/5 + 3n/4 = 19n/20 < n), the work per level forms a decreasing geometric series dominated by the linear work at the root. In practice, this algorithm runs slowly, because the constant in front of n is large. The randomized algorithm is far more practical.
