Order Statistics. Many of the slides are from Prof. Plaisted’s resources at University of North Carolina at Chapel Hill. Order Statistic. i th order statistic: i th smallest element of a set of n elements. Minimum: first order statistic. Maximum: n th order statistic.
Order Statistics Many of the slides are from Prof. Plaisted’s resources at University of North Carolina at Chapel Hill
Order Statistic
• ith order statistic: ith smallest element of a set of n elements.
• Minimum: first order statistic.
• Maximum: nth order statistic.
• Median: “half-way point” of the set.
  • Unique when n is odd: occurs at i = (n+1)/2.
  • Two medians when n is even.
    • Lower median, at i = n/2.
    • Upper median, at i = n/2+1.
• For consistency, “median” will refer to the lower median.
Selection Problem
• Selection problem:
  • Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
  • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
• Can be solved in O(n lg n) time. How? Sort A, then return the ith element of the sorted order.
• We will study faster linear-time algorithms.
  • For the special cases when i = 1 and i = n.
  • For the general problem.
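The O(n lg n) baseline mentioned above can be sketched in a few lines of Python. The function name `select_by_sorting` is hypothetical, not from the slides; it only illustrates the "sort, then index" idea.

```python
def select_by_sorting(a, i):
    """Return the i-th smallest element of a (i is 1-indexed) by sorting first."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    # The sort costs O(n lg n); the lookup afterwards is O(1).
    return sorted(a)[i - 1]
```

For example, `select_by_sorting([7, 2, 9, 4, 1], 3)` returns the median of the five values.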
Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
Maximum can be determined similarly.
• T(n) = Θ(n).
• No. of comparisons: n – 1.
• Can we do better? Why not? Every element other than the minimum must lose at least one comparison, so n – 1 comparisons are necessary.
• Minimum(A) has worst-case optimal # of comparisons.
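Problem
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
• Average for random input: How many times do we expect line 4 to be executed?
• X = RV for # of executions of line 4.
• X_i = Indicator RV for the event that line 4 is executed on the ith iteration.
• X = Σ_{i=2..n} X_i
• E[X_i] = 1/i. How? Line 4 runs on iteration i exactly when A[i] is the smallest of A[1..i], which has probability 1/i when the distinct elements arrive in random order.
• Hence, E[X] = Σ_{i=2..n} 1/i = H_n – 1 = ln n + O(1) = Θ(lg n).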
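The indicator-variable argument can be checked exactly for small n by brute force. The sketch below is illustrative only: the helper names are hypothetical, and it assumes distinct keys and a uniformly random input order, as in the analysis.

```python
from fractions import Fraction
from itertools import permutations


def count_min_updates(a):
    """Run Minimum on a and count how often line 4 (min <- A[i]) executes."""
    current_min = a[0]
    updates = 0
    for value in a[1:]:
        if current_min > value:
            current_min = value
            updates += 1
    return updates


def exact_expected_updates(n):
    """Average the update count over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_min_updates(p) for p in perms), len(perms))


def harmonic_minus_one(n):
    """Sum of 1/i for i = 2..n, the value predicted by the indicator argument."""
    return sum(Fraction(1, i) for i in range(2, n + 1))
```

Comparing `exact_expected_updates(n)` with `harmonic_minus_one(n)` for a few small n shows the two agree; enumeration is only practical for small n because it visits all n! orderings.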
Simultaneous Minimum and Maximum
• Some applications need to determine both the maximum and minimum of a set of elements.
  • Example: Graphics program trying to fit a set of points onto a rectangular display.
• Independent determination of maximum and minimum requires 2n – 2 comparisons.
• Can we reduce this number?
  • Yes.
Simultaneous Minimum and Maximum
• Maintain minimum and maximum elements seen so far.
• Process elements in pairs.
  • Compare the two elements of the pair with each other.
  • Compare the smaller to the current minimum and the larger to the current maximum.
  • Update current minimum and maximum based on the outcomes.
• No. of comparisons per pair = 3. How? One within the pair, one against the minimum, one against the maximum.
• No. of pairs ≤ ⌊n/2⌋.
  • For odd n: initialize min and max to A[1]. Pair the remaining elements. So, no. of pairs = ⌊n/2⌋.
  • For even n: initialize min to the smaller of the first pair and max to the larger. So, remaining no. of pairs = (n – 2)/2 < ⌊n/2⌋.
Simultaneous Minimum and Maximum
• Total no. of comparisons, C ≤ 3⌊n/2⌋.
  • For odd n: C = 3⌊n/2⌋.
  • For even n: C = 3(n – 2)/2 + 1 (for the initial comparison) = 3n/2 – 2 < 3⌊n/2⌋.
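A minimal Python sketch of the pairing strategy, instrumented to count comparisons so the totals above can be checked. The function name `min_max_pairs` and the returned comparison counter are assumptions made for illustration.

```python
def min_max_pairs(a):
    """Return (minimum, maximum, comparisons) using the process-in-pairs strategy."""
    n = len(a)
    if n == 0:
        raise ValueError("input must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element initializes both min and max, no comparison needed.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        # Three comparisons per pair: within the pair, against min, against max.
        comparisons += 3
        small, large = (a[j], a[j + 1]) if a[j] < a[j + 1] else (a[j + 1], a[j])
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

On 7 elements the counter reports 3⌊7/2⌋ = 9 comparisons, and on 8 elements it reports 3·8/2 – 2 = 10, compared with 2n – 2 for two independent scans.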
General Selection Problem
• Seems more difficult than Minimum or Maximum.
  • Yet, has solutions with the same asymptotic complexity as Minimum and Maximum.
• We will study 2 algorithms for the general problem.
  • One with expected linear-time complexity.
  • A second, whose worst-case complexity is linear.
Selection in Expected Linear Time
• Modeled after randomized quicksort.
• Exploits the abilities of Randomized-Partition (RP).
  • RP returns the index k in the sorted order of a randomly chosen element (pivot).
    • If the order statistic we are interested in, i, equals k, then we are done.
  • Else, reduce the problem size using its other ability.
    • RP rearranges the other elements around the random pivot.
    • If i < k, selection can be narrowed down to A[1..k – 1].
    • Else, select the (i – k)th element from A[k+1..n].
(Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)
Randomized Quicksort: review
Rnd-Partition(A, p, r)
    i := Random(p, r);
    A[r] ↔ A[i];
    x, i := A[r], p – 1;
    for j := p to r – 1 do
        if A[j] ≤ x then
            i := i + 1;
            A[i] ↔ A[j]
        fi
    od;
    A[i + 1] ↔ A[r];
    return i + 1
Quicksort(A, p, r)
    if p < r then
        q := Rnd-Partition(A, p, r);
        Quicksort(A, p, q – 1);
        Quicksort(A, q + 1, r)
    fi
[Figure: Partition rearranges A[p..r] around the pivot (5 in the example) into A[p..q – 1], the pivot at position q, and A[q+1..r].]
Randomized-Select
Randomized-Select(A, p, r, i) // select ith order statistic.
    if p = r
        then return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q – p + 1
    if i = k
        then return A[q]
    elseif i < k
        then return Randomized-Select(A, p, q – 1, i)
    else return Randomized-Select(A, q+1, r, i – k)
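The pseudocode translates fairly directly into Python. The sketch below is one possible rendering, not the slides' own code: it assumes 0-based inclusive indices p and r (the slides use 1-based arrays), keeps i as a 1-indexed rank, and rearranges the list in place.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's index."""
    pivot_index = random.randint(p, r)  # randint includes both endpoints
    a[r], a[pivot_index] = a[pivot_index], a[r]
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-indexed) element of a[p..r]; rearranges a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1  # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

A call such as `randomized_select(list(data), 0, len(data) - 1, i)` works on a copy so the caller's list keeps its order.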
Analysis
• Worst-case Complexity:
  • Θ(n²): we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray.
• Average-case Complexity:
  • Θ(n). Intuition: because the pivot is chosen at random, we expect to get rid of half of the list each time we choose a random pivot q.
  • Why Θ(n) and not Θ(n lg n)? Only one side of each partition is processed further, so the expected work forms a geometric series n + n/2 + n/4 + … ≤ 2n.
Average-case Analysis
• Define indicator RVs X_k, for 1 ≤ k ≤ n.
  • X_k = I{subarray A[p…q] has exactly k elements}.
  • Pr{subarray A[p…q] has exactly k elements} = 1/n for all k = 1..n.
  • Hence, E[X_k] = 1/n. (9.1)
• Let T(n) be the RV for the time required by Randomized-Select (RS) on A[p…r] of n elements.
• Determine an upper bound on E[T(n)].
Average-case Analysis
• A call to RS may
  • Terminate immediately with the correct answer,
  • Recurse on A[p..q – 1], or
  • Recurse on A[q+1..r].
• To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
• RP takes O(n) time on a problem of size n.
• Hence, recurrence for T(n) is:
\[
T(n) \le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr)
     = \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n)
\]
• For a given call of RS, X_k = 1 for exactly one value of k, and X_k = 0 for all other k.
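Average-case Analysis
\[
\begin{aligned}
E[T(n)] &\le E\Bigl[ \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \Bigr] \\
        &= \sum_{k=1}^{n} E\bigl[ X_k \cdot T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by linearity of expectation)} \\
        &= \sum_{k=1}^{n} E[X_k] \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by Eq. (C.23))} \\
        &= \sum_{k=1}^{n} \frac{1}{n} \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by Eq. (9.1))}
\end{aligned}
\]
• Eq. (C.23) applies because X_k and T(max(k – 1, n – k)) are independent random variables.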
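Average-case Analysis (Contd.)
The summation is expanded using
\[
\max(k-1,\, n-k) =
\begin{cases}
k-1 & \text{if } k > \lceil n/2 \rceil, \\
n-k & \text{if } k \le \lceil n/2 \rceil.
\end{cases}
\]
• If n is odd, T(n – 1) thru T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
• If n is even, T(n – 1) thru T(n/2) occur twice.
Hence,
\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n)
\]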
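Average-case Analysis (Contd.)
• We solve the recurrence by substitution.
• Guess E[T(n)] = O(n), i.e., E[T(n)] ≤ cn for some constant c. Let an bound the O(n) term.
\[
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
        &\le \frac{3cn}{4} + \frac{c}{2} + an \\
        &= cn - \Bigl( \frac{cn}{4} - \frac{c}{2} - an \Bigr)
\end{aligned}
\]
• This is at most cn when cn/4 – c/2 – an ≥ 0, i.e., when n ≥ 2c/(c – 4a), which requires c > 4a.
Thus, if we assume T(n) = O(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n).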
Selection in Worst-Case Linear Time
• Algorithm Select:
  • Like Randomized-Select, finds the desired element by recursively partitioning the input array.
  • Unlike Randomized-Select, is deterministic.
    • Uses a variant of the deterministic Partition routine.
    • Partition is told which element to use as the pivot.
  • Achieves linear-time complexity in the worst case by
    • Guaranteeing that the split is always “good” at each Partition.
• How can a good split be guaranteed?
Guaranteeing a Good Split
• We will have a good split if we can ensure that the pivot is the median element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
Choosing a Pivot
• Median-of-Medians:
  • Divide the n elements into ⌈n/5⌉ groups.
    • ⌊n/5⌋ groups contain 5 elements each. 1 group contains n mod 5 < 5 elements.
  • Determine the median of each of the groups.
    • Sort each group using Insertion Sort. Pick the median from the sorted list of group elements.
  • Recursively find the median x of the ⌈n/5⌉ medians.
• Recurrence for running time (of median-of-medians):
  • T(n) = O(n) + T(⌈n/5⌉) + ….
Algorithm Select
• Determine the median-of-medians x (using the procedure on the previous slide).
• Partition the input array around x using the variant of Partition.
• Let k be the index of x that Partition returns.
• If k = i, then return x.
• Else if i < k, then apply Select recursively to A[1..k–1] to find the ith smallest element.
• Else if i > k, then apply Select recursively to A[k+1..n] to find the (i – k)th smallest element.
(Assumption: Select operates on A[1..n]. For subarrays A[p..r], suitably change k.)
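The steps of Select can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the in-place algorithm from the slides: it assumes distinct elements, replaces the in-place Partition variant with two list filters (which costs extra memory), and uses Python's built-in `sorted` in place of Insertion Sort on each group of at most 5 elements. The function name `select` is hypothetical.

```python
def select(a, i):
    """Return the i-th smallest (1-indexed) element of a list of distinct numbers."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Split into groups of 5; the last group may hold fewer elements.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    # Lower median of each group (constant work per group).
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively find the lower median of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Stand-in for Partition around x: filter instead of rearranging in place.
    smaller = [v for v in a if v < x]
    larger = [v for v in a if v > x]
    k = len(smaller) + 1  # rank of x in a
    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)
```

With duplicates in the input, the two filters would drop the repeated copies of x, so this sketch relies on the distinctness assumption made throughout the slides.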
Worst-case Split
[Figure: the n elements arranged as ⌊n/5⌋ groups of 5 elements each, plus a ⌈n/5⌉th group of n mod 5 elements. Arrows point from larger to smaller elements. Labels: median-of-medians x, elements < x, elements > x.]
Worst-case Split
• Assumption: Elements are distinct.
• At least half of the ⌈n/5⌉ medians are greater than or equal to x.
• Thus, at least half of the ⌈n/5⌉ groups contribute at least 3 elements that are greater than x.
  • The last group and the group containing x may contribute fewer than 3 elements. Exclude these groups.
• Hence, the no. of elements > x is at least
\[
3 \Bigl( \Bigl\lceil \frac{1}{2} \Bigl\lceil \frac{n}{5} \Bigr\rceil \Bigr\rceil - 2 \Bigr) \ge \frac{3n}{10} - 6
\]
• Analogously, the no. of elements < x is at least 3n/10 – 6.
• Thus, in the worst case, Select is called recursively on at most n – (3n/10 – 6) = 7n/10 + 6 elements.
Recurrence for worst-case running time
• T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select)
• T(n) ≤ O(n) + T(⌈n/5⌉) + O(n) + T(7n/10+6) = T(⌈n/5⌉) + T(7n/10+6) + O(n)
  • T(Median-of-medians): O(n) + T(⌈n/5⌉)
  • T(Partition): O(n)
  • T(recursive call): T(7n/10+6)
• Assume T(n) = Θ(1), for n ≤ 140.
Solving the recurrence
• To show: T(n) = O(n), i.e., T(n) ≤ cn for suitable c and all n > 0.
• Assume: T(n) ≤ cn for suitable c and all n ≤ 140.
• Substituting the inductive hypothesis into the recurrence (with an bounding the O(n) term),
\[
\begin{aligned}
T(n) &\le c \lceil n/5 \rceil + c(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn + (-cn/10 + 7c + an) \\
     &\le cn, \quad \text{if } -cn/10 + 7c + an \le 0.
\end{aligned}
\]
• –cn/10 + 7c + an ≤ 0 ⟺ c ≥ 10a(n/(n – 70)), when n > 70.
• For n ≥ 140, n/(n – 70) ≤ 2, so c ≥ 20a suffices.
• n/(n – 70) is a decreasing function of n. Verify.
• Hence, c can be chosen for any n = n0 > 70, provided it can be assumed that T(n) = O(1) for n ≤ n0.
• Thus, Select has linear-time complexity in the worst case.
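The choice c = 20a can be sanity-checked numerically by evaluating the recurrence directly. The sketch below is only a check under assumptions that are not in the slides: it fixes a = 1, takes the floor of 7n/10 + 6 so the argument is an integer, and uses a·n as a stand-in for the constant-time base case when n ≤ 140.

```python
from functools import lru_cache

A = 1  # hypothetical constant a hidden in the O(n) term


@lru_cache(maxsize=None)
def t(n):
    """Evaluate T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + a*n with a small base case."""
    if n <= 140:
        return A * n  # assumed base-case cost; any value <= 20*a*n would do
    ceil_fifth = -(-n // 5)  # integer ceiling of n/5
    return t(ceil_fifth) + t(7 * n // 10 + 6) + A * n


def bound_holds(limit):
    """Check T(n) <= 20*a*n for every n from 1 to limit."""
    return all(t(n) <= 20 * A * n for n in range(1, limit + 1))
```

Calling `bound_holds(5000)` confirms the inequality over that range; the induction above is what establishes it for all n.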