Order Statistics Comp 550, Spring 2015

Order Statistic
• ith order statistic: ith smallest element of a set of n elements.
• Minimum: first order statistic.
• Maximum: nth order statistic.
• Median: “half-way point” of the set.
  • Unique when n is odd: occurs at i = (n+1)/2.
  • Two medians when n is even.
    • Lower median, at i = n/2.
    • Upper median, at i = n/2+1.
• For consistency, “median” will refer to the lower median.

Selection Problem
• Selection problem:
  • Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
  • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
• Can be solved in O(n lg n) time. How?
• We will study linear-time algorithms.
  • For the special cases when i = 1 and i = n.
  • For the general problem.
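
One way to answer the O(n lg n) question is to sort and then index. The sketch below assumes a Python list of distinct numbers and a 1-indexed i; the function name `select_by_sorting` is made up for illustration.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) of a list of distinct numbers."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    # Sorting costs O(n lg n); the lookup afterwards is O(1).
    return sorted(values)[i - 1]
```

For example, `select_by_sorting([7, 2, 9, 4], 3)` picks the element that is larger than exactly two others.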

Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
Maximum can be determined similarly.
• T(n) = Θ(n).
• No. of comparisons: n – 1.
• Can we do better? Why not?
• Minimum(A) has worst-case optimal # of comparisons.

Selection Problem
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min
• Average for random input: How many times do we expect line 4 to be executed?
• X = RV for # of executions of line 4.
• X_i = Indicator RV for the event that line 4 is executed on the ith iteration.
• X = Σ_{i=2..n} X_i
• E[X_i] = 1/i. How? (Line 4 runs on iteration i exactly when A[i] is the smallest of A[1..i].)
• Hence, E[X] = Σ_{i=2..n} 1/i = H_n – 1 ≤ ln n = Θ(lg n).
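
The expectation above can be checked exactly for small n by averaging over every ordering of n distinct numbers. This is only a sketch; the helper names are made up for illustration, and the enumeration is feasible only for small n because it visits all n! orderings.

```python
from fractions import Fraction
from itertools import permutations


def count_min_updates(a):
    """Count how often the running minimum is replaced (line 4 of Minimum)."""
    current_min = a[0]
    updates = 0
    for value in a[1:]:
        if current_min > value:
            current_min = value
            updates += 1
    return updates


def expected_updates(n):
    """Exact average of count_min_updates over all n! orderings."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += count_min_updates(perm)
        count += 1
    return Fraction(total, count)


def harmonic_minus_one(n):
    """H_n - 1 = 1/2 + 1/3 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(2, n + 1))
```

Comparing `expected_updates(n)` with `harmonic_minus_one(n)` for a few small n confirms the indicator-variable argument.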

Simultaneous Minimum and Maximum
• Bounding box: compute minimum and maximum simultaneously.
• Separately, each requires n – 1 comparisons.
• Together, can we use fewer than 2n – 2 comparisons?
• Yes: process elements in pairs, using about 3n/2 comparisons.
  • For each pair, find the smaller and the larger with one comparison.
  • Compare the smaller to min and the larger to max.
  • Cost: 3 comparisons for every 2 elements.
• Key idea: structure your data
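
A minimal sketch of the pairing idea, assuming a non-empty Python list. The function name `min_max` and the returned comparison counter are additions for illustration, so the 3-per-pair cost can be observed directly.

```python
def min_max(values):
    """Return (minimum, maximum, comparisons), processing elements in pairs."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element seeds both min and max for free.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 1  # order the pair
        small, large = (a, b) if a < b else (b, a)
        comparisons += 1  # smaller element against the current min
        if small < lo:
            lo = small
        comparisons += 1  # larger element against the current max
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

With this seeding the count never exceeds 3 comparisons per pair of elements, compared with 2n – 2 for two separate scans.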

Selection Problem
• Selection problem:
  • Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
  • Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
• Seems harder than min or max, but has the same asymptotic running time.
  • QuickMedian: expected linear time (randomized)
  • BFPRT73: linear time (deterministic)
• Key idea: geometric series sum to linear

Randomized Quicksort: review
Rnd-Partition(A, p, r)
    i := Random(p, r); A[r] ↔ A[i];
    x, i := A[r], p – 1;
    for j := p to r – 1 do
        if A[j] ≤ x then
            i := i + 1; A[i] ↔ A[j]
        fi
    od;
    A[i + 1] ↔ A[r];
    return i + 1
Quicksort(A, p, r)
    if p < r then
        q := Rnd-Partition(A, p, r);
        Quicksort(A, p, q – 1);
        Quicksort(A, q + 1, r)
    fi
[Figure: Partition rearranges A[p..r] around the pivot (5 in the example) into A[p..q – 1], the pivot, and A[q+1..r].]
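
The review pseudocode translates fairly directly to Python. The sketch below assumes 0-indexed lists with inclusive bounds p and r; the default arguments on `quicksort` are a convenience that is not part of the slide.

```python
import random


def rnd_partition(a, p, r):
    """Partition a[p..r] (inclusive) around a random pivot; return the pivot's final index."""
    i = random.randint(p, r)      # random pivot position, both ends inclusive
    a[r], a[i] = a[i], a[r]       # move the pivot to the end
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = rnd_partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
```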

Selection in Expected Linear Time
• Key idea: Quicksort, but recur only on the list containing i.
• Exploit two abilities of Randomized-Partition (RP).
  • RP returns the index k in the sorted order of a randomly chosen element (pivot).
    • If the order statistic i = k, then we are done.
    • Else reduce the problem size using its other ability.
  • RP rearranges all other elements around the pivot.
    • If i < k, selection can be narrowed down to A[1..k – 1].
    • Else, select the (i – k)th element from A[k+1..n].

Randomized-Select
Randomized-Select(A, p, r, i) // select ith order statistic.
    if p = r
        then return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q – p + 1
    if i = k
        then return A[q]
    elseif i < k
        then return Randomized-Select(A, p, q – 1, i)
    else return Randomized-Select(A, q+1, r, i – k)
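
A runnable sketch of Randomized-Select, assuming 0-indexed Python lists with inclusive bounds p and r and a 1-indexed rank i relative to a[p..r]. The partition helper is repeated here so the block stands alone.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    i = random.randint(p, r)
    a[r], a[i] = a[i], a[r]
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest element of a[p..r]; rearranges a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                 # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

Calling `randomized_select(list(data), 0, len(data) - 1, 3)` finds the 3rd smallest element of `data` without modifying the original list.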

Randomized-Select Example (M. C. Lin)
• Goal: Find 3rd smallest element
[Figure: step-by-step array example; not recovered.]

Analysis
• Worst-case complexity:
  • Θ(n²): we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray (T(n) = T(n – 1) + Θ(n)).
• Average-case complexity:
  • Θ(n). Intuition: because the pivot is chosen at random, we expect to get rid of half of the list each time we choose a random pivot q.
• Why Θ(n) and not Θ(n lg n)?

Analysis
• Average-case complexity: more intuition
  • If we get rid of 10% of the list each time we choose a random pivot q...
  • T(n) = T(9n/10) + Θ(n)
  • T(n) = Θ(n)
• Let a ≥ 1 and b > 1, T(n) = a T(n/b) + c n^k, n ≥ 0
  • 1. If a > b^k, then T(n) = Θ(n^(log_b a)).
  • 2. If a = b^k, then T(n) = Θ(n^k lg n).
  • 3. If a < b^k, then T(n) = Θ(n^k).
• Here a = 1, b = 10/9 and k = 1, so a < b^k and case 3 gives T(n) = Θ(n).
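
The same conclusion follows from the geometric-series view: the work per level is n, 0.9n, 0.81n, ..., which sums to at most 10n. A small numeric sketch, assuming the work at each level is exactly the current size and the recursion stops once the size drops below 1 (the function name and the `keep` parameter are made up for illustration):

```python
def unrolled_cost(n, keep=0.9):
    """Sum n + keep*n + keep^2*n + ... until the remaining size drops below 1."""
    total = 0.0
    size = float(n)
    while size >= 1:
        total += size
        size *= keep
    return total
```

For any n ≥ 1 the result lies between n and n / (1 – keep), which is 10n for keep = 0.9: linear, with no lg n factor.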

Average-case Analysis
• A call to RS may
  • Terminate immediately with the correct answer,
  • Recurse on A[p..q – 1], or
  • Recurse on A[q+1..r].
• To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
• RP takes O(n) time on a problem of size n.
• Hence, recurrence for T(n) is:
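
The recurrence itself did not survive in this transcript. A hedged reconstruction follows, assuming the standard textbook setup: the pivot is equally likely to end up at any rank k from 1 to n, the recursion always continues in the larger side, and T(n) denotes the expected running time. The symbol k for the pivot's rank is an assumption, not taken from the slide.

```latex
T(n) \le \frac{1}{n} \sum_{k=1}^{n} T\bigl(\max(k-1,\; n-k)\bigr) + O(n)
```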

Solving the recurrence
[Slide equations not recovered.]

Average-case Analysis (Contd.)
The summation is expanded:
• If n is odd, T(n – 1) through T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
• If n is even, T(n – 1) through T(n/2) occur twice.

Average-case Analysis (Contd.)
• We solve the recurrence by substitution.
• Guess T(n) = O(n), i.e., T(n) ≤ cn for some constant c.
• Thus, if we assume T(n) = O(1) for n < 2c/(c – 4a), with c > 4a, we have E[T(n)] = O(n).
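
The substitution steps are missing from this transcript. The following is a sketch of how the guess can be verified, not the slide's own derivation. It assumes the inductive hypothesis T(k) ≤ ck for k < n, an O(n) term bounded by an, and a sum over the larger half that starts at k = ⌊n/2⌋; c and a are the constants appearing in the threshold 2c/(c – 4a).

```latex
\begin{align*}
T(n) &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
     &= \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
     &\le \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
     &= c \left( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \right) + an \\
     &\le cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right)
\end{align*}
```

The bracketed term is non-negative when c > 4a and n ≥ 2c/(c – 4a), which is where that threshold comes from.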

Selection in Worst-Case Linear Time
• Algorithm Select:
  • Like RandomizedSelect, finds the desired element by recursively partitioning the input array.
  • Unlike RandomizedSelect, is deterministic.
    • Uses a variant of the deterministic Partition routine.
    • Partition is told which element to use as the pivot.
  • Achieves linear-time complexity in the worst case by
    • Guaranteeing that the split is always “good” at each Partition.
• How can a good split be guaranteed?

Choosing a Pivot
• Median-of-Medians:
  • Divide the n elements into ⌈n/5⌉ groups.
    • ⌊n/5⌋ groups contain 5 elements each.
    • 1 group may contain n mod 5 < 5 elements.
  • Determine the median of each of the groups.
  • Recursively find the median x of the ⌈n/5⌉ medians.
[Figure: ⌊n/5⌋ groups of 5 elements each; the ⌈n/5⌉th group holds the remaining n mod 5 elements.]
Example Z. Guo

Algorithm Select
• Determine the median-of-medians x (on previous slide).
• Partition the input array around x (Partition from Quicksort).
• Let k be the index of x that Partition returns.
• If k = i, then return x.
• Else if i < k, apply Select recursively to A[1..k–1] to find the ith smallest element.
• Else if i > k, apply Select recursively to A[k+1..n] to find the (i–k)th smallest element.
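
The steps above can be sketched in Python as follows. This is an illustrative version under stated assumptions rather than the slides' exact routine: the elements are distinct, i is 1-indexed, inputs of at most 5 elements are sorted directly, and the partition step builds two new lists with comprehensions instead of calling the in-place Partition from Quicksort.

```python
def select(values, i):
    """Return the i-th smallest (1-indexed) of distinct values via median-of-medians."""
    a = list(values)
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Lower median of each group of at most 5 elements.
    medians = []
    for start in range(0, len(a), 5):
        group = sorted(a[start:start + 5])
        medians.append(group[(len(group) - 1) // 2])
    # Recursively find the lower median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Partition around x; k is the rank of x in a.
    smaller = [v for v in a if v < x]
    larger = [v for v in a if v > x]
    k = len(smaller) + 1
    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)
```

Copying into new lists keeps the sketch short; an in-place partition avoids the extra memory without changing the O(n) bound.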

Worst-case Split
[Figure: ⌊n/5⌋ groups of 5 elements each, plus a ⌈n/5⌉th group of n mod 5 elements, arranged around the median-of-medians x. Arrows point from larger to smaller elements; regions mark the elements < x and the elements > x.]

Worst-case Split
• Assumption: Elements are distinct. Why?
• At least ⌊n/5⌋/2 groups have 3 of their 5 elements ≥ x.
  • Ignore the last group if it has fewer than 5 elements.
  • Hence, since ⌊n/5⌋ ≥ (n–4)/5, the no. of elements ≥ x is at least 3(n–4)/10.
• Likewise, the no. of elements ≤ x is at least 3(n–4)/10.
• Thus, in the worst case, Select is called recursively on at most n – 3(n–4)/10 = (7n+12)/10 elements.

Recurrence for worst-case running time
• T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select)
• T(n) ≤ O(n) + T(n/5) + O(n) + T((7n+12)/10) = T(n/5) + T(7n/10 + 1.2) + O(n)
  • T(Median-of-medians) = O(n) + T(n/5)
  • T(Partition) = O(n)
  • T(recursive call) = T((7n+12)/10)
• Base: for n ≤ 24, assume we just use Insertionsort.
  • So T(n) ≤ 24n for all n ≤ 24.

Solving the recurrence
• Base: for all n ≤ 24, T(n) ≤ 24n.
• For n > 24, T(n) ≤ an + T(n/5) + T(7n/10 + 1.2).
• We want to find c > 0 so that T(n) ≤ cn for all n > 0.
• Base implies c ≥ 24.
• Inductive step, for n > 24:
  T(n) ≤ an + T(n/5) + T(7n/10 + 1.2)
       ≤ an + c n/5 + c 7n/10 + 1.2c    (by the inductive hypothesis)
       = cn – (c n/10 – an – 1.2c)
       = cn – ((c/20 – a)n + (n/20 – 1.2)c)
       ≤ cn, as long as c ≥ 20a (the second term is non-negative because n ≥ 24).
• So, c = max(24, 20a) works.

Conclusions
• We can find the ith smallest element of an unordered list in Θ(n) worst-case time.
• Lets us do Quicksort in worst-case Θ(n lg n) time, by pivoting on the median.
• That constant, 20× partition cost, was high; use RandomizedSelect (aka QuickMedian) in practice.