180 likes | 345 Views
Linear Time Selection. These lecture slides are adapted from CLRS 10.3. Where to Build a Road?. Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways?. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 12. 11.
E N D
Linear Time Selection These lecture slides are adaptedfrom CLRS 10.3.
Where to Build a Road? • Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? 1 2 3 4 5 6 7 8 9 10 12 11
Where to Build a Road? • Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? • n1 = nodes on top. • n2 = nodes on bottom. 1 Decreases total cost by (n2-n1) 2 3 4 5 6 7 8 9 10 12 11
Where to Build a Road? • Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? • n1 = nodes on top. • n2 = nodes on bottom. 1 2 3 4 5 6 7 8 9 10 12 11
Where to Build a Road? • Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? • n1 = nodes on top. • n2 = nodes on bottom. 1 2 3 4 5 6 7 8 9 10 12 11
Where to Build a Road? • Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? • Solution: put street at median of y coordinates. 1 2 3 4 5 6 7 8 9 10 12 11
Order Statistics • Given N linearly ordered elements, find ith smallest element. • Min: i = 1 • Max: i = N • Median: i = (N+1) / 2 and (N+1) / 2 • O(N) for min or max. • O(N log N) comparisons by sorting. • O(N log i) with heaps. • Can we do in worst-case O(N) comparisons? • Surprisingly, yes. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) • Assumption to make presentation cleaner. • All items have distinct values.
Select x Partition(N, a1, a2, ..., aN) k rank(x) if (i == k) return x else if (i < k) b[] all items of a[] less than x return Select(ith, k-1, b1, b2, ..., bk-1) else if (i > k) c[] all items of a[] greater than x return Select((i-k)th, N-k, c1, c2, ..., cN-k) Select (ith, N, a1, a2, . . . , aN) • Similar to quicksort, but throw away useless "half" at each iteration. • Select ith smallest element from a1, a2, . . . , aN. x = partition element Want to choose x so thatx is (roughly) the ith largest.
Partition 29 10 38 37 02 55 18 24 34 35 36 22 44 52 11 53 12 13 43 20 04 27 28 23 06 26 40 19 01 46 31 49 08 14 09 05 03 54 30 48 47 32 51 21 45 39 50 15 25 16 41 17 33 07 • Partition(). • Divide N elements into N/5 groups of 5 elements each, plus extra. N = 54
Partition 14 29 09 10 38 05 37 03 02 02 55 12 01 18 17 24 34 20 35 04 36 22 22 10 44 52 06 11 11 53 25 12 16 13 13 43 24 31 20 04 07 27 28 28 23 23 38 06 26 15 40 40 19 19 18 01 46 43 32 31 35 49 08 29 14 39 09 50 05 03 26 53 54 30 30 48 41 46 47 33 32 49 51 21 45 45 44 39 52 50 15 37 25 54 16 53 41 48 47 17 34 33 51 07 • Partition(). • Divide N elements into N/5 groups of 5 elements each, plus extra. • Brute force sort each of the 5-element groups.
Partition 28 • Partition(). • Divide N elements into N/5 groups of 5 elements each, plus extra. • Brute force sort each of the 5-element groups. • Find x = "median of medians" by Select() on N/5 medians. 14 09 05 03 02 12 01 17 20 04 36 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51
Select if (N is small) use mergesort Divide a[] into groups of 5, and let m1, m2, ..., mN/5 be list of medians. x Select(N/10, m1, m2, ..., mN/5) k rank(x) if (i == k) // Case 1 return x else if (i < k) // Case 2 b[] all items of a[] less than x return Select(ith, k-1, b1, b2, ..., bk-1) else if (i > k) // Case 3 c[] all items of a[] greater than x return Select((i-k)th, N-k, c1, c2, ..., cN-k) Select (ith, N, a1, a2, . . . , aN) median of medians
Selection Analysis • Crux of proof: delete roughly 30% of elements by partitioning. • At least 1/2 of 5 element medians x • at least N / 5 / 2 = N / 10 medians x 14 09 05 03 02 12 01 17 20 04 36 median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51
Selection Analysis • Crux of proof: delete roughly 30% of elements by partitioning. • At least 1/2 of 5 element medians x • at least N / 5 / 2 = N / 10 medians x • At least 3 N / 10 elements x. 14 09 05 03 02 12 01 17 20 04 36 median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51
Selection Analysis • Crux of proof: delete roughly 30% of elements by partitioning. • At least 1/2 of 5 element medians x • at least N / 5 / 2 = N / 10 medians x • At least 3 N / 10 elements x. • At least 3 N / 10 elements x. 14 09 05 03 02 12 01 17 20 04 36 median of medians 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51
Selection Analysis • Crux of proof: delete roughly 30% of elements by partitioning. • At least 1/2 of 5 element medians x • at least N / 5 / 2 = N / 10 medians x • At least 3 N / 10 elements x. • At least 3 N / 10 elements x. Select() called recursively (Case 2 or 3) with at most N - 3 N / 10 elements. • C(N) = # comparisons on a file of size N. • Now, solve recurrence. • Apply master theorem? • Assume N is a power of 2? • Assume C(N) is monotone non-decreasing?
Selection Analysis • Analysis of selection recurrence. • T(N) = # comparisons on a file of size N. • T(N) is monotone, but C(N) is not! • Claim: T(N) 20cN. • Base case: N < 50. • Inductive hypothesis: assume true for 1, 2, . . . , N-1. • Induction step: for N 50, we have: For n 50,3 N / 10 N / 4.
Linear Time Selection Postmortem • Practical considerations. • Constant (currently) too large to be useful. • Practical variant: choose random partition element. • O(N) expected running time ala quicksort. • Open problem: guaranteed O(N) with better constant. • Quicksort. • Worst case O(N log N) if always partition on median. • Justifies practical variants: median-of-3, median-of-5.