Bucket & Radix Sorts
Efficient Sorts To Date • QuickSort: O(n log n) to O(n^2) • MergeSort: O(n log n) • TreeSort: O(n log n) • HeapSort: O(n log n) • Coincidence?
Comparisons • N! possible orderings of N items • Represent as decision tree • Decision tree to reach N! states has depth log(N!)
Comparisons
min height ≥ log(N!)
= log(1 * 2 * 3 * … * N)
= log(1) + log(2) + … + log(N)
≥ log(N/2) + … + log(N)   (just drop the first half)
≥ (N/2) log(N/2)   (all N/2 remaining logs are ≥ log(N/2))
= (N/2)(log N – log 2) = (N/2)(log N – 1)
= (N log N – N)/2
= Ω(N log N)   (Omega: asymptotic lower bound)
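The bound above can be checked numerically. The sketch below is only an illustration: the function name `min_comparisons` is made up here, and logs are assumed to be base 2. It prints, for a few values of N, the (N/2) log(N/2) lower bound, the decision-tree depth ceil(log(N!)), and N log N.

```python
import math

def min_comparisons(n: int) -> int:
    """Depth of the shallowest binary decision tree with n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 16, 64, 256):
    lower = (n / 2) * math.log2(n / 2)   # bound from dropping the first half
    upper = n * math.log2(n)             # what the comparison sorts achieve
    print(n, round(lower), min_comparisons(n), round(upper))
```

In every row the middle number should sit between the other two, which is what the derivation claims.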
Efficient Sorts To Date (all comparison based) • QuickSort: O(n log n) to O(n^2) • MergeSort: O(n log n) • TreeSort: O(n log n) • HeapSort: O(n log n) • Coincidence? • Comparison sorts can never beat n log n
Sorting with Buckets • Ever used a sorting stick?
Bucket Sort
Bucket Sort:
  For each item                            O(n), n = num items
    Pick correct bucket and insert item    O(1)
  Make new List                            O(1)
  For each bucket                          O(k), k = num buckets
    Add contents to List                   O(1)
  Return List                              O(1)
Each bucket is a linked list, so inserting an item and appending a whole bucket are both O(1)
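A minimal Python sketch of the pseudocode above. The names `bucket_sort` and `bucket_of` are illustrative, not from the slides, and the caller is assumed to supply a bucket function that never puts a smaller item in a later bucket. One difference from the slide: `deque` stands in for the linked list, and copying a bucket into the result costs time proportional to its size rather than O(1), which still totals O(n + k).

```python
from collections import deque

def bucket_sort(items, num_buckets, bucket_of):
    """Distribute items into buckets, then concatenate the buckets in order.

    bucket_of(item) must return an index in range(num_buckets).
    """
    buckets = [deque() for _ in range(num_buckets)]   # O(k) setup
    for item in items:                                # O(n) loop
        buckets[bucket_of(item)].append(item)         # O(1) insert
    result = []                                       # make new list
    for bucket in buckets:                            # k buckets, n items in total
        result.extend(bucket)
    return result

scores = [3, 0, 2, 3, 1, 0]
# One bucket per possible value, so the result is fully sorted.
print(bucket_sort(scores, 4, lambda x: x))
```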
Bucket Sort • Bucket Sort: O(n + k) • Sort granularity limited by buckets • Perfect sort: k = range of values • Sort 30,000 integers perfectly • Bucket: 30,000 + 4,000,000,000 (4 billion+ buckets to represent all ints) • vs. n log n: 30,000 log 30,000 ≈ 450,000
Bucket Sort • Efficient if k is small in relation to n • Sort 4 million people in OR by zip code • Bucket: n = 4,000,000, k = 1,000 (fewer than 1,000 zip codes in OR) • Time ≈ 4,001,000 • n log n: 4,000,000 * log(4,000,000) ≈ 4,000,000 * 22 = 88,000,000
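The two comparisons above are plain arithmetic and can be reproduced in a few lines. This is a rough operation count, not a benchmark. The helper names are invented for the example, logs are assumed to be base 2, and 2**32 is used as the "4 billion+" bucket count for 32-bit ints.

```python
import math

def bucket_cost(n, k):
    """Rough step count for bucket sort: n inserts plus k bucket visits."""
    return n + k

def comparison_cost(n):
    """Rough step count for an n log n comparison sort (log base 2)."""
    return n * math.log2(n)

# 30,000 arbitrary 32-bit ints: bucket sort loses badly.
print(bucket_cost(30_000, 2**32), round(comparison_cost(30_000)))

# 4,000,000 people, about 1,000 zip codes: bucket sort wins.
print(bucket_cost(4_000_000, 1_000), round(comparison_cost(4_000_000)))
```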
Bucket Sort • Perfect sort with limited buckets: • BucketSort • InsertionSort each bucket • Best Case: O(N + K) • Worst Case: O(N + N^2) = O(N^2), when one bucket gets everything
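One way to sketch the "BucketSort, then InsertionSort each bucket" idea in Python. The bucketing scheme here (equal-width ranges over non-negative ints) is an assumption made for the example; the slides do not say how items are assigned to buckets.

```python
def insertion_sort(values):
    """Sort a list in place; fast when the list is short."""
    for i in range(1, len(values)):
        current = values[i]
        j = i - 1
        while j >= 0 and values[j] > current:
            values[j + 1] = values[j]   # shift larger items right
            j -= 1
        values[j + 1] = current

def bucket_then_insertion_sort(values, num_buckets):
    """Coarse bucket pass over non-negative ints, then finish each bucket."""
    if not values:
        return []
    width = max(values) // num_buckets + 1    # values covered by each bucket
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        buckets[v // width].append(v)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)                # O(m^2) for a bucket of size m
        result.extend(bucket)
    return result

print(bucket_then_insertion_sort([29, 3, 17, 8, 25, 3, 0], 3))
```

If every value falls into the same range, one bucket holds all N items and the insertion sort dominates, which is the worst case on the slide.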
Sort • Sorting a really big pile alphabetically • Sort A-Z • Set aside each pile • Sort the A's by second letter • Then the B's, C's… • Then take the AA's • Sort by third letter…
Radix Sort • Radix : Base • Radix Sort • Sort digital data • Bucket sort based on each digit successively
Radix Sort • MSD: Most Significant Digit • MSD Radix Sort • Partition list based on first digit • Recursively sort each partition on the next digit
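A small sketch of MSD radix sort on strings, following the pile-of-papers idea: bucket by the first character, then recurse into each bucket on the next character. The function name is illustrative. For brevity the bucket keys are visited with Python's `sorted`, which is an assumption of this sketch; a stricter version would use a fixed array with one slot per possible character.

```python
def msd_radix_sort(words, pos=0):
    """Sort strings by the character at index pos, then recurse on pos + 1."""
    if len(words) <= 1:
        return words
    finished = []            # words with no character at pos come first
    buckets = {}
    for w in words:
        if len(w) <= pos:
            finished.append(w)
        else:
            buckets.setdefault(w[pos], []).append(w)
    result = finished
    for ch in sorted(buckets):                       # buckets in character order
        result.extend(msd_radix_sort(buckets[ch], pos + 1))
    return result

print(msd_radix_sort(["bob", "al", "alice", "bo", "carol", "al"]))
```

Note that a bucket holding a single word is returned immediately, without looking at its remaining characters.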
MSD Advantages • May not need to examine every digit of every key • Works on variable-length keys • Little extra space
How does it work? • Radix Exchange • MSD radix sort • Partition like QuickSort, but swap on misplaced digits • Unstable • Visualization: http://www.cse.hut.fi/en/research/SVG/TRAKLA2/exercises/TrueRecursiveRadixExchangeSort-25.html
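A hedged sketch of radix exchange in Python, assuming non-negative ints and base 2, so each "digit" is a single bit. The partition step scans from both ends like QuickSort, but the test is "is this bit a 0 or a 1?" instead of a comparison against a pivot. The function name and parameters are illustrative.

```python
def radix_exchange_sort(a, lo=0, hi=None, bit=None):
    """In-place binary MSD radix sort for non-negative ints (unstable)."""
    if hi is None:
        hi = len(a) - 1
    if bit is None:
        bit = max(a, default=0).bit_length() - 1   # most significant bit in use
    if lo >= hi or bit < 0:
        return
    i, j = lo, hi
    while i <= j:
        if (a[i] >> bit) & 1 == 0:
            i += 1                     # 0-bit already on the left side
        elif (a[j] >> bit) & 1 == 1:
            j -= 1                     # 1-bit already on the right side
        else:
            a[i], a[j] = a[j], a[i]    # both misplaced: swap them
            i += 1
            j -= 1
    radix_exchange_sort(a, lo, j, bit - 1)   # items with 0 in this bit
    radix_exchange_sort(a, i, hi, bit - 1)   # items with 1 in this bit

data = [5, 3, 9, 0, 14, 3, 7, 1]
radix_exchange_sort(data)
print(data)
```

The swaps move equal keys past each other, which is why the slide lists this variant as unstable.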
LSD Radix • LSD – Least Significant Digit • Work from smallest digit to largest • Hard with variable lengths • Stable!
How does it work? • Iterative Radix Sort • Buckets used as counters • Goal is to find the starting/ending point of each value • Check each item and add one to the appropriate bucket • Compute cumulative totals from the buckets • Place each item in a temp array • Use bucket value as index • Decrement counter as we go • Visualization: http://www.cs.usfca.edu/~galles/visualization/RadixSort.html
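The counting steps above can be sketched as follows, assuming non-negative ints and base 10 by default. Function and parameter names are illustrative. Each pass counts digits, turns the counts into cumulative totals (the ending position of each digit's run), then walks the input backwards, decrementing the counter and using it as the index into the temp array.

```python
def counting_pass(values, exp, base=10):
    """Stable pass that orders values by the digit selected by exp."""
    counts = [0] * base
    for v in values:                        # add one to the appropriate bucket
        counts[(v // exp) % base] += 1
    for d in range(1, base):                # cumulative totals
        counts[d] += counts[d - 1]
    output = [0] * len(values)              # temp array
    for v in reversed(values):              # backwards keeps equal digits in order
        digit = (v // exp) % base
        counts[digit] -= 1                  # decrement counter as we go
        output[counts[digit]] = v           # bucket value is the index
    return output

def lsd_radix_sort(values, base=10):
    """Sort non-negative ints from the least significant digit upwards."""
    values = list(values)
    if not values:
        return values
    largest = max(values)
    exp = 1
    while largest // exp > 0:               # one pass per digit of the largest key
        values = counting_pass(values, exp, base)
        exp *= base
    return values

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Walking the input backwards is what keeps each pass stable, and LSD radix sort depends on that stability to preserve the order set by earlier digits.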
So it wins? • RadixSort : O(R*N) where R = num digits • Num digits is constant… • O(N)!!
So it wins? • RadixSort: O(R * N) where R = num digits • Num digits isn't constant in general • If M distinct values and base k: R = log_k M • O(R * N) = O(N log_k M) • Only better than n log n in specific situations where the range of distinct values is known
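To make R = log_k M concrete, the sketch below computes the number of digits needed for M distinct values in base k and compares R * N with N log N. The helper name `digits_needed` and the sample values (32-bit keys, base 256, N = 30,000) are assumptions chosen for illustration, and logs are taken base 2.

```python
import math

def digits_needed(m, k):
    """Smallest R with k**R >= m, i.e. ceil(log_k(m)), in integer arithmetic."""
    r, capacity = 0, 1
    while capacity < m:
        capacity *= k
        r += 1
    return r

n = 30_000
m = 2**32                       # distinct values of a 32-bit key
for k in (2, 10, 256):
    r = digits_needed(m, k)
    print(k, r, r * n, round(n * math.log2(n)))
```

With a large base and a known key range, R stays small and R * N beats N log N; with base 2 and the same keys, R is 32 and radix sort does more work than a comparison sort on this input.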
Radix Summary • For specific problems, runs in linear time • Always depends on particulars of data • No general RadixSort algorithm • Right tool for big jobs on specific sets of data