Bucket & Radix Sorts


Efficient Sorts To Date • QuickSort : O(n log n) – O(n^2) • MergeSort : O(n log n) • TreeSort : O(n log n) • HeapSort : O(n log n) • Coincidence?

Comparisons • N! possible orderings of N items • Represent the sort as a decision tree: each comparison is a branch, each ordering is a leaf • A binary decision tree with N! leaves has depth at least log2(N!)

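Comparisons • Minimum height of the decision tree:

$$
\begin{aligned}
\text{min height} &\ge \log_2(N!) \\
&= \log_2(1 \cdot 2 \cdot 3 \cdots N) \\
&= \log_2 1 + \log_2 2 + \cdots + \log_2 N \\
&\ge \log_2(N/2) + \cdots + \log_2 N && \text{(just drop the first half)} \\
&\ge (N/2)\log_2(N/2) && \text{(all } N/2 \text{ remaining logs are} \ge \log_2(N/2)\text{)} \\
&= (N/2)(\log_2 N - \log_2 2) = (N/2)(\log_2 N - 1) \\
&= (N\log_2 N - N)/2 \\
&= \Omega(N \log N)
\end{aligned}
$$

• Ω (Omega) : asymptotic lower bound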
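The bound can be sanity-checked numerically. The sketch below is only an illustration: the helper names `log2_factorial` and `half_bound` are made up for this example, and it compares log2(N!) with the (N/2)·log2(N/2) term from the derivation for a few values of N.

```python
import math


def log2_factorial(n: int) -> float:
    """Return log2(n!) as a sum of logs, which avoids building a huge integer."""
    return sum(math.log2(i) for i in range(1, n + 1))


def half_bound(n: int) -> float:
    """Return (n/2) * log2(n/2), the lower bound reached in the derivation."""
    return (n / 2) * math.log2(n / 2)


# Print N, log2(N!), and the lower bound side by side.
for n in (8, 64, 1024):
    print(n, round(log2_factorial(n), 1), round(half_bound(n), 1))
```

In every row the middle column should be at least as large as the last one, which is all the derivation claims.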

Efficient Sorts To Date • QuickSort : O(n log n) – O(n^2) • MergeSort : O(n log n) • TreeSort : O(n log n) • HeapSort : O(n log n) • Coincidence? No: all of these are comparison based • Comparison sorts can never beat n log n

Sorting with Buckets • Ever used a sorting stick?

Bucket Sort
Bucket Sort:
  For each item
    Pick correct bucket and insert item
  Make new List
  For each bucket
    Add contents to List
  Return List

Bucket Sort
Bucket Sort:
  For each item                           O(n) – n = num items
    Pick correct bucket and insert item   O(1)
  Make new List                           O(1)
  For each bucket                         O(k) – k = num buckets
    Add contents to List                  O(1) – each bucket is a linked list
  Return List                             O(1)
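The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the name `bucket_sort` and the `bucket_of` callback (which maps an item to its bucket index) are assumptions made for this example, and Python lists stand in for the linked lists on the slide.

```python
def bucket_sort(items, num_buckets, bucket_of):
    """Distribute items into buckets, then concatenate the buckets in order.

    bucket_of(item) must return an index in range(num_buckets)
    (a hypothetical callback introduced for this sketch).
    """
    buckets = [[] for _ in range(num_buckets)]   # O(k)
    for item in items:                           # O(n)
        buckets[bucket_of(item)].append(item)    # O(1) amortized per item
    result = []
    for bucket in buckets:                       # O(k) buckets visited
        # With linked lists this splice is O(1); with Python lists it costs
        # O(len(bucket)), which still adds up to O(n) over all buckets.
        result.extend(bucket)
    return result
```

For example, `bucket_sort([3, 1, 2, 1, 0], 4, lambda x: x)` returns `[0, 1, 1, 2, 3]`, because there is one bucket per possible value.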

Bucket Sort • Bucket Sort : O(n + k) • Sort granularity limited by number of buckets • Perfect sort needs k = range of values

Bucket Sort • Bucket Sort : O(n + k) • Sort granularity limited by number of buckets • Perfect sort needs k = range of values • Example: sort 30,000 integers perfectly • Bucket sort: 30,000 + 4,000,000,000 (4 billion+ buckets to represent all ints) • VS n log n: 30,000 * log2(30,000) ≈ 450,000

Bucket Sort • Bucket Sort : O(n + k) • Sort granularity limited by number of buckets • Perfect sort needs k = range of values • Efficient if k is small in relation to n • Example: sort 4 million people in OR by Zip Code • Bucket sort: n = 4,000,000, k = 1000 (fewer than 1000 zips in OR) • Time ~ 4,001,000 • n log n: 4,000,000 * log2(4,000,000) ~ 4,000,000 * 22 = 88,000,000
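The two step counts can be compared with a few lines of arithmetic. The function names `bucket_cost` and `comparison_cost` are hypothetical, and both ignore constant factors, so treat the numbers as rough operation counts rather than running times.

```python
import math


def bucket_cost(n: int, k: int) -> int:
    """Approximate steps for bucket sort: one per item plus one per bucket."""
    return n + k


def comparison_cost(n: int) -> float:
    """Approximate steps for an n log n comparison sort (log base 2)."""
    return n * math.log2(n)


# Zip-code example: few buckets relative to the number of items.
print(bucket_cost(4_000_000, 1000), round(comparison_cost(4_000_000)))

# All 32-bit integers as buckets: k dwarfs n, so bucket sort loses badly.
print(bucket_cost(30_000, 2**32), round(comparison_cost(30_000)))
```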

Bucket Sort • Perfect sort with limited buckets: • BucketSort • InsertionSort each bucket • Best Case : O(N + K) • Worst Case : O(N + N^2) = O(N^2) • Worst case happens when one bucket gets everything
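A minimal sketch of this two-phase approach, assuming numeric keys in a known half-open range [low, high). The function names and the equal-width bucket rule are choices made for this illustration, not something the slides specify.

```python
def insertion_sort(values):
    """Sort a list in place; O(m^2) worst case for a bucket of m items."""
    for i in range(1, len(values)):
        current = values[i]
        j = i - 1
        while j >= 0 and values[j] > current:
            values[j + 1] = values[j]   # shift larger items one slot right
            j -= 1
        values[j + 1] = current


def bucket_then_insertion_sort(items, num_buckets, low, high):
    """Bucket numbers from [low, high) by equal-width ranges, then finish
    each bucket with insertion sort and concatenate."""
    width = (high - low) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for x in items:
        # Clamp so rounding near the top of the range stays in bounds.
        index = min(int((x - low) / width), num_buckets - 1)
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)
    return result
```

If the keys are spread evenly, each bucket holds about N/K items and the insertion sorts stay cheap. If they all land in one bucket, the whole job degenerates to a single O(N^2) insertion sort.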

Sort • Sorting a really big pile alphabetically • Sort into piles A-Z by first letter • Set aside each pile • Sort the A's by second letter • Then the B's, C's… • Then take the AA's • Sort by third letter…

Radix Sort • Radix : Base • Radix Sort • Sort digital data • Bucket sort based on each digit successively

Radix Sort • MSD – Most Significant Digit • MSD Radix Sort • Partition list based on first digit • Recursively sort each partition on next digit
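The recursive idea can be sketched on strings, treating each character as a "digit". This is an illustrative sketch under stated assumptions: `msd_radix_sort` is a made-up name, a dictionary stands in for a fixed array of buckets, and `sorted()` is used only to visit the handful of bucket labels in order.

```python
def msd_radix_sort(words, position=0):
    """Return words sorted by bucketing on the character at `position`,
    then recursing into each bucket on the next position.

    Assumes every word has at least `position` characters, which holds
    for the initial call and for every recursive call made below.
    """
    if len(words) <= 1:
        return list(words)
    # Words with no character at this position are fully examined and
    # sort before any longer word sharing the same prefix.
    result = [w for w in words if len(w) == position]
    buckets = {}
    for w in words:
        if len(w) > position:
            buckets.setdefault(w[position], []).append(w)
    # Visit buckets in character order. With a fixed alphabet, an array
    # indexed by character code would avoid this small comparison sort.
    for ch in sorted(buckets):
        result.extend(msd_radix_sort(buckets[ch], position + 1))
    return result
```

For example, `msd_radix_sort(["banana", "apple", "ant", "a", "bob", "an"])` returns `["a", "an", "ant", "apple", "banana", "bob"]`. A bucket holding a single word is returned immediately, so the rest of that word is never inspected.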

MSD Advantages • May not need to examine every digit of every key • Works on variable-length keys • Little extra space

How does it work? • Radix Exchange • MSD radix sort • Partition like QuickSort, but swap on misplaced digits • Unstable • Visualization: http://www.cse.hut.fi/en/research/SVG/TRAKLA2/exercises/TrueRecursiveRadixExchangeSort-25.html
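A minimal sketch of radix exchange on binary digits, assuming a list of non-negative integers. The function name and its parameters are chosen for this illustration; the partition step is the QuickSort-style two-pointer scan, with the current bit playing the role of the pivot test.

```python
def radix_exchange_sort(values, lo=0, hi=None, bit=None):
    """Sort non-negative ints in place, most significant bit first."""
    if hi is None:
        hi = len(values) - 1
    if bit is None:
        # Start at the highest bit set in the largest value.
        bit = max(values, default=0).bit_length() - 1
    if lo >= hi or bit < 0:
        return
    mask = 1 << bit
    i, j = lo, hi
    while i <= j:
        if (values[i] & mask) == 0:
            i += 1                       # a 0-bit on the left is in place
        elif (values[j] & mask) != 0:
            j -= 1                       # a 1-bit on the right is in place
        else:
            # Misplaced pair: 1-bit on the left, 0-bit on the right.
            values[i], values[j] = values[j], values[i]
            i += 1
            j -= 1
    radix_exchange_sort(values, lo, j, bit - 1)   # items with a 0 in this bit
    radix_exchange_sort(values, i, hi, bit - 1)   # items with a 1 in this bit
```

The long-distance swaps are what make this variant unstable: two items with equal keys can trade relative order.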

LSD Radix • LSD – Least Significant Digit • Work from smallest digit to largest • Hard with variable lengths • Stable!

How does it work? • Iterative Radix Sort • Buckets used as counters • Goal is to find starting/ending point of each value

How does it work? • Iterative Radix Sort • Buckets used as counters • Check each item and add one to the appropriate bucket • Compute cumulative totals from the buckets • Place each item in a temp array • Use the bucket value as the index • Decrement the counter as we go • Visualization: http://www.cs.usfca.edu/~galles/visualization/RadixSort.html
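The counting steps above can be sketched as follows for non-negative integers. The names `counting_pass` and `lsd_radix_sort`, and the default base of 10, are assumptions made for this example.

```python
def counting_pass(values, base, place):
    """Stable pass that orders values by the digit at `place`
    (place is 1, base, base**2, ...)."""
    counts = [0] * base
    for v in values:                      # count how many items have each digit
        counts[(v // place) % base] += 1
    for d in range(1, base):              # cumulative totals: end index per digit
        counts[d] += counts[d - 1]
    output = [0] * len(values)            # temp array
    for v in reversed(values):            # walking backward keeps the pass stable
        d = (v // place) % base
        counts[d] -= 1                    # decrement the counter as we go
        output[counts[d]] = v             # bucket value is the index
    return output


def lsd_radix_sort(values, base=10):
    """Return a sorted copy, working from least to most significant digit."""
    result = list(values)
    if not result:
        return result
    largest = max(result)
    place = 1
    while largest // place > 0:
        result = counting_pass(result, base, place)
        place *= base
    return result
```

Each pass must be stable, otherwise the ordering established by earlier (less significant) digits would be lost.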

So it wins? • RadixSort : O(R*N) where R = num digits • Num digits is constant… • O(N)!!

So it wins? • RadixSort : O(R*N) where R = num digits • Num digits isn't constant in general • If there are M distinct values and the base is k, R = log_k(M) • O(R * N) = O(log_k(M) * N) • Only better than n log n in specific situations where the range of distinct values is known
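To make R = log_k(M) concrete, the sketch below computes the number of digits needed to tell M distinct values apart in base k. The helper name `digits_needed` is hypothetical; it uses integer arithmetic to avoid floating-point rounding in the logarithm.

```python
def digits_needed(distinct_values: int, base: int) -> int:
    """Return the smallest R such that base**R >= distinct_values,
    i.e. log_k(M) rounded up."""
    r = 0
    capacity = 1            # number of values representable with r digits
    while capacity < distinct_values:
        capacity *= base
        r += 1
    return r
```

For 32-bit integers (M = 2**32) this gives R = 10 in base 10 and R = 4 in base 256. A larger base means fewer passes but more buckets per pass, so the constant hidden in O(R * N) depends on the data and the chosen base.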

Radix Summary • For specific problems, runs in linear time • Always depends on particulars of data • No general RadixSort algorithm • Right tool for big jobs on specific sets of data
