  1. CSC 213 – Large Scale Programming Lecture 26: Bucket Sort & Radix Sort

  2. Today’s Goals • Review discussion of merge sort and quick sort • How do they work & why divide-and-conquer? • Are they the fastest possible sorts? • Another way to sort data presented • How can we sort data with a single simple value? • What are the limits on using buckets to sort our data? • If we want more buckets, can we expand these limits? • How does radix sort work? How long does it need?

  3. Quick Sort v. Merge Sort • Quick Sort: divide data around pivot (want pivot to be near middle; all comparisons occur here); conquer with recursion (does not need extra space); merge usually done already (data already sorted!) • Merge Sort: divide data blindly in half (always gets even split; no comparisons performed!); conquer with recursion (needs* to use other arrays); merge combines solutions (compares from (sorted) halves)

  4. Complexity of Sorting • With n! external nodes, the binary decision tree’s height is at least log(n!), which is Ω(n log n)
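
A short derivation of that bound, assuming the usual decision-tree model in which each internal node is one comparison and each of the n! possible input orderings needs its own external node. The symbol h (tree height) is introduced here for illustration and does not appear on the slide.

```latex
\begin{align*}
2^{h} &\ge n!
  && \text{a binary tree of height } h \text{ has at most } 2^{h} \text{ external nodes} \\
h &\ge \log_2(n!) = \sum_{i=1}^{n} \log_2 i \\
  &\ge \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
   \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
  && \text{at least } n/2 \text{ terms, each at least } \log_2(n/2) \\
h &\in \Omega(n \log n)
\end{align*}
```

So any sort that works only by comparing elements needs Ω(n log n) comparisons in the worst case.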

  5. Bucket-Sort • Buckets, B, is an array of Sequence • Sorts Collection, C, in two phases: • Remove each element v from C & add to B[v] • Move elements from each bucket back to C

  7. Bucket-Sort Algorithm
      Algorithm bucketSort(Sequence<Integer> C)
        B = new Sequence[10]   // & instantiate each Sequence
        // Phase 1
        for each element v in C
          B[v].addLast(v)      // Assumes each number in C between 0 & 9
        endfor
        // Phase 2
        loc = 0
        for each Sequence b in B
          for each element v in b
            C.set(loc, v)
            loc += 1
          endfor
        endfor
        return C
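
The pseudocode above might look roughly like this in Java. This is a minimal sketch, not the course's implementation: it swaps the Sequence type for an int array and java.util lists, and the class name BucketSortSketch is made up for illustration. Like the slide, it assumes every value lies between 0 and 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BucketSortSketch {
    /** Sorts c in place and returns it; assumes every value is in 0..9. */
    static int[] bucketSort(int[] c) {
        // One bucket per legal value, each instantiated up front.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: drop each element into the bucket named by its value.
        for (int v : c) {
            buckets.get(v).add(v);
        }
        // Phase 2: copy the buckets back in index order.
        int loc = 0;
        for (List<Integer> b : buckets) {
            for (int v : b) {
                c[loc++] = v;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {7, 1, 3, 7, 0, 9, 3};
        System.out.println(Arrays.toString(bucketSort(data)));
        // prints [0, 1, 3, 3, 7, 7, 9]
    }
}
```

Note that no two elements are ever compared: the value itself picks the bucket.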

  11. Bucket Sort Properties • For this to work, values must be legal indices • Non-negative integer indices needed to access arrays • Sorting occurs without comparing objects • Bucket sort is a stable sort • Stable sorts preserve the relative ordering of objects with the same value • (Bubble-sort & Merge-sort are other stable sorts)
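
Stability is easiest to see when the bucket index is only part of the value. The sketch below is an illustration under assumptions not in the slides: it buckets two-digit numbers by their tens digit alone, and the name StableBucketDemo is hypothetical. Numbers that share a tens digit come out in the same relative order they went in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StableBucketDemo {
    /** Bucket-sorts by tens digit only; assumes every value is in 0..99. */
    static int[] sortByTensDigit(int[] c) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int v : c) {
            buckets.get(v / 10).add(v); // appending keeps arrival order inside a bucket
        }
        int loc = 0;
        for (List<Integer> b : buckets) {
            for (int v : b) {
                c[loc++] = v;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {31, 12, 35, 10, 33};
        System.out.println(Arrays.toString(sortByTensDigit(data)));
        // prints [12, 10, 31, 35, 33]
    }
}
```

Here 12 still precedes 10, and 31, 35, 33 keep their input order, because each bucket is filled at the back and emptied from the front.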

  12. Bucket Sort Extensions • Use Comparator for Bucket-sort • Get index for v using compare(v, null) • Comparator for booleans could return: 0 when v is false; 1 when v is true • Comparator for US states could return: annual per capita consumption of Jello; consumption of Jello overall, in cubic feet; state’s ranking by population
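
One way to sketch this extension in Java. As an assumption of this example, it swaps the slide's compare(v, null) convention for a java.util.function.ToIntFunction that maps each element to its bucket index; the names KeyedBucketSort and bucketCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class KeyedBucketSort {
    /** Assumes index.applyAsInt(v) is in 0..bucketCount-1 for every element v. */
    static <T> List<T> bucketSort(List<T> c, int bucketCount, ToIntFunction<? super T> index) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: the index function, not the element itself, picks the bucket.
        for (T v : c) {
            buckets.get(index.applyAsInt(v)).add(v);
        }
        // Phase 2: concatenate the buckets in index order.
        List<T> out = new ArrayList<>();
        for (List<T> b : buckets) {
            out.addAll(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Boolean> flags = Arrays.asList(true, false, true, false);
        // 0 when v is false, 1 when v is true, as on the slide.
        System.out.println(bucketSort(flags, 2, v -> v ? 1 : 0));
        // prints [false, false, true, true]
    }
}
```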

  13. Bucket Sort Extensions • State’s ranking by population is the workable choice: a small, bounded set of integer values (1 to 50)

  15. Bucket Sort Extensions • Extended Bucket-sort works with many types • Limited set of data needed for this to work • Need way to enumerate values of the set • “Enumerate” is a subtle hint
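
Reading the hint as pointing at Java's enum types (an interpretation, not something the slides state), a bucket sort over an enumerated set could be sketched as follows. The Size enum and the class name EnumBucketSort are hypothetical; values() supplies the number of buckets and ordinal() the bucket index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EnumBucketSort {
    // Hypothetical value set; declaration order defines the sort order.
    enum Size { SMALL, MEDIUM, LARGE }

    static List<Size> sort(List<Size> c) {
        // values() enumerates the whole set, so it fixes the bucket count.
        List<List<Size>> buckets = new ArrayList<>();
        for (int i = 0; i < Size.values().length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Size s : c) {
            buckets.get(s.ordinal()).add(s); // ordinal() is a legal array index
        }
        List<Size> out = new ArrayList<>();
        for (List<Size> b : buckets) {
            out.addAll(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Size> sizes = Arrays.asList(Size.LARGE, Size.SMALL, Size.MEDIUM, Size.SMALL);
        System.out.println(sort(sizes));
        // prints [SMALL, SMALL, MEDIUM, LARGE]
    }
}
```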

  16. d-Tuples • Combination of d values such as (k1, k2, …, kd) • ki is the ith dimension of the tuple • A point (x, y, z) is a 3-tuple: x is the 1st dimension’s value; y is the 2nd dimension’s value; z is the 3rd dimension’s value

  18. Lexicographic Order • Assume a & b are both d-tuples • a = (a1, a2, …, ad) • b = (b1, b2, …, bd) • Can say a < b if and only if • a1 < b1 OR • a1 = b1 && (a2, …, ad) < (b2, …, bd) • Order these 2-tuples using previous definition: (3 4) (7 8) (3 2) (1 4) (4 8) • Result: (1 4) (3 2) (3 4) (4 8) (7 8)
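
The recursive definition above unrolls into a loop over dimensions. A minimal sketch, assuming tuples are stored as int arrays of equal length d; the class name LexOrder is hypothetical.

```java
import java.util.Arrays;

final class LexOrder {
    /** Negative if a < b, positive if a > b, zero if equal; assumes a.length == b.length. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]); // first differing dimension decides
            }
        }
        return 0; // all d dimensions are equal
    }

    public static void main(String[] args) {
        int[][] tuples = {{3, 4}, {7, 8}, {3, 2}, {1, 4}, {4, 8}};
        Arrays.sort(tuples, LexOrder::compare);
        System.out.println(Arrays.deepToString(tuples));
        // prints [[1, 4], [3, 2], [3, 4], [4, 8], [7, 8]]
    }
}
```

This uses a comparison sort only to check the ordering of the slide's 2-tuples; radix sort reaches the same order without comparing tuples at all.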

  19. Radix-Sort • Very fast sort for data expressed as d-tuple • Cheats to win: beats the lower bound for comparison sorts because it never compares elements • Sort performed using d calls to bucket sort • Sorts from least to most important dimension of tuple • Luckily lots of data are d-tuples • String is d-tuple of char, e.g. “LETTERS” and “LINGERS” • Digits of an int can be used for sorting, also, e.g. 1001, 3729, 1009, 2210

  21. Radix-Sort For Integers • Represent int as a d-tuple of digits: 62₁₀ = 111110₂ and 04₁₀ = 000100₂ • Decimal digits need 10 buckets to use for sorting • Ordering using their bits needs 2 buckets • O(d∙n) time needed to run Radix-sort • d is length of longest element in input • In most cases value of d is constant (d = 31 for a non-negative int) • Radix sort takes O(n) time, ignoring constant
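
To treat an int as a d-tuple, radix sort needs a way to read one dimension at a time. A small sketch of that step, assuming non-negative values; the helper names bitAt and decimalDigitAt are hypothetical.

```java
final class DigitTuple {
    /** Bit number `bit` of value, where bit 0 is least significant; assumes value >= 0. */
    static int bitAt(int value, int bit) {
        return (value >> bit) & 1;
    }

    /** Decimal digit number `pos` of value, where pos 0 is the ones digit; assumes value >= 0. */
    static int decimalDigitAt(int value, int pos) {
        for (int i = 0; i < pos; i++) {
            value /= 10;
        }
        return value % 10;
    }

    public static void main(String[] args) {
        // Rebuild the 6-bit tuple for 62, most significant bit first.
        StringBuilder bits = new StringBuilder();
        for (int bit = 5; bit >= 0; bit--) {
            bits.append(bitAt(62, bit));
        }
        System.out.println(bits); // prints 111110
    }
}
```

bitAt always returns 0 or 1, so it needs 2 buckets; decimalDigitAt returns 0 through 9, so it needs 10.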

  22. Radix-Sort In Action • List of 4-bit integers sorted using Radix-sort (each column shows the list after sorting on that bit)
      Input   Bit 0   Bit 1   Bit 2   Bit 3
      1001    0010    1001    1001    0001
      0010    1110    1101    0001    0010
      1101    1001    0001    0010    1001
      0001    1101    0010    1101    1101
      1110    0001    1110    1110    1110

  27. Radix-Sort
      Algorithm radixSort(Sequence<Integer> C)
        // Works from least to most significant value
        for bit = 0 to 30
          C = bucketSort(C, bit)   // Sort C using the specified bit
        endfor
        return C
      • What is big-Oh complexity for Radix-Sort? • Call in loop uses each element twice: O(n) • Loop repeats once per digit to complete sort: O(1) times for fixed-width ints, giving O(n) total • Or does the loop really repeat O(log n) times (?), giving O(n log n): n distinct values need at least log n bits
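
A runnable sketch of the algorithm above, not the course's implementation. It assumes non-negative ints (so bits 0 through 30 cover every value) and uses two java.util lists as the buckets for each pass; the class name RadixSortSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RadixSortSketch {
    /** One stable two-bucket pass keyed on a single bit of each value. */
    static int[] bucketSort(int[] c, int bit) {
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int v : c) {
            if (((v >> bit) & 1) == 0) {
                zeros.add(v);
            } else {
                ones.add(v);
            }
        }
        // Copy back: every value with a 0 bit first, then every value with a 1 bit.
        int loc = 0;
        for (int v : zeros) {
            c[loc++] = v;
        }
        for (int v : ones) {
            c[loc++] = v;
        }
        return c;
    }

    /** Assumes every value is non-negative. */
    static int[] radixSort(int[] c) {
        // Works from least to most significant bit.
        for (int bit = 0; bit <= 30; bit++) {
            c = bucketSort(c, bit);
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {0b1001, 0b0010, 0b1101, 0b0001, 0b1110};
        System.out.println(Arrays.toString(radixSort(data)));
        // prints [1, 2, 9, 13, 14]
    }
}
```

Each pass touches every element twice (once into a bucket, once back out), and the pass count is fixed at 31, which is where the O(n) figure for fixed-width ints comes from. The result is only correct because each pass is stable.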

  30. For Next Lecture • Start thinking about test cases for program #2 • Wed. is next deadline when these must be submitted • Spend time on this: tests & design save coding • Tuesday deadline for weekly assignment • For Wednesday, review index files, Set & sorts • Quiz will be like others this term with mix of problems

