
Adaptive Sorting


Presentation Transcript


  1. Adaptive Sorting “A Dynamically Tuned Sorting Library” “Optimizing Sorting with Genetic Algorithms” By Xiaoming Li, Maria Jesus Garzaran, and David Padua Presented by Anton Morozov

  2. Motivations and Observations • Success of ATLAS (linear algebra), FFTW (FFTs), and SPIRAL (signal processing): libraries that tune themselves empirically to the target platform. What can be done for sorting?

  3. Why are we interested in sorting algorithms? Does this reflect the actual performance of the sorting algorithms?

  4. Which additional factors influence the performance of a sorting algorithm?

  5. Performance vs. Standard Deviation

  6. Observation Quicksort and merge sort are both comparison-based sorts, so their performance is independent of the input distribution and its standard deviation. Their performance does depend on the degree of sortedness, i.e., the number of inversions (maximum n(n-1)/2).
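The maximum n(n-1)/2 comes from a fully reversed input, where every pair is inverted. A minimal merge-sort-based inversion counter (an illustration, not code from the papers):

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions), in O(n log n)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # every element still remaining in `left` is inverted with right[j]
            inv += len(left) - i
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, inv
```

A fully reversed 4-element input yields 4·3/2 = 6 inversions, the maximum.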

  7. Architectural Model and Empirical Search • We saw how libraries like ATLAS (which generates tuned BLAS routines) use empirical search to establish the parameters of the underlying architecture

  8. So which sorting algorithm is better? • What does the performance of a sorting algorithm depend on? • How do we choose the best sorting algorithm?

  9. Sorting algorithms • QuickSort • Radix Sort • Merge Sort • Insertion Sort • Sorting Networks • Heap Sort

  11. Sorting algorithms • QuickSort • Radix Sort • Cache-Conscious Radix Sort • Merge Sort • Multiway Merge Sort • Insertion Sort • Sorting Networks • Heap Sort (Insertion Sort and Sorting Networks are the "register sorts")

  12. Quick Sort • Description: Pick a pivot and move records around it: records smaller than the pivot go to the front, larger ones to the back, and the pivot is inserted between them. • Improvements: • Partition iteratively rather than recursively • Choose the pivot among the first, middle, and last keys • Use a fast register sort (insertion sort or a sorting network) for the small partitions

  13. Cache-Conscious Radix Sort Given b-bit integers and a radix of size 2^r, the algorithm first sorts by the lowest r bits, then by the next r bits, for b/r passes in total. The radix is chosen so that r ≤ log2(sTLB) - 1, where sTLB is the number of entries in the translation look-aside buffer. • Improvements: • Proceed iteratively • Compute the histograms for all the r-bit digits during the first pass over the data • Choose r as described above

  14. Multiway Merge Sort Partitions the keys into p subsets; each subset is sorted independently (in this case with CC-radix sort), and the sorted subsets are then merged using a heap. First, the smallest/largest element of each subset is promoted to a leaf of the heap; then the leaves are compared and the appropriate one is promoted. • The heap contains 2·p - 1 nodes. • Each parent in the heap has A/r children, where A is the cache line size and r is the size of a node.

  15. Insertion Sort. Used for small data sizes. Working from left to right, the algorithm scans to the left of each key and inserts it in the appropriate place. Sorting Networks A fixed sequence of comparisons between pairs of inputs; if one is bigger than the other, they are swapped.

  16. Input Data Factors • Number of keys • Distribution • Standard deviation • … Standard deviation is approximated with an entropy vector: for each digit, ∑i -Pi*log2(Pi), where Pi = ci/N and ci is the number of keys with value i in that digit

  17. Learning procedure f : (N, E) → {CC-radix, Multiway Merge(N, E), Quicksort} The Winnow algorithm computes a weight vector and a threshold Θ from the entropy vector, classifying via ∑i wi*Ei > Θ

  18. Selection at run time Sample the input array (every fourth entry), compute the entropy vector, and compute S = ∑i wi * entropyi. If S ≥ Θ, choose CC-radix; otherwise choose between Merge Sort and Quicksort based on the size of the input.
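Slides 16–18 together can be sketched end to end; the weight vector, threshold Θ, and size threshold passed in below are placeholders that would come from installation-time Winnow training:

```python
import math

def digit_entropies(keys, b=32, r=8):
    """Entropy vector: for each r-bit digit, -sum_i Pi*log2(Pi), Pi = ci/N."""
    n, mask, vec = len(keys), (1 << r) - 1, []
    for shift in range(0, b, r):
        counts = {}
        for k in keys:
            d = (k >> shift) & mask
            counts[d] = counts.get(d, 0) + 1
        vec.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return vec

def select_sort(keys, weights, theta, size_threshold):
    """Sample every fourth key, form S = sum_i wi*entropy_i, pick CC-radix
    when S >= theta, else decide between the comparison sorts by input size."""
    entropies = digit_entropies(keys[::4])
    s = sum(w * e for w, e in zip(weights, entropies))
    if s >= theta:
        return "cc-radix"
    return "multiway merge sort" if len(keys) > size_threshold else "quicksort"
```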

  19. Summary • Architectural factors (found by empirical search): cache / TLB size, number of registers, cache line size. • Runtime factors (learned at installation time): amount of data to sort, distribution width; the distribution shape can be any, since it doesn't matter.

  20. Performance Results

  21. Performance Results

  22. Is it possible to do better?

  23. Sorting Primitives To build new sorting algorithms: sorting and selection primitives • Sorting primitive: one of the pure sorting algorithms seen earlier • Selection primitive: a process executed at run time to decide which sorting algorithm to apply

  24. Sorting Primitives • Divide-by-Value: corresponds to the first phase of Quicksort; takes the number of pivots as a parameter (np pivots give np+1 partitions). Select one or more pivots and partition the input array around them. • Divide-by-Position: corresponds to the initial break of Merge Sort; takes the size of each partition and the fan-out of the heap as parameters. Divide the input into same-size sub-partitions and use a heap to merge the sorted sub-partitions.

  25. Sorting Primitives • Divide-by-Radix: corresponds to one step of the radix sort algorithm; takes the radix (r bits) as a parameter. • Step 1: Scan the input to build the distribution array, which records how many elements fall into each of the 2^r sub-partitions. • Step 2: Compute the accumulated distribution array, used as indexes when copying the input to the destination array. • Step 3: Copy the input into the 2^r sub-partitions. Example (one digit, four buckets 0–3): src = [11, 23, 30, 12]; counter = [1, 1, 1, 1]; accumulated start indexes = [0, 1, 2, 3]; dest = [30, 11, 12, 23].

  26. Sorting Primitives • Divide-by-Radix-assuming-Uniform-distribution: same as DR, but assumes each bucket contains n/2^r keys. • Steps 1 and 2 of DR are expensive. • If the input elements are distributed nearly evenly among the 2^r sub-partitions, the input can be copied directly into the destination array, assuming every partition has the same number of elements. • Overhead: handling partition overflow.

  27. Sorting Primitives • Once the partitions are small: • Leaf-Divide-by-Value: same as DV, but applied recursively to the partitions; below a threshold size it applies a register sort. • Leaf-Divide-by-Radix: same as DR, but applied to all remaining subsets; below a threshold size it applies a register sort.

  28. Selection Primitives • Branch-by-Size: selects different paths based on the input size. • Branch-by-Entropy: uses the entropy to branch onto different paths; uses Winnow to learn the weight vector.

  29. Genetic Algorithm • Crossover: propagates good sub-trees. • Mutation: mutates the structure of the algorithm, or changes the parameter values of primitives.

  30. Genetic Algorithm • Fitness function: average performance across standard deviations. • Uses rank instead of raw fitness.

  31. Performance Results

  32. Performance Results

  33. Is it possible to do better? It was observed empirically that the Branch-by-Entropy selection primitive was never used.

  34. Classifier Sorting Based on the idea that the performance of an algorithm in one region of the input space can be independent of its performance in other regions. i is an input-characteristics bit string; c is a condition string over "1", "0", and "*", where "*" means don't-care.

  35. Example: Encode the number of keys into 4 bits (0000: 0~1M, 0001: 1~2M, …). A number of keys of 10.5M is encoded as "1100". Matching the input "1100" against the condition strings: "01**" does not match, "1010" does not match, "110*" matches, so the algorithm (dv 2 (lr 6 16)) is applied.
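The don't-care matching on this slide can be sketched directly; the condition strings and the selected algorithm are from the slide, while the action labels for the two non-matching rules are placeholders:

```python
def matches(condition, bits):
    """Match a condition string over '0', '1', '*' against an input bit
    string; '*' is don't-care."""
    return all(c == '*' or c == b for c, b in zip(condition, bits))

rules = [
    ("01**", "some other algorithm"),   # placeholder action
    ("1010", "some other algorithm"),   # placeholder action
    ("110*", "(dv 2 (lr 6 16))"),
]
inp = "1100"  # encoded input characteristics
fired = [action for cond, action in rules if matches(cond, inp)]
# only "110*" matches "1100", so (dv 2 (lr 6 16)) is applied
```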

  36. Experimental Results

  37. Experimental Results

  38. Experimental Results

  39. Summary and Future Work • The work presented shows how sorting can be adapted to the underlying platform. • Potential future work: • Investigate the anomalies in the performance graphs • Incorporate the notion of "sortedness" into sort selection • Simplify the selection algorithm • See whether these ideas can be applied in a cache-oblivious way
