
Parallel Sorting: An Analysis

Parallel Sorting: An Analysis. Madison Solarana & Kevin Zheng. Outline: Introspective Sort, Odd-Even Transposition Sort, Shear Sort, Rank/Enumeration Sort, Merge Sort, Hyperquicksort, Bitonic Sort, Radix Sort, Sample Sort, Reconfigurable Architecture.


Presentation Transcript


  1. Parallel Sorting: An Analysis Madison Solarana & Kevin Zheng

  2. Outline • Sorting: • Introspective Sort • Odd-Even Transposition Sort • Shear Sort • Rank/Enumeration Sort • Merge Sort • Hyperquicksort • Bitonic Sort • Radix Sort • Sample Sort • Reconfigurable Architecture

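  3. Introspective Sort Quicksort + Heapsort = Introsort. Introsort starts with quicksort and switches to heapsort once the recursion depth exceeds a limit proportional to log(n), which rules out quicksort's O(n^2) worst case. Typical C++ implementations of std::sort(first, last) in <algorithm> use it, where last is an iterator one past the final element. Worst Case: O(n log(n)) Average Case: O(n log(n)) Not parallel, but generally preferred over C's qsort for sequential sorting.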
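
The switching rule can be illustrated with a short sketch. This is a minimal illustration of the introsort idea, not the code of any particular std::sort implementation: the helper names introsort and introsort_impl, the 16-element cutoff for insertion sort, and the depth limit of 2 * floor(log2(n)) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: sorts [first, last) with quicksort until the depth
// budget runs out, then falls back to heapsort for that sub-range.
template <typename It>
void introsort_impl(It first, It last, int depth_limit) {
    while (last - first > 16) {
        if (depth_limit == 0) {
            // Depth budget exhausted: heapsort guarantees O(n log n) here.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_limit;
        // Median-of-three pivot value.
        auto a = *first, b = *(first + (last - first) / 2), c = *(last - 1);
        auto pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
        // Three-way partition: [< pivot][== pivot][> pivot].
        It lt = std::partition(first, last, [&](const auto& x) { return x < pivot; });
        It gt = std::partition(lt, last, [&](const auto& x) { return !(pivot < x); });
        // Recurse on the smaller side, keep looping on the larger side.
        if (lt - first < last - gt) {
            introsort_impl(first, lt, depth_limit);
            first = gt;
        } else {
            introsort_impl(gt, last, depth_limit);
            last = lt;
        }
    }
    // Small ranges: insertion sort.
    for (It i = first; i != last; ++i)
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

// Hypothetical entry point.
template <typename T>
void introsort(std::vector<T>& v) {
    if (v.size() < 2) return;
    const int depth_limit = 2 * static_cast<int>(std::log2(v.size()));
    introsort_impl(v.begin(), v.end(), depth_limit);
    assert(std::is_sorted(v.begin(), v.end()));  // debug-only sanity check
}
```

Because the pivot value always lands in the middle (== pivot) group, each partition step strictly shrinks both sides, and the depth budget caps how many such steps can run before heapsort takes over.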

  4. Odd-Even Transposition Sort
      pOddEven(n)
        id := rank
        for i := 1 to n do
          if i is odd then
            if id is odd then compare-exchange-min(id + 1);
            else compare-exchange-max(id - 1);
          if i is even then
            if id is even then compare-exchange-min(id + 1);
            else compare-exchange-max(id - 1);
        end for
      end pOddEven

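  5. Odd-Even Transposition Sort Compares all (odd, even)-pairs of adjacent elements in a list and, if a pair is in the wrong order, swaps the two elements. It then does the same for the (even, odd)-pairs and keeps alternating between (odd, even) and (even, odd) steps until the list is sorted; n such steps always suffice. With one element per processor (P = n) there are n iterations with one compare-exchange operation per iteration, so Tpar = O(n). With P < n processors holding n/P elements each, Tpar = O((n/P) log(n/P)) + O(n), and the algorithm is cost-optimal if and only if P = O(log(n)), due to the costs of the merge-splits and exchanges.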
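
A sequential simulation makes the phase structure concrete. In this sketch (the function name is hypothetical) position k of the vector plays processor k with 0-based ids, so the pseudocode's odd phases correspond to the even-numbered phases here; the pairs in one phase are disjoint, which is what lets a real machine run them all at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simulates odd-even transposition sort with one element per virtual processor.
template <typename T>
void odd_even_transposition_sort(std::vector<T>& a) {
    const std::size_t n = a.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        // Even phases pair (0,1),(2,3),...; odd phases pair (1,2),(3,4),...
        // Every compare-exchange in this inner loop is independent of the others.
        for (std::size_t k = phase % 2; k + 1 < n; k += 2) {
            if (a[k + 1] < a[k]) std::swap(a[k], a[k + 1]);  // compare-exchange
        }
    }
    // n phases always suffice, so the result must already be sorted.
    assert(std::is_sorted(a.begin(), a.end()));
}
```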

  6. Shear Sort A sorting algorithm designed for an n x n mesh, where P = n^2 (one element per processor). Sorting a row (or column) of length n with odd-even transposition takes n steps, and about log(n) row-and-column iterations are needed, so Shear Sort takes O(n log(n)). Speedup = Tseq / Tpar = O(n^2 log(n)) / O(n log(n)) = O(n). Efficiency = Speedup / P = O(1/n).

  7. Shear Sort

  8. Shear Sort
      ShearSort(n)
        for i := 1 to 2*ceil(log(n)) + 1 do
          if i is odd then
            call Odd_Even_Row_Sort(n)      // rows alternate ascending/descending (snake order)
          else
            call Odd_Even_Column_Sort(n)   // every column ascending
        end for
      end ShearSort
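
The row/column alternation can be sketched on an in-memory grid. For brevity this sketch sorts each row and column with std::sort instead of odd-even transposition (the final arrangement is the same; only the parallel timing differs), and the names Grid, shear_sort and snake_order are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Shear sort on an n x n grid: row phases sort even-indexed rows ascending and
// odd-indexed rows descending; column phases sort every column ascending.
void shear_sort(Grid& g) {
    const int n = static_cast<int>(g.size());
    for (const auto& row : g) assert(static_cast<int>(row.size()) == n);
    int lg = 0;                               // lg = ceil(log2 n)
    while ((1 << lg) < n) ++lg;
    std::vector<int> col(n);
    for (int phase = 0; phase <= 2 * lg; ++phase) {
        if (phase % 2 == 0) {                 // row phase (also the last phase)
            for (int r = 0; r < n; ++r) {
                if (r % 2 == 0) std::sort(g[r].begin(), g[r].end());
                else std::sort(g[r].begin(), g[r].end(), std::greater<int>());
            }
        } else {                              // column phase
            for (int c = 0; c < n; ++c) {
                for (int r = 0; r < n; ++r) col[r] = g[r][c];
                std::sort(col.begin(), col.end());
                for (int r = 0; r < n; ++r) g[r][c] = col[r];
            }
        }
    }
}

// Reads the grid in snake order; after shear_sort this sequence is sorted.
std::vector<int> snake_order(const Grid& g) {
    std::vector<int> out;
    for (std::size_t r = 0; r < g.size(); ++r) {
        if (r % 2 == 0) out.insert(out.end(), g[r].begin(), g[r].end());
        else out.insert(out.end(), g[r].rbegin(), g[r].rend());
    }
    return out;
}
```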

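  9. Rank/Enumeration Sort
      forall (i = 0; i < n; ++i) {          // one processor per element
        numRank = 0;
        for (j = 0; j < n; ++j) {
          // count smaller elements; equal elements that appear earlier also count,
          // so duplicates receive distinct ranks
          if (rawNums[i] > rawNums[j] || (rawNums[i] == rawNums[j] && j < i)) {
            ++numRank;
          }
        }
        sortedNums[numRank] = rawNums[i];
      }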

  10. Rank/Enumeration Sort Counts how many numbers are smaller than each number; that count is the number's rank, i.e., its position in the sorted output. T = O(n^2) if P = 1, O(n) if P = n, O(log(n)) if P = n^2, and O(1) if P = n^2 and if concurrent memory writes are allowed.
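
The forall loop maps directly onto threads. In this sketch (rank_sort and its num_threads parameter are hypothetical) each std::thread ranks a contiguous block of indices i; breaking ties by index gives every element a distinct rank, so no two threads ever write the same output slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Rank sort: O(n^2) total work, split across num_threads threads.
std::vector<int> rank_sort(const std::vector<int>& raw, unsigned num_threads = 4) {
    assert(num_threads > 0);
    const std::size_t n = raw.size();
    std::vector<int> sorted(n);
    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            std::size_t rank = 0;
            for (std::size_t j = 0; j < n; ++j)
                if (raw[j] < raw[i] || (raw[j] == raw[i] && j < i)) ++rank;
            sorted[rank] = raw[i];  // distinct ranks: no write conflicts
        }
    };
    std::vector<std::thread> threads;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk, end = std::min(n, begin + chunk);
        if (begin >= end) break;
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();
    return sorted;
}
```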

  11. Merge Sort
      pMergesort(myid, d, data, newdata)
        data = mergesort(data)
        for dim = 1 to d
          data = pMerge(myid, dim, data)
        endfor
        newdata = data
      end

  12. Merge Sort Tseq = O(n log(n)) Tpar = O(4n) = O(n) if P = n Speedup S = Tseq / Tpar = O(log(n)) Efficiency E = S / P = O(log(n) / n)
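
A sequential simulation of the merge rounds on 2^d virtual processors follows. The slide does not define pMerge, so the pairing used here (in round dim, processor id merges in the data of partner id + 2^(dim-1) whenever id is a multiple of 2^dim) is one plausible reading, and tree_merge_sort is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Tree-structured merge sort on 2^d simulated processors.
std::vector<int> tree_merge_sort(const std::vector<int>& data, int d) {
    assert(d >= 0 && d < 20);
    const std::size_t p = std::size_t{1} << d;
    std::vector<std::vector<int>> block(p);
    for (std::size_t i = 0; i < data.size(); ++i)
        block[i % p].push_back(data[i]);                    // distribute round-robin
    for (auto& b : block) std::sort(b.begin(), b.end());   // data = mergesort(data)
    for (int dim = 1; dim <= d; ++dim) {                    // pMerge rounds
        const std::size_t step = std::size_t{1} << dim, half = step / 2;
        for (std::size_t id = 0; id < p; id += step) {
            std::vector<int> merged;
            merged.reserve(block[id].size() + block[id + half].size());
            std::merge(block[id].begin(), block[id].end(),
                       block[id + half].begin(), block[id + half].end(),
                       std::back_inserter(merged));
            block[id] = std::move(merged);
            block[id + half].clear();                       // partner is done
        }
    }
    return block[0];                                        // processor 0 holds all
}
```

The last round merges all n elements on one processor, which is why Tpar stays O(n) even with P = n.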

  13. Hyperquicksort The nodes form a hypercube of dimension d (2^d nodes).
      1. Distribute the data evenly to all nodes.
      2. Each node sorts the data it has using sequential quicksort.
      3. Node 0 broadcasts its median key K to the rest of the cube.
      4. Each node separates its data into two groups: keys <= K and keys > K.
      5. Break up the cube into two subcubes: the lower subcube (nodes 0 through 2^(d-1) - 1) and the upper subcube (nodes 2^(d-1) through 2^d - 1). Each node in the lower subcube sends its items whose keys are > K to its adjacent node in the upper subcube (node i pairs with node i + 2^(d-1)), and each node in the upper subcube sends its items whose keys are <= K to its adjacent node in the lower subcube. Now all data whose keys are <= K are in the lower subcube, while all those whose keys are > K are in the upper subcube.
      6. Each node merges the group it just received with the one it kept, so that its data is sorted.
      7. Repeat steps 3 through 6 on each of the two subcubes. This time node 0 corresponds to the lowest-numbered node in the subcube, and the value of d is one less.
      8. Repeat steps 3 through 7 until the subcubes contain only a single node.
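
The steps above can be simulated with one vector per node. This is a sketch under stated assumptions: hyperquicksort is a hypothetical name, the node count must be a power of two, and when the lowest node of a subcube holds no data the sketch takes the median from the first non-empty node instead (the steps above do not cover that case).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// nodes[i] holds the keys of hypercube node i.
void hyperquicksort(std::vector<std::vector<int>>& nodes) {
    const std::size_t p = nodes.size();
    assert(p > 0 && (p & (p - 1)) == 0);                     // 2^d nodes
    for (auto& v : nodes) std::sort(v.begin(), v.end());     // step 2
    for (std::size_t size = p; size > 1; size /= 2) {        // steps 7-8
        const std::size_t half = size / 2;
        for (std::size_t base = 0; base < p; base += size) {
            // Step 3: lowest node in the subcube supplies the median key K.
            int pivot = 0;
            bool found = false;
            for (std::size_t i = base; i < base + size && !found; ++i)
                if (!nodes[i].empty()) {
                    pivot = nodes[i][nodes[i].size() / 2];
                    found = true;
                }
            if (!found) continue;                             // empty subcube
            for (std::size_t i = base; i < base + half; ++i) {
                auto& lo = nodes[i];
                auto& hi = nodes[i + half];                   // adjacent node
                // Step 4: split both sorted lists at K.
                auto lo_cut = std::upper_bound(lo.begin(), lo.end(), pivot);
                auto hi_cut = std::upper_bound(hi.begin(), hi.end(), pivot);
                // Steps 5-6: exchange halves and merge kept + received data.
                std::vector<int> new_lo, new_hi;
                std::merge(lo.begin(), lo_cut, hi.begin(), hi_cut, std::back_inserter(new_lo));
                std::merge(lo_cut, lo.end(), hi_cut, hi.end(), std::back_inserter(new_hi));
                lo = std::move(new_lo);
                hi = std::move(new_hi);
            }
        }
    }
}
```

Reading the nodes in order 0, 1, ..., 2^d - 1 afterwards yields the fully sorted sequence.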

  14. Hyperquicksort

  15. Hyperquicksort Sequential Quicksort: Simple Parallel Quicksort: , where Hyperquicksort: , where is the broadcast cost and is the merging cost.

  16. Bitonic Sort
      IF master processor
        Retrieve data to sort
        Scatter it among all processors
      ELSE
        Receive portion to sort
      Sort local data using std::sort
      FOR (level = 1; level <= lg(P); level++)
        FOR (j = 0; j < level; j++)
          partner = rank ^ (1 << (level - j - 1))
          Exchange data with partner
          IF ((rank < partner) == ((rank & (1 << level)) == 0))
            extract low values from local and received data (mergeLow)
          ELSE
            extract high values from local and received data (mergeHigh)
      Collect Sorted Data
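
The loop nest above can be checked with a sequential simulation of P virtual processors. In this sketch (Block and bitonic_sort_blocks are hypothetical names) the scatter and collect steps are just the input and output vectors, "exchange data with partner" is reading the partner's block, and all blocks are assumed to have equal size with P a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Block = std::vector<int>;

// blocks[rank] is the data held by processor `rank`.
void bitonic_sort_blocks(std::vector<Block>& blocks) {
    const std::size_t P = blocks.size();
    assert(P > 0 && (P & (P - 1)) == 0);                     // P = 2^k
    for (const auto& b : blocks) assert(b.size() == blocks[0].size());
    for (auto& b : blocks) std::sort(b.begin(), b.end());    // local std::sort
    int lgP = 0;
    while ((std::size_t{1} << lgP) < P) ++lgP;
    for (int level = 1; level <= lgP; ++level) {
        for (int j = 0; j < level; ++j) {
            std::vector<Block> next(P);
            for (std::size_t rank = 0; rank < P; ++rank) {
                const std::size_t partner = rank ^ (std::size_t{1} << (level - j - 1));
                const auto m = static_cast<std::ptrdiff_t>(blocks[rank].size());
                Block merged;  // local data plus data "received" from partner
                std::merge(blocks[rank].begin(), blocks[rank].end(),
                           blocks[partner].begin(), blocks[partner].end(),
                           std::back_inserter(merged));
                const bool keep_low =
                    (rank < partner) == ((rank & (std::size_t{1} << level)) == 0);
                if (keep_low)
                    next[rank].assign(merged.begin(), merged.begin() + m);  // mergeLow
                else
                    next[rank].assign(merged.end() - m, merged.end());      // mergeHigh
            }
            blocks = std::move(next);
        }
    }
}
```

Since both partners evaluate the same condition with opposite results, one keeps the low half and the other the high half, so no data is lost or duplicated.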

  17. Bitonic Sort Tpar = O(log^2(n)) if P = n; Tpar = O((n/P) log(n/P) + (n/P) log^2(P)) if P < n.

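  18. Radix Sort Applies bucket (counting) sort one digit at a time and is similar to Histogram Sort. In MSD radix sort, the array is partitioned into buckets by the most significant digit, and each bucket is then sorted individually on the remaining digits; LSD radix sort instead makes one stable pass per digit, starting from the least significant. LSD & MSD Radix Sort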

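  19. Radix Sort Parallelize the counting sort used for each digit: the array is split into p partitions of N/p elements, one per processor, and each processor builds a histogram of the current digit's values in its partition. All processors work together to compute the global prefix sum of these histograms, which gives every processor the output offsets for its elements. Then each processor copies its assigned values to the shared output array. Tseq = O(kN) for k digits.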
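
The three phases of each pass can be sketched as follows. This sketch assumes 32-bit unsigned keys and 8-bit digits (k = 4 passes) and simulates the p processors with ordinary loops; parallel_style_radix_sort is a hypothetical name. Ordering the prefix sum by digit first and partition second is what keeps each pass stable, which LSD radix sort requires.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort organized like the parallel counting sort described above.
void parallel_style_radix_sort(std::vector<std::uint32_t>& a, std::size_t p) {
    assert(p > 0);
    const std::size_t n = a.size();
    std::vector<std::uint32_t> out(n);
    for (int shift = 0; shift < 32; shift += 8) {
        // Phase 1: each partition q (one per processor) builds a local histogram.
        std::vector<std::array<std::size_t, 256>> hist(p);
        for (auto& h : hist) h.fill(0);
        for (std::size_t q = 0; q < p; ++q)
            for (std::size_t i = q * n / p; i < (q + 1) * n / p; ++i)
                ++hist[q][(a[i] >> shift) & 0xFF];
        // Phase 2: global exclusive prefix sum, digit-major then partition order.
        std::size_t sum = 0;
        for (std::size_t digit = 0; digit < 256; ++digit)
            for (std::size_t q = 0; q < p; ++q) {
                const std::size_t count = hist[q][digit];
                hist[q][digit] = sum;  // first output slot for (partition q, digit)
                sum += count;
            }
        // Phase 3: each partition copies its values into the shared output array.
        for (std::size_t q = 0; q < p; ++q)
            for (std::size_t i = q * n / p; i < (q + 1) * n / p; ++i)
                out[hist[q][(a[i] >> shift) & 0xFF]++] = a[i];
        a.swap(out);
    }
}
```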

  20. Sample Sort n/p elements per processor. Each processor sorts its local elements (std::sort or bitonic sort). Each processor selects p-1 equally spaced elements from its own sorted list. The combined set of p(p-1) samples is sorted, and p-1 equally spaced elements (the splitters) are selected from that list. Each processor splits its own list into p buckets according to these splitters. Each processor sends its i-th bucket to the i-th processor. Each processor merges the elements that it receives.
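
The steps above can be simulated in one process. In this sketch (sample_sort is a hypothetical name) "sending bucket i to processor i" is an append to inbox[i], local sorting uses std::sort, and elements equal to a splitter go to the lower bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sample sort with p simulated processors.
std::vector<int> sample_sort(const std::vector<int>& data, std::size_t p) {
    assert(p > 0);
    const std::size_t n = data.size();
    // About n/p contiguous elements per processor, sorted locally.
    std::vector<std::vector<int>> local(p);
    for (std::size_t q = 0; q < p; ++q) {
        local[q].assign(data.begin() + static_cast<std::ptrdiff_t>(q * n / p),
                        data.begin() + static_cast<std::ptrdiff_t>((q + 1) * n / p));
        std::sort(local[q].begin(), local[q].end());
    }
    // p-1 equally spaced samples from each sorted local list.
    std::vector<int> samples;
    for (const auto& v : local)
        for (std::size_t k = 1; k < p && !v.empty(); ++k)
            samples.push_back(v[k * v.size() / p]);
    // Sort the samples and pick p-1 equally spaced splitters.
    std::sort(samples.begin(), samples.end());
    std::vector<int> splitters;
    for (std::size_t k = 1; k < p && !samples.empty(); ++k)
        splitters.push_back(samples[k * samples.size() / p]);
    // Split each local list into p buckets; bucket i goes to processor i.
    std::vector<std::vector<std::vector<int>>> inbox(p);
    for (const auto& v : local) {
        auto start = v.begin();
        for (std::size_t i = 0; i < p; ++i) {
            auto stop = i < splitters.size()
                            ? std::upper_bound(start, v.end(), splitters[i])
                            : v.end();
            inbox[i].emplace_back(start, stop);
            start = stop;
        }
    }
    // Each processor merges what it received; processor order gives the output order.
    std::vector<int> result;
    result.reserve(n);
    for (const auto& runs : inbox) {
        std::vector<int> merged;
        for (const auto& run : runs) {
            std::vector<int> tmp;
            std::merge(merged.begin(), merged.end(), run.begin(), run.end(),
                       std::back_inserter(tmp));
            merged.swap(tmp);
        }
        result.insert(result.end(), merged.begin(), merged.end());
    }
    return result;
}
```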

  21. Sample Sort

  22. Reconfigurable Architecture • RMESH – Reconfigurable mesh with buses • PARBUS • MRN • Polymorphic torus

  23. Reconfigurable Architecture

  24. Reconfigurable Architecture

  25. Benefits of Reconfigurable Architecture Each processing element (PE) can set its switches so that it shares a bus segment with other PEs. In any given time unit, one PE in such a connected collection can broadcast a message, which is assumed to be readable in the same time unit by all the other PEs in that collection.
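
The broadcast rule can be modeled for a single row of PEs. This is a simplified sketch, not an RMESH simulator: bus_step is a hypothetical name, link[i] == true is assumed to mean the switch between PE i and PE i + 1 is closed, written values are assumed to be non-negative (with -1 meaning "nothing"), and at most one writer per segment is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One time unit on a linear reconfigurable bus. writer_value[i] >= 0 means
// PE i broadcasts that value. Returns what each PE reads (-1 if nothing).
std::vector<int> bus_step(const std::vector<bool>& link, const std::vector<int>& writer_value) {
    const std::size_t n = writer_value.size();
    assert(link.size() + 1 == n);  // n PEs have n-1 switches between them
    std::vector<int> read(n, -1);
    std::size_t start = 0;
    while (start < n) {
        std::size_t end = start;   // current segment is [start, end]
        while (end + 1 < n && link[end]) ++end;
        int value = -1;
        for (std::size_t i = start; i <= end; ++i)
            if (writer_value[i] >= 0) {
                assert(value < 0); // at most one writer per segment
                value = writer_value[i];
            }
        for (std::size_t i = start; i <= end; ++i) read[i] = value;  // same time unit
        start = end + 1;
    }
    return read;
}
```

Opening a switch splits one bus into two independent segments, so several broadcasts can proceed in the same time unit as long as they use different segments.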

  26. Column Sort on a Rmesh
      Step 0: [Input] Q is available in column-major order, one element per PE, in row 1 of the RMESH.
      Step 1: [Sort Transpose] Sort Q column-wise, then transpose it.
      Step 2: [Sort Untranspose] Sort the result column-wise, then untranspose it.
      Step 3: [Sort Shift] Sort column-wise, then shift the matrix.
      Step 4: [Sort Unshift] Sort column-wise, then unshift the matrix to obtain the sorted result.
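
The sort/transpose/shift sequence can be run sequentially to see why it works. This sketch follows Leighton's columnsort on an r x s matrix stored in column-major order; the conditions that s divides r and r >= 2(s-1)^2 come from that original algorithm (the slide does not state them), the sketch does not model the RMESH broadcasts, and columnsort, sort_columns and the LLONG_MIN/LLONG_MAX padding are illustrative choices, so the data must not contain those two values.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Vec = std::vector<long long>;

// Sorts each length-r column of a column-major matrix stored in `a`.
static void sort_columns(Vec& a, std::size_t r) {
    for (std::size_t c = 0; c * r < a.size(); ++c)
        std::sort(a.begin() + c * r, a.begin() + (c + 1) * r);
}

// Element (row i, column j) of the r x s matrix is a[j * r + i].
void columnsort(Vec& a, std::size_t r, std::size_t s) {
    assert(a.size() == r * s && r % s == 0 && r >= 2 * (s - 1) * (s - 1));
    Vec b(a.size());
    sort_columns(a, r);                             // sort
    for (std::size_t row = 0; row < r; ++row)       // transpose: read column-major,
        for (std::size_t col = 0; col < s; ++col)   // write row-major
            b[col * r + row] = a[row * s + col];
    sort_columns(b, r);                             // sort
    for (std::size_t row = 0; row < r; ++row)       // untranspose (inverse mapping)
        for (std::size_t col = 0; col < s; ++col)
            a[row * s + col] = b[col * r + row];
    sort_columns(a, r);                             // sort
    Vec shifted(r / 2, LLONG_MIN);                  // shift down by r/2, giving s+1
    shifted.insert(shifted.end(), a.begin(), a.end());  // columns padded
    shifted.insert(shifted.end(), r - r / 2, LLONG_MAX); // with sentinels
    sort_columns(shifted, r);                       // sort
    a.assign(shifted.begin() + r / 2,               // unshift: drop the sentinels
             shifted.begin() + r / 2 + r * s);
}
```

Every step is either an independent per-column sort or a fixed data permutation, which is what makes the algorithm a good fit for a mesh where columns sort in parallel and permutations become bus broadcasts.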

  27. Column Sort on a Rmesh

  28. Column Row Transposition

  29. Column Sort on a Rmesh
      • O(1)-time sorting:
        • Column Sort: total number of broadcasts is 139.
        • Rotate Sort: total number of broadcasts is 120.
      • Sort n elements in O(1) time using n^2 processors.
      • In general, n = N^k numbers can be sorted in O(1) time using N^(k+1) = n^(1+1/k) processors in a (k+1)-dimensional configuration.

  30. Questions?

  31. Main Sources
      • M. Nigam and S. Sahni, “Sorting n Numbers on n x n Reconfigurable Meshes With Buses”
      • J. Jang and V. Prasanna, “An optimal sorting algorithm on reconfigurable meshes”
      • R. Lin, S. Olariu, J. L. Schwing, and J. Zhang, “Sorting in O(1) time on an n x n reconfigurable mesh”
