Explore parallel sorting algorithms, such as Merge Sort and Quick Sort, along with issues and techniques like pivot selection. Discover the scalability and efficiency of Hypercube Quicksort in parallel computing. Learn about Bitonic Sequences and Sorting Networks.
Sorting • One of the most common operations • Definition: • Arrange an unordered collection of elements into a monotonically increasing or decreasing order. • Two categories of sorting • internal (fits in memory) • external (uses auxiliary storage)
Sorting Algorithms • Comparison based • compare-exchange • O(n log n) • Noncomparison based • Uses known properties of the elements • O(n) - bucket sort etc.
Parallel Sorting Issues • Input and Output sequence storage • Where? • Local to one processor or distributed • Comparisons • How to compare elements on different nodes • # of elements per processor • One (compare-exchange --> comm.) • Multiple (compare-split --> comm.)
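The two comparison styles named on this slide can be sketched in Python. This is a minimal illustration that models each processor's data as a plain value or a sorted list; the function names `compare_exchange` and `compare_split` are chosen here for illustration, and the communication step is simply assumed to have delivered the partner's data.

```python
import heapq

def compare_exchange(a, b):
    """One element per processor: the lower-ranked partner keeps the
    minimum and the higher-ranked partner keeps the maximum."""
    return (a, b) if a <= b else (b, a)

def compare_split(low_block, high_block):
    """Multiple elements per processor: both blocks are assumed sorted.
    The partners exchange blocks, merge them, and the lower-ranked
    partner keeps the smaller half while the other keeps the larger half."""
    merged = list(heapq.merge(low_block, high_block))
    k = len(low_block)
    return merged[:k], merged[k:]

low, high = compare_split([1, 6, 8], [2, 3, 9])
print(low, high)
```

After a compare-split, every element held by the lower-ranked partner is no larger than any element held by the other partner, which is the block-level analogue of a compare-exchange.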
Parallel Sorting Algorithms • Merge Sort • Quick Sort • Bitonic Sort • Others …
Merge Sort • Simplest parallel sorting algorithm? • Steps • Distribute the elements • Every processor sorts its own sequence • Merge the lists • Problem • How to merge the lists
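The three steps can be simulated sequentially to make the structure concrete. This is only a sketch: the round-robin distribution and the pairwise tree merge are assumptions (the slide leaves the merge strategy open), and `parallel_merge_sort_sim` is a hypothetical name.

```python
import heapq

def parallel_merge_sort_sim(data, p):
    """Simulate parallel merge sort on p 'processors' (plain lists)."""
    # Step 1: distribute the elements (round-robin is an assumption).
    chunks = [data[i::p] for i in range(p)]
    # Step 2: every processor sorts its own sequence.
    runs = [sorted(chunk) for chunk in chunks]
    # Step 3: merge the lists pairwise; with p a power of two this
    # takes log2(p) rounds, and half the processors drop out each round.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(parallel_merge_sort_sim([5, 2, 9, 1, 7, 3, 8], 4))
```

The tree merge shows the problem the slide raises: the last merge round is performed by a single processor that handles all n elements.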
Quicksort • Simple, low overhead • O(n log n) • Divide and conquer • Divide recursively into smaller subsequences.
Quicksort • n elements stored in A[1…n] • Divide • Divide a sequence into two parts • A[q…r] becomes A[q…s] and A[s+1…r] • make all elements of A[q…s] smaller than or equal to all elements of A[s+1…r] • Conquer • Recursively apply Quicksort
Quicksort • Partition the sequence A[q…r] by picking a pivot. • Performance is greatly affected by the choice of the pivot. • If we pick a bad pivot, we end up with an O(n^2) algorithm.
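A sequential sketch of the divide and conquer steps follows. It uses 0-based indices rather than A[1…n], and it assumes a Hoare-style partition with the first element as pivot; the slides do not say which partition scheme is intended.

```python
def partition(A, q, r):
    """Rearrange A[q..r] and return s such that every element of
    A[q..s] is <= every element of A[s+1..r] (Hoare-style partition)."""
    pivot = A[q]  # first element as pivot: an assumption, and a poor
                  # choice for already-sorted input
    i, j = q - 1, r + 1
    while True:
        i += 1
        while A[i] < pivot:
            i += 1
        j -= 1
        while A[j] > pivot:
            j -= 1
        if i >= j:
            return j
        A[i], A[j] = A[j], A[i]

def quicksort(A, q, r):
    """Sort A[q..r] in place."""
    if q < r:
        s = partition(A, q, r)   # divide
        quicksort(A, q, s)       # conquer the left part
        quicksort(A, s + 1, r)   # conquer the right part

A = [3, 6, 1, 8, 2, 9, 2]
quicksort(A, 0, len(A) - 1)
print(A)
```

With this pivot rule, an already-sorted input splits into parts of size 1 and n-1 at every level, which is the O(n^2) case mentioned above.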
Parallelizing Quicksort • Task parallelism • At each step of the algorithm 2 recursive calls are made. • Farm out one of the recursive calls to another processor. • Problems • The work of partitioning is done by one processor.
Parallelizing Quicksort • Consider domain decomposition. • Hypercube • A d-dimensional hypercube can be split into two (d-1)-dimensional hypercubes such that each processor in one cube is connected to one in the other cube. • If all processors know the pivot, neighbors split their respective lists: all elements larger than the pivot are sent to one subcube and all smaller elements are sent to the other subcube.
Parallelizing Quicksort • After we go through each dimension, if n>p the numbers are not totally sorted. • Why? • Each processor then sorts its own sublist using a sequential quicksort. • Pivot selection is particularly important • Bad pivots leave some processors with few or no elements
Pivot Selection • Random selection • During the ith split one of the processors in each subcube picks a random element from its list and broadcasts to others. • Problem • What if a bad pivot is selected at first?
Pivot Selection • Median selection • If the distribution is uniform then each processor's list is a representative sample thus the median is representative • Problem • Is the distribution really uniform? • Can we assume that a single processor's list has the same distribution as the full list?
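The question about whether one processor's list is representative can be explored with a small experiment. The sketch below takes the median of the first processor's block as the pivot and reports what fraction of all elements falls at or below it; an ideal pivot gives about 0.5. The helper names and the two sample distributions are made up for illustration.

```python
def local_median(block):
    """Median (upper median for even lengths) of one processor's block."""
    return sorted(block)[len(block) // 2]

def fraction_at_or_below(blocks, pivot):
    """Fraction of all elements, across all blocks, that are <= pivot."""
    total = sum(len(b) for b in blocks)
    below = sum(1 for b in blocks for x in b if x <= pivot)
    return below / total

# Each block is a fair sample of the whole range.
interleaved = [[1, 5], [2, 6], [3, 7], [4, 8]]
# Each block holds a narrow slice of the range.
clustered = [[1, 2], [3, 4], [5, 6], [7, 8]]

for blocks in (interleaved, clustered):
    pivot = local_median(blocks[0])
    print(pivot, fraction_at_or_below(blocks, pivot))
```

When the blocks are clustered, the first processor's median sits far from the global median, so one subcube receives most of the data.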
Procedure HypercubeQuickSort(B)
  sort B using sequential quicksort
  for i = 1 to d
    select pivot and broadcast, or receive pivot
    partition B into B1 and B2 such that B1 <= pivot < B2
    if ith bit of iproc is zero then
      send B2 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B1 and C into B
    else
      send B1 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B2 and C into B
    endif
  endfor
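The procedure can be simulated in one Python process by treating each processor's list B as an entry of a list indexed by processor rank. This is a sketch under several assumptions not stated on the slide: dimensions are processed from the most significant bit down (so that rank order equals sorted order at the end), the pivot is the median of the first non-empty block in each subcube, and the send/receive pair is replaced by direct list access.

```python
import bisect
import heapq

def hypercube_quicksort_sim(blocks):
    """blocks[rank] is the list held by processor `rank`; len(blocks) = 2**d."""
    p = len(blocks)
    d = p.bit_length() - 1
    if p != 1 << d:
        raise ValueError("number of processors must be a power of two")
    blocks = [sorted(b) for b in blocks]          # local sequential sort
    for bit in reversed(range(d)):                # one split per dimension
        size = 1 << (bit + 1)                     # processors per subcube
        new_blocks = [[] for _ in range(p)]
        for base in range(0, p, size):
            members = range(base, base + size)
            # Pivot choice is an assumption: median of the first
            # non-empty block in the subcube, "broadcast" to all members.
            pivot = next((blocks[r][len(blocks[r]) // 2]
                          for r in members if blocks[r]), None)
            if pivot is None:
                continue                          # whole subcube is empty
            for rank in range(base, base + size // 2):
                partner = rank | (1 << bit)       # neighbor along this dimension
                s_lo = bisect.bisect_right(blocks[rank], pivot)
                s_hi = bisect.bisect_right(blocks[partner], pivot)
                # The processor whose bit is zero keeps B1 (<= pivot) and
                # merges in the partner's B1; the partner keeps both B2 parts.
                new_blocks[rank] = list(heapq.merge(
                    blocks[rank][:s_lo], blocks[partner][:s_hi]))
                new_blocks[partner] = list(heapq.merge(
                    blocks[rank][s_lo:], blocks[partner][s_hi:]))
        blocks = new_blocks
    return blocks

result = hypercube_quicksort_sim([[9, 3], [7, 1], [8, 2], [6, 0]])
print(result)
```

In this example the first pivot happens to be the largest element, so two of the four simulated processors end up with empty lists, which illustrates how a bad pivot idles part of the machine.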
Analysis • Iterations = log2(p) • Select a pivot = O(n) • keep sublist sorted • Broadcast pivot = O(log2(p)) • Split the sequence • split own sequence = O(log(n/p)) • exchange blocks with neighbor = O(n/p) • merge blocks = O(n/p)
Hypercube Quicksort Model • Execution Time = MyPortionSortTime + NumSteps * (PivotSelection + Exchange + CompareData) • Execution Time = (n/p) * log2(n/p) * CompareTime + log2(p) * ((latency + 1/bandwidth) + 2*(latency + n/(p*bandwidth)) + (CompareTime * 2*n/p))
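The model can be written as a small function to experiment with different machine parameters. The parameter names follow the slide; the units are an assumption (times in seconds, bandwidth in elements per second), and the function name is hypothetical.

```python
import math

def hypercube_quicksort_time(n, p, compare_time, latency, bandwidth):
    """Evaluate the execution-time model term by term."""
    my_portion_sort = (n / p) * math.log2(n / p) * compare_time
    num_steps = math.log2(p)
    pivot_selection = latency + 1 / bandwidth          # send one element
    exchange = 2 * (latency + n / (p * bandwidth))     # send and receive a block
    compare_data = compare_time * 2 * n / p            # split plus merge
    return my_portion_sort + num_steps * (pivot_selection + exchange + compare_data)

# Example with made-up parameters: 1e6 elements on 16 processors.
print(hypercube_quicksort_time(1_000_000, 16, 1e-8, 1e-5, 1e8))
```

Because the exchange and compare terms both scale with n/p, the model predicts good scaling as long as the pivots keep the blocks close to n/p elements each.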
Analysis • Quicksort appears very scalable • Depends heavily on the pivot • Easy to parallelize • Hypercube sorting algorithms depend on the ability to map a hypercube onto the node communication architecture.
Sorting Networks • Specialized hardware for sorting • based on comparators • [Figure: two comparators, each with inputs x and y; one outputs min{x,y} and max{x,y}, the other outputs max{x,y} and min{x,y}]
Bitonic Sort • Key operation: • rearrange a bitonic sequence into sorted order • Bitonic Sequence • sequence of elements <a_0, a_1, …, a_{n-1}> • There exists i such that <a_0, …, a_i> is monotonically increasing and <a_{i+1}, …, a_{n-1}> is monotonically decreasing, or • There exists a cyclic shift of indices such that the above is satisfied.
Bitonic Sequences • <1, 2, 4, 7, 6, 0> • First it increases then decreases • i = 3 • <8, 9, 2, 1, 0, 4> • Consider a cyclic shift • i will equal 2 or 3
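The definition can be checked mechanically. The sketch below tries every cyclic shift and tests whether the shifted sequence rises and then falls; it treats "monotonically increasing" as non-strict, which is an assumption, and it favors clarity over speed (it is quadratic in the sequence length).

```python
def rises_then_falls(seq):
    """True if seq is non-decreasing up to some index and non-increasing after it."""
    n = len(seq)
    i = 0
    while i + 1 < n and seq[i] <= seq[i + 1]:
        i += 1
    while i + 1 < n and seq[i] >= seq[i + 1]:
        i += 1
    return i >= n - 1

def is_bitonic(seq):
    """True if some cyclic shift of seq rises and then falls."""
    seq = list(seq)
    n = len(seq)
    return n == 0 or any(rises_then_falls(seq[k:] + seq[:k]) for k in range(n))

print(is_bitonic([1, 2, 4, 7, 6, 0]))   # the first example sequence
print(is_bitonic([8, 9, 2, 1, 0, 4]))   # needs a cyclic shift
print(is_bitonic([1, 3, 2, 4]))         # no shift works
```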
Rearranging a Bitonic Sequence • Let s = <a_0, a_1, …, a_{n-1}> • a_{n/2} is the beginning of the decreasing sequence • Let s1 = <min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}}> • Let s2 = <max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}}> • In sequence s1 there is an element b_i = min{a_i, a_{n/2+i}} such that • all elements before b_i are from the increasing part • all elements after b_i are from the decreasing part • Sequence s2 has a similar point • Sequences s1 and s2 are bitonic
Rearranging a Bitonic Sequence • Every element of s1 is smaller than every element of s2 • Thus, we have reduced the problem of rearranging a bitonic sequence of size n to rearranging two bitonic sequences of size n/2 then concatenating the sequences.
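The reduction from one bitonic sequence of size n to two of size n/2 translates directly into a recursive function. This is a minimal sketch that assumes the input is bitonic and that its length is a power of two; the `ascending` flag is an addition here so the same routine can produce either order.

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence whose length is a power of two."""
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    # s1 takes the element-wise minima and s2 the maxima; both are
    # bitonic, and every element of s1 is <= every element of s2.
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    if ascending:
        return bitonic_merge(s1, True) + bitonic_merge(s2, True)
    return bitonic_merge(s2, False) + bitonic_merge(s1, False)

print(bitonic_merge([1, 4, 3, 2]))
print(bitonic_merge([1, 4, 3, 2], ascending=False))
```

Each level of recursion corresponds to one column of comparators in a sorting network, so a bitonic sequence of size n is sorted in log2(n) compare-exchange steps.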
What about unordered lists? • To use the bitonic merge for n items, we must first have a bitonic sequence of n items. • Two elements form a bitonic sequence • Any unsorted sequence is a concatenation of bitonic sequences of size 2 • Merge those into larger bitonic sequences until we end up with a bitonic sequence of size n
Creating a Bitonic Sequence
(Stage k merges runs of 2^k elements, alternating increasing and decreasing order.)

Wire   Input   Stage 1   Stage 2   Stage 3
0000   10      10        5         3
0001   20      20        9         5
0010   5       9         10        8
0011   9       5         20        9
0100   3       3         14        10
0101   8       8         12        12
0110   12      14        8         14
0111   14      12        3         20
1000   90      0         0         95
1001   0       90        40        90
1010   60      60        60        60
1011   40      40        90        40
1100   23      23        95        35
1101   35      35        35        23
1110   95      95        23        18
1111   18      18        18        0
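The 16-wire example can be reproduced with a short simulation of the comparator network. The sketch below assumes the standard iterative formulation, in which wire i is compared with wire i XOR j and the sort direction of a run is taken from one bit of the wire index; it stops before the final merge so that the last snapshot is a single bitonic sequence of size n. The function name is hypothetical.

```python
def bitonic_stages(values):
    """Return the wire contents after each stage of building a bitonic
    sequence; len(values) must be a power of two."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    wires = list(values)
    snapshots = [list(wires)]
    k = 2                                  # size of the runs being merged
    while k < n:                           # stop before the final merge
        j = k // 2                         # distance between compared wires
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # runs alternate direction
                    if (wires[i] > wires[partner]) == ascending:
                        wires[i], wires[partner] = wires[partner], wires[i]
            j //= 2
        snapshots.append(list(wires))
        k *= 2
    return snapshots

data = [10, 20, 5, 9, 3, 8, 12, 14, 90, 0, 60, 40, 23, 35, 95, 18]
for column in bitonic_stages(data):
    print(column)
```

The last printed column rises over wires 0000 to 0111 and falls over wires 1000 to 1111, so one more bitonic merge of size 16 would finish the sort.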
Mapping onto a hypercube • One element per processor • Start with the sorting network maps • Each wire represents a processor • Map processors to wires to minimize the distance traveled during exchange
Bitonic Sort
Procedure BitonicSort
  for i = 0 to d - 1
    for j = i downto 0
      if (i + 1)st bit of iproc <> jth bit of iproc
        comp_exchange_max(j, item)
      else
        comp_exchange_min(j, item)
      endif
    endfor
  endfor
comp_exchange_max and comp_exchange_min compare and exchange the item with the neighbor on the jth dimension, keeping the maximum or the minimum respectively.
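The procedure can be simulated for all processors at once, one element per processor. This sketch assumes bits are numbered from 0 at the least significant end, so for i = d - 1 the (i + 1)st bit of every processor number is zero and the last stage sorts in increasing order. The exchange itself is replaced by reading the neighbor's item from a list.

```python
def hypercube_bitonic_sort_sim(items):
    """items[iproc] is the single element held by processor iproc;
    len(items) must be 2**d."""
    p = len(items)
    d = p.bit_length() - 1
    if p != 1 << d:
        raise ValueError("number of processors must be a power of two")
    items = list(items)
    for i in range(d):
        for j in range(i, -1, -1):
            new_items = list(items)
            for iproc in range(p):
                neighbor = iproc ^ (1 << j)       # neighbor on the jth dimension
                bit_i1 = (iproc >> (i + 1)) & 1   # direction of this processor's run
                bit_j = (iproc >> j) & 1          # lower or upper end of the pair
                if bit_i1 != bit_j:
                    # comp_exchange_max: keep the larger item
                    new_items[iproc] = max(items[iproc], items[neighbor])
                else:
                    # comp_exchange_min: keep the smaller item
                    new_items[iproc] = min(items[iproc], items[neighbor])
            items = new_items
    return items

print(hypercube_bitonic_sort_sim([3, 1, 2, 0]))
```

Printing `items` after each pass of the inner loop shows every compare-exchange step, which can be compared against a hand-drawn sorting network.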
Assignment • Pick 16 random integers • Draw the Bitonic Sort network • Step through the Bitonic sort network to produce a sorted list of integers. • Explain how the if statement in the Bitonic sort algorithm works.