HYPERCUBE ALGORITHMS-2 Computer Engg, IIT(BHU)
Merging • Odd-Even Merge • Step 0: Store X1 in the first m rows of the butterfly and X2 in the next m rows. • Step 1: Partition X1 into its odd and even parts, O1 and E1, and X2 into O2 and E2. This can be done in one step on Bd. • Step 2: Interchange O2 with E1. • Step 3: Recursively merge O1 with O2 to get O, and E1 with E2 to get E. To do this, route the keys in the first m rows using direct links and the other keys using cross links.
Merging • Now we have E1 and O2 in the even sub-butterfly and O1 and E2 in the odd sub-butterfly. • Step 4: Shuffle O with E. Compare adjacent elements and interchange them if needed. • To do this, each processor at level d-1 sends its result along its cross link as well as its direct link. When processor i receives two keys from above, it keeps the minimum of the two if i is even; otherwise it keeps the maximum. • Odd-even merging takes O(d) time on Bd. On Hd it can also be done in O(d) time.
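The merge steps above can be sketched sequentially. This is only an illustration of the partition, recursive merge, shuffle, and compare steps, not of the routing on Bd; the function name `odd_even_merge` is hypothetical, and both inputs are assumed to be sorted lists of the same power-of-two length.

```python
def odd_even_merge(x1, x2):
    """Merge two sorted lists of equal power-of-two length (sequential sketch)."""
    m = len(x1)
    assert m == len(x2) and m > 0 and m & (m - 1) == 0
    if m == 1:
        return [min(x1[0], x2[0]), max(x1[0], x2[0])]
    # Positions are assumed to be counted from 1 on the slides, so the "odd"
    # parts O1, O2 are Python indices 0, 2, 4, ... and the "even" parts
    # E1, E2 are indices 1, 3, 5, ...
    o = odd_even_merge(x1[0::2], x2[0::2])  # merge O1 with O2 to get O
    e = odd_even_merge(x1[1::2], x2[1::2])  # merge E1 with E2 to get E
    # Shuffle O with E: o[0], e[0], o[1], e[1], ...
    z = [None] * (2 * m)
    z[0::2] = o
    z[1::2] = e
    # Compare each e[i] with o[i+1] and interchange if out of order.
    for i in range(1, 2 * m - 1, 2):
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z


merged = odd_even_merge([2, 5, 8, 11], [4, 9, 12, 18])
print(merged)
```

In this sketch the final loop is the only place keys are compared after the recursion, which mirrors the single compare-and-keep step performed at level d of the butterfly.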
Sorting • Odd-Even Merge Sort • Step 1: Partition the sequence into two subsequences: X1' = k_0, k_1, ..., k_{n/2-1} and X2' = k_{n/2}, k_{n/2+1}, ..., k_{n-1}. X1' is in the first half of the rows of the butterfly network and X2' is in the remaining rows. • Step 2: Sort the two subsequences recursively, the first using the left sub-butterfly and the second using the right. • Step 3: Merge the two sorted subsequences using odd-even merge.
Sorting • Odd-even merge sort takes O(d^2) time on Bd, where p = 2^d. • Another sorting algorithm that runs on the hypercube is bitonic sort.
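The O(d^2) bound for odd-even merge sort follows from a short recurrence. As a sketch, assume T(d) denotes the sorting time on Bd and that one odd-even merge costs at most c·d for some constant c (T and c are illustrative symbols, not taken from the slides). The two halves are sorted at the same time in separate sub-butterflies, so only one recursive term appears:

```latex
\begin{aligned}
T(d) &= T(d-1) + c\,d, \qquad T(0) = O(1) \\
     &= c\sum_{i=1}^{d} i + O(1) \;=\; c\,\frac{d(d+1)}{2} + O(1) \;=\; O(d^{2}) \;=\; O(\log^{2} p).
\end{aligned}
```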
Sorting: Bitonic Sort • A bitonic sorting network sorts n elements in Θ(log^2 n) time. • A bitonic sequence has two tones: increasing and then decreasing, or vice versa. Any cyclic rotation of such a sequence is also considered bitonic. • 1,2,4,7,6,0 is a bitonic sequence, because it first increases and then decreases. 8,9,2,1,0,4 is another bitonic sequence, because it is a cyclic shift of 0,4,8,9,2,1. • The kernel of the network is the rearrangement of a bitonic sequence into a sorted sequence.
Sorting: Bitonic Sort • Let s = a_0, a_1, …, a_{n-1} be a bitonic sequence such that a_0 ≤ a_1 ≤ ··· ≤ a_{n/2-1} and a_{n/2} ≥ a_{n/2+1} ≥ ··· ≥ a_{n-1}. • Consider the following subsequences of s: • s1 = min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, …, min{a_{n/2-1}, a_{n-1}} • s2 = max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, …, max{a_{n/2-1}, a_{n-1}} (1) • Note that s1 and s2 are both bitonic and each element of s1 is less than or equal to every element of s2. • We can apply the procedure recursively on s1 and s2 to get the sorted sequence.
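The split in (1) is small enough to check by hand. A minimal sketch, assuming the sequence is stored in a Python list of even length; the name `bitonic_split` is hypothetical:

```python
def bitonic_split(s):
    """Return (s1, s2) as defined in (1) for a bitonic sequence s of even length."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2


# 3,5,8,9 increases and 7,4,2,1 decreases, so the input is bitonic.
s1, s2 = bitonic_split([3, 5, 8, 9, 7, 4, 2, 1])
print(s1, s2)  # [3, 4, 2, 1] [7, 5, 8, 9]
```

Here s1 is itself bitonic (up, then down), s2 is bitonic as a cyclic rotation (down, then up), and the largest element of s1 (4) does not exceed the smallest element of s2 (5).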
Sorting: Bitonic Sort • Merging a 16-element bitonic sequence through a series of log 16 bitonic splits.
Sorting: Bitonic Sort • How do we sort an unsorted sequence using a bitonic merge? • We must first build a single bitonic sequence from the given sequence. • Any sequence of length 2 is a bitonic sequence. • A bitonic sequence of length 4 can be built by sorting the first two elements in increasing order using ⊕BM[2] and the next two in decreasing order using ⊖BM[2]. • This process can be repeated to generate larger bitonic sequences.
Sorting: Bitonic Sort • The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence.
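The construction can be sketched recursively: sort the first half in increasing order, the second half in decreasing order, and merge the resulting bitonic sequence. This is a sequential sketch, assuming the input length is a power of two; the `ascending` flag plays the role of choosing between ⊕BM and ⊖BM, and the function names are hypothetical.

```python
def bitonic_merge(s, ascending=True):
    """Sort a bitonic sequence; ascending=True acts as +BM[n], False as -BM[n]."""
    n = len(s)
    if n <= 1:
        return list(s)
    half = n // 2
    lo = [min(s[i], s[i + half]) for i in range(half)]
    hi = [max(s[i], s[i + half]) for i in range(half)]
    if ascending:
        return bitonic_merge(lo, True) + bitonic_merge(hi, True)
    return bitonic_merge(hi, False) + bitonic_merge(lo, False)


def bitonic_sort(s, ascending=True):
    """Sort a list whose length is a power of two."""
    n = len(s)
    assert n & (n - 1) == 0, "length must be a power of two"
    if n <= 1:
        return list(s)
    half = n // 2
    first = bitonic_sort(s[:half], True)    # increasing half
    second = bitonic_sort(s[half:], False)  # decreasing half
    # first + second is bitonic, so one bitonic merge finishes the job.
    return bitonic_merge(first + second, ascending)


print(bitonic_sort([10, 20, 5, 9, 3, 8, 12, 14]))
```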
Mapping Bitonic Sort to Hypercubes • Consider the case of one item per processor. The question becomes one of how the wires in the bitonic network should be mapped to the hypercube interconnect. • Note from our earlier examples that the compare-exchange operation is performed between two wires only if their labels differ in exactly one bit! • This implies a direct mapping of wires to processors. All communication is nearest neighbor!
Mapping Bitonic Sort to Hypercubes • Communication during the last stage of bitonic sort. Each wire is mapped to a hypercube process; each connection represents a compare-exchange between processes.
Mapping Bitonic Sort to Hypercubes • During each step of the algorithm, every process performs a compare-exchange operation (a single nearest-neighbor communication of one word). • Since each step takes Θ(1) time, the parallel time is • Tp = Θ(log^2 n) (2) • This algorithm is cost optimal w.r.t. its serial counterpart, but not w.r.t. the best sorting algorithm.
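The mapping can be simulated with one key per process, where list index stands for the process label. This is a sketch under the assumption that the number of keys is p = 2^d; the function name and the rule for keeping the minimum or maximum (compare bit i+1 of the label with bit j) are one common formulation, not taken from the slides.

```python
def hypercube_bitonic_sort(keys):
    """Simulate bitonic sort with one key per hypercube process.

    Returns the sorted keys and the number of compare-exchange steps.
    """
    p = len(keys)
    assert p >= 1
    d = p.bit_length() - 1
    assert p == 1 << d, "number of processes must be a power of two"
    a = list(keys)
    steps = 0
    for i in range(d):                 # stage i builds sorted runs of length 2^(i+1)
        for j in range(i, -1, -1):     # one step per dimension j = i, ..., 0
            nxt = a[:]
            for rank in range(p):
                partner = rank ^ (1 << j)        # label differs in exactly bit j
                run_bit = (rank >> (i + 1)) & 1  # 0: increasing run, 1: decreasing run
                own_bit = (rank >> j) & 1
                if run_bit != own_bit:
                    nxt[rank] = max(a[rank], a[partner])
                else:
                    nxt[rank] = min(a[rank], a[partner])
            a = nxt
            steps += 1
    return a, steps


result, steps = hypercube_bitonic_sort([10, 20, 5, 9, 3, 8, 12, 14])
print(result, steps)
```

Every partner is computed with a single-bit XOR, so each step uses only hypercube links, and the step count is d(d+1)/2, which is Θ(log^2 n) for n = p keys.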
Graph Algorithms • For graph algorithms, the same model as for the mesh is used. • Let G(V, E) be a directed graph with N vertices. • M(i, j) = 0 if i = j or there is a directed edge between i and j. • m(i, i) = 0 for every i. • m(i, j) = min { M(i[0], i[1]) + M(i[1], i[2]) + … + M(i[k-1], i[k]) } for every i ≠ j, • where i[0] = i, i[k] = j, and the minimum is taken over all such sequences of vertices.
All pair shortest paths • For every pair of vertices v_i and v_j in V, it is required to find the length of the shortest path from v_i to v_j along edges in E. • Specifically, a matrix D is to be constructed such that d_ij is the length of the shortest path from v_i to v_j in G, for all i and j. • The length of a path (or cycle) is the sum of the lengths (weights) of the edges forming it.
All pair shortest path • Begin with a hypercube of n^3 processors. • Each has registers A, B, and C. • Arrange them in an n × n × n array (cube). • Set A(0, j, k) = w_jk for 0 ≤ j, k ≤ n – 1, • i.e. the processors in positions (0, j, k) contain D^1 = W. • When done, C(0, j, k) contains APSP = D^m.
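The step from D^1 = W to D^m can be read as repeated squaring under a (min, +) product. As a sketch, assume d^k_ij denotes the length of the shortest path from v_i to v_j that uses at most k edges, with w_ii = 0 and no negative-length cycles (this notation and these assumptions are not spelled out on the slides):

```latex
\begin{aligned}
d^{1}_{ij}  &= w_{ij}, \\
d^{2k}_{ij} &= \min_{0 \le l \le n-1} \bigl( d^{k}_{il} + d^{k}_{lj} \bigr), \\
D &= D^{m} \quad \text{for any } m \ge n-1 .
\end{aligned}
```

A shortest path never needs more than n - 1 edges, so squaring until the exponent reaches n - 1 is enough.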
All pair shortest path
Algorithm HYPERCUBE SHORTEST PATH (A, C)
Step 1: for j = 0 to n - 1 dopar
          for k = 0 to n - 1 dopar
            B(0, j, k) = A(0, j, k)
          end for
        end for
Step 2: for i = 1 to ⌈log(n - 1)⌉ do
          (2.1) HYPERCUBE MATRIX MULTIPLICATION (A, B, C)
          (2.2) for j = 0 to n - 1 dopar
                  for k = 0 to n - 1 dopar
                    (i)  A(0, j, k) = C(0, j, k)
                    (ii) B(0, j, k) = C(0, j, k)
                  end for
                end for
        end for
All pair shortest path • Steps 1 and (2.2) require constant time. • There are ⌈log(n - 1)⌉ iterations of Step (2.1). • Each requires O(log n) time. • The overall running time is t(n) = O(log^2 n). • p(n) = n^3 • Cost is c(n) = p(n) t(n) = O(n^3 log^2 n).
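The iteration count can be checked with a sequential sketch of the same repeated squaring. Here `min_plus_square` is a hypothetical stand-in for HYPERCUBE MATRIX MULTIPLICATION with multiplication replaced by addition and addition replaced by min; it runs in O(n^3) sequential time, whereas the hypercube performs each product in O(log n) time on n^3 processors. The weight matrix is assumed to have zeros on the diagonal and `math.inf` for missing edges.

```python
import math

INF = math.inf


def min_plus_square(d):
    """One (min, +) product of the matrix d with itself."""
    n = len(d)
    return [[min(d[i][k] + d[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_shortest_paths(w):
    """Return (D, squarings) for a weight matrix w with w[i][i] == 0."""
    n = len(w)
    d = [row[:] for row in w]   # D^1 = W
    length = 1                  # d holds shortest paths using at most `length` edges
    squarings = 0
    while length < n - 1:
        d = min_plus_square(d)
        length *= 2
        squarings += 1
    return d, squarings


w = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
dist, squarings = all_pairs_shortest_paths(w)
for row in dist:
    print(row)
```

For this 4-vertex example two squarings are performed, matching ⌈log(n - 1)⌉ = ⌈log 3⌉ = 2.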