
Chapter 11: Broadcasting with Selective Reduction (BSR)

Serpil Tokdemir, GSU, Department of Computer Science.


Presentation Transcript


  1. Chapter 11: Broadcasting with Selective Reduction (BSR) • Serpil Tokdemir, GSU, Department of Computer Science

  2. What is Broadcasting with Selective Reduction? • BSR is an extension of the PRAM • It requires asymptotically no more resources than the PRAM for its implementation • It consists of: • $N$ processors • $M$ shared-memory locations • a memory access unit (MAU) • Forms of memory access inherited from the PRAM: • ER (exclusive read) • EW (exclusive write) • CR (concurrent read) • CW (concurrent write)

  3. The BSR Model of Parallel Computation • [Figure: processors $P_1, P_2, \ldots, P_N$ connected through the memory access unit (MAU) to the memory locations of the shared memory]

  4. Broadcasting with Selective Reduction • During execution of an algorithm: • several processors may read from or write to the same memory location • all processors may gain access to all memory locations at the same time for the purpose of writing • at each memory location, a subset of the incoming broadcast data is selected and reduced to one value, according to an appropriate selection rule and reduction operator • this value is finally stored in the memory location • BSR accommodates all forms of memory access allowed by the PRAM, plus broadcasting with selective reduction

  5. BSR Continued • The width of the resulting MAU: $O(M)$ • The depth of the resulting MAU: $O(\log M)$ • The size of the resulting MAU: $O(M \log M)$ • How long does a step take in BSR? • Strictly, a memory access requires $a(N, M) = O(\log M)$ time, the depth of the MAU • We assume here that $a(N, M) = O(1)$ • Similarly, a computational operation takes constant time: $c(N, M) = O(1)$

  6. THE BSR MODEL • Additional form of concurrent access to shared memory • BROADCAST: allows all processors to write to all shared-memory locations simultaneously • It has three phases • A broadcasting phase: each processor $P_i$ broadcasts a datum $d_i$ and a tag $g_i$, $1 \le i \le N$, destined to all memory locations • A selection phase: each memory location $U_j$ uses a limit $l_j$, $1 \le j \le M$, and a selection rule $\sigma$ to test the condition $g_i \,\sigma\, l_j$; the datum $d_i$ is selected by $U_j$ if the condition holds • $\sigma$ is selected from the set $\{<, \le, =, \ge, >, \ne\}$

  7. The BSR Model (Continued) • A reduction phase: all data $d_i$ selected by $U_j$ during the selection phase are combined into one datum that is finally stored in $U_j$ • The reduction operator is one of: • SUM • PRODUCT • AND, OR • EXCLUSIVE-OR • MAXIMUM, MINIMUM • All three phases are performed simultaneously for all processors $P_i$ and all memory locations $U_j$

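  8. The three phases of the BROADCAST instruction • [Figure: each processor $P_i$ broadcasts its pair $(g_i, d_i)$ to every memory location; each location $U_j$ compares every tag $g_i$ with its limit $l_j$; the data $d_i$ that pass the test are reduced to one value stored in $U_j$]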

  9. The BSR Model • If a datum or a tag is not in a processor's local register, the processor obtains it from the shared memory by an ER or a CR • The limits, selection rule, and reduction operator are assumed to be known by the memory locations • If not, they can be stored in memory by an EW or a CW • Notation for the BROADCAST instruction: with $\sigma$ the selection rule and $\mathcal{R}$ the reduction operator, a BROADCAST instruction of BSR is written as follows: • $U_j \leftarrow \mathcal{R}_{1 \le i \le N,\; g_i \,\sigma\, l_j}\; d_i$, for $1 \le j \le M$

  10. THE BSR MODEL • If no data are accepted by a given memory location, its value is not affected by the BROADCAST instruction • If only one datum is accepted, $U_j$ is assigned the value of that datum • Comparing BSR to the PRAM • In BSR, the BROADCAST instruction requires $O(1)$ time • On a PRAM with the same number of processors and memory locations, it requires $O(M)$ time, since one BROADCAST is equivalent to $M$ CW instructions • BSR is at least as powerful as the PRAM, since it allows every form of PRAM memory access • The BROADCAST instruction makes BSR strictly more powerful than the PRAM
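The three phases of BROADCAST can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the function name `broadcast`, the dictionaries `SELECT` and `REDUCE`, and the sample values are hypothetical, and the loop does O(NM) work rather than modelling the constant-time step assumed for BSR.

```python
import operator
from functools import reduce

# Selection rules named on the slides, as ordinary Python comparisons.
SELECT = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

# Reduction operators named on the slides, as binary functions.
REDUCE = {"SUM": operator.add, "PRODUCT": operator.mul,
          "AND": operator.and_, "OR": operator.or_, "XOR": operator.xor,
          "MAX": max, "MIN": min}


def broadcast(tags, data, limits, rule, op, memory):
    """Simulate one BROADCAST sequentially and return the new memory.

    tags[i] and data[i] play the role of (g_i, d_i) sent by processor P_i;
    limits[j] is l_j and memory[j] is the current content of U_j.
    """
    select, combine = SELECT[rule], REDUCE[op]
    new_memory = []
    for limit, old in zip(limits, memory):
        # Selection phase: keep every datum whose tag satisfies g_i (rule) l_j.
        chosen = [d for g, d in zip(tags, data) if select(g, limit)]
        # Reduction phase: combine the chosen data into one value.
        # If nothing was selected, U_j keeps its old value.
        new_memory.append(reduce(combine, chosen) if chosen else old)
    return new_memory


# Three processors and two memory locations (made-up values).
result = broadcast(tags=[1, 2, 3], data=[10, 20, 30],
                   limits=[2, 0], rule="<=", op="SUM", memory=[-1, -1])
print(result)  # [30, -1]
```

The second location selects nothing, so it keeps its previous content, matching the rule that an empty selection leaves the location unaffected.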

  11. THE BSR MODEL • An example: let $X = \{x_1, x_2, \ldots, x_n\}$ be a sequence of numbers in nondecreasing order • and $L = \{l_1, l_2, \ldots, l_n\}$ a sequence of distinct numbers in increasing order • It is required to compute, for $1 \le j \le n$, the sum $s_j$ of all those elements of $X$ not equal to $l_j$ • On the PRAM: $O(n)$, obviously optimal • The sum $S$ of all the elements of $X$ is first computed • $X$ is merged with $L$ into a sequence $Y$, sorted in nondecreasing order • $Y$ is scanned, and $s_j$, $1 \le j \le n$, is computed by subtracting from $S$ all the elements of $X$ equal to $l_j$ • $n$ processors can compute one of the $s_j$ in $O(1)$ time

  12. THE BSR MODEL • BSR solves the same problem using one BROADCAST instruction: • Processor $P_i$, $1 \le i \le n$, broadcasts $(x_i, x_i)$ as the tag and datum pair • Memory location $U_j$ uses $l_j$ as limit and $\ne$ as selection rule, so it selects those $x_i$ not equal to $l_j$ • Those $x_i$ selected by $U_j$ are added up to obtain $s_j$ • This requires $O(1)$ time • It does not depend on $X$ and $L$ being sorted
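A minimal sketch of this single BROADCAST, simulated sequentially. The function name `sums_excluding` and the input values are made up for illustration; memory locations are assumed to start at 0, so an empty selection also yields 0 here.

```python
def sums_excluding(xs, limits):
    """For each limit l_j, add up all x_i with x_i != l_j.

    Mirrors one BROADCAST: tag and datum are both x_i, the selection
    rule is "not equal", and the reduction operator is SUM.
    """
    sums = []
    for l_j in limits:                          # every location U_j, conceptually at once
        selected = [x for x in xs if x != l_j]  # selection phase
        sums.append(sum(selected))              # reduction phase (SUM)
    return sums


xs = [4, 1, 4, 7]        # deliberately unsorted
limits = [1, 4, 7, 9]    # distinct limits l_1 .. l_4
print(sums_excluding(xs, limits))  # [15, 8, 9, 16]
```

The input is left unsorted on purpose: the result depends only on which tags differ from each limit, not on any ordering of the sequences.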

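  13. BSR ALGORITHMS • Prefix Sums • Given $n$ numbers $x_1, x_2, \ldots, x_n$, compute the prefix sums $s_j = x_1 + x_2 + \cdots + x_j$, for $1 \le j \le n$ • BSR PREFIX SUMS uses $n$ processors and $n$ memory locations • $P_i$ broadcasts its index $i$ as tag and $x_i$ as datum • Memory location $U_j$ uses its index $j$ as limit • Relation $\le$ for selection and SUM as the reduction operator • When the instruction completes, $U_j$ holds $s_j$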

  14. BSR Algorithms – Prefix Sums • Algorithm BSR PREFIX SUMS • Consists of one BROADCAST instruction • $p(n) = n$, $t(n) = O(1)$, and $c(n) = p(n) \times t(n) = O(n)$ • Optimal
      for j = 1 to n do in parallel
        for i = 1 to n do in parallel
          $s_j \leftarrow \sum_{i \le j} x_i$
        end for
      end for

  15. BSR Algorithms – Prefix Sums • Example: $X = \{1, 2, 3\}$ • $s_1 = 1$, $s_2 = 1 + 2 = 3$, $s_3 = 1 + 2 + 3 = 6$
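The prefix sums algorithm can be sketched as follows, again as a sequential simulation rather than a constant-time step. The function name `bsr_prefix_sums` is hypothetical; indices are 1-based inside the comprehension to match the slides.

```python
def bsr_prefix_sums(xs):
    """Simulate BSR PREFIX SUMS.

    P_i broadcasts (i, x_i); U_j uses limit j, selection rule <=,
    and reduction SUM, so U_j ends up with x_1 + ... + x_j.
    """
    n = len(xs)
    return [sum(x for i, x in enumerate(xs, start=1) if i <= j)
            for j in range(1, n + 1)]


print(bsr_prefix_sums([1, 2, 3]))  # [1, 3, 6]
```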

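  16. BSR Algorithms – Sorting • Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, rearrange the elements of $X$ in nondecreasing order • Requires $n$ processors and $n$ memory locations • Consists of two steps • Step 1: the rank $r_j$ of each element $x_j$ is computed, that is, the number of elements of $X$ smaller than $x_j$ • $P_i$ broadcasts the pair $(x_i, 1)$ • $x_j$: limit • $<$: relation • SUM: reduction operator • $U_j$ holds $r_j$, for $1 \le j \le n$ • Step 2: $x_j$ is placed in position $1 + r_j$ of the sorted sequence $S$ • If $x_i$ and $x_j$ are equal, then $r_i = r_j$ and both are destined for the same position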

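  17. BSR Algorithms - Sorting • Second step continued • If $k$ elements are equal and have rank $r$, all of them are destined for position $r + 1$, and nothing is sent to positions $r + 2$ through $r + k$ • The next element, with the next higher rank $r + k$, is placed in position $r + k + 1$ of $S$ • To fill every position: • $P_i$ broadcasts the pair $(1 + r_i, x_i)$, its target position as tag and its value as datum • $U_j$ uses its index $j$ as limit • $\le$ for selection • MAX as a reduction • When this step terminates, $U_j$ holds $s_j$, that is, the $j$th element of the sorted sequence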

  18. BSR Algorithms - Sorting • Algorithm BSR SORT
      Step 1: for j = 1 to n do in parallel
                for i = 1 to n do in parallel
                  $r_j \leftarrow \sum_{x_i < x_j} 1$
                end for
              end for
      Step 2: for j = 1 to n do in parallel
                for i = 1 to n do in parallel
                  $s_j \leftarrow \max_{1 + r_i \le j} x_i$
                end for
              end for

  19. BSR Algorithms - Sorting • Example: $X = \{8, 5, 2, 5\}$ • In step 1, processors broadcast the pairs $(x_i, 1)$ to all memory locations: • (8,1), (5,1), (2,1), (5,1) • Limits are 8, 5, 2, and 5 • Since $5 < 8$, $2 < 8$, and $5 < 8$, $r_1 = 3$ • Only $2 < 5$, so $r_2 = 1$ • No element is smaller than 2, so $r_3 = 0$ • Only $2 < 5$, so $r_4 = 1$

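  20. BSR Algorithms - Sorting • Example continued • Step 2 of the algorithm • Processors broadcast the pairs $(1 + r_i, x_i)$: • (4,8), (2,5), (1,2), (2,5) • Limits at the memory locations: 1, 2, 3, 4 • $U_1$ accepts only 2; $U_2$ and $U_3$ accept 5, 2, and 5 and keep the maximum, 5; $U_4$ accepts all four values and keeps 8 • This gives the sorted sequence $\{2, 5, 5, 8\}$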
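The two steps of the sorting example can be checked with a short sequential simulation. This is a sketch, not a constant-time implementation: the function name `bsr_sort` is hypothetical, and step 2 assumes the tags are the target positions $1 + r_i$ with selection rule `<=` and reduction MAX, as in the example above.

```python
def bsr_sort(xs):
    """Simulate the two BROADCAST steps of BSR SORT; return (ranks, sorted)."""
    n = len(xs)
    # Step 1: P_i broadcasts (x_i, 1); U_j uses limit x_j, rule <, reduction SUM.
    # r_j counts the elements strictly smaller than x_j.
    ranks = [sum(1 for x_i in xs if x_i < x_j) for x_j in xs]
    # Step 2: P_i broadcasts (1 + r_i, x_i); U_j uses limit j, rule <=,
    # reduction MAX. Positions left empty by equal elements are filled
    # because MAX picks the largest value destined at or before j.
    sorted_xs = [max(x for r, x in zip(ranks, xs) if 1 + r <= j)
                 for j in range(1, n + 1)]
    return ranks, sorted_xs


ranks, sorted_xs = bsr_sort([8, 5, 2, 5])
print(ranks)      # [3, 1, 0, 1]
print(sorted_xs)  # [2, 5, 5, 8]
```

The selection in step 2 is never empty, because the smallest element always has rank 0 and is therefore accepted by every location.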

  21. BSR Algorithms - Sorting • Analysis of BSR SORT: • $p(n) = n$, $t(n) = O(1)$, and $c(n) = O(n)$ • Uniform analysis: the time required for a memory access, $a(N, M)$, was assumed to be $O(1)$ • Discriminating analysis: $a(N, M)$ is taken to be $O(\log M)$, for both BSR and the PRAM • BSR: $N = M = O(n)$, thus the memory access time is $O(\log n)$ • Each step is executed once and contains a constant number of computations and memory accesses, so:

  22. BSR Algorithms - Sorting • $t(n) = O(\log n)$ and $c(n) = O(n \log n)$, which is OPTIMAL • PRAM SORT: $N = M = O(n)$, thus the memory access time is $O(\log n)$ • It executes $O(\log n)$ computational and memory access steps, therefore $t(n) = O(\log^2 n)$ and $c(n) = O(n \log^2 n)$ • This cost is NOT optimal

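  23. BSR Algorithms – Computing Maximal Points • $S = \{q_1, q_2, \ldots, q_n\}$ is a set of $n$ points in the plane • $q_i = (x_i, y_i)$, for $1 \le i \le n$ • A point $q_k$ dominates a point $q_i$ if and only if $x_k > x_i$ and $y_k > y_i$ • A point of $S$ is said to be maximal with respect to $S$ if and only if it is not dominated by any other point of $S$ • Algorithm BSR MAXIMAL POINTS uses $n$ processors and $n$ memory locations • It consists of three steps: • Step 1: an auxiliary sequence $m_1, m_2, \ldots, m_n$ is created; $m_i$, associated with point $q_i$, is set initially to equal $y_i$ • Step 2: for each point $q_j$, the largest $y$-coordinate among the points to the right of $q_j$ is found, and $m_j$ is assigned the value of that coordinate • $P_i$ broadcasts $(x_i, y_i)$, with $x_i$ as tag and $y_i$ as datum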

  24. BSR Algorithms – Computing Maximal Points • $U_j$ uses $x_j$ as its limit • The relation $>$ for selection • MAX for reduction, to compute $m_j$ • If $x_i > x_j$, $U_j$ accepts $y_i$; it thus accepts the $y$-coordinate of every point to the right of $q_j$ • and assigns the maximum of these to $m_j$ • Step 3: a decision is made as to whether $q_i$ is a maximal point • Suppose $m_i$ was assigned the $y$-coordinate of some point $q_k$ in step 2 • If $m_i > y_i$, then $q_k$ dominates $q_i$, and $q_i$ is not maximal • Else, neither $q_k$ nor any other point dominates $q_i$, and $q_i$ is maximal

  25. BSR Algorithms – Computing Maximal Points • Algorithm BSR MAXIMAL POINTS
      Step 1: for i = 1 to n do in parallel
                $m_i \leftarrow y_i$
              end for
      Step 2: for j = 1 to n do in parallel
                for i = 1 to n do in parallel
                  $m_j \leftarrow \max_{x_i > x_j} y_i$
                end for
              end for
      Step 3: for i = 1 to n do in parallel
                if $m_i > y_i$ then $q_i$ is not maximal
                else $q_i$ is maximal
                end if
              end for

  26. BSR Algorithms – Computing Maximal Points • Analysis: • Each step uses $n$ processors and runs in $O(1)$ time • $p(n) = n$, $t(n) = O(1)$, and $c(n) = O(n)$ • Taking the memory access time to be $O(\log n)$, the cost becomes $O(n \log n)$ • On the other hand, the cost on the PRAM is $O(n \log^2 n)$, which is not optimal • Example: $q_1$, $q_2$, $q_3$ are three points in the plane

  27. BSR Algorithms – Computing Maximal Points • After step 1 of the algorithm: $m_1 = y_1$, $m_2 = y_2$, $m_3 = y_3$ • After step 2: $m_1 = y_3$, $m_2 = y_3$, $m_3 = y_3$ • Since $m_1 < y_1$, $m_2 > y_2$, and $m_3 = y_3$, both $q_1$ and $q_3$ are maximal, while $q_2$ is dominated by $q_3$
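The three steps can be traced with a small sequential simulation. The coordinates below are hypothetical: the transcript does not preserve the points of the example, so they were chosen only to reproduce the same pattern ($m_1 < y_1$, $m_2 > y_2$, $m_3 = y_3$). The function name `bsr_maximal_points` is likewise an assumption.

```python
def bsr_maximal_points(points):
    """Return (m, flags), where flags[i] is True when points[i] is maximal."""
    # Step 1: m_i <- y_i.
    m = [y for _, y in points]
    # Step 2: P_i broadcasts (x_i, y_i); U_j uses limit x_j, rule >,
    # reduction MAX. If no point lies to the right of q_j, nothing is
    # selected and m_j keeps its value from step 1.
    for j, (x_j, _) in enumerate(points):
        accepted = [y_i for x_i, y_i in points if x_i > x_j]
        if accepted:
            m[j] = max(accepted)
    # Step 3: q_i is dominated exactly when m_i > y_i.
    flags = [m_i <= y_i for m_i, (_, y_i) in zip(m, points)]
    return m, flags


# Hypothetical coordinates for q_1, q_2, q_3.
points = [(1, 5), (2, 1), (3, 3)]
m, flags = bsr_maximal_points(points)
print(m)      # [3, 3, 3]
print(flags)  # [True, False, True]
```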

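  28. BSR Algorithms – Maximum Sum Subsequence • Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, find indices $u$ and $v$ such that the subsequence $x_u, x_{u+1}, \ldots, x_v$ has the largest possible sum among all subsequences of consecutive elements of $X$ • Algorithm BSR MAXIMUM SUM SUBSEQUENCE
      Step 1: for j = 1 to n do in parallel
                for i = 1 to n do in parallel
                  $s_j \leftarrow \sum_{i \le j} x_i$
                end for
              end for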

  29. BSR Algorithms – Maximum Sum Subsequence
      Step 2: (2.1) for j = 1 to n do in parallel
                      for i = 1 to n do in parallel
                        $m_j \leftarrow \max_{i \ge j} s_i$
                      end for
                    end for
              (2.2) for j = 1 to n do in parallel
                      for i = 1 to n do in parallel
                        $a_j \leftarrow \max_{s_i = m_j} i$
                      end for
                    end for

  30. BSR Algorithms – Maximum Sum Subsequences
      Step 3: for i = 1 to n do in parallel
                $b_i \leftarrow m_i - s_i + x_i$
              end for
      Step 4: (4.1) for i = 1 to n do in parallel
                      (i) $L \leftarrow b_i$ (MAX CW)
                      (ii) if $b_i = L$ then $u \leftarrow i$ end if (ARBITRARY CW)
                    end for
              (4.2) $v \leftarrow a_u$

  31. BSR Algorithms – Maximum Sum Subsequences • Steps of the algorithm: • Step 1: the prefix sums $s_j$ are computed, using BSR PREFIX SUMS • Step 2: for each $j$, the maximum prefix sum to the right of $s_j$, beginning with $s_j$ itself, is found • Its value and index are $m_j$ and $a_j$ • To compute $m_j$: $P_i$ broadcasts $(i, s_i)$ as its tag and datum pair • $U_j$ uses $j$ as limit, $\ge$ for selection, and MAX for reduction • To compute $a_j$: $P_i$ broadcasts $(s_i, i)$ as its tag and datum pair • $U_j$ uses $m_j$ as limit, $=$ for selection, and MAX for reduction • Step 3: for each $i$, the sum $b_i$ of the maximum sum subsequence beginning at $x_i$ is computed • This uses an EW instruction

  32. BSR Algorithms – Maximum Sum Subsequences • Steps of the algorithm, continued • Step 4: the sum $L$ and starting index $u$ of the overall maximum sum subsequence are found; its ending index is $v = a_u$ • This requires a MAX CW instruction and an ARBITRARY CW instruction • Analysis: each step of the algorithm runs in $O(1)$ time and uses $n$ processors, thus: • $p(n) = n$ • $t(n) = O(1)$ • $c(n) = O(n)$ • Optimal

  33. BSR Algorithms – Maximum Sum Subsequences • Example: $X = \{-1, 1, 2, -2\}$ • After step 1, the prefix sums $s_j$ are: -1, 0, 2, 0 • The second BROADCAST instruction gives $m_j$: 2, 2, 2, 0

  34. BSR Algorithms – Maximum Sum Subsequences • Example continued • The third BROADCAST instruction computes $a_j$: 3, 3, 3, 4 • Step 3 computes each $b_i$: 2, 3, 2, -2 • Finally: • $L = 3$ • $u = 2$ • $v = a_2 = 3$ • The maximum sum subsequence is therefore $x_2, x_3 = 1, 2$, with sum 3
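The example can be reproduced with a sequential simulation of the four steps. This is a sketch under stated assumptions: the function name `bsr_max_sum_subsequence` is hypothetical, the ARBITRARY CW is resolved by taking the first index that attains $L$, and the reductions in step 2 are taken to be MAX, which is what the example values imply.

```python
def bsr_max_sum_subsequence(xs):
    """Return (s, m, a, b, L, u, v) with 1-based indices u and v."""
    n = len(xs)
    idx = range(1, n + 1)
    # Step 1: prefix sums, s_j = x_1 + ... + x_j.
    s = [sum(xs[:j]) for j in idx]
    # Step 2.1: tag i, datum s_i, limit j, rule >=, reduction MAX.
    m = [max(s[i - 1] for i in idx if i >= j) for j in idx]
    # Step 2.2: tag s_i, datum i, limit m_j, rule =, reduction MAX.
    a = [max(i for i in idx if s[i - 1] == m[j - 1]) for j in idx]
    # Step 3: best sum of a subsequence starting at x_i.
    b = [m[i] - s[i] + xs[i] for i in range(n)]
    # Step 4.1: MAX CW yields L; ARBITRARY CW picks some u with b_u = L
    # (here the first such index, which is one valid arbitrary choice).
    L = max(b)
    u = b.index(L) + 1
    # Step 4.2: the subsequence ends at v = a_u.
    v = a[u - 1]
    return s, m, a, b, L, u, v


s, m, a, b, L, u, v = bsr_max_sum_subsequence([-1, 1, 2, -2])
print(s, m, a, b)  # [-1, 0, 2, 0] [2, 2, 2, 0] [3, 3, 3, 4] [2, 3, 2, -2]
print(L, u, v)     # 3 2 3
```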
