Parallel Algorithms (chap. 30, 1st edition)




Presentation Transcript


  1. Parallel Algorithms (chap. 30, 1st edition)
     • Parallel: perform more than one operation at a time.
     • PRAM model: Parallel Random-Access Machine.
       • Multiple processors $p_0, p_1, \ldots, p_{n-1}$ are connected to a shared memory.
       • Each processor can access any memory location in unit time.
       • All processors can access memory in parallel.
       • All processors can perform operations in parallel.

  2. Concurrent vs. Exclusive Access
     • Four models:
       • EREW: exclusive read and exclusive write
       • CREW: concurrent read and exclusive write
       • ERCW: exclusive read and concurrent write
       • CRCW: concurrent read and concurrent write
     • Handling write conflicts when several processors write to the same location:
       • Common-write model: allowed only if all of them write the same value.
       • Arbitrary-write model: an arbitrary one of the writes succeeds.
       • Priority-write model: the processor with the smallest index succeeds.
     • EREW and CRCW are the most popular.
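
The three write-conflict rules above can be illustrated with a small serial sketch. The function name `resolve_write` and the representation of a write as a `(processor_index, value)` pair are assumptions made for this example only; they are not part of the PRAM model itself.

```python
import random


def resolve_write(writes, model):
    """Return the value stored in one cell after a concurrent write.

    writes: list of (processor_index, value) pairs aimed at the same cell
            (hypothetical representation chosen for this sketch).
    model:  "common", "arbitrary", or "priority".
    """
    if model == "common":
        # Legal only if every processor writes the same value.
        values = {value for _, value in writes}
        if len(values) != 1:
            raise ValueError("common-write model requires identical values")
        return values.pop()
    if model == "arbitrary":
        # Any one of the competing writes may succeed.
        return random.choice(writes)[1]
    if model == "priority":
        # The processor with the smallest index wins.
        return min(writes, key=lambda w: w[0])[1]
    raise ValueError("unknown model: " + model)


print(resolve_write([(3, "c"), (1, "a"), (2, "b")], "priority"))
```

With the priority rule, the write from processor 1 wins in the usage line, because 1 is the smallest index among the competing processors.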

  3. Synchronization and Control
     • Synchronization:
       • An important and complicated issue.
       • Here we suppose all processors are inherently tightly synchronized:
         • All processors execute the same statements at the same time.
         • There are no races among processors, i.e., they all run at the same pace.
     • Termination control of a parallel loop:
       • Depends on the state of all processors.
       • Can be tested in O(1) time.

  4. Pointer jumping – list ranking
     • Given a singly linked list L with n objects, compute, for each object in L, its distance from the end of the list.
     • Formally, with next as the pointer field:
         d[i] = 0                  if next[i] = nil
         d[i] = d[next[i]] + 1     if next[i] ≠ nil
     • Serial algorithm: Θ(n) time.

  5. List ranking – EREW algorithm
     • LIST-RANK(L) (runs in O(lg n) time)
         for each processor i, in parallel
             do if next[i] = nil
                   then d[i] ← 0
                   else d[i] ← 1
         while there exists an object i such that next[i] ≠ nil
             do for each processor i, in parallel
                   do if next[i] ≠ nil
                         then d[i] ← d[i] + d[next[i]]
                              next[i] ← next[next[i]]
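
A serial Python simulation can make the pointer-jumping step in LIST-RANK concrete. This is only a sketch: the function name `list_rank`, the use of list indices as object identifiers, and `None` standing in for nil are assumptions of this example. Each round reads from the old arrays and writes to fresh copies, which mimics the synchronized "all reads before all writes" behaviour of the PRAM.

```python
def list_rank(nxt):
    """Simulate LIST-RANK on a list given as successor indices.

    nxt[i] is the index of the object after i, or None at the end.
    Returns (d, rounds): the distances to the end of the list and the
    number of while-loop iterations that were needed.
    """
    nxt = list(nxt)  # work on a copy; the pointers are destroyed
    d = [0 if s is None else 1 for s in nxt]
    rounds = 0
    while any(s is not None for s in nxt):
        # One synchronous parallel step: read old values, write new ones.
        new_d, new_nxt = list(d), list(nxt)
        for i, s in enumerate(nxt):
            if s is not None:
                new_d[i] = d[i] + d[s]   # add the successor's partial distance
                new_nxt[i] = nxt[s]      # jump over the successor
        d, nxt = new_d, new_nxt
        rounds += 1
    return d, rounds


# A six-object list 0 -> 1 -> 2 -> 3 -> 4 -> 5.
print(list_rank([1, 2, 3, 4, 5, None]))  # ([5, 4, 3, 2, 1, 0], 3)
```

For six objects the loop runs three times, matching ⌈lg 6⌉ = 3.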

  6. List ranking – EREW algorithm (figure)
     A six-object list with keys 3, 4, 6, 1, 0, 5; the d values in each panel are:
         (a) 1 1 1 1 1 0
         (b) 2 2 2 2 1 0
         (c) 4 4 3 2 1 0
         (d) 5 4 3 2 1 0

  7. List ranking – correctness of the EREW algorithm
     • Loop invariant: for each i, the sum of the d values in the sublist headed by i is the correct distance from i to the end of the original list L.
     • Parallel memory accesses must be synchronized: the reads on the right-hand side must occur before the writes on the left-hand side. Moreover, every processor reads d[i] first and then reads d[next[i]].
     • It is an EREW algorithm: every read and write is exclusive. For an object i, its own processor reads d[i], and then the processor of its predecessor reads d[i]. All writes go to distinct locations.

  8. List ranking – running time of the EREW algorithm
     • O(lg n):
       • The initialization for loop runs in O(1).
       • Each iteration of the while loop runs in O(1).
       • There are exactly ⌈lg n⌉ iterations: each iteration transforms each list into two interleaved lists, one consisting of the objects in even positions and the other of the objects in odd positions. Thus each iteration doubles the number of lists but halves their lengths.
       • The termination test in line 5 (the while condition) runs in O(1).
     • Define work = number of processors × running time. Here the work is O(n lg n).

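  9. Parallel prefix on a list
     • A prefix computation is defined as:
       • Input: $\langle x_1, x_2, \ldots, x_n \rangle$
       • A binary associative operation $\otimes$
       • Output: $\langle y_1, y_2, \ldots, y_n \rangle$
       • Such that:
         • $y_1 = x_1$
         • $y_k = y_{k-1} \otimes x_k$ for $k = 2, 3, \ldots, n$, i.e., $y_k = x_1 \otimes x_2 \otimes \cdots \otimes x_k$.
     • Suppose $\langle x_1, x_2, \ldots, x_n \rangle$ are stored in order in a list.
     • Define the notation $[i,j] = x_i \otimes x_{i+1} \otimes \cdots \otimes x_j$.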

  10. Prefix computation
     • LIST-PREFIX(L)
         for each processor i, in parallel
             do y[i] ← x[i]
         while there exists an object i such that next[i] ≠ nil
             do for each processor i, in parallel
                   do if next[i] ≠ nil
                         then y[next[i]] ← y[i] ⊗ y[next[i]]
                              next[i] ← next[next[i]]
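
LIST-PREFIX can be simulated serially in the same way as list ranking. In this sketch the name `list_prefix`, the `op` parameter, and the index-based list representation are assumptions of the example. The operands are combined as `op(y[i], y[next[i]])`, with the predecessor on the left, so the result is also correct for associative operations that are not commutative, such as string concatenation.

```python
import operator


def list_prefix(x, nxt, op=operator.add):
    """Simulate LIST-PREFIX; nxt[i] is the successor index or None.

    op must be associative. Returns y, where y[i] combines every element
    from the head of the list up to and including object i.
    """
    nxt = list(nxt)  # work on a copy; the pointers are destroyed
    y = list(x)
    while any(s is not None for s in nxt):
        # One synchronous parallel step: read old values, write new ones.
        new_y, new_nxt = list(y), list(nxt)
        for i, s in enumerate(nxt):
            if s is not None:
                new_y[s] = op(y[i], y[s])  # each s has one predecessor, so the write is exclusive
                new_nxt[i] = nxt[s]
        y, nxt = new_y, new_nxt
    return y


print(list_prefix(["a", "b", "c", "d", "e"], [1, 2, 3, 4, None]))
# ['a', 'ab', 'abc', 'abcd', 'abcde']
```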

  11. Prefix computation – EREW algorithm (figure)
     A six-object list holding $x_1, \ldots, x_6$; the y values in each panel are:
         (a) [1,1] [2,2] [3,3] [4,4] [5,5] [6,6]
         (b) [1,1] [1,2] [2,3] [3,4] [4,5] [5,6]
         (c) [1,1] [1,2] [1,3] [1,4] [2,5] [3,6]
         (d) [1,1] [1,2] [1,3] [1,4] [1,5] [1,6]

  12. Find roots – CREW algorithm
     • Suppose we have a forest of binary trees in which each node i has a pointer parent[i].
     • For each node, find the identity of the tree that contains it, i.e., its root.
     • Assume that each node is associated with a processor.
     • Assume that each node i has a field root[i].

  13. Find roots – CREW algorithm
     • FIND-ROOTS(F)
         for each processor i, in parallel
             do if parent[i] = nil
                   then root[i] ← i
         while there exists a node i such that parent[i] ≠ nil
             do for each processor i, in parallel
                   do if parent[i] ≠ nil
                         then root[i] ← root[parent[i]]
                              parent[i] ← parent[parent[i]]
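
The following serial sketch simulates FIND-ROOTS. The name `find_roots`, the index-based forest representation, and `None` for nil are assumptions of this example. In the sketch, several nodes may read `root[p]` for the same parent `p` within one round; on a PRAM those reads would be concurrent, which is why the algorithm is CREW rather than EREW.

```python
def find_roots(parent):
    """Simulate FIND-ROOTS; parent[i] is the parent index or None for a root.

    Returns root, where root[i] is the index of the root of i's tree.
    """
    parent = list(parent)  # work on a copy; the pointers are destroyed
    root = [i if p is None else None for i, p in enumerate(parent)]
    while any(p is not None for p in parent):
        # One synchronous parallel step: read old values, write new ones.
        new_root, new_parent = list(root), list(parent)
        for i, p in enumerate(parent):
            if p is not None:
                new_root[i] = root[p]        # may still be None until p is finished
                new_parent[i] = parent[p]    # jump to the grandparent
        root, parent = new_root, new_parent
    return root


# Two trees: node 0 is the root of {0, 1, 2, 3, 4}, node 5 the root of {5, 6}.
print(find_roots([None, 0, 0, 1, 1, None, 5]))  # [0, 0, 0, 0, 0, 5, 5]
```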

  14. Find roots – CREW algorithm
     • Running time: O(lg d), where d is the height of the maximum-depth tree in the forest.
     • All the writes are exclusive.
     • But the read in line 7 (of root[parent[i]]) is concurrent, since several nodes may have the same node as parent.
     • See figure 30.5.

  15. Find roots – CREW vs. EREW: Ω(lg n)
     • How fast can n nodes in a forest determine their roots using only exclusive reads?
     • Argument: with exclusive reads, a given piece of information can be copied to only one other memory location in each step, so the number of locations containing a given piece of information at most doubles at each step.
     • Consider a forest with one tree of n nodes. The root identity is stored in one place initially. After the first step it is stored in at most two places; after the second step, in at most four places; and so on. So at least lg n steps are needed for it to be stored in n places.
     • So CREW: O(lg d) and EREW: Ω(lg n).
     • If $d = 2^{o(\lg n)}$, the CREW algorithm outperforms any EREW algorithm.
     • If d = Θ(lg n), then CREW runs in O(lg lg n), and EREW is much slower.
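
The doubling argument above can be checked with a few lines of arithmetic. This is a counting sketch only; the function name `steps_to_reach` is an assumption of the example, and it models the bound, not an actual EREW algorithm.

```python
def steps_to_reach(n):
    """Fewest steps before one value can occupy n locations under exclusive read.

    Each existing copy can be read by at most one processor per step,
    so the number of copies can at most double in each step.
    """
    copies, steps = 1, 0
    while copies < n:
        copies *= 2
        steps += 1
    return steps


for n in (1, 2, 8, 9, 1024):
    print(n, steps_to_reach(n))
```

The count equals ⌈lg n⌉, for example 10 steps for n = 1024.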

  16. Find maximum – CRCW algorithm
     • Given n elements A[0..n-1], find the maximum.
     • Suppose there are n² processors; each processor (i, j) compares A[i] and A[j], for 0 ≤ i, j ≤ n-1.
     • FAST-MAX(A)
         n ← length[A]
         for i ← 0 to n-1, in parallel
             do m[i] ← true
         for i ← 0 to n-1 and j ← 0 to n-1, in parallel
             do if A[i] < A[j]
                   then m[i] ← false
         for i ← 0 to n-1, in parallel
             do if m[i] = true
                   then max ← A[i]
         return max
     • Example with A = 5, 6, 9, 2, 9 (an entry is T if A[i] < A[j]; result max = 9):

                      A[j]
         A[i]     5  6  9  2  9     m
          5       F  T  T  F  T     F
          6       F  F  T  F  T     F
          9       F  F  F  F  F     T
          2       T  T  T  F  T     F
          9       F  F  F  F  F     T

     • The running time is O(1).
     • Note: there may be multiple maximum values, so their processors will write to max concurrently. They all write the same value.
     • Its work = n² × O(1) = O(n²).
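
A serial sketch of FAST-MAX follows. The name `fast_max` and the choice to return the flag array `m` alongside the maximum are assumptions of this example. The nested loops stand in for the n² processors, so the sketch takes O(n²) serial time rather than the O(1) parallel time of the CRCW algorithm.

```python
def fast_max(a):
    """Simulate FAST-MAX serially; returns (maximum, m)."""
    n = len(a)
    m = [True] * n
    # Processor (i, j): clear m[i] if some element is larger than a[i].
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                m[i] = False  # concurrent writes here all store the same value
    result = None
    # Every surviving i holds a maximum; all of them write the same value.
    for i in range(n):
        if m[i]:
            result = a[i]
    return result, m


print(fast_max([5, 6, 9, 2, 9]))  # (9, [False, False, True, False, True])
```

The flags in the output match the m column of the example on the slide.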

  17. Find maximum – CRCW vs. EREW
     • Finding the maximum with an EREW algorithm takes Ω(lg n) time.
     • Argument: consider how many elements "think" that they might be the maximum.
       • Initially, n.
       • After the first step, at least n/2.
       • After the second step, at least n/4, and so on: each step can at most halve the number.
     • Moreover, a CREW algorithm also takes Ω(lg n).

  18. Simulating CRCW with EREW
     • Theorem: a p-processor CRCW algorithm can be no more than O(lg p) times faster than the best p-processor EREW algorithm for the same problem.
     • Proof: each step of the CRCW algorithm can be simulated by O(lg p) steps of EREW computation.
     • Simulating a concurrent write:
       • CRCW processor $p_i$ writes datum $x_i$ to location $l_i$ ($l_i$ may be the same for several $p_i$'s).
       • The corresponding EREW processor $p_i$ writes the pair $(l_i, x_i)$ to location A[i]. The A[i]'s are distinct, so this write is exclusive.
       • Sort all pairs $(l_i, x_i)$ by $l_i$ in O(lg p) time, so that writes to the same location are brought together.
       • Each EREW processor $p_i$ compares $A[i] = (l_j, x_j)$ with $A[i-1] = (l_k, x_k)$. If $l_j \ne l_k$ or $i = 0$, then $p_i$ writes $x_j$ to $l_j$. This write is exclusive.
     • See figure 30.7.
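
The sort-then-compare idea in the proof can be sketched serially as follows. The name `simulate_concurrent_write` and the list-based memory are assumptions of this example, and Python's built-in `sorted` merely stands in for the O(lg p) parallel sort that the proof relies on. Because `sorted` is stable, the sketch happens to let the lowest-numbered processor win among competing writes.

```python
def simulate_concurrent_write(requests, memory):
    """Apply one concurrent-write step using only exclusive writes.

    requests[i] = (location, value) is the write issued by processor i.
    Returns a new memory list; the input list is left unchanged.
    """
    memory = list(memory)
    # Step 1: processor i stores its pair in its own slot A[i] (exclusive).
    a = list(requests)
    # Step 2: sort by location so that equal locations become adjacent.
    a = sorted(a, key=lambda pair: pair[0])
    # Step 3: processor i writes only if it is first in its group.
    for i in range(len(a)):
        if i == 0 or a[i][0] != a[i - 1][0]:
            location, value = a[i]
            memory[location] = value  # at most one writer per location
    return memory


print(simulate_concurrent_write([(2, 7), (0, 3), (2, 7), (1, 4)], [0, 0, 0, 0]))
# [3, 4, 7, 0]
```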

  19. CRCW vs. EREW
     • CRCW:
       • Some say: it is easier to program and faster.
       • Others say: the hardware for CRCW is slower than that for EREW, and one cannot really find the maximum in O(1) time.
     • Still others say: the PRAM, whether EREW or CRCW, is the wrong model. Processors must be connected by a network and can communicate with each other only via that network, so the network should be part of the model.
