A Taste of Parallel Algorithms
We examine five simple building-block parallel operations and look at the corresponding algorithms on four simple parallel architectures: linear array, binary tree, 2D mesh, and a simple shared-variable computer. Part I
Semigroup Computation
Parallel Prefix Computation
Packet Routing
• A packet of information resides at Processor i and must be sent to Processor j. The problem is to route the packet through intermediate processors, if needed, such that it gets to the destination as quickly as possible.
• The problem becomes more challenging when multiple packets reside at different processors, each with its own destination.
• When each processor has at most one packet to send and one packet to receive, the packet routing problem is called one-to-one communication or 1-1 routing.
Broadcasting
• Given a value a known at a certain processor i, disseminate it to all p processors as quickly as possible, so that at the end, every processor has access to, or "knows," the value. This is sometimes referred to as one-to-all communication.
• The more general case, one-to-many communication, is known as multicasting.
Sorting
• Rather than sorting a set of records, each with a key and data elements, we focus on sorting a set of keys for simplicity.
Linear Array
• Diameter: D = p - 1
• Maximum node degree: d = 2
• Ring?
Binary Tree
• If all leaves are at the same level and every nonleaf processor has two children, the binary tree is said to be complete.
• D = 2 log2(p + 1) - 2 (for a complete binary tree with p processors)
• d = 3
2D Mesh
• D = 2√p - 2 (assuming a square √p × √p mesh)
• d = 4
• Torus?
Shared Memory
• A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node.
• D = 1
• d = p - 1
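The diameter D and maximum node degree d quoted for the four architectures can be checked on small instances. The sketch below is only an illustration: the graph builders, the heap-style numbering of the tree (1..p), and the function names are assumptions made for this example, not part of the slides.

```python
from collections import deque

def diameter_and_degree(adj):
    """Return (D, d) for an undirected graph given as {node: [neighbors]}."""
    def eccentricity(start):
        # Breadth-first search: longest shortest-path distance from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    diameter = max(eccentricity(s) for s in adj)
    degree = max(len(neighbors) for neighbors in adj.values())
    return diameter, degree

def linear_array(p):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}

def binary_tree(p):
    # Heap-style numbering (assumed): node i has parent i // 2, children 2i and 2i + 1.
    adj = {}
    for i in range(1, p + 1):
        candidates = [i // 2, 2 * i, 2 * i + 1]
        adj[i] = [j for j in candidates if 1 <= j <= p]
    return adj

def mesh(rows, cols):
    adj = {}
    for r in range(rows):
        for c in range(cols):
            steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(a, b) for a, b in steps if 0 <= a < rows and 0 <= b < cols]
    return adj

def complete_graph(p):
    return {i: [j for j in range(p) if j != i] for i in range(p)}
```

For example, `diameter_and_degree(linear_array(9))` gives D = 8 = p - 1 and d = 2, and a 7-node complete binary tree gives D = 4 = 2 log2(8) - 2.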
Algorithms for a Linear Array (1)
• Semigroup Computation
• Let us consider first a special case of semigroup computation, namely, that of maximum finding. Each of the p processors holds a value initially and our goal is for every processor to know the largest of these values.
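Maximum finding on a linear array can be simulated sequentially. The sketch below assumes synchronous steps in which every processor exchanges its current maximum with its immediate neighbors; the function name and the step-by-step snapshot are illustrative choices, not the slides' exact formulation.

```python
def linear_array_max(values):
    """Simulate max-finding on a linear array; return (final values, steps used)."""
    p = len(values)
    known = list(values)          # known[i] = largest value seen so far by processor i
    steps = 0
    for _ in range(p - 1):        # p - 1 steps let any value cross the whole array
        prev = known[:]           # snapshot: all processors act on the same old state
        for i in range(p):
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < p - 1 else prev[i]
            known[i] = max(prev[i], left, right)
        steps += 1
    return known, steps
```

With the nine values 5, 2, 8, 6, 3, 7, 9, 1, 4, every simulated processor ends up holding 9 after p - 1 = 8 steps, which matches the diameter of the array.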
Algorithms for a Linear Array (2)
• Parallel Prefix Computation (Case 1)
Algorithms for a Linear Array (3)
• Parallel Prefix Computation (Case 2, more than one value per processor)
Algorithms for a Linear Array (4)
• Packet Routing
Algorithms for a Linear Array (5)
• Broadcasting
• If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a) (read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor.
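The rbcast/lbcast scheme can be sketched as two wavefronts moving away from the source, one hop per step. The code below is a sequential simulation under that assumption; the function name and the 0-based processor numbering are choices made for this example.

```python
def linear_array_broadcast(p, source, value):
    """Simulate rbcast/lbcast from `source`; return (values held, steps used)."""
    have = [None] * p
    have[source] = value
    right = source                # processor most recently reached by rbcast
    left = source                 # processor most recently reached by lbcast
    steps = 0
    while left > 0 or right < p - 1:
        if right < p - 1:         # rbcast moves one hop to the right
            right += 1
            have[right] = value
        if left > 0:              # lbcast moves one hop to the left
            left -= 1
            have[left] = value
        steps += 1
    return have, steps
```

In this model the number of steps is the distance from the source to the farther end of the array, so a source at an end needs p - 1 steps and a source in the middle needs about half that.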
Algorithms for a Linear Array (6)
• Sorting (Case 1)
Algorithms for a Linear Array (7)
• Sorting (Case 2, odd-even transposition) (efficiency?)
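Odd-even transposition sort can be sketched as p rounds of neighbor compare-exchange operations, alternating between the two ways of pairing adjacent processors. The version below is a sequential simulation with 0-based indices; the function name is illustrative.

```python
def odd_even_transposition_sort(keys):
    """Simulate odd-even transposition sort with one key per processor."""
    a = list(keys)
    p = len(a)
    for step in range(p):                 # p rounds are enough for p keys
        start = step % 2                  # alternate pairings (0,1),(2,3).. and (1,2),(3,4)..
        for i in range(start, p - 1, 2):  # these pairs are disjoint, so they could run in parallel
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

On the efficiency question: the array uses p processors for p steps, roughly p^2 processor-steps in total, whereas a good sequential sort needs on the order of p log2 p comparisons, so the processors are poorly utilized for large p.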
Algorithms for a Binary Tree (1)
• In algorithms for a binary tree of processors, we will assume that the data elements are initially held by the leaf processors only.
• The nonleaf (inner) processors participate in the computation, but do not hold data elements of their own.
Algorithms for a Binary Tree (2)
• Semigroup Computation
• Each inner node receives two values from its children, applies the operator to them, and passes the result upward to its parent.
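The upward combining step can be sketched level by level. This is a sequential simulation that assumes a complete binary tree whose number of leaves is a power of two; `tree_semigroup` and its parameters are names chosen for this example.

```python
def tree_semigroup(leaves, op):
    """Combine leaf values up a complete binary tree; return (root value, steps)."""
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two number of leaves"
    level = list(leaves)
    steps = 0
    while len(level) > 1:
        # Each parent applies op to its left and right child, in that order,
        # so associative but non-commutative operators are handled correctly.
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps
```

For example, `tree_semigroup([5, 2, 8, 6, 3, 7, 9, 1], max)` yields 9 at the root after 3 steps, one step per tree level above the leaves.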
Algorithms for a Binary Tree (3)
• Parallel Prefix Computation
Algorithms for a Binary Tree (4)
• Packet Routing
• The routing algorithm depends on the processor numbering scheme used.
• Example: preorder numbering
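One way a preorder numbering could drive routing decisions is sketched below. It assumes a complete binary tree numbered in preorder starting from 0, with each node knowing the largest index in its left subtree (`max_l`) and in its right subtree (`max_r`); these field names and both functions are hypothetical, introduced only for this illustration.

```python
def build_preorder_tree(levels):
    """Build a complete binary tree with `levels` levels, numbered in preorder from 0."""
    nodes = {}
    def build(root, size, parent):
        half = (size - 1) // 2                    # number of nodes in each subtree
        left = root + 1 if half else None
        right = root + 1 + half if half else None
        nodes[root] = {"parent": parent, "left": left, "right": right,
                       "max_l": root + half, "max_r": root + size - 1}
        if half:
            build(left, half, root)
            build(right, half, root)
    build(0, 2 ** levels - 1, None)
    return nodes

def route(nodes, src, dst):
    """Return the list of nodes visited when routing a packet from src to dst."""
    path = [src]
    cur = src
    while cur != dst:
        node = nodes[cur]
        if dst < cur or dst > node["max_r"]:
            cur = node["parent"]                  # destination is outside this subtree
        elif dst <= node["max_l"]:
            cur = node["left"]                    # destination is in the left subtree
        else:
            cur = node["right"]                   # destination is in the right subtree
        path.append(cur)
    return path
```

In a 7-node tree, a packet from leaf 2 to leaf 6 climbs to the root and descends the right side, visiting nodes 2, 1, 0, 4, 6. Each decision uses only the current node's own index and subtree bounds.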
Algorithms for a Binary Tree (5)
• Broadcasting
• Processor i sends the desired data upwards to the root processor, which then broadcasts the data downwards to all processors.
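The up-then-down broadcast can be sketched as follows. The simulation assumes heap-style numbering (root 1, children 2i and 2i + 1), chosen here only for convenience, and assumes a node can send to both children in the same step; the function name is illustrative.

```python
def tree_broadcast(p, source, value):
    """Simulate broadcasting from `source` in a p-node tree; return (holders, steps)."""
    have = {source: value}
    steps = 0
    node = source
    while node > 1:                   # phase 1: forward the value up to the root
        node //= 2
        have[node] = value
        steps += 1
    frontier = [1]
    while frontier:                   # phase 2: the root floods the value downward
        reached = []
        for n in frontier:
            for child in (2 * n, 2 * n + 1):
                if child <= p:
                    have[child] = value
                    reached.append(child)
        if reached:
            steps += 1
        frontier = reached
    return have, steps
```

In this model the cost is the depth of the source plus the height of the tree, so a leaf source in a 7-node tree needs 2 + 2 = 4 steps.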
Algorithms for a Binary Tree (6)
• Sorting
Algorithms for 2D Mesh (1)
• In all of the 2D mesh algorithms presented in this section, we use the linear-array algorithms of Section 2.3 as building blocks.
• This leads to simple algorithms, but not necessarily the most efficient ones. Mesh-based architectures and their algorithms will be discussed in great detail in Part III.
Algorithms for 2D Mesh (2)
• Semigroup Computation
• For example, in finding the maximum of a set of p values, stored one per processor, the row maximums are computed first and made available to every processor in the row. Then the column maximums are computed, which gives every processor the overall maximum.
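The two-phase maximum computation can be sketched as below. The simulation treats each row and then each column as a linear array and charges (length - 1) steps per phase, following the linear-array bound; the function name and the list-of-lists layout are assumptions for this example.

```python
def mesh_max(grid):
    """Simulate two-phase max-finding on a mesh; return (final grid, steps)."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: every processor in row r learns that row's maximum.
    row_max = [max(row) for row in grid]
    after_rows = [[row_max[r]] * cols for r in range(rows)]
    # Phase 2: every column now holds all row maximums; take their maximum.
    after_cols = [[max(after_rows[k][c] for k in range(rows)) for c in range(cols)]
                  for r in range(rows)]
    steps = (cols - 1) + (rows - 1)   # linear-array cost of each phase
    return after_cols, steps
```

For a square √p × √p mesh this step count is 2√p - 2, compared with p - 1 for a linear array holding the same p values.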
Algorithms for 2D Mesh (3)
• Parallel Prefix Computation
• (1) Do a parallel prefix computation on each row.
• (2) Do a diminished parallel prefix computation in the rightmost column (each processor's result excludes its own row's value).
• (3) Broadcast the results in the rightmost column to all of the elements in the respective rows and combine them with the initially computed row prefix values.
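The three steps can be sketched as follows, assuming the values are ordered row by row (row-major order) and `op` is associative. The function name and the use of `None` to mark "no offset" for the first row are choices made for this illustration.

```python
from itertools import accumulate

def mesh_prefix(grid, op):
    """Simulate the three-step parallel prefix on a mesh in row-major order."""
    # Step 1: prefix computation within each row.
    row_prefix = [list(accumulate(row, op)) for row in grid]
    # Step 2: diminished prefix over the rightmost column (the row totals):
    # row r receives the combination of the totals of rows 0 .. r-1 only.
    totals = [prefix[-1] for prefix in row_prefix]
    offsets = [None]                          # row 0 has nothing before it
    for total in totals[:-1]:
        offsets.append(total if offsets[-1] is None else op(offsets[-1], total))
    # Step 3: broadcast each offset along its row and combine it on the left.
    return [[value if offset is None else op(offset, value) for value in prefix]
            for prefix, offset in zip(row_prefix, offsets)]
```

With addition on the 3 × 3 grid holding 1 through 9, the result is the running sums 1, 3, 6, 10, 15, 21, 28, 36, 45 laid out row by row. The offset is applied on the left so that non-commutative operators keep their order.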
Algorithms for 2D Mesh (4)
• Packet Routing
• To route a data packet from the processor in Row r, Column c, to the processor in Row r', Column c', we first route it within Row r to Column c'. Then, we route it in Column c' from Row r to Row r'. This is known as row-first routing.
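Row-first routing can be sketched as a function that lists the processors a packet visits. Coordinates are (row, column) pairs; the function name is an illustrative choice.

```python
def row_first_route(src, dst):
    """Return the (row, col) positions visited when routing from src to dst."""
    r, c = src
    dst_r, dst_c = dst
    path = [(r, c)]
    step = 1 if dst_c > c else -1
    while c != dst_c:             # first move within the source row to the target column
        c += step
        path.append((r, c))
    step = 1 if dst_r > r else -1
    while r != dst_r:             # then move within the target column to the target row
        r += step
        path.append((r, c))
    return path
```

The number of hops is |c' - c| + |r' - r|, which is the shortest possible on a mesh without wraparound links.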
Algorithms for 2D Mesh (5)
• Broadcasting
• (1) Broadcast the packet to every processor in the source node's row.
• (2) Broadcast in all columns.
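The two broadcast phases can be sketched as below. The simulation assumes the column phase starts only after the row phase has finished and charges each phase the linear-array broadcast cost (distance to the farther end); the function name and grid layout are assumptions for this example.

```python
def mesh_broadcast(rows, cols, source, value):
    """Simulate row-then-column broadcasting; return (final grid, steps)."""
    r0, c0 = source
    grid = [[None] * cols for _ in range(rows)]
    # Phase 1: broadcast within the source row.
    for c in range(cols):
        grid[r0][c] = value
    row_steps = max(c0, cols - 1 - c0)
    # Phase 2: each processor in the source row broadcasts within its column.
    for r in range(rows):
        for c in range(cols):
            grid[r][c] = grid[r0][c]
    col_steps = max(r0, rows - 1 - r0)
    return grid, row_steps + col_steps
```

A corner source on a square √p × √p mesh gives the worst case of 2√p - 2 steps in this model.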
Algorithms for 2D Mesh (6)
• Sorting
Algorithms for Shared Variables
• Semigroup Computation
• Parallel Prefix Computation
• Packet Routing (trivial in view of the direct communication path between any pair of processors)
• Broadcasting (trivial, as each processor can send a data item to all processors directly)
• Sorting