Explore the diverse range of parallel computation models, from abstract PRAM to concrete circuit models. Get insights into PRAM submodels and key algorithms like data broadcasting and matrix multiplication. Understand the assumptions and submodels in the PRAM model for efficient parallel processing. Discover the power of semigroup computation, fan-in computation, and parallel prefix computation. Learn about ranking elements in linked lists and performing matrix multiplication efficiently in this insightful chapter.
Extreme Models Part II
The models of parallel computation range from very abstract to very concrete, with real parallel machine implementations and user models falling somewhere between these two extremes.
• At one extreme lies the abstract shared-memory PRAM model.
• The other extreme is the circuit model of parallel processing.
• Intermediate models
PRAM and Basic Algorithms
This chapter covers:
• the relative computational powers of several PRAM submodels
• five key building-block algorithms:
  • Data broadcasting
  • Semigroup or fan-in computation
  • Parallel prefix computation
  • Ranking the elements of a linked list
  • Matrix multiplication
PRAM Submodels and Assumptions (1)
• The PRAM model prescribes the concurrent operation of p processors (in SIMD or MIMD mode) on data that are accessible to all of them in an m-word shared memory.
• In the synchronous SIMD or SPMD version of PRAM, Processor i can do the following in the three phases of one cycle (not all three phases need to be present in every cycle):
PRAM Submodels and Assumptions (2)
• It is possible that several processors may want to read data from the same memory location or write their values into a common location.
• Four submodels of the PRAM model have been defined:
PRAM Submodels and Assumptions (3)
• Here are a few example submodels based on the semantics of concurrent writes in CRCW PRAM:
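To make the idea of concurrent-write semantics concrete, here is a minimal Python sketch of how one shared-memory cell might resolve several simultaneous writes. The four rules shown (common, priority, max, sum) are commonly cited CRCW variants chosen for illustration; they are an assumption and not necessarily the ones listed on this slide. The function name `resolve_concurrent_write` and the rule strings are hypothetical.

```python
def resolve_concurrent_write(requests, rule):
    """Decide what one memory cell holds after a concurrent write.

    requests: non-empty list of (processor_id, value) pairs, all aimed
              at the same cell in the same cycle.
    rule:     illustrative rule name (assumed, not from the slides).
    """
    if not requests:
        raise ValueError("no write requests")
    values = [value for _, value in requests]
    if rule == "common":
        # Writers must all agree on the value; otherwise the write is illegal.
        if len(set(values)) != 1:
            raise ValueError("common rule requires identical values")
        return values[0]
    if rule == "priority":
        # The lowest-indexed processor wins.
        return min(requests, key=lambda r: r[0])[1]
    if rule == "max":
        # The largest value is stored.
        return max(values)
    if rule == "sum":
        # A reduction variant: the sum of all written values is stored.
        return sum(values)
    raise ValueError(f"unknown rule: {rule}")


requests = [(3, 7), (1, 4), (2, 9)]
print(resolve_concurrent_write(requests, "priority"))
print(resolve_concurrent_write(requests, "max"))
print(resolve_concurrent_write(requests, "sum"))
```

The point of the sketch is only that the same set of write requests can leave different contents in memory depending on the submodel.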
PRAM Submodels and Assumptions (4)
• The following relationships have been established between some of the PRAM submodels:
DATA BROADCASTING (1)
• One-to-all broadcasting is used when one processor needs to send a data value to all other processors.
• In the CREW or CRCW submodels, broadcasting is trivial and takes Θ(1) time: the sending processor writes the data value into a memory location, and all processors read that value in the following machine cycle.
• All-to-all broadcasting, where each of the p processors needs to send a data value to all other processors, can be done through p separate broadcast operations in Θ(p) steps, which is optimal.
DATA BROADCASTING (2)
• The above scheme is clearly inapplicable to one-to-all broadcasting in the EREW model, where no two processors may read the same location in the same cycle.
• The simplest scheme for EREW broadcasting is to make p copies of the data value, say in a broadcast vector B of length p, and then let each Processor j read its own copy by accessing B[j].
• A method known as recursive doubling is used to copy B[0] into all elements of B in ⌈log₂ p⌉ steps.
DATA BROADCASTING (3)
• The complete EREW broadcast algorithm with this provision is given below.
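A sequential Python simulation can illustrate how recursive doubling fills the broadcast vector B. This is a sketch under stated assumptions, not the slide's own listing: the function name `erew_broadcast` is hypothetical, and the inner `for` loop stands in for processors that would act simultaneously on a real PRAM.

```python
def erew_broadcast(value, p):
    """Simulate one-to-all EREW broadcasting by recursive doubling.

    Returns the filled broadcast vector B and the number of parallel steps.
    """
    B = [None] * p
    B[0] = value          # Processor 0 writes the data value into B[0]
    s = 1                 # number of cells of B already filled
    steps = 0
    while s < p:
        snapshot = list(B)  # all reads in a step see the pre-step contents
        # Processors 0 .. min(s, p - s) - 1 act in parallel. Processor j
        # reads only B[j] and writes only B[j + s], so no cell is read or
        # written by more than one processor in the same step.
        for j in range(min(s, p - s)):
            B[j + s] = snapshot[j]
        s *= 2
        steps += 1
    return B, steps


B, steps = erew_broadcast("x", 8)
print(B, steps)
```

After the loop, each Processor j can read its own copy from B[j] in one further exclusive-read step.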
DATA BROADCASTING (4)
• To perform all-to-all broadcasting, so that each processor broadcasts a value that it holds to each of the other p - 1 processors, we let Processor j write its value into B[j], rather than into B[0].
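One way to complete this scheme without concurrent reads is to have Processor j read B[(j + k) mod p] in step k, for k = 1 to p - 1. The Python simulation below sketches that idea; the skewed read order and the name `erew_all_to_all` are assumptions made for illustration, since the slide gives only the write phase.

```python
def erew_all_to_all(values):
    """Simulate all-to-all broadcasting among p = len(values) processors.

    received[j] lists the values Processor j obtains from the others,
    in the order it reads them.
    """
    p = len(values)
    B = list(values)                 # Processor j writes its value into B[j]
    received = [[] for _ in range(p)]
    for k in range(1, p):            # p - 1 read steps
        # In step k, Processor j reads B[(j + k) mod p]. The addresses are
        # a rotation of 0 .. p-1, so every cell is read exactly once.
        addresses = [(j + k) % p for j in range(p)]
        assert len(set(addresses)) == p   # exclusive-read check
        for j in range(p):
            received[j].append(B[addresses[j]])
    return received


print(erew_all_to_all([10, 20, 30]))
```

The p - 1 read steps are consistent with the Θ(p) bound for all-to-all broadcasting.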
DATA BROADCASTING (5)
• Given a data vector S of length p, a naive sorting algorithm can be designed based on the above all-to-all broadcasting scheme.
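A plausible form of such a naive sort is rank-based: each processor counts how many elements must precede its own and then writes its element to that position. The sketch below simulates this sequentially in Python. The tie-breaking rule (equal keys ordered by index) and the names `naive_rank_sort` and `R` are assumptions for illustration.

```python
def naive_rank_sort(S):
    """Sort S by ranking, following an all-to-all broadcast access pattern.

    Processor j computes R[j], the number of elements that must come
    before S[j], and then places S[j] at position R[j].
    """
    p = len(S)
    R = [0] * p
    for k in range(1, p):            # p - 1 steps, as in all-to-all broadcast
        for j in range(p):           # conceptually parallel over processors
            l = (j + k) % p          # element examined by Processor j in step k
            # Ties are broken by index so that all ranks are distinct.
            if S[l] < S[j] or (S[l] == S[j] and l < j):
                R[j] += 1
    result = [None] * p
    for j in range(p):               # ranks are distinct, so writes are exclusive
        result[R[j]] = S[j]
    return result


print(naive_rank_sort([3, 1, 2, 1]))
```

With p processors each doing Θ(p) work, the total work is Θ(p²), which is why the slide calls the algorithm naive.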
SEMIGROUP OR FAN-IN COMPUTATION
• This computation is trivial for a CRCW PRAM whose concurrent-write rule combines the written values with the semigroup operator.
• Here too, the recursive doubling scheme can be used to do the computation on an EREW PRAM.
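The recursive doubling scheme for semigroup computation can be sketched as a sequential Python simulation. The name `erew_semigroup` and the working vector `S` are assumptions for illustration; `op` is any associative binary operator.

```python
from operator import add


def erew_semigroup(x, op):
    """Combine all elements of the non-empty list x with op by recursive doubling.

    Returns the overall result and the number of parallel steps used.
    """
    p = len(x)
    S = list(x)                      # Processor j copies x[j] into S[j]
    s = 1
    steps = 0
    while s < p:
        old = list(S)                # reads see the values from before the step
        # Processors 0 .. p - s - 1 act in parallel; Processor j combines
        # S[j] into S[j + s]. Reading S[j] and S[j + s] in two separate
        # read cycles keeps every access exclusive.
        for j in range(p - s):
            S[j + s] = op(old[j], old[j + s])
        s *= 2
        steps += 1
    return S[p - 1], steps


print(erew_semigroup([1, 2, 3, 4, 5], add))
print(erew_semigroup([3, 9, 2], max))
```

The result ends up in the last cell after ⌈log₂ p⌉ steps and can then be broadcast to all processors if every processor needs it.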
PARALLEL PREFIX COMPUTATION
• Parallel prefix computation consists of the first phase of the semigroup computation.
• It can also be formulated using the divide-and-conquer paradigm.
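As an illustration of the divide-and-conquer formulation, the Python sketch below combines adjacent pairs, recursively computes the prefixes of the half-length list, and then fills in the remaining positions. This is one common divide-and-conquer scheme, assumed here for illustration; the slide does not say which variant it has in mind, and the name `prefix_divide_and_conquer` is hypothetical.

```python
from operator import add


def prefix_divide_and_conquer(x, op):
    """Return all prefix results x[0], x[0] op x[1], ... for an associative op."""
    n = len(x)
    if n <= 1:
        return list(x)
    # Step 1 (one parallel step): combine adjacent pairs.
    pairs = [op(x[2 * i], x[2 * i + 1]) for i in range(n // 2)]
    # Step 2: recursively compute prefixes of the pair results. These are
    # exactly the prefixes ending at the odd-indexed positions of x.
    sub = prefix_divide_and_conquer(pairs, op)
    out = [None] * n
    out[0] = x[0]
    for i in range(n // 2):
        out[2 * i + 1] = sub[i]
    # Step 3 (one parallel step): each remaining even position combines the
    # preceding odd-position prefix with its own element, in that order.
    for i in range(1, (n + 1) // 2):
        out[2 * i] = op(sub[i - 1], x[2 * i])
    return out


print(prefix_divide_and_conquer([1, 2, 3, 4], add))
print(prefix_divide_and_conquer(["a", "b", "c", "d", "e"], add))
```

Operand order is preserved throughout, so the sketch also works for associative operators that are not commutative, such as string concatenation.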
MATRIX MULTIPLICATION
Skip Chapters
• Chapter 6: More Shared-Memory Algorithms
• Chapter 7: Sorting and Selection Networks
• Chapter 8: Other Circuit-Level Examples