Parallel Processing & Parallel Algorithm
May 8, 2003 B4 Yuuki Horita
Chapter 3 Principles of Parallel Programming
Principles of Parallel Programming
• Data Dependency
• Processor Communication
• Mapping
• Granularity
Data Dependency
• data flow dependency
• data anti-dependency
• data output dependency
• data input dependency
• data control dependency
data flow dependency & data anti-dependency
- data flow dependency: S2 reads the value of A that S1 writes
    S1: A = B + C
    S2: D = A + E
- data anti-dependency: S2 writes the variable B that S1 reads
    S1: A = B + C
    S2: B = D + E
data output dependency & data input dependency
- data output dependency: S1 and S2 both write the variable A
    S1: A = B + C
    S2: A = D + E
- data input dependency: S1 and S2 both read the variable B
    S1: A = B + C
    S2: D = B + E
data control dependency: whether S2 or S3 executes depends on the result of S1
    S1: A = B - C
        if ( A > 0 )
    S2:   then D = 1;
    S3:   else D = 2;
        endif
    S1 ⇒ S2, S3
Dependency Graph
• G = ( V, E )
  V (vertices): statements
  E (edges): dependencies
ex)
    S1: A = B + C
    S2: B = A * 3
    S3: A = 2 * C
    S4: P = B ≧ 0
        if ( P is True )
    S5:   then D = 1
    S6:   else D = 2
        endif
    S1 → S2: flow dep. (A), anti-dep. (B)
    S1 → S3: output dep. (A), input dep. (C)
    S2 → S3: anti-dep. (A)
    S2 → S4: flow dep. (B)
    S4 → S5, S6: control dep.
Elimination of the dependencies
• data output dependency
• data anti-dependency
⇒ renaming can remove these forms of dependency
Elimination of the dependencies (2): Renaming
ex)
    S1: A = B + C        S1, S2: anti-dep. (B)
    S2: B = D + E        S1, S3: output dep. (A)
    S3: A = F + G
After renaming:
    S1: A = B + C
    S2: B' = D + E
    S3: A' = F + G
⇒ No dependency!
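As a concrete sketch in C (not from the slides; the names B2 and A2 are hypothetical stand-ins for B' and A'), the renamed version makes the three statements independent:

    #include <stdio.h>

    int main(void) {
        int B = 1, C = 2, D = 3, E = 4, F = 5, G = 6;

        /* Originally, S2 had an anti-dependency on S1 (B) and
           S3 an output dependency on S1 (A). After renaming,
           each statement writes a fresh variable, so all three
           could safely execute in parallel. */
        int A  = B + C;   /* S1 */
        int B2 = D + E;   /* S2: B' written as B2 */
        int A2 = F + G;   /* S3: A' written as A2 */

        printf("A=%d B2=%d A2=%d\n", A, B2, A2);
        return 0;
    }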
Processor Communication
• Message passing communication
  - processors communicate via communication links
• Shared memory communication
  - processors communicate via common memory
Message Passing System
[Diagram: processing elements PE1 … PEm, each with a local memory M1 … Mm, connected by an interconnection network]
Message Passing System (2)
• Send and Receive operations
  - blocking
  - nonblocking
• System
  - synchronous: blocking send and blocking receive operations
  - asynchronous: nonblocking send and blocking receive operations (the messages are buffered)
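The slides name no concrete library; as an illustrative sketch (my assumption, not part of the original), blocking send and receive can be written in C with MPI:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* blocking send: returns once the buffer can be reused */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking receive: waits until the message has arrived */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("PE1 received %d from PE0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Built and run with, e.g., mpicc and mpirun -np 2; the nonblocking variants would use MPI_Isend/MPI_Irecv.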
Shared Memory System
[Diagram: a global memory of modules M1 … Mm connected through an interconnection network to processing elements PE1 … PEm]
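A minimal sketch of shared-memory communication in C with POSIX threads (my choice of illustration; the slides prescribe no API): two threads of control communicate through a common variable.

    #include <stdio.h>
    #include <pthread.h>

    int shared = 0;   /* variable in common memory */

    void *writer(void *arg) {
        shared = 42;              /* one thread writes ... */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        pthread_join(t, NULL);    /* wait, so the write is visible */
        printf("main read %d from shared memory\n", shared);
        return 0;
    }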
Mapping
… matching parallel algorithms to parallel architectures
Mapping to:
- asynchronous architecture
- synchronous architecture
- distributed architecture
Mapping to Asynchronous Architecture
Mapping a program to an asynchronous shared-memory computer has the following steps:
• Allocate each statement in the program to a processor
• Allocate each variable to a memory
• Specify the control flow for each processor
The sender may send many messages without the receiver removing them from the channel ⇒ messages need to be buffered.
Mapping to Synchronous Architecture
• Mapping a program is done as for the asynchronous architecture
• a common clock is used for synchronization purposes
• each processor executes an instruction at each step (at each clock tick)
• only one message exists at any one time on a channel ⇒ no buffering is needed
Mapping to Distributed Architecture
• each local memory is accessible only by its owning processor
• each channel connects only a pair of processors
• Mapping a program is the same as in the asynchronous shared-memory architecture, except that each variable is allocated either to a processor or to a channel
Granularity
… the ratio of the amount of computation to the amount of communication
• fine: parallelism at statement level
• medium: parallelism at procedure level
• coarse: parallelism at program level
Program Level Parallelism
• a program creates a new process by creating a complete copy of itself
  - fork() (UNIX)
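A minimal UNIX sketch in C: fork() duplicates the calling process, and the copy runs in parallel with the original.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        pid_t pid = fork();       /* create a complete copy of this process */
        if (pid == 0) {
            printf("child: running in parallel with the parent\n");
        } else {
            wait(NULL);           /* parent waits for the child to finish */
            printf("parent: child %d finished\n", (int)pid);
        }
        return 0;
    }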
Statement Level Parallelism
・ Parbegin-Parend block
    Parbegin
      Statement1
      Statement2
      :
      Statementn
    Parend
⇒ the statements Statement1 … Statementn are executed in parallel
Statement Level Parallelism (2)
ex) (a + b) * (c + d) - (e / f)
    Parbegin
        Parbegin
            t1 = a + b
            t2 = c + d
        Parend
        t4 = t1 * t2
        t3 = e / f
    Parend
    t5 = t4 - t3
(the inner block followed by t4 forms one branch; t3 = e / f runs in parallel with it)
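The slides use pseudocode; one way to realize the same block in C is with OpenMP parallel sections (my assumption, compiled with -fopenmp; the values for a … f are arbitrary):

    #include <stdio.h>

    int main(void) {
        double a = 1, b = 2, c = 3, d = 4, e = 8, f = 2;
        double t1, t2, t3, t4, t5;

        #pragma omp parallel sections
        {
            #pragma omp section
            {   /* first branch: the two sums, then the product */
                #pragma omp parallel sections
                {
                    #pragma omp section
                    t1 = a + b;
                    #pragma omp section
                    t2 = c + d;
                }
                t4 = t1 * t2;
            }
            #pragma omp section
            t3 = e / f;     /* second branch, in parallel with the first */
        }
        t5 = t4 - t3;
        printf("t5 = %g\n", t5);    /* (1+2)*(3+4) - 8/2 = 17 */
        return 0;
    }

Nested parallel regions may be serialized unless nesting is enabled, which does not affect correctness here.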
Statement Level Parallelism (3)
・ Fork, Join, Quit
  - Fork x: causes a new process to be created, starting execution at the instruction labeled x
  - Join t, y: t = t - 1; if ( t = 0 ) then go to y
  - Quit: the process terminates
Statement Level Parallelism (4)
ex) (a + b) * (c + d) - (e / f)
    n = 2
    m = 2
    Fork P2
    Fork P3
P1: t1 = a + b;   Join m, P4; Quit;
P2: t2 = c + d;   Join m, P4; Quit;
P4: t4 = t1 * t2; Join n, P5; Quit;
P3: t3 = e / f;   Join n, P5; Quit;
P5: t5 = t4 - t3
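A C sketch of the same dataflow with POSIX threads (my illustration; pthread_join stands in for the counting Join primitive, and the values for a … f are arbitrary):

    #include <stdio.h>
    #include <pthread.h>

    double a = 1, b = 2, c = 3, d = 4, e = 8, f = 2;
    double t1, t2, t3, t4, t5;

    void *P1(void *arg) { t1 = a + b; return NULL; }
    void *P2(void *arg) { t2 = c + d; return NULL; }
    void *P3(void *arg) { t3 = e / f; return NULL; }

    int main(void) {
        pthread_t p1, p2, p3;
        pthread_create(&p1, NULL, P1, NULL);   /* Fork P1, P2, P3 */
        pthread_create(&p2, NULL, P2, NULL);
        pthread_create(&p3, NULL, P3, NULL);

        pthread_join(p1, NULL);   /* Join m=2: t4 needs t1 and t2 */
        pthread_join(p2, NULL);
        t4 = t1 * t2;             /* P4 */

        pthread_join(p3, NULL);   /* Join n=2: t5 needs t4 and t3 */
        t5 = t4 - t3;             /* P5 */
        printf("t5 = %g\n", t5);
        return 0;
    }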