EECS 570 • Notes on Chapter 2 – Parallel Programs (Fall 2003, rev1)
Terminology
• Task:
  • Programmer-defined sequential piece of work
  • Concurrency exists only across tasks
  • Amount of work per task may be small or large (fine vs. coarse grain)
• Process (thread):
  • Abstract entity that performs tasks
  • Equivalent to the OS concepts of process/thread
  • Must communicate and synchronize with other processes
  • Executes on a processor (typically a one-to-one mapping of processes to processors)
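To make the task/process distinction concrete, here is a minimal C/pthreads sketch (an illustration, not from the original notes): the tasks are row updates, and a small set of threads, typically one per processor, performs them.

```c
/* Sketch (assumed example): a "task" is one row update; a thread
 * (process) is the abstract entity that performs tasks. */
#include <pthread.h>

#define NROWS 64
#define NTHREADS 4                      /* typically one per processor */

static void do_row(int row) { (void)row; /* ... sequential piece of work ... */ }

static void *thread_body(void *arg) {
    long id = (long)arg;
    /* each thread performs the tasks assigned to it */
    for (int row = (int)id; row < NROWS; row += NTHREADS)
        do_row(row);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, thread_body, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```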
Steps in Creating a Parallel Program
• Decomposition → Assignment → Orchestration → Mapping (each covered below)
Decomposition
• Break up the computation into tasks to be divided among processes
  • can be static, quasi-static, or dynamic
  • i.e., identify concurrency and decide the level at which to exploit it
• Goal: enough tasks to keep processes busy... but not too many
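As a small illustration (assumed parameters, not from the notes), the sketch below decomposes an n-element computation into tasks whose granularity is set by a chunk size: more tasks than processes leaves room for load balancing, while very small chunks make per-task overhead dominate.

```c
/* Sketch (assumed example): decomposing an n-element computation
 * into tasks; the chunk size controls the granularity. */
#include <stdio.h>

int main(void) {
    int n = 1 << 20;        /* total work: n elements              */
    int p = 8;              /* processes available                 */
    int tasks_per_proc = 4; /* tunable: >1 leaves room to balance  */
    int ntasks = p * tasks_per_proc;
    int chunk = (n + ntasks - 1) / ntasks;

    for (int t = 0; t < ntasks; t++) {
        int lo = t * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;
        printf("task %2d: elements [%d, %d)\n", t, lo, hi);
    }
    return 0;
}
```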
Amdahl's Law
• Assume a fraction s of the sequential execution is inherently serial
  • the remainder (1 - s) can be perfectly parallelized
• Speedup with p processors:  Speedup(p) = 1 / ( s + (1 - s)/p )
• Limit as p → ∞:  Speedup → 1/s
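A quick numeric check of the formula (the serial fraction s = 0.05 is just an assumed value):

```c
/* Sketch: evaluating Amdahl's law, Speedup(p) = 1 / (s + (1-s)/p). */
#include <stdio.h>

int main(void) {
    double s = 0.05;                      /* assumed serial fraction */
    int procs[] = {1, 4, 16, 64, 1024};
    for (int i = 0; i < 5; i++) {
        int p = procs[i];
        double speedup = 1.0 / (s + (1.0 - s) / p);
        printf("p = %4d  speedup = %6.2f\n", p, speedup);
    }
    printf("limit (p -> infinity) = %.2f\n", 1.0 / s);   /* 1/s = 20 */
    return 0;
}
```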
Aside on Cost-Effective Computing
• Isn't Speedup(P) < P inefficient?
  • If only throughput matters, why not use P separate computers instead?
• But much of a computer's cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]
• Let Costup(P) = Cost(P) / Cost(1)
• Parallel computing is cost-effective when Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge w/500MB: Costup(32) = 8.6
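The sketch below uses an invented cost model (a fixed base cost for memory/chassis plus a per-processor cost; the numbers are assumptions, not the PowerChallenge figures) to show how Speedup(P) > Costup(P) can hold even when Speedup(P) is well below P.

```c
/* Sketch (illustrative cost model, assumed numbers):
 * cost(p) = base (memory, chassis, I/O) + p * per_processor.
 * Parallel computing is cost-effective when Speedup(p) > Costup(p). */
#include <stdio.h>

int main(void) {
    double base = 40000.0;      /* assumed fixed cost: memory, etc.  */
    double per_cpu = 5000.0;    /* assumed per-processor cost        */
    double s = 0.05;            /* assumed serial fraction           */
    for (int p = 1; p <= 32; p *= 2) {
        double costup  = (base + p * per_cpu) / (base + per_cpu);
        double speedup = 1.0 / (s + (1.0 - s) / p);
        printf("p=%2d  costup=%5.2f  speedup=%5.2f  %s\n",
               p, costup, speedup,
               speedup > costup ? "cost-effective" : "not cost-effective");
    }
    return 0;
}
```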
Assignment
• Assign tasks to processes
  • again, can be static, dynamic, or in between
• Goals:
  • Balance the workload
  • Reduce communication
  • Minimize management overhead
• Decomposition + Assignment = Partitioning
  • mostly independent of architecture/programming model
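For the dynamic end of the spectrum, here is a minimal pthreads sketch (an assumed example) of self-scheduling from a shared task counter, which balances load automatically at the cost of one lock acquisition per task.

```c
/* Sketch (assumed example): dynamic assignment from a shared work
 * queue -- each thread repeatedly grabs the next unclaimed row. */
#include <pthread.h>

#define NROWS 100
#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_row = 0;                  /* shared task counter */

static void process_row(int row) { (void)row; /* ... work on the row ... */ }

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);        /* grab the next task  */
        int row = next_row++;
        pthread_mutex_unlock(&lock);
        if (row >= NROWS) break;          /* no tasks left       */
        process_row(row);                 /* a thread that finishes a row
                                             quickly just takes another */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```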
Orchestration
• How do we achieve task communication, synchronization, and scheduling, given the programming model?
  • data structures (naming)
  • task scheduling
  • communication: messages, shared-data accesses
  • synchronization: locks, semaphores, barriers, etc.
• Goals:
  • Reduce the cost of communication and synchronization
  • Preserve data locality (reduce communication, enhance caching)
  • Schedule tasks to satisfy dependencies early
  • Reduce the overhead of parallelism management
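A minimal pthreads sketch (assumed example; note that pthread_barrier_t is a POSIX option not provided on every platform) of two of the synchronization mechanisms listed above: a lock protecting a shared update, and a barrier separating two phases so every thread sees the completed result.

```c
/* Sketch (assumed example): a lock for mutual exclusion and a
 * barrier separating two phases of the computation. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;
static double global_sum = 0.0;

static void *worker(void *arg) {
    long id = (long)arg;

    /* phase 1: each thread contributes to a shared value */
    pthread_mutex_lock(&lock);            /* mutual exclusion         */
    global_sum += (double)id;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier);       /* everyone finishes phase 1 */

    /* phase 2: now safe to read the completed sum */
    printf("thread %ld sees sum = %f\n", id, global_sum);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```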
Mapping
• Assign processes to processors
  • usually up to the OS, perhaps with user hints/preferences
  • usually assumed one-to-one and static
• Terminology:
  • space sharing
  • gang scheduling
  • processor affinity
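As an example of a user hint, the Linux-specific call below (an assumed example; pthread_setaffinity_np is a non-portable GNU extension) pins the calling thread to one processor. Normally the mapping is simply left to the OS.

```c
/* Sketch (Linux-specific, assumed example): expressing a processor
 * affinity hint by pinning the calling thread to processor 0. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                              /* request processor 0 */
    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "affinity hint rejected (error %d)\n", err);
    else
        printf("pinned to processor 0\n");
    return 0;
}
```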
Parallelizing Computation vs. Data
• The view above is centered around computation
  • computation is decomposed and assigned (partitioned)
• Partitioning data is often a natural view too
  • computation follows data: "owner computes"
  • e.g., grid example; data mining; High Performance Fortran (HPF)
• But not always sufficient
  • the distinction between computation and data is stronger in many applications
  • e.g., Barnes-Hut, Raytrace
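A sketch of the owner-computes idea (assumed example): the data partition, here a block of grid rows per process, determines which process performs each update.

```c
/* Sketch (assumed example): "owner computes" -- grid rows are
 * partitioned across processes, and each process updates only the
 * rows it owns; the computation follows the data partition. */
#include <stdio.h>

#define NROWS 16
#define NPROCS 4

static int owner(int row) { return row / (NROWS / NPROCS); } /* block owner */

static void update_row(int row) { (void)row; /* ... grid update ... */ }

static void sweep(int my_id) {
    for (int row = 0; row < NROWS; row++)
        if (owner(row) == my_id)          /* owner-computes rule */
            update_row(row);
}

int main(void) {
    for (int p = 0; p < NPROCS; p++) {    /* stand-in for NPROCS processes */
        sweep(p);
        printf("process %d updated rows [%d, %d)\n",
               p, p * (NROWS / NPROCS), (p + 1) * (NROWS / NPROCS));
    }
    return 0;
}
```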
Assignment
• Static assignments (given decomposition into rows)
  • block: row i is assigned to process floor(i/(n/p)), i.e., each process gets a contiguous block of n/p rows
  • cyclic: process k is assigned rows k, k+p, k+2p, and so on
• Dynamic
  • get a row index, work on the row, get a new row, and so on
• Static assignment reduces concurrency (from n to p)
  • block assignment reduces communication by keeping adjacent rows together
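The short program below (assumed example, with n = 12 rows and p = 3 processes) just prints which process gets each row under the two static assignments.

```c
/* Sketch (assumed example): block vs. cyclic static assignment of
 * n rows to p processes. */
#include <stdio.h>

int main(void) {
    int n = 12, p = 3;

    printf("block:  row i -> process floor(i/(n/p))\n");
    for (int i = 0; i < n; i++)
        printf("  row %2d -> process %d\n", i, i / (n / p));

    printf("cyclic: process k gets rows k, k+p, k+2p, ...\n");
    for (int i = 0; i < n; i++)
        printf("  row %2d -> process %d\n", i, i % p);

    return 0;
}
```

The contrast is the point of the slide: block keeps adjacent rows on the same process (less boundary communication), while cyclic spreads them out (better load balance if work per row varies).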