EECS 570 • Notes on Chapter 2 – Parallel Programs (Fall 2003, rev1)
Terminology
• Task:
  • Programmer-defined sequential piece of work
  • Concurrency exists only across tasks
  • Amount of work per task may be small or large (fine vs. coarse grain)
• Process (thread):
  • Abstract entity that performs tasks
  • Equivalent to the OS concepts of process/thread
  • Must communicate and synchronize with other processes
  • Executes on a processor (typically a one-to-one mapping of processes to processors)
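To make the task/process distinction concrete, here is a minimal C/pthreads sketch (an illustration, not from the original notes): the tasks are row updates, and a small set of threads, typically one per processor, performs them.

```c
/* Sketch (assumed example): a "task" is one row update; a thread
 * (process) is the abstract entity that performs tasks. */
#include <pthread.h>

#define NROWS 64
#define NTHREADS 4                      /* typically one per processor */

static void do_row(int row) { (void)row; /* ... sequential piece of work ... */ }

static void *thread_body(void *arg) {
    long id = (long)arg;
    /* each thread performs the tasks assigned to it */
    for (int row = (int)id; row < NROWS; row += NTHREADS)
        do_row(row);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, thread_body, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```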
Steps in Creating a Parallel Program
• Decomposition → Assignment → Orchestration → Mapping (each covered below)
Decomposition
• Break up the computation into tasks to be divided among processes
  • can be static, quasi-static, or dynamic
  • i.e., identify concurrency and decide the level at which to exploit it
• Goal: enough tasks to keep processes busy... but not too many
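As a small illustration (assumed parameters, not from the notes), the sketch below decomposes an n-element computation into tasks whose granularity is set by a chunk size: more tasks than processes leaves room for load balancing, while very small chunks make per-task overhead dominate.

```c
/* Sketch (assumed example): decomposing an n-element computation
 * into tasks; the chunk size controls the granularity. */
#include <stdio.h>

int main(void) {
    int n = 1 << 20;        /* total work: n elements              */
    int p = 8;              /* processes available                 */
    int tasks_per_proc = 4; /* tunable: >1 leaves room to balance  */
    int ntasks = p * tasks_per_proc;
    int chunk = (n + ntasks - 1) / ntasks;

    for (int t = 0; t < ntasks; t++) {
        int lo = t * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;
        printf("task %2d: elements [%d, %d)\n", t, lo, hi);
    }
    return 0;
}
```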
Amdahl's Law
• Assume a fraction s of the sequential execution is inherently serial
  • the remainder (1 - s) can be perfectly parallelized
• Speedup with p processors:  Speedup(p) = 1 / ( s + (1 - s)/p )
• Limit as p → ∞:  Speedup → 1/s
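A quick numeric check of the formula (the serial fraction s = 0.05 is just an assumed value):

```c
/* Sketch: evaluating Amdahl's law, Speedup(p) = 1 / (s + (1-s)/p). */
#include <stdio.h>

int main(void) {
    double s = 0.05;                      /* assumed serial fraction */
    int procs[] = {1, 4, 16, 64, 1024};
    for (int i = 0; i < 5; i++) {
        int p = procs[i];
        double speedup = 1.0 / (s + (1.0 - s) / p);
        printf("p = %4d  speedup = %6.2f\n", p, speedup);
    }
    printf("limit (p -> infinity) = %.2f\n", 1.0 / s);   /* 1/s = 20 */
    return 0;
}
```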
Aside on Cost-Effective Computing
• Isn't Speedup(P) < P inefficient?
  • If only throughput matters, why not use P separate computers instead?
• But much of a computer's cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]
• Let Costup(P) = Cost(P) / Cost(1)
• Parallel computing is cost-effective when Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge w/500MB: Costup(32) = 8.6
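The sketch below uses an invented cost model (a fixed base cost for memory/chassis plus a per-processor cost; the numbers are assumptions, not the PowerChallenge figures) to show how Speedup(P) > Costup(P) can hold even when Speedup(P) is well below P.

```c
/* Sketch (illustrative cost model, assumed numbers):
 * cost(p) = base (memory, chassis, I/O) + p * per_processor.
 * Parallel computing is cost-effective when Speedup(p) > Costup(p). */
#include <stdio.h>

int main(void) {
    double base = 40000.0;      /* assumed fixed cost: memory, etc.  */
    double per_cpu = 5000.0;    /* assumed per-processor cost        */
    double s = 0.05;            /* assumed serial fraction           */
    for (int p = 1; p <= 32; p *= 2) {
        double costup  = (base + p * per_cpu) / (base + per_cpu);
        double speedup = 1.0 / (s + (1.0 - s) / p);
        printf("p=%2d  costup=%5.2f  speedup=%5.2f  %s\n",
               p, costup, speedup,
               speedup > costup ? "cost-effective" : "not cost-effective");
    }
    return 0;
}
```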
Assignment
• Assign tasks to processes
  • again, can be static, dynamic, or in between
• Goals:
  • Balance the workload
  • Reduce communication
  • Minimize management overhead
• Decomposition + Assignment = Partitioning
  • mostly independent of architecture/programming model
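For the dynamic end of the spectrum, here is a minimal pthreads sketch (an assumed example) of self-scheduling from a shared task counter, which balances load automatically at the cost of one lock acquisition per task.

```c
/* Sketch (assumed example): dynamic assignment from a shared work
 * queue -- each thread repeatedly grabs the next unclaimed row. */
#include <pthread.h>

#define NROWS 100
#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_row = 0;                  /* shared task counter */

static void process_row(int row) { (void)row; /* ... work on the row ... */ }

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);        /* grab the next task  */
        int row = next_row++;
        pthread_mutex_unlock(&lock);
        if (row >= NROWS) break;          /* no tasks left       */
        process_row(row);                 /* a thread that finishes a row
                                             quickly just takes another */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```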
Orchestration
• How do we achieve task communication, synchronization, and scheduling, given the programming model?
  • data structures (naming)
  • task scheduling
  • communication: messages, shared-data accesses
  • synchronization: locks, semaphores, barriers, etc.
• Goals:
  • Reduce the cost of communication and synchronization
  • Preserve data locality (reduce communication, enhance caching)
  • Schedule tasks to satisfy dependencies early
  • Reduce the overhead of parallelism management
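A minimal pthreads sketch (assumed example; note that pthread_barrier_t is a POSIX option not provided on every platform) of two of the synchronization mechanisms listed above: a lock protecting a shared update, and a barrier separating two phases so every thread sees the completed result.

```c
/* Sketch (assumed example): a lock for mutual exclusion and a
 * barrier separating two phases of the computation. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;
static double global_sum = 0.0;

static void *worker(void *arg) {
    long id = (long)arg;

    /* phase 1: each thread contributes to a shared value */
    pthread_mutex_lock(&lock);            /* mutual exclusion         */
    global_sum += (double)id;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier);       /* everyone finishes phase 1 */

    /* phase 2: now safe to read the completed sum */
    printf("thread %ld sees sum = %f\n", id, global_sum);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```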
Mapping
• Assign processes to processors
  • usually up to the OS, perhaps with user hints/preferences
  • usually assumed one-to-one and static
• Terminology:
  • space sharing
  • gang scheduling
  • processor affinity
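As an example of a user hint, the Linux-specific call below (an assumed example; pthread_setaffinity_np is a non-portable GNU extension) pins the calling thread to one processor. Normally the mapping is simply left to the OS.

```c
/* Sketch (Linux-specific, assumed example): expressing a processor
 * affinity hint by pinning the calling thread to processor 0. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                              /* request processor 0 */
    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "affinity hint rejected (error %d)\n", err);
    else
        printf("pinned to processor 0\n");
    return 0;
}
```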
Parallelizing Computation vs. Data
• The view above is centered around computation
  • computation is decomposed and assigned (partitioned)
• Partitioning data is often a natural view too
  • computation follows data: "owner computes"
  • e.g., grid example; data mining; High Performance Fortran (HPF)
• But not always sufficient
  • the distinction between computation and data is stronger in many applications
  • e.g., Barnes-Hut, Raytrace
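A sketch of the owner-computes idea (assumed example): the data partition, here a block of grid rows per process, determines which process performs each update.

```c
/* Sketch (assumed example): "owner computes" -- grid rows are
 * partitioned across processes, and each process updates only the
 * rows it owns; the computation follows the data partition. */
#include <stdio.h>

#define NROWS 16
#define NPROCS 4

static int owner(int row) { return row / (NROWS / NPROCS); } /* block owner */

static void update_row(int row) { (void)row; /* ... grid update ... */ }

static void sweep(int my_id) {
    for (int row = 0; row < NROWS; row++)
        if (owner(row) == my_id)          /* owner-computes rule */
            update_row(row);
}

int main(void) {
    for (int p = 0; p < NPROCS; p++) {    /* stand-in for NPROCS processes */
        sweep(p);
        printf("process %d updated rows [%d, %d)\n",
               p, p * (NROWS / NPROCS), (p + 1) * (NROWS / NPROCS));
    }
    return 0;
}
```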
Assignment
• Static assignments (given decomposition into rows)
  • block: row i is assigned to process floor(i/(n/p)), i.e., each process gets a contiguous block of n/p rows
  • cyclic: process k is assigned rows k, k+p, k+2p, and so on
• Dynamic
  • get a row index, work on the row, get a new row, and so on
• Static assignment reduces concurrency (from n to p)
  • block assignment reduces communication by keeping adjacent rows together
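The short program below (assumed example, with n = 12 rows and p = 3 processes) just prints which process gets each row under the two static assignments.

```c
/* Sketch (assumed example): block vs. cyclic static assignment of
 * n rows to p processes. */
#include <stdio.h>

int main(void) {
    int n = 12, p = 3;

    printf("block:  row i -> process floor(i/(n/p))\n");
    for (int i = 0; i < n; i++)
        printf("  row %2d -> process %d\n", i, i / (n / p));

    printf("cyclic: process k gets rows k, k+p, k+2p, ...\n");
    for (int i = 0; i < n; i++)
        printf("  row %2d -> process %d\n", i, i % p);

    return 0;
}
```

The contrast is the point of the slide: block keeps adjacent rows on the same process (less boundary communication), while cyclic spreads them out (better load balance if work per row varies).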