CS 584 Lecture 19 • Test? • Assignment • Glenda program • Project Proposal is coming up! (March 13) • 2 pages • 3 references • 1 page plan of action with goal dates.
Modular Programming • Control Program Complexity • Encapsulation • Each module provides an interface • Limit data access except through the interface. • Composition • Develop programs by combining modules • Reuse
Modular Design and Parallel Programming • Differs from traditional sequential modular programming. • We must consider other issues: • Data distribution • Module composition
Data Distribution • No simple answer. • Data distribution changes may necessitate different module structures and vice-versa. • Best Solution: • Design your code to be data distribution neutral • Not necessarily easy! • Different data distribution schemes sometimes dictate totally different algorithms.
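One way to read "data distribution neutral" is that the algorithm only asks an interface who owns an element and never hard-codes the layout. The sketch below is a minimal illustration of that idea; the function names (`row_block`, `col_block`, `submatrix_block`, `distributed_sum`) are hypothetical and the "processors" are simulated by a loop over ranks.

```python
def row_block(n, p):
    """Owner map for a row distribution of an n x n matrix over p processors."""
    rows = -(-n // p)  # ceiling division: rows per processor
    return lambda i, j: i // rows

def col_block(n, p):
    """Owner map for a column distribution of an n x n matrix over p processors."""
    cols = -(-n // p)  # ceiling division: columns per processor
    return lambda i, j: j // cols

def submatrix_block(n, q):
    """Owner map for a q x q grid of square submatrices (q*q processors)."""
    side = -(-n // q)  # ceiling division: block edge length
    return lambda i, j: (i // side) * q + (j // side)

def local_sum(matrix, owner, rank):
    """Sum only the elements that `rank` owns under the given owner map."""
    n = len(matrix)
    return sum(matrix[i][j]
               for i in range(n) for j in range(n)
               if owner(i, j) == rank)

def distributed_sum(matrix, owner, p):
    """Combine per-processor partial sums; the code never inspects the layout."""
    return sum(local_sum(matrix, owner, rank) for rank in range(p))
```

The same `distributed_sum` works unchanged for all three layouts because the distribution is hidden behind the owner map. As the slide warns, this is not always possible: some algorithms change shape entirely when the distribution changes.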
Sequential Composition • Sequentially move from one parallel module or operation to the next. • SPMD • Great target for parallel library functions • ScaLAPACK • Not necessarily very flexible.
Parallel Composition • Different parts of the computer execute different programs. • Can enhance scalability • locality • Can also decrease memory requirements • less code and data replication
Concurrent Composition • Components are data-driven. • More directly matches task-channel model • Since the components are data-driven overlap of communication and computation is easier. • Can simplify design decisions.
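A data-driven component can be sketched as a stage that does nothing until an item arrives on its input channel. The example below is a small simulation that uses Python threads and `queue.Queue` objects as stand-ins for tasks and channels; `stage`, `run_pipeline`, and the `DONE` sentinel are hypothetical names chosen for illustration.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the data stream

def stage(fn, inbox, outbox):
    """Apply fn to each item as it arrives; the stage is driven only by data."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the end marker downstream and stop
            return
        outbox.put(fn(item))

def run_pipeline(items, fns):
    """Connect one stage per function with channels and stream items through."""
    channels = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, channels[i], channels[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        channels[0].put(item)
    channels[0].put(DONE)
    results = []
    while True:
        out = channels[-1].get()
        if out is DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Because each stage starts work as soon as its input is available, a later stage can process item 1 while an earlier stage is still working on item 2, which is the overlap the slide refers to.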
Communication and Computation • Overlap communication and computation • 2 basic methods • Send, then compute, then receive • Send, post an asynchronous receive, then compute something else until the receive completes. • Don't do send-receive pairs unless you must • receive-send pairs are the worst.
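The second method (post an asynchronous receive, then compute until it completes) can be sketched without a message-passing library. The example below is a simulation only: a background thread with a sleep stands in for the network, and `slow_receive`, `local_work`, and `overlapped` are hypothetical names. In MPI, nonblocking calls such as `MPI_Irecv` and `MPI_Wait` play the corresponding roles.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_receive(delay):
    """Stand-in for a message that takes `delay` seconds to arrive."""
    time.sleep(delay)
    return [1, 2, 3]

def local_work(n):
    """Computation that does not depend on the incoming message."""
    return sum(i * i for i in range(n))

def overlapped(delay=0.05, n=10000):
    """Post the receive, compute in the meantime, then wait for the message."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_receive, delay)  # post asynchronous receive
        partial = local_work(n)                     # compute something else
        message = pending.result()                  # block until it completes
    return partial + sum(message)
```

The total time is roughly the larger of the transfer time and the compute time, instead of their sum as it would be with a blocking receive placed before the computation.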
Case Study: Image Processing • [Figure: data flow diagram for an image processing pipeline]
FFT Algorithm Choices • FFTs are done by row and then column. • Sequential composition • Everybody does row FFT then column FFT • Parallel Composition • Some do row FFT others do column FFT
Case Study: Matrix Multiply • Goal: Data-distribution neutral • Three basic ways to distribute • row • column • submatrix • Question? • Does our library need different algorithms?
One Dimensional Decomposition • Each processor "owns" black portion • To compute the owned portion of the answer, each processor requires all of A. • This affects data-distribution.
Two Dimensional Decomposition • Requires less data per processor • Algorithm can be performed stepwise.
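To make "less data per processor" concrete, here is a rough storage count. It assumes square n × n matrices, P processors, a √P × √P processor grid in the two-dimensional case, and one extra buffer for the A block received at each step. These symbols and assumptions are illustrative and do not come from the slides.

```latex
\begin{aligned}
M_{\text{1D}} &\approx \underbrace{n^2}_{\text{all of } A}
               + \underbrace{\frac{2n^2}{P}}_{\text{owned parts of } B,\ C} \\
M_{\text{2D}} &\approx \underbrace{\frac{3n^2}{P}}_{\text{blocks of } A,\ B,\ C}
               + \underbrace{\frac{n^2}{P}}_{\text{receive buffer}}
               = \frac{4n^2}{P}
\end{aligned}
```

For example, with n = 1024 and P = 16, the one-dimensional count is about 1,179,648 elements per processor, against about 262,144 for the two-dimensional case.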
• Broadcast an A submatrix to the other processors in the row. • Compute. • Rotate the B submatrix upwards.
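The three steps above can be simulated serially to check that they produce the full product. The sketch below assumes a q × q processor grid with one square block per processor and a matrix size divisible by q; the helper names (`split`, `join`, `mul_add`, `broadcast_multiply_roll`) are hypothetical. This broadcast-multiply-roll pattern is often called Fox's algorithm, a name the slides do not use.

```python
def split(m, q):
    """Split an n x n matrix (list of lists) into a q x q grid of blocks."""
    s = len(m) // q
    return [[[row[c * s:(c + 1) * s] for row in m[r * s:(r + 1) * s]]
             for c in range(q)] for r in range(q)]

def join(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    out = []
    for r in range(q):
        for k in range(len(blocks[r][0])):
            out.append([x for c in range(q) for x in blocks[r][c][k]])
    return out

def mul_add(c, a, b):
    """In place: c += a * b for square blocks of equal size."""
    s = len(a)
    for i in range(s):
        for j in range(s):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(s))

def broadcast_multiply_roll(a, b, q):
    """Simulate the stepwise 2D algorithm on a q x q grid of processors."""
    A, B = split(a, q), split(b, q)
    s = len(a) // q
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for step in range(q):
        for r in range(q):
            src = (r + step) % q  # column whose A block is broadcast in row r
            for c in range(q):
                mul_add(C[r][c], A[r][src], B[r][c])  # compute
        B = B[1:] + B[:1]  # rotate B upwards: grid row r takes row r + 1
    return join(C)
```

After q steps every processor has seen each matching pair of A and B blocks exactly once, so each C block holds its complete sum.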
Analysis • Performance analysis reveals that the two-dimensional decomposition is always better than the one-dimensional decomposition. • So our matrix multiply only needs one algorithm • Might need a redistribution algorithm to be totally data distribution neutral • However, this is not the best algorithm.
Systolic Matrix Multiply • Replace the A row broadcast with a rotation similar to the B column rotation. • Eliminates the expensive broadcast and replaces it with nearest neighbor comm. • Communication costs much less. • Changes data distribution. • Should we include it in a library? • Redistribution costs?
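A serial simulation makes the change in data distribution visible: before the first step, the A and B blocks must be skewed so that matching blocks meet on each processor. The sketch below assumes a q × q processor grid with one square block per processor and a matrix size divisible by q; the helper names are hypothetical. This rotate-only scheme is commonly known as Cannon's algorithm, a name the slides do not use.

```python
def split(m, q):
    """Split an n x n matrix (list of lists) into a q x q grid of blocks."""
    s = len(m) // q
    return [[[row[c * s:(c + 1) * s] for row in m[r * s:(r + 1) * s]]
             for c in range(q)] for r in range(q)]

def join(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    out = []
    for r in range(q):
        for k in range(len(blocks[r][0])):
            out.append([x for c in range(q) for x in blocks[r][c][k]])
    return out

def mul_add(c, a, b):
    """In place: c += a * b for square blocks of equal size."""
    s = len(a)
    for i in range(s):
        for j in range(s):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(s))

def systolic_multiply(a, b, q):
    """Simulate the rotate-only algorithm on a q x q grid of processors."""
    A, B = split(a, q), split(b, q)
    s = len(a) // q
    # Initial skew (the changed data distribution): shift grid row r of A
    # left by r positions and grid column c of B up by c positions.
    A = [[A[r][(c + r) % q] for c in range(q)] for r in range(q)]
    B = [[B[(r + c) % q][c] for c in range(q)] for r in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        for r in range(q):
            for c in range(q):
                mul_add(C[r][c], A[r][c], B[r][c])  # purely local compute
        A = [row[1:] + row[:1] for row in A]  # rotate A left by one
        B = B[1:] + B[:1]                     # rotate B up by one
    return join(C)
```

Every transfer here is a shift to a nearest neighbor in the grid. The price is the initial skew, which is where the redistribution cost raised on the slide comes from if the input arrives in a plain block layout.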
Conclusion • Modular design is good. • Parallelism introduces different issues: • Data distribution • Module composition • Sequential composition easy but inflexible. • Parallel composition can improve locality. • Concurrent composition is most general.