
CS 584 Lecture 19




  1. CS 584 Lecture 19 • Test? • Assignment • Glenda program • Project Proposal is coming up! (March 13) • 2 pages • 3 references • 1 page plan of action with goal dates.

  2. Modular Programming • Control Program Complexity • Encapsulation • Each module provides an interface • Limit data access except through the interface. • Composition • Develop programs by combining modules • Reuse

  3. Modular Design and Parallel Programming • Different from traditional sequential modular programming. • We must consider additional issues: • Data distribution • Module composition

  4. Data Distribution • No simple answer. • Data distribution changes may necessitate different module structures and vice-versa. • Best Solution: • Design your code to be data distribution neutral • Not necessarily easy! • Different data distribution schemes sometimes dictate totally different algorithms.
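One way to work toward data-distribution neutrality is to hide the ownership calculation behind a small interface, so the same library code works for row, column, or submatrix distributions. A minimal Python sketch (the helper name `owned_range` is illustrative, not from the lecture):

```python
# Hypothetical helper: which contiguous block of n items does process
# `rank` out of `p` own?  Applying it along one dimension gives a row or
# column distribution; applying it along both gives a submatrix one.

def owned_range(n, p, rank):
    """Return the half-open (start, stop) block owned by `rank`."""
    base, extra = divmod(n, p)            # spread the remainder evenly
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Row distribution of 8 rows over 3 processes:
print([owned_range(8, 3, r) for r in range(3)])   # [(0, 3), (3, 6), (6, 8)]
```

Code written against this interface never hard-codes a distribution, which is exactly what lets the distribution change without the module structure changing.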

  5. Module Composition

  6. Sequential Composition • Sequentially move from one parallel module or operation to the next. • SPMD • Great target for parallel library functions • ScaLAPACK • Not necessarily very flexible.

  7. Parallel Composition • Different processors execute different programs. • Can enhance scalability • locality • Can also decrease memory requirements • less code and data replication

  8. Concurrent Composition • Components are data-driven. • More directly matches the task-channel model. • Since the components are data-driven, overlap of communication and computation is easier. • Can simplify design decisions.

  9. Communication and Computation • Overlap communication and computation. • 2 basic methods: • Send, then compute, then receive. • Send, post an asynchronous receive, then compute something else until the receive completes. • Don't do send-receive pairs unless you must. • Receive-send pairs are the worst.
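In MPI this second method is the nonblocking `MPI_Irecv` / compute / `MPI_Wait` pattern. A rough stand-in for the shape of the overlap, using a Python thread in place of an in-flight message (no real message passing here, just the structure):

```python
# Sketch only: a thread plays the role of a neighbor whose message is
# in flight while we do useful local work.
import threading
import queue

inbox = queue.Queue()

def remote_sender():
    # stands in for the neighbor's send arriving asynchronously
    inbox.put([1, 2, 3])

pending = threading.Thread(target=remote_sender)
pending.start()                            # "post the asynchronous receive"

local = sum(x * x for x in range(100))     # compute something else meanwhile

pending.join()                             # "wait for the receive to complete"
msg = inbox.get()
print(local, msg)                          # 328350 [1, 2, 3]
```

The point is purely structural: the wait comes after the local computation, so communication latency is hidden behind useful work.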

  10. Case Study: Image Processing • [Figure: data-flow diagram for an image processing pipeline]

  11. FFT Algorithm Choices • FFTs are done by row and then column. • Sequential composition • Everybody does row FFT then column FFT • Parallel Composition • Some do row FFT others do column FFT
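The row-then-column structure can be checked sequentially: a 1-D transform applied to every row and then to every column of the result is the 2-D transform. A pure-Python sketch using the O(n²) DFT for clarity rather than a real FFT (function names are illustrative):

```python
import cmath

def dft(xs):
    """Naive 1-D discrete Fourier transform (O(n^2), for illustration)."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * j / n)
                for j, x in enumerate(xs))
            for k in range(n)]

def dft2d(m):
    """2-D transform as a sequential composition: rows, then columns."""
    rows = [dft(row) for row in m]                    # everybody: row pass
    cols = [dft(list(col)) for col in zip(*rows)]     # then: column pass
    return [list(row) for row in zip(*cols)]          # transpose back
```

In the sequential composition every processor takes part in both passes (with a transpose/redistribution in between); in the parallel composition the row and column passes run on disjoint processor sets with data streaming between them.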

  12. Performance Results

  13. Case Study: Matrix Multiply • Goal: Data-distribution neutral • Three basic ways to distribute • row • column • submatrix • Question? • Does our library need different algorithms?

  14. One Dimensional Decomposition • Each processor "owns" black portion • To compute the owned portion of the answer, each processor requires all of A. • This affects data-distribution.
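Assuming the 1-D decomposition gives each processor a block of columns of B and of C (consistent with the slide's claim), a short sketch of why the whole of A is needed for even one owned column (the helper name is hypothetical):

```python
def my_columns_of_C(A, B, cols):
    """Compute the owned columns C[:, j] for j in cols.

    Note that every entry A[i][k] is read for each owned column:
    a column distribution of B and C still demands all of A locally.
    """
    n = len(A)
    return {j: [sum(A[i][k] * B[k][j] for k in range(n)) for i in range(n)]
            for j in cols}

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(my_columns_of_C(A, B, [0]))   # {0: [19, 43]}
```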

  15. Two Dimensional Decomposition • Requires less data per processor • Algorithm can be performed stepwise.

  16. • Broadcast an A sub-matrix to the other processors in its row. • Compute. • Rotate the B sub-matrix upwards.
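These three steps are essentially Fox's broadcast-multiply-rotate algorithm. A sequential simulation with 1×1 "blocks" (so grid cell (i, j) holds just element (i, j); the function name is illustrative):

```python
def fox_multiply(A, B):
    """Simulate the broadcast-multiply-rotate steps on 1x1 blocks."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    b = [row[:] for row in B]              # cell (i, j) currently holds b[i][j]
    for step in range(n):
        for i in range(n):
            a = A[i][(i + step) % n]       # block broadcast along grid row i
            for j in range(n):
                C[i][j] += a * b[i][j]     # local multiply-accumulate
        b = b[1:] + b[:1]                  # rotate the B blocks upwards
    return C

print(fox_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```

After n steps every cell has accumulated the full inner product, matching an ordinary matrix multiply.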

  17. Analysis • Performance analysis reveals that the two-dimensional decomposition is always better. • So our matrix multiply library only needs one algorithm. • It might need a redistribution algorithm to be totally data-distribution neutral. • However, this is not the best algorithm.

  18. Systolic Matrix Multiply • Replace the A row broadcast with a rotation similar to the B column rotation. • Eliminates the expensive broadcast and replaces it with nearest-neighbor communication. • Communication costs much less. • Changes the data distribution. • Should we include it in a library? • Redistribution costs?
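This broadcast-free variant is essentially Cannon's algorithm: after an initial skew of the blocks, each step is a local multiply followed by a left rotation of A and an upward rotation of B. A sequential simulation, again with 1×1 blocks (names illustrative):

```python
def cannon_multiply(A, B):
    """Simulate Cannon's systolic matrix multiply on 1x1 blocks."""
    n = len(A)
    # Initial skew (this is the changed data distribution the slide notes):
    # shift row i of A left by i, and column j of B up by j.
    a = [[A[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]   # local multiply-accumulate
        a = [row[1:] + row[:1] for row in a]   # rotate A blocks left
        b = b[1:] + b[:1]                      # rotate B blocks upwards
    return C

print(cannon_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Every step is nearest-neighbor only, which is the whole attraction; the cost is that callers must pay for the skew (a redistribution) if their data arrives unskewed.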

  19. Conclusion • Modular design is good. • Parallelism introduces different issues: • Data distribution • Module composition • Sequential composition easy but inflexible. • Parallel composition can improve locality. • Concurrent composition is most general.
