CS 584 Lecture 20. Module Composition. Case Study: Matrix Multiply.
CS 584 Lecture 20 • Assignment • Glenda program • Project Proposal is coming up! (March 13) • 2 pages text + 1 page plan of action • 3 references • No class March 13 • Put your project proposal in my box. • Paper presentations on March 11 (Tom Abbott)
Case Study: Matrix Multiply • Goal: a data-distribution-neutral library routine • Three basic ways to distribute a matrix • by row • by column • by submatrix • Question: Does our library need a different algorithm for each distribution?
Analytical Model • Compare the two algorithms (one-dimensional and two-dimensional decomposition) • Ignore the computation costs • What are the communication costs?
One-Dimensional Decomposition • Each processor "owns" the black portion • To compute its owned portion of the answer, each processor requires all of A. • This affects data distribution.
1-D Decomp. $T = (P - 1)\left(t_s + t_w \frac{N^2}{P}\right)$
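The 1-D cost can be checked by counting messages in a small simulation. The sketch below assumes a column-wise distribution (task p owns a block of N/P columns of A, B and C) and that P divides N; the function names `one_d_cost` and `matmul_1d_columns` are hypothetical, and the simulation runs in a single process rather than on real processors.

```python
def one_d_cost(P, N, t_s, t_w):
    """Communication cost model from the slide: (P - 1)(t_s + t_w * N^2 / P)."""
    return (P - 1) * (t_s + t_w * N * N / P)


def matmul_1d_columns(A, B, P):
    """Simulate C = A * B with columns distributed over P tasks.

    Returns C together with the number of messages and words that
    task 0 must receive (every task receives the same amount).
    """
    N = len(A)
    assert N % P == 0, "this sketch assumes P divides N"
    w = N // P                              # columns owned by each task
    C = [[0] * N for _ in range(N)]
    messages = 0
    words = 0
    for p in range(P):                      # each simulated task
        my_cols = range(p * w, (p + 1) * w)
        for owner in range(P):              # task p needs every column block of A
            if p == 0 and owner != p:
                messages += 1               # one message per remote owner
                words += N * w              # N rows x w columns = N^2 / P words
            for k in range(owner * w, (owner + 1) * w):
                for j in my_cols:           # B[k][j] and C[i][j] are local to task p
                    for i in range(N):
                        C[i][j] += A[i][k] * B[k][j]
    return C, messages, words
```

With these counts, `messages * t_s + words * t_w` should agree with `one_d_cost(P, N, t_s, t_w)`, since each task receives P - 1 messages of N^2/P words.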
Two Dimensional Decomposition • Requires less data per processor • Algorithm can be performed stepwise.
• Broadcast an A submatrix to the other processors in the row • Compute • Rotate the B submatrix upwards
Algorithm
  set B' = B_local
  for j = 0 to sqrt(P) - 2
    in each row i, the [(i + j) mod sqrt(P)]th task broadcasts A' = A_local to the other tasks in the row
    accumulate A' * B'
    send B' to upward neighbor
  done
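The broadcast, accumulate, rotate pattern can be simulated in one process to check that it produces the right product. This is a sketch under stated assumptions: P is a perfect square, sqrt(P) divides N, blocks are plain lists of lists, and all helper names (`split_blocks`, `join_blocks`, `block_matmul_add`, `matmul_2d`) are hypothetical. The simulation iterates over all sqrt(P) block columns so that the full product is accumulated; treat that loop bound as a choice of this sketch.

```python
import math


def block_matmul_add(C, A, B):
    """C += A * B for small dense square blocks stored as lists of lists."""
    n = len(A)
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]


def split_blocks(M, q):
    """Cut an N x N matrix into a q x q grid of (N/q) x (N/q) blocks."""
    b = len(M) // q
    return [[[row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]
             for c in range(q)] for r in range(q)]


def join_blocks(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    b = len(blocks[0][0])
    return [sum((blocks[r][c][i] for c in range(q)), [])
            for r in range(q) for i in range(b)]


def matmul_2d(A, B, P):
    """Simulate the 2-D broadcast/rotate multiply on a sqrt(P) x sqrt(P) task grid."""
    q = math.isqrt(P)
    assert q * q == P and len(A) % q == 0, "sketch assumes square P and q | N"
    b = len(A) // q
    A_loc = split_blocks(A, q)
    B_cur = split_blocks(B, q)                       # B' = B_local
    C_loc = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for j in range(q):
        for i in range(q):
            A_bcast = A_loc[i][(i + j) % q]          # broadcast along row i
            for c in range(q):
                block_matmul_add(C_loc[i][c], A_bcast, B_cur[i][c])
        # Rotate B' upward: task (i, c) receives the block held by task (i + 1, c).
        B_cur = [B_cur[(i + 1) % q] for i in range(q)]
    return join_blocks(C_loc)
```

At step j, task (i, c) holds block A[i][(i + j) mod q] from the broadcast and block B[(i + j) mod q][c] from the rotation, so the inner block indices always match.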
2-D Decomp. $T = (\sqrt{P} - 1)\left(1 + \frac{\log P}{2}\right)\left(t_s + t_w \frac{N^2}{P}\right)$
Redistribution • If we have only one algorithm, we may need to redistribute the data before calling it • How much does this cost?
Redistribution $T = (\sqrt{P} - 1)\left(t_s + t_w \frac{N^2}{P\sqrt{P}}\right)$
Analysis • Performance analysis reveals that the two-dimensional decomposition is always better. • So our matrix multiply needs only one algorithm • It may need a redistribution algorithm to be totally data-distribution neutral • However, this is not the best algorithm.
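One way to see why the comparison favours the two-dimensional decomposition is to take the ratio of the two communication costs. The step below is a sketch: it assumes the cost expressions read as (P - 1)(...) for 1-D and (sqrt(P) - 1)(1 + log P / 2)(...) for 2-D, that the logarithm is base 2, and that P > 1. It ignores any redistribution cost.

```latex
\[
\frac{T_{\text{1-D}}}{T_{\text{2-D}}}
= \frac{(P-1)\left(t_s + t_w \frac{N^2}{P}\right)}
       {(\sqrt{P}-1)\left(1 + \frac{\log_2 P}{2}\right)\left(t_s + t_w \frac{N^2}{P}\right)}
= \frac{\sqrt{P}+1}{1 + \log_2 \sqrt{P}} > 1,
\]
% because P - 1 = (\sqrt{P}-1)(\sqrt{P}+1) and \log_2 x < x for every x > 0.
```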
Systolic Algorithm $T = 2(\sqrt{P} - 1)\left(t_s + t_w \frac{N^2}{P}\right)$
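The four communication-cost expressions from the slides can be compared numerically. The sketch below is an illustration only: the function names are hypothetical, the logarithm is assumed to be base 2, P is assumed to be a perfect square, and any values chosen for t_s and t_w are placeholders rather than measurements.

```python
import math


def cost_1d(P, N, t_s, t_w):
    """1-D decomposition: (P - 1)(t_s + t_w N^2 / P)."""
    return (P - 1) * (t_s + t_w * N * N / P)


def cost_2d(P, N, t_s, t_w):
    """2-D decomposition: (sqrt(P) - 1)(1 + log2(P) / 2)(t_s + t_w N^2 / P)."""
    q = math.sqrt(P)
    return (q - 1) * (1 + math.log2(P) / 2) * (t_s + t_w * N * N / P)


def cost_redistribution(P, N, t_s, t_w):
    """Redistribution: (sqrt(P) - 1)(t_s + t_w N^2 / (P sqrt(P)))."""
    q = math.sqrt(P)
    return (q - 1) * (t_s + t_w * N * N / (P * q))


def cost_systolic(P, N, t_s, t_w):
    """Systolic algorithm: 2 (sqrt(P) - 1)(t_s + t_w N^2 / P)."""
    q = math.sqrt(P)
    return 2 * (q - 1) * (t_s + t_w * N * N / P)


def compare(P, N, t_s, t_w):
    """Return all four modelled costs in one dictionary for side-by-side reading."""
    return {
        "1-D": cost_1d(P, N, t_s, t_w),
        "2-D": cost_2d(P, N, t_s, t_w),
        "redistribution": cost_redistribution(P, N, t_s, t_w),
        "systolic": cost_systolic(P, N, t_s, t_w),
    }
```

Calling `compare(16, 64, 1.0, 1.0)`, for example, gives one row of such a comparison; sweeping P for a fixed N shows how the gap between the models changes with processor count.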