Potential Projects


Presentation Transcript


  1. CS294 Fall, 2011 Communication-Avoiding Algorithms www.cs.berkeley.edu/~odedsc/CS294 Potential Projects Jim Demmel

  2. Types of Projects
  • Read and report on a paper
  • Design, implement and/or test an algorithm
  • Extend the result of the paper you read
  • Use, develop or extend a software generation tool
  • Apply CA algorithms to an interesting application
  • Demonstrate the effect of hardware trends
  • Suggestions are welcome

  3. 1. Read and report on a paper
  See http://www.cs.berkeley.edu/~odedsc/CS294/ for a suggested list of papers. Examples:
  • Graph analysis for proving lower bounds, e.g., for the FFT: Hong and Kung, "I/O complexity: The red-blue pebble game."
  • Savage, "Extending the Hong-Kung model to memory hierarchies."
  • Many other papers on CA sorting, CA searching, CA discrete algorithms, CA dynamic programming, CA data structures, and software generation tools.

  4. 2. Design, implement and/or test an algorithm
  • Design a new CA algorithm and analyze it.
  • Implement an algorithm and test and analyze its performance (e.g., with HW counters), or implement two algorithms and compare their performance.
  • Your report should include tables/graphs comparing performance (time, communication, etc.) across various parameters (input size, input type if relevant, platform parameters).
  Examples:
  • Many linear algebra algorithms (next slides): direct and iterative, dense and sparse.
  • All the other dwarfs/motifs.
  • Save communication by saving bits of precision.
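One way to "test and analyze" communication without hardware counters is to simulate a small fast memory and count slow-memory transfers. The sketch below is an illustration only, not course material: it assumes a fully associative LRU fast memory and hypothetical sizes (n = 32, M = 50 words, block size b = 4), and compares the access patterns of naive vs. blocked matrix multiplication.

```python
from collections import OrderedDict

def count_misses(accesses, M):
    """Count slow-memory transfers under a fully associative LRU fast memory of M words."""
    cache, misses = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # reuse: hit in fast memory
        else:
            misses += 1                    # word moved from slow memory
            cache[addr] = True
            if len(cache) > M:
                cache.popitem(last=False)  # evict least recently used word
    return misses

def naive(n):
    # C[i][j] += A[i][k] * B[k][j], classic i-j-k loop order
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield ('A', i, k); yield ('B', k, j); yield ('C', i, j)

def blocked(n, b):
    # Same arithmetic on b-by-b blocks, so the working set (~3*b*b words) fits in fast memory
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                for i in range(i0, i0 + b):
                    for j in range(j0, j0 + b):
                        for k in range(k0, k0 + b):
                            yield ('A', i, k); yield ('B', k, j); yield ('C', i, j)

n, M, b = 32, 50, 4   # hypothetical sizes: 3*b*b = 48 words fits in M = 50
naive_misses = count_misses(naive(n), M)
blocked_misses = count_misses(blocked(n, b), M)
```

A project report in the spirit of this slide would tabulate such miss counts (or real counter data) against M, n, and b, alongside measured time.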

  5. Open Problems / Work in Progress (1/2): Just for Direct Linear Algebra!
  • Rank-Revealing CAQR (RRQR)
    • AP = QR, P a permutation chosen to put the "most independent" columns first
    • Does LAPACK geqp3 move O(n³) words, or fewer?
    • Better: Tournament Pivoting (TP) on blocks of columns
  • CALU with more reliable pivoting (Gu, Grigori, Khabou et al.)
    • Replace tournament pivoting of the panel (TSLU) with Rank-Revealing QR of the panel
  • LU with Complete Pivoting (GECP)
    • A = P_r L U P_c^T, where P_r and P_c are permutations putting the "most independent" rows and columns first
    • Idea: TP on column blocks to choose columns, then TSLU on those columns to choose rows
  • LU of A = A^T, with symmetric pivoting
    • A = P L L^T P^T, P a permutation, L lower triangular
    • Bunch-Kaufman, rook pivoting: O(n³) words moved?
    • CA approach: choose a block with one step of CA-GECP, permute rows & columns to the top left, pick a symmetric subset with local Bunch-Kaufman
  • A = P L B L^T P^T, with symmetric pivoting (Toledo et al.)
    • L triangular, P a permutation, B banded
    • B tridiagonal: due to Aasen
    • CA approach: B block tridiagonal; stability still in question

  6. Open Problems / Work in Progress (2/2): Just for Direct Linear Algebra!
  • Heterogeneity (Ballard, Gearhart)
    • How to assign work to attain lower bounds when flop rates / bandwidths / latencies vary?
  • Using extra memory in the distributed-memory case (Solomonik)
    • Using all available memory to attain improved lower bounds
    • Just matmul and CALU so far
  • Using extra memory in the shared-memory case
  • Sparse matrices
    • CA sparse Cholesky for "well-partitioned matrices" understood in theory
    • Sparse matrix multiplication
  • Implement CA parallel Strassen (already taken)

  7. 3. Extend the result of the paper you read
  Extend the result of the paper you read, e.g., apply its techniques to a new problem, or adapt the algorithm to new settings. Examples:
  • Can you combine the approach of [Hong, Kung] (dominator-set analysis) with that of [Ballard, Demmel, Holtz, Schwartz] (expansion analysis) to obtain lower bounds for Strassen-like algorithms that allow recomputation?
  • Prove new lower bounds on flops and deduce new communication lower bounds.
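For context on the open question above, the two bounds being combined are, as stated in the literature (not on this slide): Hong and Kung's bound for classical matrix multiplication, and the Ballard-Demmel-Holtz-Schwartz bound for Strassen, which assumes no recomputation.

```latex
% Words moved between fast memory (size M) and slow memory:
W_{\text{classical}} \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{M}}\right),
\qquad
W_{\text{Strassen}} \;=\; \Omega\!\left(\left(\frac{n}{\sqrt{M}}\right)^{\log_{2} 7} M\right).
```

The project question is whether dominator-set or expansion arguments extend the second bound to Strassen-like algorithms that are allowed to recompute intermediate quantities.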

  8. 4. Use, develop or extend a software generation tool
  Use, develop or extend a software generation tool. This may include implementing known algorithms/techniques that are suitable for software generation tools. Examples:
  • Sparse Polyhedral Framework [Strout et al.]
  • Automatic sparse code generation [Bodik, Gilad, et al.]
  • Adding features to pOSKI [Byun, Demmel, Yelick et al.]
  • SEJITS [Fox et al.]: adding CA code generators
  • [Ballard, Demmel, Gearhart '11] showed how to optimize the communication of dense matrix multiplication on heterogeneous platforms. Can you incorporate this algorithm into a compile-time auto-tuner?
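The empirical core of an auto-tuner like those above can be sketched in a few lines: time each candidate value of a tuning parameter offline and keep the fastest. Everything here (the kernel, the candidate set, the problem size) is an illustrative assumption; real tools such as pOSKI search far larger parameter spaces and specialize to the sparsity pattern.

```python
import time

def blocked_matmul(A, B, n, b):
    """Blocked C = A*B on n-by-n lists of lists; b is the tuning parameter."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for k0 in range(0, n, b):
            for j0 in range(0, n, b):
                for i in range(i0, min(i0 + b, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(k0, min(k0 + b, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(j0, min(j0 + b, n)):
                            Ci[j] += a * Bk[j]
    return C

def autotune(n, candidates):
    """Empirically pick the fastest block size, as a compile-time tuner would do offline."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for b in candidates:
        t0 = time.perf_counter()
        blocked_matmul(A, B, n, b)
        timings[b] = time.perf_counter() - t0
    return min(timings, key=timings.get)  # block size with the smallest measured time

best_b = autotune(64, [4, 8, 16])
```

A project could replace the timing loop with a communication model (or hardware counters) and, per the slide, extend the search to the heterogeneous-platform schedules of [Ballard, Demmel, Gearhart '11].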

  9. 5. Apply CA algorithms to an interesting application
  Incorporate and/or specialize a CA algorithm in an important application. Examples:
  • Lots of ParLab applications, e.g., the Health app: simulating the blood flow of stroke victims
  • The bottom solver in AMR (Adaptive Mesh Refinement)
  • Large-scale data analysis, e.g., on clouds

  10. 6. Demonstrate the effect of hardware trends
  Demonstrate the effect of hardware trends (past and predicted future) on the efficiency of algorithms. Example:
  • A rough estimate of the running time of CA O(n³) dense linear algebra on a sequential machine is γn³ + βn³/√M, where n is the matrix dimension, M the fast memory size, γ the time per flop, and β the reciprocal bandwidth. Thus the arithmetic cost dominates the running time as long as γ√M ≥ β.
  • Collect historical data from various available sources on the hardware parameters of interest (such as β, γ, and M above), as well as predicted parameters of future machines (also available from various sources), and present their impact on the scalability of selected algorithms.
  • Collect data on energy and power performance.
  • Effects of lower precision on communication and energy performance.
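The slide's model is easy to evaluate numerically. The machine parameters below are hypothetical round numbers, not measurements from any real machine; the point is only the crossover condition γ√M ≥ β.

```python
import math

def model_time(n, gamma, beta, M):
    """Rough sequential-time model from the slide: t(n) = gamma*n^3 + beta*n^3/sqrt(M)."""
    return gamma * n**3 + beta * n**3 / math.sqrt(M)

def flop_dominated(gamma, beta, M):
    """Arithmetic dominates communication exactly when gamma*sqrt(M) >= beta."""
    return gamma * math.sqrt(M) >= beta

# Hypothetical machine: 10 Gflop/s (gamma = 1e-10 s/flop),
# 1 Gword/s (beta = 1e-9 s/word), fast memory of M = 1e6 words.
gamma, beta, M = 1e-10, 1e-9, 1e6
dominated = flop_dominated(gamma, beta, M)  # gamma*sqrt(M) = 1e-7 >= beta = 1e-9
```

Sweeping historical (β, γ, M) triples through these two functions, as the slide suggests, shows when and why a machine crosses from compute-bound to communication-bound.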
