CS 267: Applications of Parallel Computers Final Project Suggestions

James Demmel
www.cs.berkeley.edu/~demmel/cs267_Spr06
CS267 Lecture 22a

Outline
• Kinds of projects
  • Evaluating and improving the performance of a parallel application
    • “Application” could be a full scientific application or an important kernel
  • Parallelizing a sequential application
    • Other kinds of performance improvements are possible too, e.g. memory hierarchy tuning
  • Devising a new parallel algorithm for some problem
  • Porting a parallel application or systems software to a new architecture
• Examples of previous projects (all on-line)
• Upcoming guest lecturers
  • See their previous lectures, or contact them, for project ideas
• Suggested projects

CS267 Class Projects from 2004
• BLAST Implementation on BEE2 - Chen Chang
• PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers - Fabrizio Bisetti
• Parallel Pattern Matcher - Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
• Parallel Simulation in Metropolis - Guang Yang
• A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation - Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
• Parallelization of oopd1 - Jeff Hammel
• Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code - Amir Kamil, Ben Schwarz, and Jimmy Su

CS267 Class Projects from 2004 (cont)
• Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids - C. Zambrana Rojas and Mark Hoemmen
• Parallelization of Phylogenetic Tree Construction - Michael Tung
• UPC Implementation of the Sparse Triangular Solve and NAS FT - Christian Bell and Rajesh Nishtala
• Widescale Load Balanced Shared Memory Model for Parallel Computing - Sonesh Surana, Yatish Patel, and Dan Adkins

Planned Guest Lecturers
• Katherine Yelick (UPC, heart modeling)
• David Anderson (volunteer computing)
• Kimmen Sjolander (phylogenetic analysis of proteins – SATCHMO – Bonnie Kirkpatrick)
• Julian Borrill (astrophysical data analysis)
• Wes Bethel (graphics and data visualization)
• Phil Colella (adaptive mesh refinement)
• David Skinner (tools for scaling up applications)
• Xiaoye Li (sparse linear algebra)
• Osni Marques and Tony Drummond (ACTS Toolkit)
• Andrew Canning (computational neuroscience)
• Michael Wehner (climate modeling)

Suggested projects (1)
• Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
• Contribute to upcoming ScaLAPACK release (JD)
  • Proposal, talk at www.cs.berkeley.edu/~demmel; ask me for latest
  • Performance evaluation of existing parallel algorithms
    • Ex: New eigensolvers based on successive band reduction
  • Improved implementations of existing parallel algorithms
    • Ex: Use UPC to overlap communication, computation
  • Many serial algorithms to be parallelized
    • See following slides

Missing Drivers in Sca/LAPACK

More missing drivers

Suggested projects (2)
• Contribute to sparse linear algebra (JD & KY)
  • Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  • Typical methods (e.g. CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
  • Our goal: Make latency cost O(1)!
  • Requires reorganizing algorithms drastically, including replacing SpMV by a new kernel [Ax, A^2x, A^3x, …, A^kx], which can be done with O(1) messages
  • Projects
    • Study scalability bottlenecks of current CG on real, large matrices
    • Optimize [Ax, A^2x, A^3x, …, A^kx] on sequential machines
    • Optimize [Ax, A^2x, A^3x, …, A^kx] on parallel machines
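One way to see why the kernel [Ax, A^2x, …, A^kx] might need only O(1) messages is a small sketch, under assumptions that are not from the slides: A is taken to be the tridiagonal 1D Laplacian, and a "processor" owning indices lo..hi-1 is simulated by a single widened read of x (k extra ghost entries on each side). All function names here are illustrative. After that one read, the processor computes its part of all k vectors from local data, at the price of some redundant work in the ghost region.

```python
def laplacian_rows(n):
    """Rows of the n-by-n 1D Laplacian as lists of (column, value) pairs."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


def spmv(rows, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def powers_reference(rows, x, k):
    """[A x, A^2 x, ..., A^k x] via k separate SpMVs.

    In a parallel setting each of these SpMVs would need its own
    neighbor exchange, so the message count grows with k.
    """
    out, v = [], x
    for _ in range(k):
        v = spmv(rows, v)
        out.append(v)
    return out


def powers_local(rows, x, k, lo, hi):
    """Entries lo..hi-1 of [A x, ..., A^k x] after one up-front fetch.

    Assumption: A is tridiagonal, so k ghost entries per side suffice.
    The valid range shrinks by one entry per side at every step
    (except where it is clipped at the matrix boundary).
    """
    n = len(rows)
    start, end = max(0, lo - k), min(n, hi + k)
    local = {i: x[i] for i in range(start, end)}  # the only "remote" read
    out = []
    for step in range(1, k + 1):
        s = max(0, lo - (k - step))
        e = min(n, hi + (k - step))
        local = {i: sum(v * local[j] for j, v in rows[i]) for i in range(s, e)}
        out.append([local[i] for i in range(lo, hi)])
    return out


if __name__ == "__main__":
    rows = laplacian_rows(10)
    x = [float(i * i) for i in range(10)]
    ref = powers_reference(rows, x, 3)
    mine = powers_local(rows, x, 3, 4, 7)
    print(mine == [v[4:7] for v in ref])
```

The design trade-off in this sketch is redundant computation in the ghost region in exchange for fewer messages; for general sparse matrices the ghost region depends on the matrix's graph structure rather than a fixed width of k.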

Suggested projects (3)
• Evaluate new languages on applications (KY)
  • UPC or Titanium
  • UPC for asynchrony, overlapping communication & computation
  • ScaLAPACK in UPC
  • Use UPC-based 3D FFT in your application
  • Optimize existing 1D FFT in UPC, to use 3D techniques
• Porting and evaluating parallel systems software (KY)
  • Port UPC to RAMP
  • Port GASNet to Blue Gene, evaluate performance
