Erin Carson, Nicholas Knight, James Demmel
Univ. of California, Berkeley

Efficient Deflation-Based Preconditioning for the Communication-Avoiding Conjugate Gradient Method
Erin Carson, Nicholas Knight, James Demmel
Univ. of California, Berkeley
Monday, March 16, SIAM CSE 2015, Salt Lake City, Utah

Why Avoid “Communication”?
• Algorithms have two costs: computation and communication
• Communication: moving data between levels of the memory hierarchy (sequential) or between processors (parallel)
[Figure: sequential model (CPU, cache, DRAM) and parallel model (several CPUs, each with its own DRAM)]
• On today’s computers, communication is expensive and computation is cheap, in terms of both time and energy
• Need to redesign algorithms to avoid communication!

Future Exascale Systems
• Gaps between communication and computation cost are only growing larger in future systems
• Avoiding communication will be essential for applications at exascale!
*Sources: P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)

Krylov Solvers: Limited by Communication
• Orthogonalize
  • Inner products
  • Parallel: global reduction (Allreduce)
  • Sequential: multiple reads/writes to slow memory
• Dependencies between communication-bound kernels in each iteration limit performance!
[Figure: SpMV followed by orthogonalize in each iteration]

Example: Classical Conjugate Gradient (CG)
[Algorithm figure; annotated kernels: SpMV, inner products]
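The slide's algorithm listing did not survive extraction. As a stand-in, here is a minimal textbook CG sketch in NumPy, not the authors' code. The name `conjugate_gradient` and its parameters are illustrative assumptions. The comments mark the SpMV and the inner products, which are the two communication-bound kernels per iteration.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive definite A (illustrative sketch)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                 # initial residual
    p = r.copy()                  # initial search direction
    rr = r @ r                    # inner product (global reduction in parallel)
    stop = tol * np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rr) <= stop:
            return x, k
        Ap = A @ p                # SpMV (neighbor communication in parallel)
        alpha = rr / (p @ Ap)     # inner product (global reduction)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r            # inner product (global reduction)
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter
```

Each iteration depends on the reductions of the previous one, so the SpMV and the inner products cannot be overlapped or batched without restructuring the algorithm.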

Communication-avoiding Krylov methods
• Communication-avoiding Krylov subspace methods (CA-KSMs) can asymptotically reduce parallel latency
• First known reference: (Van Rosendale, 1983); lots of work by Chronopoulos on “s-step” methods
• Many methods and variations created since; see Hoemmen’s 2010 PhD thesis for a thorough overview

Outer Loop

No communication required!

Example: CA-Conjugate Gradient via CA Matrix Powers Kernel
Local computations within inner loop require no communication!
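The outer-loop/inner-loop structure can be sketched as follows. This is a minimal dense NumPy illustration that assumes a monomial basis and a small s; the name `ca_cg`, its parameters, and the basis choice are assumptions made for illustration, and a matrix powers kernel would replace the explicit basis loop in a real implementation. The outer loop builds the s-step basis `Y` and the Gram matrix `G` (one global reduction). The inner loop then updates only length-(2s+1) coordinate vectors.

```python
import numpy as np

def ca_cg(A, b, s=4, tol=1e-10, max_outer=200):
    """s-step (CA) CG sketch with a monomial basis, for SPD A."""
    n = b.shape[0]
    m = 2 * s + 1
    x = np.zeros(n)
    r = b.astype(float).copy()
    p = r.copy()
    # B maps basis coordinates of v to coordinates of A v:
    # columns 0..s hold p, Ap, ..., A^s p; columns s+1..2s hold r, ..., A^(s-1) r.
    B = np.zeros((m, m))
    for i in range(s):
        B[i + 1, i] = 1.0
    for i in range(s - 1):
        B[s + 2 + i, s + 1 + i] = 1.0
    stop = (tol * np.linalg.norm(b)) ** 2
    for _ in range(max_outer):
        if r @ r <= stop:
            break
        # Outer loop: build the basis (SpMVs) and the Gram matrix (one reduction).
        Y = np.empty((n, m))
        Y[:, 0] = p
        for i in range(s):
            Y[:, i + 1] = A @ Y[:, i]
        Y[:, s + 1] = r
        for i in range(s - 1):
            Y[:, s + 2 + i] = A @ Y[:, s + 1 + i]
        G = Y.T @ Y
        # Inner loop: small local operations on coordinate vectors only.
        xc = np.zeros(m)
        pc = np.zeros(m); pc[0] = 1.0
        rc = np.zeros(m); rc[s + 1] = 1.0
        rr = rc @ G @ rc
        for _ in range(s):
            Bp = B @ pc
            alpha = rr / (pc @ G @ Bp)
            xc += alpha * pc
            rc -= alpha * Bp
            rr_new = rc @ G @ rc
            pc = rc + (rr_new / rr) * pc
            rr = rr_new
            if rr <= stop:
                break
        # Recover the length-n vectors once per outer loop.
        x += Y @ xc
        r = Y @ rc
        p = Y @ pc
    return x
```

The monomial basis becomes ill-conditioned as s grows, so this sketch is only suitable for small s; better-conditioned polynomial bases are typically used in practice.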

Deflation
• Deflation: technique to increase convergence rate
• Krylov solvers are very efficient at reducing high-frequency modes but slow to smooth out low-frequency ones
• Idea: treat high-frequency and low-frequency errors separately
• Explicitly solve the linear system in the known eigenspace, use CG to solve in the remaining invariant subspace
• Deflated CG first due to Nicolaides (1987)
• Used in many applications
  • Pressure-Poisson equation within an incompressible flow solver (e.g., blood flow): (Mut, Aubry, Löhner, and Cebral, 2010)
  • Bubbly flow problems (Tang and Vuik, 2007)
  • Magnetostatic FEM problems (De Gersem and Hameyer, 2001)
  • Structural dynamics (Perotti and Simoncini, 2003)

Deflation
Can deflation techniques be applied to CA-CG while maintaining asymptotic reduction in communication cost?

Deflated CG Algorithm (Saad et al., 2000)
• SpMVs and dot products required in each inner loop, as in CG
• New computations due to deflation
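The algorithm listing on this slide was not recoverable. The sketch below follows the usual statement of deflated CG with a zero starting guess, written in dense NumPy. The name `deflated_cg` and the argument `W` (an n-by-k matrix whose columns span the deflation subspace) are assumptions for illustration. Lines marked "deflation" are the computations added relative to classical CG.

```python
import numpy as np

def deflated_cg(A, b, W, tol=1e-10, max_iter=1000):
    """Deflated CG sketch for SPD A; columns of W span the deflation space."""
    AW = A @ W                                   # deflation: computed once
    WtAW = W.T @ AW                              # deflation: small k-by-k matrix
    # deflation: choose x so that the initial residual is orthogonal to W
    x = W @ np.linalg.solve(WtAW, W.T @ b)
    r = b - A @ x
    # deflation: make the first search direction A-orthogonal to W
    p = r - W @ np.linalg.solve(WtAW, AW.T @ r)
    rr = r @ r
    stop = tol * np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rr) <= stop:
            return x, k
        Ap = A @ p                               # SpMV, as in CG
        alpha = rr / (p @ Ap)                    # dot product, as in CG
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r                           # dot product, as in CG
        mu = np.linalg.solve(WtAW, AW.T @ r)     # deflation: W^T A r and small solve
        p = r + (rr_new / rr) * p - W @ mu       # deflation: correct the direction
        rr = rr_new
    return x, max_iter
```

Since A is symmetric, `AW.T @ r` equals W^T A r, so the deflation step adds one tall-skinny product and one k-by-k solve per iteration in this sketch.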

Avoiding Communication in Deflation

Outer Loop
Construct s-step bases for Krylov subspaces (nearest-neighbor communication)

Deflated CA-CG
Local operation, requires no communication

Computation and Communication Complexity

Rough Performance Model
• Modeled speedup per iteration of CA vs. classical method on model problem (2D 5-point stencil)
• Not the whole story…

Is This Efficient in Practice?

Extensions: 2-Level Preconditioning
• Which preconditioners still allow communication-avoiding techniques is a separate question
• No communication for diagonal scaling…
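To illustrate why diagonal scaling needs no communication, here is a minimal sketch of symmetric Jacobi scaling in dense NumPy. The name `jacobi_scale` is an assumption for illustration. Every entry of the scaled matrix and right-hand side is computed elementwise from data the owning processor already holds.

```python
import numpy as np

def jacobi_scale(A, b):
    """Symmetric diagonal scaling: A_s = D^(-1/2) A D^(-1/2), b_s = D^(-1/2) b."""
    d = 1.0 / np.sqrt(np.diag(A))        # assumes a positive diagonal (e.g. SPD A)
    A_s = A * d[:, None] * d[None, :]    # elementwise row and column scaling
    b_s = d * b
    return A_s, b_s, d                   # recover x from the scaled solution y as d * y
```

After solving A_s y = b_s with any method, the solution of the original system is `x = d * y`, again an elementwise operation.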

A Potential Application
• (Vuik, Segal, and Meijerink, 1999) Oil drilling: determine fluid pressures to predict presence of oil and gas in reservoirs
• Solve time-dependent diffusion equation using finite elements
• Underground consists of layers with very large differences in permeability
• Coefficient matrix very ill-conditioned
• Many small eigenvalues in the matrix, but with diagonal preconditioning:
  • # of extreme eigenvalues = number of layers with a high permeability (sandstone)
[Figure: alternating sandstone and shale layers below the Earth's surface]

Conclusions and Future Work
• Summary
  • Deflation can be implemented in CA-CG in a way that still avoids communication
  • Nontrivial tradeoffs between speed per iteration and convergence rate for different methods; requires further study and application-specific analysis
• Related work
  • Deflated restarting in CA-GMRES variant (Wakam and Erhel, 2013)
  • Deflation in CA-GMRES (Yamazaki, Tomov, and Dongarra, 2014)
  • Deflated pipelined CG (Ghysels, Vanroose, and Meerbergen, 2014)
• Future Work & Extensions
  • Performance studies and applications
  • Further exploration of 2-level CA preconditioning strategies
  • Solving (slowly-changing) series of linear systems (recycling Krylov subspaces (Parks et al., 2006))

Thank you! email: erin@cs.berkeley.edu
