Efficient Deflation-Based Preconditioning for the Communication-Avoiding Conjugate Gradient Method
Erin Carson, Nicholas Knight, James Demmel
Univ. of California, Berkeley
Monday, March 16, SIAM CSE 2015, Salt Lake City, Utah
Why Avoid “Communication”?
• Algorithms have two costs: computation and communication
• Communication: moving data between levels of the memory hierarchy (sequential) or between processors (parallel)
[Figure: sequential model (CPU, cache, DRAM) and parallel model (several CPUs, each with its own DRAM)]
• On today’s computers, communication is expensive and computation is cheap, in terms of both time and energy
• Need to redesign algorithms to avoid communication!
Future Exascale Systems
• The gaps between communication and computation costs will only grow larger in future systems
• Avoiding communication will be essential for applications at exascale!
*Sources: P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)
Krylov Solvers: Limited by Communication
[Figure: per-iteration cycle between the SpMV and orthogonalize kernels]
• Orthogonalize
  • Inner products
  • Parallel: global reduction (Allreduce)
  • Sequential: multiple reads/writes to slow memory
• Dependencies between communication-bound kernels in each iteration limit performance!
Example: Classical Conjugate Gradient (CG)
[Algorithm listing annotated with its communication-bound kernels: SpMV and inner products]
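The algorithm listing on this slide did not survive extraction. As a stand-in, here is a minimal sketch of textbook CG with the two communication-bound kernels marked in comments. It assumes a matrix-free 1D Poisson operator (tridiag(-1, 2, -1)); the names `spmv`, `dot`, and `cg` are illustrative and not taken from the talk.

```python
def spmv(x):
    """Matrix-free SpMV with the 1D Poisson matrix tridiag(-1, 2, -1) (assumed model problem)."""
    n = len(x)
    y = [2.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Inner product; on a parallel machine this would be a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def cg(b, tol=1e-10, max_iter=1000):
    """Textbook CG starting from x0 = 0; returns the iterate and the iteration count."""
    x = [0.0] * len(b)
    r = list(b)                  # r0 = b - A*x0 with x0 = 0
    p = list(r)
    rr = dot(r, r)
    iters = 0
    while iters < max_iter and rr ** 0.5 > tol:
        Ap = spmv(p)                      # SpMV: neighbor communication
        alpha = rr / dot(p, Ap)           # inner product 1: global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)                # inner product 2: global reduction
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iters += 1
    return x, iters


b = [1.0] * 50
x, iters = cg(b)
residual = [bi - yi for bi, yi in zip(b, spmv(x))]
```

Each pass through the loop performs one SpMV and two inner products, and each depends on the result of the previous one, which is the dependency chain that limits performance.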
Communication-avoiding Krylov methods
• Communication-avoiding Krylov subspace methods (CA-KSMs) can asymptotically reduce parallel latency
• First known reference: (Van Rosendale, 1983); lots of work by Chronopoulos on “s-step” methods
• Many methods and variations created since; see Hoemmen’s 2010 PhD thesis for a thorough overview
Example: CA-Conjugate Gradient via CA Matrix Powers Kernel
• Local computations within the inner loop require no communication!
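One reason the inner loop can stay local is that vectors in the Krylov subspace can be carried as short coordinate vectors with respect to a precomputed basis V, so inner products reduce to small products with the Gram matrix G = VᵀV. The sketch below checks that identity numerically. It is an illustration under assumptions: a single monomial basis [v, Av, ..., A^s v] for a 1D Poisson operator, with hypothetical helper names; it does not reproduce the formulation on the slide.

```python
def spmv(x):
    """Matrix-free SpMV with tridiag(-1, 2, -1) (assumed model problem)."""
    n = len(x)
    y = [2.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def monomial_basis(v, s):
    """Columns v, A v, ..., A^s v (monomial basis chosen for brevity)."""
    cols = [list(v)]
    for _ in range(s):
        cols.append(spmv(cols[-1]))
    return cols


def gram(cols):
    """G = V^T V: the only step that needs a global reduction."""
    m = len(cols)
    return [[dot(cols[i], cols[j]) for j in range(m)] for i in range(m)]


def expand(cols, coords):
    """Form the length-n vector V * coords."""
    n = len(cols[0])
    return [sum(c * col[i] for c, col in zip(coords, cols)) for i in range(n)]


def short_dot(G, c, d):
    """c^T G d: an inner product computed from (s+1)-length data only."""
    m = len(c)
    return sum(c[i] * G[i][j] * d[j] for i in range(m) for j in range(m))


s = 3
v = [float((i * 7) % 5) - 2.0 for i in range(12)]
V = monomial_basis(v, s)
G = gram(V)
c = [1.0, -0.5, 0.25, 0.0]
d = [0.0, 1.0, 0.0, -0.125]
long_result = dot(expand(V, c), expand(V, d))   # inner product of length-n vectors
short_result = short_dot(G, c, d)               # same value from small local data
```

Once G is available, every inner product between vectors expressed in the basis costs O(s²) local work and no further reduction.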
Deflation
• Deflation: a technique to increase the convergence rate
• Krylov solvers are very efficient at reducing high-frequency modes but slow to smooth out low-frequency ones
• Idea: treat the high-frequency and low-frequency error components separately
• Solve the linear system explicitly in the known eigenspace and use CG on the remaining invariant subspace
• Deflated CG is originally due to Nicolaides (1987)
• Used in many applications
  • Pressure-Poisson equation within an incompressible flow solver (e.g., blood flow): (Mut, Aubry, Löhner, and Cebral, 2010)
  • Bubbly flow problems (Tang and Vuik, 2007)
  • Magnetostatic FEM problems (De Gersem and Hameyer, 2001)
  • Structural dynamics (Perotti and Simoncini, 2003)
Deflation
• Can deflation techniques be applied to CA-CG while maintaining the asymptotic reduction in communication cost?
Deflated CG Algorithm (Saad et al., 2000)
[Algorithm listing with two annotations]
• SpMVs and dot products required in each inner loop, as in CG
• New computations due to deflation
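The listing itself was lost in extraction. The sketch below follows the commonly published structure of deflated CG: choose x0 so that Wᵀr0 = 0, then correct each search direction with a small solve against WᵀAW. The model problem (1D Poisson), the choice of two exact sine eigenvectors as W, and all helper names are assumptions made for illustration and may differ in detail from the slide.

```python
import math


def spmv(x):
    """Matrix-free SpMV with tridiag(-1, 2, -1) (assumed model problem)."""
    n = len(x)
    y = [2.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_small(M, rhs):
    """Dense k-by-k solve by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for j in range(c, k + 1):
                aug[r][j] -= f * aug[c][j]
    sol = [0.0] * k
    for i in range(k - 1, -1, -1):
        acc = aug[i][k] - sum(aug[i][j] * sol[j] for j in range(i + 1, k))
        sol[i] = acc / aug[i][i]
    return sol


def deflated_cg(b, W, tol=1e-10, max_iter=1000):
    """Deflated CG; W is a list of k deflation vectors (columns)."""
    n, k = len(b), len(W)
    AW = [spmv(w) for w in W]
    WtAW = [[dot(W[i], AW[j]) for j in range(k)] for i in range(k)]

    def w_times(c):
        return [sum(c[j] * W[j][i] for j in range(k)) for i in range(n)]

    # Initial guess chosen so that W^T r0 = 0 (starting from the zero vector).
    x = w_times(solve_small(WtAW, [dot(w, b) for w in W]))
    r = [bi - yi for bi, yi in zip(b, spmv(x))]
    # New work due to deflation: solve (W^T A W) mu = W^T A r, with A symmetric.
    mu = w_times(solve_small(WtAW, [dot(aw, r) for aw in AW]))
    p = [ri - mi for ri, mi in zip(r, mu)]
    rr = dot(r, r)
    iters = 0
    while iters < max_iter and rr ** 0.5 > tol:
        Ap = spmv(p)                              # SpMV, as in CG
        alpha = rr / dot(p, Ap)                   # dot product, as in CG
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)                        # dot product, as in CG
        beta = rr_new / rr
        mu = w_times(solve_small(WtAW, [dot(aw, r) for aw in AW]))  # deflation step
        p = [beta * pi + ri - mi for pi, ri, mi in zip(p, r, mu)]
        rr = rr_new
        iters += 1
    return x, r, iters


n = 50
# Two lowest eigenvectors of the 1D Poisson matrix (known in closed form).
W = [[math.sin(j * math.pi * (i + 1) / (n + 1)) for i in range(n)] for j in (1, 2)]
b = [1.0 + 0.1 * i for i in range(n)]
x, r, iters = deflated_cg(b, W)
true_residual = [bi - yi for bi, yi in zip(b, spmv(x))]
wt_r = [dot(w, r) for w in W]
```

The extra cost per iteration is k inner products with the columns of AW plus a k-by-k solve; in exact arithmetic the residual stays orthogonal to W throughout.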
Outer Loop
• Construct s-step bases for Krylov subspaces (nearest-neighbor communication)
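To illustrate why constructing an s-step basis needs only nearest-neighbor communication, the sketch below computes one processor's rows of x, Ax, ..., A^s x after a single fetch of s ghost entries on each side, instead of one exchange per SpMV. It assumes a tridiagonal 1D Poisson operator and a contiguous row partition; `local_matrix_powers` is a hypothetical name, not an API from the talk.

```python
def spmv(x):
    """Global reference SpMV with tridiag(-1, 2, -1) (assumed model problem)."""
    n = len(x)
    y = [2.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def local_matrix_powers(x, lo, hi, s):
    """Rows lo..hi-1 of x, A x, ..., A^s x using one ghost exchange of width s."""
    n = len(x)
    a, b = max(0, lo - s), min(n, hi + s)
    cur = {i: x[i] for i in range(a, b)}      # the only remote data fetched
    basis = [[cur[i] for i in range(lo, hi)]]
    for _ in range(s):
        # The valid region shrinks by one on each side that is not a domain boundary.
        a = a + 1 if a > 0 else 0
        b = b - 1 if b < n else n
        nxt = {}
        for i in range(a, b):
            left = cur[i - 1] if i > 0 else 0.0
            right = cur[i + 1] if i < n - 1 else 0.0
            nxt[i] = 2.0 * cur[i] - left - right
        cur = nxt
        basis.append([cur[i] for i in range(lo, hi)])
    return basis


n, s = 40, 4
x = [float((3 * i) % 7) for i in range(n)]
lo, hi = 10, 20                               # rows owned by one processor
local = local_matrix_powers(x, lo, hi, s)

# Reference: s global SpMVs, each of which would need its own exchange.
reference = [x[lo:hi]]
y = x
for _ in range(s):
    y = spmv(y)
    reference.append(y[lo:hi])
max_diff = max(abs(u - v) for lu, lr in zip(local, reference) for u, v in zip(lu, lr))
```

The price of the single exchange is redundant computation in the ghost region, which grows with s and with the bandwidth of the matrix.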
Deflated CA-CG
• Local operation, requires no communication
Rough Performance Model
• Modeled speedup per iteration of the CA method vs. the classical method on a model problem (2D 5-point stencil)
• Not the whole story…
Extensions: 2-Level Preconditioning
• Which preconditioners are still compatible with communication-avoiding techniques is a separate question
• No communication for diagonal scaling…
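As a small illustration of why diagonal scaling needs no communication, the sketch below applies symmetric Jacobi scaling D^(-1/2) A D^(-1/2) to a tridiagonal matrix: each scaled entry depends only on that entry and the two diagonal entries of its row and column. The layered 1D diffusion matrix and the function names are assumptions chosen for the example.

```python
def layered_diffusion(coeffs):
    """Tridiagonal matrix for a 1D diffusion operator.

    coeffs holds one positive coefficient per cell interface (length n + 1).
    Returns (lower, diag, upper) as lists.
    """
    n = len(coeffs) - 1
    diag = [coeffs[i] + coeffs[i + 1] for i in range(n)]
    off = [-coeffs[i + 1] for i in range(n - 1)]
    return off, diag, off[:]


def jacobi_scale(lower, diag, upper):
    """Symmetric diagonal scaling D^(-1/2) A D^(-1/2); purely elementwise."""
    d = [v ** -0.5 for v in diag]
    n = len(diag)
    s_diag = [d[i] * diag[i] * d[i] for i in range(n)]
    s_lower = [d[i + 1] * lower[i] * d[i] for i in range(n - 1)]
    s_upper = [d[i] * upper[i] * d[i + 1] for i in range(n - 1)]
    return s_lower, s_diag, s_upper


# Alternating high and low coefficients, differing by seven orders of magnitude.
coeffs = [1.0] * 5 + [1e-7] * 5 + [1.0] * 5 + [1e-7] * 5 + [1.0]
lower, diag, upper = layered_diffusion(coeffs)
s_lower, s_diag, s_upper = jacobi_scale(lower, diag, upper)
```

After scaling, the diagonal is one everywhere and the off-diagonal entries are bounded by one in magnitude, regardless of the jumps in the coefficients.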
A Potential Application
• (Vuik, Segal, and Meijerink, 1999) Oil drilling: determine fluid pressures to predict the presence of oil and gas in reservoirs
• Solve a time-dependent diffusion equation using finite elements
• The underground consists of layers with very large differences in permeability
• The coefficient matrix is very ill-conditioned
• Many small eigenvalues in the matrix, but with diagonal preconditioning:
• # of extreme eigenvalues = number of layers with a high permeability (sandstone)
[Figure: cross-section below the earth's surface showing alternating sandstone and shale layers]
Conclusions and Future Work
• Summary
  • Deflation can be implemented in CA-CG in a way that still avoids communication
  • Nontrivial tradeoffs between speed per iteration and convergence rate for different methods; these require further study and application-specific analysis
• Related work
  • Deflated restarting in CA-GMRES variant (Wakam and Erhel, 2013)
  • Deflation in CA-GMRES (Yamazaki, Tomov, and Dongarra, 2014)
  • Deflated pipelined CG (Ghysels, Vanroose, and Meerbergen, 2014)
• Future Work & Extensions
  • Performance studies and applications
  • Further exploration of 2-level CA preconditioning strategies
  • Solving (slowly-changing) series of linear systems by recycling Krylov subspaces (Parks et al., 2006)
Thank you! email: erin@cs.berkeley.edu