

  1. Department ICEA A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto Sparse Days 2014 June 5-6

  2. Outline • Introduction: preconditioning techniques for high performance computing • Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach • FSAIPACK: a software package for high performance FSAI preconditioning • Numerical results • Conclusions and future work

  3. Introduction Preconditioning techniques for high performance computing • The implementation of large models is becoming increasingly common in several applications, making the use of parallel computational resources almost mandatory • One of the most expensive and memory-consuming tasks in any numerical application is the solution of large and sparse linear systems • Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers, provided that an effective parallel preconditioner is available • Algebraic preconditioners: robust algorithms that generate a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from • Most popular and successful classes of preconditioners: • Incomplete LU factorizations • Approximate inverses • Algebraic multigrid

  4. Introduction Preconditioning techniques for high performance computing • For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel • FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems • Collection of routines that implement several different existing methods for computing an FSAI-based preconditioner • Allows for a very flexible user-specified construction of a parallel FSAI preconditioner • General purpose package, easy to include as an external library in any existing code • Currently coded in FORTRAN90 with OpenMP directives for shared memory machines • Freely available online at www.dmsa.unipd.it/~janna/software.html

  5. The FSAI-based approach FSAI definition • Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993], with G a lower triangular matrix minimizing ||I - GL||_F over the set of matrices with a prescribed lower triangular sparsity pattern SL, e.g. the pattern of A or A^2, where L is the exact Cholesky factor of A. L is not actually required for computing G! • Computed via the solution of n independent small dense systems and applied via matrix-vector products • Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix
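Written out, the Kolotilina & Yeremin definition reads as follows (a standard restatement of the formulas above, where \mathcal{W}_{S_L} denotes the set of lower triangular matrices with pattern SL):

G = \arg\min_{G \in \mathcal{W}_{S_L}} \| I - G L \|_F , \qquad A = L L^T , \qquad M^{-1} = G^T G \approx A^{-1} , \qquad G A G^T \approx I .

The stationarity conditions of this minimization involve only entries of A, which is why L never needs to be formed: the next slides give the resulting row-by-row computation.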

  6. The FSAI-based approach FSAI definition • The key property for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern SL • Historically, the first idea for building SL was to define it a priori, but more effective strategies can be developed by dynamically selecting the position of the non-zero entries in SL • Static FSAI: SL is defined a priori, e.g., as the pattern of A^k, possibly after a sparsification of A [Huckle 1999; Chow 2000, 2001] • Dynamic FSAI: SL is defined dynamically during the computation of G using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011] • Recurrent FSAI: the FSAI factor G is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012] • Post-filtration: it is generally recommended to apply an a posteriori sparsification of G, dropping the smallest entries [Kolotilina & Yeremin, 1999]

  7. FSAIPACK Static FSAI construction • FSAIPACK is a software library that collects several different ways of computing an FSAI preconditioner in a shared memory environment and allows for combining the construction techniques into original user-specified strategies • Assuming that SL is given, it is possible to compute G • Static FSAI: denote by Pi the set of the mi column indices belonging to the i-th row of SL (with i as its last entry). Compute the vector g̃i by solving the mi×mi dense linear system A[Pi,Pi] g̃i = e_mi, where e_mi is the last canonical basis vector, and scale it by the square root of its last component, gi = g̃i / sqrt(g̃i,mi), to obtain the dense i-th row of G, so that [G A G^T]ii = 1
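A minimal sketch of this row-by-row computation, written in Python/NumPy for readability only (it is not FSAIPACK's interface: the package is Fortran 90 and computes all rows concurrently with OpenMP); it assumes each pattern row Pi is sorted and ends with the diagonal index i:

import numpy as np
import scipy.sparse as sp

def static_fsai(A, pattern_rows):
    """Compute G row by row on a prescribed lower triangular pattern S_L.
    A: SPD matrix (scipy.sparse); pattern_rows[i]: sorted indices P_i, with P_i[-1] == i."""
    n = A.shape[0]
    Ad = np.asarray(A.todense())               # dense copy, only to keep the sketch short
    rows, cols, vals = [], [], []
    for i in range(n):
        P = np.asarray(pattern_rows[i])
        e = np.zeros(len(P)); e[-1] = 1.0      # last canonical basis vector of size m_i
        g = np.linalg.solve(Ad[np.ix_(P, P)], e)   # m_i x m_i dense SPD solve
        g /= np.sqrt(g[-1])                    # scaling makes diag(G A G^T) = 1
        rows += [i] * len(P); cols += list(P); vals += list(g)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

Since the n dense solves are mutually independent, the loop parallelizes trivially, which is the «naturally» parallel character mentioned above.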

  8. FSAIPACK Static pattern generation • The non-zero pattern for the Static FSAI computation can be generated with the aid of a simple recurrence on the matrix pattern • Static pattern generation: SL is the lower triangular pattern of a power k of A, or of a sparsified A in which the entries that are small relative to a threshold t are dropped (e.g. those with |aij| < t·sqrt(aii·ajj)) • User-specified parameters needed: k (integer), t (real)
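A possible realization of this pattern generation in Python/SciPy; the drop rule |aij| < t·sqrt(aii·ajj) is one common prefiltration choice and is assumed here for illustration, not taken from FSAIPACK's documentation:

import numpy as np
import scipy.sparse as sp

def static_pattern(A, k=2, t=1e-2):
    """Lower triangular pattern of the k-th power of a sparsified A (assumed drop rule)."""
    A = sp.coo_matrix(A)
    d = A.tocsr().diagonal()
    keep = np.abs(A.data) >= t * np.sqrt(d[A.row] * d[A.col])    # prefiltration of A
    B = sp.csr_matrix((np.ones(keep.sum()), (A.row[keep], A.col[keep])), shape=A.shape)
    P = B
    for _ in range(k - 1):
        P = P @ B                      # structure of the successive powers of the sparsified A
    return sp.tril(P).astype(bool)     # S_L: keep the lower triangular part only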

  9. FSAIPACK Dynamic FSAI construction • For ill-conditioned problems high values of k may be needed to properly decrease the iteration count, or even to allow for convergence, and the preconditioner construction and application can become quite heavy • A more efficient option relies on selecting the pattern dynamically, by an adaptive procedure that picks in some sense the "best" available positions for the non-zero coefficients • The Kaporin conditioning number β of an SPD matrix A is defined as β(A) = [tr(A)/n] / det(A)^(1/n), where β(A) ≥ 1, and β(A) = 1 if and only if A is a scalar multiple of the identity

  10. FSAIPACK Dynamic FSAI construction • The Kaporin conditioning number of an FSAI preconditioned matrix can be written in terms of row-wise scalars ψi [Janna & Ferronato 2011; Janna et al. 2014] (see the worked form below), where ψi depends only on the non-zero entries in the i-th row of G: ψi = g̃i^T A g̃i, with g̃i the i-th row of G before the diagonal scaling • The scalar ψi is a quadratic form of A in g̃i • Idea for generating the pattern dynamically: for each row, select the non-zero positions in g̃i providing the largest decrease in the ψi value • Compute the gradient of ψi with respect to g̃i and retain the positions containing the largest entries • The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met
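For completeness, a worked form of this statement (a standard derivation, using the scaling G = D·G̃ with G̃ unit lower triangular and D = diag(ψ1, …, ψn)^(-1/2)):

\psi_i = \tilde{g}_i^{T} A \, \tilde{g}_i , \qquad \beta\left(G A G^{T}\right) = \frac{\frac{1}{n}\,\mathrm{tr}\left(G A G^{T}\right)}{\det\left(G A G^{T}\right)^{1/n}} = \left( \frac{\prod_{i=1}^{n} \psi_i}{\det A} \right)^{1/n} , \qquad \nabla_{\tilde{g}_i}\,\psi_i = 2\, A\, \tilde{g}_i .

Minimizing β(G A G^T) therefore amounts to minimizing each ψi independently, and the entries of the gradient with the largest magnitude indicate the most profitable positions to add to the i-th row of SL.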

  11. FSAIPACK Dynamic FSAI construction • Dynamic construction of FSAI by an adaptive, row-by-row pattern generation • Adaptive FSAI: SL is built dynamically and G immediately computed, choosing at each step the s entries of the i-th row where the gradient of ψi is largest in magnitude, for a maximum number of kmax steps, until the exit tolerance e is achieved • User-specified parameters needed: kmax (integer), s (integer), e (real) • The default initial guess G0 is diag(A)^(-1/2), but any other user-specified lower triangular matrix is possible
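A heavily simplified sketch of this adaptive search for a single row, in plain Python on dense data; FSAIPACK works on sparse structures and updates the small systems incrementally, and the exit test used below is only an assumed form of the tolerance check:

import numpy as np

def adaptive_fsai_row(Ad, i, s=1, kmax=10, eps=1e-3):
    """Grow the pattern of row i greedily: at each step add the s eligible positions
    (j < i, not yet selected) where the gradient of psi_i is largest in magnitude,
    then recompute the unit-diagonal row on the enlarged pattern."""
    P = [i]                                       # current pattern; the diagonal is always kept
    g = np.zeros(Ad.shape[0]); g[i] = 1.0         # unit-diagonal row g~_i
    psi = Ad[i, i]                                # psi_i of the diagonal-only row
    for _ in range(kmax):
        grad = 2.0 * (Ad @ g)                     # gradient of psi_i w.r.t. g~_i
        grad[P] = 0.0; grad[i:] = 0.0             # only new positions j < i are eligible
        if np.max(np.abs(grad)) <= eps * psi:     # exit test (assumed form)
            break
        cand = np.argsort(-np.abs(grad))[:s]      # s most promising new positions
        P = sorted(set(P) | {int(j) for j in cand if grad[j] != 0.0})
        idx = np.array(P)                         # re-solve for the unit-diagonal row
        y = np.linalg.solve(Ad[np.ix_(idx, idx)], (idx == i).astype(float))
        g[:] = 0.0; g[idx] = y / y[idx == i]      # rescale so that g~_ii = 1
        psi = g @ Ad @ g
    return g / np.sqrt(psi)                       # final scaled row of G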

  12. FSAIPACK Dynamic FSAI construction • As ψi is a quadratic form of A in the i-th row of G, it can be minimized by using a gradient method • This gives rise to an iterative construction of SL and G, another kind of Dynamic FSAI • Iterative FSAI: the i-th row of G is computed by minimizing ψi with an incomplete Steepest Descent method, retaining the s largest entries per row for kiter iterations, until the exit tolerance e is achieved • User-specified parameters needed: kiter (integer), s (integer), e (real) • The default initial guess G0 is diag(A)^(-1/2), but any other user-specified lower triangular matrix is possible • The use of an inner preconditioner M^(-1) is also allowed
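A sketch of one row of this Iterative FSAI construction (plain Python on dense data, no inner preconditioner; the step length is the exact Steepest Descent one, and the dropping rule keeps the s largest entries as stated on the slide):

import numpy as np

def iterative_fsai_row(Ad, i, s=10, kiter=5, eps=1e-8):
    """Minimize psi_i = g^T A g with g_i = 1 fixed by an incomplete Steepest Descent,
    keeping at most s off-diagonal entries per iteration."""
    n = Ad.shape[0]
    g = np.zeros(n); g[i] = 1.0          # diagonal-only start (the default G0 after scaling)
    for _ in range(kiter):
        r = Ad @ g                       # the gradient of psi_i is 2*r
        r[i:] = 0.0                      # g_i stays fixed; only positions j < i may change
        if np.linalg.norm(r) <= eps:
            break
        alpha = (r @ r) / (r @ Ad @ r)   # exact line search along the steepest descent direction
        g = g - alpha * r
        if s < i:                        # incomplete step: drop all but the s largest entries
            small = np.argsort(np.abs(g[:i]))[: i - s]
            g[small] = 0.0
    return g / np.sqrt(g @ Ad @ g)       # final scaling so that diag(G A G^T) = 1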

  13. FSAIPACK Recurrent FSAI construction • Implicit construction of the sparsity pattern SL, writing the FSAI preconditioner as a product of factors • Recurrent FSAI: the final factor G is obtained as the product of nl factors, G = Gnl · Gnl-1 · … · G1, where Gk is the k-level preconditioning factor for Ak-1, with Ak = Gk Ak-1 Gk^T, A0 = A and G0 = I. Even if each factor is very sparse and computationally very cheap, the resulting preconditioner is actually very dense and is never formed explicitly: it is applied as a sequence of matrix-vector products (see the sketch below)
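A sketch of how such a product preconditioner can be applied without ever forming it explicitly; the function names and the factor list are illustrative, not FSAIPACK's interface:

import numpy as np

def apply_recurrent_G(G_factors, v):
    """Apply G = G_nl * ... * G_2 * G_1 to a vector; G_factors = [G_1, ..., G_nl]."""
    for Gk in G_factors:                 # innermost factor first
        v = Gk @ v
    return v

def apply_preconditioner(G_factors, r):
    """Apply M^{-1} = G^T G to a residual, as needed at each PCG iteration."""
    y = apply_recurrent_G(G_factors, r)  # y = G r
    for Gk in reversed(G_factors):       # then G^T: transposed factors in reverse order
        y = Gk.T @ y
    return y

Each step is just a matrix-vector product with one very sparse factor, so the dense overall preconditioner is never assembled.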

  14. FSAIPACK Numerical results • Analysis of the properties of each single method on a structural test case (size = 190,581, no. of non-zeroes: 7,531,389): • Static FSAI

  15. FSAIPACK Numerical results • Adaptive FSAI

  16. FSAIPACK Numerical results • Iterative FSAI

  17. FSAIPACK Numerical results • Recurrent FSAI

  18. FSAIPACK Numerical results • Comparison between the different methods on a Linux Cluster with 24 processors: • The most efficient option is to combine the different methods so as to maximize the pros and minimize the cons • FSAIPACK implements all the methods for building an FSAI-based preconditioner following a user-specified strategy that can be prescribed through a pseudo-programming language

  19. FSAIPACK Numerical results • Examples and numerical results (Linux Cluster, 24 processors) EMILIA (reservoir mechanics): size = 923,136, non-zeroes = 41,005,206. Note: post-filtration is applied in all cases

  20. FSAIPACK Numerical results STOCF (porous media flow): size = 1,465,137, non-zeroes = 21,005,389. Note: post-filtration is applied in all cases

  21. FSAIPACK Numerical results MECH (structural mechanics): size = 1,102,614, non-zeroes = 48,987,558. Note: post-filtration is applied in all cases

  22. FSAIPACK Numerical results • Example of strategy prescribed using the pseudo-programming language:
> MK_PATTERN  [ A : patt ]     -t -k    1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI   [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI  [ A : F ]        -n -s -e 10 1 1e-3
> POST_FILT   [ A : F ]        -t       0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]
Even complex strategies are easily managed: here a prefiltered static pattern is generated, a Static FSAI is computed on it and then improved by the dynamic methods described above, a post-filtration is applied, and the final factored preconditioner PREC is assembled from F and its transpose Ft

  23. FSAIPACK Numerical results • FSAIPACK scalability on the largest example • Test on an IBM Blue Gene/Q node equipped with 16 cores • Between 16 and 64 threads the ideal profile is flat because all physical cores are saturated • Using more threads than cores is still beneficial, as it hides memory access latencies

  24. Conclusions Results… • FSAI-based approaches are attractive preconditioners for the efficient solution of SPD linear systems on parallel computers • The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners • The dynamic pattern generation can considerably improve the FSAI quality, especially in ill-conditioned problems, but its cost typically increases quite rapidly with the density of the preconditioner • FSAIPACK is a high performance software package implemented for building an FSAI-based preconditioner through a user-specified strategy that combines different methods for selecting the sparsity pattern • A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner, even for very ill-conditioned problems

  25. Conclusions … and future work • Generalizing the approach to non-symmetric linear systems: difficulties with the existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation • Extending the FSAIPACK library to distributed memory computers and GPU accelerators, mixing OpenMP, MPI and CUDA • Studying in more detail the Iterative FSAI construction: • Analysis of the theoretical properties of incomplete gradient methods • Replace the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method • Understand why the pattern is generally good, even though the computed coefficients could be inaccurate • FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html

  26. Department ICEA Thank you for your attention Sparse Days 2014 June 5-6
