480 likes | 588 Views
The Use of The Sparse Direct Solver in The Engineering Applications of The Finite Element Method Pou žití přímého řídkého řešiče v inženýrských aplikacích metody konečných prvků. Richard Vondráček Department of Structural Mechanics Faculty of Civil Engineering
E N D
The Use of The Sparse Direct Solver in The Engineering Applications of The Finite Element MethodPoužití přímého řídkého řešiče v inženýrských aplikacích metody konečných prvků Richard Vondráček Department of Structural Mechanics Faculty of Civil Engineering Czech Technical University in Prague
Outline • Solutions of SLE • Direct sparse solver • Matrix storage formats • Ordering algorithms in general • Block storage • Symbolical factorization • Sparse numerical factorization • Preliminary results • Future goals
Methods for the solution of the sparse system of the linear equations • Iterative • Jacobi, Gauss-Seidel, SOR • CG, GMRES • Multigrid • Direct • Factorization type • LU, LDLT, LLT ( left-looking / right-looking ) • QR • Algorithm (matrix storage scheme) • Skyline (L1) • Frontal (L3) • Multifrontal (L3) • Sparse (L0..L3) • Combined • Incomplete factorizations ( preconditioned CG ) • Domain decompositions - FETI, DP-FETI
Sparse direct solver motivation • Fast factorization • Robust implementation • Unsymmetrical, indefinite, poorly conditioned problems • Lower memory consumption • Enable factorization of bigger tasks • Practical with existing FEM solvers • Compatible with domain decomposition algorithms • Freely available source and documentation
Sparse direct solver • Sparse LDLT, LLT, LU decomposition • Compressed columns storage • Block storage scheme • less indices • efficiency in floating point operations • locality of data is good for memory access caching • Symbolic factorization • AMD ordering (T.A.Davis, P.Amestoy and I.S.Duff) • Minimizes directly the fill-in • Quotient graph • Pattern of matrix L
Matrix ordering algorithms • Profile minimizing • Cuthill-McKee • Sloan • Spectral envelope reduction • Fill-in minimizing • Minimum degree (local → global) • AMD • MMD • Nested Disection (global → local) • Recursive Graph Bisection • Recursive Coordinate Bisection • Recursive Spectral Bisection • Multisection
Matrix storage schemes - orderings Example 2D plane stress mechanical problem Sparse stiffness matrix
Matrix storage schemes - orderings Original ordering + fill-in skyline Original ordering + fill-in sparse direct
Matrix storage schemes - orderings Reversed Cuthill-McKee skyline AMD + fill-in sparse direct
Matrix storage schemes - orderings Recursive Graph Bisection sparse direct Principle of the bisection
Block sparsematrix storage scheme The size of the blocks corresponds with the number of DOFs ofmesh node. The size depends on the physics of the underlying problem. • less indices • efficiency in floating point operations • locality of data is good for memory access caching • performance is tradeoff ( fast arith. vs. wasted arith. ) b×n 2×2, 3×3,.. ..,b×b
Block sparsematrix storage scheme Care must be taken about how the individual equations of the sparse system fit into the regular block structure. b×n Equations eliminated due to the DBC should not destroy the block structure !!
Block sparsematrix storage scheme u1a u2a pa k11ab k12ab c1ab u1b Indefinite systems which have regular “indefinite contributions” from each mesh node can be a-priory preordered to ensure the numerical stability of the decomposition k21ab k22ab c2ab u2b c1ab c2ab 0 pb b
Sparse solution strategy clear matrix L select DOFs permutation and blocksize create coarse connectivity matrix load nonzero matrix A entries into prepared matrix L symbolical factorization finds the block ordering and L pattern numerical factorization forward / backward substitutions for different RHSs allocate memory for the block sparse matrix L
Symbolical factorization • Based on a modified Approximate Minimum Degree algorithm • AMD ordering (T.A.Davis, P.Amestoy and I.S.Duff) (colamd) • Principle of the minimum degree • Finds the optimal equations preordering • Determines the final memory requirements • Gives us the structure of the factored matrix L
Symbolical factorization Connectivity graph used for symbolic factorization Original graph Elimination graph Quotient graph eliminate
Approximate minimum degree Quotient graph in k-th step of factorization
Approximate minimum degree Important sets True external degree (minimum degree)
Numerical factorization • Left-looking decomposition • Three embeded “for” loops • Indirect addressing in the inner loop due to the compressed column format • No further memory allocation • Preoptimized block multiplications • Use of temporary index maps
Numerical factorization Order in which the left-looking factorization updates the matrix entries in the two outer loops
Numerical factorization The purpose of the factorization inner loop in a dense matrix
Numerical factorization Computational effort of the skyline solver in the inner loop
Numerical factorization Computational effort of the sparse direct solver in the inner loop
Numerical experiments • Regular domains and structured meshes • General plate problem (Mindlin theory) • General threedimensional problem
Numerical experiments Parameters of used cluster
Regular domains and structuredmeshes Regular domains and structuredmeshes Memory demands of the Schur complement method
Regular domains and structuredmeshes Regular domains and structuredmeshes Time demands of the Schur complement method in seconds
Regular domains and structuredmeshes Regular domains and structuredmeshes Memory demands of the DP-FETI method
Regular domains and structuredmeshes Regular domains and structuredmeshes Time requirements of the DP-FETI method in seconds
General threedimensional problem General domain and unstructuredmeshes The Schur complement method
Preliminary conclusions • Competitive with skyline already for small tasks (but not for all) • Clear winner for larger tasks • If the system matrix fits well into the blocks we obtain optimal performance • Always faster RHS solution (forw., back. substitution) than skyline • Can be used as IC preconditioner for CG • Lower memory consumption • Allows bigger tasks without pagefile swapping
Future work and possible research • Study and find best practices for • block size influence to the performance • domain shape influence to the sparse direct vs. skyline effectivity • performance in nonlinear mechanical problems • performance in indefinite problems (incompressible media) • Try variable block size storage schema • Implement automatic block utilization algorithm • LU for primary domain decomposition • Implement block skyline and compare performance • ? Combined storage schema sparse + skyline ?