Accelerating generalized Cholesky decomposition using multiple processors

C.C. Tscherning & M. Veicherts, University of Copenhagen, Jan. 2008

Application in Least-Squares Collocation

Error-covariance estimation
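
The formulas on these two slides were not preserved. As a minimal sketch, assuming the standard least-squares collocation relations (prediction = c_s^T C^-1 y, error variance = c_ss - c_s^T C^-1 c_s), both quantities can be obtained from one factorization C = U^T U by reducing the observation vector and the signal covariance vector against U. The names `C`, `y`, `c_s`, `c_ss`, `factor` and `reduce_column` are illustrative and do not come from the slides or from GEOCOL; the numbers are made up.

```python
import math

def reduce_column(u, col):
    """Solve U^T r = col by forward substitution; u[i] is column i of U."""
    r = []
    for i in range(len(col)):
        s = col[i] - sum(u[i][k] * r[k] for k in range(i))
        r.append(s / u[i][i])
    return r

def factor(c):
    """Return the columns of the upper factor U with c = U^T U."""
    u = []
    for j in range(len(c)):
        # Off-diagonal part of column j, then its diagonal element.
        r = reduce_column(u, [c[i][j] for i in range(j)])
        d = c[j][j] - sum(x * x for x in r)
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        u.append(r + [math.sqrt(d)])
    return u

# Illustrative values only (assumptions, not data from the slides).
C = [[4.0, 2.0],
     [2.0, 5.0]]      # covariances between observations (noise included)
y = [2.0, 3.0]        # observations
c_s = [2.0, 1.0]      # covariances between the signal and the observations
c_ss = 4.0            # a-priori variance of the signal

U = factor(C)
a = reduce_column(U, y)      # U^-T y
b = reduce_column(U, c_s)    # U^-T c_s

prediction = sum(ai * bi for ai, bi in zip(a, b))      # c_s^T C^-1 y
error_variance = c_ss - sum(bi * bi for bi in b)       # c_ss - c_s^T C^-1 c_s
print(prediction, error_variance)
```

Treating the right-hand side and the covariance vector as extra columns to be reduced means no explicit inverse of C is ever formed.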

Cholesky Factorization
• L: lower triangular matrix
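
The slide's equations were lost in extraction. For reference, the standard relations for a symmetric positive-definite matrix C with elements c_ij and lower factor L with elements l_ij (this index notation is an assumption, not taken from the slide) are:

```latex
C = L L^{T}, \qquad
l_{jj} = \sqrt{\, c_{jj} - \sum_{k=1}^{j-1} l_{jk}^{2} \,}, \qquad
l_{ij} = \frac{1}{l_{jj}} \Bigl( c_{ij} - \sum_{k=1}^{j-1} l_{ik}\, l_{jk} \Bigr), \quad i > j .
```

Each off-diagonal element needs only the diagonal element of its column and elements computed earlier, which is what makes the parallel reduction possible.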

Generalized Cholesky

More Generalized Cholesky

Parallelization
• Once the diagonal element has been computed, each element in the row may be reduced separately.
• Hence each processor may take care of one column.
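
The independence described above can be sketched as follows, assuming an upper factor U with C = U^T U that is built row by row. This is not the authors' implementation: the function name and the thread pool are illustrative. In standard CPython, threads will not speed up this pure-Python arithmetic; the sketch only shows which work items are independent once the diagonal element is known.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cholesky_upper_parallel(c, workers=4):
    """Return U (upper triangular) with c = U^T U, reducing each row in parallel."""
    n = len(c)
    u = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            # The diagonal element must be computed first (sequential step).
            d = c[i][i] - sum(u[k][i] ** 2 for k in range(i))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            u[i][i] = math.sqrt(d)

            def reduce_element(j, i=i):
                # Uses only rows 0..i-1 and u[i][i]; independent of other columns.
                s = c[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
                return s / u[i][i]

            # One task per remaining column of row i.
            cols = range(i + 1, n)
            for j, value in zip(cols, pool.map(reduce_element, cols)):
                u[i][j] = value
    return u

C = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
U = cholesky_upper_parallel(C)
print(U)
```

The only synchronization point is the end of each row: the next diagonal element needs all columns of the previous rows to be finished.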

Blockwise factorization
• Should one row be factorized at a time?
• Or should blocks of elements be factorized together?
• Out-of-core factorization is needed for large matrices, so let the processors work on blocked matrices.
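
To make the second option concrete, here is a minimal in-memory sketch of a textbook right-looking blocked Cholesky factorization, assuming a lower factor L and a block size `bs`. It is not the NES_MP routine and does no out-of-core I/O; the function name and the test matrix are illustrative.

```python
import math

def blocked_cholesky(a, bs):
    """Return lower-triangular L with a = L L^T, processing bs columns at a time."""
    n = len(a)
    l = [row[:] for row in a]
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # 1. Factor the diagonal block (ordinary unblocked Cholesky).
        for j in range(k0, k1):
            d = l[j][j] - sum(l[j][p] ** 2 for p in range(k0, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            l[j][j] = math.sqrt(d)
            for i in range(j + 1, k1):
                s = l[i][j] - sum(l[i][p] * l[j][p] for p in range(k0, j))
                l[i][j] = s / l[j][j]
        # 2. Reduce the panel below the diagonal block; rows are independent.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = l[i][j] - sum(l[i][p] * l[j][p] for p in range(k0, j))
                l[i][j] = s / l[j][j]
        # 3. Update the trailing lower triangle; elements are independent.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                l[i][j] -= sum(l[i][p] * l[j][p] for p in range(k0, k1))
    # Clear the strictly upper part, which still holds copies of a.
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = 0.0
    return l

A = [[4.0, 2.0, 2.0, 0.0],
     [2.0, 5.0, 3.0, 1.0],
     [2.0, 3.0, 6.0, 2.0],
     [0.0, 1.0, 2.0, 7.0]]
L = blocked_cholesky(A, 2)
print(L)
```

Steps 2 and 3 touch only data from the current block column and the block being updated, so in an out-of-core setting only those blocks would need to be in memory at once, and different processors could work on different blocks.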

Block division: column-wise and rectangular
[Figure: a 6 × 6 matrix with elements c11 to c66, shown twice with two block divisions. Block labels: Blocks 1, 2, 3; Blocks 1, 2 and Block 3. Captions: 3 blocks 'column-wise', 1-dim., of size 9; 3 blocks rectangular, 2-dim., of size 3*3.]
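
The two block divisions can be sketched as below. The rectangular rule (3*3 tiles covering the upper triangle) follows from the slide's counts. The column-wise rule used here, packing whole columns of the upper triangle into a block until 9 elements would be exceeded, is an assumption that happens to reproduce the slide's "3 blocks of size 9" for a 6 × 6 matrix; the function names are illustrative.

```python
def columnwise_blocks(n, max_elements):
    """Group whole columns of the upper triangle into 1-dim. blocks.

    Column j (1-based) holds j elements. A block is closed before adding a
    column would push it past max_elements (assumed rule, see lead-in).
    """
    blocks, current, used = [], [], 0
    for j in range(1, n + 1):
        if current and used + j > max_elements:
            blocks.append(current)
            current, used = [], 0
        current.append(j)
        used += j
    if current:
        blocks.append(current)
    return blocks

def rectangular_blocks(n, size):
    """List the (block_row, block_col) tiles, 1-based, covering the upper triangle."""
    nb = -(-n // size)  # ceiling division: number of tiles per side
    return [(bi, bj) for bj in range(1, nb + 1) for bi in range(1, bj + 1)]

col_blocks = columnwise_blocks(6, 9)
rect_blocks = rectangular_blocks(6, 3)
print(col_blocks)   # columns grouped per block
print(rect_blocks)  # tiles of the upper triangle
```

For the 6 × 6 case both divisions give three blocks: columns 1 to 3, columns 4 and 5, and column 6 in the column-wise case, and tiles (1,1), (1,2), (2,2) in the rectangular case.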

Blocksize tests
[Figure panels: NEQ = 10000, Nproc = 4; NEQ = 20000, Nproc = 2]

Parallelization
[Figure: flowchart of the Cholesky factorization with NES_MP and related subroutine(s)]

Parallelization Results
(Performance test on two PCs, compiler PGF90)

Integration in GEOCOL18
GEOCOL integration tests: timing (in s) for equation solving only.

Performance Increase

Conclusion
• Generalized Cholesky factorization enables the use of parallelization for solution and error-covariance computation.
• The time gained by parallelization depends on the number of processors, the block size and how busy the computer is with other tasks.

Note: further use of multiprocessing
• Evaluation of spherical harmonic series (N.Pavlis et al.).
• Establishing the normal-equation matrix or computing a column of covariances.
• Factorization may start as soon as a row of blocks has been established.
• Gives realistic run times for LSC applications (minutes instead of days).
