Accelerating generalized Cholesky decomposition using multiple processors C.C.Tscherning & M.Veicherts, University of Copenhagen, Jan. 2008
Application in Least-Squares Collocation
Error-covariance estimation
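For reference, the standard least-squares collocation estimate and its error covariance can be written as below. The notation is an assumption chosen here, not taken from the slides: $x$ is the observation vector, $C$ the covariance matrix of the observations, $D$ the noise covariance matrix, $C_{sx}$ the cross-covariance between the predicted signal and the observations, and $C_{ss}$ the covariance of the predicted signal.

```latex
\begin{aligned}
\hat{s} &= C_{sx}\,(C + D)^{-1}\,x, \\
E_{ss}  &= C_{ss} - C_{sx}\,(C + D)^{-1}\,C_{sx}^{T}.
\end{aligned}
```

Both expressions involve the same symmetric positive definite matrix $C + D$, so a single Cholesky factorization of that matrix can serve the estimate and the error covariance alike.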
Cholesky Factorization • L: lower triangular matrix
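A minimal sketch of the textbook factorization C = L L^T, with L lower triangular, may help fix the notation. The function name `cholesky` and the list-of-lists storage are assumptions made for illustration; they are not the authors' Fortran implementation.

```python
import math

def cholesky(c):
    """Return the lower triangular L with c = L L^T.

    c is a symmetric positive definite matrix given as a list of rows.
    Illustrative sketch only (hypothetical helper, not the authors' code).
    """
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contributions of the columns already computed.
            s = c[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)   # diagonal element
            else:
                L[i][j] = s / L[j][j]    # off-diagonal element
    return L
```

For example, `cholesky([[4, 2], [2, 5]])` gives the rows `[2.0, 0.0]` and `[1.0, 2.0]`.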
Generalized Cholesky
More Generalized Cholesky
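One plausible reading of "generalized" here, which is an assumption and not something the surviving slide text spells out, is that the Cholesky reduction is carried on into extra columns appended to the matrix: the observation vector and the covariances between the observations and the quantity to be predicted. The prediction and its error variance then follow from inner products of the reduced columns. The sketch below illustrates that idea; the names `reduce_augmented` and `lsc_predict` are hypothetical.

```python
import math

def reduce_augmented(c, cols):
    """Factor c = L L^T and reduce each extra column b to y = L^{-1} b.

    Assumed illustration of a 'generalized' Cholesky reduction,
    not the authors' routine.
    """
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    reduced = []
    for b in cols:
        y = [0.0] * n
        for i in range(n):
            # Forward substitution: same recurrence as for a matrix column.
            y[i] = (b[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
        reduced.append(y)
    return L, reduced

def lsc_predict(c, x, c_sx, c_ss):
    """Prediction and error variance from the reduced columns.

    c    : covariance matrix of the observations (noise included)
    x    : observations
    c_sx : covariances between the predicted signal and the observations
    c_ss : variance of the predicted signal
    """
    _, (yx, ys) = reduce_augmented(c, [x, c_sx])
    estimate = sum(a * b for a, b in zip(ys, yx))   # c_sx^T C^{-1} x
    error_var = c_ss - sum(a * a for a in ys)       # c_ss - c_sx^T C^{-1} c_sx
    return estimate, error_var
```

Because every appended column is reduced with the same recurrence as a column of the matrix itself, the appended columns can be handed to separate processors in the same way.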
Parallelization • Once the diagonal element of a row has been computed, each remaining element in that row can be reduced independently of the others. • Hence each processor can take care of one column.
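The independence of the elements within a row can be sketched as follows. The sketch assumes the factor is stored as an upper triangular U = L^T, so that a "row" of the slide corresponds to a row of U, and it uses a Python thread pool purely to show which work items are independent. Python threads do not speed up this arithmetic, and the function name `parallel_cholesky_upper` is hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def parallel_cholesky_upper(c, workers=4):
    """Return upper triangular U with c = U^T U, reducing each row in parallel.

    Illustrative sketch only; not the authors' implementation.
    """
    n = len(c)
    u = [[0.0] * n for _ in range(n)]

    def reduce_element(i, j):
        # Needs only rows 0..i-1 of U and the diagonal u[i][i],
        # so elements (i, j) for different j do not depend on each other.
        s = c[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
        return s / u[i][i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            # The diagonal element must be finished first.
            d = c[i][i] - sum(u[k][i] ** 2 for k in range(i))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            u[i][i] = math.sqrt(d)
            # Each worker takes care of one column j of row i.
            cols = range(i + 1, n)
            for j, value in zip(cols, pool.map(partial(reduce_element, i), cols)):
                u[i][j] = value
    return u
```

The only synchronization point is the diagonal element: row i cannot start before u[i][i] is known, and row i + 1 cannot start before row i is complete.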
Blockwise factorization • Should one row be factorized at a time? • Or should blocks of elements be factorized? • Large matrices need out-of-core factorization, so let the processors work on blocked matrices.
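A blockwise factorization can be sketched with the standard right-looking blocked Cholesky scheme shown below. This is a generic textbook variant, named plainly as such; the slides do not say which blocked scheme the authors use. The function name `blocked_cholesky` and the block-size parameter `bs` are assumptions.

```python
import math

def blocked_cholesky(a, bs):
    """Return lower triangular L with a = L L^T, processed in blocks of bs columns.

    Generic right-looking blocked scheme, for illustration only.
    """
    n = len(a)
    L = [[float(v) for v in row] for row in a]
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # Step 1: factor the diagonal block (rows and columns k0..k1-1).
        for j in range(k0, k1):
            d = L[j][j] - sum(L[j][p] ** 2 for p in range(k0, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            L[j][j] = math.sqrt(d)
            for i in range(j + 1, k1):
                s = L[i][j] - sum(L[i][p] * L[j][p] for p in range(k0, j))
                L[i][j] = s / L[j][j]
        # Step 2: reduce the panel below the diagonal block.
        # Each row i is independent of the other rows.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = L[i][j] - sum(L[i][p] * L[j][p] for p in range(k0, j))
                L[i][j] = s / L[j][j]
        # Step 3: update the trailing submatrix with the finished panel.
        # Each element (i, j) is independent of the others.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k0, k1))
    # Clear the upper triangle, which still holds input values.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

Steps 2 and 3 touch only one block column and the trailing blocks at a time, which is what makes the scheme suitable when the matrix is held out of core and blocks are handed to separate processors.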
Block division: column-wise and rectangular blocks • [Figure: a 6×6 matrix with elements c11 to c66, shown under two block divisions: 3 'column-wise' one-dimensional blocks of size 9, and 3 rectangular two-dimensional blocks of size 3*3.]
Blocksize tests • [Plots: NEQ = 10000, Nproc = 4; NEQ = 20000, Nproc = 2]
Parallelization • [Figure: flowchart of the Cholesky factorisation with NES_MP and related subroutine(s)]
Parallelization Results • Results of a performance test on two PCs (compiler: PGF90)
Integration in GEOCOL18 • GEOCOL integration tests: timing (in s) for equation solving only.
Performance Increase
Conclusion • Generalized Cholesky factorization enables parallelization of both the solution and the error-covariance computation. • The time gained by parallelization depends on the number of processors, the block size, and how busy the computer is with other tasks.
Note: further use of multiprocessing • Evaluation of spherical harmonic series (N.Pavlis et al.). • Establishing the normal-equation matrix or computing a column of covariances. • Factorisation may start as soon as a row of blocks has been established. • Gives realistic run times for LSC applications (minutes instead of days).