A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory
Pierre Carrier, University of Minnesota
Yousef Saad, University of Minnesota
James K. Freericks, Georgetown University
• Inhomogeneous-DMFT
• Low-Rank Lanczos for finding diagonal of inverse
• Implementation using Co-Array Fortran (Fortran 2008)
• Towards 3D petascale computations (CPU time vs memory)
NISE-2011
Slides available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf
Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

Inhomogeneous-DMFT
• Diagonal of the inverse of the Dyson equation
• Mapping to the effective medium by fixed-point iteration
• Quantities involved: self-energies, hopping matrix, density, potentials, ...
• The Green's functions are complex-symmetric matrices and, in general, not diagonally dominant
References:
- W. Metzner & D. Vollhardt, Phys. Rev. Lett. 62, 324 (1989)
- M.-T. Tran, Phys. Rev. B 73, 205110 (2006)
- Transport in multilayered nanostructures, the dynamical mean-field theory (Imperial College Press, 2006)

Low-Rank Lanczos for finding diagonal of inverse
- Standard Lanczos for Hermitian matrices: $G = Q T Q^{H}$ leads to $G^{-1} = Q T^{-1} Q^{H}$
- Lanczos for complex-symmetric matrices: R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992):
  use the indefinite dot product in standard Lanczos, giving $G = Q T Q^{T}$
  Need to deal with possible "breakdowns"...
[See also the other approach shown in Proc. of the HPCMP Users Group Conference, page 223 (2010)]

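The complex-symmetric variant can be illustrated with a minimal Python sketch. It is not the authors' Co-Array Fortran code and not Freund's look-ahead algorithm: it only replaces the Hermitian inner product by the unconjugated product x^T y and stops with an error when a breakdown (w^T w = 0 for a nonzero w) is detected. The function names, the 3x3 test matrix and the tolerance are assumptions made for illustration.

```python
import cmath
import math

def bilinear(x, y):
    """Indefinite product x^T y: a plain sum with no complex conjugation."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Dense matrix-vector product."""
    return [bilinear(row, x) for row in A]

def complex_symmetric_lanczos(A, start, steps, tol=1e-12):
    """Return (Q, alpha, beta) such that Q^T A Q is tridiagonal, for A = A^T."""
    Q, alpha, beta = [], [], []
    w = [complex(x) for x in start]
    for _ in range(steps):
        if math.sqrt(sum(abs(x) ** 2 for x in w)) < tol:
            break  # residual vanished: an invariant subspace has been found
        scale = cmath.sqrt(bilinear(w, w))
        if abs(scale) < tol:
            # w is nonzero but w^T w = 0: the breakdown mentioned on the slide
            raise ArithmeticError("Lanczos breakdown: w^T w = 0 for nonzero w")
        if Q:
            beta.append(scale)  # off-diagonal entry of T
        q = [x / scale for x in w]
        Q.append(q)
        w = matvec(A, q)
        alpha.append(bilinear(q, w))  # diagonal entry of T
        # Full re-orthogonalization against every stored vector, using x^T y
        for v in Q:
            c = bilinear(v, w)
            w = [wi - c * vi for wi, vi in zip(w, v)]
    return Q, alpha, beta

# Hypothetical 3x3 complex-symmetric test matrix
A = [[2 + 1j, 1, 0], [1, 3 + 1j, 1], [0, 1, 4 + 1j]]
Q, alpha, beta = complex_symmetric_lanczos(A, [1, 1, 1], steps=3)
T = [[bilinear(qi, matvec(A, qj)) for qj in Q] for qi in Q]
```

Here `T` is recomputed as Q^T A Q only to check the recurrence; in practice one would keep just `alpha` and `beta`.
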
Low-Rank Lanczos for finding diagonal of inverse
• Standard Lanczos is prohibitive when N is large (N ~ 100×100×100 sites): in $G = Q T Q^{T}$ the Lanczos basis is large and dense, and requires re-orthogonalization
• Alternative: reduce the number of Lanczos steps and keep the solution exact by using a low-rank matrix

Low-Rank Lanczos for finding diagonal of inverse
Example in 2D: decomposition into 4 domains gives block diagonal matrices
[Figure: 2D lattice with sites numbered 1 to 25, split into 4 domains separated by 9 interface sites, and the corresponding matrix G_ij]

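The 2D example can be reproduced with a short Python sketch, assuming the 25 sites form a 5×5 grid numbered row by row and the interface is the middle row plus the middle column (which gives exactly 9 interface sites). The function name and the numbering convention are assumptions, not taken from the slides.

```python
def partition_grid(n):
    """Split an n x n grid (n odd) into 4 quadrant domains and a cross-shaped interface."""
    mid = n // 2
    domains = {0: [], 1: [], 2: [], 3: []}
    interface = []
    for row in range(n):
        for col in range(n):
            site = row * n + col + 1  # 1-based, row-major numbering
            if row == mid or col == mid:
                interface.append(site)
            else:
                # quadrant index: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right
                domains[2 * (row > mid) + (col > mid)].append(site)
    return domains, interface

domains, interface = partition_grid(5)
# Ordering that makes the matrix block diagonal, with the interface sites last
order = [site for d in range(4) for site in domains[d]] + interface
```

With nearest-neighbour hopping, no site of one domain touches a site of another domain, so ordering the matrix by `order` leaves four independent 4×4 diagonal blocks coupled only through the 9 interface rows and columns.
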
Low-Rank Lanczos for finding diagonal of inverse
• G is ordered into subdomain blocks (block-diagonal part B) followed by the interface sites
• Number of interface sites = rank of the Schur complement
• Domain decomposition: $G^{-1} = \mathrm{DomainDecomp}(\ldots)$, a rather complicated expression
• Swap: $X = G^{-1} - B^{-1} = Q T Q^{T}$, obtained by Lanczos on the interface sites (Lanczos-based low-rank)
• X is the low-rank matrix; the Lanczos-based form has the same low rank
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

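The rank statement can be checked numerically on a small case. The Python sketch below assumes a 5×5 nearest-neighbour lattice with a uniform complex diagonal z = 0.3 + 1.0i, hopping -1, and an interface made of the middle row and column; B^{-1} is taken as the inverse of the block-diagonal subdomain part, padded with zeros on the interface rows and columns. All of these choices, and the dense Gauss-Jordan helpers, are illustrative assumptions rather than the authors' setup.

```python
def inverse(M):
    """Dense inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def rank(M, tol=1e-9):
    """Numerical rank by row reduction; pivots below tol count as zero."""
    A = [[complex(x) for x in row] for row in M]
    rows, cols, rk = len(A), len(A[0]), 0
    for c in range(cols):
        if rk == rows:
            break
        p = max(range(rk, rows), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < tol:
            continue
        A[rk], A[p] = A[p], A[rk]
        piv = A[rk][c]
        A[rk] = [x / piv for x in A[rk]]
        for r in range(rows):
            if r != rk:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[rk])]
        rk += 1
    return rk

n, mid, z = 5, 2, 0.3 + 1.0j
sites = [(r, c) for r in range(n) for c in range(n)]
interior = sorted((s for s in sites if s[0] != mid and s[1] != mid),
                  key=lambda s: (s[0] > mid, s[1] > mid))  # grouped by quadrant
interface = [s for s in sites if s[0] == mid or s[1] == mid]
order = interior + interface

def entry(a, b):
    if a == b:
        return z
    return -1.0 if abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1 else 0.0

G = [[entry(a, b) for b in order] for a in order]
m, N = len(interior), len(order)
B = [row[:m] for row in G[:m]]  # block-diagonal subdomain part
G_inv, B_inv = inverse(G), inverse(B)
# X = G^{-1} - B^{-1}, with B^{-1} padded by zeros on the interface sites
X = [[G_inv[i][j] - (B_inv[i][j] if i < m and j < m else 0) for j in range(N)]
     for i in range(N)]
low_rank = rank(X)
```

Under these assumptions `low_rank` equals the number of interface sites, so a Lanczos run on X needs only that many steps instead of N.
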
Implementation using Co-Array Fortran (Fortran 2008)
• $X = G^{-1} - B^{-1} = Q T Q^{T}$: Lanczos on interface sites (Lanczos-based low-rank)
• S (rank of Schur complement) << N (rank of full G matrix)
• Applying X at each Lanczos step requires:
  - parallel complex-GMRES on $G u = q_j$
  - complex-GMRES for each block of B; $B^{-1}$ is also used as preconditioner
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

Implementation using Co-Array Fortran (Fortran 2008)
• IDMFT SCF loop; Falicov-Kimball; 3D
• Distributed sparse matrices (CSR format)
• Includes parallel complex-GMRES, Block-GMRES and Lanczos
• Written entirely in CAF (Fortran 2008):
  - CAF is available on any Cray machine using PrgEnv-cray: "ftn -hcaf <routine.f08>"
  - "allocate( GreensFunction(subarray)[domain, 0:Matsubara] )"
  - CO_SUM plays the role of MPI_Allreduce(..., SUM, ...)
  - Example: "Greens(34)[1,0] = Greens(56)[3,62]" copies element 56 held by the image (domain=3, Matsubara=62) into element 34 of the image (domain=1, Matsubara=0)
[Figure: grid of images with Matsubara = 0, 1, 2, ..., 62 along one axis and domain = 1, 2, 3, 4, 5 (Schur) along the other; Greens(34) sits at domain=1 and Greens(56) at domain=3]
ALL DETAILS: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

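As a reminder of what the CSR format stores, here is a serial Python sketch of a CSR matrix-vector product with complex entries, the kernel that GMRES and Lanczos call repeatedly. It is not the Co-Array Fortran implementation: in a distributed version each image would presumably hold only its own rows. The array names and the 3-site example are assumptions.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in compressed sparse row (CSR) form."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0j
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Hypothetical 3-site chain: complex diagonal z, hopping -1 between neighbours
z = 0.5 + 1.0j
values = [z, -1, -1, z, -1, -1, z]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```
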
Implementation using Co-Array Fortran (Fortran 2008)
Breakdown behavior of Freund's Lanczos:
• Freund's algorithm works well; ~20% more iterations can be required before convergence
• Diagonally dominant matrices require relatively more additional iterations; the null space is visited more often
[Figure: plot labelled Hopper]

Towards 3D petascale computations (CPU time vs memory)
• Time: scaling is equivalent to that of domain decomposition
• Goal for 3D: more domains and larger systems
• Tested in the past in 2D, sequential, F90; see http://www.sciencedirect.com/science/article/pii/S1875389211003257
[Figure: plot labelled Hopper]

Towards 3D petascale computations (CPU time vs memory)
MEMORY conflict with the low-rank Lanczos algorithm:
- Small number of Lanczos iterations, large memory per block: memory too large due to the size of each block
- Large number of Lanczos iterations, small memory per block: memory too large due to the size of the Lanczos re-orthogonalization

Towards 3D petascale computations (CPU time vs memory)
• The cusp appears at (331) in the 2D case
• Optimum subdomain size for low-rank Lanczos
[Figure: plot labelled Hopper]

Towards 3D petascale computations (CPU time vs memory)
• Using 2×2×2 Lanczos domains only
• Maximum on Cray XE6 (1333 MB/processor): 41×41×41 sites max, ~4800 Lanczos steps
[Figure (Hopper): total, initialization and indexing memory per processor, and number of Lanczos steps, versus number of sites (axis up to 100000); marked values ~363, ~507, ~675, ~867, ~1323, ~1875, ~2883; 31×31×31 also marked]

Towards 3D petascale computations (CPU time vs memory)
• 81×81×81 = 531,441 sites: ~19,683 Lanczos steps; 2(2)×2(2)×2(2) = 64 Lanczos (subdomains)
• 161×161×161 = 4,173,281 sites: ~77,763 Lanczos steps; 2(4)×2(4)×2(4) = 512 Lanczos (subdomains)
• Other patterns of interface
• Out-of-core Lanczos vectors
• allocate( GreensFunction(subarray)[subdomains, Lanczos, Matsubara] )
SLIDES available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf
CODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm

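The quoted numbers can be cross-checked with a few lines of Python. The site counts are exact cubes (531,441 = 81³ and 4,173,281 = 161³), and the quoted Lanczos step counts match 3n², the size of three n×n interface planes counted without removing their overlaps. That reading of the step counts is an inference from the figures on the slides, not a formula stated there; the function names are hypothetical.

```python
def lanczos_steps_estimate(n):
    """Three n x n interface planes, overlaps counted more than once."""
    return 3 * n * n

def interface_sites_exact(n):
    """Three mutually orthogonal planes in an n^3 cube, by inclusion-exclusion."""
    return 3 * n * n - 3 * n + 1

for n in (81, 161):
    print(n, n ** 3, lanczos_steps_estimate(n), interface_sites_exact(n))
```

The two functions differ only by the 3n - 1 sites shared between planes, which is small next to 3n² at these sizes.
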
Thank you
• Bob Numrich (CUNY)
• Jok Tang (Vortech)
• Woo-Sun Yang (NERSC)
• Haw-ren Fang (U Minnesota)

What is a general IDMFT loop implementation
IDMFT_loop: Do
  - Define the diagonal from the Dyson equation, for Matsubara = 0, 1, 2, ..., nmax
  - Solve: ... (fixed-point iteration)
  - sync all (MPI_barrier)
  - Density( )
  - Impurity solvers: ...
  - Update the self-energy
End do IDMFT_loop

What is a general IDMFT loop implementation
• Mapping (fixed-point iteration) between the full lattice and the effective medium
• Labelled quantities: full lattice; local noninteracting; self-energy; complex or real Matsubara frequencies; hopping + chemical and trap potentials; effective medium
• THE MAIN DIFFICULTY OF THE ALGORITHM IS TO FIND SIMULTANEOUSLY SEVERAL (nmax+1) DIAGONALS OF THE INVERSE OF THE LATTICE DYSON EQUATION (COMPLEX SYMMETRIC SPARSE MATRICES)

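The loop structure can be sketched in Python for a toy problem. The sketch assumes an open chain of 6 sites, the textbook Falicov-Kimball impurity solution with a fixed, uniform localized-particle filling `w1`, and a dense Gauss-Jordan inverse standing in for the low-rank Lanczos step. The density step is omitted because `mu` and `w1` are held fixed. Every parameter value and function name is an assumption; this is a serial illustration, not the authors' code.

```python
import math

def inverse_diagonal(M):
    """Diagonal of M^{-1} by dense Gauss-Jordan (stand-in for low-rank Lanczos)."""
    n = len(M)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n + i] for i in range(n)]

def idmft_loop(n_sites=6, hop=1.0, U=2.0, mu=1.0, w1=0.5,
               temperature=0.5, n_matsubara=4, sweeps=30):
    omegas = [math.pi * temperature * (2 * k + 1) for k in range(n_matsubara)]
    sigma = [[0j] * n_sites for _ in omegas]
    green = [[0j] * n_sites for _ in omegas]
    for _ in range(sweeps):
        # Lattice step: diagonal of the inverse, one independent solve per frequency
        for k, w in enumerate(omegas):
            M = [[(1j * w + mu - sigma[k][i]) if i == j
                  else (hop if abs(i - j) == 1 else 0j)
                  for j in range(n_sites)] for i in range(n_sites)]
            green[k] = inverse_diagonal(M)
        # (the parallel code synchronizes here before the local steps)
        # Impurity step and self-energy update, site by site
        for k in range(len(omegas)):
            for i in range(n_sites):
                g0_inv = 1 / green[k][i] + sigma[k][i]            # effective medium
                g_imp = (1 - w1) / g0_inv + w1 / (g0_inv - U)     # Falicov-Kimball
                sigma[k][i] = g0_inv - 1 / g_imp
    return omegas, green, sigma

omegas, green, sigma = idmft_loop()
```

The frequency loop is the part the slides distribute over images: each Matsubara frequency needs its own diagonal of an inverse, and the solves do not depend on each other.
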