A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory

Pierre Carrier (University of Minnesota), Yousef Saad (University of Minnesota), James K. Freericks (Georgetown University). Topics: Inhomogeneous-DMFT; low-rank Lanczos for finding the diagonal of the inverse.


Presentation Transcript


  1. A New Lanczos-Based Low Rank Algorithm for Inhomogeneous Dynamical Mean-Field Theory. Pierre Carrier (University of Minnesota), Yousef Saad (University of Minnesota), James K. Freericks (Georgetown University). Outline: • Inhomogeneous-DMFT • Low-rank Lanczos for finding the diagonal of the inverse • Implementation using Co-Array Fortran (Fortran 2008) • Towards 3D petascale computations (CPU time vs memory). NISE-2011. Slides available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf Y24.00001: APS March Meeting 2012, Pierre Carrier, Yousef Saad, James K. Freericks

  2. Inhomogeneous-DMFT. [Diagram: fixed-point iteration mapping between the diagonal of the inverse of the Dyson equation and the effective medium; labels: self-energies, hopping matrix, density, potentials, ...] The Green’s functions are complex-symmetric matrices and, in general, not diagonally dominant. References: W. Metzner & D. Vollhardt, Phys. Rev. Lett. 62, 324 (1989); M.-T. Tran, Phys. Rev. B 73, 205110 (2006); Transport in Multilayered Nanostructures: The Dynamical Mean-Field Theory (Imperial College Press, 2006).

  3. Low-Rank Lanczos for finding diagonal of inverse. Standard Lanczos for Hermitian matrices: G = ( )(T)( ), with the tridiagonal matrix T between the two Lanczos-vector factors, leads to G^{-1}. Lanczos for complex-symmetric matrices: R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): use the indefinite dot product in standard Lanczos, G = ( )(T)( ). Need to deal with possible “breakdowns”... [See also Proc. of the HPCMP Users Group Conference, page 223 (2010)]

  4. Low-Rank Lanczos for finding diagonal of inverse. Standard Lanczos for Hermitian matrices: G = ( )(T)( ), with the tridiagonal matrix T between the two Lanczos-vector factors, leads to G^{-1}. Lanczos for complex-symmetric matrices: R. W. Freund, SIAM J. Sci. Stat. Comput. 13, 425 (1992): use the indefinite dot product in standard Lanczos, G = ( )(T)( ). Need to deal with possible “breakdowns”... [See also the other approach shown in Proc. of the HPCMP Users Group Conference, page 223 (2010)]
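A minimal sketch of the complex-symmetric Lanczos recurrence mentioned on this slide: the only change from the Hermitian algorithm is that the inner product is the unconjugated (indefinite) bilinear form x^T y. This is not Freund's look-ahead algorithm; a (near) zero q^T q simply raises an error instead of being handled. Full re-orthogonalization is used for clarity. The function names and the 3x3 example matrix are assumptions made for illustration.

```python
import cmath

def bilinear(x, y):
    """Indefinite product x^T y (no complex conjugation)."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [bilinear(row, x) for row in A]

def complex_symmetric_lanczos(A, v0, steps, tol=1e-12):
    """Return (Q, alpha, beta) such that Q^T A Q = tridiag(beta, alpha, beta)."""
    norm = cmath.sqrt(bilinear(v0, v0))
    if abs(norm) < tol:
        raise ZeroDivisionError("breakdown: v0^T v0 is (nearly) zero")
    q = [x / norm for x in v0]
    q_prev = [0j] * len(v0)
    b_prev = 0j
    Q, alpha, beta = [q], [], []
    for j in range(steps):
        w = matvec(A, q)
        a = bilinear(q, w)
        alpha.append(a)
        if j == steps - 1:
            break
        # Three-term recurrence.
        w = [wi - a * qi - b_prev * pi for wi, qi, pi in zip(w, q, q_prev)]
        # Full re-orthogonalization with respect to the bilinear form.
        for p in Q:
            c = bilinear(p, w)
            w = [wi - c * pi for wi, pi in zip(w, p)]
        b = cmath.sqrt(bilinear(w, w))
        if abs(b) < tol:
            # Either an invariant subspace (w = 0) or a serious breakdown
            # (w != 0 but w^T w = 0); this sketch does not distinguish them.
            raise ZeroDivisionError("breakdown or invariant subspace")
        beta.append(b)
        q_prev, q, b_prev = q, [wi / b for wi in w], b
        Q.append(q)
    return Q, alpha, beta

# Hypothetical 3x3 complex-symmetric (not Hermitian) example.
A = [[2 + 1j, 1, 0.5j],
     [1, 3 - 1j, 1],
     [0.5j, 1, 1 + 2j]]
Q, alpha, beta = complex_symmetric_lanczos(A, [1, 0, 0], 3)
```

Because the bilinear form is indefinite, q^T q can vanish for a nonzero vector q, which is the breakdown the slide refers to.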

  5. Low-Rank Lanczos for finding diagonal of inverse. Standard Lanczos is prohibitive when N is large (N ~ 100X100X100 sites): G = ( )(T)( ) is large and dense, and requires re-orthogonalization. Alternative: reduce the number of Lanczos steps and keep the solution exact by using a low-rank matrix.

  6. Low-Rank Lanczos for finding diagonal of inverse. Example in 2D: decomposition into 4 domains gives block diagonal matrices. [Figure: a lattice of 25 sites numbered 1 to 25, with 9 interface sites, and the corresponding matrix Gij.]

  7. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  8. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. ( )^{-1} - ( )^{-1} = ( ) [figure: difference of two inverses], where number of interface sites = Schur complement’s rank. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  9. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. ( )^{-1} - ( )^{-1} = ( ) [figure: difference of two inverses], where number of interface sites = Schur complement’s rank. Domain decomposition: [G]^{-1} = DomainDecomp( ), a rather complicated expression. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
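The "rather complicated expression" appears on the slide only as a figure. As a hedged reconstruction, the standard block-inverse identity below has the same structure. The symbols E (coupling between subdomains and interface), C (interface block) and s (number of interface sites) are assumptions introduced here; B denotes the block-diagonal subdomain part and S the Schur complement, and both are assumed invertible.

```latex
\[
G = \begin{pmatrix} B & E \\ E^{T} & C \end{pmatrix},
\qquad
S = C - E^{T} B^{-1} E ,
\]
\[
G^{-1} = \begin{pmatrix} B^{-1} & 0 \\ 0 & 0 \end{pmatrix}
       + \begin{pmatrix} -B^{-1} E \\ I_{s} \end{pmatrix}
         S^{-1}
         \begin{pmatrix} -E^{T} B^{-1} & I_{s} \end{pmatrix}.
\]
```

The second term has rank s, since its outer factors contain the identity I_s and S^{-1} is invertible. This agrees with the slide's statement that the number of interface sites equals the Schur complement's rank.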

  10. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. ( )^{-1} - ( )^{-1} = ( ) [figure: difference of two inverses], where number of interface sites = Schur complement’s rank. Domain decomposition: [G]^{-1} = DomainDecomp( ). Swap: X = [G]^{-1} - ( )^{-1} = ( )( )( ) (Lanczos on interface sites; Lanczos-based low-rank). All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  11. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. ( )^{-1} - ( )^{-1} = ( ) [figure: difference of two inverses], where number of interface sites = Schur complement’s rank. Domain decomposition: [G]^{-1} = DomainDecomp( ). Swap: X = [G]^{-1} - ( )^{-1} = ( )( )( ) (Lanczos on interface sites; Lanczos-based low-rank). Same low-rank as ( ) or ( ) [figures]. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  12. Low-Rank Lanczos for finding diagonal of inverse. G = ( ) [figure: matrix with the interface sites marked]. ( )^{-1} - ( )^{-1} = ( ) [figure: difference of two inverses], where number of interface sites = Schur complement’s rank. Domain decomposition: [G]^{-1} = DomainDecomp( ). Lanczos-based low-rank: X = [G]^{-1} - ( )^{-1} = ( )(T)( ) (Lanczos on interface sites). This is the low-rank matrix. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php
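As a small numerical illustration of why the correction to the block-diagonal inverse only involves the interface, the sketch below computes the diagonal of the inverse of a partitioned complex-symmetric matrix [[B, E], [E^T, C]] through the Schur complement S = C - E^T B^{-1} E, using dense Gauss-Jordan elimination in pure Python. It is a direct check on a toy problem, not the Lanczos-based procedure of the talk. The names solve, matmul, diag_inverse_schur, E and C, and the 5-site chain, are assumptions made for this example.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Dense product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, R):
    """Solve A X = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[complex(x) for x in A[i]] + [complex(x) for x in R[i]]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def diag_inverse_schur(B, E, C):
    """Diagonal of the inverse of [[B, E], [E^T, C]], with B symmetric."""
    n, s = len(B), len(C)
    Et = [list(col) for col in zip(*E)]
    Binv = solve(B, identity(n))
    W = matmul(Binv, E)                      # B^{-1} E, size n x s
    EtW = matmul(Et, W)                      # E^T B^{-1} E, size s x s
    S = [[C[i][j] - EtW[i][j] for j in range(s)] for i in range(s)]
    Sinv = solve(S, identity(s))
    WS = matmul(W, Sinv)                     # B^{-1} E S^{-1}
    # diag(B^{-1}) plus the diagonal of the rank-s correction W S^{-1} W^T.
    interior = [Binv[i][i] + sum(WS[i][k] * W[i][k] for k in range(s))
                for i in range(n)]
    interface = [Sinv[k][k] for k in range(s)]
    return interior + interface

# Hypothetical toy problem: a 5-site chain whose middle site is the interface.
d = 0.5 + 1.0j
B = [[d, -1, 0, 0], [-1, d, 0, 0], [0, 0, d, -1], [0, 0, -1, d]]
E = [[0], [-1], [-1], [0]]
C = [[d]]
diag = diag_inverse_schur(B, E, C)
```

Only s = 1 interface site couples the two subdomains here, so the correction added to diag(B^{-1}) is a rank-1 term.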

  13. Implementation using Co-Array Fortran (Fortran 2008). Lanczos-based low-rank (Lanczos on interface sites): X = [G]^{-1} - ( )^{-1} = ( )(T)( ), with S (rank of Schur complement) << N (rank of full G matrix). For X, at each Lanczos step: parallel complex-GMRES on G u = q_j; complex-GMRES per each block B, also used as preconditioner. All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  14. Implementation using Co-Array Fortran (Fortran 2008)
      • IDMFT SCF loop; Falicov-Kimball; 3D
      • Distributed sparse matrices (CSR format)
      • Includes parallel complex-GMRES, Block-GMRES and Lanczos
      • Written entirely in CAF (Fortran 2008):
        - CAF is available on any Cray machine using PrgEnv-cray: “ftn -hcaf <routine.f08>”
        - “allocate( GreensFunction(subarray)[domain, 0:Matsubara] )”
        - CO_SUM = MPI_Allreduce(..., SUM, ...)
      Example: “Greens(34)[1,0] = Greens(56)[3,62]”
      [Figure: co-array layout with domain = 1, 2, 3, 4, 5 (Schur) along one co-dimension and Matsubara = 0, 1, 2, ..., 62 along the other; Greens(34) sits on image [1,0] and Greens(56) on image [3,62].]
      All details: http://www.csi.cuny.edu/cunyhpc/workshops2011/september_2011/workshops.php

  15. Implementation using Co-Array Fortran (Fortran 2008). Breakdown behavior of Freund’s Lanczos: Freund’s algorithm works well; ~20% more iterations can be required before convergence. Diagonally dominant matrices require relatively more additional iterations; the null space is visited more often. [Plot: results on Hopper.]

  16. Towards 3D petascale computations (CPU time vs memory). Time scaling is equivalent to that of domain decomposition. Goal for 3D: more domains and larger systems. [Plot: results on Hopper.] Tested in the past in 2D, sequential, F90; see: http://www.sciencedirect.com/science/article/pii/S1875389211003257

  17. Towards 3D petascale computations (CPU time vs memory). Memory conflict with the low-rank Lanczos algorithm: (a) small number of Lanczos iterations, large memory per block: memory too large due to the size of each block; (b) large number of Lanczos iterations, small memory per block: memory too large due to the size of the Lanczos re-orthogonalization.

  18. Towards 3D petascale computations (CPU time vs memory). The cusp appears at (331) in the 2D case: optimum subdomain size for low-rank Lanczos. [Plot: results on Hopper.]

  19. Towards 3D petascale computations (CPU time vs memory). Using 2X2X2 Lanczos domains only. Maximum on Cray XE6 (1333 MB/processor): 41X41X41 sites max, ~4800 Lanczos steps. [Plot (Hopper): total, initialization and indexing memory per processor versus number of sites, annotated with the number of Lanczos steps: ~363, ~507, ~675, ~867, ~1323, ~1875, ~2883; labeled sizes include 31X31X31 and 41X41X41.]

  20. Towards 3D petascale computations (CPU time vs memory). 161X161X161 = 4,173,281 sites: ~77,763 Lanczos steps, 2(4)X2(4)X2(4) = 512 Lanczos (subdomains). 81X81X81 = 531,441 sites: ~19,683 Lanczos steps, 2(2)X2(2)X2(2) = 64 Lanczos (subdomains). Also: other patterns of interface; out-of-core Lanczos vectors; allocate( GreensFunction(subarray)[subdomains, Lanczos, Matsubara]). SLIDES available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf CODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm
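The figures quoted on this slide can be checked with a few lines of arithmetic. The sketch assumes that the number of Lanczos steps is approximately 3n^2 for an n x n x n lattice, that is, three n x n interface planes with the overlaps along their intersections ignored. This rule is inferred from the quoted numbers (for example 19,683 = 3 x 81^2) and is not a formula stated in the talk; the function names are hypothetical.

```python
def total_sites(n):
    """Number of sites of an n x n x n lattice."""
    return n ** 3

def approx_lanczos_steps(n):
    """Approximate interface-site count: three n x n planes (assumption)."""
    return 3 * n * n

for n in (31, 81, 161):
    print(n, total_sites(n), approx_lanczos_steps(n))
```

Note that 4,173,281 = 161^3 and 77,763 = 3 x 161^2, so the larger lattice has 161 sites per side.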

  21. Thank you • Bob Numrich (CUNY) • Jok Tang (Vortech) • Woo-Sun Yang (NERSC) • Haw-ren Fang (U Minnesota). SLIDES available at http://www.msi.umn.edu/~carrierp/images/APS2012.pdf CODES available at: http://www.msi.umn.edu/~carrierp/coarrayFortran/index.htm

  22. What is a general IDMFT loop implementation?
      IDMFT_loop: do
        - Define the diagonal from the Dyson equation (Matsubara = 0, 1, 2, ..., nmax)
        - Solve: ...
        - sync all (MPI_barrier)
        - ...
        - Density( )
        - Impurity solvers: ...
        - Update the self-energy
      end do IDMFT_loop
      [Label: fixed-point iteration.]

  23. What is a general IDMFT loop implementation? [Diagram: fixed-point iteration mapping between the full lattice and the effective medium; labels: local, noninteracting, self-energy, complex or real Matsubara frequencies, hopping + chemical and trap potentials.] The main difficulty of the algorithm is to find simultaneously several (nmax+1) diagonals of the inverse of the lattice Dyson equation (complex symmetric sparse matrices).
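The loop structure on the last two slides can be sketched as a generic fixed-point iteration. Everything concrete here is an assumption: the callables local_green and impurity_solver are hypothetical placeholders for the diagonal-of-the-inverse step and the impurity solver, the linear mixing parameter mix is a common stabilisation choice that the talk does not mention, and the single-site toy model at the end exists only to make the sketch runnable.

```python
def idmft_loop(sigma, local_green, impurity_solver,
               mix=0.5, tol=1e-12, max_iter=500):
    """Iterate sigma -> impurity_solver(local_green(sigma)) to a fixed point."""
    for iteration in range(1, max_iter + 1):
        g_loc = local_green(sigma)           # diagonal of the inverse (Dyson step)
        sigma_new = impurity_solver(g_loc)   # updated local self-energies
        change = max(abs(a - b) for a, b in zip(sigma_new, sigma))
        # Linear mixing of old and new self-energies (assumption).
        sigma = [(1 - mix) * s + mix * sn for s, sn in zip(sigma, sigma_new)]
        if change < tol:
            return sigma, iteration
    raise RuntimeError("no convergence within max_iter iterations")

# Toy stand-ins (not a physical model): one site, one imaginary frequency.
z = 2.0j

def toy_green(sigma):
    return [1.0 / (z - s) for s in sigma]

def toy_solver(g_loc):
    return [0.25 * g for g in g_loc]

sigma, n_iter = idmft_loop([0j], toy_green, toy_solver)
```

In the actual algorithm the local_green step is the expensive one, since it requires the diagonal of the inverse for every Matsubara frequency in each pass of the loop.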
