ACES III and SIAL: technologies for petascale computing in chemistry and materials physics
Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett
AcesQC, LLC
QTP, University of Florida, Gainesville, Florida
ACES III and SIAL
Outline of the talk
• Performance results
  • What can be done today?
• Design of petascale capable software
  • How does SIAL work?
  • What makes it different?
• Outlook
ACES III software
• Developed under CHSSI CBD-03
• Parallel for shared and distributed memory
• Capabilities
  • Hartree-Fock (RHF, UHF)
  • MBPT(2) energy, gradient, Hessian
  • CCSD(T) energy and gradient (DROPMO)
  • EOM-CC excited state energies
Two examples
• Luciferin (C11H8O3S2N2): RHF, C1 symmetry, basis = aug-cc-pvdz (494 bf), Ncorrocc = 46
• Sucrose (C12H22O11): RHF, C1 symmetry, basis = 6-311G** (546 bf), = 91
Luciferin CCSD(T)
• CCSD on 128 processors
  • One iteration: 23 min
  • Total 12 iterations: 275 min
• (T)
  • Hardest 8 occupied orbitals: 420 min on 128 processors
  • Total 48 correlated orbitals: 420 min on 768 processors
Luciferin CCSD scaling
[Figure: min per iteration; 12 iterations; two versions]
Sucrose CCSD scaling
[Figure: min per iteration; 8 iterations; on Cray XT4]
(H2O)21H+ scaling
[Figure: min per iteration; 657 bf; 84 corr occ]
A computer with a single CPU
• Basic data item: 64 bit number
• High level language: Fortran, C
  • c = a + b
• Assembly language
  • ADD dest,src
  • ADD is an operation code
  • dest and src are registers
The ACES III parallel machine
• Basic data item: data block of 10,000 64 bit numbers -> super number
• High level language: being developed
• Assembly language: SIAL (super instruction assembly language)
  • R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L)
• xaces3 -> super instruction processor
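The contraction on the slide above acts on whole blocks rather than single numbers. As a rough illustration only, the sketch below spells out what one such super instruction computes for a single block, in plain Python. The dict-based block storage, the names `zero_block` and `contract_accumulate`, and the segment length of 2 are assumptions made for this example; they are not taken from ACES III. A segment length of 10 would give the 10,000-number block mentioned on the slide.

```python
import itertools

SEG = 2  # hypothetical segment length; a 4-index block then holds SEG**4 numbers


def zero_block(seg=SEG):
    """Return a 4-index block stored as a dict keyed by (i, j, k, l)."""
    return {idx: 0.0 for idx in itertools.product(range(seg), repeat=4)}


def contract_accumulate(r, v, t, seg=SEG):
    """One 'super instruction': r[i,j,k,l] += sum over c,d of v[i,j,c,d] * t[c,d,k,l]."""
    for i, j, k, l in itertools.product(range(seg), repeat=4):
        acc = 0.0
        for c, d in itertools.product(range(seg), repeat=2):
            acc += v[i, j, c, d] * t[c, d, k, l]
        r[i, j, k, l] += acc
    return r


# Fill V with ones and T with twos, then accumulate into an empty R block.
v = {idx: 1.0 for idx in zero_block()}
t = {idx: 2.0 for idx in zero_block()}
r = contract_accumulate(zero_block(), v, t)
```

The point of the sketch is the granularity: the caller issues one operation per block, and everything inside the function is free to be optimized independently.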
User level execution flow
[Diagram: algo.sial -> SIAL compiler -> algo.sio; algo.sio + input -> ACES III]
Coarse grain parallelism
• Executing super instructions in SIAL algorithm
• Example: memory super instruction
  • GET block
  • Can be from
    • Local node RAM
    • Other node RAM
  • Time for data to become available differs
Fine grain parallelism
• Inside super instructions
• Example: compute super instruction
  • * (contractions)
  • compute_integrals
• Can use multiple cores
• Can use accelerators
  • GPGPUs and Cell processors
  • FPGAs (field programmable gate arrays)
Super instruction flow
Worker i: GET a -> ask j; …; d = b*c; …; wait for a?; a arrives; e = a*d; …
Worker j: …; <- send a; …
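The flow above overlaps a remote block request with local computation. A minimal sketch of that pattern, using Python threads as a stand-in for the two workers, might look as follows. The functions `worker_j_send`, `multiply`, and `worker_i`, the list-based blocks, and the 0.05 s delay are all hypothetical; the actual communication layer used by the super instruction processor is not described on the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def worker_j_send(block_name):
    """Stand-in for worker j: deliver the requested block after a short delay."""
    time.sleep(0.05)  # simulated network latency (assumed value)
    return [1.0, 2.0, 3.0]


def multiply(x, y):
    """Stand-in for a compute super instruction: elementwise product of two blocks."""
    return [p * q for p, q in zip(x, y)]


def worker_i():
    b = [2.0, 2.0, 2.0]
    c = [3.0, 3.0, 3.0]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending_a = pool.submit(worker_j_send, "a")  # GET a -> ask j, do not wait
        d = multiply(b, c)                           # d = b*c runs while a is in flight
        a = pending_a.result()                       # wait for a only when it is needed
        e = multiply(a, d)                           # e = a*d
    return e


result = worker_i()
```

If the block arrives before `d` is finished, the wait costs nothing; if it arrives later, only the remaining latency is exposed.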
Super instruction performance
• Super instructions are asynchronous
• Makes execution very elastic
• Helps maintain consistent performance on many parallel architectures
Distributed data
• N worker tasks, each with local RAM
• Data distributed in RAM of workers
  • AO-based: direct use of integrals
  • MO-based: use transformed integrals
• Array blocks are spread over all workers
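To make "array blocks are spread over all workers" concrete, the sketch below assigns each block of a 2-index array to a worker. The round-robin rule, the `owner` function, and the sizes are assumptions chosen for illustration; the slides do not state which placement rule ACES III uses.

```python
from collections import Counter
from itertools import product


def owner(block_index, blocks_per_dim, n_workers):
    """Map a block index tuple to a worker rank (round-robin over the flattened index)."""
    flat = 0
    for b in block_index:
        flat = flat * blocks_per_dim + b
    return flat % n_workers


N_WORKERS = 4
BLOCKS_PER_DIM = 3  # a 2-index array cut into 3 x 3 = 9 blocks

# Which worker holds each block, and how many blocks each worker holds.
placement = {
    idx: owner(idx, BLOCKS_PER_DIM, N_WORKERS)
    for idx in product(range(BLOCKS_PER_DIM), repeat=2)
}
load = Counter(placement.values())
```

Any worker can evaluate `owner` locally, so it knows where to send a request for a block without consulting a central directory.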
Served (disk resident) data
• M server tasks
  • have access to local or global disk storage
  • accept, store and retrieve blocks
  • can also compute integrals when asked
• Data served to and from disk
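The accept, store, and retrieve role of a server task can be pictured with a toy block store. This is only a sketch under stated assumptions: the `BlockServer` class, the one-file-per-block layout, the use of `pickle`, and the array name `T2` are invented for the example and say nothing about the actual on-disk format.

```python
import os
import pickle
import tempfile


class BlockServer:
    """Toy stand-in for a server task: stores and retrieves blocks on disk."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, array_name, block_index):
        suffix = "_".join(str(b) for b in block_index)
        return os.path.join(self.directory, f"{array_name}_{suffix}.blk")

    def put(self, array_name, block_index, block):
        """Accept a block from a worker and write it to disk."""
        with open(self._path(array_name, block_index), "wb") as fh:
            pickle.dump(block, fh)

    def get(self, array_name, block_index):
        """Read a block back from disk and return it to the requester."""
        with open(self._path(array_name, block_index), "rb") as fh:
            return pickle.load(fh)


# Store one block in a scratch directory and read it back.
with tempfile.TemporaryDirectory() as scratch:
    server = BlockServer(scratch)
    server.put("T2", (0, 1), [0.5, 1.5, 2.5])
    restored = server.get("T2", (0, 1))
```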
ACES III design
[Diagram]
• High level (problem): concepts, data structures, algorithms -> Super Instruction Assembly Language (SIAL)
• Low level (performance): communication, input/output -> Super Instruction Processor SIP (xaces3)
• input -> SIP (xaces3) -> output
Clear divisions
• Extreme object oriented approach
• High level = problem domain specific
  • Concepts
  • Data structures
  • Algorithms
• Low level = focus on performance
  • Processor and memory speed
  • Communication latency and bandwidth
Super Instruction Coding
• Write algorithm in high level super instruction assembly language
• Declare (block) arrays, (block) indices
• DO - END DO construct
• PARDO - END PARDO construct
• Basic operations: add, multiply, and contract
• SIP_BARRIER
• Each line maps to a few super instructions
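One way to picture the difference between DO and PARDO is that a PARDO hands different block-index combinations to different workers. The Python sketch below shows such a split. The static round-robin assignment and the name `pardo_chunks` are assumptions for illustration; how the super instruction processor actually schedules PARDO iterations is not stated on the slides.

```python
from itertools import product


def pardo_chunks(index_ranges, n_workers):
    """Split the iteration space of block indices into one work list per worker."""
    iterations = list(product(*(range(n) for n in index_ranges)))
    return [iterations[rank::n_workers] for rank in range(n_workers)]


# A PARDO over two block indices I (3 segments) and J (2 segments) on 4 workers.
work = pardo_chunks([3, 2], n_workers=4)
```

Each worker then runs the loop body only for the index pairs in its own list, and a barrier such as SIP_BARRIER would mark the point where all workers must have finished.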
Optimize and Tune
• Optimize with traditional techniques
  • optimize the basic contraction operations by mapping them to DGEMM calls
  • create fast code to generate integrals
  • optimize memory allocation by using multiple block stacks
  • optimize execution and data movement
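Mapping a contraction to a DGEMM call amounts to viewing each 4-index block as a matrix whose rows and columns are index pairs. The sketch below checks this on a small block, with a plain Python `matmul` standing in for DGEMM. The helper names `as_matrix`, `matmul`, and `contract_direct`, the dict-based blocks, and the segment length are assumptions for this example.

```python
from itertools import product


def as_matrix(block, seg):
    """View a 4-index block b[p,q,r,s] as a (seg*seg) x (seg*seg) matrix
    with row index (p,q) and column index (r,s)."""
    n = seg * seg
    mat = [[0.0] * n for _ in range(n)]
    for p, q, r, s in product(range(seg), repeat=4):
        mat[p * seg + q][r * seg + s] = block[p, q, r, s]
    return mat


def matmul(a, b):
    """Plain matrix product; an optimized build would call DGEMM here."""
    n, m, k = len(a), len(b[0]), len(b)
    return [[sum(a[i][x] * b[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]


def contract_direct(v, t, seg):
    """Reference: r[i,j,k,l] = sum over c,d of v[i,j,c,d] * t[c,d,k,l]."""
    return {
        (i, j, k, l): sum(v[i, j, c, d] * t[c, d, k, l]
                          for c, d in product(range(seg), repeat=2))
        for i, j, k, l in product(range(seg), repeat=4)
    }


SEG = 2
idx = list(product(range(SEG), repeat=4))
v = {ix: float(n + 1) for n, ix in enumerate(idx)}
t = {ix: float(2 * n - 3) for n, ix in enumerate(idx)}

r_mat = matmul(as_matrix(v, SEG), as_matrix(t, SEG))  # matrix route
r_ref = contract_direct(v, t, SEG)                    # explicit-loop route
```

Both routes give the same numbers; the matrix form is what lets a tuned library routine do the work.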
Programmer productivity: Other
• Other tools for parallel development
  • UPC (Unified Parallel C)
  • CAF (Co-Array Fortran)
  • GA (Global Arrays Toolkit)
  • DDI (Distributed Data Interface)
• Simple syntax
• Specify precise data layout
• PGAS (partitioned global address space)
• Rigorous array blocking
Programmer productivity: SIAL
• SIAL has simple syntax
  • Experience shows it is more expressive
• Exact data layout is done by SIP
  • Allows runtime tuning and optimization
• SIAL has rich set of data structures
  • Distributed array
  • Served array
  • Temporary array
  • Local array
New SIAL developer tools coming
• Develop higher level programming language
• Programmer support
  • Eclipse as IDE (integrated development environment) for SIAL coding
  • Understands SIAL syntax
  • Code refactoring tools
  • Rewrite code
  • Help improve performance
New algorithms being explored
• SIAL: Data staging
  • Huge served array
  • Copy section into distributed array
  • Work efficiently on distributed array
  • Similar to BLAS-3 management of cache
• ACES III: Linear scaling
  • Localized orbitals
New domains being explored
• Need
  • A domain specialist, or a few of them
  • Willingness and expertise to explore alternative algorithms
• Apply “super instruction” design pattern
  • Find “super number”, the basic data item in the domain
  • “Super instructions” then follow
Towards petascale computing
• ACES III
  • Ready for real work
  • Has run on 8,192 processors
• SIAL
  • Useful in electronic structure
  • Can be used in other domains