
SciDAC Software Infrastructure for Lattice Gauge Theory

This review outlines the software infrastructure for lattice gauge theory, focusing on API design, completed components, performance, and future development. Major participants and fundamental software components are discussed, along with optimizations for Dirac solvers and inverters. Performance metrics and future optimization plans are detailed, demonstrating the ongoing efforts to enhance computational efficiency and advance algorithm research in lattice QCD.


Presentation Transcript


  1. SciDAC Software Infrastructure for Lattice Gauge Theory. Richard C. Brower, QCD Project Review, May 24-25, 2005. Code distribution: see http://www.usqcd.org/usqcd-software

  2. Major Participants in Software Project [participant table] UK: Peter Boyle, Balint Joo (* = member of the Software Coordinating Committee)

  3. OUTLINE • QCD API Design • Completed Components • Performance • Future

  4. Lattice QCD – extremely uniform • (Perfect) Load Balancing: uniform periodic lattices & identical sublattices per processor. • (Complete) Latency hiding: overlap computation / communication. • Data Parallel: operations on small 3x3 complex matrices per link. • Critical kernel: the Dirac solver is ~90% of flops. Lattice Dirac operator:
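The slide's equation is not included above; as an illustration (one standard form of the Wilson lattice Dirac operator, not necessarily the exact expression shown on the slide):

```latex
M_{x,y} \;=\; \delta_{x,y}
  \;-\; \kappa \sum_{\mu=1}^{4} \Big[ (1-\gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,y}
  \;+\; (1+\gamma_\mu)\, U_\mu^\dagger(x-\hat\mu)\, \delta_{x-\hat\mu,\,y} \Big]
```

Here U_mu(x) are the 3x3 complex SU(3) link matrices from the Data Parallel bullet, gamma_mu are the Dirac matrices, and kappa is the hopping parameter; applying M site by site is the critical kernel that dominates the flop count.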

  5. SciDAC QCD API (layered design) • Level 3: optimized Dirac operators and inverters, tuned for P4 clusters and QCDOC; exists in C/C++. • Level 2: QDP (QCD Data Parallel) for lattice-wide operations and data shifts, plus QIO for binary data files with XML metadata (ILDG collaboration). • Level 1: QLA (QCD Linear Algebra) and QMP (QCD Message Passing), implemented over MPI, native QCDOC, and M-VIA on a GigE mesh.

  6. Fundamental Software Components are in Place: • QMP (Message Passing) version 2.0: GigE, native QCDOC & over MPI. • QLA (Linear Algebra): C routines (24,000 generated by Perl, with test code); optimized subset for P4 clusters in SSE assembly; QCDOC optimized using BAGEL -- an assembly generator for PowerPC, IBM Power, UltraSparc, Alpha… by P. Boyle, UK (an illustrative per-link kernel of this kind is sketched after this list). • QDP (Data Parallel) API: C version for the MILC applications code; C++ version for the new code base Chroma. • I/O and ILDG: read/write of XML/binary files compliant with ILDG. • Level 3 (optimized kernels): Dirac inverters for QCDOC and P4 clusters.
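For orientation, the bulk of the QLA-level work is products of small 3x3 complex (SU(3)) matrices, one per link. The sketch below is a minimal hand-written illustration of that kernel type; it is not the QLA API itself (QLA's generated C routines follow their own naming conventions), and the type and function names here are hypothetical.

```cpp
#include <array>
#include <complex>

// Illustration only: the per-link kernel type provided by the linear-algebra
// layer, i.e. a product of two 3x3 complex (SU(3)) matrices.
using Complex = std::complex<double>;
using ColorMatrix = std::array<std::array<Complex, 3>, 3>;

// r = a * b for a single lattice link
void su3_multiply(ColorMatrix& r, const ColorMatrix& a, const ColorMatrix& b) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            Complex sum{0.0, 0.0};
            for (int k = 0; k < 3; ++k)
                sum += a[i][k] * b[k][j];
            r[i][j] = sum;
        }
}
```

The optimized Level 1 routines perform exactly this kind of operation in SSE assembly (P4) or BAGEL-generated assembly (QCDOC), rather than portable C++ as shown here.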

  7. Example of a QDP++ Expression Typical for the Dirac Operator: • QDP/C++ code (an illustrative sketch follows below). • Portable Expression Template Engine (PETE): temporaries eliminated, expressions optimized.
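The code listing from this slide is not included above. The fragment below is a sketch of what such a QDP++ data-parallel expression looks like, using the publicly documented QDP++ names as assumed here (LatticeFermion, LatticeColorMatrix, shift, Gamma); it is an illustration, not the slide's exact code.

```cpp
// Sketch of a QDP++ expression for one forward hopping term of the
// Wilson-Dirac operator:  chi(x) += U_mu(x) (1 - gamma_mu) psi(x + mu).
// Illustrative only; not the slide's original listing.
#include "qdp.h"
using namespace QDP;

void dslash_forward_term(LatticeFermion& chi,
                         const multi1d<LatticeColorMatrix>& u,
                         const LatticeFermion& psi,
                         int mu)
{
    // shift(psi, FORWARD, mu) fetches the neighbor field psi(x + mu);
    // PETE fuses the whole right-hand side into a single loop over sites,
    // so no lattice-wide temporaries are created.
    chi += u[mu] * shift(psi - Gamma(1 << mu) * psi, FORWARD, mu);
}
```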

  8. Performance of Optimized Dirac Inverters • QCDOC Level 3 inverters on 1024-processor racks • Level 2 on InfiniBand P4 clusters at FNAL • Level 3 GigE P4 at JLab and Myrinet P4 at FNAL • Future optimization on BlueGene/L et al.
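For context on what these inverters do: a Level 3 inverter is a hand-tuned conjugate-gradient (CG) style solver for the Dirac normal equations (the domain-wall CG inverter appears explicitly on slide 13). The sketch below is a minimal, generic CG in C++ over a plain vector type; it only illustrates the algorithm, and the names (ApplyA, dot, cg_solve) are hypothetical, not the Level 3 code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal conjugate-gradient sketch for A x = b with A hermitian positive
// definite (e.g. A = Mdag M).  Purely illustrative; the real Level 3
// inverters are hand-optimized SSE / BAGEL assembly kernels, not this code.
using Vec = std::vector<double>;
using ApplyA = std::function<void(Vec&, const Vec&)>;   // out = A * in

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Returns the number of iterations used.
int cg_solve(const ApplyA& A, Vec& x, const Vec& b,
             double tol = 1e-8, int max_iter = 1000)
{
    Vec r = b, p = b, Ap(b.size());          // start from x = 0, so r = b
    std::fill(x.begin(), x.end(), 0.0);
    double rsq = dot(r, r);
    for (int k = 0; k < max_iter; ++k) {
        if (std::sqrt(rsq) < tol) return k;  // converged
        A(Ap, p);
        double alpha = rsq / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rsq_new = dot(r, r);
        double beta = rsq_new / rsq;
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + beta * p[i];
        rsq = rsq_new;
    }
    return max_iter;                          // not converged
}
```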

  9. Level 3 on QCDOC

  10. Level 3 Asqtad QCDOC (double precision)

  11. Level 2 Asqtad on single P4 node

  12. Asqtad on InfiniBand Cluster

  13. Level 3 Domain Wall CG Inverter† (32 nodes) [performance table] † Ls = 16; 4g is a P4 at 2.8 GHz with an 800 MHz FSB

  14. Alpha Testers on the 1K QCDOC Racks • MILC C code: Level 3 Asqtad HMD: 40^3 × 96 lattice for 2+1 flavors • Chroma C++ code: Level 3 domain wall HMC: 24^3 × 32 (Ls = 8) for 2+1 flavors • I/O: reading and writing lattices to RAID and the host AIX box • File transfers: 1-hop file transfers between BNL, FNAL & JLab

  15. BlueGene/L Assessment • June ’04: 1024-node BG/L Wilson HMC sustained 1 Teraflops (peak 5.6); Bhanot, Chen, Gara, Sexton & Vranas, hep-lat/0409042 • Feb ’05: QDP++/Chroma running on the UK BG/L machine • June ’05: BU and MIT each acquire a 1024-node BG/L • ’05-’06 software development: optimize linear algebra (BAGEL assembly generator from P. Boyle) at UK; optimize the inverter (Level 3) at MIT; optimize QMP at BU/JLab

  16. Future Software Development • Ongoing: optimization, testing and hardening of components; new Level 3 and application routines; executables for BG/L, X1E, XT3, Power 3, PNNL cluster, APEmille… • Integration: uniform interfaces and a uniform execution environment for the 3 labs; middleware for data sharing with the ILDG. • Management: data, code and job control as a 3-lab Metacenter (leveraging PPDG). • Algorithm research: increased performance (leveraging TOPS & NSF IRT); new physics (scattering, resonances, SUSY, fermion signs, etc.?)
