Vision for OSC Computing and Computational Sciences
Thomas Zacharia, Associate Laboratory Director, Computing and Computational Sciences, Oak Ridge National Laboratory
http://www.ccs.ornl.gov/
Earth Simulator Rapid Response Meeting, May 15-16, 2002
Charge from Dr. Orbach • Review “. . . current state of the national computer vendor community relative to high performance computing” • “. . . Vision for what realistically should be accomplished in the next five years within the Office of Science in high performance computing”
Dr. Orbach’s Vision for OSC Computing (Statement to the ASCAC Committee, May 8, 2002) • “… there is a centrality of computation in everything that we do” • “… large scale computation is the future of every program in the Office of Science” • “… we want to have our own computing program in non-defense computational science” [Diagram: computing infrastructure linking theoretical and computational scientists, computer scientists, and applied mathematicians across astrophysics, biology, materials, climate, chemistry, and fusion]
FY 03 Budget Request for OSC Computing Considerably Lower than Required to Meet Goals [Chart: computing budget dollars by fiscal year for DOE-SC, NNSA, and NSF]
As Fraction of Total Budget, OSC is Half NNSA and NSF and Needs Significant Increase to Meet Goals [Chart: computing budget as a percentage of total budget for DOE-SC, NNSA, and NSF]
Earth Simulator has Heightened Urgency for Infrastructure Strategy for Scientific Computing • Critical steps: invest in critical software with integrated science and computer science development teams; deploy scientific computing hardware infrastructure in support of “large scale computation”; develop new initiative to support advanced architecture research • Top 500 Supercomputers: the US has been #1 on 12 of 19 lists; US vendors include Cray, HP, IBM, and SGI, and the largest US installation is an IBM system • A concerted effort will be required to regain US leadership in high performance computing • The LINPACK benchmark generally overestimates the effectiveness of an architecture for applications such as climate by a substantial factor; stability and reliability are also important system properties
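As a rough illustration of why LINPACK rank can be misleading, the sketch below compares peak, LINPACK, and sustained climate-application throughput for two hypothetical machines of equal peak; the efficiency fractions are illustrative assumptions, not measured values from this briefing.

```python
# Illustrative only: the efficiency fractions below are assumptions, not
# measurements from this briefing. They sketch how a benchmark-driven
# ranking can diverge from delivered application performance.

machines = {
    # name: (peak TFlops, assumed LINPACK efficiency, assumed climate-code efficiency)
    "vector system (ES-like)": (40.0, 0.85, 0.50),
    "commodity cluster (MPP)": (40.0, 0.70, 0.05),
}

for name, (peak, linpack_eff, app_eff) in machines.items():
    linpack = peak * linpack_eff    # TFlops reported by the benchmark
    sustained = peak * app_eff      # TFlops delivered to a climate code
    print(f"{name}: peak={peak:.0f} TF, "
          f"LINPACK~{linpack:.0f} TF, sustained climate~{sustained:.1f} TF")
```

Under these assumed fractions, two machines with identical peak and similar LINPACK numbers deliver an order-of-magnitude difference to the application, which is the gap the slide is pointing at.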
Invest in Critical Software with Integrated Science and Computer Science Development Teams: SciDAC is a Good Start Towards Scientific Computing Software • Scientific Applications: Climate Simulation; Computational Chemistry; Fusion (5 topics); High Energy Nuclear Physics (5 topics) • Collaboratories: four projects • Middleware & Network Research: six projects • Computer Science: Scalable Systems Software; Common Component Architecture; Performance Science and Engineering; Scientific Data Management • Applied Mathematics: PDE Linear/Nonlinear Solvers and Libraries; Structured Grids/AMR; Unstructured Grids (Dave Bader, SciDAC PI Meeting, Jan. 15, 2002, Washington, DC)
Deploy Scientific Computing Hardware Infrastructure to Support “Large-Scale Computation” • Provide the most effective and efficient computing resources for a set of scientific applications • Serve as focal point for the scientific research community as it adapts to new computing technologies • Provide organizational framework needed for multidisciplinary activities • Addressing software challenges requires strong, long-term collaborations among disciplinary computational scientists, computer scientists, and applied mathematicians • Provide organizational framework needed for development of community codes • Implementing many scientific codes requires a wide range of disciplinary expertise • Organizational needs will continue to grow as computers advance to the petaflops scale (Dave Bader, SciDAC PI Meeting, Jan. 15, 2002, Washington, DC)
Earth Simulator has Widened Gap with DOE Scientific Computing Hardware Infrastructure [Three charts of simulation years/day comparing the Earth Simulator with SEABORG and CHEETAH, with POWER4 H+ (40 TFlops) and Power5 (50 TFlops), and with POWER4 H+ (3×40 TFlops) and Power5 (3×50 TFlops)] • Top left: comparison between ES and SC resources, highlighting the widening gap between SC capabilities and others • Top right: comparison between ES and US resources of comparable peak performance, highlighting the architectural difference and the need for a new initiative to close the gap • Right: comparison between ES and US resources of comparable cost
Possible U.S. Response in the Near Term for Increased Computing Capacity • Earth Simulator: 40 TFlops peak; 5,120 vector processors; 8 GFlops per processor; 8 processors per node; $500M procurement; $50M/yr maintenance; limited software investment to date; significant ancillary impact on biology, nanoscience, astrophysics, HENP, and fusion • US Alternative: 40 TFlops peak; 5,120 Power5 processors; 8 GFlops per processor; 64 processors per node; $100M procurement; $10M/yr maintenance; SciDAC investment in computational science and related ISICs; significant ancillary impact on biology, nanoscience, astrophysics, HENP, and fusion
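A quick arithmetic check on the slide’s own figures, sketched below: both configurations reach roughly 40 TFlops peak (5,120 processors at 8 GFlops each), while the procurement cost per peak GFlops differs by about a factor of five.

```python
# Sanity check using only the numbers quoted on this slide.
systems = {
    "Earth Simulator": {"procs": 5120, "gflops_per_proc": 8, "procurement_usd": 500e6},
    "US Alternative":  {"procs": 5120, "gflops_per_proc": 8, "procurement_usd": 100e6},
}

for name, s in systems.items():
    peak_gflops = s["procs"] * s["gflops_per_proc"]       # 40,960 GFlops ~ 40 TFlops peak
    cost_per_gflops = s["procurement_usd"] / peak_gflops  # procurement dollars per peak GFlops
    print(f"{name}: peak ~ {peak_gflops / 1000:.1f} TFlops, "
          f"~ ${cost_per_gflops:,.0f} per peak GFlops")
```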
Best Performance of High Resolution Atmospheric Model [Chart: achieved GFlops versus inter-node bandwidth (Mb/s) for the Earth Simulator (2560), AlphaES45 (2048), AlphaES40 (256), SP3 WHII (512), and T3E (512)]
Develop New Initiative to Support Advanced Architecture: BlueGene Offers Possible Option [Chart: dollars per GFlops versus year (1996-2005) for COTS/Beowulf clusters, T3E, JPL, the ASCI Blue, White, and Compaq machines, QCDSP and QCDOC (Columbia), and the Columbia/IBM Blue Gene/L]
BlueGene Architecture is a (More) General Purpose Machine that Builds on QCDOC • QCDSP (600 GFlops, based on the Texas Instruments DSP C31) • Gordon Bell Prize for Most Cost Effective Supercomputer in '98 • Designed and built by Columbia University • Optimized for Quantum Chromodynamics (QCD) • 12,000 50 MFlops processors • Commodity 2 MB DRAM • QCDOC (20 TFlops, based on IBM System-on-a-Chip) • Collaboration between Columbia University and IBM Research • Optimized for QCD • IBM 7SF technology (ASIC foundry technology) • 20,000 1 GFlops processors (nominal) • 4 MB embedded DRAM + external commodity DDR/SDR SDRAM • BlueGene L/D (180 TFlops, based on IBM System-on-a-Chip) • Designed by IBM Research in IBM CMOS 8SF technology • 64,000 2.8 GFlops processors (nominal) • 4 MB embedded DRAM + external commodity DDR SDRAM
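The quoted aggregate figures follow directly from processor count times per-processor rate; a minimal check using only the numbers on this slide is sketched below.

```python
# Aggregate peak = processor count x per-processor rate (numbers from this slide).
machines = [
    ("QCDSP",        12_000, 0.05),  # 50 MFlops = 0.05 GFlops per processor
    ("QCDOC",        20_000, 1.0),   # 1 GFlops per processor (nominal)
    ("BlueGene L/D", 64_000, 2.8),   # 2.8 GFlops per processor (nominal)
]

for name, procs, gflops in machines:
    total_tflops = procs * gflops / 1000
    print(f"{name}: {procs:,} x {gflops} GFlops ~ {total_tflops:.1f} TFlops")
# QCDSP ~ 0.6 TFlops (600 GFlops), QCDOC = 20 TFlops, BlueGene L/D ~ 179 TFlops (~180 TFlops)
```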
System Organization (Conceptual) [Diagram: top view of the system with host computer, 50 feet scale] • Host System: diagnostics, booting, archive; application-dependent requirements • File Server Array: ~500 RAID PC servers; Gb Ethernet and/or Infiniband; application-dependent requirements • BlueGene/L Processing Nodes: 81,920 nodes in two major partitions: 65,536 nodes as the production platform (256 TFlops peak) and 16,384 nodes partitioned into code development platforms
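A minimal sketch of the conceptual partitioning described above, using only the node counts quoted on the slide; the partition labels are illustrative, not official designations.

```python
# Node counts quoted on the slide; partition labels are illustrative.
TOTAL_NODES = 81_920

partitions = {
    "production platform (256 TFlops peak)": 65_536,
    "code development platforms":            16_384,
}

# The two partitions account for the whole machine: 65,536 + 16,384 = 81,920.
assert sum(partitions.values()) == TOTAL_NODES

for label, nodes in partitions.items():
    print(f"{label}: {nodes:,} nodes ({nodes / TOTAL_NODES:.0%} of the machine)")
```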
Summary • Continue investment in critical software with integrated science and computer science development teams • Deploy scientific computing hardware infrastructure in support of “large scale computation” • Develop new initiative to support advanced architecture research • Develop a bold new facilities strategy for OSC computing • Increase OSC computing budget to support the outlined strategy. Without sustained commitment to scientific computing, key computing and computational sciences capabilities, including personnel, will erode beyond recovery.