Future Requirements for NSF ATM Computing Jim Kinter, co-chair Presentation to CCSM Advisory Board 9 January 2008
Charge to Committee
• Committee: Formed as a sub-committee of AC-GEO to advise ATM on the future of computing; members selected by ATM from the atmospheric sciences research community and the supercomputing community.
• Background: Recognizing that advances in technology and the recent creation of the NSF Office of Cyberinfrastructure have opened up a wide range of opportunities for providing the computational services needed by atmospheric research, ATM wishes to survey the possible approaches. ATM seeks input from the atmospheric research community on how best to meet future needs, including how to minimize any potential disruption to individual research programs.
• Charge: The panel is charged with advising ATM on the merits of different possible strategies for ensuring access to adequate high-end computing services for the atmospheric sciences community over the next decade. In particular, the panel is asked to:
1. Review relevant materials describing the anticipated computational requirements of the atmospheric science research community;
2. Develop a list of different possible strategies for meeting the atmospheric sciences’ computing needs over the period 2011-2016;
3. Provide an analysis of the merits of the various strategies developed under (2), including a discussion of the costs and benefits;
4. Provide a recommendation to ATM about the best strategy to pursue.
• Report: Preliminary advice in early February; report to AC-GEO in April 2008.
40 Years of Supercomputing
THEN
• Top speed: 10⁴ FLOPS
• In 18 months, my computer will have 2X transistors (Moore’s Law) and be 2X faster
• Cold War drives computing industry
• Needed big building with lots of cheap power, cooling
• Apps: NWP, missile trajectories
• ATM computing done at NCAR (FFRDC)
NOW
• Top speed: 10¹³ FLOPS (~10⁹X)
• In 18 months, my chips will have 2X transistors (Moore’s Law)
• Entertainment and finance sectors drive computing industry
• Need big building with lots of expensive power, cooling
• Apps: NWP, climate modeling, bomb design, circuit design, aerospace design, molecular dynamics, solar interior, human blood flow, earthquake modeling, N-body problem, QCD, …
• AR4: CCSM development at NCAR-CSL, production elsewhere (DOE, Earth Simulator)
In between … the client-server model and commodity clusters significantly reduced power/cooling requirements, BUT … in the near future: power cost = system cost
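As a rough arithmetic check (not from the slides), the ~10⁹X speedup over 40 years is more than transistor doubling alone can explain. Doubling every 18 months for 40 years gives only about 10⁸X; the remaining factor of ~10 is plausibly attributable to parallelism (more processors per machine):

```python
# Sanity check on the slide's "10^4 -> 10^13 FLOPS in 40 years" claim,
# assuming one doubling every 18 months (Moore's Law, as stated on the slide).
doublings = 40 * 12 / 18        # number of 18-month periods in 40 years (~26.7)
clock_growth = 2 ** doublings   # growth from doubling alone (~1.1e8)

print(f"{doublings:.1f} doublings -> {clock_growth:.2e}x from Moore's Law alone")
# The observed ~1e9x therefore implies roughly another order of magnitude
# from increased parallelism, not faster individual processors.
```

This is only a back-of-envelope decomposition; the slide itself does not break the speedup down this way.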
CSL Resources for CCSM
• Process
• CCSM Production gets special status
• All other requests reviewed for computational appropriateness, readiness, criticality, and relevance to climate simulation and their own scientific goals
• Overall “correction” if merit-based allocations are above/below available resources
• CCSM Production
• 6.5M CPU-hrs over Dec ’07 - May ’09 (~750 CPU-yrs)
• Issues:
• Flat sub-allocation of resources to CCSM working groups - why no priorities?
• Insufficient interaction among WGs to coordinate numerical experiments
• CCSM Development
• 3.1M CPU-hrs over Dec ’07 - May ’09 (~350 CPU-yrs)
• Issues:
• Too little effort to move toward petascale computing
• Worries about sub-critical human resources for algorithms, HEC, etc.
• Same concerns as expressed for the Production request
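The CPU-hour and CPU-year figures on this slide are consistent under the usual conversion of 8,760 hours per year (24 × 365, an assumption, since the slide does not state its conversion):

```python
# Check the slide's CPU-hr -> CPU-yr conversions, assuming 8,760 hrs/yr.
HOURS_PER_YEAR = 24 * 365                      # 8760

production = 6.5e6 / HOURS_PER_YEAR            # ~742, matching "~750 CPU-yrs"
development = 3.1e6 / HOURS_PER_YEAR           # ~354, matching "~350 CPU-yrs"

print(f"Production:  {production:.0f} CPU-yrs")
print(f"Development: {development:.0f} CPU-yrs")
```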
AR5 Production Elsewhere …
• DOE - NERSC and ORNL (~40K CPU-yrs/yr)
• NASA - Columbia (~10K CPU-yrs/yr)
• NOAA - GFDL???
• International - Earth Simulator? (ES-2???)
• Industry - IBM? Cray?? SGI???
• NSF - TeraGrid
• 12K CPU-yrs/yr in 2007
• 80K CPU-yrs/yr in 2008
• 150K CPU-yrs/yr in 2009
• 250K CPU-yrs/yr in 2010
• 500K CPU-yrs/yr in 2011
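The projected TeraGrid ramp from 12K to 500K CPU-yrs/yr between 2007 and 2011 implies a compound growth rate of roughly 2.5X per year (my calculation, not a figure from the slide):

```python
# Implied compound annual growth of the projected TeraGrid capacity
# (12K CPU-yrs/yr in 2007 -> 500K CPU-yrs/yr in 2011, per the slide).
start, end = 12e3, 500e3
years = 2011 - 2007
growth = (end / start) ** (1 / years)   # compound annual growth factor

print(f"~{growth:.2f}x per year")       # roughly 2.5x per year
```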
TeraGrid FY2007 Usage
[Pie chart of usage by NSF directorate: MPS 49% (incl. ATM); remaining share split among GEO, BIO, ENG, CIS, AST, DMR, DMS, CHE, PHY, Industry, and Other]
TeraGrid HPC Usage by Site, FY2006.5 (4/01/06 through 3/31/07)
• Usage shares: NCSA 37%, SDSC 23%, PSC 20%, TACC 15%, Purdue 2%, Indiana 2%, ANL 1%, ORNL 0%
• FY2006.5 total: ~110M CPU-hrs, or ~12.5K CPU-yrs
• Systems: Dell Intel64 linux (9600), Dell PowerEdge linux (5840), IBM e1350 (3072), Cray XT3 (4136), IBM Blue Gene (6144 + 2048), IBM Power 4+ (2176), …
• Planned additions: +69K CPU-yrs in 2008 (Sun Opteron 4-core); +80K CPU-yrs in 2009 (Cray Opteron 8-core); +400K CPU-yrs in 2011 (IBM Power7 8-core)
• NCSA: 24 ATM-related projects, 13 universities, 42 CPU-years
• NCAR: 400 ATM-related projects, 100 universities, 1.4K CPU-years