Review of NCAR
Al Kellie, SCD Director
November 01, 2001
Outline of Presentation
• Introduction to UCAR / NCAR / SCD
• Overview of divisional activities
• Research data sets (Worley)
• Mass Storage System (Harano)
• Extracting model performance (Hammond)
• Visualization & Earth System Grid (Middleton)
• Computing RFP (ARCS)
Outline of Presentation
• Introduction
• Overview of three divisional aspects
• Computing RFP (ARCS)
University Corporation for Atmospheric Research (UCAR)
Member Institutions and Board of Trustees
President: Richard Anthes
• Corporate Affairs: Jack Fellows, VP
• Finance & Administration: Katy Schmoll, VP
• NCAR: Tim Killeen, Director
• UCAR Programs: Jack Fellows, Director
UCAR Programs:
• Information Infrastructure Technology & Applications (IITA), Richard Chinman
• Cooperative Program for Operational Meteorology, Education and Training (COMET), Timothy Spangler
• Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC), Bill Kuo
• Digital Library for Earth System Education (DLESE), Mary Marlino
• GPS Science and Technology Program (GST), Randolph Ware
• Unidata, David Fulker
• Visiting Scientists Programs (VSP), Meg Austin
• Joint Office for Science Support (JOSS), Karyn Sawyer
NCAR divisions and programs:
• Atmospheric Chemistry Division (ACD), Daniel McKenna
• Atmospheric Technology Division (ATD), David Carlson
• Advanced Study Program (ASP), Al Cooper
• Climate & Global Dynamics Division (CGD), Maurice Blackmon
• Environmental & Societal Impacts Group (ESIG), Robert Harriss
• High Altitude Observatory (HAO), Michael Knölker
• Mesoscale & Microscale Meteorological Division (MMM), Robert Gall
• Research Applications Program (RAP), Brant Foote
• Scientific Computing Division (SCD), Al Kellie
(Organization chart dated 12/07/98.)
NCAR Organization
UCAR Board of Trustees, UCAR (Rick Anthes), NCAR (Tim Killeen)
NCAR Director's Office: Associate Director Steve Dickson; ISS, K. Kelly; B&P, R. Brasher
Divisions and programs:
• Atmospheric Chemistry (ACD), Dan McKenna
• Atmospheric Technology (ATD), Dave Carlson
• Advanced Study Program (ASP), Al Cooper
• Climate & Global Dynamics (CGD), Maurice Blackmon
• Environmental & Societal Impacts Group (ESIG), Bob Harriss
• High Altitude Observatory (HAO), Michael Knölker
• Mesoscale & Microscale Meteorology (MMM), Bob Gall
• Research Applications (RAP), Brant Foote
• Scientific Computing (SCD), Al Kellie
NCAR at a Glance
• 41 years old; 850 staff, including 135 scientists
• $128M budget for FY2001
• 9 divisions and programs
• Research tools, facilities, and visitor programs for the NSF and university communities
Where did SCD come from?
The 1959 "Blue Book": "There are four compelling reasons for establishing a National Institute for Atmospheric Research."
Reason 2: "The requirement for facilities and technological assistance beyond those that can properly be made available at individual universities."
SCD Mission
Enable the best atmospheric and related research, no matter where the investigator is located, through the provision of high-performance computing technologies and related services.
SCIENTIFIC COMPUTING DIVISION
Director's Office: Al Kellie, Director (12)
• Data Support, Roy Jenne (9): data archives, data catalogs, user assistance
• Computational Science, Steve Hammond (8): algorithmic software development, model performance research, science collaboration, frameworks, standards & benchmarking
• High Performance Systems, Gene Harano (13): supercomputer systems, mass storage systems
• User Support Section, Ginger Caldwell (21): training/outreach/consulting, digital information, allocations & account management, database applications, site licenses
• Operations and Infrastructure Support, Aaron Andersen (18): distributed servers & workstations, operations room, facility management & reporting
• Network Engineering & Telecommunications, Marla Meehl (25): LAN, MAN, WAN, dial-up access, network infrastructure
• Visualization & Enabling Technologies, Don Middleton (12): data access, data analysis, visualization
Funding: Base $24,874; UCAR $4,027; Outside $2,020; Overhead $1,063
Computing Services for Research
• SCD operates two distinct computational facilities: climate simulations and the university community.
• Governance of these SCD resources is in the hands of the users, through two external allocation committees.
• Computing leverages a common infrastructure for access, networking, data storage & analysis, research data sets, and support services, including software development and consulting.
Climate Simulation Laboratory (CSL)
• The CSL is a national, multi-agency, special-use computing facility for climate system modeling in support of the U.S. Global Change Research Program (USGCRP).
• It serves priority projects that require very large amounts of computer time.
• CSL resources are available to individual U.S. researchers, with a preference for research teams, regardless of sponsorship.
• An inter-agency panel selects the projects that use the CSL.
Community Facility
• The Community Facility is used primarily by university-based NSF grantees and NCAR scientists.
• Community resources are allocated evenly between NCAR and the university community.
• NCAR resources are allocated by the NCAR Director to the various NCAR divisions.
• University resources are allocated by the SCD Advisory Panel.
• Open to areas of atmospheric and related sciences.
History of Supercomputing at NCAR
[Timeline chart, 1960-2001, of production and non-production machines: from the CDC 3600, CDC 6600, and CDC 7600, through the Cray 1-A (S/N 3 and 14), Cray X-MP/4, Cray Y-MP systems, TMC CM2/8192 and CM5/32, IBM SP1/8 and RS/6000 cluster, CCC Cray 3/4, Cray C90/16, Cray T3D/64 and /128, Cray J90/J90se systems, HP SPP-2000/64, SGI Origin2000/128, Beowulf/16, and Compaq ES40/36 cluster, to the IBM SP systems (SP/32, SP/64, SP/296, SP/604, SP/1308) currently in production.]
[Chart: STK 9940 tape drives (#4, #5), 2001]
NCAR Wide Area Connectivity
• OC3 (155 Mbps) to the Front Range GigaPoP; OC12 (622 Mbps) on 1/1/2002
• OC3 to AT&T commodity Internet
• OC3 to C&W commodity Internet
• OC3 to Abilene (OC12 on 1/1/2002)
• OC3 to the vBNS+
• OC12 (622 Mbps) to the University of Colorado at Boulder: intra-site research and back-up link to the FRGP
• OC12 to NOAA/NIST in Boulder: intra-site research and UUNET commodity Internet
• Dark-fiber metropolitan area network at GigE (1000 Mbps) to other NCAR campus sites
TeraGrid Wide Area Network
[Map of the TeraGrid/DTF backbone and I-WIRE network: StarLight international optical peering point (see www.startap.net) at Starlight/Northwestern University in Chicago; DTF backbone linking Los Angeles, San Diego, Denver, Urbana, and Indianapolis (Abilene NOC); I-WIRE dark fiber connecting UIC, Illinois Institute of Technology, ANL, the University of Chicago, and NCSA/UIUC; multiple carrier hubs, multiple 10 GbE links (Qwest and I-WIRE dark fiber), and OC-48 (2.5 Gb/s) to Abilene. Solid lines in place and/or available by October 2001; dashed I-WIRE lines planned for summer 2002.]
ARCS Synopsis (credit: Tom Engel)
ARCS RFP Overview: Best-Value Procurement
• Technical evaluation
• Delivery schedule
• Production disruption
• Allocation-ready state
• Infrastructure
• Maintenance
• Cost impact (i.e., existing equipment)
• Past performance of bidders
• Business proposal review
• Other considerations: invitation to partner
ARCS Procurement
• Production-level availability, robust batch capacity, operational sustainability, and support
• Integrated software engineering and development environment
• High-performance execution of existing applications
• Additionally: an environment conducive to the development of next-generation models
Workload profile context
• Jobs using > 32 nodes: 0.4% of workload; average 44 nodes (176 PEs)
• Jobs using < 32 nodes: 99.6% of workload; average 6 nodes (24 PEs)
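Both averages imply 4 processing elements per node (176 / 44 = 4 and 24 / 6 = 4), consistent with the 4-way IBM SP nodes then in production; the per-node figure is an inference from the averages on this slide, not a number stated in the original.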
ARCS – The Goal
• A production-level, high-performance computing system providing for both capability and capacity computing
• A stable and upwardly compatible system architecture, user environment, and software engineering & development environments
• Initial equipment: at least double the current capacity at NCAR
• Long term: achieve 1 TFLOPS sustained by 2005
ARCS – The Process
• SCD began drafting technical requirements Feb 2000
• RFP process (including scientific representatives from NCAR divisions, UCAR Contracts, and an external review panel) formally began Mar 2000; RFP released Nov 2000
• Offeror proposal reviews, BAFOs, and supplemental proposals Jan-May 2001
• Technical evaluations, performance projections, risk assessment, etc. Feb-Jun 2001
• SCD recommendation for negotiations 21 Jun; NCAR/UCAR acceptance of recommendation 25 Jun
• Negotiations 24-26 Jul; technical Ts&Cs completed 14 Aug
• Contract submitted to the NSF 01 Oct
• NSF approval 5 Oct
• Joint press release the week of SC01
ARCS RFP Technical Attributes
• Hardware (processors, nodes, memory, disk, interconnect, network, HIPPI)
• Software (OS, user environment, filesystems, batch subsystem)
• System administration, resource management, user limits, accounting, network/HIPPI, security
• Documentation & training
• System maintenance & support services
• Facilities (power, cooling, space)
Major Requirements
• Critical resource ratios:
  - Disk: 6 bytes per peak FLOP; 64+ MB/sec single-stream and 2+ GB/sec bandwidth, sustainable
  - Memory: 0.4 bytes per peak FLOP
• "Full-featured" product set (cluster-aware compilers, debuggers, performance tools, administrative tools, monitoring)
• Hardware & software stability
• Hardware & software vendor support & responsiveness (on-site, call center, development organization, escalation procedures)
• Resource allocation (processor(s), node(s), memory, disk; user limits & disk quotas)
• Batch subsystem and the NCAR job scheduler (BPS)
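As a worked illustration of these ratios (using a hypothetical 2 TFLOPS-peak configuration, not a figure taken from the RFP): disk = 6 bytes/peak-FLOP × 2×10^12 FLOPS = 12 TB, and memory = 0.4 bytes/peak-FLOP × 2×10^12 FLOPS = 0.8 TB.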
ARCS – Benchmarks (1)
• Kernels (Hammond, Harkness, Loft)
  - Single processor: COPY, IA, XPOSE, SHAL, RADABS, ELEFUNT, STREAMC
  - Multi-processor shared memory: PSTREAM
  - Message-passing performance: XPAIR, BISECT, XGLOB, COMMS[1,2,3], STRIDED[1,2], SYNCH, ALLGATHER
• Parallel shared-memory applications
  - CCM3.10.16 (T42 30-day & T170 1-day) – CGD, Rosinski
  - WRF prototype (b_wave 5-day) – MMM, Michalakes
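The single-processor memory kernels above (COPY, STREAMC) measure sustainable memory bandwidth. Below is a minimal sketch of such a copy kernel; it is an illustrative stand-in written for this review, not the benchmark source distributed with the RFP, and the array size and repeat count are arbitrary choices.

```c
/* Minimal sketch of a STREAM-style copy bandwidth kernel (illustrative only,
 * not the NCAR benchmark source; N and NTIMES are arbitrary). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (20 * 1000 * 1000)   /* elements; large enough to exceed cache */
#define NTIMES 10              /* repetitions to smooth out timer noise  */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    for (long i = 0; i < N; i++) a[i] = 1.0;

    clock_t t0 = clock();
    for (int k = 0; k < NTIMES; k++)
        for (long i = 0; i < N; i++)
            b[i] = a[i];                    /* the copy kernel itself */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* one read plus one write per element = 2 * sizeof(double) bytes moved */
    double mb = (double)NTIMES * N * 2 * sizeof(double) / 1.0e6;
    printf("copy bandwidth: %.1f MB/s\n", mb / secs);

    free(a);
    free(b);
    return 0;
}
```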
ARCS – Benchmarks (2)
• Parallel (MPI & hybrid) models
  - CCM3.10.16 (T42 30-day & T170 1-day) – CGD, Rosinski
  - MM5 3.3 (t3a 6-hr & "large" 1-hr) – MMM, Michalakes
  - POP 1.0 (medium & large) – CGD, Craig
  - MHD3D (medium & large) – HAO, Fox
  - MOZART2 (medium & large) – ACD, Walters
  - PCM 1.2 (T42) – CGD, Craig
  - WRF prototype (b_wave 5-day) – MMM, Michalakes
• System tests
  - HIPPI – SCD, Merrill
  - I/O-tester – SCD, Anderson
  - Network – SCD, Mitchell
• Batch workload (SCD, Engel) includes: 2 I/O-tester, 4 hybrid MM5 3.3 large, 2 hybrid MM5 3.3 t3a, 2 POP 1.0 (medium & large), CCM3.10.16 T170, MOZART2 medium, PCM 1.2 T42, 2 MHD3D (medium & large), WRF prototype
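The MPI models above, like the message-passing kernels on the previous slide (XPAIR, COMMS[1,2,3]), depend on the interconnect's point-to-point latency and bandwidth. Below is a minimal two-rank ping-pong sketch in that spirit; it is an illustrative assumption, not the RFP benchmark code, and the message size and repetition count are arbitrary.

```c
/* Minimal sketch of a two-rank MPI ping-pong test (illustrative only,
 * not the NCAR benchmark source; message size and reps are arbitrary). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int nbytes = 1 << 20;          /* 1 MB messages */
    const int reps = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {                 /* send, then wait for the echo */
            MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {          /* echo the message back */
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        /* each rep moves the message there and back */
        double mb = 2.0 * reps * nbytes / 1.0e6;
        printf("round-trip bandwidth: %.1f MB/s, per-message time: %.1f usec\n",
               mb / elapsed, elapsed / (2.0 * reps) * 1.0e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```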
Risks
• Vendor ability to meet commitments
  - Hardware (processor architecture, clock speed boosts, memory architecture)
  - Software (OS, filesystems, processor-aware compilers/libraries, 3rd-party tools)
  - Service, support, responsiveness
• Vendor stability (product set, financial)
• Vendor promises vs. reality
Past Performance
• Hardware & software
  - SCD/NCAR experience
  - Other customers' experience
• "Missed promises"
  - Vendor X: ~2-yr slip, product line changes
  - Vendor Y: on target
  - Vendor Z: ~1.5-yr slip, product line changes
Other Considerations
• "Blue Light" project: invitation to develop models for an exploratory supercomputer
• Invitation to a partnership development; offer of an industrial partnership
• 256 TFLOPS peak, 8 TB memory, 200 TB disk on 64k nodes; true MPP with torus interconnect
• Node: 64 GFLOPS, 128 MB memory, 32 kB L1 cache, 4 MB L2 cache
• Columbia, LLNL, SDSC, Oak Ridge
ARCS Award
• IBM was chosen to supply the NCAR Advanced Research Computing System (ARCS), which will exceed the articulated purpose and goals
• A world-class system to provide reliable production supercomputing to the NCAR Community and the Climate Simulation Laboratory
• A phased introduction of new, state-of-the-art computational, storage, and communications technologies through the life of the contract (3-5 years)
• First equipment delivered Friday, 5 October
ARCS Capacities (minimum)
+ Negotiated capability commitments may require installation of additional capacity.
ARCS Commitments
• Minimum model capability commitments:
  - blackforest upgrade: 1.0x (defines 'x')
  - bluesky: 3.1x
  - bluesky upgrade: 4.6x
  Failure to meet these commitments will result in IBM installing additional computational capacity
• Improved user-environment functionality, support, and problem-resolution response
• Early access to new hardware & software technologies
• NCAR's participation in IBM's "Blue Light" exploratory supercomputer project (PFLOPS)
Proposed Equipment - IBM
† Federation switch (2400 MB/s, 4 µsec) option in 2H03