NERSC and Blue Planet
William T.C. Kramer, NERSC/LBNL
May 28, 2003
NERSC User Group Meeting, Chicago IL
What is Blue Planet
• “Blue Planet” is a “science driven” design process for developing systems that are simultaneously more effective for science and sustainable and cost-effective for vendors.
  • The white paper uses IBM as an example of what can be done with this process
  • The process can be applied to a number of vendors
• Blue Planet is a new concept for a sustainable computer architecture that is more effective for science and engineering applications
  • A specific implementation leveraging the IBM roadmap that better balances scientific processing needs with commercial viability
  • Described as an “ultrascale” system on the order of the Earth Simulator
http://www.nersc.gov/news/blueplanet.html and http://www.nersc.gov/news/ArchDevProposal.5.01.pdf
Signposts of Change in HPC
• In early 2002 there were several signposts that signaled a fundamental change in HPC in the US. For NERSC they were:
  • Poor benchmark results for the NERSC workload in our latest procurement (March 2002)
  • Impressive early performance results from the Earth Simulator system (April 2002)
  • Increasing indications that commodity clusters will not be as easy, scalable, or cost-effective as first projected
  • Lack of progress in computer architecture research, evident at the Petaflops Workshop (WIMPS, February 2002)
• Detailed evaluation of current and future processors and systems showed that:
  • System designers did not truly understand current and future scientific applications
  • Design target codes do not reflect current and future methods
The Conclusion
• The community has pursued COTS systems to their logical extreme
  • The commodity building block was the microprocessor, but is now the entire server (SMP)
  • Communications and memory bandwidth are not scaling with peak processor power
  • Near-football-field-size computers consume megawatts of electricity
• DOE Office of Science is the natural leader to address this and is making it a priority
• This is happening against a backdrop of:
  • Decreasing interest in HPC by some U.S. vendors
  • Further consolidation/reduction in the number of vendors
  • Reduced profitability and reduced technology investments
  • New ways to define commodity
So, we are in the middle of a fundamental change in the basic premises of the HPC market in the U.S.
The Divergence Problem
• The requirements of high performance computing for science and engineering and the requirements of the commercial market are diverging.
• The commercial clusters-of-COTS-SMPs approach is no longer sufficient to provide the highest level of performance:
  • Lack of memory bandwidth (see the measurement sketch below)
  • High interconnect latency
  • Lack of interconnect bandwidth
  • Lack of high performance parallel I/O
  • High cost of ownership for large-scale systems
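The memory-bandwidth shortfall is easy to see for yourself with a STREAM-style triad kernel, which measures how fast the memory system can actually feed the processor. This is a minimal sketch, not a tuned benchmark; the array size and repetition count are arbitrary choices of ours, picked only to exceed cache capacity.

```c
/* STREAM-style triad microbenchmark: a rough sketch of the
 * memory-bandwidth measurement the slide alludes to.
 * Array size and repetition count are illustrative, not tuned. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)   /* 16M doubles per array, ~128 MB each */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPS; r++)
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];   /* triad: 2 loads + 1 store */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = (double)REPS * N * 3 * sizeof(double);
    printf("triad bandwidth: %.2f GB/s (check %f)\n",
           bytes / sec / 1e9, a[N / 2]);   /* print a value so the loop
                                              cannot be optimized away */
    free(a); free(b); free(c);
    return 0;
}
```

On a COTS SMP, the bandwidth this reports is typically a small fraction of what the peak FLOP rate could consume, which is the divergence the slide describes.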
The Divergence Problem
• In response, NERSC, ANL, and IBM developed a Science-Driven Computer Architecture proposal.
  • Includes a new architecture co-defined with IBM called Blue Planet
  • "Creating Science-Driven Computer Architecture: A New Path to Scientific Leadership" - http://www.nersc.gov/news/ArchDevProposal.5.01.pdf
  • The process is being expanded to other vendors
Overall Goals
• Restore American leadership in “capability” scientific computing by 2005
• Define a sustainable path for efficient scientific computing
  • Focus on achieving high sustained performance rather than peak
• The first step in a long-term strategy
  • Petaflop peak by the end of the decade, with 40% sustained
• An initial system with 2x the sustained performance of the ES at 50% of the cost
  • On at least a modest number of strategic large applications
  • Sustained performance of 30-40% on key benchmarks
  • Needs 4x the peak performance of the current ES (assuming Moore's Law performance scaling)
  • 160 TF peak performance
• Phased delivery plan, with the final system available in 2H05/1Q06
  • Assumes a significant funding profile can be developed
• Low risk (build off the existing roadmap to the extent possible)
• The full strategy has multiple proposed solutions
  • ANL with Blue Gene/L and ORNL with the Cray X1
• Proposed as a cooperative development effort between NERSC and IBM
Approach
• Study applications critical to DOE Office of Science and others. For example:
  • Materials science
  • Combustion simulation and adaptive methods
  • Computational astrophysics
  • Nanoscience (new drugs as well as new microchip technologies)
  • Biochemistry and biosciences (protein folding/interactions)
  • Climate modeling
  • High energy physics (particle accelerators and astrophysics)
  • Multigrid eigensolvers and linear algebra methods
• Identify the key bottlenecks found in these critical applications
• Outline a high-level approach to address the challenges
• Hold follow-up meetings for detailed drill-down by IBM computer scientists and application scientists at NERSC
• Iterate on the proposed solution
Other Ideas for Consideration
• Finely tuned libraries for FFT, FMA, and matrix ops (ESSL, PESSL)
• Hardware acceleration engines for performance-critical ops, especially when software tuning is inhibited by other constraints
• “MPI Lite” (avoid some of the performance-inhibiting semantics that are seldom used)
  • Trade-offs would be ordering rules and possibly repeatability of results (see the sketch below)
• Hardware acceleration engines for MPI (in processor and adapter)
• Unified Parallel C programming model and other new languages
• Microkernel OS on compute nodes
• Other OS enhancements for HPC
  • E.g., no paging of well-behaved applications
  • Better hooks for daemon control
• Advanced cooling to address floor space
  • New CPUs' clock rates are limited by packaging and cooling, not by chip technology
• Compiler technologies for ViVA, VMX, etc.
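The "repeatability of results" trade-off mentioned above stems from floating-point addition not being associative: if a lighter-weight MPI were free to reorder a reduction, the bitwise result could change from run to run. A minimal standard-MPI sketch of the effect (the contribution values are contrived by us to make the ordering dependence visible):

```c
/* Why relaxed reduction ordering can cost bitwise repeatability:
 * floating-point addition is not associative, so the sum depends
 * on the reduction-tree order the runtime happens to choose. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Contributions of wildly different magnitude: adding the small
     * values before or after the large ones changes the result. */
    double x = (rank % 2) ? 1.0e16 : 1.0;

    double sum;
    MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("allreduce sum = %.17g (may vary with reduction order)\n", sum);

    MPI_Finalize();
    return 0;
}
```

Standard MPI already permits implementations to vary this order across runs and process counts; an "MPI Lite" that drops ordering guarantees elsewhere would simply widen the same class of non-determinism in exchange for lower overhead.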
New Class of Computer Architectures for Science
• Sustained, cooperative development of new computer architectures
  • Engaging the scientists with the developers
• A focus on sustained performance of scientific applications, not on peak performance
• Addressing the key bottlenecks of bandwidth and latency for memory and processor interconnection
• A new investment in the computer science research and scientific research communities
• Cost matters
  • If effective scientific supercomputing is available only at high cost, it will have impact on only a small part of the scientific community.
  • So we need to leverage the resources of mainstream IT companies like IBM, HP, and Intel.
  • And our national science policy should motivate them to participate durably.
Full IBM Blue Planet System Components
• New IH++ wide node: 8 CPUs per node
  • POWER5 GS processor, 2.5 GHz
  • Single-core MCM
• 2048-node system (8 nodes per frame)
  • 16K processors @ 10 GF per CPU = 160 TF peak (see the arithmetic check below)
• Virtual Vector Architecture (ViVA)
• Federation switch, 3-stage topology
  • 8 GB/s unidirectional communication bandwidth per server
• 40-50 TF sustained on 2-3 selected applications
• 256 TB of memory = 16 GB per CPU
  • May be reduced to 128 TB of memory if it can sustain full memory bandwidth
• 2.5 PB of disk in the I/O system (approximately 48 I/O nodes)
• Approximately 600 frames
  • 256 compute racks, 250 disk racks, 160 switch racks
  • 12,000-15,000 sq ft; 5-7 MW of power
• Scientists will focus on application optimization
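As a back-of-envelope sanity check, the headline figures are internally consistent (assuming exactly 2048 nodes of 8 CPUs, with the slides' rounding of 163.84 TF to 160 TF):

```latex
\begin{align*}
\text{CPUs} &= 2048~\text{nodes} \times 8~\text{CPUs/node} = 16{,}384 \\
\text{Peak} &= 16{,}384 \times 10~\text{GF} = 163.84~\text{TF} \approx 160~\text{TF} \\
\text{Memory} &= 16{,}384 \times 16~\text{GB} = 256~\text{TB} \\
\text{Sustained target} &= 40\text{--}50~\text{TF} \approx 25\text{--}30\%~\text{of peak}
\end{align*}
```

Note that the 40-50 TF target on selected applications sits just below the 30-40% sustained-of-peak goal on key benchmarks quoted earlier, which is the intended contrast with percent-of-peak figures typical of COTS clusters.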
Blue Planet: A Conceptual View
• Increasing memory bandwidth: single core
  • 8 single-core CPUs are matched to the memory address bus limits for full memory bandwidth
• Increasing switch bandwidth: 8-way nodes
  • Decreased switch latency while increasing span
• Enabling a vector programming model inside each SMP node
• Sustained performance on science applications, at a sustainable cost and development model
Issues Under Study
• Issues covered:
  • Memory bandwidth
    • Especially for small scattered accesses
  • MPI communication latency
    • Protocol path-length overhead hurts performance (see the ping-pong sketch below)
    • Example: UPC is very sensitive to this overhead
  • MPI collective communication performance
  • MPI I/O performance
  • MPI scaling to a large number of tasks
• New issues:
  • Microkernel option
  • ViVA follow-up with the compiler team
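MPI point-to-point latency of the kind under study here is conventionally measured with a ping-pong loop: half the round-trip time for a tiny message approximates the one-way latency, software protocol path included. A minimal sketch (message size and iteration count are our arbitrary choices; run with at least two ranks):

```c
/* Ping-pong latency microbenchmark between ranks 0 and 1.
 * For a 1-byte message, half the round-trip time is dominated by
 * the software protocol path the slide refers to, not the wire. */
#include <stdio.h>
#include <mpi.h>

#define REPS 1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}
```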
Progress Already
• Additional changes to the POWER5 CPU
• Scaling the switch larger than first planned
• Software performance changes
• Close to committing to a smaller, more memory-intensive node
Progress and Status
• LLNL/ASCI has become very interested in Blue Planet
• ~8 meetings with IBM and LLNL to develop the ideas and narrow down design choices:
  • CPU design
  • Node design
  • Switch/interconnect
  • Software (system level, libraries)
  • Special devices
  • Performance modeling
Node Discussion
• Large SMP vs. small SMP
  • Impacts switch scaling and hence cost
• Partitioned vs. not partitioned
  • Impacts memory latencies and switch scaling
• MCM-based vs. DCM-based nodes
  • Impacts cost and performance
• Memory latency
  • Performance sensitivity (see the pointer-chase sketch below)
• Memory bandwidth
  • STREAM performance sensitivity
• Cost
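Memory-latency sensitivity of the kind weighed in these node choices is usually quantified with a pointer chase: each load depends on the previous one, so no latency can be overlapped or hidden. A rough sketch, with an array size we chose only to defeat caches and a random cycle to defeat prefetching:

```c
/* Pointer-chase latency probe: serialized dependent loads, so the
 * time per step approximates average memory load latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 22)   /* 4M entries (~32 MB), well beyond cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Build a random single-cycle permutation (Sattolo's algorithm)
     * so the chase visits every entry in unpredictable order. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(12345);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i) */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < N; k++)
        p = next[p];                     /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                 (t1.tv_nsec - t0.tv_nsec)) / (double)N;
    printf("avg load latency: %.1f ns (check %zu)\n", ns, p);

    free(next);
    return 0;
}
```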
Interconnect Discussion
• Importance of bisection bandwidth
  • Collectives vs. point-to-point
  • Application classification
• Fat tree (multi-stage network) vs. 3D mesh designs
  • Is a 3D mesh or hypercube-based approach cost-effective? (See the scaling sketch below.)
  • Sensitivity to communication patterns
  • Need to model how communication patterns map to the corresponding bottlenecks
• What to prioritize: global bandwidth versus local bandwidth
• InfiniBand switch costs depend on the scale of the system
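The fat-tree vs. 3D-mesh question turns largely on how bisection bandwidth scales with system size. As a back-of-envelope comparison of our own (link bandwidth b, P nodes; mesh side n with P = n³):

```latex
\begin{align*}
\text{Full fat tree:} \quad & B_{\mathrm{bis}} = \tfrac{P}{2}\,b
  && \text{per-node share stays constant at } b/2 \\
\text{3D mesh:} \quad & B_{\mathrm{bis}} = n^{2}\,b = P^{2/3}\,b
  && \text{per-node share falls as } P^{-1/3}
\end{align*}
```

So a mesh can win on switch cost at a fixed size, but applications dominated by global or all-to-all traffic see their per-node bandwidth shrink as the machine grows, which is why the communication-pattern modeling above is on the agenda.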
Progress Already
• System software
  • Next on the list for consideration
  • Currently IBM is most interested in limiting full-blown system software
  • Studying microkernels and minimal OSs
• Modeling
  • IBM Research, NCAR, SDSC, PERC, NERSC, LLNL
  • Developing tools and methods
Other Progress
• Other sites are very interested
  • Over 30 sites have asked IBM for a briefing
  • A number have asked to participate in discussions
• Blue Planet is the basis for several other activities
  • Much of the basis for the recent HECRTF white paper
  • Continued discussions with DOE/SC
• Having in-depth discussions with SGI, Intel, Cray, and HP
  • SGI has some interesting plans based on DARPA HPCS
• Considering holding a workshop on Blue Planet
Summary
• Think of Blue Planet as a new process as well as a single instantiation of a computer architecture
• Waiting for vendors to produce a product, then evaluating and purchasing it, will only increase the divergence
  • Products are already designed to a different design point
• Ideas for new, sustainable architectures are sparse
• Commodity clustering has more than reached its limits of effectiveness
• We need a modified approach to improve this situation for capability science
Lou Gerstner's book title was “Who Says Elephants Can't Dance?” Blue Planet is a collaborative effort by some users and some parts of IBM, Cray, SGI, and other vendors to do some fancy dancing for scientific computing.