ALCF at Argonne
• Opened in 2006
• Operated by the Department of Energy’s Office of Science
• Located at Argonne National Laboratory (30 miles southwest of Chicago)
IBM Blue Gene/P, Intrepid (2007)
• 163,840 processors
• 80 terabytes of memory
• 557 teraflops
• Energy-efficient system uses one-third the electricity of machines built with conventional parts
• #38 on Top500 (June 2012)
• #15 on Graph500 (June 2012)

The groundbreaking Blue Gene
• General-purpose architecture excels in virtually all areas of computational science
• Presents an essentially standard Linux/PowerPC programming environment
• Significant impact on HPC – Blue Gene systems consistently appear in the top ten of the Top500 list
• Delivers excellent performance per watt
• High reliability and availability
IBM Blue Gene/Q, Mira
• 768,000 processors
• 768 terabytes of memory
• 10 petaflops
• #3 on Top500 (June 2012)
• #1 on Graph500 (June 2012)
• Blue Gene/Q Prototype 2 ranked #1, June 2011
Programs for Obtaining System Allocations
For more information, visit: http://www.alcf.anl.gov/collaborations/index.php
The U.S. Department of Energy’s INCITE Program
INCITE seeks out large, computationally intensive research projects and awards more than a billion processing hours to enable high-impact scientific advances.
• Open to researchers in academia, industry, and other organizations
• Proposed projects undergo scientific and computational readiness reviews
• More than a billion total hours are awarded to a small number of projects
• Sixty percent of the ALCF’s processing hours go to INCITE projects
• Call for proposals issued once per year
World-Changing Science Underway at the ALCF
• Research that will lead to improved, emissions-reducing catalytic systems for industry (Greeley)
• Enhancing public safety through more accurate earthquake forecasting (Jordan)
• Designing more efficient nuclear reactors that are less susceptible to dangerous, costly failures (Fischer)
• Accelerating research that may improve diagnosis and treatment for patients with blood-flow complications (Karniadakis)
• Protein studies that apply to a broad range of problems, such as finding a cure for Alzheimer’s disease, creating inhibitors of pandemic influenza, or engineering a step in the production of biofuels (Baker)
• Furthering research to bring green energy sources, like hydrogen fuel, safely into our everyday lives, reducing our dependence on foreign fuels (Khokhlov)
ALCF Service Offerings
• Scientific liaison (“Catalyst”) for INCITE and ALCC projects, providing collaboration along with assistance with proposals and planning
• Startup assistance and technical support
• Performance engineering and application tuning
• Data analysis and visualization experts
• MPI and MPI-I/O experts
• Workshops and seminars
A single node
• Can be carved up into multiple MPI ranks, or run as a single MPI rank with threads (see the sketch after this list)
• Up to 4 MPI ranks/node on Intrepid, up to 64 MPI ranks/node on Mira
• SIMD is available on the cores and is required to reach the peak flop rate: 2-way FPU on Intrepid, 4-way FPU on Mira
• Runs a Compute Node Kernel, which requires cross-compiling from the front-end login nodes
• Forwards I/O operations to an I/O node, which aggregates requests from multiple compute nodes
• No virtual memory: 2 GB/node on Intrepid, 16 GB/node on Mira
• No fork()/system() calls
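As a minimal sketch of the ranks-versus-threads choice above, the hybrid MPI + OpenMP program below has each rank report the threads it spawned; the ranks-per-node and threads-per-rank split is chosen at job launch, and the exact compiler wrapper and launch options are system-specific, so nothing here is ALCF-specific code.

/* Minimal hybrid MPI + OpenMP sketch: a node can hold several MPI ranks,
 * and each rank can spawn threads to use the remaining cores. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* FUNNELED is enough here: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        /* Each (rank, thread) pair maps onto one hardware thread/core. */
        printf("rank %d of %d, thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Because the compute nodes run the Compute Node Kernel, a program like this is cross-compiled on the front-end login nodes with the system’s MPI compiler wrappers rather than built directly on the compute nodes.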
A partition
• Partitions come in pre-defined sizes that give you isolation from other users
• Additionally, you get the I/O nodes connecting you to the GPFS filesystems – this requires a scalable I/O strategy (see the sketch after this list)
• Partitions can be as small as 512 nodes (16 on the development rack), up to the size of the full machine
• At the small scale, this is governed by the ratio of I/O nodes to compute nodes
• At the large scale, this is governed by the network links required to make a torus, rather than a mesh
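One common scalable-I/O pattern in this setting is to have every rank write its portion of a single shared file with collective MPI-I/O, so the library and the I/O nodes can aggregate the requests. The sketch below is a generic illustration under that assumption; the file name and block size are arbitrary.

/* Minimal collective MPI-I/O sketch: each rank writes one contiguous
 * block of a single shared file at an offset based on its rank. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK 1024  /* doubles per rank; arbitrary for illustration */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(BLOCK * sizeof(double));
    for (int i = 0; i < BLOCK; i++)
        buf[i] = (double)rank;  /* fill with something recognizable */

    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: all ranks participate, so the MPI-I/O layer
     * (and the I/O nodes behind it) can aggregate the requests. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

The key design point is one shared file with collective calls, rather than one file per rank, which does not scale once thousands of compute nodes funnel through a small number of I/O nodes.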
Blue Gene/P hierarchy: Intrepid
• Compute Card: 1 chip (supports 4-way SMP), 20 DRAMs – 13.6 GF/s, 2.0 GB DDR
• Node Card: 32 compute cards (32 chips, 4x4x2), 0-2 I/O cards – 435 GF/s, 64 GB
• Rack: 32 node cards (1024 chips, 4096 procs) – 14 TF/s, 2 TB
• System: 40 racks – 556 TF/s, 82 TB
• Front End Node / Service Node: System p servers running Linux SLES10
Visualization and Data Analytics
• Both systems come with a Linux cluster attached to the same GPFS filesystems and network infrastructure
• The GPUs on these machines can be used for straight visualization, or to perform data analysis
• Software includes VisIt, ParaView, and other viz toolkits
Programming Models and Development Environment
• Basically, all of the lessons from this week apply: MPI, pthreads, OpenMP, using any of C, C++, Fortran
• Also have access to Global Arrays and other lower-level communication protocols if that’s your thing
• Can use XL or GNU compilers, along with LLVM (beta)
• I/O using HDF, NetCDF, MPI-I/O, … (see the sketch after this list)
• Debugging and performance analysis with TAU, HPCToolkit, DDT, TotalView, …
• Many supported libraries, like BLAS, PETSc, ScaLAPACK, …
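As a small illustration of the I/O libraries listed above, the sketch below writes one dataset to a file with the serial HDF5 C API; the file and dataset names are arbitrary examples, and parallel HDF5 (layered over MPI-I/O) follows the same pattern with a few extra property-list calls.

/* Minimal serial HDF5 sketch: create a file and write one 1-D dataset
 * of doubles. File and dataset names are arbitrary examples. */
#include <hdf5.h>

int main(void)
{
    double data[100];
    for (int i = 0; i < 100; i++)
        data[i] = (double)i;

    hsize_t dims[1] = {100};

    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "values", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}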
How do you get involved?
• Send email to support@alcf.anl.gov requesting access to the CScADs project
• Or, go to https://accounts.alcf.anl.gov and request a project of your own