CSE 190 Honors Seminar in High Performance Computing, Spring 2000 Prof. Sid Karin skarin@ucsd.edu x45075
Definitions • History • SDSC/NPACI • Applications
Definitions of Supercomputers • The most powerful machines available. • Machines that cost about $25M in year-2000 dollars. • Machines sufficiently powerful to model physical processes with accurate laws of nature, realistic geometry, and large quantities of observational/experimental data.
Supercomputer Performance Metrics • Benchmarks • Applications • Kernels • Selected Algorithms • Theoretical Peak Speed • (Guaranteed not to exceed speed) • TOP 500 List
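The gap between theoretical peak speed ("guaranteed not to exceed") and sustained benchmark performance is the central caveat behind all of these metrics. A minimal sketch of the arithmetic in C follows; the machine parameters are made-up illustrations, not figures for any system in these slides.

/* peak.c - illustrative peak-vs-sustained calculation (example numbers only) */
#include <stdio.h>

int main(void)
{
    /* Hypothetical machine parameters, for illustration only */
    double cpus            = 1024;      /* number of processors             */
    double clock_hz        = 250e6;     /* clock rate in Hz                 */
    double flops_per_cycle = 2;         /* floating-point ops per cycle/CPU */

    /* Theoretical peak: every floating-point unit busy every cycle */
    double peak = cpus * clock_hz * flops_per_cycle;

    /* A sustained benchmark result, e.g. from Linpack (made-up number) */
    double sustained = 0.35e12;

    printf("Theoretical peak : %.3f Tflops\n", peak / 1e12);
    printf("Sustained        : %.3f Tflops\n", sustained / 1e12);
    printf("Efficiency       : %.1f %%\n", 100.0 * sustained / peak);
    return 0;
}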
Misleading Performance Specifications in the Supercomputer Field • David H. Bailey • RNR Technical Report RNR-92-005, December 1, 1992 • http://www.nas.nasa.gov/Pubs/TechReports/RNRreports/dbailey/RNR-92-005/RNR-92-005.html
Definitions • History • SDSC/NPACI • Applications
Applications • Cryptography • Nuclear Weapons Design • Weather / Climate • Scientific Simulation • Petroleum Exploration • Aerospace Design • Automotive Design • Pharmaceutical Design • Data Mining • Data Assimilation
Applications cont’d. • Processes too complex to instrument • Automotive crash testing • Air flow • Processes too fast to observe • Molecular interactions • Processes too small to observe • Molecular interactions • Processes too slow to observe • Astrophysics
Applications cont’d. • Performance • Price • Performance / Price
[Diagram: Theory, Experiment, and Simulation as modes of science, linked by numerically intensive computing and by data-intensive computing for mining and assimilation]
Supercomputer Architectures • Vector • Parallel Vector, Shared Memory • Parallel • Hypercubes • Meshes • Clusters • SIMD vs. MIMD • Shared vs. Distributed Memory • Cache Coherent Memory vs. Message Passing • Clusters of Shared Memory Parallel Systems
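To make the distributed-memory, message-passing model concrete, here is a minimal MPI ring example in C. It is only an illustrative sketch of the programming model, not code for any particular machine discussed in these slides.

/* ring.c - minimal MPI message-passing sketch: pass a token around a ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) printf("Run with at least 2 processes.\n");
        MPI_Finalize();
        return 0;
    }

    if (rank == 0) {
        token = 42;                          /* rank 0 starts the token ...     */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);         /* ... and gets it back at the end */
        printf("Token made it around %d ranks: %d\n", size, token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);         /* receive from the left neighbor  */
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0,
                 MPI_COMM_WORLD);            /* forward to the right neighbor   */
    }

    MPI_Finalize();
    return 0;
}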
The Cray-1 • A vector computer that worked • A balanced computing system • CPU • Memory • I/O • A photogenic computer
[Chart: number of machines vs. performance — in 1976 supercomputing was an "island" at the far end of the performance axis; today it is a continuum]
The Cray X-MP • Shared Memory • Parallel Vector • Followed by the Cray Y-MP, C-90, J-90, T90, …
The Cray-2 • Parallel Vector, Shared Memory • Very Large Memory (256 MW) • Actually 256K KW ≈ 262 MW (256 × 1024 = 262,144 kilowords) • One word = 8 bytes, so roughly 2 GB • Liquid Immersion cooling
Cray Companies • Control Data • Cray Research Inc. • Cray Computer Corporation • SRC Inc.
Thinking Machines • SIMD vs. MIMD • Evolution from CM-1 to CM-2 • ARPA Involvement
1st Teraflops System for US Academia: "Blue Horizon", Nov 1999 • 1 TFLOPS IBM SP • 144 8-processor compute nodes • 12 2-processor service nodes • 1,176 Power3 processors at 222 MHz • > 640 GB memory (4 GB/node), 10.6 GB/s bandwidth, upgrade to > 1 TB later • 6.8 TB switch-attached disk storage • Largest SP with 8-way nodes • High-performance access to HPSS • Trailblazer switch interconnect (currently ~115 MB/s bandwidth) with subsequent upgrade
UCSD Currently #10 on Dongarra's Top 500 List • Actual Linpack benchmark sustained 558 Gflops on 120 nodes • Projected Linpack benchmark is 650 Gflops on 144 nodes • Theoretical peak 1.023 Tflops
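These figures are internally consistent: each 222 MHz Power3 can retire 4 floating-point operations per cycle (two fused multiply-add units), so the 1,152 compute processors give 1,152 × 222 MHz × 4 ≈ 1.023 Tflops of peak. The 120-node Linpack run had a peak of about 853 Gflops (960 processors), so the 558 Gflops sustained is roughly 65% efficiency, and the projected 650 Gflops on all 144 nodes would be about 64% of the full 1.023 Tflops peak.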
Tera MTA • Architectural Characteristics • Multithreaded architecture • Randomized, flat, shared memory • 8 CPUs and 8 GB RAM, going to 16 later this year • High bandwidth to memory (one word per cycle per CPU) • Benefits • Reduced programming effort: single parallel model for one or many processors • Good scalability
[Floor plan: ASCI Blue Mountain site prep — 12,000 sq ft, roughly 120 ft x 100 ft]
ASCI Blue Mountain Facilities Accomplishments • 12,000 sq. ft. of floor space • 1.6 megawatts of power • 530 tons of cooling capability • 384 cabinets to house the 6144 CPUs • 48 cabinets for metarouters • 96 cabinets for disks • 9 cabinets for 36 HIPPI switches • about 348 miles of fiber cable
ASCI Blue Mountain SST System Final Configuration • Cray Origin 2000: 3.072 TeraFLOPS peak • 6144 CPUs: 48 x 128-CPU Origin 2000 SMPs (250 MHz R10K) • 1536 GB memory total (32 GB per 128-CPU SMP) • 76 TB Fibre Channel RAID disks • 36 x HIPPI-800 switch cluster interconnect • To be deployed later this year: 9 x HIPPI-6400 32-way switch cluster interconnect
ASCI Blue Mountain Accomplishments • On-site integration of the 48 x 128 system completed (including upgrades) • HiPPI-800 interconnect completed • 18 GB Fibre Channel disks completed • Integrated visualization (16 IR Pipes) • Most site prep completed • System integrated into the LANL secure computing environment • Web-based tool for tracking status
ASCI Blue Mountain Accomplishments, cont'd. • Linpack: achieved 1.608 TeraFLOPS • on an accelerated schedule, 2 weeks after install • system validation • run on a 40 x 126 configuration • f90/MPI version ran for over 6 hours • sPPM turbulence modeling code • validated full-system integration • used all 12 HiPPI boards per SMP and all 36 switches • used a special "MPI" HiPPI bypass library • ASCI codes scaling
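For scale: the 3.072 TFLOPS peak follows from 6,144 R10K processors × 250 MHz × 2 floating-point operations per cycle. The Linpack run used the 40 x 126 configuration (5,040 processors, about 2.52 TFLOPS peak), so the 1.608 TFLOPS sustained is roughly 64% of the peak of the processors actually used.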
Summary • Installed the ASCI Blue Mountain computer ahead of schedule and achieved a Linpack record two weeks after install. ASCI application codes are being developed and used.
Network Design Principles • Connect any pair of the DSM computers through the crossbar switches • Connect only computers directly to switches, optimizing latency and bandwidth (there are no direct DSM <==> DSM or switch <==> switch links) • Support a 3-D toroidal 4x4x3 DSM configuration by establishing non-blocking simultaneous links across all six faces of the computer grid (see the sketch below) • Maintain full interconnect bandwidth for subsets of DSM computers (48 DSM computers divided into 2, 3, 4, 6, 8, 12, 24, or 48 separate, non-interacting groups)
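The 4x4x3 toroidal configuration can be made concrete with a short C sketch that maps each of the 48 DSMs to (x, y, z) grid coordinates and lists its six torus neighbors. This is only an illustration of the topology, not the actual Blue Mountain switch-port cabling.

/* torus.c - illustrative 4x4x3 torus neighbor calculation for 48 DSM nodes */
#include <stdio.h>

#define NX 4
#define NY 4
#define NZ 3

/* Map a linear DSM id (0..47) to torus coordinates. */
static void coords(int id, int *x, int *y, int *z)
{
    *x = id % NX;
    *y = (id / NX) % NY;
    *z = id / (NX * NY);
}

/* Map torus coordinates (with wraparound) back to a DSM id. */
static int id_of(int x, int y, int z)
{
    x = (x + NX) % NX;
    y = (y + NY) % NY;
    z = (z + NZ) % NZ;
    return x + NX * (y + NY * z);
}

int main(void)
{
    for (int id = 0; id < NX * NY * NZ; id++) {
        int x, y, z;
        coords(id, &x, &y, &z);
        printf("DSM %2d (%d,%d,%d) neighbors: %2d %2d %2d %2d %2d %2d\n",
               id, x, y, z,
               id_of(x - 1, y, z), id_of(x + 1, y, z),
               id_of(x, y - 1, z), id_of(x, y + 1, z),
               id_of(x, y, z - 1), id_of(x, y, z + 1));
    }
    return 0;
}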
[Diagram: 18 16x16 crossbar switches form 18 separate networks connecting 6 groups of 8 computers each]
[Diagram: sPPM hydro on 6144 CPUs — the problem domain is laid out over the 4x4x3 DSM grid (48 DSMs, 6144 CPUs); each problem subdomain uses an 8x4x4 process layout (128 CPUs on one DSM), with 12 HiPPI-800 NICs and router CPUs communicating with the neighboring SMP]
Definitions • History • SDSC/NPACI • Applications
SDSC: A National Laboratory for Computational Science and Engineering
NPACI: A Distributed National Laboratory for Computational Science and Engineering
[Diagram: Continuing Evolution, 1985 to 2000 — from SDSC providing resources to individuals, to NPACI providing resources, education, outreach & training, enabling technologies, and technology & applications thrusts to partners]
NPACI is a Highly Leveraged National Partnership of Partnerships • 46 institutions • 20 states • 4 countries • 5 national labs • Many projects • Vendors and industry • Government agencies
Mission: Accelerate Scientific Discovery through the development and implementation of computational and computer science techniques
Vision: Changing How Science is Done • Collect data from digital libraries, laboratories, and observation • Analyze the data with models run on the grid • Visualize and share data over the Web • Publish results in a digital library
Goals: Fulfilling the Mission, Embracing the Scientific Community • Capability Computing • Provide compute and information resources of exceptional capability • Discovery Environments • Develop and deploy novel, integrated, easy-to-use computational environments • Computational Literacy • Extend the excitement, benefits, and opportunities of computational science