
The Ranger Supercomputer and its legacy



Presentation Transcript


  1. The Ranger Supercomputer and its legacy Dan Stanzione Texas Advanced Computing Center The University of Texas at Austin December 2, 2013 dan@tacc.utexas.edu

  2. The Texas Advanced Computing Center: A World Leader in High Performance Computing
  • 1,000,000x performance increase in UT computing capability in 10 years (computation ~1Mx; network ~100x)
  • Ranger: 62,976 processor cores, 123 TB RAM, 579 TeraFlops – fastest open-science machine in the world, 2008
  • Lonestar: 23,000 processors, 44 TB RAM, shared-memory and GPU subsystems – #25 in the world, 2011
  • Stampede: #7 in the world today – somewhere around half a million processor cores with Intel Sandy Bridge and Intel MIC, built by Dell: >10 Petaflops

  3. NSF Cyberinfrastructure Strategic Plan, circa 2007 – much of this never happened
  • NSF Cyberinfrastructure Strategic Plan released March 2007
  • Articulates the importance of CI overall
  • Chapters on computing, data, collaboration, and workforce development
  • NSF investing in world-class computing
    • Annual “Track2” HPC systems ($30M)
    • Single “Track1” HPC system in 2011 ($200M)
  • Complementary solicitations for software, applications, education
    • Software Development for CI (SDCI)
    • Strategic Technologies for CI (STCI)
    • Petascale Applications (PetaApps)
    • CI-Training, Education, Advancement, Mentoring (CI-TEAM)
  • Cyber-enabled Discovery & Innovation (CDI) starting in 2008: $0.75B!
  http://www.nsf.gov/od/oci/CI_Vision_March07.pdf

  4. First NSF Track2 System: 1/2 Petaflop
  • TACC selected for the first NSF “Track2” HPC system
  • $30M system acquisition
    • Sun Constellation Cluster
    • AMD Opteron processors
  • Project included 4 years of operations and support
    • System maintenance
    • User support
    • Technology insertion
  • Extended to 5 years

  5. Ranger System Summary
  • Compute power – 579 Teraflops
    • 3,936 Sun four-socket blades
    • 15,744 AMD Opteron “Barcelona” processors
    • Quad-core, 2.3 GHz, four flops/cycle (dual pipelines)
  • Memory – 125 Terabytes
    • 2 GB/core, 32 GB/node
    • 132 GB/s aggregate bandwidth
  • Disk subsystem – 1.7 Petabytes
    • 72 Sun x4500 “Thumper” I/O servers, 24 TB each
    • ~72 GB/s total aggregate bandwidth
    • 1 PB in the largest /work filesystem
  • Interconnect – 10 Gbps bandwidth / 2.3 µs latency
    • Two Sun InfiniBand-based switches with 3,456 ports each
    • Full non-blocking 7-stage Clos fabric
    • Mellanox ConnectX IB cards
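As a quick sanity check, the peak-performance figure multiplies out from the numbers above (a sketch assuming the 2.3 GHz “Barcelona” clock that the 579 TF peak implies; the parts originally shipped at 2.0 GHz):

```python
# Peak-performance sanity check for Ranger from the slide's figures.
# Assumes 2.3 GHz clock (consistent with the stated 579 TF peak) and
# 4 flops/cycle/core (dual floating-point pipelines, 2 flops each).
nodes = 3_936
cores = nodes * 4 * 4          # 4 sockets/node x 4 cores/socket = 62,976
clock_ghz = 2.3
flops_per_cycle = 4

peak_tflops = cores * clock_ghz * flops_per_cycle / 1_000
print(f"{peak_tflops:.1f} TFLOPS")   # 579.4
```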

  6. Ranger I/O Subsystem
  • Disk Object Storage Servers (OSS) based on Sun x4500 “Thumper” servers
  • Each x4500:
    • 48 SATA II 500 GB drives (24 TB total) running internal software RAID
    • Dual-socket, dual-core Opterons @ 2.6 GHz
    • Downside: these nodes have PCI-X – raw I/O bandwidth can exceed a single PCI-X 4X InfiniBand HCA, so we use dual PCI-X HCAs
  • 72 servers total: 1.7 PB raw storage
  • Metadata Servers (MDS) based on Sun Fire x4600s
    • MDS is Fibre Channel-connected to 9 TB FlexLine storage
  • Target performance
    • Aggregate bandwidth: 70+ GB/s
    • To the largest $WORK filesystem: ~40 GB/s
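The capacity and bandwidth totals follow directly from the per-server figures (a sketch; the per-server bandwidth share assumes an even spread across the OSS nodes):

```python
# I/O subsystem totals from the per-server figures on the slide.
servers = 72
tb_per_server = 24                     # 48 x 500 GB SATA II drives
raw_pb = servers * tb_per_server / 1_000
print(f"{raw_pb:.3f} PB raw")          # ~1.7 PB

target_agg_gb_s = 70                   # target aggregate bandwidth
per_server_gb_s = target_agg_gb_s / servers
print(f"~{per_server_gb_s:.2f} GB/s per OSS")   # just under 1 GB/s each
```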

  7. Ranger Space, Power, and Cooling
  • Total project power: 3.4 MW
    • System: 2.4 MW
  • 96 racks – 82 compute, 12 support, plus 2 switches
  • 116 APC In-Row cooling units
  • 2,054 sq ft total footprint (~4,500 sq ft including PDUs)
  • Cooling: ~1 MW
    • In-row units fed by three 350-ton chillers (N+1)
    • Enclosed hot aisles by APC
    • Supplemental 280 tons of cooling from CRAC units
  • Observations:
    • Space less an issue than power
    • Cooling > 25 kW per rack is difficult
    • Power distribution a challenge: almost 1,400 circuits
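The “>25 kW per rack” cooling observation falls out of the totals above (a rough sketch; assumes system power is spread evenly over all 96 racks):

```python
# Average rack power density from the slide's totals.
system_mw = 2.4
racks = 96
kw_per_rack = system_mw * 1_000 / racks
print(f"{kw_per_rack:.0f} kW per rack on average")   # 25 kW
```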

  8. Interconnect Architecture
  [Figure: Ranger InfiniBand topology – “Magnum” switches connected to network express modules (NEMs) via 12x InfiniBand links, 3 cables combined.]

  9. Who Used Ranger?
  • On Ranger alone, TACC has ~6,000 users who have run about three million simulations over the last four years.
  • UT-Austin
  • UT System (through UT Research Cyberinfrastructure)
  • Texas A&M and Texas Tech (through the Lonestar Partnership)
  • Industry (through the STAR program)
  • Users from around the nation and world (through NSF’s TeraGrid/XSEDE)

  10. Japanese Earthquake Simulation
  • Simulation of the seismic wave from the earthquake in Japan, propagating through an earth model
  • Researchers using TACC’s Ranger supercomputer have modeled the processes responsible for continental drift and plate tectonics in greater detail than any previous simulation.
  • Modeling the propagation of seismic waves through the earth is an essential first step to inferring the structure of the earth’s interior.
  • This research is led by Omar Ghattas at The University of Texas at Austin

  11. Studying H1N1 (“Swine Flu”)
  • Researchers at the University of Illinois and the University of Utah used Ranger to simulate the molecular dynamics of antiviral drugs interacting with different kinds of flu.
  • They discovered how commercial medications reach the “binding pocket” – and why Tamiflu wasn’t working on the new swine flu strain.
  • UT researcher Lauren Meyers also used Lonestar to predict the best course of action in the event of an outbreak.
  (Image produced by Brandt Westing, TACC)

  12. Science at the Center of the Storm Using the Ranger supercomputer at the Texas Advanced Computing Center, National Oceanic and Atmospheric Administration (NOAA) scientists and their university colleagues tracked Hurricanes Ike and Gustav during the recent storms. The real-time, high-resolution global and mesoscale (regional) weather predictions they produced used up to 40,000 processing cores at once — nearly two-thirds of Ranger — and included, for the first time, data streamed directly from NOAA planes inside the storm. The forecasts also took advantage of ensemble modeling, a method of prediction that runs dozens of simulations with slightly different starting points in order to determine the most likely path and intensity forecasts. This new method and workflow was only possible because of the massive parallel processing power that TeraGrid resources can devote to complex scientific problems and the interagency collaboration that brought scientists, resources, and infrastructure together seamlessly. A simulation of Hurricane Ike on TACC's Ranger supercomputer shortly before the storm made landfall in Galveston, Texas, on Sept. 13, 2008. Credit: NOAA; Bill Barth, John Cazes, Greg P. Johnson, Romy Schneider and Karl Schulz, TACC

  13. Large Eddy Simulation of the Near-Nozzle Region of Jets Exhausting from Chevron Nozzles Noise from jet engines causes hearing damage in the military and angers communities near airports. With funding from NASA, Ali Uzun (Florida State University) is using Ranger to simulate new exhaust designs that may significantly reduce jet noise. Since noise is a by-product of the turbulent mixing of jet exhaust with ambient air, one can reduce the noise by modifying the mixing process with special control devices, such as chevrons — triangle-shaped protrusions at the end of the nozzle. To determine how a given design would react to high-speed jet exhaust, Uzun first created a computer model of the chevron-shaped exhaust nozzle. This was then integrated into a parallel simulation code that calculated the turbulence of the air as exhaust was forced through the nozzle. Uzun’s simulations had unprecedented resolution and detail. They proved that computational simulations can match experimental results, while supplying much more detailed information about minute physical processes. A picture depicting a two-dimensional cut through the jet flow visualizes the turbulence in the jet flow and the resulting noise radiation away from the jet.

  14. Ranger Project Costs
  • NSF award: $59M
    • Purchases the full system, plus initial test equipment
    • Includes 4 years of system maintenance
    • Covers 4 years of operations and scientific support
  • UT Austin providing power: $1M/year
  • UT Austin upgraded data center infrastructure: $10-15M
  • TACC upgrading the storage archival system: $1M
  • Total cost: $75-80M
    • Thus, system cost > $50K per operational day
    • Must enable users to conduct world-class science every day!
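The per-day figure is simple division (a sketch assuming the 4-year operational window stated above and the $75-80M total range):

```python
# Cost per operational day over the 4-year window.
total_low, total_high = 75e6, 80e6       # $75-80M total project cost
days = 4 * 365
low, high = total_low / days, total_high / days
print(f"${low:,.0f} - ${high:,.0f} per day")   # roughly $51K-$55K
```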

  15. Ranger-Era TeraGrid HPC Systems

  16. Big Deployments Always Have Challenges
  • We’ve gotten extremely good at bringing large deployments in on time, but it is not an easy process.
  • It is impossible to rely solely on vendors; it must be a cooperative process.
  • Ranger slipped several months and was changed from the original proposed plan:
    • The original 2-phase deployment was scrapped in favor of a single larger phase.
    • Several “early product” design flaws were detected and corrected through the course of the project.

  17. Cable Manufacturing Defect Illustration of example problematic InfiniBand 12X cables as a result of kinks imposed by the initial manufacturing process: (left) dismantled cable with inner foil removed and (right) cracked twinax as seen through a microscope.

  18. Ranger: Circa 2007

  19. Ranger Lives On
  • 20 Ranger cabinets have been sent to CHPC for distribution to South African universities
  • 16 more racks have been shipped to Tanzania
  • 4 are awaiting shipment to Botswana
  • Other components are at Texas A&M, Baylor College of Medicine, and ARL (UT classified facility)
  • The original Ranger user community has now migrated to Stampede
  • After a remarkably successful production run, Ranger will continue to deliver science and educate HPC researchers around the world.

  20. Ongoing Partnerships
  • We at TACC are eager to use Ranger as a basis for building sustained and meaningful collaborations
  • Hardware is a start (and there is always the *next* system), but training, staff development, data sharing, etc. provide new opportunities as well.
