Oak Ridge Leadership Computing Facility
www.olcf.ornl.gov
Don Maxwell, HPC Technical Coordinator
October 8, 2010
Presented to: HPC User Forum, Stuttgart
Oak Ridge Leadership Computing Facility
• Mission: Deploy and operate the computational resources required to tackle global challenges
  • Providing world-class computational resources and specialized services for the most computationally intensive problems
  • Providing a stable hardware/software path of increasing scale to maximize productive application development
• Deliver transforming discoveries in materials, biology, climate, energy technologies, etc.
• Provide the ability to investigate otherwise inaccessible systems, from supernovae to nuclear reactors to energy grid dynamics
Our vision for sustained leadership and scientific impact
• Provide the world's most powerful open resource for capability computing
• Follow a well-defined path for maintaining world leadership in this critical area
• Attract the brightest talent and partnerships from all over the world
• Deliver cutting-edge science relevant to the missions of DOE and key federal and state agencies
• Unique opportunity for multi-agency collaboration for science based on synergy of requirements and technology
With UT, we are NSF's National Institute for Computational Sciences for academia
• 1 PF system to the UT-ORNL Joint Institute for Computational Sciences
  • Largest grant in UT history
  • Other partners: Texas Advanced Computing Center, National Center for Atmospheric Research, ORAU, and core universities
• 1 of up to 4 leading-edge computing systems planned to increase the availability of computing resources to U.S. researchers
• A new phase in our relationship with UT
  • Computational Science Initiative
  • Governor's Chair and joint faculty
• Engagement with the scientific community
  • Research, education, and training mission
Oak Ridge National Laboratory Leadership Computing Systems
• Jaguar: World's most powerful computer
• NOAA CMRS: NOAA's most powerful computer
• Kraken: NSF's most powerful computer
Jaguar History
• Jan 2005: XT3 development cabinet
• Mar 2005: 10-cabinet single-core system
• April 2005: +30 XT3 cabinets
• Jun 2005: +16 cabinets for a total of 56 XT3 cabinets, 25 TF
• July 2006: XT3 dual-core 2.6 GHz, 50 TF
• Nov 2006: XT4 dual-core 2.6 GHz, 32 then 36 cabinets
• March 2007: XT3 and XT4 combined for a total of 124 cabinets, 100 TF
• May 2008: XT4, 68 cabinets, quad-core, 250 TF
• Dec 2008: 200-cabinet quad-core XT5, 1 PF
• Nov 2009: 200-cabinet six-core XT5, 2 PF
What is Jaguar Today?
Jaguar combines a 263 TF Cray XT4 system at ORNL's OLCF with a 2,332 TF Cray XT5 to create a 2.5 PF system.
"Spider": Center-Wide High-Speed Parallel File System
• "Spider" provides a shared, parallel file system for all systems
• Based on the Lustre file system
• Demonstrated bandwidth of over 240 GB/s
• Over 10 PB of RAID-6 capacity
  • DDN 9900 storage controllers with 8+2 disks per RAID group
  • 13,440 1-TB SATA drives
• 192 Dell PowerEdge storage servers with 3 TB of memory
• Available from all systems via our high-performance scalable I/O network
  • Over 3,000 InfiniBand ports
  • Over 3 miles of cables
  • Scales as storage grows
• Spider is the parallel file system for Jaguar (see the MPI-IO sketch below for how applications typically write to it)
• Spider uses approximately 400 KW of power
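As a sanity check on the capacity figure, 13,440 drives at 1 TB each is roughly 13.4 PB raw, and with 8+2 RAID-6 about 80% of that is usable, around 10.7 PB, consistent with "over 10 PB of RAID-6 capacity." To illustrate how applications typically drive a shared parallel file system like Spider, here is a minimal MPI-IO sketch in C in which every rank writes its own contiguous block of one shared file with a collective call. The file path and per-rank block size are hypothetical, and production codes on Jaguar would additionally tune Lustre striping for bandwidth.

/* Minimal MPI-IO sketch: every rank writes one contiguous block of a
 * single shared file on the parallel file system. The path and the
 * per-rank block size are illustrative only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nvals = 1 << 20;                     /* 1 Mi doubles (8 MiB) per rank */
    double *buf = malloc((size_t)nvals * sizeof(double));
    for (int i = 0; i < nvals; i++)
        buf[i] = (double)rank;                     /* recognizable per-rank payload */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/output.dat",  /* hypothetical path */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: rank r owns the r-th 8 MiB block of the file. */
    MPI_Offset offset = (MPI_Offset)rank * nvals * (MPI_Offset)sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, nvals, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

The collective write lets the MPI-IO layer aggregate requests across ranks, which is generally how large jobs approach the kind of aggregate bandwidth quoted above.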
Jaguar combines a 2.33 PF Cray XT5 with a 263 TF Cray XT4
The system components (Cray XT5, Cray XT4, Spider, and external login nodes) are linked by 4×-DDR InfiniBand (IB) using three Cisco 7024D switches:
• The XT5 has 192 IB links
• The XT4 has 48 IB links
• Spider has 192 IB links
Building an Exabyte Archive
The High-Performance Storage System adds capacity and speed
• Supercomputers addressing Grand Challenges need to quickly store massive amounts of data
• The High-Performance Storage System (HPSS) meets the big-storage demands of big science
• 25 PB of tape storage today
• Planning for 750 PB by 2012

"Fifteen years ago, [national] labs realized they needed something of this size. They recognized Grand Challenge problems were coming up that would require petaflops of computing power. And they realized those jobs had to have a place to put the data."
Stanley White, National Center for Computational Sciences
Scheduling to Maximize Capability Computing
• Capability jobs get maximum priority and walltime
• Jobs are prioritized using several factors to meet DOE goals and to provide flexibility (a simplified weighting sketch follows below)
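The slide does not spell out the priority formula, so the following is only a hypothetical sketch in C of how a batch scheduler might weight a few common factors (job width, queue wait, and remaining project allocation) so that capability-class jobs rise to the top. The factor names, thresholds, and weights are illustrative, not the actual OLCF policy.

/* Hypothetical priority weighting that favors capability-class (very wide)
 * jobs while still aging smaller jobs. Factors, thresholds, and weights are
 * illustrative only, not the actual OLCF policy. */
#include <stdio.h>

struct job {
    int    requested_cores;      /* width of the job                        */
    double hours_queued;         /* how long the job has been waiting       */
    double allocation_remaining; /* fraction of the project allocation left */
};

static double priority(const struct job *j, int machine_cores)
{
    double width_share = (double)j->requested_cores / machine_cores;

    /* Jobs asking for a large fraction of the machine get a big boost. */
    double capability_bonus = (width_share > 0.20) ? 100000.0 : 0.0;

    return capability_bonus
         + 50000.0 * width_share              /* bigger jobs first              */
         + 10.0 * j->hours_queued             /* aging so nothing starves       */
         + 1000.0 * j->allocation_remaining;  /* help projects use their awards */
}

int main(void)
{
    const int machine_cores = 224256;          /* roughly the six-core XT5 partition */
    struct job wide   = { 180000,  2.0, 0.8 }; /* near-full-machine capability job   */
    struct job narrow = {   2000, 48.0, 0.3 }; /* small job that has waited longer   */

    printf("wide job priority:   %.0f\n", priority(&wide, machine_cores));
    printf("narrow job priority: %.0f\n", priority(&narrow, machine_cores));
    return 0;
}

A real scheduler would combine such factors with queue limits and reservations, but the basic idea is the same: weight job width heavily enough that capability runs are not starved by many small jobs.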
Job Failure Trends
[Chart: job failure trends; related resilience efforts include the MPI Forum, Open MPI, and HWPOISON]
ORNL's Current and Planned Data Centers
• Computational Sciences Building (40,000 ft2)
  • Maximum building power to 25 MW
  • 6,600-ton chiller plant
  • 1.5 MW UPS and 2.25 MW generator
  • LEED Certified
• Multiprogram Research Facility (30,000 ft2)
  • Capability computing for national defense
  • 25 MW of power and 8,000-ton chillers
  • LEED Gold certification
• Multiprogram Computing & Data Center (140,000 ft2)
  • Up to 100 MW of power
  • Lights-out facility
  • Planned for LEED Gold certification
National Center for Computational Sciences Organization
[Organization chart: J. Hack, Director; A. Bland, OLCF Project Director; B. Messer, Acting Director of Science; J. Rogers, Director of Operations; A. Geist, Chief Technology Officer; S. Poole, OLCF System Architect; K. Boudwin, Deputy Project Director. Groups include User Assistance and Outreach, Technology Integration, Scientific Computing, High-Performance Computing Operations, Application Performance Tools, Industrial Partnerships, the INCITE Program, and the Cray Supercomputing Center of Excellence, with an external Advisory Committee (J. Dongarra, T. Dunning, K. Droegemeier, S. Karin, D. Reed, J. Tomkins). 78 FTEs. ORNL is managed and operated by UT-Battelle, LLC under contract with the DOE.]
Scientific Computing
Scientific Computing facilitates the delivery of leadership science by partnering with users to effectively utilize computational science, visualization, and workflow technologies on OLCF resources through:
• Science team liaisons
• Developing, tuning, and scaling current and future applications
• Providing visualizations to present scientific results and augment discovery processes
We allocate time on the DOE systems through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program
INCITE provides awards to academic, government, and industry organizations worldwide that need large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and industrial competitiveness.
User Demographics
[Chart: active users by sponsor]
System time is allocated to each project. We do not charge for time, except for proprietary work by commercial companies.
Next INCITE Call for Proposals: April 2011
• Awards for 1, 2, or 3 years
• Average award > 20 million processor hours per year
• Contact us about discretionary time for INCITE preparation
Some INCITE research topics: a glimpse into dark matter, supernova ignition, protein structure, creation of biofuels, replicating enzyme functions, protein folding, chemical catalyst design, efficient coal gasifiers, combustion, algorithm development, global cloudiness, regional earthquakes, carbon sequestration, airfoil optimization, turbulent flow, propulsor systems, nano-devices, batteries, solar cells, and reactor design.
Contact information: Julia C. White, INCITE Manager, whitejc@DOEleadershipcomputing.org
Gordon Bell Prize Awarded to ORNL Team
• A team led by ORNL's Thomas Schulthess received the prestigious 2008 Association for Computing Machinery (ACM) Gordon Bell Prize at SC08 for attaining the fastest performance ever in a scientific supercomputing application
• The simulation of superconductors achieved 1.352 petaflops on ORNL's Cray XT Jaguar supercomputer
• By modifying the algorithms and software design of the DCA++ code, the team was able to boost its performance tenfold
• UPDATE: with the upgraded Jaguar, DCA++ has exceeded 1.9 PF
• Three of the six Gordon Bell finalists ran on Jaguar
  • 2008 Gordon Bell finalists: DCA++ (ORNL), LS3DF (LBNL), SPECFEM3D (SDSC), RHEA (TACC), SPaSM (LANL), VPIC (LANL)
OLCF is working with users to produce scalable, high-performance applications for the petascale
Scientific Progress at the Petascale
• Turbulence: Understanding the statistical geometry of turbulent dispersion of pollutants in the environment
• Nuclear Energy: High-fidelity predictive simulation tools for the design of next-generation nuclear reactors to safely increase operating margins
• Fusion Energy: Substantial progress in the understanding of anomalous electron energy loss in the National Spherical Torus Experiment (NSTX)
• Energy Storage: Understanding the storage and flow of energy in next-generation nanostructured carbon tube supercapacitors
• Biofuels: A comprehensive simulation model of lignocellulosic biomass to understand the bottleneck to sustainable and economical ethanol production
• Nano Science: Understanding the atomic and electronic properties of nanostructures in next-generation photovoltaic solar cell materials
Nanoscience / Nanotechnology: Petascale Simulations of Nano-Electronic Devices
Science objectives and impact
• Model, understand, and design carrier flow in nano-scale semiconductor transistors
• Identify next-generation nano-transistor architectures, reduce power consumption, and increase manufacturability
Science results (OMEN: 3D, 2D, and 1D atomistic devices)
• Coherent transport simulations in band-to-band tunneling devices with simulation times of less than an hour, enabling rapid exploration of the design space
• Incoherent transport simulations coupling all energies through phonon interactions; production runs on 70,000 cores in 12 hours, the first atomistic incoherent transport simulations
Research team
• M. Luisier and G. Klimeck, Purdue University
• 3-year INCITE award, with 20 million hours in 2010
Computational Fluid Dynamics: Smart-Truck Optimization
Science objectives and impact
• Apply advanced computational techniques from the aerospace industry to substantially improve fuel efficiency and reduce emissions of trucks by reducing drag and increasing aerodynamic efficiency
• If all 1.3 million long-haul trucks operated with the drag of a passenger car, the US would annually save 6.8 billion gallons of diesel, reduce CO2 by 75 million tons, and save $19 billion in fuel costs
Science results (aerodynamic performance testing on Jaguar: CFD analysis of truck and mirrors)
• Unprecedented detail and accuracy of a Class 8 tractor-trailer aerodynamic simulation
• Minimizes drag associated with the trailer underside
• Compresses and accelerates incoming air flow and injects high-energy air into the trailer wake; the UT-6 Trailer Under Tray System reduces tractor/trailer drag by 12%
Research team
• Mike Henderson, BMI Corp.
• Participant in the Industrial Partnerships Program
Examples of OLCF Industrial Projects
• Developing new add-on parts to reduce drag and increase fuel efficiency of Class 8 (18-wheeler) long-haul trucks. This will reduce fuel consumption by up to 3,700 gallons per truck per year and reduce CO2 by up to 41 tons (82,000 lb) per truck per year. BMI is using NASA FUN3D, and a NASA team is assisting BMI with code refinement. (OLCF Director's Discretionary Award)
• Analyzing unsteady versus steady flows in low-pressure turbomachinery and their potential effects on more energy-efficient designs. (OLCF Director's Discretionary Award)
• Studying, at the nano scale, catalysts that can selectively produce hydrogen from biomass, with the hydrogen to be used as energy for fuel cells. (OLCF Director's Discretionary Award)
• Developing a unique CO2 compression technology for significantly lower-cost carbon sequestration. (ALCC award)
10-Year Strategy: Moving to the Exascale
• The U.S. Department of Energy requires exaflops computing by 2018 to meet the needs of the science communities that depend on leadership computing
• Our vision: provide a series of increasingly powerful computer systems and work with the user community to scale applications to each of the new computer systems
• OLCF-3 Project: a new 10-20 petaflops computer based on early hybrid multi-core technology
[OLCF roadmap from the 10-year plan, 2008-2019: today's 1 PF and 2 PF (6-core) systems in the ORNL Computational Sciences Building, the 10-20 PF OLCF-3 system, future 100 PF and 300 PF systems, and a 1 EF system at the end of the decade, housed across the ORNL Multiprogram Research Facility and the planned Extreme Scale Computing Facility (140,000 ft2)]
OLCF-3 "Titan" System Description
• Similar number of cabinets, cabinet design, and cooling as Jaguar
• Operating system upgrade of today's Cray Linux Environment
• New Gemini interconnect
  • 3-D torus
  • Globally addressable memory
  • Advanced synchronization features
• New accelerated node design using GPUs (see the offload sketch following this list)
• 20 PF peak performance
  • 9x the performance of today's XT5
  • 3x larger memory
  • 3x larger and 4x faster file system
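The slide does not show how applications will program the GPU-accelerated nodes, so the sketch below is only illustrative, assuming a directive-based (OpenACC-style) offload of a simple loop in C; the loop, array size, and data clauses are hypothetical, and real OLCF-3 applications might instead use CUDA or accelerated libraries.

/* Illustrative accelerator offload of a simple vector update using an
 * OpenACC-style directive. The array size and the computation are
 * hypothetical; without an accelerator compiler the pragma is ignored
 * and the loop simply runs on the CPU. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1 << 24;                    /* 16M elements (illustrative) */
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));

    for (int i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }

    /* Offload the loop to the GPU; data is copied to and from device
     * memory around the parallel region. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = 2.5 * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);              /* expect 4.500000 */
    free(x);
    free(y);
    return 0;
}

The appeal of a directive-based approach for a hybrid multi-core system is that the same source can still be built and run on CPU-only machines such as today's Jaguar.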