  1. NERSC Status Update for NERSC User Group Meeting June 2006 William T.C. Kramer, kramer@nersc.gov, 510-486-7577, Ernest Orlando Lawrence Berkeley National Laboratory

  2. Outline

  3. Thanks for 10 Years of Help • This is the 20th NUG meeting I have had the privilege of attending • Throughout the past 10 years you have all provided NERSC with invaluable help and guidance • NUG is unique within the HPC community • NERSC and I are grateful for your help in making NERSC successful

  4. Science-Driven Computing Strategy 2006-2010

  5. NERSC Must Address Three Trends • The widening gap between application performance and peak performance of high-end computing systems (a simple percent-of-peak calculation is sketched below) • The recent emergence of large, multidisciplinary computational science teams in the DOE research community • The flood of scientific data from both simulations and experiments, and the convergence of computational simulation with experimental data collection and analysis in complex workflows
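
The first trend above is quantitative: applications typically sustain only a small, and shrinking, fraction of a system's theoretical peak. As a minimal illustration (the flop rates below are invented placeholders, not measured NERSC benchmark results), percent of peak is simply the ratio of delivered to theoretical rate:

```python
# Illustration only: the flop rates below are invented placeholders,
# not measured NERSC benchmark results.

def percent_of_peak(sustained_gflops: float, peak_gflops: float) -> float:
    """Return sustained application performance as a percentage of theoretical peak."""
    return 100.0 * sustained_gflops / peak_gflops

# Hypothetical application on a hypothetical system:
sustained = 450.0   # Gflop/s actually delivered to the application
peak = 9_000.0      # Gflop/s theoretical peak of the full machine
print(f"{percent_of_peak(sustained, peak):.1f}% of peak")   # -> 5.0% of peak
```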

  6. Science-Driven Systems • Balanced and timely introduction of best new technology for complete computational systems (computing, storage, networking, analytics) • Engage and work directly with vendors in addressing the SC requirements in their roadmaps • Collaborate with DOE labs and other sites in technology evaluation and introduction

  7. Science-Driven Services • Provide the entire range of services from high-quality operations to direct scientific support • Enable a broad range of scientists to effectively use NERSC in their research • Concentrate on resources for scaling to large numbers of processors, and for supporting multidisciplinary computational science teams

  8. Science-Driven Analytics • Provide architectural and systems enhancements and services to more closely integrate computational and storage resources • Provide scientists with new tools to effectively manipulate, visualize and analyze the huge data sets from both simulations and experiments

  9. National Energy Research Scientific Computing Center (NERSC) Division (Draft)
  • NERSC Center Division Director: Horst Simon; Division Deputy: William Kramer
  • NERSC Center General Manager & High Performance Computing Department Head: William Kramer
  • Science Driven System Architecture: John Shalf, Team Leader
  • Science Driven Systems: Howard Walter, Associate General Manager
  • Science Driven Services: Francesca Verdier, Associate General Manager
  • HENP Computing: Craig Tull, Group Leader
  • Computational Systems: James Craw, Group Leader
  • User Services: Jonathan Carter, Group Leader
  • Mass Storage: Jason Hick, Group Leader
  • Analytics: Wes Bethel, Team Leader (matrixed – CRD)
  • Open Software & Programming: David Skinner, Group Leader
  • Network, Security & Servers: Brent Draney, Group Leader
  • Computer Operations & ESnet Support: Steve Lowe, Group Leader
  • Accounts & Allocation Team: Clayton Bagwell, Team Leader

  10. NERSC Center Staffing (Draft) – NERSC Center General Manager: William Kramer
  • Science Driven Systems – Howard Walter, Associate General Manager
  • Science Driven Services – Francesca Verdier, Associate General Manager
  • Computational Systems – James Craw, Group Leader: Matthew Andrews (.5), William Baird, Nick Balthaser, Scott Burrow (V), Greg Butler, Tina Butler, Nicholas Cardo, Thomas Langley, Rei Lee, David Paul, Iwona Sakrejda, Jay Srinivasan, Cary Whitney (HEP/NP), Open Positions (2)
  • User Services – Jonathan Carter, Group Leader: Harsh Anand, Andrew Canning (.25 – CRD), Richard Gerber, Frank Hale, Helen He, Peter Nugent (.25 – CRD), David Skinner (.5), Mike Stewart, David Turner (.75)
  • Analytics – Wes Bethel, Team Leader (.5 – CRD): Cecilia Aragon (.2 – CRD), Julian Borrill (.5 – CRD), Chris Ding (.3 – CRD), Peter Nugent (.25 – CRD), Christina Siegrist (CRD), Dave Turner (.25), Open Positions (1.5)
  • Science Driven System Architecture Team – John Shalf, Team Leader: Andrew Canning (.25 – CRD), Chris Ding (.2 – CRD), Esmond Ng (.25 – CRD), Lenny Oliker (.25 – CRD), Hongzhang Shan (.5 – CRD), David Skinner (.5), E. Strohmaier (.25 – CRD), Lin-Wang Wang (.5 – CRD), Harvey Wasserman, Mike Welcome (.15 – CRD), Katherine Yelick (.05 – CRD)
  • Networking, Security, Servers & Workstations – Brent Draney, Group Leader: Elizabeth Bautista (DB), Scott Campbell, Steve Chan, Jed Donnelley, Craig Lant, Raymond Spence, Tavia Stone, Open Position (DB)
  • Computer Operations & ESnet Support – Steve Lowe, Group Leader: Richard Beard, Del Black, Aaron Garrett, Russell Huie (ES), Yulok Lam, Robert Neylan, Tony Quan (ES), Alex Ubungen
  • Open Software & Programming – David Skinner, Group Leader: Mikhail Avrekh, Tom Davis, RK Owen, Open Position (1) – Grid
  • Accounts & Allocations – Clayton Bagwell, Team Leader: Mark Heer, Karen Zukor (.5)
  • Mass Storage – Jason Hick, Group Leader: Matthew Andrews (.5), Shreyas Cholia, Damian Hazen, Wayne Hurlbert, Open Position (1)
  • Key: V – vendor staff; CRD – matrixed staff from CRD; ES – funded by ESnet; HEP/NP – funded by LBNL HEP and NP Divisions; DB – Division Burden

  11. 2005-2006 Accomplishments

  12. DOE Joule metric • Comprehensive scientific support: • 20-45% code performance improvements → roughly 2M extra hours (see the sketch below) • All projects relied heavily on NERSC visualization services • Large-Scale Capability Computing Is Addressing New Frontiers – INCITE Program at NERSC in 2005: • Turbulent Angular Momentum Transport; Fausto Cattaneo, University of Chicago • Order-of-magnitude improvement in simulation of accretion in stars and in the lab • Direct Numerical Simulation of Turbulent Non-premixed Combustion; Jackie Chen, Sandia Labs • The first 3D direct numerical simulation of a turbulent H2/CO/N2-air flame with detailed chemistry; found new flame phenomena unseen in 2D • Molecular Dynameomics; Valerie Daggett, University of Washington • Simulated folds for 38% of all known proteins • 2 TB protein fold database created
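
The arrow from code performance improvements to extra hours is a throughput argument: if the same workload runs faster, the allocation it no longer needs is effectively additional time. A back-of-the-envelope sketch, assuming an invented combined baseline of 6M processor-hours (not the actual INCITE allocation) and treating an "X% performance improvement" as a speedup factor of (1 + X):

```python
# Back-of-the-envelope only: baseline_hours is an assumed figure,
# not the actual allocation behind the ~2M-hour estimate on the slide.

def hours_freed(baseline_hours: float, improvement: float) -> float:
    """Processor-hours freed when a code becomes `improvement` (e.g. 0.30 = 30%) faster.

    Assumes an "X% performance improvement" means the code now runs (1 + X)
    times as fast, so the same workload needs baseline_hours / (1 + X).
    """
    return baseline_hours - baseline_hours / (1.0 + improvement)

baseline = 6_000_000  # assumed combined allocation, in processor-hours
for imp in (0.20, 0.45):
    print(f"{imp:.0%} faster -> {hours_freed(baseline, imp):,.0f} hours freed")
```

With this assumed baseline, the 45% case lands near 2M hours; the real figure depends on the actual allocations and on how the improvement was measured.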

  13. The Good • Deployed Bassi – January 2006 • One of the fastest installations and acceptances • Bassi providing exceptional service • Deployed NERSC Global File System – Sept 2005 • Upgraded – January 2006 • Excellent feedback from users • Stabilized Jacquard – October 2005 to April 2006 • Resolved MCE errors • Installed 40 more nodes

  14. The Good • Improved PDSF • Added processing and storage • Converted hundreds of NFS file systems to a few GPFS file systems • Access to NGF • Increased archive storage functionality and performance • Upgraded to HPSS 5.1 – April 2006 • More tape drives • More cache disk • 10 GE servers • NERSC 5 procurement • On schedule and under budget (for conducting the procurement) • Continued network tuning

  15. The Good • Deployed Bassi – January 2006 • One of the fastest installations and acceptances • Bassi providing exceptional service • Deployed NERSC Global File System – Sept 2005 • Upgraded – January 2006 • Excellent feedback from users • Stabilized Jacquard – October 2005 to April 2006 • Resolved MCE errors • Installed 40 more nodes

  16. The Good • Continued network tuning • Security • Continued to avoid major incidents • Good results from the “Site Assistance Visit” at LBNL • LBNL and NERSC rated “outstanding” • Still a lot of work to do – and some changes – before they return in a year • Over-allocation issues (AY 05) solved • Better queue responsiveness • Stable time allocations

  17. The Good • Other • Thanks to ASCR, the NERSC budget appears to have stabilized • Worked with others to help define HPC business practices • Continued progress in influencing advanced HPC concepts • Cell, Power, interconnects, software roadmaps, evaluation methods, working methods, …

  18. The Not So Good • Took a long time to stabilize Jacquard • Learned some lessons about lightweight requirements • Upgrades on systems have not gone as well as we would have liked • Extremely complex – and much is not controlled by NERSC • Security attack attempts continue and are increasing in sophistication • Can expect continued evolution • User and NERSC database usage will be a point of focus

  19. The Jury is still out • Analytics ramp-up taking longer than we desired • NGF a major step • Some success stories, but we don’t have breadth • Scalability of codes • DOE expects a significant fraction (>50%?) of time to go to jobs of >2,048-way concurrency for the first full year of NERSC-5 (a way to measure this is sketched below) • Many of the most scalable applications are migrating to the LCFs – so some of the low-hanging fruit has already been harvested • Should be a continuing focus of NERSC and NUG
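
The >2,048-way expectation is measurable from job accounting: what fraction of delivered processor-hours went to jobs at or above that concurrency. A minimal sketch using hypothetical records (the field layout is an assumption, not the actual NERSC accounting schema):

```python
# Hypothetical job records; this is not the real NERSC accounting format.
from typing import Iterable, Tuple

Job = Tuple[int, float]   # (processor count, wall-clock hours)

def large_job_fraction(jobs: Iterable[Job], threshold: int = 2048) -> float:
    """Fraction of processor-hours delivered by jobs using >= threshold processors."""
    total = large = 0.0
    for procs, hours in jobs:
        cpu_hours = procs * hours
        total += cpu_hours
        if procs >= threshold:
            large += cpu_hours
    return large / total if total else 0.0

sample = [(4096, 12.0), (2048, 6.0), (512, 48.0), (64, 200.0)]
print(f"{large_job_fraction(sample):.0%} of hours in >=2,048-way jobs")
```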

  20. 2005-2006 Progress On Goals

  21. FY 04-06 Overall Goals • (Support for DOE Office of Science) Support and assist DOE Office of Science in meeting its goals and obligations through the research, development, deployment and support of high performance computing and storage resources and advanced mathematical and computer systems software. • (Systems and Services) Provide leading edge, open High Performance Computing (HPC) systems and services to enable scientific discovery. NERSC will use its expertise and leadership in HPC to provide reliable, timely, and excellent services to its users. • (Innovative assistance) Provide innovative scientific and technical assistance to NERSC's users. NERSC will work closely with the user community and together produce significant scientific results while making the best use of NERSC facilities. • (Respond to Scientific Needs) Be an advocate for NERSC users within the HPC community. Respond to science-driven needs with new and innovative services and systems.

  22. FY 04-06 Overall Goals • (Balanced integration of new products and ideas) Judiciously integrate new products, technology, procedures, and practices into the NERSC production environment in order to enhance NERSC's ability to support scientific discovery. • (Advance technology) Develop future cutting-edge strategies and technologies that will advance high performance scientific computing capabilities and effectiveness, allowing scientists to solve new and larger problems, and making HPC systems easier to use and manage. • (Export NERSC knowledge) Export knowledge, experience, and technology developed at NERSC to benefit computer science and the high performance scientific computing community. • (Culture) Provide a facility that enables and stimulates scientific discovery by continually improving our systems, services and processes. Cultivate a can-do approach to solving problems and making systems work, while maintaining high standards of ethics and integrity.

  23. 2005-2006 Progress: 5 Year Plan Milestones

  24. 5 Year Plan Milestones • 2005 • NCS enters full service – Completed • Focus is on modestly parallel and capacity computing • >15–20% of Seaborg • WAN upgrade to 10 Gb/s – Completed • Upgrade HPSS to 16 PB; storage upgrade to support 10 GB/s, for higher density and increased bandwidth – Completed • Quadruple the size of the visualization/post-processing server – Completed • 2006 • NCSb enters full service – Completed • Focus is on modestly parallel and capacity computing • >30–40% of Seaborg – Completed; actually >85% of Seaborg SSP (a simplified SSP comparison is sketched below)
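
The Seaborg comparisons here are expressed in NERSC's Sustained System Performance (SSP) metric, which is based on measured rates for an application benchmark suite rather than peak flops. A simplified sketch of how such a comparison is formed, roughly multiplying the geometric mean of per-processor benchmark rates by the processor count (the per-processor rates below are invented placeholders, and the real SSP definition carries more detail):

```python
# Simplified SSP-style comparison. The per-processor benchmark rates are
# invented placeholders, not measured values; processor counts are approximate.
from math import prod

def ssp(per_proc_rates_gflops: list, processors: int) -> float:
    """SSP-style figure: geometric mean of per-processor benchmark rates x processor count."""
    gmean = prod(per_proc_rates_gflops) ** (1.0 / len(per_proc_rates_gflops))
    return gmean * processors

seaborg = ssp([0.25, 0.31, 0.18, 0.22], processors=6080)  # placeholder rates
bassi   = ssp([1.30, 1.70, 1.20, 1.70], processors=888)   # placeholder rates
print(f"NCSb (Bassi) / Seaborg SSP ratio: {bassi / seaborg:.2f}")  # ~0.9 with these numbers
```

With these invented rates the ratio comes out near 0.9, consistent with the ">85% of Seaborg SSP" note above, but the actual comparison comes from the measured benchmark suite.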

  25. 5 Year Plan Milestones • 2006 • NERSC-5: initial delivery, possibly phased – Expected, but most will be in FY 07 • 3 to 4 times Seaborg in delivered performance – Overachieved; more later • Used for the entire workload, so it has to be balanced • Replace the security infrastructure for HPSS and add native Grid capability to HPSS – Completed and underway • Storage and facility-wide file system upgrade – Completed and underway • 2007 • NERSC-5 enters full service – Expected • Storage and facility-wide file system upgrade – Expected • Double the size of the visualization/post-processing server – If usage dictates

  26. Summary • It is a good time to be in HPC • NERSC has far more success stories than issues • NERSC users are doing an outstanding job producing leading-edge science for the Nation • More than 1,200 peer-reviewed papers for AY 05 • DOE is extremely supportive of NERSC and its users
