NCSA – Evolution of an HPC Center: Infrastructure and Services for Scientific Analysis and Decision Support Danny Powell, Executive Director, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Talk Outline • About NCSA – Who we are now • Basic numbers • Mission • Basic methods of operation • Projects and Customers • Cyber-Infrastructure and Science Projects • Industry • Education • Government – Public Health • Evolving into a successful HPC Center • How we changed over the years • User service-centric focus • Your staff – it’s almost always about the people • Management – effective roles National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign National Center for Supercomputing Applications • Applied Research Unit of University of Illinois • Origin: 1986 NSF-funded national supercomputing centers • Original Mission: Provide state-of-the-art computing and data capabilities to the nation’s scientists and engineers • Develop software tools and software systems needed to make full use of advanced computing and data systems (Mosaic, Apache Web Server, Telnet, D2K, MyProxy, numerous others…) • NCSA by the Numbers • Approximately 275 staff (250 technical/professional staff) • Two facilities (NCSA Building, NPCF) (>220k sq.ft)
Basic Facts about NCSA • Computing/Data Resources • Blue Waters: 11+ Petaflop (1+ PF sustained) computer (Cray) • Most powerful machine in NSF portfolio – NSF’s only Tier One machine • $350 million project ($200 million construction - $150 million operations) • Mid-Range Supercomputing systems: ~200 TF • Archival storage system: 500+ PB • Advanced visualization systems • Types of projects • Local, National and Global scale • Individual tools to large CI frameworks • Point solutions to systemic improvements • IP • Majority of work at NCSA is open source • Can effectively deal with secure environments, proprietary codes, confidentiality
It is All About Working with Others • Funding • Federal Agencies, Industry, State of Illinois, Foundations, International sources • Most projects are partnerships with others (88%) • Leveraging skills/resources of others • Goal to be viewed as the “Partner of Choice” • IACAT (Institute for Advanced Computing Applications and Technologies) • Integrates applied research of NCSA with basic research teams of Universities • International Program • 30+ institutions from 22+ countries • Faculty and student exchanges, joint projects, workshops, technology sharing • Industrial Program • Nationally/internationally recognized for its level of functional interaction, technology transfer, student engagement • 23+ companies (Fortune 50/100/500, smaller technology companies)
NCSA Bridges Basic Research and Commercialization with Application • Product Life Cycle: Phase 0 Concept/Vision → Phase 1 Feasibility → Phase 2 Design/Development → Phase 3 Prototyping → Phase 4 Production/Deployment • Theoretical & Basic Research (Universities & Labs) → Applied Prototyping & Development → Optimization & Robustification → Commercialization & Production (.com or .org) (Private Industry, Economic Development) • NCSA Bridges the Gap BETWEEN Basic Research & Commercialization Application
Mission: Enable Science/Engineering/Education Individual tools, System software, Analytics, Visualization, Integrated SW systems, Workflow, User Support, Training Effective Resource Utilization USERS: High End Computer & Data Needs NCSA Enables effective/efficient use of high end computer and data resources in support of science and education Scientific, Decision Support, Inquiry Results
Projects and Customers CyberInfrastructure Development A Collaboration/Partnership with a Broad Set of Communities
Blue Waters
Blue Waters Project Input from Scientific Community • D. Baker, University of Washington • Protein structure refinement and determination • M. Campanelli, RIT • Computational relativity and gravitation • D. Ceperley, UIUC • Quantum Monte Carlo molecular dynamics • J. P. Draayer, LSU • Ab initio nuclear structure calculations • P. Fussell, Boeing • Aircraft design optimization • C. C. Goodrich • Space weather modeling • M. Gordon, T. Windus, Iowa State University • Electronic structure of molecules • S. Gottlieb, Indiana University • Lattice quantum chromodynamics • V. Govindaraju • Image processing and feature extraction • M. L. Klein, University of Pennsylvania • Biophysical and materials simulations • J. B. Klemp et al., NCAR • Weather forecasting/hurricane modeling • R. Luettich, University of North Carolina • Coastal circulation and storm surge modeling • W. K. Liu, Northwestern University • Multiscale materials simulations • M. Maxey, Brown University • Multiphase turbulent flow in channels • S. McKee, University of Michigan • Analysis of ATLAS data • M. L. Norman, UCSD • Simulations in astrophysics and cosmology • J. P. Ostriker, Princeton University • Virtual universe • J. P. Schaefer, LSST Corporation • Analysis of LSST datasets • P. Spentzouris, Fermilab • Design of new accelerators • W. M. Tang, Princeton University • Simulation of fine-scale plasma turbulence • A. W. Thomas, D. Richards, Jefferson Lab • Lattice QCD for hadronic and nuclear physics • J. Tromp, Caltech/Princeton • Global and regional seismic wave propagation • P. R. Woodward, University of Minnesota • Astrophysical fluid dynamics
Software stack on Traditional Cray Linux Environment (CLE)/SUSE Linux • Languages: Fortran/CAF (OpenACC), C (OpenACC), C++ (OpenACC), Python, UPC • Compilers: Cray Compiling Environment (CCE), GNU • Programming Models: Distributed Memory (Cray MPT) – MPI, SHMEM; Shared Memory – OpenMP 3.0; PGAS & Global View – UPC (CCE), CAF (CCE); Adaptive/Other – Charm++ • Optimized Scientific Libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK) • IO Libraries: NetCDF, HDF5, ADIOS • Tools: Environment setup (Modules); Debugging Support Tools – Fast Track Debugger (CCE w/ DDT), Abnormal Termination Processing; Resource Manager; Performance Analysis – Cray Performance Monitoring and Analysis Tool, PAPI, PerfSuite, Tau; Debuggers – Allinea DDT, STAT, Cray Comparative Debugger#, lgdb; Visualization – VisIt, Paraview, YT; Data Transfer – GO, HPSS, RAIT; Prog. Env. – Eclipse • Legend: Cray developed; Under development; Licensed ISV SW; 3rd party packaging; NCSA supported; Cray added value to 3rd party MWTCC - May 31, 2013
Blue Waters Designed to meet compute-intensive, memory-intensive, and data-intensive needs across a wide range of disciplines. • Peak performance: 11.61 PF • Cray XE6 cabinets: 237 • AMD Interlagos processors: >49,000 @ 2.3 GHz • Compute nodes: 22,640 • Bulldozer cores: 362,240 • Cray XK6 cabinets: >30 • NVIDIA GPUs: >3,000 • Interconnect: Cray Gemini / 3D torus • Usable storage: >25 PB • Usable storage bandwidth: >1 TB/s • Aggregate system memory: >1.5 PB • Memory per core: 4 GB • Number of disks: >17,000 • Number of memory DIMMs: >190,000 • External network bandwidth: 100 Gb/s scaling to 300 Gb/s • Integrated near-line storage environment: scaling to 500 petabytes, bandwidth to near-line storage 100 GB/s
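The headline figures above hang together arithmetically; as a quick sanity check, the per-core memory and core count alone nearly account for the aggregate memory figure. A back-of-envelope sketch (illustrative only; the XK6 GPU nodes add memory beyond this estimate):

```python
# Sanity check of the Blue Waters memory figures quoted above,
# using only numbers from this slide.
xe6_cores = 362_240        # Bulldozer cores across the 22,640 XE6 compute nodes
mem_per_core_gb = 4        # 4 GB of memory per core (from the slide)

xe6_memory_pb = xe6_cores * mem_per_core_gb / 1e6   # GB -> PB (decimal)
print(f"XE6 memory alone: {xe6_memory_pb:.2f} PB")  # ~1.45 PB; XK6 nodes push the aggregate past 1.5 PB
```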
XSEDE – National Compute and Data CyberInfrastructure • Collaboration between multiple US CI centers with deep experience: a partnership led by NCSA • PI: John Towns, NCSA/Univ of Illinois • Co-PIs: Jay Boisseau, TACC/Univ of Texas Austin; Gregg Peterson, NICS/Univ of Tenn-Knoxville; Ralph Roskies, PSC/CMU; Nancy Wilkins-Diehr, SDSC/UC-San Diego • Partners who complement these CI centers with expertise in science, engineering, technology and education • Univ of Virginia, Ohio Supercomputer Center, SURA, Cornell, Indiana Univ, Purdue, Univ of Chicago, Rice, Berkeley, NCAR, Shodor, Jülich Supercomputing Centre
Advanced Information Systems National Cyberinfrastructure • Hardware • Computers • Data sources • Data stores • Networks • Software • Middleware • Portals • Grid-enabled applications • Visualization • Data analysis • Workflows
CyberInfrastructure is also about the tools/systems that allow effective use • Workflow • Data management • Software models/simulations • Compute resources • Software/Hardware optimization • Visualization tools and resources • Analytic tools • Collaborative environments • Resource sharing • Publishing support tools
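The workflow item above is, at its core, dependency-ordered execution of analysis steps. A minimal sketch of that idea, with step names invented purely for illustration:

```python
# Toy workflow: run steps in dependency order using a topological sort.
# The step names and dependency graph here are hypothetical examples,
# not an actual NCSA workflow.
from graphlib import TopologicalSorter

workflow = {
    "fetch_data": [],                       # each key depends on the steps listed
    "clean_data": ["fetch_data"],
    "simulate":   ["clean_data"],
    "visualize":  ["simulate"],
    "publish":    ["simulate", "visualize"],
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # every step appears after all of its dependencies
```

Real workflow systems add scheduling, data staging, and fault recovery on top of exactly this ordering guarantee.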
Examples: Community Infrastructure Projects • Earthquake Engineering • Consequence based risk management for seismic events • Environmental Observatories • Ocean Observatories, Coupled Human/Natural Systems, BioDiversity • Atmospheric Modeling • Severe Weather Predictions, Regional Climate Modeling • Astronomy • Very large data transport, processing, and analysis pipelines • BioMedical Informatics • Multisource infectious disease surveillance and patient safety • Humanities/Social Science Research • Digital libraries, Text/Image analysis, social networks • Science Educational Support Systems • Teaching support and educational enhancement systems
Projects and Customers Industrial Partnerships
Industrial Interests in HPC • PDM (Product Development Management) • CRM (Customer Relationship Management) • ERP (Enterprise Resource Planning) • SCM (Supply Chain Management) • BENEFITS: • Reduced Time-to-Market • Improved Product Quality • Reduced Prototyping Costs • Re-use original data • Reduced Waste • Framework for Optimization • Global Collaboration Courtesy of TranscenData.com Imaginations unbound
Industrial Activities • Cycle provision • Overflow – when need exceeds their internal capacity • Testing – new architectures before purchasing • Research – testing new methods prior to large investments • Scalability, algorithms, optimization, security, … • Prototype tool/system development • Training • Peer discussions – on non-competitive basis • Stated as an important and unique reason for participating • Industrial park participation • Partners – proximity to expertise and students • New company spinoffs
Projects and Customers Education
Training • Workshops • Train the trainer workshops • Targeted disciplinary/technology/techniques workshops • National conferences and other venues • Training materials • XSEDE https://www.xsede.org/training1 • Blue Waters – Petascale undergraduate education program http://www.shodor.org/petascale/ • Short courses • Virtual School of Computational Science and Engineering – petascale oriented (including big data) • http://www.vscse.org/ • Collaboration – multiple universities
Outreach • Public awareness • Visualization of real scientific data in public venues • Planetariums – digital domes – astronomy • Hubble 3-D • Cosmic Voyage • Science and Technology Museums – weather, astronomy • Search for Life • Computational Tornado Science • Dynamic Earth • TV and Film • “Tree of Life” - Academy Award nomination – Cinematography and visual effects • “Hunt for the Supertwister” - a public television (NOVA) special • “Monster of the Milky Way” - NOVA PBS television special • Others …
Educational Technology In support of the learning process • Often the technology used to support research is also valuable in supporting education • Digital informational resources • Books, references, lectures, photos, videos, audio • Virtual museums, artifacts • Data, experiments • Tools • Analysis, Inquiry, Applications, Visualization • Models and Simulations • Collaborative Environments • Virtual coordination, workflow spaces • Resource sharing – data, computation, visualization
Projects and Customers Government and Public Health Informatics
Examples of Uses of HPC / Data Analytics • Illinois State Police – analysis of historical data to help determine crime (and hence staffing) patterns • Policy makers – hazard risk assessments and planning (and response) • Public health officials – early warning on disease outbreaks, with informed options to manage • National Archives – data tools for long term preservation and for public analysis of the data • Economic Development – agricultural marketing enhancement and monitoring program • Policy Decision Support - Urban Planners, Environmental Monitoring, Socio-Economic Modeling, Social Network analysis… many others
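The outbreak early-warning item above boils down to anomaly detection on surveillance time series: flag a day whose case count sits far above its recent baseline. A toy sketch of that idea (illustrative only; not NCSA's actual surveillance pipeline, and real systems use far richer models):

```python
# Toy outbreak early-warning: flag days whose case count exceeds the
# rolling baseline by more than `threshold` standard deviations.
from statistics import mean, stdev

def flag_outbreaks(counts, window=7, threshold=3.0):
    """Return indices of days with anomalously high counts."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]          # the preceding `window` days
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

daily_cases = [12, 9, 11, 10, 13, 12, 11, 10, 12, 48, 11]
print(flag_outbreaks(daily_cases))  # → [9], the day the count spikes to 48
```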
Evolving into a successful HPC Center How we have changed over time User focus Keeping your staff sharp – not complacent Management
Mission: Enable Science/Engineering/Education Individual tools, System software, Analytics, Visualization, Integrated SW systems, Workflow, User Support, Training Effective Resource Utilization USERS: High End Computer & Data Needs NCSA Enables effective/efficient use of high end computer and data resources in support of science and education Scientific, Decision Support, Inquiry Results
Traditional Function: System Support • System Management • Resource and job scheduling • Storage Management • On-line and Near-line system and data administration • Information life cycle management • Cyber-protection • Network provisioning and tuning • System Monitoring • System software upgrades and SW management • Quality Assurance BW Full Service Overview
User Support Function: Basic and Beyond • Requirement Analysis • Service Request Management • Application Services • Application analysis • Porting and Tuning at scale • Bottleneck reduction • Client consulting • Application re-engineering • Library and tools creation and support • Third Party Application support • Visualization and Data Analysis • Information provisioning • Documentation, notification, training, community • Account/allocation management • Quality Assurance
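Application analysis and bottleneck reduction, as listed above, typically start with profiling: measure where the time actually goes before tuning anything. A minimal sketch using Python's standard profiler, with the application and hotspot functions invented for illustration:

```python
# Sketch of the first step in bottleneck reduction: profile a toy
# application and confirm the hotspot dominates the report.
# `slow_kernel` and `application` are hypothetical stand-ins.
import cProfile
import io
import pstats

def slow_kernel(n):
    # Deliberately unoptimized inner loop standing in for an app hotspot.
    return sum(i * i for i in range(n))

def application():
    return [slow_kernel(50_000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
application()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print("hotspot in top of report:", "slow_kernel" in report.getvalue())
```

Only after the profile points at a genuine hotspot does the porting/tuning work listed above pay off; optimizing unprofiled code is a common waste of consultant and user time.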
Community Engagement Function: Relationship Building • Partnership/Team Building • Structured Requirement Analysis • Workflow Systems • Business / operation rules • Collaborative environments • Intuitive user interfaces • Data storage, data management tools • Visualization and data analytics tools • Community engagement • Work Plan Management • Participation in evaluation and planning • Trust
Staff Changes (estimated numbers) Technical staff breakdown: Current / Very Early Days • Technical system administration: 50 / 70 • Applied R&D: 100 / 40 • User Support (from basic service to customized disciplinary support): 50 / 20 • Technical management (mid-level to senior): 50 / 25
And Finally: Organizational Management • Hire and retain skilled staff • Continued professional development • Keep staff motivated and sharp • Proposals – competitions • Peer speaking engagements – personnel exchanges • Enable them to grow personally and professionally • Don’t micromanage – empower your staff to succeed, and let them • The MONEY – Always the Money!!! • Core funding – work closely with your core funding sources • Variety of competitive grant funding • Help your funding agencies understand the value of HPC and CyberInfrastructure, and what it takes to be successful. • It’s not cheap, and the ROI will take time to show value – but without a long term commitment from your core funding agency, it will be very, very difficult to accomplish.
Questions? STEM Smart Workshop • 10 April 2012 • Chicago, Illinois • http://iclcs.illinois.edu
Building Integrated Application/Decision Support Systems – It’s an Iterative Process of Teamwork • Inputs: User Representatives, Team Participation, Application Roadmaps, Technology Roadmaps → Requirements Analysis & Specification → Cyberarchitecture → Development & System Integration → Prototype or Production Cyberenvironments → Situation Analysis → Knowledge and Decision Support • Participants: Working Group Partners, Integrated Project Teams, TeraGrid Working Groups, Advisory Committees, Industrial Partners, International Partners • Components: Portals & GUIs, Workflow Mgmt, S&E Applications, Data Mining & Analysis, Visualization, Webservices, Collaboratories, Middleware, Security National Center for Supercomputing Applications
Science & Engineering Application Support Science Team (ST) Requirements and Challenges Gathering SEAS Staff and Points of Contact (PoC)
Advanced Information Systems Major New Data Sources • Computers: New high-end computers are producing massive amounts of data from ever more detailed computational models • Sensors, Surveys and Satellites: Sensor arrays, aerial surveys and satellite data will revolutionize our understanding of the environment • Instruments: New instruments, e.g., telescopes and detectors, are using advanced digital technologies to support increasingly detailed observations
NDEMC - OVERVIEW • $5M, 18-month Public-Private Partnership (PPP) • 4 OEMs; 4 solution providers • Phase 1: 8 manufacturing sector SMEs • Advanced modeling, simulation & analysis (MS&A) • Rationale: • MS&A adoption by OEMs is high and growing • SMMs’ use of advanced MS&A is suboptimal • ROI is definitely favorable • Objectives: • Boost MS&A adoption at SMMs • Simplified access to advanced MS&A • Demonstrate a scalable business model
Networks are Critical Infrastructure