Beyond Branscomb
September 11, 2006
Dr. Francine Berman
Director, San Diego Supercomputer Center
Professor and High Performance Computing Endowed Chair, UC San Diego
The Branscomb Committee
• Charge: The Branscomb Committee was to assess the role of HPC for NSF constituent communities. The Committee focused in particular on 4 challenges:
  • Challenge 1: How can NSF remove existing barriers to the evolution of HPC and make it broadly usable?
  • Challenge 2: How can NSF provide scalable access to a pyramid of computing resources? What balance of computational resources should NSF anticipate and encourage?
  • Challenge 3: How should NSF encourage broad participation in HPC?
  • Challenge 4: How can NSF best create the intellectual and management leadership for the future of high performance computing in the U.S.? What role should NSF play with respect to the HPCC program and other agencies?
The Branscomb Report
• TITLE: From Desktop to TeraFlop: Exploiting the U.S. Lead in High Performance Computing
• AUTHORS: NSF Blue Ribbon Panel on High Performance Computing (Branscomb, Belytschko, Bridenbaugh, Chay, Dozier, Grest, Hays, Honig, Lane, Lester, McCrae, Sethian, Smith, Vernon)
• DATE: August 1993
The Branscomb Pyramid
Major Recommendations from the Branscomb Report
• NSF should make investments at all levels of the Branscomb Pyramid as well as investments in aggregating technologies (today’s cluster and grid computing). NSF should make balanced investments.
• Increase support of HPC-oriented software, algorithm, and model development.
• Coordinate and continue to invest in Centers. Develop allocation committees to facilitate use of resources in the community.
• Develop an OSTP advisory committee representing states, HPC users, NSF Centers, computer manufacturers, and computer and computational scientists to facilitate state-federal planning for HPC.
The Branscomb Pyramid, circa 2006
• Leadership Class: 100’s of TFs
• Large-scale campus/commercial resources, Center supercomputers: 10’s of TFs
• Medium-scale Campus/Commercial Clusters: 1’s of TFs
• Small-scale, desktop, home: 10’s of GFs
The Branscomb Pyramid and U.S. Competitiveness
Pyramid levels by Top500 spot:
• Leadership Class: Spots 1-10
• Large-scale resources, center supercomputers: Spots 11-50
• Medium-scale Campus/Commercial Clusters: Spots 51-500
• Small-scale, desktop, home: Everyone Else
According to the last Top500 List (June 2006):
• Leadership Class (1-10): 6 US machines
  • 5 machines (1, 3, 4, 6, 9) at DOE national laboratories (LLNL, NASA Ames, Sandia) and 1 machine (2) at a U.S. corporation
• Large-scale (11-50): 19 US machines
  • 3 machines (23, 26, 28) at U.S. academic institutions (IU, USC, Virginia Tech)
  • 2 machines (37, 44) at NSF centers (NCSA, SDSC)
  • 5 machines (13, 14, 24, 25, 50) at DOE national laboratories (ORNL, LLNL, LANL, PNNL)
  • 4 machines (20, 32, 33, 36) at other federal facilities (ERDC MSRC, Wright-Patterson, ARL, NAVOCEANO)
  • 5 machines (19, 21, 31, 39, 41) at US corporations (IBM, Geoscience, COLSA)
• Medium-scale (51-500): 273 US machines
  • 38 are in the academic sector
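The mapping from Top500 spot to pyramid level on this slide can be sketched in a few lines. This is only an illustration: the function name `pyramid_tier` and the dictionary `LARGE_SCALE_US` are hypothetical, and the rank lists are copied from the Large-scale bullets above rather than checked against the Top500 list itself.

```python
def pyramid_tier(rank):
    """Map a Top500 spot to a Branscomb Pyramid level.

    `rank` is None for a machine that is not on the list at all.
    The cut-offs (10, 50, 500) are the ones used on the slide.
    """
    if rank is None or rank > 500:
        return "Small-scale, desktop, home"
    if rank < 1:
        raise ValueError("Top500 spots start at 1")
    if rank <= 10:
        return "Leadership Class"
    if rank <= 50:
        return "Large-scale"
    return "Medium-scale"


# US machines in spots 11-50, grouped by sector as listed on the slide.
LARGE_SCALE_US = {
    "academic": [23, 26, 28],
    "NSF centers": [37, 44],
    "DOE national laboratories": [13, 14, 24, 25, 50],
    "other federal facilities": [20, 32, 33, 36],
    "US corporations": [19, 21, 31, 39, 41],
}

# Count the machines across all sectors.
large_scale_total = sum(len(spots) for spots in LARGE_SCALE_US.values())

if __name__ == "__main__":
    print(pyramid_tier(44))       # an NSF center machine
    print(large_scale_total)      # sector counts add up to the slide's total
```

The sector counts add up to the 19 US machines the slide reports for spots 11-50, of which 5 (academic plus NSF centers) are open to academic users.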
Who is Computing on the Branscomb Pyramid?
Pyramid levels:
• Leadership Class: Spots 1-10, 100’s of TFs
• Large-scale resources, center supercomputers: Spots 11-50, 10’s of TFs
• Medium-scale Campus/Commercial Clusters: Spots 51-500, 1’s of TFs
• Small-scale, desktop, home: Everyone Else, 10’s of GFs
Ballpark numbers:
• More than 15,000,000 students attend college.
• The number of degrees in Science and Engineering exceeds 500,000.
• There are ~2500 accredited institutions of higher education in the U.S.
Who uses each level:
• Leadership Class (1-10)
  • DOE users, industry researchers, Japanese academics and researchers, German and French researchers
• Large-scale (11-50) (5 academic)
  • Campus researchers, DOE and government users, industry users
  • National open academic community at SDSC, NCSA, IU (around 50 TF in aggregate)
• Medium-scale (51-500) (38 academic)
  • Campus researchers, federal agency users, industry users
  • National open academic community on TeraGrid (not including above; around 50 TF in aggregate)
Competitiveness at all Levels
• Leadership Class
  • Currently U.S. dominating. Top500 “bragging rights”.
  • Federal support required
  • Potential for breakthrough “pioneer” computational science discoveries
• Large-scale resources, center supercomputers, and Medium-scale Campus/Commercial Clusters
  • No coordinated approach to national research infrastructure. Wide variability in coverage, use, service, support
  • Mid-levels the focus of almost all academic and commercial R&D: lion’s share of new results and discoveries
• Small-scale, desktop, home
  • Cost-effective user-supported commercial model
  • IT-literate workforce
Balancing Investments in Branscomb
Branscomb Recommendations Revisited
• NSF should make investments at all levels of the Branscomb Pyramid as well as investments in aggregating technologies (today’s cluster and grid computing). NSF should make balanced investments.
• Increase support of HPC-oriented software, algorithm, and model development.
• Coordinate and continue to invest in Centers. Develop allocation committees to facilitate use of resources in the community.
• Develop an OSTP advisory committee representing states, HPC users, NSF Centers, computer manufacturers, and computer and computational scientists to facilitate state-federal planning for HPC.
If HPC is to become the ubiquitous enabler of science and engineering envisioned in the Branscomb Report (and every report since), we need to re-focus on providing:
• Enough cycles to cover the broad needs of academic researchers and educators on demand and without high barriers to access
• Usable and scalable software tools with useful documentation
  • “You’ve got 1024 processors and you can only smile and wave at them” (HPC user)
• Professional-class strategy for software sharing, standards, development environments
Fran’s “No User Left Behind” Initiative
“No User Left Behind” Goal: Sufficient and usable computational resources to support computationally-oriented research and education throughout the U.S. academic community
How (Fran’s 5 step program for computational health):
• Do market research: what is adequate coverage for the university community? Where are the gaps in coverage in the US?
• Get creative: work with the private sector and universities to develop a program for adequate coverage of computational cycles (we’re doing it with networking to K-12, no reason we can’t do it with computation for 12+)
• Fund support professionals: every facility should have sys admins and help desk people. They should be part of a national organization which meets to exchange best practices and helps develop standards
• Raise the bar on SW: private sector should step up and work with academia to improve HPC environments. Professors and grad students cannot provide robust SW tools with adequate documentation and evolutionary support
• Get serious about data: many HPC applications involve significant data input or output. HPC efforts and data efforts must be coupled
On the Horizon: Emerging Data Crisis will Increasingly Impact Computational Users
• More academic, professional, public, and private users use their computers to access data than for computation
• Data management, stewardship and preservation are fundamental for new advances and discovery
Example collections:
• Astronomy: NVO, 100+ TB
• Geosciences: SCEC, 153 TB
• Physics: Projected LHC Data, 10 PB/year
• Life Sciences: JCSG/SLAC, 15.7 TB
Today’s Applications Cover the Spectrum
• Large-scale data required as input, intermediate, output for many modern HPC applications
• Applications vary with respect to how well they can perform in distributed mode (grid computing)
• Analogue of High Performance Computing (HPC) is High Reliability Data (HRD)
Figure: applications plotted by Data (more BYTES) against Compute (more FLOPS). Regions: Data-oriented Science and Engineering Applications; Home, Lab, Campus, Desktop Applications; Medium, Large, and Leadership HPC Applications. Examples shown: TeraShake, PDB applications, NVO, Everquest, Molecular Modeling, Quicken.
Applying Branscomb to Data: The Data Pyramid
• National Scale
  • Target Collections: Reference, nationally important, and irreplaceable data collections (PDB, PSID, Shoah, Presidential Libraries, etc.)
  • Facilities: National-scale data repositories, archives, and libraries. Maintained by professionals. High capacity, high reliability
• Regional Scale
  • Target Collections: Research and project data collections
  • Facilities: Regional libraries and targeted data centers. Maintained by professionals. Medium capacity, medium-high reliability
• Local Scale
  • Target Collections: Personal data collections
  • Facilities: Private repositories. Supported by users or their proxies. Low-medium reliability, low capacity
Adapting to a Digital World
• Local Scale: Emerging commercial opportunities
Data Storage for Rent
• Cheap commercial data storage is moving us from a “Napster model” (data is accessible and free) to an “iTunes model” (data is accessible and inexpensive)
Amazon S3 (Simple Storage Service)
• Storage for Rent:
  • Storage is $0.15 per GB per month
  • $0.20 per GB data transfer (to and from)
  • Write, read, delete objects containing 1 byte to 5 GB (number of objects is unlimited), access controlled by user
• For $2.00+, you can store about 1 GB for one year:
  • Lots of high resolution family photos
  • Multiple videos of your children’s recitals
  • Personal documentation equivalent to up to 1000 novels, etc.
Should we store the NVO with Amazon S3?
The National Virtual Observatory (NVO) is a critical reference collection for the astronomy community of data from the world’s large telescopes and sky surveys.
A Thought Experiment
• What would it cost to store the SDSC NVO collection (100 TB) on Amazon?
  • 100,000 GB × $2 (1 ingest, no accesses + storage for a year) = $200K/year
  • 100,000 GB × $3 (1 ingest, average 5 accesses per GB stored + storage for a year) = $300K/year
• Not clear:
  • How many copies Amazon stores
  • Whether the format is well-suited for NVO
  • Whether the usage model would make the costs of data transfer, ingest, access, etc. infeasible
  • If Amazon constitutes a “trusted repository”
  • What happens to your data when you stop paying, etc.
• What about the CERN LHC collection (10 PB/year)?
  • 10,000,000 GB × $2 (1 ingest, no accesses per item + storage for a year) = $20M/year
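The per-GB figures in the thought experiment follow from the S3 prices quoted on the previous slide: 12 months of storage at $0.15 plus one $0.20 transfer for ingest gives $2.00, and five further $0.20 transfers bring it to $3.00. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB, as the slide does), a full year of storage, and that each access transfers the whole GB once. The function names are hypothetical, and prices are held in integer cents to avoid floating-point rounding.

```python
STORAGE_CENTS_PER_GB_MONTH = 15   # $0.15 per GB per month (from the slide)
TRANSFER_CENTS_PER_GB = 20        # $0.20 per GB transferred, in or out


def annual_cents_per_gb(accesses_per_gb=0, months=12):
    """Cost in cents to ingest 1 GB once, store it, and read it back N times."""
    storage = STORAGE_CENTS_PER_GB_MONTH * months
    ingest = TRANSFER_CENTS_PER_GB
    reads = TRANSFER_CENTS_PER_GB * accesses_per_gb
    return storage + ingest + reads


def annual_collection_cost_usd(size_gb, accesses_per_gb=0):
    """Yearly cost in dollars for a collection of `size_gb` gigabytes."""
    return size_gb * annual_cents_per_gb(accesses_per_gb) / 100


if __name__ == "__main__":
    nvo_gb = 100_000          # SDSC NVO collection, 100 TB
    lhc_gb = 10_000_000       # projected LHC data, 10 PB per year
    print(annual_collection_cost_usd(nvo_gb))
    print(annual_collection_cost_usd(nvo_gb, accesses_per_gb=5))
    print(annual_collection_cost_usd(lhc_gb))
```

The sketch prices a single year only. For the LHC, which adds 10 PB every year, storage charges for earlier years' data would accumulate on top of the figure shown.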
The most valuable research data is in the most danger
• National Scale: Reference and irreplaceable data require long-term preservation and reliable stewardship. No real sustainable plan
• Regional Scale: Universities and libraries can provide greater support but they need help
• Local Scale: Emerging commercial opportunities
Providing Sustainable and Reliable Data Infrastructure Incurs Real Costs
• Less risk means more replicas, more resources, more people
Supporting Long-lived Data: What Happens if We Don’t Preserve Our Most Important Reference Collections?
• Life sciences research would have the resources available in roughly the 1970’s: no PDB, no Swiss-Prot, no PubMed, etc.
• Federal, state, and local records would need to remain on paper.
• Without preservation, digital history is only as old as the current storage media.
• New discoveries from climate and other predictive simulation models which utilize longitudinal data would dramatically slow
• iTunes would store only current music, NetFlix would provide only current movies
Chronopolis: Using the Data Grid to support Long-Lived Data
• Chronopolis provides a comprehensive approach to infrastructure for long-term preservation, integrating
  • Collection ingestion
  • Access and Services
  • Research and development for new functionality and adaptation to evolving technologies
  • Business model, data policies, and management issues critical to success of the infrastructure
Consortium: SDSC, the UCSD Libraries, NCAR, UMd, NARA working together on long-term preservation of digital collections
Chronopolis – Replication and Distribution
• 3 replicas of valuable collections considered reasonable mitigation for risk of data loss
• Chronopolis Consortium will store 3 copies of preservation collections:
  • “Bright copy”: Chronopolis site supports ingestion, collection management, user access
  • “Dim copy”: Chronopolis site supports remote replica of bright copy and supports user access
  • “Dark copy”: Chronopolis site supports reference copy that may be used for disaster recovery but no user access
• Each site may play different roles for different collections
Figure: Chronopolis Federation architecture. Chronopolis sites at NCAR, U Md, and UCSD each hold a bright, dim, or dark copy of collections C1 and C2.
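The idea that each site plays a different role for different collections can be sketched as a simple rotation. This is an illustration only, not the Chronopolis implementation: the function names, the site order, and the round-robin rule are all assumptions, and the slide does not say which site holds which copy of C1 or C2.

```python
ROLES = ["bright", "dim", "dark"]   # the three copy types named on the slide
SITES = ["UCSD", "NCAR", "U Md"]    # site order here is arbitrary


def assign_roles(collections, sites=SITES):
    """Give every collection one bright, one dim, and one dark copy.

    Hypothetical round-robin rule: roles shift by one site for each
    successive collection, so no single site is bright for everything.
    """
    if len(sites) != len(ROLES):
        raise ValueError("this sketch assumes exactly one site per role")
    plan = {}
    for i, collection in enumerate(collections):
        plan[collection] = {
            site: ROLES[(j - i) % len(ROLES)] for j, site in enumerate(sites)
        }
    return plan


def allows_user_access(role):
    """Bright and dim copies serve users; dark copies are for recovery only."""
    return role in ("bright", "dim")


if __name__ == "__main__":
    for collection, roles in assign_roles(["C1", "C2"]).items():
        print(collection, roles)
```

With this rule every collection keeps all three copy types, and the bright copy of C2 lands on a different site from the bright copy of C1.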
Creative Business Models Needed to Support Long-lived Data
(applies across National Scale, Regional Scale, and Local Scale)
• Data preservation infrastructure need not be an infinite, increasing mortgage
• Creative solutions are possible
  • Relay funding
  • Consortium support
  • Recharge
  • Use fees
  • Hybrid models
• These and other support mechanisms can be used to create sustainable business models
Whining Beyond Branscomb
• Current competitions providing a venue for a broader set of players and experts
• Our best and our brightest are becoming lean, mean competition machines. Does this really serve the science and engineering community best?
  • We’re getting good at circling the wagons and pointing the guns inward. Isn’t it time we turned things around?
• What will it take for all of US to take the leadership to better focus CS infrastructure, research, and development efforts?