
Advanced Cyberinfrastructure: An Engine for Competitiveness

Presentation Transcript


  1. Advanced Cyberinfrastructure: An Engine for Competitiveness. Steve Meacham, National Science Foundation. CASC Workshop, September 6, 2006

  2. Outline • NSF CI Vision • High-End Computing Portfolio • Data, COVO and LWD • The TeraGrid

  3. What is cyberinfrastructure? • Cyberinfrastructure for Science and Engineering Research and Education • Is the integration of the components of information technology necessary to advance the frontiers of scientific and engineering knowledge • Is the use of information technology to integrate research and education • Makes possible new modes of experimentation, observation, modeling, analysis, and collaboration • Is built with contributions from experts in many fields, e.g. computer science, engineering, and social science • Examples of CI components: • Optical, wired electrical, and wireless networking; simulation tools; high-performance computing; data analysis tools; data curation; tele-operation and tele-presence; visualization hardware and software; semantic mediation and query tools; digital workflows; middleware and high-performance system software; portal technology; virtual organizations and gateways; …

  4. Strategic Plan (FY 2006 – 2010) • Ch. 1: Call to Action • Strategic plans for: • Ch. 2: High Performance Computing • Ch. 3: Data, Data Analysis & Visualization • Ch. 4: Collaboratories, Observatories and Virtual Organizations • Ch. 5: Learning & Workforce Development http://www.nsf.gov/dir/index.jsp?org=OCI

  5. Principal components • HPC: High-Performance Computing • DATA: Data, Data Analysis, and Visualization • COVO: Collaboratories, Observatories and Virtual Organizations • LWD: Learning and Workforce Development

  6. Cyberinfrastructure Components

  7. Inside NSF • Cyberinfrastructure Council • Office of Cyberinfrastructure • Directorate Cyberinfrastructure Working Groups • Directorate Cyberinfrastructure Programs

  8. Office of Cyberinfrastructure • Dan Atkins, Office Director • José Muñoz, Deputy Office Director • Judy Hayden, Joann Alquisa, Priscilla Bezdek, Mary Daley, Irene Lombardo • Points of contact: Data, Chris Greer; HPC, Steve Meacham; COVO, Kevin Thompson; LWD, Miriam Heller • Program staff: Chris Greer, Miriam Heller, Fillia Makedon, Steve Meacham, Vittal Rao, Frank Scioli, Kevin Thompson

  9. CI Budgets: HPC hardware acquisitions, O&M, and user support as a fraction of NSF’s overall CI budget

  10. NSF CI FY07 Budget Request

  11. Examples of FY07 Areas of Emphasis • Leadership-class HPC system acquisition • Data- and collaboration-intensive software services • Confidentiality protection and user-friendly access for major social and behavioral science data collections • National STEM Digital Library (NSDL) supporting learners at all levels • CI-TEAM, preparing undergraduates, graduate students, postdocs and faculty to use cyberinfrastructure in research and education • Support for the Protein Data Bank (PDB), the international repository for information about the structure of biological macromolecules, and the Arctic Systems Sciences (ARCSS) Data Coordination Center

  12. Principal components • HPC: High-Performance Computing • Data, Data Analysis, and Visualization • Collaboratories, Observatories and Virtual Organizations • Learning and Workforce Development

  13. HEC-enabled science and engineering • Impacts in many research fields • E.g. model economies; analysis of multi-sensor astronomical data; linguistic analysis; QCD & HEP analysis; cosmology; role of dark matter; chemistry; materials science; engineering; geoscience; climate; biochemistry; systems biology; ecosystem dynamics; genomics; proteomics; epidemiology; agent-based models of societies to test policy impacts; optimization; multi-scale, multi-science models, e.g. environment + social science, Earth system models, earthquake + structural engineering, … • Transforming industry • Aircraft manufacturing; pharmaceuticals; engineering (including nano- and bio-engineering); oil exploration; entertainment; automobile manufacturing; new industries based on information mining, … • Part of the American Competitiveness Initiative

  14. Why invest in HEC? • High-performance computing as a tool of research is becoming ever more important in more areas of research • An inexorable trend over the past few decades • Shows no sign of stopping • Current examples, future examples • Understanding life • Understanding matter • Understanding the environment • Understanding society

  15. Why invest in HEC? Understanding life. Examples: satellite tobacco mosaic virus (P. Freddolino et al.); aldehyde dehydrogenase (T. Wymore and S. Brown); imidazole glycerol phosphate synthase (R. Amaro et al.)

  16. Why invest in HEC? Understanding matter (I. Shipsey)

  17. Why invest in HEC? Understanding the environment (K. Droegemeier et al.; CCSM)

  18. Why invest in HEC? Understanding society • MoSeS: a dynamical simulation of the UK population (M. Birkin et al.), http://www.ncess.ac.uk/nodes/moses/BirkinMoses.pdf • John Q. Public: a computational model that simulates how voters' political opinions fluctuate during a campaign (S.-Y. Kim, M. Lodge, C. Taber)

  19. Magnetic Nanocomposites, Wang (PSC) • Direct quantum mechanical simulation on Cray XT3 • Goal: nano-structured material with potential applications in high-density data storage (1 particle/bit) • Need to understand the influence of these nanoparticles on each other • A petascale problem: realistic simulations for nanostructures of ~50 nm (~5M atoms) • LSMS, the locally self-consistent multiple scattering method, is a linear-scaling ab initio electronic structure method (Gordon Bell prize winner) • Achieves as high as 81% of peak performance on the Cray XT3 • Wang (PSC), Stocks, Rusanu, Nicholson, Eisenbach (ORNL), Faulkner (FAU) • Figures: Fe0.5Pt0.5 random alloy; L10-FePt nanoparticle
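
To make the "petascale problem" claim concrete, here is a back-of-envelope sketch of why a ~50 nm nanostructure corresponds to roughly 5 million atoms. The lattice constant, 4-atom cell, and spherical-particle assumption are illustrative, not values taken from the slide.

```python
# Rough estimate (assumed numbers): atom count in a ~50 nm FePt nanoparticle.
import math

a = 0.385e-9          # assumed FePt lattice constant, ~0.385 nm (approximate)
atoms_per_cell = 4    # assumed 4 atoms per (near-cubic) unit cell
diameter = 50e-9      # ~50 nm nanostructure, as quoted on the slide

volume = (4.0 / 3.0) * math.pi * (diameter / 2.0) ** 3   # spherical particle volume
n_cells = volume / a ** 3
n_atoms = atoms_per_cell * n_cells

print(f"Estimated atoms in a {diameter * 1e9:.0f} nm particle: {n_atoms:.2e}")
# -> roughly 5e6 atoms, consistent with the "~5M atoms" figure on the slide
```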

  20. VORTONICS, Boghosian (Tufts) • Physical challenges: reconnection and dynamos • Vortical reconnection governs establishment of steady-state in Navier-Stokes turbulence • Magnetic reconnection governs heating of the solar corona • The astrophysical dynamo problem: exact mechanism and space/time scales unknown and represent important theoretical challenges • Computational challenges: enormous problem sizes, memory requirements, and long run times • Requires relaxation on a space-time lattice of 5-15 terabytes • Uses geographically distributed domain decomposition (GD3): DTF, TCS, Lonestar • Real-time visualization at UC/ANL • Insley (UC/ANL), O'Neal (PSC), Guiang (TACC) • Figure: homogeneous turbulence driven by a force of Arnold-Beltrami-Childress (ABC) form
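
A rough sketch of where a multi-terabyte lattice comes from: the memory footprint of a large space-time relaxation grid. The lattice dimension and number of stored fields per site below are illustrative assumptions, not the project's actual configuration.

```python
# Illustrative arithmetic (assumed numbers): memory footprint of a large lattice.
lattice_side = 4096        # assumed global lattice dimension per axis
fields_per_site = 10       # assumed number of stored quantities per lattice site
bytes_per_value = 8        # double precision

sites = lattice_side ** 3
total_bytes = sites * fields_per_site * bytes_per_value
print(f"{total_bytes / 1e12:.1f} TB")   # ~5.5 TB; larger lattices or more
                                        # fields per site push toward 15 TB
```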

  21. TeraShake / CyberShake, Olsen (SDSU), Okaya (USC) • Largest and most detailed earthquake simulation of the southern San Andreas fault • Calculation of physics-based probabilistic hazard curves for Southern California using full waveform modeling • Computation and data analysis at multiple TeraGrid sites • Workflow tools automate the very large number of programs and files that must be managed • TeraGrid staff: Cui (SDSC), Reddy (GIG/PSC) • Figure: simulation of magnitude 7.7 seismic wave propagation on the San Andreas Fault (47 TB data set) • Figure: major earthquakes on the San Andreas Fault, 1680-present (1680 M 7.7; 1857 M 7.8; 1906 M 7.8)
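
As a sanity check on how a regional wave-propagation run reaches tens of terabytes of output, the sketch below multiplies an assumed grid size, snapshot count, and bytes per value. All of the specific numbers are illustrative assumptions, not the actual TeraShake configuration.

```python
# Illustrative arithmetic (assumed numbers): output volume of a wave-propagation run.
nx, ny, nz = 3000, 1500, 400      # assumed simulation grid dimensions
components = 3                    # 3-component ground velocity per grid point
bytes_per_value = 4               # single precision output
snapshots = 2000                  # assumed number of saved time steps

per_snapshot = nx * ny * nz * components * bytes_per_value
total = per_snapshot * snapshots
print(f"{per_snapshot / 1e9:.1f} GB per snapshot, {total / 1e12:.1f} TB total")
# -> ~21.6 GB per snapshot, ~43 TB total, on the order of the 47 TB data set
```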

  22. Searching for New Crystal Structures Deem (Rice) • Searching for new 3-D zeolite crystal structures in crystallographic space • Requires 10,000s of serial jobs through TeraGrid. • Using MyCluster/GridShell to aggregate the computational capacity of the TeraGrid for accelerating search. • TG staff Walker (TACC) and Cheeseman (Purdue)
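
The computational pattern here is a many-task parameter sweep: tens of thousands of independent serial jobs whose results are gathered and ranked. The sketch below only illustrates that structure locally with Python's standard library; the actual search used MyCluster/GridShell across TeraGrid, and `score_candidate` is a hypothetical stand-in for the zeolite structure evaluation.

```python
# Minimal many-task sketch: fan out independent candidate evaluations and keep
# the best results. score_candidate is a hypothetical placeholder, not the
# project's code.
from concurrent.futures import ProcessPoolExecutor
import random

def score_candidate(seed: int) -> tuple[float, int]:
    """Placeholder: generate and score one trial crystal structure."""
    rng = random.Random(seed)
    energy = rng.uniform(-100.0, 0.0)   # stand-in for a lattice-energy calculation
    return energy, seed

if __name__ == "__main__":
    n_trials = 10_000                   # the real search ran 10,000s of serial jobs
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score_candidate, range(n_trials), chunksize=100))
    best = sorted(results)[:10]         # keep the ten lowest-energy candidates
    print(best)
```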

  23. HEC Program Elements • Acquisitions • Track 1 - Petascale • Track 2 - Mid-range supercomputers • Operations • HEC System Software Development • Compilers, fault-tolerant OS, fault-survivability tools, system status monitoring, file-systems, PSEs, … • HEC Petascale Application Development • Scalable math libraries, scalable algorithms, data exploration tools, performance profiling and prediction, large application development • Coordinated with other agencies

  24. Acquisition Strategy • Chart: science and engineering capability (logarithmic scale) vs. fiscal year, FY06-FY10 • Track 1 system(s) • Track 2 systems • Track 3: typical university HPC systems

  25. Track 2 Acquisitions • Individual systems - provide capabilities beyond those obtainable with university or state funds • Collectively, as part of TeraGrid - provide a diverse HPC portfolio to meet the HPC needs of the academic research community • Annual competition: roughly $30M/year for acquisition costs • O&M costs via a TeraGrid RP award • Primary selection criterion: impact on science and engineering research

  26. Track 1 Acquisition (FY07-10) • A system that will permit revolutionary science and engineering research • Capable of delivering large numbers of cycles and large amounts of memory to individual problems • Capable of sustaining at least 10^15 arithmetic ops/second on a range of interesting problems • A very large amount of memory and a very capable I/O system • An architecture that facilitates scaling of codes • Robust system software with fault tolerance and fault prediction features • Robust program development tools that simplify code development • A single physical system in a single location
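
For scale, a short worked example of what sustaining 10^15 operations per second implies for system size. The per-core sustained rates are illustrative assumptions, not NSF figures.

```python
# Illustrative arithmetic (assumed per-core rates): cores needed to sustain 1 PF/s.
target_sustained = 1e15                       # 10^15 ops/second, per the slide

for per_core_gflops in (1.0, 2.0, 4.0):       # assumed sustained GF/s per core
    cores = target_sustained / (per_core_gflops * 1e9)
    print(f"{per_core_gflops:.0f} GF/s per core -> {cores:,.0f} cores")
# e.g. at 2 GF/s sustained per core, roughly 500,000 cores are required
```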

  27. Track 1 Acquisition (FY07-10) Examples of research problems: • The origin and nature of intermittency in turbulence • The interaction of radiative, dynamic and nuclear physics in stars • The dynamics of the Earth’s coupled carbon, nitrogen and hydrologic cycles • Heterogeneous catalysis on semiconductor and metal surfaces • The properties and instabilities of burning plasmas and investigation of magnetic confinement techniques • The formation of planetary nebulae • The interaction of attosecond laser pulse trains with polyatomic molecules • The mechanisms of reactions involving large bio-molecules and bio-molecular assemblages • The structure of large viruses • The interactions between clouds, weather and the Earth’s climate

  28. HPC Operations • Track 1 & 2: • O&M for projected useful life awarded with acquisition funds • O&M approach assessed in review process • HPCOPS: • An opportunity for universities without Track 1 or 2 funding that can leverage other funding to acquire large HPC systems • Will provide a contribution to O&M in return for provision of HPC resources to the national S&E community • These will be TeraGrid RP awards, aligned with the TeraGrid time frame • Expected to be highly competitive • Funding opportunity this year (Nov 28, 2006); do not anticipate a similar competition next year • Possible third model? • Provide a contribution to acquisition costs if the institution picks up O&M

  29. Principal components • DATA: Data, Data Analysis, and Visualization • High-Performance Computing • COVO: Collaboratories, Observatories and Virtual Organizations • LWD: Learning and Workforce Development

  30. Strategic Plans for Data, COVO and LWD (FY 2006 – 2010) • Data CI: • Investments will continue to be prioritized by science and engineering research and education needs • S&E data generated with NSF funds will be accessible & usable • Data CI includes tools to manage, locate, access, manipulate, and analyze data, mechanisms to maintain confidentiality, and tools to facilitate creation and management of metadata • Data CI will involve strong international, inter-agency, and public-private partnerships • Challenges include: • Managing and analyzing very large datasets • Managing, analyzing, and using streaming data • Developing tools to permit research using confidential data • COVO and LWD: to appear (August)

  31. The growth of observatories and virtual organizations Observatories - Based on ability to federate data-sets and data streams, some include instrument control, event detection and response, and some degree of virtualization - Examples: NVO, OOI, EarthScope, NEON, GEOSS Virtual organizations - A geographically dispersed community with common interests that uses cyberinfrastructure to integrate a variety of digital resources into a common working environment Supporting technologies - Portals, workflows, data analysis, models, streaming data, event detection, instrument/observatory control, networking, authentication/authorization, digital libraries, …

  32. IRNC: International Research Network Connections • Components: • TransPAC2 (U.S. – Japan and beyond) • GLORIAD (U.S. – China – Russia – Korea) • TransLight/PacificWave (U.S. – Australia) • TransLight/StarLight (U.S. – Europe) • WHREN (U.S. – Latin America)

  33. CI-TEAM • A Foundation-wide effort to foster CI training and workforce development • Started FY05 ($2.5M), focused on demonstration projects • Anticipated funding in FY06: $10M, small and large activities • FY05: 70 projects (101 proposals) received; 11 projects funded • Broadening participation in CI: • Alvarez (FIU) – CyberBridges • Crasta (VA Tech) – Project-Centric Bioinformatics • Fortson (Adler) – CI-Enabled 21st Century Astronomy Training for HS Science Teachers • Fox (IU) – Bringing MSI Faculty into CI & e-Science Communities • Gordon (OhSU) – Leveraging CI to Scale Up a Computational Science Undergraduate Curriculum • Panoff (Shodor) – Pathways to Cyberinfrastructure: CI through Computational Science • Takai (SUNY Stony Brook) – High School Distributed Search for Cosmic Rays • Developing & implementing resources for CI workforce development: • DiGiano (SRI) – Cybercollaboration between Scientists and Software Developers • Figueiredo (U FL) – In-VIGO/Condor-G Middleware for Coastal and Estuarine CI Training • Regli (Drexel) – CI for Creation and Use of Multi-disciplinary Engineering Models • Simpson (PSU) – CI-based Engineering Repositories for Undergraduates (CIBER-U)

  34. TeraGrid: an integrating infrastructure

  35. TeraGrid Offers: • Common user environments • Pooled community support expertise • Targeted consulting services (ASTA) • Science gateways • A portfolio of architectures • Exploring: • A security infrastructure that uses campus authentication systems • A lightweight, service-based approach to enable campus grids to federate with TeraGrid

  36. TeraGrid: What is It? • Integration of services provided by grid technologies • Distributed, open architecture • GIG responsible for integration: • Software integration (including the common software stack, CTSS) • Base infrastructure (security, networking, and operations) • User support • Community engagement (including the Science Gateways activities) • 9 Resource Providers (with separate awards): • PSC, TACC, NCSA, SDSC, ORNL, Indiana, Purdue, Chicago/ANL, NCAR • Several other institutions participate in TeraGrid as sub-awardees of the GIG • New sites may join as Resource Partners • TeraGrid: • Provides a unified user environment to support high-capability, production-quality cyberinfrastructure services for science and engineering research • Provides new S&E opportunities by making possible new ways of using distributed resources and services • Examples of services include: • HPC • Data collections • Visualization servers • Portals

  37. Science Gateways • Specific examples of virtual organizations • Built to serve communities of practice by bringing together a variety of resources in a customized portal • Examples include: • NanoHub • NEES • LEAD • SCEC Earthworks Project • NVO • http://www.teragrid.org/programs/sci_gateways/

  38. Science Gateways • Biology and Biomedicine Science Gateway • Computational Chemistry Grid (GridChem) • Computational Science and Engineering Online (CSE-Online) • GEON (GEOsciences Network) • GIScience Gateway (GISolve) • Grid Analysis Environment (GAE) • Linked Environments for Atmospheric Discovery (LEAD) • National Virtual Observatory (NVO) • Network for Computational Nanotechnology and nanoHUB • Network for Earthquake Engineering Simulation (NEES) • Neutron Science Instrument Gateway • Open Life Sciences Gateway • Open Science Grid (OSG) • SCEC Earthworks Project • Special PRiority and Urgent Computing Environment (SPRUCE) • TeraGrid Visualization Gateway • The Telescience Project

  39. NSF Cyberinfrastructure Goal: to create and maintain a powerful, stable, persistent, and widely accessible cyberinfrastructure to enable the work of science and engineering researchers and educators across the nation.

  40. Thank you.
