Cosmic-Scale Applications for Cyberinfrastructure
NSF MPS Cyberscience Workshop
NSF Headquarters, Arlington, VA
April 21, 2004
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technologies
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Cosmic-Scale Science: Cyberinfrastructure Links Theory with Observation
• Two Examples
  • Formation of Structures in the Early Universe
  • Black Hole Collisions and Gravitational Radiation
• Common Features Emerge
  • $Billions of New Instruments Generating Data
  • Much More Powerful Supercomputers Needed
  • Sophisticated Software Key
    • e.g., Adaptive Mesh Refinement
  • Cyberinfrastructure Required for the Data Produced
    • Federated Repositories
    • Data Grid Middleware
    • Local Laboratory Standards-Based Clusters
Fundamental Physics Challenge: Formation of First Galaxies and Clusters
[Images: Hubble Ultra Deep Field (NASA, ESA, S. Beckwith (STScI) and the HUDF Team); NASA WMAP map of the universe at 380,000 yr]
• Faintest galaxies are ~1 billion years old
• The galaxy population is strongly evolving
Source: Mike Norman, UCSD
Formation & Evolution of Galaxies: $Billions of New Digital Observatories
Many Open Questions Are Being Investigated Observationally
• Nature and Occurrence of the First Galaxies, "First Light" (JWST, ALMA)
• Properties of High-z Galaxies (HST, ALMA): Galaxy Building Blocks?
• Source(s) of Early Reionization (WMAP)
• Star Formation History of Galaxies (Spitzer)
• Emergence of the Hubble Types (DEEP2)
• Influence of Environment on Galaxy Type and Large-Scale Structure (SDSS)
• Supermassive Black Hole Formation and AGN/QSO Phenomena in Galaxies (SDSS, HST, CXO)
Source: Mike Norman, UCSD
Cosmic Simulator with Billion-Zone and Gigaparticle Resolution
[Images: simulated volume for comparison with the Sloan Survey (SDSS); SDSC Blue Horizon]
Source: Mike Norman, UCSD
Why Does the Cosmic Simulator Need Cyberinfrastructure?
• One Gigazone Run:
  • Generates ~10 TeraBytes of Output
  • A "Snapshot" is 100 GB
• Need to Visually Analyze as We Create Spacetimes
• Visual Analysis Is Daunting
  • A Single Frame Is About 8 GB
  • A Smooth Animation of 1000 Frames Is 1000 x 8 GB = 8 TB
  • Stage on Rotating Storage to High-Res Displays
• Can Run Evolutions Faster than We Can Archive Them
  • File Transport Over the Shared Internet: ~50 Mbit/s
  • 4 Hours to Move ONE Snapshot! (see the estimate below)
• Many Scientists Will Need Access for Analysis
Source: Mike Norman, UCSD
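A minimal back-of-the-envelope check of the transfer-time claim, assuming the figures quoted on the slide: a 100 GB snapshot, ~10 TB per run, and a sustained 50 Mbit/s shared-Internet path.

```python
# Back-of-the-envelope check of the snapshot-transfer claim above.
# Assumes a 100 GB snapshot and a sustained 50 Mbit/s shared-Internet path.

snapshot_bytes = 100e9          # 100 GB snapshot
link_bits_per_s = 50e6          # ~50 Mbit/s shared Internet

transfer_s = snapshot_bytes * 8 / link_bits_per_s
print(f"One snapshot: {transfer_s / 3600:.1f} hours")   # ~4.4 hours

run_output_bytes = 10e12        # ~10 TB per gigazone run
print(f"Full run: {run_output_bytes * 8 / link_bits_per_s / 86400:.1f} days")  # ~18.5 days
```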
Limitations of Uniform Grids for Complex Scientific and Engineering Problems
• Gravitation Causes a Continuous Increase in Density Until There Is a Large Mass in a Single Grid Zone
• 512x512x512 Run on a 512-Node CM-5
Source: Greg Bryan, Mike Norman, NCSA
Solution: Develop Adaptive Mesh Refinement (AMR) to Resolve Mass Concentrations
• 64x64x64 Run with Seven Levels of Adaptation on an SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution (see the refinement sketch below)
Source: Greg Bryan, Mike Norman, John Shalf, NCSA
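Not the Enzo implementation, but a minimal 1D flag-and-refine sketch of the AMR idea: cells whose density exceeds a threshold are split in two, so resolution concentrates only where mass collects. The threshold, density profile, and refinement factor here are illustrative assumptions.

```python
# Minimal 1D flag-and-refine sketch of adaptive mesh refinement (illustrative only,
# not the Enzo algorithm). Cells whose density exceeds a threshold are split in two.

def refine(cells, density, threshold, max_levels):
    """cells: list of (x_left, dx, level); density(x) returns the local density."""
    refined = []
    for x, dx, level in cells:
        rho = density(x + 0.5 * dx)                 # density at the cell center
        if rho > threshold and level < max_levels:  # flag-and-refine criterion
            half = dx / 2
            refined += refine([(x, half, level + 1),
                               (x + half, half, level + 1)],
                              density, threshold, max_levels)
        else:
            refined.append((x, dx, level))
    return refined

# Example: a Gaussian overdensity gets 7 extra levels of resolution near x = 0.5,
# so the finest cells are 2^7 times smaller than the 64-zone base grid.
import math
base = [(i / 64, 1 / 64, 0) for i in range(64)]
grid = refine(base, lambda x: math.exp(-((x - 0.5) / 0.01) ** 2) * 100, 1.0, 7)
print(len(grid), "cells; finest dx =", min(dx for _, dx, _ in grid))
```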
AMR Allows Digital Exploration of Early Galaxy and Cluster Core Formation
• Background Image Shows the Grid Hierarchy Used
• Key to Resolving the Physics Is More Sophisticated Software
• Evolution Is from 10 Myr to the Present Epoch
• Every Galaxy > 10^11 Msolar in a 100 Mpc/h Volume Adaptively Refined with AMR
  • 256^3 Base Grid
  • Over 32,000 Grids at 7 Levels of Refinement
  • Spatial Resolution of 4 kpc at Finest (see the estimate below)
  • 150,000 CPU-hr on an NCSA Origin2000
  • Completed in 1999
• 512^3 AMR or 1024^3 Unigrid Now Feasible
  • 8-64 Times the Mass Resolution
  • Can Simulate First Galaxies
Source: Mike Norman, UCSD
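A quick consistency check of the quoted finest resolution; the value of h used to convert Mpc/h to physical units is an assumption, not stated on the slide.

```python
# Consistency check of the "4 kpc at finest" figure above.
# Assumes h ~ 0.7 to convert Mpc/h to physical Mpc (the slide does not state h).

box_mpc_h = 100.0            # comoving box size, Mpc/h
base_cells = 256             # base grid cells per side
levels = 7                   # levels of refinement, factor of 2 each
h = 0.7                      # assumed Hubble parameter

effective_cells = base_cells * 2 ** levels            # 32,768 per side
dx_kpc = box_mpc_h / h * 1000 / effective_cells       # finest cell size in kpc
print(f"Effective grid: {effective_cells}^3, finest dx ~ {dx_kpc:.1f} kpc")   # ~4.4 kpc
```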
Hydrodynamic Cosmology Simulation of Galaxy Formation Using Parallel Adaptive Mesh Refinement (Enzo)
Simulation: M. Norman (UCSD)
Image credit: Donna Cox, Bob Patterson (NCSA)
Cosmic Simulator: Thresholds of Capability and Discovery
• 2000: Formation of Galaxy Cluster Cores (1 TFLOP/s)
• 2006: Properties of First Galaxies (40 TFLOP/s)
• 2010: Emergence of the Hubble Types (150 TFLOP/s)
• 2014: Large-Scale Distribution of Galaxies by Luminosity and Morphology (500 TFLOP/s)
[Images: large-scale structure (LSS); Hubble types]
Source: Mike Norman, UCSD
Proposed Galaxy Simulation Cyber-Grid: Production Simulated Galaxy Grid
• Developer Grid: Enzo Code, Data Mgmt, Analysis Tools, Visualization, Middleware
• User Grid: Modelers, Observers, Visualizers
• Enzo Data Grid with Portal Interface to the Simulated Galaxy Archive
• Enzo Simulation Code and Enzo Data Analysis Tools
• Observational Survey Partners: SDSS, DEEP2, SWIRE
• Outreach: Tutorials, Animations, PBS Nova
NSF NMI, PI: M. Norman, UCSD
LIGO, VIRGO, GEO, and LISA Search for Gravitational Waves
• $1B Being Spent on Ground-Based LIGO/VIRGO/GEO and Space-Based LISA
• Use Laser Interferometers to Detect Waves
• Matched Filtering of Waveforms Requires Large Numbers of Simulations (see the sketch below)
  • Stored in Federated Repositories
• LISA's Increased Sensitivity Vastly Opens Parameter Space:
  • Many Orders of Magnitude More Parameter Space to Be Searched!
[Images: Virgo-Pisa; LIGO-Hanford]
Source: Ed Seidel, LSU
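A minimal sketch of what matched filtering means here, using white noise and a toy chirp template rather than real detector data or numerical-relativity waveforms; every signal, frequency, and parameter below is illustrative.

```python
# Toy matched-filter sketch: correlate data against a small template bank and take
# the peak normalized correlation. Illustrative only -- real LIGO analyses whiten
# the data with the detector noise spectrum and use numerical-relativity waveforms.
import numpy as np

rng = np.random.default_rng(0)
fs, T = 4096.0, 4.0                         # sample rate (Hz), segment length (s)
t = np.arange(0, T, 1 / fs)

def chirp(f0, f1):
    """Toy 'inspiral' template: sinusoid sweeping from f0 to f1 Hz over the segment."""
    return np.sin(2 * np.pi * (f0 + (f1 - f0) * t / (2 * T)) * t)

signal = 0.5 * chirp(50, 300)               # buried signal
data = signal + rng.normal(0, 1, t.size)    # white noise, unit variance

templates = {(f0, f1): chirp(f0, f1) for f0 in (30, 50, 70) for f1 in (200, 300, 400)}
snr = {p: abs(np.dot(data, h)) / np.linalg.norm(h) for p, h in templates.items()}
best = max(snr, key=snr.get)
print("best-matching template (f0, f1):", best, "SNR ~", round(snr[best], 1))
```

The point of the exercise: every template in the bank comes from a separate simulation, so covering the physical parameter space drives the ensemble counts discussed later.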
Two-Body Problem in General Relativity: The Collision of Two Black Holes
• Numerical Solution of the Einstein Equations Required
• Problem Solution Started 40 Years Ago, 10 More to Go
• Wave Forms Critical for the NSF LIGO Gravitational Wave Detector
• A PetaFLOPS-Class Grand Challenge (Oct. 10, 1995)
Matzner, Seidel, Shapiro, Smarr, Suen, Teukolsky, Winicour
The Numerical Two-Black-Hole Problem Spans the Digital Computer Era
[Timeline: Lichnerowicz; DeWitt/Misner (Chapel Hill); DeWitt (LLNL); Hahn & Lindquist; Smarr Thesis; Eppley Thesis; Cadez Thesis; Modern Era, spanning the Kiloflop, Megaflop, Gigaflop, and Teraflop eras]
Relative Amount of Floating-Point Operations for Three Epochs of the 2BH Collision Problem
• 1963: Hahn & Lindquist, IBM 7090, 1 processor at 0.2 Mflops, 3 hours
• 1977: Eppley & Smarr, CDC 7600, 1 processor at 35 Mflops, 5 hours (300x the 1963 run)
• 1999: Seidel & Suen, et al., SGI Origin, 256 processors at 500 Mflops each, 40 hours (30,000x the 1977 run; 9,000,000x the 1963 run)
• 10,000x more still required! (see the estimate below)
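The ratios quoted above follow directly from processors x speed x runtime; a quick check using only the numbers on the slide:

```python
# Reproduce the relative-FLOP ratios quoted above from processors x Mflops x hours.
epochs = {
    1963: (1,   0.2e6,  3),    # Hahn & Lindquist, IBM 7090
    1977: (1,   35e6,   5),    # Eppley & Smarr, CDC 7600
    1999: (256, 500e6, 40),    # Seidel & Suen et al., SGI Origin
}
flops = {yr: n * rate * hrs * 3600 for yr, (n, rate, hrs) in epochs.items()}
print(f"1977 / 1963 ~ {flops[1977] / flops[1963]:,.0f}x")   # ~300x
print(f"1999 / 1977 ~ {flops[1999] / flops[1977]:,.0f}x")   # ~29,000x
print(f"1999 / 1963 ~ {flops[1999] / flops[1963]:,.0f}x")   # ~8,500,000x
```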
What Is Needed to Finish the Computing Job
• Current Black Hole Jobs
  • Grid: 768 x 768 x 384; Memory Used: 250+ GB
  • Runtime: ~a Day or More; Output: Multi-TB+ (Disk Limited)
• Inspiraling BH Simulations Are Volume Limited
  • Scale as N^3-4
• Low-Resolution Simulations of BH Collisions Currently Require O(10^15) FLOPs
• High-Resolution Inspiraling Binaries Need:
  • Increased Simulation Volume, Evolution Time, and Resolution, i.e. O(10^20+) FLOPs
  • 50-100 TF with Adaptive Meshes Will Make This Possible
Source: Ed Seidel, LSU
Why Black Hole Simulations Need Cyberinfrastructure
• Software Development Is Key
  • Use Adaptive Meshes to Accurately Resolve the Metric
  • ~10 Levels of Refinement
  • Several Machine-Days per Spacetime
• Output
  • At Minimum 25-100 TB for Full Analysis (Multiple Orbits) of:
    • Gravitational Waves
    • Event Horizon Structure Evolution
• Real-Time Scheduling Needed Across Multiple Resources for Collaborative Distributed Computing
  • Spawning (for Analysis, Steering Tasks), Migration
  • Interactive Viz from Distributed Collaborations
  • Implies Need for Dedicated Gigabit Light Pipes (Lambdas)
Source: Ed Seidel, LSU
Ensembles of Simulations Needed for LIGO, GEO, LISA Gravitational Wave Astronomy
• Variations for Internal Approximations
  • Accuracy, Sensitivity Analysis to Gauge Parameters, Resolution, Algorithms
  • A Dozen Simulations per Physical Scenario
• Variations in Physical Scenarios --> Waveform Catalogs
  • Masses, Spins, Orbital Characteristics Varied
  • Huge Parameter Space to Survey
• In Total: 10^3 - 10^6 Simulations Needed
  • Potentially Generating 25 TB Each (see the storage estimate below)
  • Stored in Federated Repositories
• Data Analysis of LIGO, GEO, LISA Signals
  • Interacting with Simulation Data
  • Managing Parameter Space / Signal Analysis
Source: Ed Seidel, LSU
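A rough sense of the storage these ensembles imply, taking the 25 TB-per-simulation figure above at face value:

```python
# Rough storage implied by the waveform-catalog ensembles above,
# taking 25 TB per simulation at face value.
tb_per_sim = 25
for n_sims in (1e3, 1e6):
    total_pb = n_sims * tb_per_sim / 1000
    print(f"{n_sims:.0e} simulations -> ~{total_pb:,.0f} PB")
# 1e+03 simulations -> ~25 PB
# 1e+06 simulations -> ~25,000 PB (tens of exabytes), hence federated repositories
```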
To a Grid, "Supercomputers" Are Just High-Performance Data Generators
• Similar to Particle Accelerators, Telescopes, Ocean Observatories, Microscopes, etc.
• All Require:
  • Web Portal Access for Real-Time Instrument Control
  • Grid Middleware for Security, Scheduling, Reservations
  • Federated Repositories for Data Archiving
  • Data Grids for Data Replication and Management
  • High-Performance Networking to Deal with Data Floods
  • Local Visualization and Analysis Facilities
  • Multi-Site, Multi-Modal Collaboration Software
• That Is: a Cyberinfrastructure!
NSF Must Increase Funding for Community Software/Toolkit Development
• Major Problem to Enable the Community:
  • Modern Software Engineering
  • Training
  • User Support
• Require Toolkits For:
  • Sharing/Developing of Community Codes
  • Algorithmic Libraries, e.g., AMR
  • Local Compute, Storage, Visualization, & Analysis
  • Federated Repositories
  • Grid Middleware
  • Lambda Provisioning
LambdaGrid Required to Support the Distributed Collaborative Teams
• Grand Challenge-Like Teams Involving US and International Collaborations
  • Example: GWEN (Gravitational Wave European Network) Involves 20 Groups!
• Simulation Data Stored Across Geographically Distributed Spaces
  • Organization, Access, Mining Issues
• Collaborative Data Spaces to Support Interaction with:
  • Colleagues, Data, Simulations
• Need Lambda Provisioning For:
  • Coupling Supercomputers and the Data Grid
  • Remote Visualization and Monitoring of Simulations
  • Analysis of Federated Data Sets by Virtual Organizations
Source: Ed Seidel, LSU
Special Thanks to:
• Ed Seidel
  Director, Center for Computation and Technology,
  Department of Physics and Astronomy, Louisiana State University
  & Albert-Einstein-Institut, Potsdam, Germany
  Representing dozens of scientists
• Michael Norman
  Director, Laboratory for Computational Astrophysics
  Physics Department, UC San Diego
• Members of the OptIPuter Team