1. Albert-Einstein-Institut www.aei-potsdam.mpg.de Metacomputing and Solving Einstein's Equations for Black Holes, Neutron Stars, and Gravitational Waves. Solving Einstein's Equations and Impact on Computation
Large collaborations essential and difficult! The code becomes the collaborating tool.
Cactus, a new community code for 3D GR-Astrophysics
Toolkit for many PDE systems
Suite of solvers for Einstein system
Metacomputing for the general user: what a scientist really wants and needs
Distributed Computing Experiments with Cactus/Globus
2. Albert-Einstein-Institut www.aei-potsdam.mpg.de This work results from many great collaborations: AEI-Potsdam
G. Allen, B. Brügmann, T. Goodale, J. Massó, T. Radke, W. Benger, + many physicists contributing to scientific parts of code...
ZIB-Berlin
Christian Hege, Andre Merzky, ...
RZG-Garching
Ulli Schwenn, Herman Lederer, Manuel Panea, ...
Network providers
DTelekom, DFN-Verein, Canarie/Teleglobe, Star Tap/vBNS
NCSA
John Shalf, Jason Novotny, Meghan Thornton, ...
Washington University
Wai-Mo Suen, Mark Miller, Malcolm Tobias, ...
Argonne
Ian Foster, Warren Smith, ...
3. Albert-Einstein-Institut www.aei-potsdam.mpg.de Einstein's Equations and Gravitational Waves Einstein's General Relativity
Fundamental theory of Physics (Gravity)
Among most complex equations of physics
Dozens of coupled, nonlinear hyperbolic-elliptic equations with 1000s of terms
Barely have capability to solve after a century
Predict black holes, gravitational waves, etc.
Exciting new field about to be born: Gravitational Wave Astronomy
Fundamentally new information about Universe
What are gravitational waves? Ripples in spacetime curvature, caused by the motion of matter, which cause distances to change.
A last major test of Einstein's theory: do they exist?
Eddington: "Gravitational waves propagate at the speed of thought"
4. Albert-Einstein-Institut www.aei-potsdam.mpg.de Detecting Gravitational Waves
5. Albert-Einstein-Institut www.aei-potsdam.mpg.de Computational Needs for 3D Numerical Relativity Explicit Finite Difference Codes
~10^4 Flops/zone/time step
~100 3D arrays
Require 1000^3 zones or more
~1000 Gbytes
Double resolution: 8x memory, 16x Flops
TFlop, Tbyte machine required
Parallel AMR, I/O essential
Etc...
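As a sanity check, these figures are mutually consistent (a back-of-the-envelope estimate, assuming 8-byte double precision values):

    memory: 100 arrays x 1000^3 zones x 8 B = 8x10^11 B = 800 GB ~ 1 TByte
    speed:  10^4 Flops/zone x 1000^3 zones = 10^13 Flops per time step
            (about 10 s/step at a sustained TFlop)
    doubling resolution: 2^3 = 8x zones, hence 8x memory; the CFL
            condition also halves the time step, so 8 x 2 = 16x Flops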
6. Albert-Einstein-Institut www.aei-potsdam.mpg.de Example simulation: gravitational waves forming a BH in 3D (First such simulation!) Better quality underway right now at NCSA...
7. Albert-Einstein-Institut www.aei-potsdam.mpg.de (A Single) Such Large Scale Computation Requires an Incredible Mix of Varied Technologies and Expertise! Many Scientific/Engineering Components
formulation of EEs, equation of state, astrophysics, hydrodynamics, etc.
Many Numerical Algorithm Components
Finite differences? Finite elements? Structured meshes?
Hyperbolic equations: explicit vs implicit, shock treatments, dozens of methods (and presently nothing is fully satisfactory!); a minimal finite-difference sketch follows this list
Elliptic equations: multigrid, Krylov subspace, spectral, preconditioners
Mesh Refinement?
Many Different Computational Components
Parallelism (HPF, MPI, PVM, ???)
Architecture Efficiency (MPP, DSM, Vector, NOW, ???)
I/O Bottlenecks (generate gigabytes per simulation, checkpointing...)
Visualization of all that comes out!
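To make the finite-difference and message-passing components above concrete, here is a minimal, hypothetical sketch (not the actual Cactus solver): an explicit leapfrog update for the 1D wave equation with MPI ghost-zone exchange. All names and sizes are illustrative.

    /* Minimal illustration: explicit leapfrog for u_tt = u_xx on a
     * 1D grid distributed over MPI processors, one ghost zone per side.
     * Hypothetical example code, not taken from Cactus. */
    #include <mpi.h>
    #include <math.h>
    #include <stdlib.h>

    #define NLOCAL 1000   /* zones owned by each processor */
    #define NSTEPS 100    /* number of time steps          */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* local grid plus one ghost zone at each end */
        double *u_old = calloc(NLOCAL + 2, sizeof(double));
        double *u     = calloc(NLOCAL + 2, sizeof(double));
        double *u_new = calloc(NLOCAL + 2, sizeof(double));
        const double c2 = 0.25;   /* (c*dt/dx)^2, CFL-stable */

        /* Gaussian initial data on the global grid */
        for (int i = 1; i <= NLOCAL; i++) {
            double x = (rank*NLOCAL + i)/(double)(size*NLOCAL) - 0.5;
            u[i] = u_old[i] = exp(-1000.0*x*x);
        }

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int n = 0; n < NSTEPS; n++) {
            /* exchange ghost zones with both neighbours */
            MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                         &u[NLOCAL+1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                         &u[0], 1, MPI_DOUBLE, left, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* leapfrog update: a handful of Flops per zone per step */
            for (int i = 1; i <= NLOCAL; i++)
                u_new[i] = 2.0*u[i] - u_old[i]
                         + c2*(u[i+1] - 2.0*u[i] + u[i-1]);

            double *tmp = u_old; u_old = u; u = u_new; u_new = tmp;
        }

        free(u_old); free(u); free(u_new);
        MPI_Finalize();
        return 0;
    }

The ghost-zone exchange here is exactly the traffic that, over a wide-area network, becomes the "many small messages" problem discussed in the metacomputing analysis later in this talk.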
8. Albert-Einstein-Institut www.aei-potsdam.mpg.de This is the fundamental question addressed by Cactus. Clearly we need huge teams, with a huge expertise base, to attack such problems...
In fact, need collections of communities to solve such problems...
But how can they work together effectively?
We need a simulation code environment that encourages this...
9. Albert-Einstein-Institut www.aei-potsdam.mpg.de NSF Black Hole Grand Challenge Alliance (1993-1998) University of Texas (Matzner, Browne)
NCSA/Illinois/AEI (Seidel, Saylor, Smarr, Shapiro, Saied)
North Carolina (Evans, York)
Syracuse (G. Fox)
Cornell (Teukolsky)
Pittsburgh (Winicour)
Penn State (Laguna, Finn)
10. Albert-Einstein-Institut www.aei-potsdam.mpg.de NASA Neutron Star Grand Challenge (1996-present) NCSA/Illinois/AEI (Saylor, Seidel, Swesty, Norman)
Argonne (Foster)
Washington U (Suen)
Livermore (Ashby)
Stony Brook (Lattimer)
11. Albert-Einstein-Institut www.aei-potsdam.mpg.de What we learn from Grand Challenges Successful, but also problematic
No existing infrastructure to support collaborative HPC
Most scientists are bad Fortran programmers, and NOT computer scientists (especially physicists like me...); suspicious of PSEs, want complete control/access to their source code
Many sociological issues of large collaborations and different cultures
Many language barriers...
Applied mathematicians, computational scientists, physicists have very different concepts and vocabularies
Code fragments, styles, routines often clash
Successfully merged code (after years) often impossible to transplant into more modern infrastructure (e.g., add AMR or switch to MPI...)
Many serious problems...
12. Albert-Einstein-Institut www.aei-potsdam.mpg.de Large Scale Scientific/Engineering Collaboration
13. Albert-Einstein-Institut www.aei-potsdam.mpg.de Cactus: new concept in community-developed simulation code infrastructure Generally: Numerical/computational infrastructure to solve PDEs
Specifically:
Modular Code for Solving Einstein Equations
Over two dozen developers in an international collaboration in numerical relativity working through flexible, open, modular code infrastructure
Cactus divided into Flesh (core) and Thorns (modules, or collections of subroutines)
Parallelism largely automatic and hidden (if desired) from user
Very modular, but with fixed interface between flesh and thorns
User specifies flow: when to call thorns; code switches memory on and off
User choice between Fortran and C; automated interface between them
Freely available, open community source code: spirit of GNU/Linux
The code becomes the collaborating tool, just as an accelerator is the focus of a high energy physics experiment.
14. Albert-Einstein-Institut www.aei-potsdam.mpg.de Cactus Computational Tool Kit (Allen, Massó, Goodale, Walker) Flesh (core) written in C
Thorns (modules) grouped in packages written in F77, F90, C, C++
Thorn-Flesh interface fixed in 3 files written in CCL (Cactus Configuration Language):
interface.ccl: Grid Functions, Arrays, Scalars (integer, real, logical, complex)
param.ccl: Parameters and their allowed values
schedule.ccl: Entry point of routines, dynamic memory and communication allocations
Object oriented features for thorns (public, private, protected variables, inheritance) for clearer interfaces
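For flavor, a minimal sketch of what the three files contain, written in the style of Cactus CCL; the thorn, variable, and routine names are hypothetical, and the exact syntax of this era may have differed:

    # interface.ccl: declare the thorn's grid variables
    implements: wavetoy
    public:
    REAL scalarfield TYPE=GF
    {
      phi
    } "The evolved scalar field"

    # param.ccl: declare parameters and their allowed values
    private:
    REAL amplitude "Amplitude of the initial Gaussian pulse"
    {
      0.0:* :: "Any non-negative value"
    } 1.0

    # schedule.ccl: tell the flesh when to call the thorn's routines
    schedule WaveToy_Evolve at EVOL
    {
      LANG: Fortran
      STORAGE: scalarfield
    } "Evolve the scalar field one time step"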
15. Albert-Einstein-Institut www.aei-potsdam.mpg.de Toolkits
16. Albert-Einstein-Institut www.aei-potsdam.mpg.de Computational Toolkit: provides parallel utilities (thorns) for computational scientist Choice of parallel library layers (presently MPI-based)
Portable, efficient (T3E, SGI, DEC Alpha, Linux, NT Clusters...)
3 mesh refinement schemes: Nested Boxes, DAGH, HLL (coming...)
Parallel I/O (Panda, FlexIO, HDF5, etc...)
Parameter Parsing
Elliptic solvers (PETSc, Multigrid, SOR, etc...)
Visualization Tools
Globus
INSERT YOUR CS MODULE HERE...
To be maintained by AEI and NCSA
17. Albert-Einstein-Institut www.aei-potsdam.mpg.de How to use Cactus: Avoiding the MONSTER code syndrome... [Optional: Develop thorns, according to some rules (e.g. specify variables through interface.ccl)
Specify calling sequence of the thorns for given problem and algorithm (schedule.ccl)]
Specify which thorns are desired for simulation (ADM + leapfrog + HRSC hydro + AH finder + wave extraction + AMR + ...); see the sample parameter file after this list
Specified code is then created, with only those modules, those variables, those I/O routines, that AMR system, ..., needed
Subroutine calling lists generated automatically
Automatically created for desired computer architecture
Run it
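As an illustration, a hypothetical parameter file for such a run might look like the following (the thorn and parameter names here are invented for the sketch, not taken from the real thorn list):

    # blackhole.par: hypothetical Cactus parameter file
    ActiveThorns = "ADM Leapfrog HRSCHydro AHFinder WaveExtract AMR IOHDF5"

    driver::global_nsize  = 128         # 128^3 grid points
    adm::evolution_method = "leapfrog"  # choice of evolution scheme
    cactus::cctk_itlast   = 1000        # number of time steps
    iohdf5::out_every     = 10          # 3D output every 10 steps

The build system then assembles an executable containing only the activated thorns, with memory switched on only for the variables those thorns declare.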
Training/Tutorial at NCSA Aug 16-21 this summer...
18. Albert-Einstein-Institut www.aei-potsdam.mpg.de It works: dozens of people in a seed community, with different backgrounds and personalities, on different continents, work together effectively.
Connected modules actually work together, largely without collisions.
Test suites used to ensure integrity of physics.
Basis for various CS Research Projects
I/O, AMR, Scaling, Elliptic Solvers, Distributed Computing, Etc
http://cactus.aei-potsdam.mpg.de Current Cactus Picture: Preparing for Public Release
19. Albert-Einstein-Institut www.aei-potsdam.mpg.de Excellent scaling on many architectures
Origin up to 256 processors
T3E up to 1024
NCSA NT cluster up to 128 processors
Achieved 142 Gflops/s on 1024 node T3E-1200 (benchmarked for NASA NS Grand Challenge)
But, of course, we want much more: metacomputing...
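For scale, a rough estimate of the benchmark above (assuming the conventional 600 MHz x 2 Flops/cycle = 1.2 Gflops peak per T3E-1200 processor):

    1024 processors x 1.2 Gflops = ~1.2 Tflops peak
    142 Gflops sustained = ~12% of peak, a respectable fraction for a
    memory-bandwidth-bound finite-difference code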
20. Albert-Einstein-Institut www.aei-potsdam.mpg.de
21. Albert-Einstein-Institut www.aei-potsdam.mpg.de
22. Albert-Einstein-Institut www.aei-potsdam.mpg.de Metacomputing: harnessing power when and where it is needed Einstein equations require extreme memory, speed
Largest supercomputers too small!
Networks very fast!
DFN Gigabit testbed: 622 Mbits Potsdam-Berlin-Garching, connect multiple supercomputers
Gigabit networking to US possible
Connect workstations to make supercomputer
Acquire resources dynamically during simulation!
Seamless computing and visualization from anywhere
Many metacomputing experiments in progress connecting Globus + Cactus...
23. Albert-Einstein-Institut www.aei-potsdam.mpg.de What we need and want: I. Exploration. Got an idea? Write a Cactus module, link to other existing modules, and...
Find Resources for interactive use: Garching? ZIB? NCSA? SDSC?
Launch simulation. How?
Watch simulation as it progresses... Need live visualization
Limited bandwidth: compute viz. inline with simulation
High bandwidth: ship data to be visualized locally
Call in an expert colleague... let her watch it too
Sharing data space
Remote collaboration tools
24. Albert-Einstein-Institut www.aei-potsdam.mpg.de Distributing Spacetime: SC97 Intercontinental Metacomputing at AEI/Argonne/Garching/NCSA. 1999: about to become part of production code!
25. Albert-Einstein-Institut www.aei-potsdam.mpg.de What we need and want: II. Production Find resources:
Where?
How many computers?
Big jobs: like having Fermilab at your disposal: must get it right while the beam is on!
Launch Simulation
How do we get the executable there?
How to store data?
What are the local queue structure/OS idiosyncrasies?
Monitor the simulation
Remote Visualization live while running
Visualization server: all privileged users can log in and check status/adjust if necessary... Interactive Steering
Are parameters screwed up? Very complex?
Is memory running low? AMR! What to do? Refine selectively or acquire additional resources via Globus? Delete unnecessary grids?
Postprocessing and analysis
26. Albert-Einstein-Institut www.aei-potsdam.mpg.de Metacomputing the Einstein Equations: Connecting T3Es in Berlin, Garching, San Diego
27. Albert-Einstein-Institut www.aei-potsdam.mpg.de Details of our experiments... Different modalities of live visualization
Viz computed in parallel with simulation: can save factors of 100 in data to be transferred, while adding a minimal amount to simulation time...
Data shipped and processed elsewhere: if bandwidth is sufficient, or algorithm prefers it, ship it all and process viz. locally...
Scaling on multiple machines
Tradeoffs between memory and performance
Optimizations can be done to make it efficient enough to justify doing it...
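The factor of ~100 is plausible from the grid sizes quoted earlier; a rough, illustrative estimate:

    one 3D output of one variable: 1000^3 zones x 8 B = 8 GB
    a rendered image or extracted isosurface: typically tens of MB or less
    so computing the viz beside the simulation and shipping only the
    result cuts the transferred volume by roughly two orders of magnitude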
28. Albert-Einstein-Institut www.aei-potsdam.mpg.de
29. Albert-Einstein-Institut www.aei-potsdam.mpg.de Scaling of Cactus on two T3Es on different continents
30. Albert-Einstein-Institut www.aei-potsdam.mpg.de Analysis of metacomputing experiments It works! (That's the main thing we wanted at SC98...)
Cactus not optimized for metacomputing: messages too small, lower MPI bandwidth, could be better:
ANL-NCSA
Measured bandwidth 17 Kbits/sec (small messages) to 25 Mbits/sec (large messages)
Latency 4ms
Munich-Berlin
Measured bandwidth 1.5 Kbits/sec (small messages) to 4.2 Mbits/sec (large messages)
Latency 42.5ms
Within single machine: Order of magnitude better
Bottom Line:
Expect to improve performance significantly with work
Can run much larger jobs on multiple machines
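A standard way to read these measurements (the textbook cost model, not something measured in the experiments) is

    T(n) = alpha + n/beta   (alpha = latency, beta = asymptotic bandwidth)

for a message of n bits. With alpha = 42.5 ms and beta = 4.2 Mbits/sec between Munich and Berlin, a message must carry on the order of 180 Kbits (~22 KB) before the bandwidth term even equals the latency term; the many small ghost-zone messages Cactus sends therefore pay mostly latency, which is exactly why aggregating messages is expected to improve performance significantly.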
31. Albert-Einstein-Institut www.aei-potsdam.mpg.de Colliding Black Holes and Metacomputing: German Project supported by DFN-Verein Solving Einstein's Equations
Developing Techniques to Exploit High Speed Networks
Remote Visualization
Distributed Computing Across OC-12 Networks between AEI (Potsdam), Konrad-Zuse-Institut (Berlin), and RZG (Garching-bei-München)
32. Albert-Einstein-Institut www.aei-potsdam.mpg.de