End-to-end data management capabilities in the GPSC & CPES SciDACs: Achievements and Plans
SDM AHM, December 11, 2006
Scott A. Klasky, End-to-End Task Lead, Scientific Computing Group, ORNL
Outline • Overview of GPSC activities. • The GTC and GEM codes. • On the path to petascale computing. • Data Management Challenges for GTC. • Overview of CPES activities. • The XGC and M3D codes. • Code Coupling. • Workflow Solutions. • ORNL’s end-to-end activities. • Asynchronous I/O. • Dashboard Efforts.
It’s all about the features that lead us to scientific discovery! It’s all about the data. It’s all about the enabling technologies… [Diagram after D. Keyes: Applications, Math, CS; applications drive, enabling technologies respond.]
GPSC gyrokinetic PIC codes used for studying microturbulence in the plasma core • GTC (Z. Lin et al., Science 281, p. 1835, 1998) • Intrinsically global 3D nonlinear gyrokinetic PIC code • All calculations done in real space • Scales to > 30,000 processors • Delta-f method • Recently upgraded to fully electromagnetic • GEM (Y. Chen & S. Parker, JCP, in press 2006) • Fully electromagnetic nonlinear delta-f code • Split-weight scheme implementation of kinetic electrons • Multi-species • Uses Fourier decomposition of the fields in the toroidal and poloidal directions (wedge code) • What about PIC noise? • “It is now generally agreed that these ITG simulations are not being influenced by particle noise. Noise effects on ETG turbulence remain under study but are beginning to seem of diminishing relevance.” PSACI-PAC
GTC code performance and historical prediction of GTC data production [Chart: Cray T3E, IBM SP3, Cray X1E, Cray XT3, Cray Baker] • Increase output because of • Asynchronous, metadata-rich I/O. • Workflow automation. • More analysis services in the workflow.
GTC: Towards a Predictive Capability for ITER Plasmas • Petascale Science • Investigate important physics problems for ITER plasmas, namely the effect of size and isotope scaling on core turbulence and transport (heat, particle, and momentum). • These studies will focus on the principal causes of turbulent transport in tokamaks, for example, electron and ion temperature gradient (ETG and ITG) drift instabilities and the collisionless and collisional (dissipative) trapped electron modes (CTEM and DTEM), and ways to mitigate these phenomena.
Impact: How does turbulence cause heat, particles and momentum to escape from plasmas? • Investigation of ITER confinement properties is required • A dramatic step from 10 MW for 1 second to the projected 500 MW for 400 seconds. • The race is on to improve predictive capability before ITER comes on line (projected 2015). • A more realistic assessment of ignition margins requires more accurate calculations of steady-state temperature and density profiles for ions, electrons and helium ash. • The success of ITER depends in part on its ability to operate in a gyroBohm scaling regime, which must be demonstrated computationally. • Key for ITER is a fundamental understanding of the effect of the deuterium-tritium isotope presence (isotope scaling) on turbulence.
Calculation Details • Turbulent transport studies will be carried out using the present GTC code, which uses a grid of the size of the ion gyroradius. • The electron particle transport physics requires incorporating the electron skin depth in the code for the TEM physics, which can be an order of magnitude smaller than the ion gyroradius. • A 10,000 x 10,000 x 100 grid and 1 trillion particles (100 particles/cell) are estimated to be needed (700 TB per scalar field; 25 TB of particle data per time step). • For the 250 TF machine, a 2D domain decomposition (DD) is necessary for electrostatic simulation of an ITER-size machine (a/rho > 1000) with kinetic electrons. W. Lee
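As a rough sanity check on the particle numbers above, a back-of-envelope estimate (the 25 bytes/particle figure below is an assumption for illustration, not a number from the talk):

```python
# Back-of-envelope check of the particle counts quoted above.
cells = 10_000 * 10_000 * 100          # grid points: 1e10
particles = cells * 100                # 100 particles per cell
print(f"particles: {particles:.1e}")   # ~1e12, i.e. 1 trillion

bytes_per_particle = 25                # assumed: a handful of single-precision values
snapshot_tb = particles * bytes_per_particle / 1024**4
print(f"particle snapshot: ~{snapshot_tb:.0f} TB")   # ~23 TB, consistent with the quoted 25 TB
```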
GTC Data Management Issues • Problem: Move data from NERSC to ORNL and then to PPPL as the data is being generated. • Transfer from NERSC to ORNL: 3,000 timesteps, 800 GB, within the simulation run (34 hours). • Convert each file to an HDF5 file. • Archive files in 4 GB chunks to HPSS at ORNL. • Move a portion of the HDF5 files to PPPL. • Solution (Norbert Podhorszki): a Watch → Transfer → Convert → Archive pipeline, sketched below.
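A minimal sketch of the watch/transfer/convert/archive pattern. The directory names and remote host are hypothetical, and scp stands in for the actual movers (bbcp/SRM); the production pipeline is implemented as a Kepler workflow rather than a script like this.

```python
# Hypothetical watch -> transfer -> convert -> archive loop (illustration only).
import glob
import os
import subprocess
import time

WATCH_DIR = "/scratch/gtc_run"           # hypothetical NERSC scratch directory
REMOTE_DEST = "user@ornl-host:/tmp/gtc"  # hypothetical ORNL staging area

def new_outputs(seen):
    """Return GTC output files that have appeared since the last check."""
    files = set(glob.glob(os.path.join(WATCH_DIR, "*.dat")))
    return sorted(files - seen)

seen = set()
while True:
    for path in new_outputs(seen):
        seen.add(path)
        # Transfer: copy the raw file to ORNL as soon as it appears.
        subprocess.run(["scp", path, REMOTE_DEST], check=True)
        # The convert and archive stages run on the ORNL side:
        #   - convert the raw binary file to HDF5 on a "cheaper" resource,
        #   - bundle converted files into ~4 GB chunks and push them to HPSS,
        #   - move a portion of the HDF5 files on to PPPL.
    time.sleep(60)  # poll once a minute; a real workflow reacts to events instead
```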
GTC Data Management Achievements • In the process of removing • ASCII output. • HDF5 output. • NetCDF output. • Replacing these with • Binary (parallel) I/O with metadata tags. • Conversion to HDF5 during the simulation on a ‘cheaper’ resource. • One XML file to describe all files output by GTC. • Only work with one file for the entire simulation. • Large buffered writes. • Asynchronous I/O when it becomes available.
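To make the "one XML file describes all output" idea concrete, a small sketch follows; the element and attribute names are invented for illustration and are not the actual GTC schema.

```python
# Hypothetical metadata index describing GTC's binary output files.
import xml.etree.ElementTree as ET

root = ET.Element("gtc_output", run="example_run")          # invented schema
f = ET.SubElement(root, "file", name="potential.bin", format="binary")
ET.SubElement(f, "variable", name="phi", type="real8",
              dims="100,10000,10000", units="normalized")
ET.SubElement(f, "attribute", name="timestep", value="1200")

# One XML file per run: analysis tools and converters read this instead of
# hard-coding knowledge about each output file.
ET.ElementTree(root).write("gtc_metadata.xml", xml_declaration=True)
```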
The data-in-transit problem • Particle data needs to be examined occasionally. • 1 trillion particles = 25 TB/hour (demand: < 2% I/O overhead). • Need 356 GB/s to handle the burst (7 GB/s aggregate). • We can’t store all of this data: (2.3 PB/simulation) x 12 simulations/year = 25 PB. • Need to analyze on the fly and not save all of the data to permanent storage [analyze on another system]. • Scalar data needs to be analyzed during the simulation. • Computational experiments are too costly to let a simulation run and ignore it [estimated cost = $500K/simulation on a Pflop machine]. • GTC already uses 0.5M CPU hours/simulation; approaching 3M CPU hours on a 250 Tflop system. • Need to compare new simulations with older simulations and experimental data. • Metadata needs to be stored in databases.
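A quick back-of-envelope check of the quoted rates (assuming 1 TB = 1024 GB):

```python
# Aggregate and burst I/O rates implied by 25 TB/hour with < 2% I/O overhead.
TB = 1024                       # GB per TB
hourly_output = 25 * TB         # 25 TB of particle data per hour, in GB

aggregate = hourly_output / 3600
print(f"aggregate rate: {aggregate:.1f} GB/s")   # ~7 GB/s

overhead = 0.02                 # keep I/O below 2% of wall-clock time
burst = hourly_output / (3600 * overhead)        # write each hour's data in ~72 s
print(f"burst rate: {burst:.0f} GB/s")           # ~356 GB/s
```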
Workflow simulation monitoring • Images are generated from the workflow. • The user sets viewing angles and min/max, and the workflow produces the images. • Still need to put this into everyday use. • Really need to identify features as the simulation is running. • Trace features back to earlier timesteps once they are known (where are they born?).
5D Data Analysis - 1 • It is common in fusion to look at puncture plots (2D). • To glean insight, we need to be able to detect ‘features’. • Need a temporal perspective, involving the grouping of similar items, to possibly identify interesting new plasma structures (within this 5D phase space) at different stages of the simulations. [Figure: 2D phase space]
5D Data Analysis - 2 • Our turbulence covers the global volume, as opposed to some isolated (local) regions. • The spectral representation of the turbulence evolves in time by moving to longer wavelengths. • Understanding the key nonlinear dynamics here involves extracting the relevant particle-behavior information from the data sets. • The trajectories of these particles are followed self-consistently in phase space • Tracking of spatial coordinates and velocities. • The self-consistent interaction between the fields and the particles is most important when viewed in velocity space, because particles of specific velocities will resonate with waves in the plasma to transfer energy. • Structures in velocity space could potentially be used in the future development of multi-resolution compression methods. W. Tang
Data Management Challenge • A new discovery was made by Z. Lin in large ETG calculations: we were able to see radial flow across individual eddies. • The challenge: • Track the flow across the individual eddies and give statistical measurements of the flow velocity. • Using local eddy motion density (PCA) to examine the data (Ostrouchov, ORNL). • A hard problem for lots of reasons! [Figure: the decomposition shows transient wave components in time.]
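For illustration only, a generic sketch of the kind of PCA decomposition mentioned above, applied to a time series of 2D field snapshots. The data here is random and the variable names are invented; this is not the actual local eddy motion density analysis.

```python
# Generic PCA of a (time, x, y) field via SVD: spatial modes plus time histories.
import numpy as np

n_time, nx, ny = 200, 64, 64
frames = np.random.rand(n_time, nx, ny)            # stand-in for phi(x, y, t)

X = frames.reshape(n_time, -1)
X -= X.mean(axis=0)                                # remove the time-mean field
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # PCA via SVD

# Rows of Vt are spatial modes; U[:, k] * s[k] is mode k's time history,
# which is where transient wave components would show up.
leading_modes = Vt[:4].reshape(4, nx, ny)
time_histories = U[:, :4] * s[:4]
print("variance captured by 4 modes:", (s[:4] ** 2).sum() / (s ** 2).sum())
```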
Physics in the tokamak plasma edge • Plasma turbulence • Turbulence suppression (H-mode) • Edge localized modes and the ELM cycle • Density and temperature pedestal • Divertor and separatrix geometry • Plasma rotation • Neutral collisions [Figures: diverted magnetic field; edge turbulence in NSTX at 100,000 frames/s]
XGC code • XGC-0 self-consistently includes • 5D ion neoclassical dynamics, realistic magnetic geometry and wall shape • Conserving plasma collisions (Monte Carlo) • 4D Monte Carlo neutral atoms with a recycling coefficient • Conserving MC collisions, ion orbit loss, self-consistent Er • Neutral beam source, magnetic ripple, heat flux from the core. • XGC-1 includes • Particle source from neutral ionization • Full-f ions, electrons, and neutrals • Gyrokinetic Poisson equation for the neoclassical and turbulent electric field • Full-f electron kinetics for neoclassical physics • Adiabatic electrons for electrostatic turbulence • General 2D field solver in a dynamically evolving 3D B field
Neoclassical potential and flow of edge plasma from XGC1 [Figures: electric potential; parallel flow and particle positions]
• Need real-time visualization to help monitor/debug these simulations. • Need better integration with interactive debugging sessions. • Need to be able to look at derived quantities from raw data. XGC-MHD coupling plan [Diagram: blue = developed, red = to be developed]
XGC-M3D code coupling: code coupling framework with Kepler [Diagram: end-to-end system (160p), M3D runs on 64p, monitoring routines, XGC on the Cray XT3, 40 Gb/s link; data replication, user monitoring, data archiving, post-processing; ubiquitous and transparent data access via logistical networking]
Code Coupling Framework [Diagram: XGC1, R2D0, M3DOMP, and M3DMPP coupled through Lustre; data moved with bbcp first, then Portals with sockets] • Necessary steps for initial completion • R2D0 and M3DOMP become a service. • M3DMPP is launched from Kepler once M3DOMP returns a failure condition. • XGC1 stops when M3DMPP is launched. • Incorporate this into Kepler (a sketch of the control logic follows below).
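A minimal sketch of the control logic in the bullets above, roughly as a workflow script might express it. The command names (m3d_omp_check, m3d_mpp) and paths are hypothetical stand-ins for the real Kepler actors and job scripts.

```python
# Hypothetical coupling driver: run M3D-OMP as a service on XGC1 profiles,
# and switch to M3D-MPP when M3D-OMP reports a failure condition.
import subprocess
import time

PROFILES = "/lustre/xgc1_run/profiles.dat"   # hypothetical path on Lustre

def run_m3d_omp(profile_file):
    """Run the M3D-OMP check as a service call; nonzero return = 'failure'."""
    return subprocess.run(["m3d_omp_check", profile_file]).returncode

def launch_m3d_mpp(profile_file):
    subprocess.Popen(["mpirun", "-np", "64", "m3d_mpp", profile_file])

def stop_xgc1():
    # XGC1 is assumed to watch for this flag file and exit cleanly.
    subprocess.run(["touch", "/lustre/xgc1_run/STOP"])

while True:
    if run_m3d_omp(PROFILES) != 0:   # M3D-OMP returns the failure condition
        stop_xgc1()                  # XGC1 stops once M3D-MPP takes over
        launch_m3d_mpp(PROFILES)
        break
    time.sleep(300)                  # re-check as new XGC1 profiles arrive
```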
Kepler workflow framework (developed by the SDM Center) • Kepler is an adaptation of the UC Berkeley tool Ptolemy • Workflows can be composed of sub-workflows • Uses an event-based “director” and “actors” methodology • Features in Kepler relevant to CPES • Launching components (ssh, command line) • Execution logger – keeps track of runs • Data movement – SABUL, GridFTP, logistical networking (future), data streaming (future).
Original view of the CPES workflow (a typical scenario). What’s wrong with this picture? [Diagram: for each time step, the Kepler workflow engine iterates: run simulation → move files → analyze time step → visualize analyzed data. Software components: Kepler Workflow Engine, Simulation Program (MPI), SRM Data Mover, Analysis Program, CPES VIS tool. Hardware/OS: Seaborg and disk cache at NERSC, HPSS at ORNL, disk cache on Ewok at ORNL.]
What’s wrong with this picture? • Scientists running simulations will NOT use Kepler to schedule jobs on supercomputers • Concern about dependency on another system • But we need to track when files are generated so Kepler can move them • Need a “FileWatcher” actor in Kepler • ORNL permits only One-Time-Password (OTP) logins • Need an OTP login actor in Kepler • Only SSH can be used to invoke jobs, including data copying • Cannot use GridFTP (requires GSI security support at all sites) • Need an ssh-based DataMover actor in Kepler: scp, bbcp, … • HPSS does not like a large number of small files • Need an actor in Kepler to TAR files before archiving
New actors in the CPES workflow to overcome these problems [Diagram: Kepler starts two independent processes: the simulation program (MPI), and a pipeline that logs in at ORNL (OTP), detects when files are generated, moves the files, tars them, and archives them. New actors: OTP Login actor, File Watcher actor, scp File Copier actor, Tar’ing actor, Local Archiving actor. Hardware/OS: Seaborg and disk cache at NERSC, HPSS at ORNL, disk cache on Ewok at ORNL.]
Future SDM work in CPES • Workflow automation of the coupling problem. • Critical for code debugging. • Necessary to track provenance to ‘replay’ coupling experiments. • Q: Do we stream data or write files? • Dashboard for monitoring simulations. • Fast SRM movement of data between NERSC and ORNL.
Asynchronous petascale I/O for data in transit • High-performance I/O • Asynchronous • Managed buffers • Respect firewall constraints • Enable dynamic control with flexible MxN operations • Transform using shared-space framework (Seine)
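The listing below is a minimal illustration of the buffered, asynchronous output pattern: the compute loop hands each output buffer to a background writer and continues immediately, with a bounded queue playing the role of the "managed buffers." The actual implementation uses RDMA on the XT3, not Python threads, and the file names here are invented.

```python
# Hypothetical sketch of asynchronous, buffer-managed output.
import queue
import threading

import numpy as np

outbox = queue.Queue(maxsize=4)          # bounded queue = "managed buffers"

def writer():
    while True:
        step, data = outbox.get()
        if data is None:                 # sentinel: shut down the writer
            outbox.task_done()
            break
        np.save(f"field_{step:05d}.npy", data)   # stands in for the RDMA drain
        outbox.task_done()

threading.Thread(target=writer, daemon=True).start()

for step in range(10):
    field = np.random.rand(256, 256)     # stand-in for one time step of output
    outbox.put((step, field))            # returns as soon as a buffer slot is free
    # ... keep computing the next time step while the writer drains the buffer ...

outbox.put((None, None))                 # tell the writer to exit
outbox.join()                            # wait until every buffer has been drained
```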
Current Status: Asynchronous I/O • Currently working on the XT3 development machine (rizzo.ccs.ornl.gov). • Current implementation is based on an RDMA approach. • Current benchmarks indicate 0.1% overhead when writing 14 TB/hour on jaguar.ccs.ornl.gov. • Looking at changes in the ORNL infrastructure to deal with these issues. • Roughly 10% of the machine will be carved off for real-time analysis (100 Tflops for real-time analysis with TB/s bandwidth).
SDM/ORNL Dashboard: Current Status • Step 1: • Monitor ORNL and NERSC machines. • Log in at https://ewok-web.ccs.ornl.gov/dev/rbarreto/SDMP/WebContent/SdmpApp/rosehome.php • Uses OTP. • Working to pull out users’ jobs. • The workflow will need to move data to the ewok web disk. • JPEG and XML files (metadata).
Dashboard - future • Current and old simulations will be accessible on the webpage. • The schema for each simulation will be determined by the XML file the simulation produces. • Pictures and simple metadata (min/max, …) are displayed on the webpage. • Later we will allow users to ‘control’ their simulations.
The End-to-End Framework [Diagram: Applications and Applied Math produce metadata-rich output from components; VIZ/Dashboard, Workflow Automation, and Data Monitoring sit above the enabling technologies: CCA, SRM, LN (logistical networking), asynchronous NxM streaming]
Plans • Incorporate workflow automation into everyday work. • Incorporate visualization services into the workflow. • Incorporate asynchronous I/O (data streaming) techniques. • Unify the schema across the fusion SciDAC PIC codes. • Further develop workflow automation for code coupling. • Will need dual-channel Kepler actors to understand data streams. • Will need certificates to deal with OTP in workflow systems. • Autonomics in workflow automation. • Easy to use for non-developers! • Dashboard. • Simulation monitoring (via push method) available end of Q2 2007. • Simulation control!