ENZO Case Study
SDSC Summer Institute 2005
Robert Harkness, SDSC
Enzo Development Team
• Prof. Michael Norman, UCSD
• Tom Abel
• James Bordner
• Greg Bryan
• Dave Collins
• Robert Harkness
• Alexei Kritsuk
• Brian O’Shea
• Pascal Paschos
• Richard Wagner
What is ENZO?
• ENZO is an adaptive mesh refinement (AMR), grid-based Eulerian hybrid code for combined hydrodynamic and N-body simulations, primarily in the realm of cosmological structure formation.
• ENZO is written in C++ and Fortran-90.
• ENZO uses domain decomposition, mapping subgrids to MPI tasks, with dynamic load balancing in AMR mode.
• ENZO message passing is done with MPI.
• ENZO I/O uses serial and parallel HDF5.
• ENZO can write directly to SRB.
Enzo Features
• N-body gravitational dynamics (particle-mesh method)
• Hydrodynamics with PPM and ZEUS finite-difference schemes
• Up to 9 species of H and He
• Radiative cooling
• Uniform UV background (Haardt & Madau)
• Star formation and feedback
• Metallicity fields
• Tracer particles
• Fast parallel I/O using HDF5
• SRB hooks
Enzo Modes - 1
• Unigrid
  • Non-adaptive
  • Ultra-large meshes for uniform resolution
• Lyman Alpha forest “GigaCube” simulations at 1024^3
  • Runtimes ~ 100,000 – 250,000 CPU hours
  • Output ~ 25 – 50+ Terabytes
  • 200 – 300 dumps at ~ 140 GBytes each
  • 512 / 1K HDF5 files per dump
  • Output write rates > 1 GByte/sec
Enzo Modes - 2
• Adaptive Mesh Refinement
  • Fully adaptive 3D mesh refinement, in space and time
  • Up to 30+ levels of refinement
  • 64/128-bit precision for location
  • Dynamic load balancing
• Galaxy formation, first stars
• Largest to date: 512^3 top grid with 8 levels of refinement
  • Output ~ 200+ dumps of ~ 15 GBytes each
  • Packed subgrid output using HDF5 groups
  • Tens to hundreds of thousands of subgrids!
Enzo Computational Pipeline
• Create initial conditions
  • Shared memory (OpenMP)
  • Requires random numbers with very long cycle
  • Monolithic HDF5 output files
• Sort particles for parallel input
  • Map dark matter particles to initial grid structure(s)
  • Parallel HDF5 option with MPI ring sorting
• Production run
  • Batch scheduled in many pieces
  • Ncpu + 4 files per dump
• Process
  • Assemble products in large shared memory, viz (IDL, Vista, etc.)
  • Tar dumps and products before archiving [IMPORTANT!]
I/O Strategy
• Owner reads and writes subgrids independently
• Hyperslab offsets for parallel reads of monolithic input (see the read sketch below)
• Output arranged for dump/restart efficiency
• Large shared memory used to re-order data in memory
  • Essential to reduce number of reads, writes and seeks
  • Minimum number of very large I/O operations
  • HDF5 hyperslabs with extended dimensions
• Direct HDF5 parallel I/O possible for unigrids
  • Caution: fragments I/O – too many seeks and small op counts
• HDF5 group concept used to pack AMR subgrids
  • Essential to reduce file count and inode count
  • Essential for realistic file manipulation
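To make the hyperslab point concrete, here is a minimal sketch of an independent parallel read of a monolithic input file: each MPI task opens the same HDF5 file and reads only its own slab via a hyperslab offset. The 1-D slab decomposition, the dataset name, and the single-precision field are assumptions for illustration, not Enzo's actual routine, and the sketch assumes the HDF5 1.6-era C API that was current in 2005.

// Sketch only: each MPI task reads its own slab of a monolithic
// HDF5 dataset using a hyperslab offset. File and dataset names are
// illustrative; the 1-D z-decomposition is an assumption, not Enzo's
// actual layout. HDF5 1.6-era C API (two-argument H5Dopen).
#include <mpi.h>
#include <hdf5.h>
#include <vector>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, nproc;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  const hsize_t N    = 1024;          // global mesh size (example)
  const hsize_t slab = N / nproc;     // assume nproc divides N

  // Every task opens the same monolithic input file read-only.
  hid_t file   = H5Fopen("GridDensity", H5F_ACC_RDONLY, H5P_DEFAULT);
  hid_t dset   = H5Dopen(file, "GridDensity");
  hid_t fspace = H5Dget_space(dset);

  // Select this task's hyperslab: a contiguous block of planes.
  hsize_t start[3] = {rank * slab, 0, 0};
  hsize_t count[3] = {slab, N, N};
  H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

  // Matching in-memory dataspace and buffer for just this slab.
  hid_t mspace = H5Screate_simple(3, count, NULL);
  std::vector<float> density(slab * N * N);
  H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT,
          density.data());

  H5Sclose(mspace);
  H5Sclose(fspace);
  H5Dclose(dset);
  H5Fclose(file);
  MPI_Finalize();
  return 0;
}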
Advantages of HDF5
• Machine-independent data format
  • No endian-ness issues
  • Easy control of precision
• Parallel interface built on MPI I/O
  • High performance and very robust
• Excellent logical design!
  • Hierarchical structure ideal for packing AMR grids
  • HDF5 Files used at processor level
  • HDF5 Groups used to aggregate subgrids owned by a cpu
  • HDF5 Datasets used for physical fields and particle data
• Useful inspection tools
  • h5ls
  • h5dump
Workflow
Inits -> INITS5 ->
  Logfile              ASCII
  Parameters
  PowerSpectrum.out    ASCII
  GridDensity          HDF5   NG
  GridVelocities       HDF5   NG*3
  ParticlePositions    HDF5   NG*3
  ParticleVelocities   HDF5   NG*3
  ParticleMass         HDF5   NG
  ParticleType         HDF5   NG
  ParticleAttributes   HDF5   NG*NA
Particle Sorting
ParticlePositions    HDF5   NG*3    ->  RING  ->  PPosnnnn    HDF5   ~NG/Ncpu
ParticleVelocities   HDF5   NG*3                  PVelnnnn
ParticleMass         HDF5   NG                    PMassnnnn
ParticleType         HDF5   NG                    PTypennnn
ParticleAttributes   HDF5   NG*NA                 PAttrnnnn
(nnnn = 0001 to Ncpu)
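The diagram above compresses the ring-sorting step into a single arrow. As a hedged illustration of the general "MPI ring sorting" idea (not RING's actual code), the sketch below circulates each task's chunk of particle positions around the ring so that every task can keep the particles that fall in its own domain; the 1-D domain split and all names are assumptions.

// Hedged sketch of MPI ring sorting: each task starts with an
// arbitrary 1/Ncpu chunk of the particle positions and the chunks
// circulate around a ring; at every step a task keeps the particles
// that fall inside its own domain. Illustrative only.
#include <mpi.h>
#include <vector>

// True if position x (in [0,1)) belongs to this task's domain,
// assuming a simple 1-D slab decomposition.
static bool mine(double x, int rank, int nproc)
{
  return static_cast<int>(x * nproc) == rank;
}

void ring_sort(std::vector<double> &chunk,   // circulating particle positions
               std::vector<double> &kept,    // particles this task keeps
               MPI_Comm comm)
{
  int rank, nproc;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &nproc);
  const int next = (rank + 1) % nproc;
  const int prev = (rank + nproc - 1) % nproc;

  for (int step = 0; step < nproc; ++step) {
    // Keep whatever in the current chunk belongs to this task.
    for (double x : chunk)
      if (mine(x, rank, nproc)) kept.push_back(x);

    if (step == nproc - 1) break;   // every chunk has now visited every task

    // Pass the chunk to the next task in the ring: size first, then data.
    int sendn = static_cast<int>(chunk.size()), recvn = 0;
    MPI_Sendrecv(&sendn, 1, MPI_INT, next, 0,
                 &recvn, 1, MPI_INT, prev, 0, comm, MPI_STATUS_IGNORE);
    std::vector<double> incoming(recvn);
    MPI_Sendrecv(chunk.data(), sendn, MPI_DOUBLE, next, 1,
                 incoming.data(), recvn, MPI_DOUBLE, prev, 1,
                 comm, MPI_STATUS_IGNORE);
    chunk.swap(incoming);
  }
}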
Enzo Input
GridDensity      HDF5   ->  ENZO
GridVelocities   HDF5
Particle*        HDF5
PPos*            HDF5
PVel*            HDF5
PMass*           HDF5
PType*           HDF5
PAttr*           HDF5
ENZO Output
ENZO -> Logfiles
Output files, per restart dump, data dump or redshift dump:
  Parameter file with pointers to boundary data
  Hierarchy file with pointers to subgrids
  Boundary file with top grid boundary values
  Ncpu files with packed AMR subgrids
Structure of dumpfiles:
  HDF5 File     --> processor
  HDF5 Group    --> subgrid
  HDF5 DataSet  --> data field
Restart and Data Dump Files
Structure of dumpfiles:
  HDF5 File     --> processor
  HDF5 Group    --> subgrid
  HDF5 DataSet  --> data field
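A minimal sketch of writing one packed dump file in the layout just described: one HDF5 file per processor, one group per subgrid, one dataset per field. The Density field and the 16^3 subgrid size are taken from the example dump on the following slides; the function name, arguments, and omission of error handling are illustrative, and the HDF5 1.6-era C API (3-argument H5Gcreate, 5-argument H5Dcreate) is assumed.

// Sketch of the packed-output layout: one HDF5 file per processor,
// one group per subgrid, one dataset per field. Only the Density
// field is shown; everything else is illustrative.
#include <hdf5.h>
#include <cstdio>
#include <vector>

void write_packed_dump(int my_rank,
                       const std::vector<int> &my_grid_ids,
                       const std::vector<const float*> &density_fields)
{
  // One HDF5 file per processor, e.g. DD0048.cpu0004.
  char fname[64];
  std::snprintf(fname, sizeof(fname), "DD0048.cpu%4.4d", my_rank);
  hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  const hsize_t dims[3] = {16, 16, 16};   // subgrid size from the example

  for (size_t i = 0; i < my_grid_ids.size(); ++i) {
    // One group per subgrid owned by this processor, e.g. Grid00000005.
    char gname[32];
    std::snprintf(gname, sizeof(gname), "Grid%8.8d", my_grid_ids[i]);
    hid_t group = H5Gcreate(file, gname, 0);

    // One dataset per physical field within the group.
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dset  = H5Dcreate(group, "Density", H5T_NATIVE_FLOAT,
                            space, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
             density_fields[i]);

    H5Dclose(dset);
    H5Sclose(space);
    H5Gclose(group);
  }
  H5Fclose(file);
}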
Packed ENZO Output
ds002:/dsgpfs2/harkness/SI/Dumps/DD0048% dir
total 13952
-rw-r--r--  1 harkness use300     6708 Jul 22 20:54 DD0048
-rw-r--r--  1 harkness use300      273 Jul 22 20:54 DD0048.boundary
-rw-r--r--  1 harkness use300   419968 Jul 22 20:54 DD0048.boundary.hdf
-rw-r--r--  1 harkness use300   760424 Jul 22 20:54 DD0048.cpu0000
-rw-r--r--  1 harkness use300   795280 Jul 22 20:54 DD0048.cpu0001
-rw-r--r--  1 harkness use300   714768 Jul 22 20:54 DD0048.cpu0002
-rw-r--r--  1 harkness use300   866120 Jul 22 20:54 DD0048.cpu0003
-rw-r--r--  1 harkness use300  1186408 Jul 22 20:54 DD0048.cpu0004
-rw-r--r--  1 harkness use300   691268 Jul 22 20:54 DD0048.cpu0005
-rw-r--r--  1 harkness use300   582652 Jul 22 20:54 DD0048.cpu0006
-rw-r--r--  1 harkness use300   965408 Jul 22 20:54 DD0048.cpu0007
-rw-r--r--  1 harkness use300    38085 Jul 22 20:54 DD0048.hierarchy
-rw-r--r--  1 harkness use300     3942 Jul 22 20:54 DD0048.procmap
Groups within CPU files
h5ls DD0024.cpu0004
Grid00000005             Group
Grid00000014             Group
Grid00000016             Group
Grid00000017             Group
Groups contain Datasets
h5dump --contents DD0024.cpu0004
HDF5 "DD0024.cpu0004" {
FILE_CONTENTS {
 group      /Grid00000005
 dataset    /Grid00000005/Dark_Matter_Density
 dataset    /Grid00000005/Density
 dataset    /Grid00000005/Electron_Density
 dataset    /Grid00000005/Gas_Energy
 dataset    /Grid00000005/HII_Density
 dataset    /Grid00000005/HI_Density
 dataset    /Grid00000005/HeIII_Density
 dataset    /Grid00000005/HeII_Density
 dataset    /Grid00000005/HeI_Density
 dataset    /Grid00000005/Temperature
 dataset    /Grid00000005/Total_Energy
 dataset    /Grid00000005/particle_index
 dataset    /Grid00000005/particle_mass
 dataset    /Grid00000005/particle_position_x
 dataset    /Grid00000005/particle_position_y
 dataset    /Grid00000005/particle_position_z
 dataset    /Grid00000005/particle_velocity_x
 dataset    /Grid00000005/particle_velocity_y
 dataset    /Grid00000005/particle_velocity_z
 dataset    /Grid00000005/x-velocity
 dataset    /Grid00000005/y-velocity
 dataset    /Grid00000005/z-velocity
 group      /Grid00000014
 dataset    /Grid00000014/Dark_Matter_Density
 . . .
 }
}
Datasets contain subgrid data
Dark_Matter_Density      Dataset {16, 16, 16}
Density                  Dataset {16, 16, 16}
Electron_Density         Dataset {16, 16, 16}
Gas_Energy               Dataset {16, 16, 16}
HII_Density              Dataset {16, 16, 16}
HI_Density               Dataset {16, 16, 16}
HeIII_Density            Dataset {16, 16, 16}
HeII_Density             Dataset {16, 16, 16}
HeI_Density              Dataset {16, 16, 16}
Temperature              Dataset {16, 16, 16}
Total_Energy             Dataset {16, 16, 16}
particle_index           Dataset {3444}
particle_mass            Dataset {3444}
particle_position_x      Dataset {3444}
particle_position_y      Dataset {3444}
particle_position_z      Dataset {3444}
particle_velocity_x      Dataset {3444}
particle_velocity_y      Dataset {3444}
particle_velocity_z      Dataset {3444}
x-velocity               Dataset {16, 16, 16}
y-velocity               Dataset {16, 16, 16}
z-velocity               Dataset {16, 16, 16}
Processor-to-subgrid Map
 1  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0000  Grid00000001
 2  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0007  Grid00000002
 3  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0006  Grid00000003
 4  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0005  Grid00000004
 5  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0004  Grid00000005
 6  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0003  Grid00000006
 7  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0002  Grid00000007
 8  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0001  Grid00000008
 9  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0001  Grid00000009
10  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0002  Grid00000010
11  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0002  Grid00000011
12  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0003  Grid00000012
13  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0003  Grid00000013
14  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0004  Grid00000014
15  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0000  Grid00000015
16  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0004  Grid00000016
17  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0004  Grid00000017
18  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0005  Grid00000018
19  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0006  Grid00000019
20  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0007  Grid00000020
21  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0005  Grid00000021
22  /dsgpfs2/harkness/SI/Dumps/DD0024/DD0024.cpu0007  Grid00000022
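As a small illustration of how a post-processing step might use this map, the sketch below scans a .procmap file for a given grid number and returns the dump file and HDF5 group that hold it. It assumes only the three-column layout shown above (grid number, file path, group name); the function name is illustrative.

// Sketch: look up which dump file and HDF5 group hold a given
// subgrid, using the three-column .procmap layout shown above.
#include <fstream>
#include <sstream>
#include <string>

// Returns true and fills 'path' and 'group' if the grid is found.
bool find_subgrid(const std::string &procmap_file, int grid_number,
                  std::string &path, std::string &group)
{
  std::ifstream in(procmap_file.c_str());
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream row(line);
    int n;
    std::string p, g;
    if ((row >> n >> p >> g) && n == grid_number) {
      path  = p;
      group = g;
      return true;
    }
  }
  return false;
}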
Network Realities
• Why do it at all?
  • NSF allocation (2.1 million SU in 2005) is too large for one center to support
  • Some architectures are better suited for different parts of the computational pipeline
  • Central location for processing and archival storage at SDSC (IBM P690s, HPSS, SAM-QFS, SRB)
  • TeraGrid backbone and GridFTP make it possible…
Local and Remote Resources
• SDSC
  • IBM Power4 P655 and P690 (DataStar)
  • TeraGrid IA-64 Cluster
• NCSA
  • TeraGrid IA-64 Cluster (Mercury)
  • SGI Altix (Cobalt)
  • IBM Power4 P690 (Copper)
• PSC
  • Compaq ES45 Cluster (Lemieux)
  • Cray XT3 (Big Ben)
• LLNL
  • IA-64 Cluster (Thunder)
Network Transfer Options
• GridFTP
  • globus-url-copy
  • tgcp
• Dmover
  • PSC bulk transfers from FAR
• SRB
  • Sput
  • Sget
  • Sregister
• HPSS
  • put
  • get
Recommendations
• Maximize parallelism in all I/O operations
• Use HDF5 or MPI I/O
• Process results while they are on disk
• Never use scp when GridFTP is available
• Containerize your data before archiving it!
• Use checksums when you move data
• Archive your code and metadata as well as your results – the overhead is minimal and you will never regret it!
Acknowledgements
• SDSC Systems Group
• SDSC SRB Group
• PSC Systems Group
• Chris Jordan of the SDSC Systems Group made the movies with VISTA
Movies
• 1024^3 non-adaptive run
• 6 Megaparsec cube
• Star formation and feedback
• Starts at z=99 and runs down to z=3
• 30 TBytes of raw output in 203 dumps
• 7 TBytes of processed output so far
• Run performed in pieces on:
  • PSC TCS1 “Lemieux”
  • NCSA IA-64 Cluster “Mercury”
  • SDSC Power4 “DataStar”