CyberShake Study 2.2: Computational Review
Scott Callaghan
Computational Goals
• 269 CyberShake sites on Kraken with existing SGTs
  • 47 complete, 221 remaining, 1 lost
• Produce seismograms, PSA values, hazard curves
• Establish Kraken / Cray architecture as platform for CyberShake
Inputs
• 221 sets of SGTs + MD5 sums
  • 5 on HPC
  • 184 on Ranger disk
  • 31 on Ranch archive
    • Will need to be staged back to Ranger
  • About 8.5 TB
• Rupture Geometries
  • 14,000 files
  • 1.5 GB
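Because the SGTs ship with MD5 sums and some must be staged back from the archive, a post-transfer integrity check is a natural step. The following is a minimal sketch of such a check, not the study's actual tooling. It assumes an md5sum-style manifest (`<hex digest>  <file name>` per line) stored alongside the files; the function names and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest_path):
    """Check every file listed in an md5sum-style manifest.

    Assumed (hypothetical) layout: one "<hex digest>  <file name>" per line,
    with file names relative to the manifest's directory.
    Returns the names of files that are missing or whose digest differs.
    """
    manifest_path = Path(manifest_path)
    failures = []
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading "*"; drop it.
        clean_name = name.strip().lstrip("*")
        target = manifest_path.parent / clean_name
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(clean_name)
    return failures
```

An empty list from `verify_against_manifest` means every listed file is present and matches its recorded digest; any returned names would be candidates for re-staging.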
Outputs
• Files
  • 116 M seismograms, 116 M PSA files
    • 2.5 TB (2.1 TB new)
  • 350,000 workflow files
    • 1.3 TB (1.1 TB new)
  • Small number of curves, maps
• Database
  • 350 M entries (37% increase)
  • About 40 GB
• Access
  • Hazard curves, maps posted on web site
  • PSA values in DB
  • Seismograms on disk
Computing Environment/Resources
• Kraken nodes
• Pegasus 4.2.0 + PMC
• SGT extraction code
  • Memcached
  • In-memory rupture variation generation
• Seismogram/PSA code
  • Combined
  • Memcached
  • In-memory rupture variation generation
• CyberShake codes tagged in SVN
Computing Resources
• 1.2M Kraken SUs, ~5M SUs available
• Local disk space
  • 3.2 TB (additional) required
  • 4.8 TB available on scec-02
• Duration
  • Start 10/8 (with review approval)
  • ~2 months (dependent on Kraken queue, I/O)
• Personnel
  • Scott
  • Request help from Pegasus group when needed
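The resource figures can be cross-checked with simple arithmetic: the 3.2 TB of additional disk matches the 2.1 TB of new seismogram/PSA data plus the 1.1 TB of new workflow files listed under Outputs. The sketch below repeats that check and derives rough averages. It reads "1.2M Kraken SUs" as the estimated requirement for the 221 remaining sites and takes 1 TB as 1000 GB; both are assumptions, and the constant and function names are illustrative only.

```python
# Figures quoted on the slides, in GB (1 TB taken as 1000 GB for this sketch).
NEW_SEISMOGRAM_PSA_GB = 2100  # "2.1 TB new"
NEW_WORKFLOW_FILES_GB = 1100  # "1.1 TB new"
AVAILABLE_SCEC02_GB = 4800    # "4.8 TB available on scec-02"

SUS_REQUIRED = 1_200_000      # "1.2M Kraken SUs" (read here as the requirement)
SUS_AVAILABLE = 5_000_000     # "~5M SUs available" (approximate)
SITES_REMAINING = 221


def disk_headroom_gb():
    """Return (additional GB required, GB left over on scec-02)."""
    required = NEW_SEISMOGRAM_PSA_GB + NEW_WORKFLOW_FILES_GB
    return required, AVAILABLE_SCEC02_GB - required


def su_budget():
    """Return (average SUs per remaining site, fraction of available SUs used)."""
    per_site = SUS_REQUIRED / SITES_REMAINING
    fraction = SUS_REQUIRED / SUS_AVAILABLE
    return per_site, fraction


if __name__ == "__main__":
    required, headroom = disk_headroom_gb()
    per_site, fraction = su_budget()
    print(f"disk: {required} GB required, {headroom} GB headroom")
    print(f"SUs: about {per_site:.0f} per site, {fraction:.0%} of the allocation")
```

The per-site figure is only an average; actual cost per site will vary with the I/O performance noted under the open issues.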
Reproducibility
• Science code tagged in SVN
• Metadata captured in database
• SGTs long-term
  • Ranger decommissioned in Feb
  • Either archive or throw away and regenerate
Metrics
• Calculate metrics previously highlighted in papers and posters, especially:
  • Average makespan
  • Parallel speedup
  • Utilization
  • Tasks/sec
  • Delay per job
• SI2 metrics
  • Number of hazard curves
• Compare metrics, determine improvement
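The slide names the metrics without defining them. The sketch below uses common workflow-performance definitions, which may differ from those used in the CyberShake papers and posters: makespan as last job end minus first submit, speedup as total core-seconds of work divided by makespan, and utilization as speedup over reserved cores. The `JobRecord` type, its fields, and `workflow_metrics` are hypothetical stand-ins for whatever the workflow logs actually provide.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One job's timing, in seconds since the workflow began (hypothetical schema)."""
    submit: float
    start: float
    end: float
    cores: int = 1


def workflow_metrics(jobs, reserved_cores):
    """Compute summary metrics for one workflow run under the assumed definitions."""
    if not jobs:
        raise ValueError("at least one job record is required")
    makespan = max(j.end for j in jobs) - min(j.submit for j in jobs)
    if makespan <= 0:
        raise ValueError("makespan must be positive")
    # Serial-equivalent work: each job's runtime weighted by the cores it used.
    core_seconds = sum((j.end - j.start) * j.cores for j in jobs)
    speedup = core_seconds / makespan
    return {
        "makespan_s": makespan,
        "parallel_speedup": speedup,
        "utilization": speedup / reserved_cores,
        "tasks_per_sec": len(jobs) / makespan,
        "mean_delay_per_job_s": sum(j.start - j.submit for j in jobs) / len(jobs),
    }
```

Averaging `makespan_s` over all runs gives the average makespan; computing the same dictionary for an earlier study allows the comparison the slide calls for.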
Open Issues / Risk Analysis
• Kraken I/O
  • Depending on file system performance, runtime is variable by a factor of 3
• Kraken gridmanager
  • Support for load?
• SUs
  • Uncertain about usage of other SCEC users
• Statistics gathering
  • Have had issues with pegasus-monitord in the past
  • May have to populate DB after workflow is complete