PetaApps: Update on software engineering and performance

Presentation Transcript


  1. PetaApps: Update on software engineering and performance. J. Dennis, M. Vertenstein, N. Hearn

  2. Code Base Update
      • Trunk+ means ccsm4 release code + IE mods
      • scripts – trunk+ (just in)
          • fixes build problem inherent in alpha38+
      • cice – trunk+
          • has weighted space-filling curves and restarts working on tripole grid
          • has OpenMP threading capability
          • has PIO for history and restarts (netcdf)
          • has multi-frequency history capability (1 file per day)

  3. Code Base Update (cont'd)
      • pop – alpha38+
          • has fix to tripole grid problem, and restarts are working
          • has multi-frequency history capability (1 file per day)
          • TO DO: migrate time series capability from trunk onto alpha38+
          • TO DO: migrate PIO capability from trunk onto alpha38+
          • TO DO: OpenMP threading capability is not functional (ORNL working on this)

  4. Code Base Update (cont'd)
      • cam – alpha38+
          • TO DO: migrate cam to cam trunk, which will then provide PIO (almost done by Nathan)
      • clm – alpha38+
      • drv – alpha38+
          • interactive ensembles for atm functional
          • TO DO: interactive ensembles for ice in progress
          • TO DO: migrate driver to the head of the trunk, where interactive ocean ensembles have been implemented

  5. Interactive Ensemble Runs Update
      • TO DO: Finish validation of 2-degree atm / 1-degree ocean interactive ensembles
      • POP convergence problem at year 150 for low-res IE
          • Reduce pop time step
      • Problem with branch/hybrid start for IE from HRC03
      • Demonstrated functionality with a 10-member atm ensemble for high-res
      • Execute high-res interactive ensemble run

  6. Status of TRAC allocation

  7. Experiences on Kraken
      • Somewhat behind on cycle usage
      • Highly variable disk I/O performance (~18x)
      • Using little-endian binary writes avoids performing 4K writes to file system
      • Job performance dependent on node mapping
      • Some jobs are ~20% slower [excludes I/O]

  8. Job Placement of CCSM within the Torus. [Figure legend: White = Ice only; Blue = Ocean; Green = Land; Red = Atmosphere & Ice.] Courtesy of Nick Jones

  9. Courtesy of Nick Jones

  10. Experiences on Kraken
      • Somewhat behind on cycle usage
      • Highly variable disk I/O performance (~18x)
      • Using little-endian binary writes avoids performing 4K writes to file system
      • Job performance dependent on node mapping
      • Some jobs are ~20% slower [excludes I/O]
      • Friendly user access
      • Invaluable for development effort
      • Now can run < 1 GB per core
      • Multi-frequency support in CICE, POP
      • Hex-core improves CCSM performance

  11. Kraken Upgrade
      • Started August 1st October 5th
      • OS upgrade
      • Significant increase in job failures [1/3 of all jobs failed]
      • Subset of nodes upgraded to hex-core
      • Queue wait became excessive
      • Friendly user access

  12. Queue access on Kraken

  13. Kraken Upgrade
      • Started August 1st October 5th
      • OS upgrade
      • Significant increase in job failures [1/3 of all jobs failed]
      • Subset of nodes upgraded to hex-core
      • Queue wait became excessive
      • Friendly user access
      • Entire system down for upgrade
      • Access to Athena
      • Friendly user access
      • What changed?
          • CPU:
              • quad-core to hex-core [12 cores per node]
              • Improved memory controller
          • Memory:
              • All nodes to 16 GB per node (1.3 GB per core)

  14. Simulation cost [HRC03]
      • CCSM(1,1,1,1) @ f0.5_tx0.1v2 on 5848 cores
      • Monthly output [Historical perspective]
          • First time [ATLAS]: 140K per year [0.8 SYPD], early 2008
          • NERSC [XT4]: 100K per year [1.3 SYPD], fall 2008
          • Budgeted [XT4]: 89K per year [1.6 SYPD], early 2009
          • Actual [XT5]: 81K per year [1.8 SYPD], summer 2009
          • Measured [XT5]: 65K per year [2.1 SYPD], fall 2009
      • Upgraded hex-core system
      • Small user group
      • Monthly + Daily output
          • Measured: 91K per year [1.6 SYPD]
      • Observations
          • Time to complete additional 100 years [61 days wall-clock]
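The wall-clock estimate on this slide follows from the throughput figure: SYPD is simulated years per wall-clock day, so the run time in days is the number of simulated years divided by SYPD. The sketch below is illustrative only; the function name is hypothetical and it assumes uninterrupted execution with no queue wait.

```python
def wall_clock_days(simulated_years: float, sypd: float) -> float:
    """Wall-clock days needed to simulate `simulated_years`
    at a throughput of `sypd` simulated years per day.

    Assumes continuous execution (no queue wait or downtime).
    """
    if sypd <= 0:
        raise ValueError("sypd must be positive")
    return simulated_years / sypd


# Slide figures: 100 additional years at the measured 1.6 SYPD.
days = wall_clock_days(100, 1.6)
print(f"{days:.1f} wall-clock days")  # 62.5 wall-clock days
```

The result (62.5 days) is close to the 61 days quoted on the slide; the small gap is presumably due to rounding of the quoted throughput.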

  15. Simulation cost (cont'd)
      • CCSM(10,1,1,1) @ f0.5_tx0.1v2 on 7434 cores
      • Monthly + Daily output
          • Budgeted: 234K per year
          • Measured: 120K per year [1.5 SYPD]
          • On Cray XT4
      • Observations
          • Significantly cheaper than budgeted!
          • Implied start times: mid-January 2010 [41 days wall-clock]

  16. ATM-IE performance on 7434 cores on Cray XT4. ATM on 480 cores per ensemble (10 members); 1.5 SYPD; 120K per year. Problem in CPL7 currently limits parallelism to 2000
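The cost and throughput figures for this configuration are consistent with a simple relation, assuming "K per year" means thousands of core-hours per simulated year (the slides do not state the unit explicitly): a job holding a fixed number of cores for 24 hours delivers SYPD simulated years, so the cost per simulated year is cores × 24 / SYPD. A minimal sketch, with a hypothetical function name:

```python
def core_hours_per_simulated_year(cores: int, sypd: float) -> float:
    """Core-hours charged per simulated year.

    Assumption: cost is billed as cores * wall-clock hours, and
    `sypd` is simulated years completed per 24-hour wall-clock day.
    """
    if sypd <= 0:
        raise ValueError("sypd must be positive")
    return cores * 24.0 / sypd


# Slide figures: 7434 cores at 1.5 SYPD.
cost = core_hours_per_simulated_year(7434, 1.5)
print(round(cost))  # 118944
```

Under that assumption the result, about 119K core-hours per simulated year, agrees with the 120K per year reported on the slide.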

  17. Simulation cost (cont'd)
      • CCSM(10,1,10,1) @ f0.5_tx0.1v2 on 6000 cores
      • ICE-IE is still being tested/developed
      • Monthly + Daily output
          • Budgeted: 234K per year [0.8 SYPD]
      • Observations
          • Implied start times: December 1st, 2009 [79 days wall-clock]

  18. Resource requirements: TRAC1

  19. Resource requirements: TRAC2. Ice IE experiment moved to second year

  20. Resource requirements: PRAC
