
Parallelizing ROMS for Distributed Memory Machines using the Scalable Modeling System (SMS)

Presentation Transcript


  1. Parallelizing ROMS for Distributed Memory Machines using the Scalable Modeling System (SMS) Dan Schaffer NOAA Forecast Systems Laboratory (FSL) August 2001

  2. Outline • Who we are • Intro to SMS • Application of SMS to ROMS • Ongoing Work • Conclusion

  3. Who we are • Mark Govett • Leslie Hart • Tom Henderson • Jacques Middlecoff • Dan Schaffer • Developing SMS for 20+ man-years

  4. Intro to SMS • Overview • Directive-based • FORTRAN comments (illustration below) • Enables single-source parallelization • Distributed or shared memory machines • Performance portability
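Because each SMS directive is just a Fortran comment line with the csms$ prefix, a standard compiler ignores it and the same source still builds serially. A one-line illustration (the directive name is from the list on slide 10; the variable name is illustrative):

c     an ordinary compiler treats the next line as a comment; PPP
c     rewrites it into halo-exchange communication for array x
csms$exchange(x)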

  5. Distributed Memory Parallelism

  6. Code Parallelization using SMS • Original Serial Code + SMS Directives → SMS Serial Code • Compiled directly, the SMS serial code builds a Serial Executable • Run through PPP (the Parallel Pre-Processor), it becomes SMS Parallel Code, which builds a Parallel Executable

  7. Low-Level SMS • SMS Parallel Code is built on the SMS runtime layers: NNT, SST, SRS, the Spectral Library, the FDA Library, and Parallel I/O • These layers sit on top of MPI, SHMEM, etc.

  8. Intro to SMS (contd) • Support for all of F77 plus much of F90 including: • Dynamic memory allocation • Modules (partially supported) • User-defined types • Supported Machines • COMPAQ Alpha-Linux Cluster (FSL “Jet”) • PC-Linux Cluster • SUN Sparcstation • SGI Origin 2000 • IBM SP-2

  9. Intro to SMS (contd) • Models Parallelized • Ocean : ROMS, HYCOM, POM • Mesoscale Weather : FSL RUC, FSL QNH, NWS Eta, Taiwan TFS (Nested) • Global Weather : Taiwan GFS (Spectral) • Atmospheric Chemistry : NOAA Aeronomy Lab

  10. Key SMS Directives
  • Data Decomposition : csms$declare_decomp, csms$create_decomp, csms$distribute
  • Communication : csms$exchange, csms$reduce
  • Index Translation : csms$parallel
  • Incremental Parallelization : csms$serial
  • Performance Tuning : csms$flush_output
  • Debugging Support : csms$reduce (bitwise exact), csms$compare_var, csms$check_halo

  11. SMS Serial Code
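A minimal sketch of what the SMS serial code on this slide could look like, using the directive names from slide 10. The decomposition handle dh, array names, halo widths, and the exact directive argument forms are illustrative assumptions; consult the SMS documentation for the precise syntax.

      program sms_sketch
      implicit none
      integer, parameter :: im = 128, jm = 128
      integer i, j
c     declare a decomposition handle (dh) for the 2-D grid
csms$declare_decomp(dh)
c     mark a and b as decomposed across dims 1 and 2
csms$distribute(dh, 1, 2) begin
      real a(im, jm), b(im, jm)
csms$distribute end
c     build the decomposition at run time (halo width 1 assumed)
csms$create_decomp(dh, <im, jm>, <1, 1>)
      a = 1.0
      b = 0.0
c     PPP translates these global loop bounds to each process's
c     local index range
csms$parallel(dh, <i>, <j>) begin
      do j = 2, jm - 1
        do i = 2, im - 1
          b(i, j) = 0.25 * (a(i-1, j) + a(i+1, j)
     &                    + a(i, j-1) + a(i, j+1))
        enddo
      enddo
csms$parallel end
c     update halo points of b before neighbors read them
csms$exchange(b)
      end

Compiled as-is, every csms$ line is an ordinary comment, so this file still builds a serial executable; PPP rewrites the directives into the parallel code of slide 6.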

  12. Advanced Features • Nesting • Incremental Parallelization • Debugging Support (run-time configurable; sketch below) • CSMS$REDUCE • Enables bit-wise exact reductions • CSMS$CHECK_HALO • Verifies a halo region is up-to-date • CSMS$COMPARE_VAR • Compares variables between simultaneous runs with different numbers of processors • HYCOM 1-D decomp parallelized in 9 days
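A hedged fragment showing where the debugging directives might be placed; the argument forms and the routine stencil_update are assumptions, not taken from the SMS manual.

c     verify a's halo region is up-to-date before it is read
csms$check_halo(a, 'pre_stencil')
      call stencil_update(a, b)
c     with comparison enabled at run time, simultaneous runs with
c     different process counts compare b here and flag differences
csms$compare_var(b, 'post_stencil')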

  13. Incremental Parallelization • SMS Directive: CSMS$SERIAL • Not-yet-parallel code (e.g. CALL NOT_PARALLEL(...)) is wrapped in CSMS$SERIAL; decomposed arrays are gathered from “local” to “global” before the enclosed code runs, and scattered from “global” back to “local” afterward (sketch below)
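A sketch of the pattern (CALL NOT_PARALLEL comes from the slide; the arguments and exact directive form are illustrative):

c     gather decomposed arrays to "global" copies, run the enclosed
c     serial code, then scatter results back to the "local" arrays
csms$serial begin
      call not_parallel(a, b)
csms$serial end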

  14. Advanced Features (contd) • Overlapping Output with Computations (FORTRAN Style I/O only) • Run-time Process Configuration • Specify • number of processors per decomposed dim or • number of grid points per processor • 15% performance boost for HYCOM • Support for irregular grids coming soon

  15. SMS Performance (Eta) • Eta model run in production at NCEP for use in National Weather Service Forecasts • 16000 Lines of Code (excluding comments) • 198 SMS Directives added to the code

  16. Eta Performance • Performance measured on the NCEP SP2 • I/O excluded • Resolution : 223x365x45 • 88-PE run-time beats NCEP hand-coded MPI by 1% • 88-PE exchange time beats hand-coded MPI by 17%

  17. SMS Performance (HYCOM) • 4500 Lines of Code (excluding comments) • 108 OpenMP directives included in the code • 143 SMS Directives added to the code

  18. HYCOM Performance • Performance measured on O2K • Resolution : 135x256x14 • Serial code runs in 136 seconds

  19. Intro to SMS (contd) • Extensive documentation available on the web • New development aided by • Regression test suite • Web-based bug tracking system

  20. Outline • Who we are • Intro to SMS • Application of SMS to ROMS • Ongoing Work • Conclusion

  21. SMS ROMS Implementation • Used awk and cpp to convert ROMS to dynamic memory, simplifying SMS parallelization • Leveraged the existing shared-memory parallelism (loops already written as do I = ISTR, IEND) • Directives added to handle the NEP scenario • 13000 Lines of Code, 132 SMS directives • Handled netCDF I/O with CSMS$SERIAL (sketch below)
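A hedged sketch of the two patterns above. The directive argument forms are assumptions, and zeta, rhs, dt, and write_netcdf_history are illustrative names rather than actual ROMS code.

c     the existing shared-memory loop bounds become each process's
c     local bounds once PPP translates the code
csms$parallel(dh, <i>, <j>) begin
      do j = jstr, jend
        do i = istr, iend
          zeta(i, j) = zeta(i, j) + dt * rhs(i, j)
        enddo
      enddo
csms$parallel end
c     netCDF output is performed serially on gathered global arrays
csms$serial begin
      call write_netcdf_history(zeta)
csms$serial end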

  22. Results and Performance • Runs and produces correct answers on all supported SMS machines • Low Resolution (128x128x30) : scaling measured on “Jet”, O2K, and T3E; run-times are for the main loop (21 time steps), excluding I/O • High Resolution (210x550x30) : PMEL is using it in production; 97% efficiency between 8 and 16 processors on “Jet”

  23. SMS Low Res ROMS “Jet” Performance

  24. SMS Low Res ROMS O2K Performance

  25. SMS Low Res ROMS T3E Performance

  26. Outline • Who we are • Intro to SMS • Application of SMS to ROMS • Ongoing Work • Conclusion

  27. Ongoing Work (funding dependent) • Full F90 Support • Support for parallel netCDF • T3E port • SHMEM implementation on T3E, O2K • Parallelize other ROMS scenarios • Implement SMS nested ROMS • Implement SMS coupled ROMS/COAMPS

  28. Conclusion • SMS is a high-level, directive-based tool • Simple single-source parallelization • Performance optimizations provided • Strong debugging support included • Performance beats hand-coded MPI • SMS is performance portable

  29. Web-Site www-ad.fsl.noaa.gov/ac/sms.html
