WRF Software Development and Performance
John Michalakes, NCAR

Contributors:
• NCAR: W. Skamarock, J. Dudhia, D. Gill, A. Bourgeois, W. Wang, C. Deluca, R. Loft
• NOAA/NCEP: T. Black, J. Purser, S. Gopal
• NOAA/FSL: T. Henderson, J. Middlecoff, L. Hart
• U. Oklahoma: M. Xue
• AFWA: J. Wegiel, D. McCormick
• C. Coats (MCNC), J. Schmidt (NRL), V. Balaji (GFDL), S. Chen (UC Davis), J. Edwards (IBM)

Acknowledgement: Significant funding for WRF software development from the DoD HPCMO CHSSI program (CWO6).
WRF Software

Goals
• Community model
• Good performance
• Portable across a range of architectures
• Flexible, maintainable, understandable
• Facilitates code reuse
• Multiple dynamics/physics options
• Run-time configurable
• Nested

Aspects of Design
• Single-source code
• Fortran90 modules, dynamic memory, structures, recursion
• Hierarchical software architecture
• Multi-level parallelism
• CASE: Registry
• Package-neutral APIs: I/O, data formats, communication
• IKJ storage order ("vector's not dead yet!")

WRF Users Workshop
Software Architecture

[Diagram: hierarchical software architecture. Package-independent side: Driver Layer (driver, config inquiry, I/O API, DM comm), Mediation Layer (solve, OMP, config module), Model Layer (tile-callable WRF subroutines). Package-dependent side: external packages for data formats/parallel I/O, message passing, and threads.]

• Driver layer: I/O, communication, multiple nests, state data
• Mediation layer: interface between model and driver
• Model layer: computational routines; tile-callable and thread-safe
• Interfaces to external packages
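The calling pattern across the three layers can be sketched as follows. This is an illustrative Python sketch, not WRF's actual Fortran code; all names (`driver`, `mediation_solve`, `model_layer_physics`) are ours. The point is the separation of concerns: the driver owns state and decomposition, the mediation layer dereferences state and loops over tiles, and model-layer routines see only the tile bounds they are handed.

```python
def model_layer_physics(field, i_start, i_end, j_start, j_end):
    """Model layer: tile-callable and thread-safe; knows nothing about
    decomposition, communication, or I/O -- only its tile bounds."""
    for i in range(i_start, i_end):
        for j in range(j_start, j_end):
            field[i][j] += 1.0  # stand-in for real physics

def mediation_solve(state, tiles):
    """Mediation layer: knows both driver data structures and model
    routines; loops over shared-memory tiles (OpenMP in the real code)."""
    for (i0, i1, j0, j1) in tiles:
        model_layer_physics(state["t"], i0, i1, j0, j1)

def driver(nx=8, ny=8):
    """Driver layer: allocates state, defines the tiling, calls solve."""
    state = {"t": [[0.0] * ny for _ in range(nx)]}
    tiles = [(0, nx // 2, 0, ny), (nx // 2, nx, 0, ny)]  # two tiles
    mediation_solve(state, tiles)
    return state

state = driver()
```

Because only the mediation layer touches both sides, the model layer stays portable and the driver can change its parallelism strategy without rewriting physics.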
WRF Multi-Layer Domain Decomposition

A single version of the code executes efficiently on:
• Distributed-memory machines
• Shared-memory machines
• Clusters of SMPs
• Vector and microprocessor architectures

[Diagram: logical domain divided into patches; one patch shown divided into multiple tiles; inter-processor communication at patch boundaries]

Model domains are decomposed for parallelism on two levels:
• Patch: section of the model domain allocated to a distributed-memory node
• Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model-layer subroutine
• Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches
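The two-level decomposition above can be sketched in a few lines. This is an illustrative Python sketch (names are ours, not WRF's): a dimension is split into patches, one per distributed-memory node, and each patch is split again into tiles, one per shared-memory processor. A 1-D decomposition of the 425-column CONUS dimension is shown purely as an example.

```python
def decompose(start, n, parts):
    """Split the index range [start, start+n) into `parts` nearly
    equal contiguous [lo, hi) pieces."""
    base, rem = divmod(n, parts)
    pieces, lo = [], start
    for p in range(parts):
        hi = lo + base + (1 if p < rem else 0)  # spread the remainder
        pieces.append((lo, hi))
        lo = hi
    return pieces

# Example: 425 columns onto 4 distributed-memory patches,
# each patch onto 4 shared-memory tiles.
patches = decompose(0, 425, 4)
tiles = {p: decompose(lo, hi - lo, 4)
         for p, (lo, hi) in enumerate(patches)}
```

Every cell lands in exactly one tile of exactly one patch, so message passing is needed only at patch boundaries while threads share memory within a patch.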
I/O Architecture

Requirements of the I/O infrastructure:
• Efficiency: the key concern for operations
• Flexibility: the key concern in research
• Both types of user institution are already heavily invested in I/O infrastructure (operations: GRIB, BUFR; research: NetCDF, HDF)
• "Portable I/O": adaptable to a range of uses and installations without affecting WRF or the other programs that use the I/O infrastructure
I/O Architecture (continued)

WRF I/O API:
• Package-independent interface to NetCDF, fast-binary, and HDF (planned)
• Random access to fields by timestamp/name
• Full transposition to arbitrary memory order
• Built-in support for reading/writing parallel file systems (planned)
• Data-set-centric rather than file-centric (planned); grid computing

Additional WRF model functionality:
• Collection/distribution of decomposed data to/from serial datasets
• Fast, asynchronous "quilt-server" I/O from the NCEP Eta model
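The package-independent design above can be sketched as an abstract interface with pluggable backends. This is a hypothetical Python sketch (the real WRF I/O API is a Fortran interface; all class and method names here are ours): the model codes against one interface offering random access by field name and timestamp, so swapping NetCDF for fast-binary touches no model code.

```python
class IOBackend:
    """Abstract backend: random access to fields by (name, timestamp),
    mirroring the package-independent API described above."""
    def read_field(self, name, timestamp):
        raise NotImplementedError
    def write_field(self, name, timestamp, data):
        raise NotImplementedError

class InMemoryBackend(IOBackend):
    """Toy stand-in for a real NetCDF or fast-binary backend."""
    def __init__(self):
        self._store = {}
    def write_field(self, name, timestamp, data):
        self._store[(name, timestamp)] = data
    def read_field(self, name, timestamp):
        return self._store[(name, timestamp)]

io = InMemoryBackend()  # could be a NetCDF-backed implementation instead
io.write_field("T2", "2002-04-01_00:00:00", [280.0, 281.5])
temps = io.read_field("T2", "2002-04-01_00:00:00")
```

Keyed random access (rather than sequential file reads) is what allows the same interface to front files, parallel file systems, or remote data sets.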
I/O Performance

[Figure: I/O bandwidth (bytes/second, scale up to 120,000,000) vs. number of I/O servers (0, 1, 4), comparing NetCDF and binary output; annotated rates of 5 MB/s and 16 MB/s]
WRF Performance

Platforms:
• IBM SP (blackforest.ucar.edu): 293 nodes, each 4 x 375 MHz Power3; peak 1500 Mflop/s per CPU
• Compaq TCS (lemieux.psc.edu): 750 nodes, each 4 x 1 GHz EV68; peak 2000 Mflop/s per CPU

Scaling efficiency (32 to 512 PEs):
• IBM: 69%
• Compaq: 57%

Efficiency relative to peak:
• 32 PEs: IBM 7%, Compaq 20%
• 512 PEs: IBM 5%, Compaq 11%

Sustained performance:
• IBM: 39 Gflop/s
• Compaq: 110 Gflop/s

Benchmark case (12 km CONUS):
• 425 x 300 x 35 grid (4.5 million cells)
• 22 Gflop per time step
• 48-hour forecast: 21 minutes on 128 PEs, 8 minutes on 512 PEs (I/O time not included)
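The "efficiency relative to peak" figures at 512 processors follow directly from the sustained rates and per-CPU peaks quoted above, as this small check shows (all input values are from the slide):

```python
ibm_peak_per_cpu = 1.5     # Gflop/s per CPU (375 MHz Power3)
compaq_peak_per_cpu = 2.0  # Gflop/s per CPU (1 GHz EV68)
pes = 512

# Efficiency = sustained rate / aggregate peak rate
ibm_eff = 39.0 / (pes * ibm_peak_per_cpu)
compaq_eff = 110.0 / (pes * compaq_peak_per_cpu)

print(f"IBM:    {ibm_eff:.0%}")     # ~5%, as quoted
print(f"Compaq: {compaq_eff:.0%}")  # ~11%, as quoted
```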
Model Performance

Efficiency with respect to other models:
• WRF costs about 2x the NCEP Eta model (as of mid-2001)
• Complexity: WRF performs 1.6 times more operations than Eta for a given period of integration
• Code efficiency: WRF sustains 0.78 of Eta's per-operation efficiency
• Scientific or forecast efficiency...?
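The ~2x cost figure is just the product of the two factors above, since doing more operations at lower per-operation efficiency compounds:

```python
ops_ratio = 1.6   # WRF does 1.6x Eta's operations per integration period
code_eff = 0.78   # WRF sustains 0.78 of Eta's per-operation efficiency

relative_cost = ops_ratio / code_eff
print(round(relative_cost, 2))  # 2.05, i.e. roughly 2x
```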
Summary

Status:
• Third release: WRF V1.2, April 2002
• Systems: IBM, Compaq, SGI, PC/Alpha Linux
• Nesting and 3DVAR: first implementations this summer

The WRF software architecture is designed to support development and maintenance as a community model serving operational and research users, over a range of applications and on a variety of computing architectures.

Additional information: http://www.wrf-model.org