Recent Developments for Parallel CMAQ • Jeff Young AMDB/ASMD/ARL/NOAA • David Wong SAIC – NESCC/EPA
AQF-CMAQ
• Running in “quasi-operational” mode at NCEP
• 26+ minutes for a 48-hour forecast (on 33 processors)
• Code modifications to improve data locality
• Some vdiff optimization
• Jerry Gipson’s MEBI/EBI chem solver
• Tried CGRID( SPC, LAYER, COLUMN, ROW ) – tests indicated it is not good for MPI communication of data (see the layout sketch below)
• Some background: MPI data communication – ghost (halo) regions
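A minimal layout sketch of the two CGRID index orderings just mentioned (the array sizes and the per-species slab framing are illustrative assumptions, not the actual AQF-CMAQ values). Fortran stores arrays column-major, so the leftmost index varies fastest in memory:

   ! Sketch only: sizes are illustrative, not the AQF-CMAQ dimensions.
   PROGRAM cgrid_layout_sketch
      IMPLICIT NONE
      INTEGER, PARAMETER :: NCOLS = 21, NROWS = 18, NLAYS = 22, NSPCS = 35

      ! Standard-style ordering: for a single species and layer, a row of
      ! cells is contiguous and a column has stride NCOLS, so the ghost-region
      ! slabs exchanged for horizontal transport stay reasonably compact.
      REAL :: cgrid_std( NCOLS, NROWS, NLAYS, NSPCS )

      ! Tested ordering CGRID( SPC, LAYER, COLUMN, ROW ): all species of one
      ! grid cell are contiguous, good locality for cell-by-cell chemistry,
      ! but the cells of a single species/layer slab are now NSPCS*NLAYS
      ! elements apart, scattering the data an MPI ghost exchange must move.
      REAL :: cgrid_alt( NSPCS, NLAYS, NCOLS, NROWS )

      cgrid_std = 0.0
      cgrid_alt = 0.0
      PRINT *, 'elements per array:', SIZE( cgrid_std ), SIZE( cgrid_alt )
   END PROGRAM cgrid_layout_sketch

With SPC first, the species of one cell are contiguous, which helps chemistry-solver locality, but the cells of a single-species slab become widely strided, consistent with the tests that found the ordering unfavorable for MPI communication.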
Ghost (Halo) Regions

   DO J = 1, N
      DO I = 1, M
         ! A( I+2,J ) and A( I,J-1 ) can reach past this processor's
         ! subdomain, so those off-processor cells must already sit in a
         ! local ghost (halo) region when the loop runs
         DATA( I,J ) = A( I+2,J ) * A( I,J-1 )
      END DO
   END DO
Horizontal Advection and Diffusion Data Requirements
[Diagram: processor subdomains with ghost regions and the exterior boundary]
Stencil Exchange Data Communication Function
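A minimal, generic sketch of the kind of ghost-region (stencil) exchange described above. This is not the actual CMAQ stencil-exchange/pario interface; the 1-D column decomposition, array sizes, and names are illustrative assumptions. Each process swaps one ghost column with its east and west neighbors using MPI_SENDRECV:

   ! Sketch only: 1-D decomposition across columns with a 1-cell ghost
   ! column on each side; the real stencil exchange handles 2-D layouts
   ! and ghost depths set by the numerical stencil (the I+2 reach in the
   ! loop on the previous slide would need a depth of 2).
   PROGRAM halo_exchange_sketch
      USE mpi
      IMPLICIT NONE
      INTEGER, PARAMETER :: NROWS = 142, MYCOLS = 21     ! illustrative sizes
      REAL    :: a( NROWS, 0:MYCOLS+1 )                  ! cols 0 and MYCOLS+1 are ghosts
      INTEGER :: ierr, rank, nprocs, west, east, stat( MPI_STATUS_SIZE )

      CALL MPI_INIT( ierr )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )

      west = rank - 1
      IF ( west .LT. 0 ) west = MPI_PROC_NULL            ! exterior boundary: no neighbor
      east = rank + 1
      IF ( east .GE. nprocs ) east = MPI_PROC_NULL

      a = REAL( rank )                                   ! stand-in for real data

      ! Send my easternmost interior column to the east neighbor's west ghost;
      ! receive my west ghost from the west neighbor's easternmost column.
      CALL MPI_SENDRECV( a( :, MYCOLS ), NROWS, MPI_REAL, east, 1,            &
                         a( :, 0 ),      NROWS, MPI_REAL, west, 1,            &
                         MPI_COMM_WORLD, stat, ierr )
      ! Mirror-image exchange to fill the east ghost column.
      CALL MPI_SENDRECV( a( :, 1 ),        NROWS, MPI_REAL, west, 2,          &
                         a( :, MYCOLS+1 ), NROWS, MPI_REAL, east, 2,          &
                         MPI_COMM_WORLD, stat, ierr )

      CALL MPI_FINALIZE( ierr )
   END PROGRAM halo_exchange_sketch

A 2-D decomposition repeats the same pattern in the north-south direction, with the exterior-boundary sides taking their values from boundary-condition data rather than from a neighbor process.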
Architectural Changes for Parallel I/O
• Tests confirmed the latest releases (May, Sep 2003) are not very scalable
• Background: Original Version, Latest Release, AQF Version
Parallel I/O
• 2002 Release: computation only / read, write and computation
• 2003 Release: read and computation / write and computation
• AQF: asynchronous write only; data transfer by message passing
Modifications to the I/O API for Parallel I/O
For 3D data, e.g. …
• time interpolation:
  INTERP3 ( FileName, VarName, ProgName, Date, Time, Ncols*Nrows*Nlays, Data_Buffer )
• spatial subset:
  XTRACT3 ( FileName, VarName, StartLay, EndLay, StartRow, EndRow, StartCol, EndCol, Date, Time, Data_Buffer )
• time interpolation for a spatial window:
  INTERPX ( FileName, VarName, ProgName, StartCol, EndCol, StartRow, EndRow, StartLay, EndLay, Date, Time, Data_Buffer )
• patch write:
  WRPATCH ( FileID, VarID, TimeStamp, Record_No ) – called from PWRITE3 (pario)
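A usage sketch of the windowed read, following the INTERPX argument list above. The file and variable names ('METCRO3D', 'TA') and the my_* window indices are hypothetical, and the sketch assumes the Models-3 I/O API is linked and the file already opened; INTERPX is a LOGICAL function:

   ! Sketch only: each worker reads and time-interpolates just its own
   ! column/row window instead of the whole Ncols*Nrows*Nlays grid that
   ! INTERP3 would return.
   SUBROUTINE read_met_patch( my_startcol, my_endcol, my_startrow, my_endrow, &
                              nlays, jdate, jtime, patch_buffer )
      IMPLICIT NONE
      INTEGER, INTENT( IN )  :: my_startcol, my_endcol, my_startrow, my_endrow
      INTEGER, INTENT( IN )  :: nlays, jdate, jtime
      REAL,    INTENT( OUT ) :: patch_buffer( my_endcol - my_startcol + 1,    &
                                              my_endrow - my_startrow + 1, nlays )
      LOGICAL, EXTERNAL      :: INTERPX

      IF ( .NOT. INTERPX( 'METCRO3D', 'TA', 'READ_MET_PATCH',                 &
                          my_startcol, my_endcol,                             &
                          my_startrow, my_endrow,                             &
                          1, nlays, jdate, jtime, patch_buffer ) ) THEN
         PRINT *, 'INTERPX failed for TA'
      END IF
   END SUBROUTINE read_met_patch

On the output side, WRPATCH (called from PWRITE3 in pario) writes a single processor's patch of a variable.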
Standard (May 2003 Release)

DRIVER
  read IC’s into CGRID
  begin output timestep loop
    advstep (determine sync timestep)
    couple
    begin sync timestep loop
      SCIPROC:
        X-Y-Z advect
        adjadv
        hdiff
        decouple
        vdiff (writes DRYDEP)
        cloud (writes WETDEP)
        gas chem (aero writes VIS)
        couple
    end sync timestep loop
    decouple
    write conc and avg conc (CONC, ACONC)
  end output timestep loop
AQF-CMAQ

DRIVER
  set WORKERs, WRITER
  if WORKER, read IC’s into CGRID
  begin output timestep loop
    if WORKER
      advstep (determine sync timestep)
      couple
      begin sync timestep loop
        SCIPROC:
          X-Y-Z advect
          hdiff
          decouple
          vdiff
          cloud
          gas chem (aero)
          couple
      end sync timestep loop
      decouple
      MPI send conc, aconc, drydep, wetdep, (vis)
    if WRITER
      completion-wait: for conc, write conc (CONC); for aconc, write aconc; etc.
  end output timestep loop
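A minimal sketch of the worker/writer split in the AQF driver above. The rank assignment, message tag, flat buffer, and patch size are illustrative assumptions; the actual code uses its own pario machinery and a completion-wait rather than this simple blocking receive:

   ! Sketch only: the last rank acts as the dedicated WRITER; every other
   ! rank is a WORKER that sends its local concentration patch by MPI
   ! instead of writing to the output files itself.
   PROGRAM worker_writer_sketch
      USE mpi
      IMPLICIT NONE
      INTEGER, PARAMETER :: PATCH = 21 * 18 * 22 * 35    ! illustrative patch size
      REAL    :: conc( PATCH )
      INTEGER :: ierr, rank, nprocs, writer, w, stat( MPI_STATUS_SIZE )

      CALL MPI_INIT( ierr )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
      writer = nprocs - 1                                ! one WRITER, nprocs-1 WORKERs

      IF ( rank .NE. writer ) THEN                       ! WORKER: science, then send
         conc = REAL( rank )                             ! stands in for SCIPROC results
         CALL MPI_SEND( conc, PATCH, MPI_REAL, writer, 100, MPI_COMM_WORLD, ierr )
      ELSE                                               ! WRITER: gather patches, then write
         DO w = 0, nprocs - 2
            CALL MPI_RECV( conc, PATCH, MPI_REAL, w, 100, MPI_COMM_WORLD, stat, ierr )
            ! ... here the writer would assemble the patches and write the
            !     CONC (and ACONC, DRYDEP, WETDEP) records via the I/O API.
         END DO
      END IF

      CALL MPI_FINALIZE( ierr )
   END PROGRAM worker_writer_sketch

The intent of the dedicated writer is to overlap output with the workers’ computation: workers hand off data by message passing and continue, while the writer completes the file output.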
Power3 Cluster (NESCC’s LPAR)
[Diagram: three 16-CPU SP-Nighthawk nodes, each with shared memory, connected by a switch]
• The platform (cypress00, cypress01, cypress02) consists of 3 SP-Nighthawk nodes
• 2X slower than NCEP’s
• All CPUs share user applications with file servers, interactive use, etc.
Power4 p690 Servers (NCEP’s LPAR)
[Diagram: 4-CPU nodes, each with its own memory, connected by a switch; 2 “colony” SW connectors per node]
• Each platform (snow, frost) is composed of 22 p690 (Regatta) servers
• Each server has 32 CPUs, LPAR-ed into 8 nodes per server (4 CPUs per node)
• ~2X the performance of the cypress nodes
• Some nodes are dedicated to file servers, interactive use, etc.
• There are effectively 20 servers for general use (160 nodes, 640 CPUs)
“ice” Beowulf Cluster
[Diagram: six 2-CPU nodes, each with its own memory, on an internal network]
• Pentium 3 – 1.4 GHz
• Internal network isolated from outside network traffic
“global” MPICH Cluster
[Diagram: six 2-CPU nodes (global, global1, global2, global3, global4, global5), each with its own memory, on a network]
• Pentium 4 XEON – 2.4 GHz
RESULTS
• 5-hour (12Z–17Z) and 24-hour (12Z–12Z) runs
• 20 Sept 2002 test data set used for developing the AQF-CMAQ
• Input met from ETA, processed through PRDGEN and PREMAQ
• 166 columns X 142 rows X 22 layers at 12 km resolution (domain seen on following slides)
• CB4 mechanism, no aerosols
• Pleim’s Yamartino advection for AQF-CMAQ; PPM advection for the May 2003 Release
The Matrix
• Processor layouts: 4 = 2 X 2, 8 = 4 X 2, 16 = 4 X 4, 32 = 8 X 4, 64 = 8 X 8, 64 = 16 X 4
• run = comparison of run times
• wall = comparison of relative wall times for the main science processes
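A small sketch of how the 166-column X 142-row domain (from the RESULTS slide) splits under these layouts. Both the division rule (remainder cells go to the leading subdomains) and the assumption that the first factor counts subdomains along the columns are for illustration only, not the actual CMAQ partitioning code:

   ! Sketch only: prints the subdomain column/row counts for each layout
   ! in the matrix, assuming the first factor splits the columns.
   PROGRAM aspect_ratio_sketch
      IMPLICIT NONE
      INTEGER, PARAMETER :: GL_NCOLS = 166, GL_NROWS = 142   ! AQF 12 km domain
      INTEGER :: npcol( 6 ) = (/ 2, 4, 4, 8, 8, 16 /)
      INTEGER :: nprow( 6 ) = (/ 2, 2, 4, 4, 8,  4 /)
      INTEGER :: i, base, extra

      DO i = 1, 6
         base  = GL_NCOLS / npcol( i )
         extra = MOD( GL_NCOLS, npcol( i ) )
         PRINT *, npcol( i ), 'X', nprow( i ), ': cols per subdomain =',      &
                  base + MIN( extra, 1 ), 'or', base
         base  = GL_NROWS / nprow( i )
         extra = MOD( GL_NROWS, nprow( i ) )
         PRINT *, '          rows per subdomain =', base + MIN( extra, 1 ), 'or', base
      END DO
   END PROGRAM aspect_ratio_sketch

Under those assumptions the 8 X 4 layout, for example, gives patches of 21 or 20 columns by 36 or 35 rows, the kind of subdomain shapes pictured in the Aspect Ratios slides at the end.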
24-Hour AQF vs. 2003 Release
[Maps: 2003 Release vs. AQF-CMAQ, data shown at peak hour]
24-Hour AQF vs. 2003 Release
• Almost 9 ppb max difference between Yamartino and PPM
• Less than 0.2 ppb difference between AQF on cypress and snow
AQF vs. 2003 Release – Absolute Run Times: 24 hours, 8 worker processors [chart; run times in seconds]
AQF vs. 2003 Release on cypress, and AQF on snow – Absolute Run Times: 24 hours, 32 worker processors [chart; run times in seconds]
AQF-CMAQ – Various Platforms – Relative Run Times: 5 hours, 8 worker processors [chart; % of slowest]
AQF-CMAQ cypress vs. snow – Relative Run Times: 5 hours [chart; % of slowest]
AQF-CMAQ on Various Platforms – Relative Run Times: 5 hours [chart; % of slowest, by number of worker processors]
AQF-CMAQ on cypress – Relative Wall Times: 5 hours [chart; %]
AQF-CMAQ on snow – Relative Wall Times: 5 hours [chart; %]
AQF vs. 2003 Release on cypress – Relative Wall Times: 24 hours, 8 worker processors [chart; %; PPM vs. Yamo]
AQF vs. 2003 Release on cypress – Relative Wall Times: 24 hours, 8 worker processors [chart; %] – Add snow for 8 and 32 processors
AQF Horizontal Advection – Relative Wall Times: 24 hours, 8 worker processors [chart; %; legend: x-r-l = x-row-loop, y-c-l = y-column-loop; hppm kernel solver in each loop]
AQF Horizontal Advection – Relative Wall Times: 24 hours, 32 worker processors [chart; %]
AQF & Release Horizontal Advection – Relative Wall Times: 24 hours, 32 worker processors [chart; %; legend: r- = release]
Future Work
• Add aerosols back in
• TKE vdiff
• Improve horizontal advection/diffusion scalability
• Some I/O improvements
• Layer-variable horizontal advection time steps
Aspect Ratios
[Diagrams: subdomain shapes (columns X rows) for the processor layouts of the 166 X 142 domain, e.g. 83 X 71 for 2 X 2, 42/41 X 71 for 4 X 2, 42/41 X 36/35 for 4 X 4, 21/20 X 36/35 for 8 X 4, 21/20 X 18/17 for 8 X 8, and 11/10 X 36/35 for 16 X 4]