Operational COSMO Demonstrator (OPCODE)
COSMO-GM, Rome, 5-9 September 2011
André Walser and Oliver Fuhrer, MeteoSwiss
Project overview
• Additional proposal to the Swiss HP2C initiative to build an "OPerational COSMO DEmonstrator (OPCODE)"
• Project proposal accepted by the end of May
• Project runs from 1 June 2011 until the end of 2012
• Project resources:
  • second contract with the IT company SCS to continue the collaboration until the end of 2012
  • 2 new positions at MeteoSwiss for about 1 year
  • Swiss HPC center CSCS
  • C2SM (collaboration with ETH Zurich and others)
Main goals
• Leverage the research results of the ongoing HP2C COSMO project
• Prototype implementation of the COSMO production suite of MeteoSwiss making aggressive use of GPU technology
• MeteoSwiss ready to buy GPU based hardware for the 2015 production machine
• Same time-to-solution on substantially cheaper hardware:
  Cray XT4 (3 cabinets) vs. GPU based hardware (a few rack units)
GPU perspectives
GFLOPS per Watt is expected to increase strongly in the coming years.
Current production scheme
[Timeline figure of the COSMO-7 / COSMO-2 suite, elapsed time in minutes: each chain runs a 3h assimilation (21 UTC), a forecast (00 UTC; 0-24h for COSMO-2, 0-24h and 25-72h for COSMO-7), and time-critical (TC) post-processing.]
• Time-critical post-processing takes about 15 minutes longer than the forecasts for both COSMO-2 and COSMO-7
• current bottleneck is the post-processing tool fieldextra
• the entire suite has to be optimized for the demonstrator
Two work packages
• Work package A: porting the remaining parts of the operational COSMO code at MeteoSwiss to the demonstrator
• Work package B: porting the suite to the demonstrator, optimizing it, and operating it
Work package A
To exploit the full speed-up, data has to remain on the GPU within a time step and is sent to the CPU for I/O only.
COSMO workflow: what is still missing for a full GPU implementation?
Input → Physics (HP2C) → Dynamics (HP2C) → Assimilation → Boundary Conditions → Diagnostics → Output
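A minimal CUDA sketch of this data-residency idea (the kernel and all names are hypothetical stand-ins, not COSMO code): the prognostic field stays in device memory across all time steps, and a device-to-host copy happens only when an output step is due.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for one model time step acting on a field.
__global__ void time_step(float* field, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] += 0.1f;  // placeholder update
}

int main() {
    const int n = 1 << 20;        // grid points
    const int nsteps = 100;       // time steps
    const int output_every = 25;  // I/O interval

    float* d_field;
    cudaMalloc(&d_field, n * sizeof(float));
    cudaMemset(d_field, 0, n * sizeof(float));

    std::vector<float> h_field(n);  // host buffer, used only for I/O

    for (int step = 1; step <= nsteps; ++step) {
        // Field stays resident on the GPU for the whole step.
        time_step<<<(n + 255) / 256, 256>>>(d_field, n);

        // Copy back to the CPU only when output is actually needed.
        if (step % output_every == 0) {
            cudaMemcpy(h_field.data(), d_field, n * sizeof(float),
                       cudaMemcpyDeviceToHost);
            printf("step %d: field[0] = %f\n", step, h_field[0]);
        }
    }
    cudaFree(d_field);
    return 0;
}
```

The point of the sketch is the loop structure: any component still running on the CPU (here, anything besides the kernel) would force extra transfers per step, which is why the remaining workflow parts above must be ported.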
Task A2: Inter-/intra-GPU parallelization
• COSMO requires a communication library with halo-update as well as several other communication patterns (e.g. global reduce, gather, scatter)
• e.g. peer-to-peer (see the sketch below):
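As a hedged illustration of the peer-to-peer case (the domain decomposition and buffer layout are assumptions, not the actual communication library): with CUDA peer access enabled, a halo row can be copied directly between two GPUs' memories without staging through the host.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Minimal sketch: exchange a 1-row halo between two subdomains,
// one per GPU, using direct peer-to-peer copies.
// Layout per device: row 0 = bottom halo, rows 1..rows = interior,
// row rows+1 = top halo.
int main() {
    const int nx = 1024;                   // points per row
    const size_t row = nx * sizeof(float);
    const int rows = 64;                   // interior rows per subdomain

    float *d0, *d1;
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);      // GPU0 may access GPU1 memory
    cudaMalloc(&d0, (rows + 2) * row);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);      // GPU1 may access GPU0 memory
    cudaMalloc(&d1, (rows + 2) * row);

    // Halo update: top interior row of GPU0 -> bottom halo of GPU1,
    // first interior row of GPU1 -> top halo of GPU0.
    cudaMemcpyPeer(d1,                   1, d0 + rows * nx, 0, row);
    cudaMemcpyPeer(d0 + (rows + 1) * nx, 0, d1 + nx,        1, row);

    cudaDeviceSynchronize();
    printf("halo exchange done\n");
    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```

The same pattern generalizes to the other required operations (global reduce, gather, scatter), which the communication library would have to provide on top of such device-to-device transfers.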
A4. Data Assimilation: Porting to GPU
The assimilation part is a huge code!
Organization
• 0.9 FTE: new position @ MeteoSwiss, 1 year, still open
• 1.9 FTE: new collaborator @ MeteoSwiss (15 months), CSCS
• 1.7 FTE: SCS, CSCS, C2SM