Met Office Computing Update
Paul Selwood, Met Office

The Met Office
• National Weather Service
• Global and Local Area
• Climate Prediction (Hadley Centre)
• Operational and Research activities
• Computers:
  • 1991-1996: Cray Y-MP/C90
  • 1996-present: Cray T3E
• Currently relocating to Exeter

Relocation: Exeter 2003
• ~500 staff already working in Exeter
• 1 T3E, 1 mainframe, 30 NEC nodes and many servers already moved
• 1 T3E + mass storage system moving now
• Completion due end of November 2003

Major Applications
• Unified Model
  • Single code used for NWP forecast and climate prediction
  • Submodels (atmosphere, ocean …)
  • Grid-point model (regular lat-long)
  • Non-hydrostatic, semi-implicit, semi-Lagrangian dynamics
  • Arakawa C-grid, Charney-Phillips vertical staggering
• Variational Assimilation
  • Currently 3D-Var, shortly moving to 4D-Var
  • Six hour time window
  • Increase in satellite observations

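For readers unfamiliar with the 3D-Var and 4D-Var terminology, the standard textbook cost functions are sketched below. This is the generic formulation, not necessarily the exact form used operationally at the Met Office. The symbols are assumptions introduced here: x_b is the background state, B and R are the background and observation error covariances, y the observations, H the observation operator, and M the forecast model that carries the initial state x_0 to observation time i within the window.

```latex
% 3D-Var: all observations in the window are treated as valid at one time
\begin{equation}
J(\mathbf{x}) =
  \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\mathbf{R}^{-1}\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
\end{equation}

% 4D-Var: observations are compared with the model state at their own time i
\begin{equation}
J(\mathbf{x}_0) =
  \tfrac{1}{2}(\mathbf{x}_0-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}_0-\mathbf{x}_b)
  + \tfrac{1}{2}\sum_{i=0}^{N}
    \bigl(\mathbf{y}_i-H_i(M_{0\to i}(\mathbf{x}_0))\bigr)^{\mathrm{T}}
    \mathbf{R}_i^{-1}
    \bigl(\mathbf{y}_i-H_i(M_{0\to i}(\mathbf{x}_0))\bigr)
\end{equation}
```

The extra cost of 4D-Var comes from the model M appearing inside the observation term: each evaluation of J requires a model integration across the time window.
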
What is the Met Office getting?
• 2003 (Exeter site)
  • 30 nodes of NEC SX-6
  • Two computer halls for resiliency
  • Front end redundancy for failover
  • 6x current capability
• 2005
  • Additional 15 nodes of next generation machine
  • 12.5x current capability

The bits (one hall)
[Diagram: 15 nodes (8 CPUs each) joined by the IXS interconnect. Other labelled components: FibreChannel, FC switch, Gigabit Ethernet, two TX7 (12 x IA64) systems, mirrored filesystems, user filesystems, Met Office networks.]

Porting
• Initial focus has been the porting of operational codes
• Basic port completed for all, with no major issues encountered (some minor ones…)
• Much easier than C90 to T3E!
• Trial suite being assembled for parallel running from October
• Porting system very stable (4 months without reboot!)

Optimisation
• Vectorisation
  • T3E optimisations not too drastic
  • T3E streams encouraged vector-like code
  • Decomposition can affect vector length
• Memory
  • Avoid bank conflicts
• Communication
  • Relatively slower compared to T3E
• Typically 0.5% of lines of code inserted, deleted or changed

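The two points about vector length and bank conflicts can be illustrated with a little arithmetic. The sketch below is not Met Office code. It assumes a vector register length of 256 elements, a hypothetical count of 1024 word-interleaved memory banks, and a 432-point east-west row (a size assumed here for an N216 grid); none of these figures come from the slides.

```python
from math import ceil, gcd

# Assumed values for illustration only; not taken from the slides.
VECTOR_REGISTER_LENGTH = 256  # elements per vector register (assumed)
MEMORY_BANKS = 1024           # hypothetical number of interleaved banks


def local_row_length(global_points: int, procs_east_west: int) -> int:
    """Longest east-west row held by any process after decomposition."""
    return ceil(global_points / procs_east_west)


def vector_utilisation(loop_length: int,
                       register_length: int = VECTOR_REGISTER_LENGTH) -> float:
    """Fraction of vector register slots filled when a loop of
    loop_length iterations is split into register-sized chunks."""
    chunks = ceil(loop_length / register_length)
    return loop_length / (chunks * register_length)


def banks_touched(stride: int, banks: int = MEMORY_BANKS) -> int:
    """Distinct banks visited by a constant-stride access pattern,
    assuming consecutive words live in consecutive banks."""
    return banks // gcd(stride, banks)


if __name__ == "__main__":
    row = 432  # assumed east-west row length
    for procs in (1, 2, 4, 8):
        n = local_row_length(row, procs)
        print(f"{procs} procs east-west: inner loop {n}, "
              f"utilisation {vector_utilisation(n):.2f}")
    for stride in (1, 2, 1024, 1025):
        print(f"stride {stride}: {banks_touched(stride)} banks used")
```

Under these assumptions, splitting rows east-west shortens the inner loop and leaves vector registers partly empty, which is one reason a decomposition tuned for a scalar machine may need revisiting. Likewise, a stride that shares a large common factor with the bank count concentrates accesses on few banks; padding an array dimension by one element is a common way to avoid this.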
Challenges
• How do we schedule work?
  • Operational
  • Climate Production
  • Research / Development
• OpenMP within a node?
• I/O - needs a rework
  • Current access patterns are inefficient
  • Packing vectorises poorly
  • Packed data sizes too small to utilise best I/O connections

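To give a feel for why packing can vectorise poorly, here is a minimal sketch of a generic row-wise scale-and-offset bit-packing scheme. It is a hypothetical illustration and not the packing scheme used by the Unified Model; the function names `pack_row` and `unpack_row` and the `accuracy` parameter are invented for this example.

```python
def pack_row(values, accuracy):
    """Pack one row of floats as integer multiples of `accuracy`
    above the row minimum, using the fewest bits that fit the row."""
    base = min(values)
    ints = [round((v - base) / accuracy) for v in values]
    width = max(ints).bit_length()  # bits per value; differs row by row
    word = 0
    for n in ints:                  # serial shift-and-merge into one bit string
        word = (word << width) | n
    return base, width, len(ints), word


def unpack_row(base, width, count, word, accuracy):
    """Invert pack_row, recovering values to within accuracy / 2."""
    mask = (1 << width) - 1
    ints = [(word >> (width * (count - 1 - i))) & mask for i in range(count)]
    return [base + n * accuracy for n in ints]


if __name__ == "__main__":
    row = [0.0, 0.5, 1.25, 5.0]
    packed = pack_row(row, accuracy=0.25)
    print("bits per value:", packed[1])
    print("round trip:", unpack_row(*packed, accuracy=0.25))
```

In a scheme of this kind the bit width depends on the data in each row and values are merged by dependent shifts, so the inner loop has no fixed-width, independent iterations for a vector unit to exploit. Each packed row is also small, which is consistent with the slide's point that packed data sizes are too small to drive the fastest I/O paths efficiently.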
Initial Results (N216L38)
• Real job including I/O
• Load balance problem?
• Can run operationally with just 1 node for current resolutions
• Better scalability has been observed for higher resolutions

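The load balance question can be quantified from per-process timings. The sketch below shows two commonly used measures; the function names and the timings in the usage example are hypothetical and are not results from the N216L38 runs.

```python
def load_imbalance(times):
    """Return max/mean - 1 for per-process times.
    0.0 means perfect balance; 0.25 means the slowest process
    takes 25% longer than the average."""
    mean = sum(times) / len(times)
    return max(times) / mean - 1.0


def parallel_efficiency(t_ref, p_ref, t, p):
    """Efficiency of a run on p processors taking time t, relative to
    a reference run on p_ref processors taking time t_ref."""
    return (t_ref * p_ref) / (t * p)


if __name__ == "__main__":
    # Hypothetical per-process wall-clock times in seconds.
    times = [8.0, 8.0, 8.0, 16.0]
    print("imbalance:", load_imbalance(times))
    # Hypothetical scaling from 8 to 16 processors.
    print("efficiency:", parallel_efficiency(100.0, 8, 60.0, 16))
```

A high imbalance alongside low efficiency would point towards uneven work distribution rather than communication cost. Separating I/O time from compute time before applying these measures helps, since the job measured here includes I/O.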
Opportunities
• More complex climate models
  • Higher resolution
  • More physical interactions represented (e.g. more complex chemistry)
• Satellite data volumes
  • Introduce 4D-Var
  • Increase observational window to 12 hours
• Increase resolutions of models
  • Global, Euro-LAM, UK-Mes
• Many different scenarios