NWP Transition from AIX to Linux Lessons Learned. Dan Sedlacek AFWA Chief Engineer AFWA A5/8 14 MAR 2011. Overview. Introduction AFWA Architecture Applications run on HPC Original NWP Environment Linux Configuration TCO Comparison Lessons Learned Future Linux Plans Summary.
NWP Transition from AIX to Linux: Lessons Learned
Dan Sedlacek, AFWA Chief Engineer, AFWA A5/8
14 MAR 2011
Overview
• Introduction
• AFWA Architecture
• Applications run on HPC
• Original NWP Environment
• Linux Configuration
• TCO Comparison
• Lessons Learned
• Future Linux Plans
• Summary
Introduction
• AFWA has a long history with an AIX HPC environment
• Air Force Weather Environment
  • Worldwide, 24x7x365 systems, weather data, and product support
  • Headquarters, Operational Weather Squadrons (OWS), Combat Weather Teams (CWTs), and Climatological Center (14th WS)
  • 600+ systems across 4 distinct security enclaves
  • 16 million+ lines of code
  • ~1,000 software applications supported
• As model resolutions improve and processing requirements soar, AFWA requirements for NWP processing capability have increased dramatically
• SEMS (in-house support contractor) performed a study evaluating IBM, HP, and Cray
• Selected solution: Red Hat Linux on HP hardware
• Transitioning from IBM/AIX to HP/Linux has resulted in significant savings in Total Cost of Ownership (TCO)
Applications Run on HPC
• Run Regional Models
• WRF
• WRF Chem
• CDFS II (future)
• Dust
• LIS
• Run Global UM
• Ensembles
• Model post-processing
• Misc space products
“Free” Hardware Adventure
• In 2008 AFWA evaluated JVN (available from HPCMO Modernization)
  • 1024 compute nodes
  • 36 racks of equipment
  • 589 kW power requirements
  • 161 tons of cooling
• The “Free” hardware was not cost-effective
• SEMS performed a study to evaluate alternatives
• New hardware was more cost-effective
  • Less space
  • Less power
  • Less cooling
  • More Flops
  • Lower TCO
• Decision made to pursue Linux HPC solution
Linux Configuration: Prod 8/DC3
OS: Linux RHEL 5.3
File System: Lustre
Disk: 50 TB
I/O Bandwidth: 900 Mb/s throughput
Chipset: (2) 2.53 GHz Intel Nehalem E5540 quad-core CPUs per node
Compute Blades: 128
Cores/Memory: 1024 cores, 3 GB per core
Processing capacity: 10 TeraFlops (Production)
Test and development system (DC3): 5 TeraFlops
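The core count and processing capacity quoted above can be cross-checked with a little arithmetic. The sketch below uses the blade, socket, clock, and memory figures from the slide. The value of 4 double-precision floating-point operations per core per cycle is an assumption about the Nehalem generation that does not appear on the slide, and the result is a theoretical peak, not a measured rate.

```python
# Figures taken from the slide
blades = 128
sockets_per_blade = 2        # "(2) ... CPUs per node"
cores_per_socket = 4         # quad-core
clock_ghz = 2.53
mem_per_core_gb = 3

# Assumption (not on the slide): 4 double-precision flops per core per cycle
flops_per_cycle = 4

cores = blades * sockets_per_blade * cores_per_socket
total_mem_gb = cores * mem_per_core_gb
# GHz * flops/cycle gives GFlops per core; divide by 1000 for TeraFlops
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000.0

print(f"cores: {cores}")
print(f"total memory: {total_mem_gb} GB")
print(f"theoretical peak: {peak_tflops:.2f} TeraFlops")
```

Under that assumption the theoretical peak comes out slightly above 10 TeraFlops, in line with the production capacity stated on the slide.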
TCO Comparison
• Original 10 TeraFlops of IBM/AIX HPC O&M (non-labor): $1.4M
• Nominally $133K per TeraFlop for IBM/AIX HPC
• Annual projected O&M costs for Linux (now totalling 24 TeraFlops): $1M
• Conservatively, $30K per TeraFlop for HP/Linux HPC
• Bottom line: Linux HPC solution represented a significant savings
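To put a number on "significant savings", the two per-TeraFlop figures quoted on the slide can be compared directly. This is a minimal sketch that takes the slide's nominal $133K and $30K values as given. It covers non-labor O&M only and says nothing about transition or facility costs.

```python
# Per-TeraFlop O&M figures as quoted on the slide (US dollars)
aix_cost_per_tflop = 133_000
linux_cost_per_tflop = 30_000

# How many times more expensive the AIX environment is per TeraFlop
cost_ratio = aix_cost_per_tflop / linux_cost_per_tflop

# Percentage reduction in per-TeraFlop O&M when moving to Linux
reduction_pct = (1 - linux_cost_per_tflop / aix_cost_per_tflop) * 100

print(f"AIX / Linux cost ratio per TeraFlop: {cost_ratio:.1f}x")
print(f"Reduction in O&M per TeraFlop: {reduction_pct:.1f}%")
```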
Lessons Learned
• Not all “free” hardware is desirable (JVN)
• Differences in Linux vs. AIX compilers (minor, but require modifications)
• Significant tuning differences between AIX and Linux
• File system configurations significantly different (Lustre/IBRIX vs. GPFS)
• Job scheduler differences had to be worked through (IBM LoadLeveler vs. Platform LSF)
• Full reduction of TCO doesn’t occur until previous OS support is no longer required
• So far, Linux has proven to be a reliable and cost-effective OS for NWP
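As an illustration of the job scheduler differences, the sketch below rewrites a few common LoadLeveler `# @ keyword = value` directives as LSF `#BSUB` options. It is not AFWA's migration tooling. The mapping is deliberately partial, treating a LoadLeveler class as an LSF queue is only a rough analogy, the time conversion assumes a plain `HH:MM:SS` limit, and the job name, queue name, and executable in the example are hypothetical.

```python
import re

# Partial, illustrative mapping of LoadLeveler keywords to LSF options.
# "class" -> "-q" is a rough analogy, not an exact equivalent.
LL_TO_LSF = {
    "job_name": "-J",
    "output": "-o",
    "error": "-e",
    "class": "-q",
    "total_tasks": "-n",
}

LL_DIRECTIVE = re.compile(r"^#\s*@\s*(\w+)\s*=\s*(.+?)\s*$")
LL_QUEUE = re.compile(r"^#\s*@\s*queue\s*$")


def wall_clock_to_lsf(value):
    """Convert HH:MM:SS (assumed format) to LSF's hours:minutes, rounding seconds up."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    total_minutes = hours * 60 + minutes + (1 if seconds else 0)
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def translate(lines):
    """Return the job script lines with LoadLeveler directives rewritten for LSF."""
    out = []
    for line in lines:
        if LL_QUEUE.match(line):
            continue  # the "# @ queue" terminator has no LSF counterpart
        match = LL_DIRECTIVE.match(line)
        if not match:
            out.append(line)  # ordinary shell line, keep unchanged
            continue
        key, value = match.groups()
        if key == "wall_clock_limit":
            out.append(f"#BSUB -W {wall_clock_to_lsf(value)}")
        elif key in LL_TO_LSF:
            out.append(f"#BSUB {LL_TO_LSF[key]} {value}")
        else:
            out.append(f"# TODO (no mapping): {line}")  # flag for manual review
    return out


# Hypothetical job script used only to exercise the translation
job = [
    "#!/bin/ksh",
    "# @ job_name = wrf_run",
    "# @ class = prod",
    "# @ wall_clock_limit = 01:30:00",
    "# @ total_tasks = 64",
    "# @ queue",
    "mpirun ./wrf.exe",
]
for translated_line in translate(job):
    print(translated_line)
```

Unmapped keywords are flagged rather than dropped, since directives with no direct counterpart are where scheduler migrations tend to need manual attention.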
Future Linux Plans
• 5000+ core Linux cluster is being planned for delivery in August 2011
• Represents 51 TeraFlops of additional capability
• Total HPC capacity by end of year 2011 > 90 TeraFlops
• Total phase out of IBM/AIX HPC environment
Summary
• Total Cost of Ownership is complex
  • Initial costs
  • Transition costs
  • Facility costs
  • Support costs
• Linux does scale well
• Linux is a viable and cost-effective HPC platform
• Transitioning from IBM/AIX to HP/Linux has resulted in significant TCO savings