Grid Computing in Distributed High-End Computing Applications: Coupled Climate Models and Geodynamics Ensemble Simulations. Shujia Zhou (Northrop Grumman IT/TASC); W. Kuang [2], W. Jiang [3], P. Gary [2], J. Palencia [4], G. Gardner [5]. Affiliations: [2] NASA Goddard Space Flight Center; [3] JCET, UMBC; [4] Raytheon ITSS; [5] INDUSCORP
Outline • Background • One potential killer application (coupling distributed climate models) • One near-reality application (managing distributed ensemble simulation) • One framework supporting Grid computing applications: Common Component Architecture (CCA/XCAT3, CCA/XCAT-C++) • High-speed network at NASA GSFC • An ensemble-dispatch prototype based on XCAT3 • ESMF vs. Grid • ESMF-CCA Prototype 2.0: Grid computing • Summary
Earth-Sun System Models • Typical Earth-Sun system models (e.g., climate, weather, data assimilation) consist of several complex components coupled together through the exchange of a sizable amount of data. • There is a growing need to couple model components from different institutions • Discover new science • Validate predictions This is an M-PE to N-PE data-transfer problem: data held on M processing elements (PEs) in one component must reach N PEs in another.
Coupled Atmosphere-Ocean Models [Figure: atmosphere and ocean model grids] The two models use different grid types and resolutions.
Flow Diagram of Coupling Atmosphere and Ocean (a typical ESMF application) • Registration: create the Atm, Ocn, CplAtmXOcn, and CplOcnXAtm components and register them. • t = t0: Atmosphere fills ESMF_State::exportAtm. • CplAtmXOcn regrids with interpolate(…) from ESMF_State::exportAtm into ESMF_State::importOcn. • Ocean runs n time steps of ΔtOcn via run(), covering one coupling interval Δtglobal, and fills ESMF_State::exportOcn. • CplOcnXAtm regrids with extract(…) from ESMF_State::exportOcn into ESMF_State::importAtm. • Atmosphere runs m time steps of ΔtAtm via run() and fills exportAtm again. • The cycle repeats until t = t0 + ncycle × Δtglobal. • Finalize.
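The sequential coupling cycle in this flow diagram can be sketched in plain Python. This is a toy illustration only, not ESMF code: the `Model` class, the relaxation "physics", the 1-D linear `regrid` function, and all grid sizes and time steps are assumptions made for the example.

```python
# Toy sequential atmosphere-ocean coupling loop (illustrative only; not ESMF code).

def regrid(values, n_out):
    """Linearly interpolate a 1-D field onto n_out evenly spaced points."""
    n_in = len(values)
    if n_out == 1 or n_in == 1:
        return [values[0]] * n_out
    out = []
    for j in range(n_out):
        x = j * (n_in - 1) / (n_out - 1)   # position on the input grid
        i = min(int(x), n_in - 2)          # left neighbour index
        w = x - i                          # weight of the right neighbour
        out.append((1 - w) * values[i] + w * values[i + 1])
    return out


class Model:
    """Hypothetical stand-in for a component with import and export states."""

    def __init__(self, n_points, dt, value):
        self.dt = dt
        self.time = 0.0
        self.state = [value] * n_points     # plays the role of the export state
        self.forcing = [value] * n_points   # plays the role of the import state

    def run(self, n_steps):
        # Relax the state toward the imported forcing, one step at a time.
        for _ in range(n_steps):
            self.state = [s + 0.1 * (f - s)
                          for s, f in zip(self.state, self.forcing)]
            self.time += self.dt


def couple(atm, ocn, dt_global, n_cycles):
    """Run n_cycles coupling intervals of length dt_global."""
    m = round(dt_global / atm.dt)   # atmosphere steps per coupling interval
    n = round(dt_global / ocn.dt)   # ocean steps per coupling interval
    for _ in range(n_cycles):
        ocn.forcing = regrid(atm.state, len(ocn.state))   # CplAtmXOcn
        ocn.run(n)
        atm.forcing = regrid(ocn.state, len(atm.state))   # CplOcnXAtm
        atm.run(m)
    return atm.time, ocn.time


atm = Model(n_points=8, dt=900.0, value=280.0)     # coarser grid, shorter step
ocn = Model(n_points=16, dt=3600.0, value=290.0)   # finer grid, longer step
print(couple(atm, ocn, dt_global=21600.0, n_cycles=4))
```

Both models end at the same simulated time because each advances by exactly one coupling interval per cycle, even though they take different numbers of internal steps.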
Coupling Earth System Model Components from Different Institutions • Current approach: physically port the source codes and their support environment, such as libraries and data files, to one computer • Problems: • Considerable effort and time are needed for porting, validating, and optimizing the codes • Some code owners may not want to release their source codes. • Owners continue to update the model codes. A killer application: couple models at their home institutions via Grid computing!
How Much Data Exchange in a Climate Model? (e.g., NOAA/GFDL MOM4 Version Beta2) Import: 12 2D arrays: u_flux, v_flux, q_flux, salt_flux, sw_flux, fprec, runoff, calving, p, t_flux, lprec, lw_flux. At 0.25 degree resolution without mask, ~99 MB of data. Export: 6 2D arrays: t_surf, s_surf, u_surf, v_surf, sea_level, frazil. At 0.25 degree resolution without mask, ~49 MB of data. • With a 6-hour coupling interval between atmosphere and ocean models at 0.25 degree resolution, data are typically exchanged no more often than once per minute of wall-clock time, a sustained rate of <1 MB per second. Observation: “scp” achieves only ~100 KB/s when moving data from NCCS to UCLA! A Gbps network is more than sufficient for this kind of data exchange!
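The quoted volumes can be checked with a short calculation. The sketch below assumes a global 1440 x 720 grid at 0.25 degree and 8-byte values; neither is stated on the slide, but together they reproduce the ~99 MB and ~49 MB figures. The function names are illustrative.

```python
# Back-of-the-envelope check of the coupling data volume.
# Assumptions (not stated on the slide): a global 1440 x 720 grid at
# 0.25 degree resolution and 8-byte floating-point values.

BYTES_PER_VALUE = 8


def field_bytes(resolution_deg, n_fields):
    """Size in bytes of n_fields global 2D arrays at the given resolution."""
    n_lon = round(360 / resolution_deg)
    n_lat = round(180 / resolution_deg)
    return n_lon * n_lat * BYTES_PER_VALUE * n_fields


def transfer_seconds(n_bytes, rate_bytes_per_s):
    """Time to move n_bytes at a sustained rate, ignoring protocol overhead."""
    return n_bytes / rate_bytes_per_s


import_bytes = field_bytes(0.25, 12)   # 12 import arrays
export_bytes = field_bytes(0.25, 6)    # 6 export arrays
total_bytes = import_bytes + export_bytes

print(f"import: {import_bytes / 1e6:.1f} MB")
print(f"export: {export_bytes / 1e6:.1f} MB")
print(f"one full exchange at 100 KB/s: {transfer_seconds(total_bytes, 100e3) / 60:.0f} min")
print(f"one full exchange at 1 Gbps:   {transfer_seconds(total_bytes, 1e9 / 8):.1f} s")
```

Under these assumptions a full import-plus-export exchange takes tens of minutes at the observed scp rate but only about a second on a dedicated 1 Gbps link.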
Distributed Ensemble Simulations • Typical Earth-Sun system models (e.g., climate, weather, data assimilation, solid Earth) are also highly computationally demanding • One geodynamo model, MoSST, requires 700 GB of RAM and 10^16 flops at the (200, 200, 200) truncation level • Ensemble simulation is needed to obtain the best estimate for optimal forecasting • For a successful assimilation with MoSST, a minimum of 30 ensemble runs and ~50 PB of storage are expected. Using a single supercomputer is not practical!
Characteristics of Ensemble Simulation • Little or no interaction among ensemble members • The initial state for the next ensemble run may depend on the previous ensemble run, so the runs are loosely coupled. • High failure tolerance • Low network usage reduces the probability of failure • The forecast depends on the collection of all the ensemble members, not on any particular ensemble member
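These characteristics suggest a simple dispatch pattern: run members independently, record failures, and build the forecast from whatever completes. The sketch below illustrates that pattern with Python's standard `concurrent.futures`; it is not the XCAT3 prototype. `run_member` is a hypothetical stand-in for a MoSST run, and one member is made to fail on purpose.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_member(member_id):
    """Hypothetical stand-in for one ensemble run; member 3 fails on purpose."""
    if member_id == 3:
        raise RuntimeError(f"member {member_id} failed")
    return member_id * 0.5   # placeholder forecast value


def dispatch_ensemble(member_ids, max_workers=3):
    """Run members independently and collect results and failures separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_member, m): m for m in member_ids}
        for future in as_completed(futures):
            member = futures[future]
            try:
                results[member] = future.result()
            except Exception as exc:   # one failure must not stop the ensemble
                failures[member] = str(exc)
    return results, failures


results, failures = dispatch_ensemble(range(6))
# The forecast uses the surviving members, not any particular one.
ensemble_mean = sum(results.values()) / len(results)
print(sorted(results), sorted(failures), ensemble_mean)
```

Because members do not communicate, a failed member costs only its own run; it can be resubmitted or simply left out of the ensemble statistics.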
Technology: Grid Computing Middleware (CCA/XCAT) • Merges OGSI and DOE’s high-performance component framework, the Common Component Architecture (CCA) • Component model • Compliant with the CCA specification • Grid services • Each XCAT component is also a collection of Grid services • XCAT Provides Ports are implemented as OGSI web services • XCAT Uses Ports can also accept any OGSI-compliant web service • XCAT: provides a component-based Grid services model for Grid computing • Component Assembly: composition in space • The “Provide-Use” pattern facilitates composition • Standard ports are defined to streamline the connection process • More applicable to cases where users and providers know each other
Technology: Grid Computing Middleware (Proteus: Multi-Protocol Library) [Diagram: the CCA Framework sits on the Proteus API, which sits on Protocol 1 and Protocol 2 (TCP, UDT)] • Proteus provides a single-protocol abstraction to components • Allows users to dynamically switch between traditional and high-speed protocols • Facilitates use of specialized implementations of serialization and deserialization Proteus gives a user a choice of networks
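The idea of a single API over switchable protocols can be sketched as follows. This is not the Proteus API: every class and method name here (`Transport`, `Channel`, `register`, `use`, `send`) is hypothetical, and the two transports only tag the payload instead of opening TCP or UDT sockets.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical protocol plug-in; a real one would wrap a TCP or UDT socket."""

    @abstractmethod
    def send(self, payload: bytes) -> bytes:
        """Deliver payload and return what the receiver would see."""


class TcpLikeTransport(Transport):
    def send(self, payload):
        return b"tcp:" + payload   # placeholder for a TCP socket write


class UdtLikeTransport(Transport):
    def send(self, payload):
        return b"udt:" + payload   # placeholder for a UDT socket write


class Channel:
    """Single API seen by components; the protocol underneath can change."""

    def __init__(self):
        self._transports = {}
        self._active = None

    def register(self, name, transport):
        self._transports[name] = transport
        if self._active is None:   # first registered protocol is the default
            self._active = name

    def use(self, name):
        if name not in self._transports:
            raise KeyError(f"unknown protocol: {name}")
        self._active = name

    def send(self, payload):
        return self._transports[self._active].send(payload)


channel = Channel()
channel.register("tcp", TcpLikeTransport())
channel.register("udt", UdtLikeTransport())
first = channel.send(b"state")    # goes over the default protocol
channel.use("udt")                # switch at run time
second = channel.send(b"state")   # same call, different protocol
print(first, second)
```

The component code calls `send` in both cases; only the channel configuration changes when a faster protocol becomes available.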
Technology: Dedicated High-Speed Network (Lambda Networks) • Dedicated high-speed links (1 Gbps, 10 Gbps, etc.) • Being demonstrated in large-scale distributed visualization and data mining • National LambdaRail is currently under development • NASA GSFC is prototyping it and is in the process of connecting to it.
Distributed Ensemble Simulation via Grid Computing (System Architecture) [Diagram: the driver and dispatch components run on the host (PE0); the geo1, geo2, and geo3 components run on remote1, remote2, and remote3, each launching MoSST on PE0 and PE1] Note: the diagram marks “grid computing codes” and “application code” with different symbols. The two are separated for flexibility!
Prototype: Ensemble Simulation via Grid Computing (Components and Ports) [Diagram: the driver component, with a go port, connects its dispatchUse, geo1Use, and geo2Use ports to the dispatchProvide, geo1Provide, and geo2Provide ports of the dispatch, geo1, and geo2 components] Simpler than “workflow”
Prototype: Ensemble Simulation via Grid Computing (Flow Chart of Invoking a Simulation) [Diagram: driver, dispatch, geo1, and geo2 connected through useCMD and provideCMD ports, with arrows numbered 1 to 4 giving the order of calls] The dispatcher invokes remote applications. Run on three computer nodes connected by 10 Gigabit Ethernet
Prototype: Ensemble Simulation via Grid Computing (Flow Chart of Feedback During a Simulation) [Diagram: driver, dispatch, geo1, and geo2 connected through useCMD and provideCMD ports, with arrows numbered 1 to 4 giving the order of calls] Simulations report failure or completion. A monitoring functionality has been developed for the geo components
Adaptive User Interface • Network programming is complex and its concepts are unfamiliar to scientists • A user-friendly interface is even more important when applying grid computing to scientific applications • A Model Running Environment (MRE) tool has been developed to reduce the complexity of running scripts by adaptively hiding details.
[Figure: three versions of a run script: original script, marked script, filled script] MRE 2.0 is used in GMI production!
Where is ESMF in Grid Computing? • Make components known to the Grid • Need a global component ID • Make component services available to the Grid • ESMF_Component (F90 user type + C function pointer) • C++ interfaces for three fundamental data types are not complete • ESMF_Field, ESMF_Bundle, ESMF_State • The function pointers need to be replaced with remote ones • Make the data-exchange type transferable via the Grid • ESMF_State (F90 data pointer + C array) • Serialization/deserialization is available • The data referenced by a pointer need to be replaced with a copy of the data
Grid-Enabled ESMF: Link Functions in Remote Model Components [Diagram: a Driver calls init, run, and final on an assembled component with its grid and layout; across the network, the Atmosphere, Coupler, and Ocean components each have their own grid, layout, init, run, and final, linked through setEntryPoint and setService]
Grid-Enabled ESMF: Transfer Data Across Network [Diagram: ESMF_State::importOcn and ESMF_State::exportOcn pass between a local Ocean proxy and the remote Ocean over the network via RMI; the items transferred are the component, import/export state, and clock]
ESMF-CCA Prototype 2.0 [Diagram: an Atmosphere component and an Ocean Proxy, each with Provide and Use Ports and Init(), Run(), Final() methods, exchange ESMF_State; the proxy reaches the remote Ocean (Init(), Run(), Final()) over the network, using RMI for the remote pointer and XSOAP for data transfer. Legend: CCA tool (global component ID, CCA component registration, ports), ESMF concept, Grid computing]
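The proxy idea can be sketched in a few lines: the local side serializes the import state, a remote call carries the bytes, and the export state comes back as a copy instead of a shared pointer. This is a conceptual sketch only. JSON stands in for the real serialization, an in-process function stands in for RMI, and the `Ocean`, `OceanProxy`, and `make_loopback_server` names and the toy physics are assumptions.

```python
import json


class Ocean:
    """Hypothetical stand-in for the remote ocean component."""

    def run(self, import_state):
        # Toy physics: surface temperature responds to the heat flux.
        return {"t_surf": [290.0 + 0.01 * q for q in import_state["t_flux"]]}


class OceanProxy:
    """Local stand-in that copies state by value instead of sharing pointers."""

    def __init__(self, remote_call):
        self._remote_call = remote_call   # callable taking and returning bytes

    def run(self, import_state):
        request = json.dumps(import_state).encode("utf-8")   # serialize
        reply = self._remote_call(request)                   # "network" hop
        return json.loads(reply.decode("utf-8"))             # deserialize


def make_loopback_server(component):
    """Return a bytes-in, bytes-out handler, as a remote endpoint would expose."""
    def handle(request):
        import_state = json.loads(request.decode("utf-8"))
        export_state = component.run(import_state)
        return json.dumps(export_state).encode("utf-8")
    return handle


proxy = OceanProxy(make_loopback_server(Ocean()))
import_ocn = {"t_flux": [100.0, 200.0]}
export_ocn = proxy.run(import_ocn)
print(export_ocn)
```

The caller uses `proxy.run` exactly as it would use a local component, which is what lets the coupler stay unchanged when the ocean model moves to another site.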
A sequential coupling between an atmosphere and a remote ocean model component implemented in the ESMF-CCA Prototype 2.0 • Registration: create the Atm, Ocn, CplAtmXOcn, and CplOcnXAtm components and register them. • t = t0: Atmosphere evolution fills ESMF_State::exportAtm. • CplAtmXOcn regrids ESMF_State::exportAtm into ESMF_State::importOcn. • OceanProxy forwards the call via RMI to the remote Ocean, which advances over Δtglobal and returns ESMF_State::exportOcn. • CplOcnXAtm regrids ESMF_State::exportOcn into ESMF_State::importAtm. • Atmosphere evolution continues, and the cycle repeats until t = t0 + ncycle × Δtglobal. • Finalize.
Composing Components with XCAT3 (Jython Script) 1. Launch components: Atmosphere Component, Ocean1 Component, CplAtmXOcn Component, CplOcnXAtm Component, Climate Component, Go Component. 2. Connect Uses and Provides Ports: the Climate Component uses the Atm, Ocn, A2O, and O2A ports provided by the other components, and the Go Component uses the Go port. Run on two remote computer nodes connected by 10 Gigabit Ethernet
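The two composition steps, launching components and connecting Uses to Provides ports, can be mimicked in plain Python. This sketch does not use the XCAT3 scripting API: `Component`, `add_provides_port`, `call`, and `connect` are hypothetical names, everything runs in one process, and the port functions return strings so the call order is visible.

```python
class Component:
    """Hypothetical component with named Provides and Uses ports."""

    def __init__(self, name):
        self.name = name
        self.provides = {}   # port name -> callable offered to others
        self.uses = {}       # port name -> callable obtained from a provider

    def add_provides_port(self, port, func):
        self.provides[port] = func

    def call(self, port, *args):
        return self.uses[port](*args)


def connect(user, uses_port, provider, provides_port):
    """Wire a Uses port of one component to a Provides port of another."""
    user.uses[uses_port] = provider.provides[provides_port]


# 1. Launch components.
atm, ocn = Component("Atmosphere"), Component("Ocean1")
a2o, o2a = Component("CplAtmXOcn"), Component("CplOcnXAtm")
climate, go = Component("Climate"), Component("Go")

atm.add_provides_port("Atm", lambda: "flux")
a2o.add_provides_port("A2O", lambda x: f"regrid({x})")
ocn.add_provides_port("Ocn", lambda x: f"ocean({x})")
o2a.add_provides_port("O2A", lambda x: f"regrid({x})")


def run_climate():
    flux = climate.call("Atm")
    sea_state = climate.call("Ocn", climate.call("A2O", flux))
    return climate.call("O2A", sea_state)


climate.add_provides_port("Go", run_climate)

# 2. Connect Uses and Provides ports.
connect(climate, "Atm", atm, "Atm")
connect(climate, "A2O", a2o, "A2O")
connect(climate, "Ocn", ocn, "Ocn")
connect(climate, "O2A", o2a, "O2A")
connect(go, "Go", climate, "Go")

result = go.call("Go")
print(result)
```

In a Grid setting each connection would resolve to a remote service handle instead of a local callable, but the launch-then-connect structure of the script stays the same.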
Summary • Grid computing technology and high-speed networks such as Lambda networks make distributed high-end computing applications promising. • Our prototype based on the XCAT3 framework shows that distributed ensemble simulation can be performed on a network of up to 10 Gbps in a user-friendly way. • ESMF components could be grid-enabled with the help of CCA/XCAT.
Prototype: Ensemble Simulation via Grid Computing (Flow chart of intelligently dispatching ensemble members) [Diagram: driver and dispatch, with arrows numbered 1 to 6 between dispatch and geo1, geo2, and geo3] The type “geoCMD” is used to exchange data among components
Scientific Objective: Develop a geomagnetic data assimilation framework with the MoSST core dynamics model and surface geomagnetic observations to predict changes in Earth’s magnetic environment. Algorithm notation: Xa: assimilation solution; Xf: forecast solution; Z: observation data; K: Kalman gain matrix; H: observation operator
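The slide's equation did not survive extraction. The symbols listed are those of the standard Kalman analysis update, so the algorithm is presumably of the following form; this is an assumption based on the notation, not a transcription of the slide.

```latex
X^{a} = X^{f} + K \left( Z - H X^{f} \right)
```

Here the term in parentheses is the mismatch between the observations and the forecast mapped into observation space, and K weights how strongly that mismatch corrects the forecast.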
New Transport Layer Protocols: Why Needed • TCP was originally designed for slow backbone networks • Standard “out-of-the-box” kernel TCP tunings are inadequate for applications on large-bandwidth, long-delay paths • TCP requires a knowledgeable “wizard” to optimize the host for high-performance networks
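The tuning problem comes from the bandwidth-delay product: a TCP sender can have at most one window of data in flight per round trip. The sketch below computes both sides of that relation. The 60 ms round-trip time and the 64 KiB default window are hypothetical values chosen for illustration, not measurements from these slides.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this bandwidth full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


RTT_S = 0.060               # hypothetical cross-country round-trip time
LINK_BPS = 10e9             # 10 Gbps path
DEFAULT_WINDOW = 64 * 1024  # hypothetical untuned window, in bytes

needed = bdp_bytes(LINK_BPS, RTT_S)
ceiling = window_limited_bps(DEFAULT_WINDOW, RTT_S)

print(f"window needed to fill the link: {needed / 1e6:.0f} MB")
print(f"ceiling with a 64 KiB window:   {ceiling / 1e6:.1f} Mbps")
```

Under these assumptions the untuned window caps throughput at a few Mbps on a 10 Gbps path, which is why host tuning or an alternative transport protocol is needed.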
Current throughput findings from GSFC’s 10-Gbps networking efforts
UDP-based tests between GSFC hosts with 10-GE NICs, run with: nuttcp -u -w1m
From        To          Throughput    TmCPU%  RcCPU%  %packet-loss
San Diego   Chicago     5.213+ Gbps   99      63      0
Chicago     San Diego   5.174+ Gbps   99      65      0.0005
Chicago     McLean      5.187+ Gbps   100     58      0
McLean      Chicago     5.557+ Gbps   98      71      0
San Diego   McLean      5.128+ Gbps   99      57      0
McLean      San Diego   5.544+ Gbps   100     64      0.0006
Current throughput findings from GSFC’s 10-Gbps networking efforts
TCP-based tests between GSFC hosts with 10-GE NICs, run with: nuttcp -w10m
From        To          Throughput    TmCPU%  RcCPU%
San Diego   Chicago     0.006+ Gbps   0       0
Chicago     San Diego   0.006+ Gbps   0       0
Chicago     McLean      0.030+ Gbps   0       0
McLean      Chicago     4.794+ Gbps   95      44
San Diego   McLean      0.005+ Gbps   0       0
McLean      San Diego   0.445+ Gbps   8       3
Current throughput findings from GSFC’s 10-Gbps networking efforts
UDT*-based tests between GSFC hosts with 10-GE NICs, run with: iperf
From        To          Throughput
San Diego   Chicago     2.789+ Gbps
Chicago     San Diego   3.284+ Gbps
Chicago     McLean      3.435+ Gbps
McLean      Chicago     2.895+ Gbps
San Diego   McLean      3.832+ Gbps
McLean      San Diego   1.352+ Gbps
• *Developed by Robert Grossman (UIC): http://udt.sourceforge.net/
The non-experts are falling behind
Year   Experts    Non-experts   Ratio
1988   1 Mb/s     300 kb/s      3:1
1991   10 Mb/s
1995   100 Mb/s
1999   1 Gb/s
2003   10 Gb/s    3 Mb/s        3000:1
New Transport Layer Protocols: Major Types • UDP and TCP Reno are standard (the defaults shipped with the OS) • Other versions of TCP (Vegas, BIC) are included in the Linux 2.6 kernel series • Other operating systems may not include the stack code • Alternative transport protocols are non-standard and either require patched kernels or operate in user space
Next Step: Transform a Model into a Set of Grid Services [Diagram: an ESMF Component (import/export state; Init(), Run(), Final()) sits inside a wrapper to an XCAT Component, which exposes a providePort (grid service) and a usePort; the resources shown are model, supercomputer, and data storage] • Standalone (local) • Coupled systems (distributed)