Climate Simulation using Ninf-G on the ApGrid Testbed
Yoshio Tanaka, Hiroshi Takemiya, Kazuyuki Shudo, Satoshi Sekiguchi
Grid Technology Research Center, AIST
Elements of this DEMO
• Application: Climate Simulation
  • Originally developed by Dr. Tanaka (U. of Tsukuba)
• Portal: Grid PSE Builder
  • Any Unix-command application can be integrated into a Web portal
• Middleware: Ninf-G, used to implement the Grid-enabled climate simulation
  • GridRPC middleware based on the Globus Toolkit, used to gridify the original (sequential) application
• Testbed: ApGrid Testbed
  • An international Grid testbed across the Asia-Pacific region
Application: Climate Simulation
• Goal
  • Long-term, global climate simulation
  • Winding of the jet stream
  • Blocking phenomenon of high atmospheric pressure
• Barotropic S-Model
  • Climate simulation model proposed by Prof. Tanaka
  • Simple and precise
    • Models complicated 3D turbulence as a horizontal one
    • Keeps high precision over long periods
• Taking a statistical ensemble mean
  • Several hundred simulations
  • A perturbation is introduced at every time step
  • A typical parameter survey
PSE: Grid PSE Builder
• Generates a Web interface for running a Unix-command application
• The interface is written in XML, e.g. for ls:

```xml
<application>
  <appname>ls</appname>
  <argspec>/bin/ls %option% %width%</argspec>
  <arglist>
    <args use="required">
      <title>option</title>
      <radio name="option">
        <option value="-a">do not hide entries …</option>
        <!-- … -->
      </radio>
    </args>
    <!-- … -->
  </arglist>
</application>
```
PSE: Grid PSE Builder (cont’d)
[Figure: Grid PSE Builder architecture]
• User → HTTP server + Servlet (Apache + Tomcat), with client authentication
• Grid PSE Core: SignOn/SignOff; Job Control (submission/query/cancel)
• Job Queuing Manager & Signing Server (globusrun)
• Accounting DB (PostgreSQL): accounting information
• Interfaces: JDBC, TCP/IP
Middleware: Ninf-G (GridRPC System)
[Figure: utilization of remote supercomputers. ① The user calls remote procedures (remote libraries) over the Internet; ② the results are notified back. Goal: large-scale computing utilizing multiple supercomputers on the Grid.]
Middleware: Ninf-G (cont’d)
• RPC library on the Grid
  • Built on top of the Globus Toolkit
    • MDS: managing stub information
    • GRAM: invocation of server programs
    • GSI: secure communication between a client and a server
• Simple and easy-to-use programming interface
  • Hides the complicated mechanisms of the Grid
  • Provides RPC semantics
  • Requires no detailed knowledge of the Grid infrastructure

```c
/* sequential search */
for (i = start; i <= end; i++) {
    SDP_search(argv[1], i, &value[i]);
}

/* parallel search using asynchronous calls */
grpc_function_handle_init(&hdl, …, "SDP/search");
for (i = start; i <= end; i++) {
    grpc_call_async(&hdl, argv[1], i, &value[i]);
}
grpc_wait_all();   /* block until every asynchronous call has completed */
```
Testbed: ApGrid Testbed
http://www.apgrid.org/
Ninfy the original (sequential) climate simulation
• Divide the program into two parts, forming a client-server system
  • Client:
    • Pre-processing: reading input data
    • Post-processing: averaging the results of the ensembles
  • Server:
    • Climate simulation, visualization
[Figure: S-model program flow: Reading data → Solving Equations (×3) → Averaging results → Visualize; gridified version: user with a Web browser, Grid Lib, and multiple Ninf-G servers]
Testbed
• UME Cluster (AIST)
  • jobmanager-grd, 40 CPUs + 20 CPUs
• AMATA Cluster (KU)
  • jobmanager-sqms, 6 CPUs
• Galley Cluster (Doshisha U.)
  • jobmanager-pbs, 10 CPUs
• Gideon Cluster (HKU)
  • jobmanager-pbs, 15 CPUs
• PRESTO Cluster (TITECH)
  • jobmanager-pbs, 4 CPUs
• VENUS Cluster (KISTI)
  • jobmanager-pbs, 16 CPUs
• ASE Cluster (NCHC)
  • jobmanager-fork, 2 CPUs
[Figure: Climate Simulation deployment, client and server]
• Front node
  • Public IP
  • Globus: gatekeeper, jobmanager (pbs, grd, sqms)
  • NAT
• Backend nodes
  • Private IP or public IP
  • Globus SDK
  • Ninf-G Lib
Lessons Learned
• Difficulties caused by the bottom-up approach and by problems installing the Globus Toolkit
  • Most resources are not dedicated to the ApGrid Testbed
    • Each site’s policy should be respected
  • We were required to modify software configurations, environments, etc.
    • Upgrading the Globus Toolkit (GT1.1.4 → GT2.0 → GT2.2)
    • Applying patches, installing additional packages
    • Building bundles using other flavors
  • Users have different requirements for the Globus Toolkit
    • Middleware developers need the newest version
    • Application developers are satisfied with the stable (older) version
  • It is not easy to keep up with the frequent version upgrades of the Globus Toolkit
• An ApGrid software package should solve some of these problems
Lessons Learned (cont’d)
• Problems in scalability
  • Initialization of function handles
    • Initializing one function handle takes from several seconds to several tens of seconds
    • Overhead of contacting the gatekeeper (GSI authentication) and invoking a jobmanager
    • Overhead of the MDS lookup
  • The current Ninf-G implementation must contact the gatekeeper separately for each function handle it initializes
    • Although Globus GRAM can invoke multiple jobs through a single contact to the gatekeeper, the GRAM API is not sufficient to control each job individually
Lessons Learned (cont’d)
• We observed that Ninf-G applications did not work correctly because of unexpected cluster configurations
  • GSI authentication failed when establishing connections for file transfers using GASS
    • The backend nodes did not have host certificates
  • Because of the configuration of the local scheduler (PBS), some Ninf-G executables were not activated
    • Example:
      • PBS jobmanager on a 16-node cluster
      • grpc_call was invoked 16 times on the cluster; the application developer expected 16 Ninf-G executables to start simultaneously
      • The PBS queue manager was configured to allow at most 9 simultaneous jobs per user
      • 9 Ninf-G executables were launched, but the remaining 7 were not activated
Special Thanks (for technical support) to:
• Kasetsart University (Thailand)
  • Sugree Phatanapherom
• Doshisha University (Japan)
  • Yusuke Tanimura
• University of Hong Kong (Hong Kong)
  • CHEN Lin, Elaine
• KISTI (Korea)
  • Gee-Bum Koo, Jae-Hyuck
• Tokyo Institute of Technology (Japan)
  • Ken’ichiro Shirose
• NCHC (Taiwan)
  • Julian Yu-Chung Chen
• AIST (Japan)
  • Grid Support Team
• APAN
  • HK, TW, JP