22nd APAN Meeting in Singapore. Grid Challenge - programming competition on the Grid -. Kento Aida Tokyo Institute of Technology. What is Grid Challenge?. programming competition to develop high-performance programs on the Grid The organizer operates a Grid testbed.
22nd APAN Meeting in Singapore
Grid Challenge: programming competition on the Grid
Kento Aida, Tokyo Institute of Technology
What is Grid Challenge?
• a programming competition to develop high-performance programs on the Grid
  • The organizer operates a Grid testbed.
  • Participants develop and run programs on the testbed.
• a special event at the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
• history
  • 1st Grid Challenge at SACSIS 2005
  • 2nd Grid Challenge at SACSIS 2006
Category
• compulsory
  • programming competition on the Grid testbed
  • solving the problem provided by the organizer
    • Graph Partitioning Problem
  • students (university and high school)
• free
  • giving opportunities to perform experiments on the Grid
  • presentations during the conference
  • students, engineers and researchers
Compulsory: Graph Partitioning Problem
[Figure: example graph with vertices 1 to 6 divided into partitions L and R]
• Given an undirected graph G(V, E) with |V| = 2n.
• L and R are disjoint partitions obtained by dividing the vertices of G equally, so that |L| = |R|.
• Find the partition that minimizes the number of edges with one endpoint in L and the other in R.
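To make the objective concrete, here is a minimal Python sketch that counts cut edges and finds the best bisection by exhaustive search. It is only an illustration for very small graphs, not a method used in the competition. The function names and the six-vertex example graph (two triangles joined by one edge) are assumptions made for this example.

```python
from itertools import combinations


def cut_size(edges, left):
    """Count edges with exactly one endpoint in `left`."""
    left = set(left)
    return sum(1 for u, v in edges if (u in left) != (v in left))


def best_bisection(vertices, edges):
    """Exhaustively search all balanced partitions (tiny graphs only)."""
    vertices = sorted(vertices)
    if len(vertices) % 2 != 0:
        raise ValueError("the problem assumes |V| = 2n")
    n = len(vertices) // 2
    anchor = vertices[0]  # fix one vertex in L to skip mirrored partitions
    best = None
    for rest in combinations(vertices[1:], n - 1):
        left = {anchor, *rest}
        size = cut_size(edges, left)
        if best is None or size < best[0]:
            best = (size, left)
    size, left = best
    return size, left, set(vertices) - left


# Hypothetical example graph: two triangles joined by the edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
size, left, right = best_bisection(range(1, 7), edges)
print(size, left, right)
```

Exhaustive search examines a number of partitions that grows exponentially with |V|, so it only serves to define the target value; the problem sizes used in the competition call for heuristics.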
Compulsory (cont’d)
• qualifying runs (3 weeks)
  • Solve early!
  • goal: find a solution within a given threshold
  • shared resources
  • problem size: |V| = 500 - 1500
• final runs (2 weeks)
  • Solve fast!
  • dedicated time slots for finalists (2.5 h per team)
  • goal: find a solution within a given period (10 min)
  • The finalist with the best solution is the winner!
  • problem size: |V| = 30000 - 35000
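The final runs reward the best solution found within a fixed period. One way to respect such a limit is an "anytime" search that keeps a valid balanced partition at every step and stops at the deadline. The Python sketch below uses a simple random pair-swap search chosen for illustration only; the slides do not say which algorithms participants used, and the function names, the seed and the time budget are assumptions.

```python
import random
import time


def cut_size(edges, left):
    """Count edges with exactly one endpoint in `left`."""
    return sum(1 for u, v in edges if (u in left) != (v in left))


def timed_swap_search(vertices, edges, budget_seconds, seed=0):
    """Swap random vertex pairs across the cut until the time budget ends."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = set(order[:half]), set(order[half:])
    best = cut_size(edges, left)
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline and best > 0:
        u = rng.choice(tuple(left))
        v = rng.choice(tuple(right))
        # Tentatively move v into L in place of u; sizes stay balanced.
        left.remove(u)
        left.add(v)
        candidate = cut_size(edges, left)
        if candidate <= best:
            # Keep the swap: finish it on the R side as well.
            right.remove(v)
            right.add(u)
            best = candidate
        else:
            # Undo the swap.
            left.remove(v)
            left.add(u)
    return best, left, right


# Hypothetical example graph: two triangles joined by the edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
best, left, right = timed_swap_search(range(1, 7), edges, budget_seconds=0.05)
print(best, left, right)
```

Recomputing the full cut after every swap costs O(|E|) per step, which is acceptable for a sketch but too slow for |V| above 30000; a practical program would update the cut incrementally and distribute the search across nodes.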
Free
• experiments of research projects (1 month)
• shared resources
• projects
  • tools
    • a monitoring tool, a message passing system, a programming tool, volunteer computing
  • applications
    • physics simulation, bioinformatics, diesel engine simulation, optimization problems
Participants
[Figure: charts of participants in the compulsory and free categories; extracted labels: H, 1; D, 2; U, 1; D, 2; U, 6; M, 12; M, 5]
Testbed
• Grid Challenge Federation
  • AIST
  • Tokyo Institute of Technology
  • The University of Tokyo
  • Doshisha University
• more than 1,200 CPUs
Resources
• collection of PC clusters
• spec of a PC cluster
  • a gateway node
    • gateway, compiling
  • computing nodes
    • computation
  • global IP address/private IP address
  • NFS
    • “/home” is shared among nodes
Resources (cont’d)
Connection
[Figure: network diagram; labels: SAKURA, Tsukuba WAN, F32, PrestoIII, WIDE, Chikayama, DIS, Tau, SINET, Xenia, Internet]
Software
• Grid middleware
  • Globus Toolkit 2.4
• batch queueing system
  • Sun Grid Engine, PBS
• remote process invocation
  • SSH, GXP
• monitoring
  • Ganglia
• programming
  • MPICH 1.2.7, Ninf-G 2.4
GXP
http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml
• shell for distributed multi-cluster environment
  • fast simultaneous command submissions
  • parallel job pipes
  • interactive selection of nodes to execute commands
• no cumbersome per-node operations!
  • installation and deployment
  • invocation of parallel processes
  • monitoring, trouble diagnosis, debugging
  • dead processes clean-up
Ninf-G
http://ninf.apgrid.org/
• reference implementation of GridRPC
• GridRPC: a simple RPC-based programming model for the Grid
  • Client invokes remote libraries installed on remote servers on the Grid.
  • utilizing task parallelism
[Figure: a client program calls grpc_call(…) and sends data to server programs; each server runs the library and returns the result to the client]
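The task-parallel pattern behind GridRPC can be illustrated locally in Python. The sketch below is an analogy only: it does not use the Ninf-G or GridRPC API, a thread pool stands in for the remote servers, and `remote_library` is a hypothetical placeholder for a library installed on a server.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_library(data):
    """Hypothetical stand-in for a library function installed on a server."""
    return sum(x * x for x in data)


def call_in_parallel(tasks, servers=3):
    """Send each task to a worker and gather the results in task order."""
    with ThreadPoolExecutor(max_workers=servers) as pool:
        # Each submit plays the role of one asynchronous remote call.
        futures = [pool.submit(remote_library, task) for task in tasks]
        # Waiting on the futures corresponds to collecting the results.
        return [future.result() for future in futures]


tasks = [[1, 2, 3], [4, 5], [6]]
results = call_in_parallel(tasks)
print(results)
```

The client splits the work into independent tasks, issues one call per task, and waits for all results, which is the shape of program that benefits from many servers on the Grid.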
Ganglia
http://ganglia.sourceforge.net/
• a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
  • CPU load
  • memory usage
  • network traffic
Operation
• The testbed is operated by volunteers!
  • researchers/technical staff/students
• What we need to do
  • installation, and installation training for students
  • user management
  • job management
User Management
• local account
  • the same UID and login name for a user on all sites
  • remote login via ssh
    • public key
• Globus account
  • temporary CA for the Grid Challenge
Job Management
• interactive or batch
  • All sites provide both environments for job execution.
• dedicated slot
  • Finalists are assigned dedicated slots for their application runs.
  • the gentlemen’s agreement
Troubles …
• computing nodes
  • OS hangs, trouble with hard disk drives
• power supply
  • failure to balance the power supply
• servers
  • trouble with NFS and batch queueing systems
• monitoring
  • trouble collecting monitoring data with Ganglia
Troubles … (cont’d)
• jobs out of control
  • CPU/memory resources wasted by jobs that are out of control
• dedicated slots
  • jobs running beyond their slots
Operational Issue
• trouble on computing nodes
  • monitoring tools to identify the affected computing nodes
• power supply
  • critical problem for small groups, e.g., a university lab
  • tools for power monitoring
  • low-power processors
• servers
  • redundancy
Operational Issue (cont’d)
• user/process management
  • tools to control user processes
    • monitoring user processes
    • detecting unusual behavior
    • suspending/killing jobs that are out of control
  • tools for reservation
    • reserving dedicated slots for users
    • controlling user jobs
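As an illustration of what a reservation tool might check, the Python sketch below flags running jobs whose owner holds no dedicated slot at the current time. It is a sketch under stated assumptions, not a tool used on the testbed: the `Slot` and `Job` records, the team names and the times (in minutes) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    user: str
    start: float  # minutes from the start of the session
    end: float


@dataclass(frozen=True)
class Job:
    job_id: int
    user: str


def jobs_to_stop(jobs, slots, now):
    """Return IDs of running jobs whose owner has no slot covering `now`."""
    active_users = {s.user for s in slots if s.start <= now < s.end}
    return [job.job_id for job in jobs if job.user not in active_users]


# Two hypothetical teams with consecutive 2.5 h (150 min) slots.
slots = [Slot("team_a", 0, 150), Slot("team_b", 150, 300)]
jobs = [Job(1, "team_a"), Job(2, "team_b")]
overrun = jobs_to_stop(jobs, slots, now=160)
print(overrun)
```

At minute 160 only team_b holds a slot, so the job still running for team_a is reported. A real tool would read the job list from the batch queueing system and then suspend or kill the reported jobs.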
Snapshots
[Photos: qualifying runs, final runs]
Snapshots (cont’d)
Conclusions
• Grid Challenge is a programming competition to develop high-performance programs on the Grid.
  • compulsory and free categories
• Grid testbed for Grid Challenge
  • 6 sites, 7 PC clusters, more than 1,200 CPUs
  • Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
• discussion about operational issues
  • tools for monitoring, power supply, user/process management
Acknowledgements
• Information Processing Society of Japan
• Sun Microsystems
• Soum Corporation
• Grid Consortium Japan
Thank you.