
Asia Pacific Grid: Towards a production Grid

Yoshio Tanaka, Grid Technology Research Center, Advanced Industrial Science and Technology, Japan. Contents: updates from PRAGMA 5; demo at SC2003 (climate simulation using Ninf-G); joint demo with NCHC; joint demo with TeraGrid.


Presentation Transcript


  1. Asia Pacific Grid: Towards a production Grid. Yoshio Tanaka, Grid Technology Research Center, Advanced Industrial Science and Technology, Japan

  2. Contents
     • Updates from PRAGMA 5
     • Demo at SC2003 (climate simulation using Ninf-G)
     • Joint demo with NCHC
     • Joint demo with TeraGrid
     • Experiences and Lessons Learned
     • Towards a production Grid

  3. Why the climate simulation?
     • The climate simulation is used as a test application to evaluate progress in resource sharing between institutions
     • It lets us confirm that resource sharing works at two levels:
       • Globus-level resource sharing
         • Globus is correctly installed
         • Mutual authentication based on GSI works
       • High-level middleware (GridRPC)-level resource sharing
         • The JobManager works well
         • The cluster's network is configured correctly (note that most clusters use private IP addresses)

  4. Behavior of the System
     [Diagram: a Ninf-G client at AIST uses Ninf-G servers on the NCSA cluster (225 CPUs), AIST cluster (50 CPUs), Titech cluster (200 CPUs), and KISTI cluster (25 CPUs).]
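The system diagram shows one Ninf-G client farming work out to servers on four clusters. A minimal sketch of that task-farming pattern, assuming the runs are independent and that each cluster gets work in proportion to its CPU count (the slides do not say how work was actually divided): a real Ninf-G client issues GridRPC calls (asynchronous calls followed by a wait-all), whereas here a thread pool and the hypothetical function `run_member` stand in for the remote calls.

```python
from concurrent.futures import ThreadPoolExecutor

# CPU counts taken from the slide's diagram.
CLUSTERS = {"NCSA": 225, "AIST": 50, "Titech": 200, "KISTI": 25}


def proportional_split(n_tasks, capacities):
    """Divide n_tasks among sites in proportion to capacity (largest-remainder method)."""
    total = sum(capacities.values())
    quotas = {site: n_tasks * cpus / total for site, cpus in capacities.items()}
    shares = {site: int(q) for site, q in quotas.items()}
    leftover = n_tasks - sum(shares.values())
    # Give the remaining tasks to the sites with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - shares[s], reverse=True)
    for site in by_remainder[:leftover]:
        shares[site] += 1
    return shares


def run_member(site, member_id):
    """Hypothetical stand-in for one remote GridRPC call (e.g., one simulation run)."""
    return f"{site}:{member_id}"


def farm_out(n_tasks, capacities):
    shares = proportional_split(n_tasks, capacities)
    jobs = [(site, i) for site, count in shares.items() for i in range(count)]
    # Submit every call first, then collect all results,
    # mirroring the "call asynchronously, then wait for all" client pattern.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_member, site, i) for site, i in jobs]
        return [f.result() for f in futures]
```

For example, `proportional_split(1000, CLUSTERS)` assigns 450 runs to NCSA, 100 to AIST, 400 to Titech, and 50 to KISTI. Static proportional splitting is only one option; on busy, non-dedicated clusters a dynamic work queue would likely balance better.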

  5. Terrible 3 weeks (PRAGMA 5 to SC2003)
     • Increased resources
       • 14 clusters -> 22 clusters
       • 317 CPUs -> 853 CPUs
     • Installed Ninf-G and the climate simulation on TeraGrid
       • The account was granted on Nov. 4
       • Ported Ninf-G2 to the IA64 architecture

  6. Necessary steps for the demo
     • Apply for an account at each site
       • Add an entry to the grid-mapfile
     • Test globusrun
       • Authentication
         • Is my CA trusted? Do I trust your CA?
         • Is my entry in the grid-mapfile?
       • DNS lookup
         • Reverse lookup is used for server authentication
       • Firewall / TCP Wrapper
         • Can I connect to the Globus gatekeeper?
         • Can the Globus jobmanager connect to my machine?
       • Jobmanager
         • Is the queuing system (e.g., PBS, SGE) installed appropriately?
         • Does the jobmanager script work as expected?
     • In the case of TeraGrid
       • Obtained my user certificate from the TeraGrid CA (NCSA CA)
       • Asked TITECH and KISTI to trust the NCSA CA
       • It was not feasible to ask TeraGrid to trust the AIST GTRC CA
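The two authentication questions on this slide (is each side's CA trusted, and is my DN in the site's grid-mapfile) can be checked before ever running globusrun. A minimal sketch, assuming the usual grid-mapfile line shape of a quoted certificate subject followed by comma-separated local account names; the DNs, account names, and CA labels below are hypothetical, and the check only inspects text rather than performing real GSI authentication.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each certificate subject DN to its list of local accounts.

    Assumed line format: "<quoted DN>" account[,account...]
    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # keeps the quoted DN (which has spaces) together
        if len(parts) < 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        accounts = [a for a in "".join(parts[1:]).split(",") if a]
        mapping[parts[0]] = accounts
    return mapping


def auth_preflight(user_dn, user_ca, host_ca, client_trusted_cas, site_trusted_cas, mapfile_text):
    """List the problems that would make mutual authentication or authorization fail."""
    problems = []
    if user_ca not in site_trusted_cas:
        problems.append(f"site does not trust user CA {user_ca!r}")
    if host_ca not in client_trusted_cas:
        problems.append(f"client does not trust host CA {host_ca!r}")
    if user_dn not in parse_grid_mapfile(mapfile_text):
        problems.append("user DN missing from grid-mapfile")
    return problems
```

An empty result means only that the paper checks pass; DNS reverse lookup, firewalls, and the jobmanager still have to be tested separately, as the slide lists.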

  7. Necessary steps for the demo (cont’d)
     • Install Ninf-G2
       • A frequently occurring problem was an inappropriate installation of the GT2 SDK
       • Flavors given in the GT2 manual:
         • GRAM and DATA: gcc32dbg
         • Info: gcc32dbgpthr
       • Asked sites to additionally install the Info SDK with the gcc32dbg flavor
     • Test the Ninf-G application
       • Can the Ninf-G server program connect to the client?
       • If the backend nodes use private IP addresses, NAT must be available
     • Application- and middleware-specific requirements
       • Requirements depend on the applications and middleware
       • A new Ninf-G application (TDDFT) needs the Intel Fortran Compiler
       • Other applications need GAMESS / Gaussian
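Two checks from this slide lend themselves to a quick script: whether a site has the GT2 SDK flavors Ninf-G2 needs, and which backend nodes can reach the client only through NAT. A minimal sketch, assuming each site reports its installed SDKs as (bundle, flavor) pairs; how that list is gathered is left out, and this representation is hypothetical.

```python
import ipaddress

# Flavors named on the slide: GRAM and DATA with gcc32dbg, Info with
# gcc32dbgpthr, plus the extra Info SDK in gcc32dbg that was requested.
REQUIRED_FLAVORS = {
    ("gram", "gcc32dbg"),
    ("data", "gcc32dbg"),
    ("info", "gcc32dbgpthr"),
    ("info", "gcc32dbg"),
}


def missing_flavors(installed):
    """installed: iterable of (bundle, flavor) pairs reported by a site."""
    return sorted(REQUIRED_FLAVORS - set(installed))


def nodes_needing_nat(backend_addresses):
    """Backend nodes with private (non-routable) addresses need NAT to call back to the client."""
    return [a for a in backend_addresses if ipaddress.ip_address(a).is_private]
```

For instance, a site that installed only the three flavors from the GT2 manual would be reported as missing `("info", "gcc32dbg")`.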

  8. Lessons Learned
     • Initial setup requires considerable effort
     • MDS is not scalable and is still unstable
       • Some parameters in grid-info-slapd.conf need to be modified
     • The testbed was unstable
       • Unstable / poor network
       • System maintenance (including software version upgrades) without notification
         • Noticed only when the application failed
         • "It worked well yesterday, but I’m not sure whether it works today"
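Because unannounced maintenance was noticed only when the application failed, a daily reachability probe that compares today's results with yesterday's would surface such changes earlier. A minimal sketch, assuming only TCP reachability is tested (the Globus gatekeeper conventionally listens on port 2119); the target names and hosts are hypothetical, and a passing probe does not prove that job submission works.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(targets):
    """targets: {name: (host, port)} -> {name: reachable?}."""
    return {name: port_open(host, port) for name, (host, port) in targets.items()}


def regressions(yesterday, today):
    """Services that were reachable in the previous run but fail now."""
    return sorted(n for n, ok in today.items() if not ok and yesterday.get(n, False))
```

Usage, with a hypothetical host: `today = probe({"site-a-gatekeeper": ("gatekeeper.example.org", 2119)})`, then `regressions(yesterday, today)` after loading the previous run's results from wherever they were saved.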

  9. Lessons Learned (cont’d)
     • Difficulties caused by the grass-roots approach
       • It is not easy to keep the GT2 version consistent across sites
       • Users have different requirements for the Globus Toolkit
     • Most resources are not dedicated to the testbed
       • Resources may be busy / highly utilized
       • Do we need a Grid-level scheduler or a fancy Grid reservation system?
       • (From the resource providers' point of view) we need flexible control of donated resources
         • e.g., 32 nodes for default users, 64 nodes for specific groups, 256 nodes for my organization
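The tiered limits in the last example can be expressed as a small admission policy. A minimal sketch, assuming a user's tier is decided by organization first and then group membership; the node counts come from the slide, while the `User` type, organization names, and group names are hypothetical.

```python
from dataclasses import dataclass

# Limits quoted on the slide.
DEFAULT_LIMIT = 32   # default users
GROUP_LIMIT = 64     # members of specific (privileged) groups
ORG_LIMIT = 256      # users from the provider's own organization


@dataclass
class User:
    name: str
    organization: str
    groups: frozenset = frozenset()


def node_limit(user, home_org, privileged_groups):
    """Largest number of donated nodes this user may request at the site."""
    if user.organization == home_org:
        return ORG_LIMIT
    if user.groups & privileged_groups:
        return GROUP_LIMIT
    return DEFAULT_LIMIT


def admit(user, nodes_requested, home_org, privileged_groups):
    """Accept a request only if it fits within the user's tier."""
    return nodes_requested <= node_limit(user, home_org, privileged_groups)
```

A real deployment would enforce this in the local batch system rather than in a separate script; the sketch only shows how the policy itself could be stated precisely.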

  10. Summary of current status (cont’d)
     • What has been done?
       • Resource sharing among more than 20 sites (853 CPUs were used by a Ninf-G application)
       • GT2 is used as common software
     • What hasn’t?
       • Formalize "how to use the Grid testbed"
         • I could use it, but it is difficult for others
         • I was given an account at each site through personal communication
       • Provide documentation
       • Keep the testbed stable
       • Develop management tools
         • Browsing information
         • CA/certificate management
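One small piece of certificate management is warning before user or host certificates expire. A minimal sketch, assuming the expiry dates have already been collected as OpenSSL-style `notAfter` strings (for example from `openssl x509 -noout -enddate -in cert.pem`); the subject names in any usage are hypothetical.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_left(not_after, now=None):
    """Days until expiry for a notAfter string such as 'Jun  1 12:00:00 2004 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def expiring(certs, within_days=30, now=None):
    """certs: {subject: notAfter} -> subjects expiring within the window (or already expired)."""
    return sorted(s for s, na in certs.items() if days_left(na, now) <= within_days)
```

`ssl.cert_time_to_seconds` interprets the string as UTC, so the comparison does not depend on the local time zone of the machine running the check.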

  11. Towards a production Grid
     • Define minimum requirements for Grid middleware
       • The Resource WG has the responsibility
       • NMI, TeraGrid software stack
       • Each site must follow the requirements
     • Keep the testbed as stable as possible
     • Understand that security is essential for international collaboration
       • What is the security (CA) policy in the Asia Pacific?
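Once minimum middleware requirements are defined, each site's compliance can be checked mechanically. A minimal sketch, assuming requirements are expressed as minimum dotted version numbers; the package names and versions in `MINIMUM` are placeholders, since the slides do not state the actual requirements.

```python
# Placeholder requirements table (hypothetical values, not from the slides).
MINIMUM = {"globus": (2, 4), "ninf-g": (2, 0)}


def parse_version(text):
    """'2.4.3' -> (2, 4, 3); assumes purely numeric dotted versions."""
    return tuple(int(p) for p in text.split("."))


def noncompliant(site_versions):
    """site_versions: {package: 'x.y[.z]'} -> sorted list of problems (empty if compliant)."""
    problems = []
    for pkg, minimum in MINIMUM.items():
        have = site_versions.get(pkg)
        if have is None:
            problems.append(f"{pkg}: not installed")
        elif parse_version(have) < minimum:
            required = ".".join(map(str, minimum))
            problems.append(f"{pkg}: {have} is older than {required}")
    return sorted(problems)
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.10" would sort before "2.4".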

  12. Towards a production Grid (cont’d)
     • Draft the “Asia Pacific Grid Middleware Deployment Guide”, a recommendation document for deploying Grid middleware
       • Minimum requirements
       • Configuration
     • Draft the “Instruction of Grid Operation in the Asia Pacific Region”, which describes how to run a Grid Operation Center that supports management of a stable Grid testbed
     • Launch the Asia Pacific Grid Policy Management Authority (http://www.apgridpma.org/)
       • Coordinate the security level in Asia
       • Interact with PMAs outside Asia (DOEGrids PMA, EUGrid PMA)
     • A sophisticated users' guide is necessary

  13. Towards a production Grid (cont’d)
     • Each site should provide documentation and/or a web page for users covering:
       • Requirements for users
       • How to obtain an account
       • Available resources
         • Hardware
         • Software and its configuration
       • Resource utilization policy
       • Support and contact information
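If each site's user documentation were also published in a machine-readable form, a testbed operator could verify that every item on this checklist is covered. A minimal sketch, assuming a simple dictionary per site; the field names mirror the slide's checklist, but the layout is hypothetical.

```python
# Field names follow the slide's checklist; the dict layout is hypothetical.
REQUIRED_FIELDS = (
    "user_requirements",
    "account_procedure",
    "hardware",
    "software",
    "utilization_policy",
    "contact",
)


def missing_fields(site_doc):
    """Checklist items that are absent or empty in a site's description."""
    return [f for f in REQUIRED_FIELDS if not site_doc.get(f)]


def incomplete_sites(site_docs):
    """site_docs: {site: description} -> {site: missing items}, only for incomplete sites."""
    report = {site: missing_fields(doc) for site, doc in site_docs.items()}
    return {site: gaps for site, gaps in report.items() if gaps}
```

This checks only that each item is present, not that it is accurate or current; keeping the content up to date remains an operational task for each site.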

  14. Future Plan (cont’d)
     • Consider a GT3/GT4-based Grid testbed
     • Each CA must provide a CP/CPS
     • International collaboration
       • TeraGrid, UK eScience, EUDG, etc.
     • Run more applications to evaluate the feasibility of the Grid
       • Large-scale clusters + fat links
       • Many small clusters + thin links

  15. Summary
     • It is tough work to make resources available for applications
       • Many steps
     • It is tough to keep the testbed stable
     • Many issues remain to be solved on the way to a production Grid
       • Technical
         • Local and global schedulers
         • Dedication / reservation / co-allocation
       • Political
         • CA policy
         • How can I get an account on your site?
       • Both
         • Coordination of middleware
     • More interaction between the Resources and Applications WGs is necessary
     • The necessary procedures for resource sharing need to be established
