This project focuses on GRID interoperability and covers several related subjects, including a link with BIO_1 on biomedical simulation. The expected outcome is wider sharing of resources and information worldwide, even among users of different middleware. Activities include iRODS development, network monitoring, and R&D on GRID interoperability, carried out jointly by members in Japan and France. The emphasis is on interoperability between NAREGI and gLite, pursued both by building interoperation islands between production grids and by developing a Meta Scheduler.
Computing_3: GRID Interoperability • CC-IN2P3 and KEK Computing Research Center • Takashi Sasaki, KEK Computing Research Center • FJPPL Workshop 08
The project • Not only GRID interoperability, but also many GRID-related subjects • Also related to BIO_1 for simulation in the biomedical field • Expected outcome • Resources and information will be shared much more widely around the world, even though people use different middleware • GRID interoperability • Especially between gLite and NAREGI • New middleware development • iRODS
Members • Japan • S. Kawabata • T. Sasaki • G. Iwai • K. Murakami • Y. Iida • France • D. Boutigny • S. Reynaud • F. Hernandez • J.Y. Nief • Chen Cheng
Activities in the last year • Workshop in Lyon in October 2007 • Four KEK staff members attended • Yoshimi Iida, an engineer at KEK, stayed at CC-IN2P3 for 8 months between Oct. 2007 and April 2008, sponsored by the French government • iRODS development • Network monitoring and performance tuning • R&D on GRID interoperability in progress • NAREGI installation at CC-IN2P3 • JSAGA development • GRID deployment and support for ILC/CALICE
What is GRID? • GRID is a concept for distributed computing over the Internet • Not the name of a product or a piece of software, but the name of the concept • Virtual Organization (VO) • Single sign-on • Well-maintained, reliable services
gLite • Developed by the EGEE project since 2002 • EGEE III started this April • The final stage of EGEE • Two years of funding, 2008-2009 • KEK has finally joined the consortium • Strong connection to CERN • Core component of the LCG middleware • Still needs much work on the experiment side
Introduction • NAREGI: NAtional REsearch Grid Initiative • Foundation: 2003-2007, 10 billion yen for 5 years • Host institute: National Institute of Informatics (NII) • Core collaborations: IMS (molecular science), AIST (industrial applications), TIT, Osaka, Hitachi, Fujitsu, NEC • Mission: • R&D of Grid middleware for research and industrial applications, toward an advanced infrastructure • The primary target application is nanotechnology for innovative and intelligent materials production • More focused on the computing grid, linking supercomputer centers for coupled simulation of multi-scale physics • Supports heterogeneous computer architectures (vector, super parallel, clusters) • The data grid part was integrated in 2005
NAREGI status • Version 1 was finally released on May 9th, 2008 • The first production version of NAREGI • Can be downloaded from http://www.naregi.org • The English manual is in preparation and will be released at the end of this month
GRID Interoperability • Why is it necessary? • “GRID” is the name of the concept; the real infrastructure is realized by middleware • gLite in Europe, GRID5000 in France, OSG and TeraGrid in the US, NAREGI in Japan, ChinaGRID in China, and so on • Different middleware is used in different research fields and different geographical areas • Relying on the national infrastructure helps to save human resources • US HEP is using OSG • NAREGI is the national flagship project in Japan • High Energy Physics is not the only program where GRID is deployed • CC-IN2P3 and KEK support a wide area of physical science
Interoperability between EGEE and NAREGI • 2 possible approaches • Implement the GIN (Grid Interoperability Now) layer in NAREGI • Defined by the GIN group from the OGF • Short-term solution in order to get interoperability now! • Pragmatic approach • Work with longer-term standards defined within the OGF • Develop a Meta Scheduler compatible with many Grid implementations • Based on SAGA (Simple API for Grid Applications): "Instead of interfacing directly to Grid Services, the applications can so access basic Grid Capabilities with a simple, consistent and stable API" • and on JSDL (Job Submission Description Language)
GIN • Pushed by KEK, NAREGI has made considerable efforts to implement the GIN layer • "Trying to identify islands of interoperation between production grids and grow those islands" • This has been deployed at KEK and is under testing • Developing an interoperation island with EGEE • Common framework for authentication and VO management • Cross job submission between gLite/EGEE and NAREGI • Data transfer between gLite and Gfarm • Grid resource information service around the world • (From a NAREGI presentation, 02/07)
The SAGA / JSDL approach • This approach is being developed at CC-IN2P3 (Sylvain Reynaud) • It is in fact an extension of a software layer needed to easily interface our local batch system to any Grid Computing Element (D. Boutigny)
Interoperability - JSAGA • JSAGA is an API designed for uniform and efficient access to heterogeneous grids • Use the existing grids as they are, as efficiently as possible • Designed for scalability, efficiency and robustness • Interfaces conform to OGF specifications: • SAGA (Simple API for Grid Applications) • Object-oriented API with a number of functional packages for fundamental programming capabilities • JSDL (Job Submission Description Language) • XML-based language for describing single job submission requirements • Extended to support parametric job and job collection submission • Objective: describe your job once, submit it everywhere! • http://grid.in2p3.fr/jsaga/index.html
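To make the "describe your job once" idea concrete, here is a minimal sketch of a single-job JSDL document built with the Python standard library. The namespaces and element names follow the JSDL 1.0 specification as we understand it; the job name, executable and arguments are hypothetical, and the sketch shows neither the JSAGA API nor its job collection extension.

```python
import xml.etree.ElementTree as ET

# Namespaces of JSDL 1.0 and its POSIX application extension.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(job_name, executable, arguments):
    """Return a JSDL description of one job as an XML string."""
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    # Human-readable identification of the job.
    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    # What to run: a POSIX executable with its arguments.
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg

    return ET.tostring(job_def, encoding="unicode")
```

The same document could then, in principle, be handed to any submission service that understands JSDL, which is the point of describing the job independently of the middleware.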
JSAGA – overview • JSAGA API (job collection): hides the heterogeneity of grid infrastructures (e.g. EGEE, OSG, DEISA) • SAGA API: hides the heterogeneity of middleware (e.g. gLite, Globus, Unicore) • [Diagram: interfaces (JSAGA API, SAGA API) and their implementation (JSAGA engine). Labeled components: Collection Mgr, Job Collection, Data Service (File), Job Service (Job), Security Mgr (Context). Other labels: language, expression, protocol, control, monitor, security]
JSAGA – current status • SAGA implementation (hides middleware heterogeneity) • Compliant with SAGA: • version 1.0 of the specification (January 15, 2008) • "snapshot-1" of the official Java binding • streams and RPC interfaces are not implemented (and not planned) • Plug-ins provided for: • gLite-WMS, Unicore 6, Globus GK and WS-GRAM, SSH… • gsiftp, rbyteio, sftp, https… • in progress: SRB/iRODS, srm, lfc… • Job collection management (hides infrastructure heterogeneity) • Implementation is almost finished for the mechanisms for: • efficient transport of data to/from workers with many protocols • selection of environment variables and security contexts
JSAGA – perspectives • SAGA implementation (hides middleware heterogeneity) • will be presented at OGF 23 • will also implement the service discovery API when it is added to the SAGA specification • continue adding new plug-ins (NAREGI, CREAM…) • Job collection management (hides infrastructure heterogeneity) • short term (end of May): finish the implementation • long term: add resource selection?
Testbed for NAO-KEK federation (March 2008) • [Diagram: NAO site and KEK site connected over SINET3. Labeled components: User Browsers, Portal, SuperSched (SS), Infosys-NAS (IS-NAS), Infosys-CDAS (IS-CDAS), GridVM Server, GridVM Engine, UMS/VOMS, VOMS/KEK, NAT/DNS, FireWall, compute nodes, private network (192.168.2.101~). DataGrid part: MetaData Server (MDS), AccessManage (AMS), Gfarm storage] • Test VO: naokek
Federation Test with NAO (National Astronomical Observatory) • Aim: evaluation of the application environments of NAREGI • Test applications • NAO: JVO (Japanese Virtual Observatory) applications • KEK: HEP data analysis, e.g. Belle simulation, Geant4 MPI simulation • Status: • NAO installed NAREGI b2 in the testbed in Feb. 2008 (the DataGrid part is not yet installed) • Test VO: naokek, hosted by the KEK VOMS server/gLite • Simple job submission and retrieval were successfully tested at the end of March • Remote data file stage-in/stage-out has been confirmed • An astro application job was submitted to the KEK site, and the result was retrieved and post-processed for visualization (Apr. 2008)
Federation Test with NAOJ-KEK • SUBARU Telescope in Hawaii • Set up astro libraries at the KEK site • Job submission to KEK with the Work Flow Tool (WFT) at the NAOJ portal • Input data are transferred from NAOJ and output data are staged out to the NAOJ portal • Output data were processed with visualization software, as shown in the picture on the right • Input data (2.7 GB): 10-CCD mosaic images, 160 MB x 17 • Astro library: 1 GB • Process (10 hours): sensor calibration, deformation adjustment, positioning, mosaicing, summing of 17 frames • Visualization: 50,000 objects identified in this frame • [Diagram: job submission from the NAOJ NAREGI servers to the KEK NAREGI servers, with 2.7 GB of data, and retrieval of the results]
Test Applications • Data analysis program: carbon ion scattering in water measured with emulsion at 150 MeV and 300 MeV • Data analysis program (written in Ruby) • Input data were in Gfarm; analyzed data were stored in Gfarm files and also transferred to the SRB storage with GridFTP • Typical elapsed time of a job is about 2 hours • Geant4 simulation with MPI • Parallel Geant4 simulation with GridMPI of NAREGI has been tested on b1 • Belle event simulation • Full simulation software with libraries and database has been installed and tested successfully at the KEK site • Plan to interoperate with the gLite/EGEE belle VO • SUBARU telescope image enhancement
Data Grids Installation at KEK (2007.2.9) • [Diagram, labeled components: gLite/EGEE, CPUs, Naregi-kek, SRB MCAT, SRB-DSI, SRB server, gLite/CRC-01, gLite/CRC-02, SRB files, local files, HPSS, Grid files]
NAREGI at CC-IN2P3 (in progress) • A full NAREGI system was installed at CC-IN2P3 • beta2 • With virtual machines • Two clusters (with PBS Pro v7 and v9 (testing)) • The JSAGA interface is being developed using this environment
Future Plan for NAREGI at KEK • Migration to the production version ASAP • We will cooperate with the Grid Operation Center at the National Institute of Informatics • Operation is planned to start in JFY2008 • A multi-site federation test with the full specification will be done • KEK leads improvement of the middleware in the application domain • Will join the development team if the funding request is approved
RNS • A middleware-independent file catalogue is strongly desirable for operating across multiple middleware and sharing data • Robustness and scalability are issues • RNS (Resource Naming Service) is already standardized at OGF • http://www.ogf.org/documents/GFD.101.pdf • Two independent implementations are under way • U. of Tsukuba • http://www.ogf.org/OGF21/materials/957/OGF21%20RNS.pdf • University of Virginia • KEK is working with U. of Tsukuba and has requested NAREGI to support RNS • U. of Virginia is developing an LFC interface to RNS • An LFC interface to iRODS will be a subject of our collaboration
iRODS • The successor of SRB • Developed by an international collaboration • SDSC leads the project • CC-IN2P3 and KEK are members of the collaboration • SRB is used for BELLE at KEK • Integrating existing data storage into the GRID environment • We are waiting for iRODS to become mature enough to replace SRB • Provides much more flexible functionality • Rule-oriented • Also a collaboration with Adil Hasan (RAL, UK)
iRODS at CC-IN2P3 • Test scripts: • icommand test script: extensive test of the binary commands in order to track bugs (both client-side and server-side problems). • Load test of the system: launch n test scripts in parallel on the same machine. • Stress test of the iCAT (database) with millions of entries in the catalog (with Adil Hasan and Yoshimi Iida). • Host-based access control micro-service: • User id, group id. • Hostname or range of IP addresses. • Improved firewall implemented at the iRODS level. • Load balancing and monitoring system: • At a given time, choose the least loaded server to put/get data or do any other operation. • Gather metrics on each server (CPU load, network activity, memory usage, swap, disk occupancy, etc.) to choose the least loaded server. • Can also be used for monitoring purposes.
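The load balancing idea above can be sketched as follows. This is only an illustration under stated assumptions, not the iRODS implementation: the metric names, the normalization of every metric to the range 0 to 1, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    """Metrics gathered on one server, each normalized to 0..1 (assumption)."""
    host: str
    cpu_load: float
    net_activity: float
    mem_usage: float
    swap_usage: float
    disk_occupancy: float


# Hypothetical weights: how much each metric contributes to the load score.
WEIGHTS = {
    "cpu_load": 0.3,
    "net_activity": 0.3,
    "mem_usage": 0.2,
    "swap_usage": 0.1,
    "disk_occupancy": 0.1,
}


def load_score(m: ServerMetrics) -> float:
    """Weighted sum of the metrics; a lower score means a less loaded server."""
    return sum(weight * getattr(m, name) for name, weight in WEIGHTS.items())


def least_loaded(servers):
    """Pick the server with the lowest load score for the next put/get."""
    if not servers:
        raise ValueError("no server metrics available")
    return min(servers, key=load_score)
```

Because the score is computed from the same metrics that are gathered anyway, the list of scores can double as a simple monitoring view, as the slide notes.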
Data transfer performance • Performance tests were done by Yoshimi Iida while she was at CC-IN2P3 • Transfer speeds using iRODS and bbftp were measured over 12 hours • iput/iget and bbcp are the names of the corresponding commands • Both support parallel transfer • iRODS transfer commands work better on a congested network • Performance of GridFTP will be compared
From KEK to Lyon • 1 GB data transfer • TCP window size: 4 MB • Number of parallel streams: 16 • bbcp often failed to connect • [Plot: time axis from 0 h to 12 h]
From Lyon to KEK • 1 GB data transfer • TCP window size: 4 MB • Number of parallel streams: 16 • [Plot: time axis from 0 h to 12 h]
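The window size and stream count quoted for these transfers set an upper bound on throughput: a single TCP stream cannot send more than one window per round trip. The sketch below computes that bound. The round-trip time is not given in the slides, so the value used in the usage note is purely hypothetical, and "4 MB" is read here as 4 MiB, which is also an assumption.

```python
def max_throughput_bps(window_bytes, streams, rtt_seconds):
    """Upper bound in bits/s: each stream sends at most one window per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return streams * window_bytes * 8 / rtt_seconds


def min_transfer_seconds(size_bytes, window_bytes, streams, rtt_seconds):
    """Lower bound on the time to move size_bytes at the window-limited rate."""
    return size_bytes * 8 / max_throughput_bps(window_bytes, streams, rtt_seconds)


WINDOW = 4 * 2**20   # 4 MB TCP window from the slides, read as 4 MiB
STREAMS = 16         # number of parallel streams from the slides
```

For example, `max_throughput_bps(WINDOW, STREAMS, 0.3)` gives the bound for a hypothetical 300 ms round trip. The real rate is further capped by the link capacity and reduced by congestion, so measured speeds can fall well below this figure.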
iRODS at CC-IN2P3 • Transfer tests between KEK and Lyon (Yoshimi Iida): • 1 GB file transfers done continuously over 12, 24 and 48 hours. • Comparison with another transfer protocol (bbcp). • Results look very promising. • But a couple of issues still need to be understood (asymmetry of the performance between the 2 directions: KEK → Lyon, Lyon → KEK). • Observed with other protocols. • Stress tests of the iCAT (database) with millions of entries in the catalog (Yoshimi Iida, Adil Hasan, J-Y Nief): • Doing several kinds of queries. • Results are stable and good for iRODS. • Goal: go up to 10 million files. • Comparisons made with SRB: more performant.
ILC/CALICE: The VO for Linear Collider Exp. • The ILC/CALICE VO is supported at KEK • Since the end of 2006 • File sharing/transfer among DESY, IN2P3 and KEK over the VO • ILC • Number of cores: 32,793 • SPEC: 35,384 kSI2K • Storage: 68.4 TB (12.6 TB in use) • Members: 69 (4 from Japan) • CALICE • Number of cores: 13,469 • SPEC: 15,140 kSI2K • Storage: 203 TB (15.6 TB in use) • Members: 52 (3 from Japan) • KEK offers a small share of the resources
Operation statistics in the last 2 years (ILC, CALICE) • Number of jobs: 145,776 • 579 of 145,776 were processed at KEK-1/2 • CPU time normalized to 1 kSI2K: 338,531 hrs*kSI2K • 1,061 of 338,531 were used at KEK-1/2 • Number of jobs: 150,269 • 955 of 150,269 were processed at KEK-1/2 • CPU time normalized to 1 kSI2K: 323,251 hrs*kSI2K • 569 of 323,251 were used at KEK-1/2
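The KEK share implied by these figures is simple arithmetic, sketched below. The numbers are copied from the slide; since the flattened text does not make clear which block belongs to ILC and which to CALICE, the two blocks are labeled only "first block" and "second block" (our labels, not the slide's).

```python
def share_percent(part, total):
    """Percentage of `total` accounted for by `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Figures from the slide; CPU time is in hrs*kSI2K.
STATS = {
    "first block": {"jobs": (579, 145_776), "cpu": (1_061, 338_531)},
    "second block": {"jobs": (955, 150_269), "cpu": (569, 323_251)},
}


def kek_shares():
    """Return {block: {metric: KEK share in percent, rounded to 2 places}}."""
    return {
        block: {metric: round(share_percent(*pair), 2)
                for metric, pair in metrics.items()}
        for block, metrics in STATS.items()
    }
```

In every case the KEK-1/2 share comes out below 1 percent, which is consistent with the statement that KEK contributes only a small part of the resources.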
Prototype of GOC • Federated among major university groups and KEK in Japan. • Tohoku-U (KAMLAND, ILC) • Tsukuba-U (CDF, ATLAS) • Nagoya-U (BELLE, ATLAS) • Kobe-U (ILC, ATLAS) • Hiroshima-IT (ATLAS, Computing Science) • We have a common VO that does NOT depend on scientific projects. • To test each site. • KEK assists their operation over this VO • Same motivation as the ops VO • [Map: Nagoya Univ., Tohoku Univ., Kobe Univ., Hiroshima IT, KEK, Univ. of Tsukuba; computing resources over the PPJ VO] • KEK behaves as the GOC • Remote installation • Monitoring • Software updates
Monitoring System for our GOC • Summary view: each site is shown as an icon whose color indicates its status, e.g., yellow shows “warning” and red shows “error”. The thickness and color of each line indicate RTT and network status. • The monitoring system is based on Nagios and a wiki • The monitoring portal automatically creates a link based on the knowledge base and navigates administrators to the appropriate troubleshooting page on the wiki. • Support system: consists of the “monitoring system”, the “knowledge DB” and the “FAQ by wiki” • Monitoring system: the site status is checked by a few simple jobs or commands and is listed here. A link to the FAQ is generated according to the error description. • Monitoring portal: strongly inspired by the ASGC Nagios monitoring service maintained by Joanna Huang, APROC
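A minimal sketch of the two mechanisms described above: deriving a site's icon color from its check results, and generating a FAQ link from an error description. Only the yellow/warning and red/error mapping comes from the slide; the green color for a healthy site, the knowledge base entries and the wiki URL are all hypothetical.

```python
from typing import Optional
from urllib.parse import quote

# Severity order of check results, from best to worst.
SEVERITY = ["ok", "warning", "error"]

# Yellow and red are from the slide; green for "ok" is an assumption.
STATUS_COLORS = {"ok": "green", "warning": "yellow", "error": "red"}

# Hypothetical knowledge base: error keyword -> wiki troubleshooting page.
KNOWLEDGE_BASE = {
    "proxy expired": "ProxyExpired",
    "job submission failed": "JobSubmissionFailed",
}

WIKI_BASE = "https://wiki.example.org/faq/"  # hypothetical wiki location


def site_status(check_results):
    """A site is as bad as its worst check; no results counts as "ok"."""
    return max(check_results, key=SEVERITY.index, default="ok")


def faq_link(error_description: str) -> Optional[str]:
    """Return the wiki page matching the error description, if any."""
    text = error_description.lower()
    for keyword, page in KNOWLEDGE_BASE.items():
        if keyword in text:
            return WIKI_BASE + quote(page)
    return None
```

A portal page would then show `STATUS_COLORS[site_status(results)]` for each site icon and attach `faq_link(message)` to each failed check when a matching entry exists.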
Summary and conclusion • CC-IN2P3 and KEK are working jointly on GRID interoperability, iRODS development and LCG operation • We will continue the effort • We have learned that the human network is much more important than the computer network for operating GRID • FJPPL helped us to build good connections among people • Mutual trust is the key for GRID • VO is based on RO • A Virtual Organization managed by a good Real Organization works more efficiently