Report on the INFN-GRID Globus evaluation Massimo Sgaravatto INFN Padova for the INFN Globus group globus@infn.it
Why Globus? • Some basic services (security, information services, resource management, …) must be deployed in order to implement and use a Grid for real applications • Globus identified as a possible Grid framework providing these services • Work package (WP) “Installation and Evaluation of the Globus Toolkit” of the INFN-GRID Project • Evaluation of the Globus toolkit (effectiveness, completeness, robustness, ease of use, …) • Provide feedback to the Globus team • Bringing attention to existing problems and requirements • Providing fixes to some problems
Globus activities within INFN • Activities driven by the following work plan • Evaluation of Globus security services • Evaluation of Grid Information Service • Evaluation of Globus services for resource management • Evaluation of Globus tools for data management • Evaluation of Globus HBM for fault monitoring • Evaluation of Globus GEM for execution environment management • Globus deployment and installation tools • Not only a simple evaluation • Some existing shortcomings addressed • Specific configurations and customizations implemented • INFN-GRID Globus evaluation activities performed between June 2000 and January 2001 • “Official” Globus 1.1.3 (1.1.4 for MPICH-G2) release tested
Globus security services • The Globus GSI security model seems to satisfy the current security requirements of the HENP community • One-time login mechanism • Use of X.509 certificates • Possibility of extending relations of trust to multiple CAs without having to interfere with their X.500 naming scheme • Some shortcomings • Need for limited (by scope or purpose) proxies • Globus team is already addressing this problem • Memory leaks in the GAA library • Fixed: patches provided by INFN • Cryptic diagnostics • Now partially solved with newer code • Interface between GSI and AFS • Already addressed with gsiklog • No tools for group management • Addressed with new CAS service
INFN customizations on security • INFN-CA • CRL distribution • Centralized management of the grid-mapfile • Goal: ease the sharing of the same access policies (represented by the grid-mapfiles) among groups of hosts with common purposes • Proposed system • Central repository (LDAP server) to store user certificates (subjects) and to define groups of users • Certificates published by the CA manager • Group manager responsible for editing group memberships (using an LDAP client) • Resource owners (Globus administrators) periodically (e.g. via a cron job) “connect” to this repository, “download” the subjects of the certificates that meet a specified criterion (e.g. all users of group X), and produce grid-mapfile entries
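The proposed flow can be sketched as follows. This is a minimal Python illustration, not the INFN tool itself: the in-memory `DIRECTORY` is a hypothetical stand-in for the LDAP repository, and the group names, certificate subjects, and local account names are invented for the example. The only format assumption is the usual grid-mapfile line, a quoted certificate subject followed by a local account name.

```python
# Hypothetical stand-in for the central LDAP repository: group -> certificate subjects.
DIRECTORY = {
    "cms": [
        "/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Example User One",
        "/C=IT/O=INFN/OU=Personal Certificate/L=Milano/CN=Example User Two",
    ],
    "atlas": [
        "/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Example User Three",
    ],
}


def subjects_in_group(directory, group):
    """Return the subjects meeting the criterion 'member of this group', sorted."""
    return sorted(directory.get(group, []))


def gridmap_entries(subjects, local_account):
    """Build one grid-mapfile line per subject: quoted subject, then local account."""
    return ['"{}" {}'.format(subject, local_account) for subject in subjects]


def build_gridmap(directory, group_to_account):
    """Produce the grid-mapfile text for the groups a resource owner accepts."""
    lines = []
    for group in sorted(group_to_account):
        subjects = subjects_in_group(directory, group)
        lines.extend(gridmap_entries(subjects, group_to_account[group]))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # A periodic job on each resource could regenerate the file like this.
    print(build_gridmap(DIRECTORY, {"cms": "cmsuser"}), end="")
```

In a real deployment the lookup in `subjects_in_group` would be an LDAP search against the central repository; keeping it behind one function means the rest of the generation step does not depend on how the repository is queried.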
Globus Information Services • INFN implemented a hierarchical structure of GIS based on geographical entities • Site GIISs • Local GRISs registered at the site GIIS • Root GIIS where the site GIISs are registered
INFN GIS Topology • [Figure: top-level INFN GIIS (dc=infn, dc=it, o=grid) with site GIISs registered below it, e.g. Padova (dc=pd, dc=infn, dc=it, o=grid) and Milano (dc=mi, dc=infn, dc=it, o=grid), each with its local GRISs]
[Figure: the root GIIS gives a global view; a 1st level query focuses on a set of resources (scheduling / resource discovery); 2nd and 3rd level queries to the GIISs and GRISs get more updated info. High availability: ldbm backend (?), GIIS replication (?)]
Globus Information Services • Problems • Performance • When querying the root GIIS server, in the worst case the whole namespace must be searched • The overall response time is determined by the slowest response of a descendant • Poor GRIS performance (shell backend) • Example (querying a site GIIS): • ~1 s when the cache is on • ~5-10 s when the cache has expired and the GIIS and GRIS are not busy • > 1 min when the cache has expired and the GRIS is busy
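The effect of the slowest descendant can be illustrated with a toy model. The sketch below is not MDS code: it assumes that a GIIS with an expired cache queries all of its children in parallel and waits for every answer, and that a GIIS with a valid cache answers on its own. The `Node` type and all latency figures are hypothetical, chosen only to echo the orders of magnitude quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A GRIS (no children) or a GIIS (with children) in the hierarchy."""
    name: str
    own_latency: float = 0.0      # seconds spent by this server itself (assumed)
    cache_valid: bool = False     # only meaningful for a GIIS
    children: List["Node"] = field(default_factory=list)


def response_time(node: Node) -> float:
    """Time to answer a query, assuming children are queried in parallel."""
    if not node.children or node.cache_valid:
        # A GRIS, or a GIIS answering from its cache, does not wait for anyone.
        return node.own_latency
    # Otherwise the answer is ready only when the slowest child has replied.
    return node.own_latency + max(response_time(c) for c in node.children)


def example_site(cache_valid: bool, busy: bool) -> Node:
    """Hypothetical site GIIS with two GRISs, one of which may be busy."""
    slow = 60.0 if busy else 4.0
    return Node("site-giis", own_latency=1.0, cache_valid=cache_valid,
                children=[Node("gris-1", 4.0), Node("gris-2", slow)])


if __name__ == "__main__":
    root = Node("root-giis", own_latency=0.5,
                children=[example_site(True, False), example_site(False, True)])
    print(response_time(root))
```

With these assumed numbers a single busy GRIS under one site dominates the root query, even though the other site answers from its cache.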
Globus Information Services • Other problems • Pull model • Mixed push/pull model more suitable • Security and access controls • Any GRIS can register itself with a GIIS • No access control when searching the GIS • Fault tolerance • No automatic failover mechanisms
Globus Information Services • Most of these problems have already been addressed, or are being addressed, in the new MDS development • Improved GRIS performance • Improved GIIS performance (e.g. support for GIIS timeouts) • Integration of GSI security and access control • Support for customized indexes • Support for pluggable information providers • Support for both registration and invitation • …
Globus Information Services • Other INFN customisations • INFN-GIS browser • Tools (MRTG based) to monitor LDAP servers • Entries returned • Connections
Resource Management • Evaluation of Globus GRAM • Focus on possible use of GRAM as a uniform interface to different underlying local resource management systems (LRMS) • Tests with Condor, LSF and PBS as LRMS • INFN WAN Condor pool as Globus resource • The model is fine, but it lacks the “robustness” needed for real production environments • Memory leaks in the Globus job manager • Fixed: patches provided by our group were fed back to Globus • Scalability (one job manager for each job) • Reliability (the job manager is not persistent) • Addressed with the new jobmanager (by Condor team) • New resource management architecture foreseen with GRAM-2
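The idea of a uniform interface over several LRMS can be illustrated with a small dispatch table. This is only a sketch of the concept, not how the GRAM job manager is implemented: the function and table names are hypothetical, and the only external facts assumed are the standard submission commands of the three systems (`condor_submit`, `bsub`, `qsub`). Nothing is executed; the code just builds the command line a caller would run.

```python
# Hypothetical dispatch table: LRMS name -> builder of the local submit command.
LRMS_SUBMIT = {
    "condor": lambda job: ["condor_submit", job],  # job: submit description file
    "lsf": lambda job: ["bsub", job],              # job: command to run
    "pbs": lambda job: ["qsub", job],              # job: batch script
}


def submit_command(lrms, job):
    """Return the local submission command for `job` on the given LRMS."""
    try:
        builder = LRMS_SUBMIT[lrms.lower()]
    except KeyError:
        raise ValueError("unsupported LRMS: " + lrms)
    return builder(job)


if __name__ == "__main__":
    # The caller uses one function regardless of the batch system behind it.
    for lrms, job in [("Condor", "job.sub"), ("LSF", "./job.sh"), ("PBS", "job.sh")]:
        print(lrms, "->", " ".join(submit_command(lrms, job)))
```

The caller sees a single entry point; supporting another batch system means adding one entry to the table rather than changing the calling code.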
Resource Management • Default GRAM Reporter (information providers) not sufficient for our needs (in particular considering PC farms): • Many attributes useless (at least for our needs), some attributes not calculated (always defined as 0), some attributes not properly calculated, important information (e.g. needed by a resource broker) missing • We are addressing this problem in the context of the DataGrid Project • Submission of Condor jobs to Globus resources • Condor-G • Useful as a reliable job submission service • Persistent queue of jobs • Logging information • Exploitation of the new persistent Globus jobmanager • Reliable (two-phase commit) submission protocol • GlideIn • Evaluation of MPICH-G2 vs. MPICH • Some shortcomings found (lack of support for shared memory, worse latency for small messages with respect to MPICH)
Data management • Tests with GASS • Tests with GridFTP alpha release 2 • Capability of resuming an interrupted file transfer successfully tested • Support for the GSI authentication mechanisms successfully tested • Throughput tests • Increasing number of parallel streams and fixed file size • Increasing file size and fixed number of streams • Increasing TCP buffer size • Increasing block size
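Why the number of parallel streams and the TCP buffer size matter can be seen from a first-order estimate. The sketch below is an assumption-laden illustration, not a model of the tests themselves: it uses the standard bandwidth-delay product and treats each stream as limited to one buffer per round trip, ignoring packet loss, slow start, and disk or CPU limits. The link speed, round-trip time, and buffer size are hypothetical values.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8.0


def window_limited_throughput_bps(buffer_bytes, rtt_s, streams=1,
                                  link_bps=float("inf")):
    """First-order estimate: each stream sends one buffer per round trip,
    and the aggregate cannot exceed the link capacity."""
    return min(link_bps, streams * buffer_bytes * 8.0 / rtt_s)


if __name__ == "__main__":
    link = 100e6          # hypothetical 100 Mbit/s path
    rtt = 0.020           # hypothetical 20 ms round-trip time
    buf = 64 * 1024       # hypothetical 64 KiB TCP buffer

    print("BDP (bytes):", bdp_bytes(link, rtt))
    for n in (1, 2, 4):
        rate = window_limited_throughput_bps(buf, rtt, streams=n, link_bps=link)
        print(n, "stream(s):", rate / 1e6, "Mbit/s")
```

Under these assumptions a buffer smaller than the bandwidth-delay product caps a single stream well below the link speed, and either a larger buffer or more parallel streams closes the gap.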
Other services • Fault Monitoring (HBM) • Evaluation of HBM for fault detection (for “system” and “user” processes) • … but the HBM package is not seeing active development • Execution Environment Management (GEM) • Evaluation of GEM as service for “code migration” • … but Globus now provides only limited capabilities (executable staging)
Globus installation tools • INFN-GRID Globus installation toolkit • To make the installation of the Globus toolkit easier and more “automatic” • To shorten the installation time (very long using the standard install procedures) • Support for specific customisations and configurations • Quick distribution of patches • Support for distribution of new tools and packages • Proven to be successful • Used to set up an INFN-GRID testbed and also outside INFN (CERN, FNAL, …) • Used as installation tool for DataGrid Testbed 0
INFN-GRID Installation toolkit • Characteristics • Distribution of binary files • Distribution of the packages needed to install/use Globus • Distribution of Globus builds in various flavours (Kerberos, MPICH, AFS) • Support for the most used platforms in the HENP community (Linux RH, Solaris) • Binary file relocation supported • Latest patches included (e.g. fixes for Globus jobmanager memory leaks) • Support for local customisations (hook to support different CAs, support for different GIS configurations, support for different LRMS, …) • Support for distribution of new tools and packages (certretrieve, GDMP, …) • Upgrade and uninstall procedures • Documentation
New Globus packaging • Modular packages for individual components • More open development process • Possibility to build and install only desired packages • Simpler customization • Contributions from INFN included
Conclusions • The Globus toolkit can provide basic services useful to create and deploy usable Grids, but various shortcomings and issues must be addressed • The Globus developers have already addressed, or are addressing, most of them • Other info • Report on the INFN-GRID Globus Evaluation • http://www.infn.it/globus/Docs/infn-globus-evaluation.pdf • Response from Globus team to “Report on the INFN-GRID Globus Evaluation” • http://www.isi.edu/~annc/infn/responsetoinfn.pdf • http://www.infn.it/globus