GRID Implementation and Requirements F. Semeria INFN-Bologna HEPix/HEPnt LAL, 23 April 2001
Introduction • In a distributed environment like a Grid, one of the primary needs is to collect and retrieve resource information. • The Grid Information Service (GIS) is the way of making information available to Grid applications.
GIS=GRIS+GIIS • Globus implements the GIS by using two kinds of LDAP (v2) servers: • GRIS (Grid Resource Information Service) runs on each resource (machine). It uses an LDAP shell backend to gather the resource configuration and status. It registers itself with a GIIS, providing information about itself.
GIS=GRIS+GIIS (cont.) • GIIS (Grid Index Information Service) is an LDAP server that usually runs on a few machines per organization and acts as a search engine for the GRIS’s registered under it.
Why LDAP? • PROs: • it is a standard way to describe and collect data. • it provides an effective distributed model for the data. • CONs: • directories are designed more for reading than for writing. Good for an address book or NIS, but not for storing dynamic data like the CPU load of a machine.
General implementation • The proposed implementation of the Information Service is a hierarchical structure of servers (GIIS’s) with a root server at CERN.
General implementation (cont.) • Each organization has its top level GIIS registered on the root server, but can choose its own low level topology
GIIS hierarchy (diagram):
EU GIIS (CERN) o=grid
  INFN (Italy) dc=infn,dc=it,o=grid
    ...
  France GIIS dc=fr,o=grid
    IN2P3 (France) dc=in2p3,dc=fr,o=grid
    ...
  LIP (Portugal) dc=lip,dc=pt,o=grid
    ...
  ...
INFN implementation • INFN has implemented a hierarchical structure of GIIS’s based on the INFN departments (about 25). • Each GRIS registers itself with the site GIIS, which in turn registers itself with the top level INFN GIIS.
INFN hierarchy (diagram):
Top level GIIS dc=infn,dc=it,o=grid
  GIIS Bologna dc=bo,dc=infn,dc=it,o=grid
    GRIS
  GIIS Milano dc=mi,dc=infn,dc=it,o=grid
    GRIS
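The registration chain in the diagram can be sketched as a small lookup table. This is only an illustration under stated assumptions: the suffixes are the ones shown on the slides, while the `REGISTERS_WITH` table and the `registration_path` helper are hypothetical names and not part of Globus.

```python
# Registration map taken from the slides: each suffix maps to the
# suffix of the GIIS it registers with. The root (o=grid) has no parent.
REGISTERS_WITH = {
    "dc=bo,dc=infn,dc=it,o=grid": "dc=infn,dc=it,o=grid",
    "dc=mi,dc=infn,dc=it,o=grid": "dc=infn,dc=it,o=grid",
    "dc=infn,dc=it,o=grid": "o=grid",
}


def registration_path(suffix):
    """Follow the registrations from a site GIIS up to the root GIIS."""
    path = [suffix]
    while path[-1] in REGISTERS_WITH:
        path.append(REGISTERS_WITH[path[-1]])
    return path


if __name__ == "__main__":
    for hop in registration_path("dc=bo,dc=infn,dc=it,o=grid"):
        print(hop)
```

A query for Bologna resources that enters at the root would have to be routed down this same chain in reverse.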
INFN top level GIIS • 11 GIIS’s registered • More than 40 GRIS’s
http://bond.cnaf.infn.it/cgi-bin/mdsbrowse1.pl
GIS Requirements • Each experiment needs to be able to select its own set of machines (with its own name space) • We need more attributes to describe the status of jobs and machines. • Data replication for failure recovery and mirroring
Experiments resources • Each GRIS can register itself to several GIIS’s. • This allows repartitioning of resources by experiment.
Experiment GIIS (diagram): Top level INFN GIIS (dc=infn,dc=it,o=grid) and EU CMS GIIS (ou=cms,o=grid); below them GIIS Bologna and GIIS Milano, each with a GRIS.
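A minimal sketch of how multiple registration repartitions the same machines by experiment. The host names are hypothetical, and the data structure is only an illustration of the idea, not how a GIIS stores its registrations.

```python
from collections import defaultdict

SITE_BO = "dc=bo,dc=infn,dc=it,o=grid"
SITE_MI = "dc=mi,dc=infn,dc=it,o=grid"
CMS = "ou=cms,o=grid"

# Hypothetical host names; each GRIS lists every GIIS it registers with.
REGISTRATIONS = {
    "gris1.bo.example": [SITE_BO, CMS],
    "gris2.bo.example": [SITE_BO],
    "gris1.mi.example": [SITE_MI, CMS],
}


def members_by_giis(registrations):
    """Invert the map: for each GIIS, list the GRIS's registered under it."""
    members = defaultdict(list)
    for gris, giis_list in registrations.items():
        for giis in giis_list:
            members[giis].append(gris)
    return dict(members)


if __name__ == "__main__":
    for giis, hosts in members_by_giis(REGISTRATIONS).items():
        print(giis, "->", ", ".join(hosts))
```

In this sketch the CMS GIIS sees one machine from each site, while each site GIIS still sees all of its own machines.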
Jobs and machines info • The underlying resource management systems, such as Condor, LSF and PBS, provide useful information about machines and jobs that should be published in the GIS.
Examples of jobs info • job id • current status of the job • the size of the executable • the name of the user • the submitting and the executing host • why the job is not running • etc.
Example of machines info • the total and available physical memory and swap space • the speed of the machine in MIPS • the number of CPUs • the CPU load average • etc.
Extending the GRIS • The GRIS uses programs called information providers to collect information from the machines. • The requirements for an information provider are: • the program must emit LDIF objects to stdout • the object generated must respect the GLOBUS schema
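A minimal sketch of an information provider that writes one LDIF object to stdout. The object class `ExampleHost`, the attribute names `examplecpucount` and `exampleload1`, and the `hn=` naming under the Bologna suffix are placeholders chosen for illustration; a real provider would have to use the names defined by the Globus schema. Values are assumed to be plain ASCII, so no base64 encoding or line folding is attempted.

```python
import os
import socket
import sys


def ldif_entry(dn, attributes):
    """Format one LDIF entry from a DN and a dict of attribute -> values."""
    lines = [f"dn: {dn}"]
    for name, values in attributes.items():
        for value in values:
            lines.append(f"{name}: {value}")
    # A blank line terminates an LDIF entry.
    return "\n".join(lines) + "\n\n"


def host_attributes():
    """Collect a few machine facts (attribute names are hypothetical)."""
    attrs = {
        "objectclass": ["ExampleHost"],
        "examplecpucount": [str(os.cpu_count() or 1)],
    }
    try:
        # 1-minute load average; not available on every platform.
        attrs["exampleload1"] = [f"{os.getloadavg()[0]:.2f}"]
    except (AttributeError, OSError):
        pass
    return attrs


if __name__ == "__main__":
    host = socket.gethostname()
    dn = f"hn={host},dc=bo,dc=infn,dc=it,o=grid"  # assumed naming
    sys.stdout.write(ldif_entry(dn, host_attributes()))
```

A provider for Condor, LSF or PBS would follow the same pattern, replacing `host_attributes` with a call to the batch system's own status tools.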
Data flow • Information is not pushed periodically from a GRIS to a GIIS; instead, the GIIS queries the GRIS’s when an application needs information.
application --query--> GIIS --query--> GRIS
Caching • Information is kept in a cache for a period of time (TTL = Time To Live). • The higher the level of the GIIS, the longer the TTL and the lower the level of detail.
application --query--> GIIS • cache not expired: the GIIS answers from its cache • cache expired: GIIS --query--> GRIS
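The pull-with-cache behaviour described above can be sketched as follows. This is a generic TTL cache under stated assumptions, not the Globus implementation: the class name `CachingIndex`, the `fetch` callback standing in for a query to a GRIS, and the injectable `clock` are all hypothetical.

```python
import time


class CachingIndex:
    """Answer from the cache while an entry is fresh, else query the source."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # stands in for a query to a GRIS
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}             # resource name -> (expiry time, data)

    def query(self, resource):
        now = self._clock()
        cached = self._cache.get(resource)
        if cached is not None and now < cached[0]:
            return cached[1]         # cache not expired
        data = self._fetch(resource)  # cache expired or empty
        self._cache[resource] = (now + self._ttl, data)
        return data


if __name__ == "__main__":
    index = CachingIndex(lambda name: {"resource": name}, ttl_seconds=30)
    print(index.query("gris1"))  # first call goes to the source
    print(index.query("gris1"))  # second call is served from the cache
```

A higher-level GIIS would be configured with a larger `ttl_seconds` than a site GIIS, at the cost of staler data.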
Performance • In the worst case the whole set of machines must be queried. • Indexing techniques should be used to prune the search space. • A periodic information update mechanism could also be investigated.
Some tests • We have tested how performance depends on caching and CPU load. • The tests were run over a WAN. • The same queries on a GIIS take < 1 sec. when the cache is on and > 10 sec. when it is off.
Some tests (cont.) • When a GRIS has a loaded CPU, the response time from its own GIIS is much higher when the cache has expired (> 1 min. vs 1 sec.). • When a GIIS has a loaded CPU and the cache has not expired, the response time is also higher (6-7 sec.): better not to use a GIIS machine for computation!
Security and access policies • In the current implementation any machine can register itself with a GIIS. • There is no access control when searching the GIIS. From any LDAP client I can run: ldapsearch -p 389 -h mds.infn.it -b "o=grid" -s sub "*=*" and get all the information from the GIIS.
Conclusions • The Globus Information Service is based on a standard protocol (LDAP). • It provides flexibility and a potentially good distributed data model. • But...
Conclusions (cont.) • A good topology for the HEP experiments must still be implemented. • The GRIS must be extended with new information providers. • Data replication is lacking. • New mechanisms should be introduced to improve performance and security.