Introduction to the current EDG Testbed Software, Kraków, December 2002. Steve Fisher s.m.fisher@rl.ac.uk – RAL on behalf of The European DataGrid Project Team http://www.edg.org/
The European DataGrid • Funded by the European Union • Jan 1, 2001 - Dec 31, 2003 • Develop, implement and exploit a large-scale data and CPU-oriented computational GRID. • Develop middleware, in collaboration with some of the leading centres of competence in GRID technology. • Complement, and help to coordinate at a European level, several on-going national GRID projects.
The EDG Main Partners • CERN – International (Switzerland/France) • CNRS - France • ESA/ESRIN – International (Italy) • INFN - Italy • NIKHEF – The Netherlands • PPARC - UK
EDG Assistant Partners Industrial Partners • Datamat (Italy) • IBM-UK (UK) • CS-SI (France) Research and Academic Institutes • CESNET (Czech Republic) • Commissariat à l'énergie atomique (CEA) – France • Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI) • Consiglio Nazionale delle Ricerche (Italy) • Helsinki Institute of Physics – Finland • Institut de Fisica d'Altes Energies (IFAE) - Spain • Istituto Trentino di Cultura (IRST) – Italy • Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany • Royal Netherlands Meteorological Institute (KNMI) • Ruprecht-Karls-Universität Heidelberg - Germany • Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands • Swedish Research Council - Sweden
EDG structure: work packages • WP1: Work Load Management System • WP2: Data Management • WP3: Information and Monitoring • WP4: Fabric Management • WP5: Storage Element • WP6: Testbed and demonstrators • WP7: Network Monitoring • WP8: High Energy Physics (Applications) • WP9: Earth Observation (Applications) • WP10: Biology (Applications) • WP11: Dissemination • WP12: Management
Current EDG Testbed (Testbed1 EDG sites) • Reference site: CERN • NorduGrid: Bergen, Copenhagen, Helsinki, Lund, Oslo, Stockholm, Uppsala • Italy: Bologna, Cagliari, Catania, Milano, Napoli, Padova, Parma, Pisa, Roma, Torino • Other sites: Manchester, NIKHEF, RAL, Karlsruhe, Lyon, Barcelona, Madrid, Lisboa
Security: Authentication/Authorization • Authentication • Who you are • users identified by certificates signed by a CA • Authorization • What you are allowed to do • based on membership of Virtual Organizations (VO).
Certificate Request (diagram) • the user runs grid-cert-request, which produces a cert-request • done once every two to three years
Requesting a Certificate
• grid-cert-request
    A certificate request and private key is being created.
    [...]
    Using configuration from /usr/local/grid/globus/etc/globus-user-ssleay.conf
    Generating a 1024 bit RSA private key
    [...]
    A private key and a certificate request has been generated with the subject:
    /O=Grid/O=CERN/OU=cern.ch/CN=Akos Frohner
    [...]
    Your private key is stored in .../.globus/userkey.pem
    Your request is stored in .../.globus/usercert_request.pem
    Please e-mail the certificate request to the CERN CA
    cat .../.globus/usercert_request.pem | mail cern-globus-ca@cern.ch
    Your certificate will be mailed to you within two working days.
Certificate Signing (diagram) • the user sends the cert-request produced by grid-cert-request to the CA • the CA signs it (cert signing) and returns the certificate to the user
Registration/Authorization: user registration in an EDG Virtual Organisation
• convert your certificate:
    openssl pkcs12 -export -in ~/.globus/usercert.pem -inkey ~/.globus/userkey.pem -out user.p12 -name 'Joe Smith'
• import your certificate into your browser
• sign the usage guidelines: https://marianne.in2p3.fr/cgi-bin/datagrid/register/account.pl
• ask your VO administrator for an account by email
-> You are registered in the VO server and have a user account.
Registration (diagram) • the user converts the certificate to cert.pkcs12 (convert) and registers with the VO, signing the usage guidelines (registration) • the result is an account • done once for the lifetime of the VO (you may change the certificate keys!)
Starting a Session (diagram) • the user runs grid-proxy-init on the certificate to create a proxy-cert • repeated every 12/24 hours
Usage
You must have a valid certificate from a trusted CA!
• "login": grid-proxy-init (short lifetime certificate: 24 hours)
    Enter PEM pass phrase:
    ...........................+++++
    ....................................+++++
• checking the proxy: grid-proxy-info -subject
    /O=Grid/O=CERN/OU=cern.ch/CN=Akos Frohner/CN=proxy
-> use the grid services
• "logout": grid-proxy-destroy
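Configuration on the Server (diagram) • the service obtains a host-cert: grid-cert-request produces a host-request, which the CA signs (cert signing) • the service also holds the ca-certificate and the crl (cert/crl update) • the crl is automatically updated periodically
Authorization Information (diagram) • the service holds its host-cert, the ca-certificates and the crls • mkgridmap builds the gridmap from the VO-server • the gridmap is automatically updated periodically
Using a Service (diagram) • user side: certificate, cert.pkcs12, and the proxy-cert created by grid-proxy-init • service side: host-cert, ca-certificates, crls, gridmap • host/proxy certs are exchanged between service and user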
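The link between a proxy certificate and the gridmap can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the common grid-mapfile layout of a quoted certificate subject followed by a local account name, and the account name iteam001 is made up for the example. It is not the code used by the EDG services.

```python
import shlex

PROXY_SUFFIX = "/CN=proxy"

def identity_from_proxy(subject: str) -> str:
    """Strip trailing /CN=proxy components to recover the user's own subject."""
    while subject.endswith(PROXY_SUFFIX):
        subject = subject[: -len(PROXY_SUFFIX)]
    return subject

def parse_gridmap(text: str) -> dict:
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping

def local_account(proxy_subject: str, gridmap_text: str):
    """Return the local account for a proxy subject, or None if not authorized."""
    return parse_gridmap(gridmap_text).get(identity_from_proxy(proxy_subject))

# Hypothetical gridmap content; the subject is the one shown in this talk.
GRIDMAP = '"/O=Grid/O=CERN/OU=cern.ch/CN=Akos Frohner" iteam001\n'
PROXY = "/O=Grid/O=CERN/OU=cern.ch/CN=Akos Frohner/CN=proxy"
print(local_account(PROXY, GRIDMAP))
```

A subject that does not appear in the gridmap yields None, which corresponds to the user not being authorized on that service.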
EDG Logical Machine Types • User Interface (UI) • Resource Broker (RB) • Information Service (IS) • Computing Element (CE): Gatekeeper (Front-end Node) and Worker Nodes (WN) • Storage Element (SE) • Replica Catalog (RC)
Information Systems overview • The aim of the Information and Monitoring Service is to deliver a flexible infrastructure that provides information on • the EU DataGrid itself • grid applications • EDG info systems are based upon Globus MDS (Metacomputing Directory Service or Monitoring and Discovery Service as it is now called) • Based on OpenLDAP, a hierarchical database • The information system is currently used mainly by the middleware. • You can use it to find out what is going on
LDAP attributes • A schema describes the attributes and the types of the attributes associated with data objects • Example - some attributes of SiteInfo: • siteName: RALDEV • sysAdminContact: grid.sysadmin@rl.ac.uk • userSupportContact: grid.support@rl.ac.uk • siteSecurityContact: grid.security@rl.ac.uk • dataGridVersion: 1.2 • InstallationDate: 20020704142800Z
LDAP hierarchy • Lightweight Directory Access Protocol (LDAP) offers a hierarchical view of information • The objects are arranged in a Directory Information Tree (DIT) • One or more attributes represent the Relative Distinguished Name (RDN) • An object is identified by its Distinguished Name (DN) • The DN is the object's RDN followed by the DN of its parent
RDNs and DNs
• site
    DN: Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
• SE
    RDN: seId=dev02.hepgrid.clrc.ac.uk
    DN: seId=dev02.hepgrid.clrc.ac.uk,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
• supported protocols
    RDNs: seProtocol=gridftp, seProtocol=rfio, seProtocol=file
    DNs:
    seProtocol=gridftp,seId=dev02.hepgrid.clrc.ac.uk,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
    seProtocol=rfio,seId=dev02.hepgrid.clrc.ac.uk,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
    seProtocol=file,seId=dev02.hepgrid.clrc.ac.uk,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
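The rule that a DN is the object's RDN followed by the DN of its parent can be checked with a short Python sketch. The helper names build_dn and split_dn are invented for this illustration, and the split is deliberately naive: it assumes no escaped commas inside an RDN, which a real LDAP library would handle.

```python
def build_dn(rdn: str, parent_dn: str) -> str:
    """Return the DN of an object: its RDN followed by its parent's DN."""
    return f"{rdn},{parent_dn}"

def split_dn(dn: str) -> tuple:
    """Split a DN into (RDN, parent DN); naive, assumes no escaped commas."""
    rdn, _, parent = dn.partition(",")
    return rdn, parent

# Values taken from the RAL development site example in this talk.
site_dn = "Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid"
se_dn = build_dn("seId=dev02.hepgrid.clrc.ac.uk", site_dn)
protocol_dns = [build_dn(f"seProtocol={p}", se_dn) for p in ("gridftp", "rfio", "file")]

for dn in [site_dn, se_dn, *protocol_dns]:
    print(dn)
```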
MDS GRISs & GIISs • Information providers are scripts which when invoked by the LDAP server make available the desired information • Information is cached by the server to improve performance • Within MDS the EDG information providers are invoked by a local LDAP server, the Grid Resource Information Server (GRIS) • “Aggregate directories”, Grid Information Index Servers (GIIS), are used to group resources • The GRISs use soft state registration to register with one or more GIISs • The GIIS can then act as a single point of contact for a number of resources • A GIIS may represent a site, country, virtual organization, etc. • In turn a GIIS may register with another GIIS
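Soft state registration means that a registration is only remembered for a limited time and has to be refreshed by the registering GRIS. The following Python sketch shows the idea with a toy index; the class name, the server names and the time-to-live values are all hypothetical, and it does not reproduce the actual MDS registration protocol.

```python
from dataclasses import dataclass, field

@dataclass
class SoftStateIndex:
    """Toy GIIS: remembers each registration only until it expires."""
    expiry: dict = field(default_factory=dict)  # server name -> expiry time (s)

    def register(self, name: str, now: float, ttl: float) -> None:
        """Record or refresh a registration that stays valid for ttl seconds."""
        self.expiry[name] = now + ttl

    def live(self, now: float) -> list:
        """Return the servers whose registration has not yet expired."""
        return sorted(n for n, t in self.expiry.items() if t > now)

index = SoftStateIndex()
index.register("gris-ce", now=0, ttl=60)
index.register("gris-se", now=0, ttl=30)
print(index.live(now=10))   # both registrations still valid
print(index.live(now=45))   # gris-se was not refreshed and has dropped out
```

A server that stops refreshing its registration disappears from the index without any explicit deregistration step.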
EDG Information Providers & the Directory Information Tree (diagram) • site • computing element • storage element • site information • network information between this and other sites • status • file statistics • supported protocols • storage elements that are close (not necessarily at the same site)
EDG GRIS/GIIS Hierarchy (diagram: a top-level datagrid GIIS, below it countryA and countryB, below them siteA to siteD, each fed by information providers) • There is a top level datagrid GIIS to which all of the country GIISs register • Each country has a GIIS to which all of the site GIISs register • Each site has a Grid Information Index Server (GIIS) which acts as a single point of contact for all of the site's resources. The GRISs register with their site GIIS • Information providers publish information to a local LDAP server known as a Grid Resource Information Server (GRIS)
EDG Information Providers • The EDG has produced information providers for: • Site information • The Computing Element • The Storage Element • Network Monitoring • All of the EDG data objects are dynamic: each has a time stamp and a time to live (used by the cache mechanism) associated with it
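How a cache might use a time stamp and a time to live can be sketched as follows. This is an illustration only: it assumes the time stamp uses the same UTC format as the installationDate value shown elsewhere in this talk (20020704142800Z), and the function names and time-to-live values are invented.

```python
from datetime import datetime, timedelta, timezone

def parse_ldap_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 20020704142800Z."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)

def is_fresh(timestamp: str, ttl_seconds: int, now: datetime) -> bool:
    """True while the object is younger than its time to live."""
    return now < parse_ldap_time(timestamp) + timedelta(seconds=ttl_seconds)

# Hypothetical check made 30 seconds after the object was published.
now = datetime(2002, 7, 4, 14, 28, 30, tzinfo=timezone.utc)
print(is_fresh("20020704142800Z", ttl_seconds=60, now=now))
print(is_fresh("20020704142800Z", ttl_seconds=20, now=now))
```

When the check fails, a cache would invoke the information provider again instead of returning the stored value.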
Siteinfo
    in=siteinfo,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
    objectClass: SiteInfo
    objectClass: DataGridTop
    objectClass: DynamicObject
    siteName: RALDEV
    sysAdminContact: grid.sysadmin@rl.ac.uk
    userSupportContact: grid.support@rl.ac.uk
    siteSecurityContact: grid.security@rl.ac.uk
    dataGridVersion: 1.2
    installationDate: 20020704142800Z
Computing Element
    ceId=dev01.hepgrid.clrc.ac.uk:2119/jobmanager-pbs-M,hn=dev01.hepgrid.clrc.ac.uk,Mds-Vo-name=ral-dev,Mds-Vo-name=uk,o=Grid
    objectClass: DataGridTop
    objectClass: ComputingElement
    CEId: dev01.hepgrid.clrc.ac.uk:2119/jobmanager-pbs-M
    GlobusResourceContactString: dev01.hepgrid.clrc.ac.uk:2119/jobmanager-pbs:/O=Grid/O=UKHEP/CN=dev01.hepgrid.clrc.ac.uk
    GRAMVersion: ?
    Architecture: intel
    OpSys: RH 6.2
    MinPhysicalMemory: 258
    MinLocalDiskSpace: 2048
    TotalCPUs: 1
    FreeCPUs: 1
    NumSMPs: 0
    MinSPUProcessors: 0
    MaxSPUProcessors: 0
    TotalJobs: 0
    RunningJobs: 0
    IdleJobs: 0
    MaxTotalJobs: 1
    MaxRunningJobs: 1
    WorstTraversalTime: 108000
    EstimatedTraversalTime: 0
    Active: TRUE
    Priority: 20
    MaxCPUTime: 108000
    MaxWallClockTime: 432000
    AverageSI00: 300
    MinSI00: 300
    MaxSI00: 300
    AuthorizedUser: /O=Grid/O=UKHEP/OU=hepgrid.clrc.ac.uk/CN=Tim Eves
    AuthorizedUser: /O=Grid/O=UKHEP/OU=hepgrid.clrc.ac.uk/CN=Tim Folkes
    RunTimeEnvironment: RALDEV
    AFSAvailable: FALSE
    OutboundIP: TRUE
    InboundIP: FALSE
    QueueName: M
    LRMSType: PBS
    LRMSVersion: OpenPBS_2.3
Querying the Information & Monitoring Service
• Queries can be posed to the current Information and Monitoring Service using LDAP search commands
• An LDAP search consists of the following components:
    $ ldapsearch \
        -x \
        -H ldap://lxshare0225.cern.ch:2135 \
        -b 'Mds-Vo-name=datagrid,o=grid' \
        -s sub \
        'objectclass=ComputingElement' \
        CEId FreeCPUs
• -x: "simple" authentication
• -H: uniform resource identifier
• -b: base distinguished name for search
• -s base|one|sub: scope of the search, specifying just the base object, one level or the complete subtree
• 'objectclass=ComputingElement': filter
• CEId FreeCPUs: attributes to be returned
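The components of the search can also be assembled programmatically, which avoids shell quoting mistakes. The Python sketch below only builds and prints the argument list for ldapsearch; it does not contact any server. The function name ldapsearch_args is hypothetical, and the flags are limited to the ones listed on this slide.

```python
import shlex

VALID_SCOPES = ("base", "one", "sub")

def ldapsearch_args(uri, base, search_filter, attributes, scope="sub"):
    """Build the ldapsearch argument list from the components of a search."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}")
    return [
        "ldapsearch",
        "-x",            # "simple" authentication
        "-H", uri,       # uniform resource identifier
        "-b", base,      # base distinguished name for search
        "-s", scope,     # scope of the search
        search_filter,   # filter
        *attributes,     # attributes to be returned
    ]

args = ldapsearch_args(
    "ldap://lxshare0225.cern.ch:2135",
    "Mds-Vo-name=datagrid,o=grid",
    "objectclass=ComputingElement",
    ["CEId", "FreeCPUs"],
)
print(shlex.join(args))  # a shell-quoted command line, ready to paste
```

The list could be passed to subprocess.run on a machine that has ldapsearch installed and can reach the server.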
Querying the GRIS/GIIS Hierarchy (diagram: Mds-Vo-name=datagrid at the top, then Mds-Vo-name=countryA and countryB, then Mds-Vo-name=siteA to siteD) • Mds-Vo-name=datagrid,o=grid: this will look at all the data • Mds-Vo-name=siteB,Mds-Vo-name=countryA,Mds-Vo-name=datagrid,o=grid: this will look at all the data from siteB • Mds-Vo-name=countryA,o=grid: this will look at all the data from countryA • Mds-Vo-name=siteB,Mds-Vo-name=countryA,o=grid: this will look at all the data from siteB • Mds-Vo-name=siteB,o=grid: this will look at all the data from siteB
The EDG WMS • The user interacts with the GRID via a Workload Management System (WMS) • The goal of the WMS is distributed scheduling and resource management in a GRID environment • What does it allow GRID users to do? To submit their jobs, to execute them, to get information about their status, and to retrieve their output • The WMS tries to optimize the usage of resources
WMS Components • WMS is currently composed of the following parts: • User Interface (UI) : access point for the user to the GRID • Resource Broker (RB) : the broker of GRID resources, performing the match-making • Job Submission System (JSS) : provides a reliable submission system • Information Index (II) : a specialized Globus GIIS (LDAP server) used by the Resource Broker as a filter to the information service (IS) to select resources • Logging and Bookkeeping services (LB) : store Job Info available for users to query
WMS UI Commands • dg-job-submit submits a job • dg-job-list-match lists resources matching a job description • dg-job-cancel cancels a given job • dg-job-status displays the status of the job (submitted, waiting, ready, scheduled, running, chkpt, done, outputready, aborted, cleared) • dg-job-get-output returns the job-output to the user • dg-job-get-logging-info displays logging information about submitted jobs • dg-job-id-info is a utility for the user to display job info in a formatted style
Example of UI Command Options
• dg-job-submit -r <res_id> -n <user e-mail address> -c <config file> -o <output file> <job.jdl>
    -r  the job is submitted by the RB directly to the computing element identified by <res_id>
    -n  an e-mail message containing basic information regarding the job (status and identification) is sent to the specified <user e-mail address> when the job enters one of the following statuses: READY, RUNNING, DONE or ABORTED
    -c  the UI uses the configuration file <config file> instead of the standard configuration file
    -o  the generated dg_jobId is written to the <output file>
• dg-job-status -i <input file> (or dg_jobId)
    -i  the bookkeeping information about the dg_jobIds contained in the <input file> is displayed
Job Description Language (JDL) • Mandatory for every single JDL file: • Executable (contains the command name) • Other attributes: • InputSandbox • OutputSandbox • Mandatory for JDL files dealing with Data Management: • ReplicaCatalog (contains the Replica Catalog Identifier) • DataAccessProtocol (contains the protocol or the list of protocols which the application is able to use for accessing InputData on a given SE) • If InputData contains at least one PFN and no LFNs, only DataAccessProtocol is mandatory. If InputData contains at least one LFN, both ReplicaCatalog and DataAccessProtocol are mandatory.
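The rules for mandatory attributes can be expressed as a small check. The Python sketch below works on a job description held in a dictionary; it is not a JDL parser. It assumes that logical file names carry the LF: prefix used in the example JDL file of this talk and treats every other InputData entry as a PFN; the function name and the PFN value in the usage lines are made up.

```python
def missing_jdl_attributes(jdl: dict) -> list:
    """Return the mandatory attributes that are absent from a job description."""
    missing = []
    if "Executable" not in jdl:
        missing.append("Executable")  # mandatory for every JDL file

    input_data = jdl.get("InputData", [])
    if isinstance(input_data, str):
        input_data = [input_data]

    # Assumption: LFNs start with "LF:"; anything else is treated as a PFN.
    has_lfn = any(name.startswith("LF:") for name in input_data)

    if has_lfn and "ReplicaCatalog" not in jdl:
        missing.append("ReplicaCatalog")  # needed to resolve LFNs
    if input_data and "DataAccessProtocol" not in jdl:
        missing.append("DataAccessProtocol")  # needed for LFNs and PFNs alike
    return missing

print(missing_jdl_attributes({"Executable": "gridTest"}))
print(missing_jdl_attributes({"Executable": "gridTest",
                              "InputData": "LF:testbed0-00019"}))
```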
Example JDL File
    Executable = "gridTest";
    InputData = "LF:testbed0-00019";
    ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/ \
        rc=WP2 INFN Test, dc=infn, dc=it";
    DataAccessProtocol = "gridftp";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    OutputSandbox = {"stderr.log", "stdout.log"};
    InputSandbox = {"home/joda/test/gridTest"};
    Rank = "other.MaxCpuTime";
    Requirements = other.Architecture=="INTEL" && \
        other.OpSys=="LINUX" && other.FreeCpus >=4;
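The effect of the Requirements and Rank attributes in this JDL file can be illustrated with a toy match-making step. The Python sketch below is not the Resource Broker's algorithm: the computing elements and their values are invented, and it assumes that a higher Rank value is preferred.

```python
def matches(ce: dict) -> bool:
    """Requirements from the example JDL file."""
    return (ce["Architecture"] == "INTEL"
            and ce["OpSys"] == "LINUX"
            and ce["FreeCpus"] >= 4)

def rank(ce: dict) -> int:
    """Rank from the example JDL file: other.MaxCpuTime."""
    return ce["MaxCpuTime"]

def best_match(ces: list):
    """Return the matching CE with the highest rank, or None if none match."""
    candidates = [ce for ce in ces if matches(ce)]
    return max(candidates, key=rank) if candidates else None

# Hypothetical computing elements, for illustration only.
CES = [
    {"CEId": "ce-a.example.org", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 2, "MaxCpuTime": 900},
    {"CEId": "ce-b.example.org", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 8, "MaxCpuTime": 600},
    {"CEId": "ce-c.example.org", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 4, "MaxCpuTime": 300},
]
print(best_match(CES)["CEId"])
```

Here ce-a is excluded because it has fewer than four free CPUs, and ce-b is preferred over ce-c because of its larger MaxCpuTime.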
A Job Submission Example (sequence of diagrams)
• Components shown: UI, Replica Catalogue (RC), Information Service (IS), Resource Broker (RB), Storage Element (SE), Logging & Book-keeping (LB), Job Submission Service (JSS), Compute Element (CE)
• Job status at each step, with the items highlighted in the diagram:
• submitted: JDL, Input Sandbox, Job Submit Event
• waiting
• ready
• scheduled: BrokerInfo
• running: Input Sandbox
• done
• outputready: Output Sandbox
• cleared: Output Sandbox
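The sequence of job statuses walked through in this example can be written down as a simple ordered list. The Python sketch below covers only the nominal path shown in the diagrams; the statuses chkpt and aborted reported by dg-job-status are outside that path and are rejected here. The names NOMINAL_STATUSES and next_status are invented for the illustration.

```python
# Nominal path of a job, in the order shown in the submission example.
NOMINAL_STATUSES = [
    "submitted", "waiting", "ready", "scheduled",
    "running", "done", "outputready", "cleared",
]

def next_status(current: str):
    """Return the status following `current`, or None once the job is cleared.

    Raises ValueError for statuses outside the nominal path (e.g. "aborted").
    """
    index = NOMINAL_STATUSES.index(current)
    if index + 1 < len(NOMINAL_STATUSES):
        return NOMINAL_STATUSES[index + 1]
    return None

status = "submitted"
while status is not None:
    print(status)
    status = next_status(status)
```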
EDG Data Management Tools • Tools for • Locating data • Copying data • Managing and replicating data • Meta Data management • On EDG Testbed you have • EDG Replica Catalog • globus-url-copy (GridFTP) • EDG Replica Manager • Grid Data Mirroring Package (GDMP)
EDG Replica Catalog • Based upon the Globus LDAP Replica Catalog (will be replaced by RLS) • Stores LFN/PFN mappings and additional information (e.g. filesize): • Physical File Name (PFN): host + full path and file name • Logical File Name (LFN): logical name that may be resolved to PFNs • LFN : PFN = 1 : n • Only files on storage elements may be registered • Each VO has a specific storage dir on an SE • Example PFN: lxshare0222.cern.ch/flatfiles/SE1/iteam/file1.dat (host: lxshare0222.cern.ch, storage dir: /flatfiles/SE1/iteam) • The LFN must be the full path of the file starting from the storage dir • LFN of the above PFN: file1.dat
EDG Replica Catalog • API and command line tools • addLogicalFileName • getLogicalFileName • deleteLogicalFileName • getPhysicalFileName • addPhysicalFileName • deletePhysicalFileName • addLogicalFileAttribute • getLogicalFileAttribute • deleteLogicalFileAttribute http://cmsdoc.cern.ch/cms/grid/userguide/gdmp-3-0/node85.html
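The 1 : n relation between an LFN and its PFNs, and the way an LFN is derived from a PFN, can be sketched with an in-memory stand-in. The Python code below is not the EDG Replica Catalog API: the class and method names are invented, nothing is stored in LDAP, and the second host name is hypothetical.

```python
from collections import defaultdict

def lfn_from_pfn(pfn: str, storage_dir: str) -> str:
    """Derive the LFN: the path of the file relative to the VO's storage dir."""
    _host, _, path = pfn.partition("/")
    path = "/" + path
    prefix = storage_dir.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError("PFN is not inside the storage dir")
    return path[len(prefix):]

class ToyReplicaCatalog:
    """In-memory LFN -> set of PFNs mapping (one LFN, many PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add_physical_file_name(self, lfn: str, pfn: str) -> None:
        self._replicas[lfn].add(pfn)

    def get_physical_file_names(self, lfn: str) -> list:
        return sorted(self._replicas.get(lfn, set()))

    def delete_physical_file_name(self, lfn: str, pfn: str) -> None:
        self._replicas.get(lfn, set()).discard(pfn)

catalog = ToyReplicaCatalog()
pfn = "lxshare0222.cern.ch/flatfiles/SE1/iteam/file1.dat"
lfn = lfn_from_pfn(pfn, "/flatfiles/SE1/iteam")
catalog.add_physical_file_name(lfn, pfn)
catalog.add_physical_file_name(lfn, "se.example.org/flatfiles/SE1/iteam/file1.dat")
print(lfn, catalog.get_physical_file_names(lfn))
```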
globus-url-copy
• Low level tool for secure copying
    globus-url-copy <protocol>://<source file> \
        <protocol>://<destination file>
• Main Protocols:
• gsiftp: for secure transfer, only available on SE and CE
• file: for accessing files stored on the local file system on e.g. UI, WN
    globus-url-copy file://`pwd`/file1.dat \
        gsiftp://lxshare0222.cern.ch/flatfiles/SE1/EDGTutorial/file1.dat
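The source and destination URLs for such a copy can be put together with a few helper functions. The Python sketch below only builds the command and prints it; it does not run globus-url-copy. The helper names and the local path /home/user are hypothetical.

```python
import shlex

def file_url(absolute_path: str) -> str:
    """Build a file:// URL for a file on the local file system."""
    if not absolute_path.startswith("/"):
        raise ValueError("an absolute path is required")
    return "file://" + absolute_path

def gsiftp_url(host: str, absolute_path: str) -> str:
    """Build a gsiftp:// URL for a file on an SE or CE."""
    if not absolute_path.startswith("/"):
        raise ValueError("an absolute path is required")
    return f"gsiftp://{host}{absolute_path}"

def copy_command(source_url: str, destination_url: str) -> list:
    """Return the globus-url-copy argument list: source first, then destination."""
    return ["globus-url-copy", source_url, destination_url]

command = copy_command(
    file_url("/home/user/file1.dat"),  # hypothetical local file
    gsiftp_url("lxshare0222.cern.ch", "/flatfiles/SE1/EDGTutorial/file1.dat"),
)
print(shlex.join(command))
```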