Put your hands on gLite
gLite: "gLite (pronounced 'gee-lite') is the next generation middleware for grid computing. Born from the collaborative efforts of more than 80 people in 12 different academic and industrial research centres as part of the EGEE Project, gLite provides a bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet." (see http://glite.web.cern.ch/glite/)
Before the party begins…
Before you can use the grid, you must have a personal certificate. It should be stored on your UI under $HOME/.globus, like this:
  CB:~ gnegri$ ls -l .globus
  total 0
  -rw-r--r-- 1 gnegri staff 0 Sep 18 23:26 usercert.pem
  -r-------- 1 gnegri staff 0 Sep 18 23:26 userkey.pem
The user certificate is not used directly: to raise the security level, a time-limited proxy is required. There are basically 2 commands for creating a proxy:
  grid-proxy-init <options>
  voms-proxy-init <options>
Without options, voms-proxy-init does exactly the same as grid-proxy-init. If you also give the -voms <VO> option, you create a proxy with attributes read from a VOMS server: your proxy will tell the grid which VO you belong to, which privileges you have and your priority level on some resources… definitely, not all grid users are equal!
The proxy is stored on the UI under /tmp, with the name x509up_u<local_id>, where <local_id> is your numeric local user ID.
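The proxy naming rule and the key-permission convention visible in the listing can be sketched in Python. This is a minimal sketch assuming the usual Globus behaviour, where the X509_USER_PROXY environment variable, if set, overrides the default /tmp/x509up_u<uid> location; the permission check only mirrors the listing, where userkey.pem is readable by its owner alone.

```python
import os
import stat


def default_proxy_path(uid=None):
    """Where the proxy is expected: $X509_USER_PROXY if set, else /tmp/x509up_u<uid>."""
    override = os.environ.get("X509_USER_PROXY")  # assumed override variable
    if override:
        return override
    if uid is None:
        uid = os.getuid()  # numeric local user ID (POSIX only)
    return f"/tmp/x509up_u{uid}"


def key_is_private(path):
    """True if the file grants no permissions at all to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```

For example, key_is_private(os.path.expanduser("~/.globus/userkey.pem")) should be True for a key laid out like the one above.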
Before the party begins…
Creating a proxy (with either grid-proxy-init or voms-proxy-init) is usually enough to run most gLite commands on the grid, but it may not be enough to submit a job! Job submission (passing the job from the local UI to the RB) can be done in two different ways:
  • glite-job-submit sends the job to the Network Server (NS)
  • glite-wms-job-submit sends the job to the WMProxy
The main difference is that the NS is a socket-based interface, while the WMProxy is a web services interface, allowing more flexibility and powerful features such as "bulk submission" (jobs are sent as a collection, possibly sharing their InputSandbox).
The glite-job-submit command only needs a valid proxy on your UI, while glite-wms-job-submit also requires a proxy delegated to the WMProxy server and identified by a delegation ID. The command to do this is
  glite-wms-job-delegate-proxy -d <delegationID>
where <delegationID> is a user-defined string (the -d option is mandatory) that you then pass when submitting the job:
  glite-wms-job-submit --delegationid <delegationID> YourJob.jdl
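The order of operations (delegate first, then submit with the same ID) can be scripted. This is a hypothetical wrapper, not part of gLite: it only calls the two commands with the options shown above, and the run parameter lets you substitute a dry-run function for subprocess.run.

```python
import subprocess


def delegate_and_submit(delegation_id, jdl_path, run=subprocess.run):
    """Hypothetical helper: delegate a proxy to the WMProxy, then submit with that ID."""
    if not delegation_id:
        raise ValueError("a delegation ID is mandatory")
    commands = [
        ["glite-wms-job-delegate-proxy", "-d", delegation_id],
        ["glite-wms-job-submit", "--delegationid", delegation_id, jdl_path],
    ]
    for cmd in commands:
        run(cmd, check=True)  # stop at the first failing command
    return commands
```

For instance, delegate_and_submit("guidone", "HelloWorld.jdl") would run the delegation and then the submission of HelloWorld.jdl.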
gLite elements
• The gLite middleware is deployed through different elements:
  • UI - User Interface
  • RB - Resource Broker
  • LB - Logging & Bookkeeping
  • CE - Computing Element
  • WN - Worker Node
  • SE - Storage Element
  • BDII - Information Index
  • LFC - LCG File Catalog
  • plus some other "service" elements
gLite: job workflow
RB: the heart of the grid. It sends the jobs to the grid and keeps track of them
BDII: LDAP database with information on LCG resources
UI: local machine on which the user defines his jobs. All commands to the grid are issued from a UI
LB: an SQL database in which every change of status of a job is recorded
CE: the front-end of a local resource management system (LRMS: LSF, PBS, Torque…)
LFC: the catalog in which files stored on an SE are registered
SE: storage resources throughout the grid, to which output files are written
WN: CPUs that actually execute the jobs
gLite: job workflow
• The user defines his job on his User Interface by writing a JDL file (see next slide).
• The JDL is submitted to the Resource Broker.
• From then on, the RB notifies the L&B of every change in the status of the job.
• The RB parses the JDL and queries the BDII in order to find the CE that best matches the job requirements.
• The RB sends the job to the Computing Element selected by this matchmaking.
• The CE submits the job to its LRMS, which dispatches it to one of the underlying Worker Nodes.
• Usually, at the end a job writes its output files to a Storage Element and, if the operation is successful, registers them in the LFC catalog, so that they are available to all grid users.
• The log files are usually sent back to the RB and then to the UI, so that the user can check that the job really ran as expected.
Defining a job
A job is an executable that will run on a grid resource. To specify the executable (a simple command or a script), its arguments and its requirements, you have to write a JDL file.
JDL (Job Description Language) is a high-level language, based on the Classified Advertisement (ClassAd) language, used to describe the job's characteristics and constraints. A JDL file consists of lines of the form
  attribute = expression;
For example:
  Executable = "/bin/echo";
For a full list of the attributes of the gLite JDL, please refer to the gLite documentation.
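Because a JDL file is just a list of "attribute = expression;" lines inside brackets, it is easy to generate one from a script. The sketch below is a hypothetical helper (to_jdl and Expr are illustrative names, not a gLite API): strings are quoted, lists become {…}, and values wrapped in Expr are emitted unquoted so that expressions such as Requirements survive intact. To stay on the safe side it refuses strings with embedded double quotes instead of guessing the escaping rules.

```python
class Expr(str):
    """A raw ClassAd expression (e.g. a Requirements value) that must not be quoted."""


def _render(value):
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        if '"' in value:
            raise ValueError(f"embedded double quote not supported: {value!r}")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(_render(v) for v in value) + "}"
    return str(value)  # numbers are written as-is


def to_jdl(attributes):
    """Render a dict as a bracketed JDL, one 'attribute = expression;' per line."""
    lines = ["["]
    for name, value in attributes.items():
        lines.append(f"  {name} = {_render(value)};")
    lines.append("]")
    return "\n".join(lines)
```

For example, to_jdl({"Executable": "/bin/echo", "Arguments": "Hello World!"}) produces a three-attribute-line file starting with "[" and ending with "]".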
HelloWorld.jdl
This is (almost) the simplest JDL possible:
  [
    Executable = "/bin/echo";
    Arguments = "Hello World!";
    StdOutput = "HelloWorld.out";
    StdError = "HelloWorld.err";
    OutputSandbox = {"HelloWorld.out","HelloWorld.err"};
    VirtualOrganisation = "atlas";
  ]
Note that the VirtualOrganisation attribute is not necessary if you issued voms-proxy-init -voms <VO>.
You may submit it with
  glite-wms-job-submit --delegationid <delegationID> HelloWorld.jdl
When the job is submitted, the RB returns a unique job identifier, the JobID, in the form https://<RB_name>:9000/<unique_string>
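Scripts that track many jobs often need the pieces of a JobID. A minimal sketch using only the standard library, assuming the https://<RB_name>:<port>/<unique_string> form described above:

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a JobID of the form https://<RB_name>:<port>/<unique_string>."""
    parts = urlsplit(job_id)
    unique = parts.path.lstrip("/")
    if parts.scheme != "https" or not parts.hostname or not unique or "/" in unique:
        raise ValueError(f"not a JobID: {job_id!r}")
    return parts.hostname, parts.port, unique
```

Applied to https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow it returns the host egee-rb-01.mi.infn.it, the port 9000 and the unique string BgWNAqxr_Vo1sNZu6uuXow.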
HelloWorld.jdl
To get the status of the job, pass its JobID to the command glite-wms-job-status:
  > glite-wms-job-status https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow
  *************************************************************
  BOOKKEEPING INFORMATION:
  Status info for the Job : https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow
  Current Status: Waiting
  Submitted: Tue Sep 19 15:03:57 2006 CEST
  *************************************************************
• Possible job states are:
  • Submitted: the job has been entered by the user on the UI but not yet transferred to the NS or WMP
  • Waiting: the job has been accepted by the NS or WMP but not yet processed
  • Ready: the job has been processed (matchmaking) but not yet transferred to the CE
  • Scheduled: the job is waiting in the queue of the CE
  • Running: the job is running on a WN
  • Done: the job has exited or is considered to be in a terminal state by CondorC
  • Aborted: job processing was aborted by the WMS
  • Canceled: the job has been canceled at the user's request
  • Cleared: the output of the job has been retrieved after its successful conclusion
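Rather than re-typing the status command, you can poll it from a script. This sketch assumes the output contains a "Current Status:" line as shown above; fetch_status_output is a hypothetical callable (for example one that runs glite-wms-job-status through subprocess and returns its text), and the 60-second interval is an arbitrary choice.

```python
import re
import time

# States in which the job is no longer waiting or running (from the list above).
FINISHED_STATES = {"Done", "Aborted", "Canceled", "Cleared"}


def current_status(status_output):
    """Return the first word after 'Current Status:' (e.g. 'Done' from 'Done (Success)')."""
    match = re.search(r"^\s*Current Status:\s*(\S+)", status_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Current Status:' line found")
    return match.group(1)


def wait_until_finished(fetch_status_output, interval=60.0, max_polls=1000, sleep=time.sleep):
    """Poll a (hypothetical) status source until the job reaches a finished state."""
    for _ in range(max_polls):
        state = current_status(fetch_status_output())
        if state in FINISHED_STATES:
            return state
        sleep(interval)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```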
HelloWorld.jdl..?!
  > glite-wms-job-status https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow
  *************************************************************
  BOOKKEEPING INFORMATION:
  Status info for the Job : https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow
  Current Status: Aborted
  Logged Reason(s):
    - File not available.Cannot read JobWrapper output, both from Condor and from Maradona.
    - Job got an error while in the CondorG queue.
    - Job got an error while in the CondorG queue.
    - Job got an error while in the CondorG queue.
  Status Reason: hit job shallow retry count (3)
  Destination: cmsitbsrv01.fnal.gov:2119/jobmanager-condor-atlas
  Submitted: Tue Sep 19 15:03:57 2006 CEST
  *************************************************************
What happened? The output of the status command shows that the job tried to run on a machine in the US, at FNAL, which uses Condor as its LRMS (not supported by LCG…). Uhm… FNAL… Shouldn't it be in OSG??
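The Destination line already hints at the LRMS. The sketch below splits a CE identifier, assuming the <host>:<port>/jobmanager-<lrms>-<queue> pattern seen in the Destination lines of this tutorial. The LRMS token is only a naming convention, so treat it as a hint; the value the site actually publishes is in the BDII.

```python
def parse_ce_id(ce_id):
    """Split '<host>:<port>/jobmanager-<lrms>-<queue>' into its parts (assumed pattern)."""
    endpoint, slash, service = ce_id.partition("/")
    host, colon, port = endpoint.rpartition(":")
    parts = service.split("-", 2)  # queue names may themselves contain '-'
    if not (slash and colon and host and port.isdigit()) or len(parts) != 3 or parts[0] != "jobmanager":
        raise ValueError(f"unexpected CE identifier: {ce_id!r}")
    return {"host": host, "port": int(port), "lrms": parts[1], "queue": parts[2]}
```

For cmsitbsrv01.fnal.gov:2119/jobmanager-condor-atlas this yields lrms "condor" and queue "atlas".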
Investigations
• How do we get information on a site?
• There are 3 ways:
  • the LCG command lcg-infosites
  • the LCG command lcg-info
  • an LDAP query to the BDII
Geeks prefer the third one! All you need is the name of a BDII. The BDII is an LDAP database in which information about grid sites is collected and organized in a hierarchical schema named GlueSchema (its structure is well described in the gLite user guide, available on the gLite documentation web page). If we want to use the BDII named exp-bdii.cern.ch, keeping in mind that it responds on port 2170 and has the base "mds-vo-name=local,o=grid", we may write
  > ldapsearch -x -h exp-bdii.cern.ch -p 2170 -b "mds-vo-name=local,o=grid"
This prints out lots of information about all the sites "registered" in it. Looking for the CE we are investigating, cmsitbsrv01.fnal.gov:2119/jobmanager-condor-atlas, we would find that it is actually published as an OSG site, with:
  GlueCEInfoLRMSType: condor
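Scrolling through the full ldapsearch dump by eye is tedious. Here is a minimal sketch of an LDIF reader for that text output: it unfolds continuation lines (lines starting with a single space), skips comments, and groups "attribute: value" lines into entries separated by blank lines. It assumes each CE entry carries GlueCEUniqueID and GlueCEInfoLRMSType attributes, as in GLUE 1.x, and it does not decode base64 ("::") values.

```python
def parse_ldif(text):
    """Parse ldapsearch LDIF text into a list of {attribute: [values]} dicts."""
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]  # folded line: continue the previous one
        else:
            unfolded.append(line)
    entries, current = [], {}
    for line in unfolded:
        if not line.strip():  # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        attr, sep, value = line.partition(":")
        if sep:
            current.setdefault(attr, []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def lrms_of(entries, ce_id):
    """Return the published GlueCEInfoLRMSType of a CE, or None if not found."""
    for entry in entries:
        if ce_id in entry.get("GlueCEUniqueID", []):
            return entry.get("GlueCEInfoLRMSType", [None])[0]
    return None
```

You could save the ldapsearch output to a file, read it, and call lrms_of(parse_ldif(text), "cmsitbsrv01.fnal.gov:2119/jobmanager-condor-atlas").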
Investigations
Before solving the problem, another tip: in this case the status command already offered good hints about the cause of the abort. However, it is often necessary to dig a bit deeper into the job history. All information about a job is stored, almost persistently, in the LB (Logging & Bookkeeping), which is accessed through the command
  glite-wms-job-logging-info [options] <JobID>
Among the options, the verbosity may be tuned from 0 (only the states are reported) to 2 (damn verbose!) with -v <0|1|2>:
  > glite-wms-job-logging-info -v 2 https://egee-rb-01.mi.infn.it:9000/BgWNAqxr_Vo1sNZu6uuXow
Requirements
How can we exclude sites with Condor queues from our list of possible CEs? The gLite JDL has an attribute named Requirements that fits perfectly! The Requirements attribute tells the RB to choose only sites satisfying the constraints the job needs in order to run properly. Note that the Requirements attribute can hold only one value, not a list, so if you have more than one requirement you have to combine them into a single boolean expression using logical operators (&&, ||, !); each term may use comparison operators (<, >, ==, !=, …). In our case the simplest Requirements would be
  • Requirements = other.GlueCEInfoLRMSType != "condor";
You may construct any expression you need using the GlueSchema attributes of the CE (and SE). As an example, suppose that your job has to run for approximately 1 day on a generic grid WN and needs a certain ATLAS software version, let's say 11.0.42. Then you should add the following line to your JDL file (GlueCEPolicyMaxWallClockTime is expressed in minutes, so 1 day = 1440):
  • Requirements = (other.GlueCEPolicyMaxWallClockTime > 1440) && Member("VO-atlas-release-11.0.42", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Note that Member is a ClassAd function available in the gLite JDL: it is true when its first argument is an element of the list given as its second argument.
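To see how a combined Requirements expression filters CEs, here is a Python transcription of one, joining the condor exclusion with the Member test from this slide. The CE records are hypothetical dictionaries keyed by GlueSchema attribute names. Unlike real ClassAd evaluation, where a missing attribute makes the expression undefined, this sketch simply treats missing values as "not condor" or as an empty software list.

```python
def member(value, values):
    """Rough counterpart of the ClassAd Member(): is value an element of values?"""
    return value in values


def matches(ce):
    """Transcription of:
    other.GlueCEInfoLRMSType != "condor" &&
    Member("VO-atlas-release-11.0.42", other.GlueHostApplicationSoftwareRunTimeEnvironment)
    """
    return (ce.get("GlueCEInfoLRMSType") != "condor"
            and member("VO-atlas-release-11.0.42",
                       ce.get("GlueHostApplicationSoftwareRunTimeEnvironment", [])))


def match_ces(ces):
    """Return the IDs of the CEs that satisfy the requirements."""
    return [ce["GlueCEUniqueID"] for ce in ces if matches(ce)]
```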
Who's first?
Before trying to submit HelloWorld.jdl with its brand-new Requirements attribute, let's introduce another attribute that plays an important role in matchmaking: the Rank. You may use the Rank to order the list of matching CEs by characteristics that may affect your jobs. Like the Requirements, the Rank is usually built from GlueSchema attributes. As an example, you might prefer that your job be sent to sites with a higher number of free CPUs, so that it is less likely to be queued at already busy sites. Then you would add to your JDL
  • Rank = other.GlueCEStateFreeCPUs;
The Rank is a floating-point number, and matching CEs are ordered from the highest value to the lowest: the CE with the highest value receives the job. To see the CEs matching your job in a Rank-ordered list, you may issue
  • > glite-wms-job-list-match [--rank] your.jdl
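Matchmaking with a Rank boils down to "filter, then sort by rank descending". A simplified sketch of that idea, using hypothetical CE dictionaries; it ignores everything else the broker does, and the real RB may break ties between equal ranks differently (Python's sort keeps the input order for ties).

```python
def rank_matching_ces(ces, requirements, rank):
    """Keep CEs satisfying requirements, ordered from highest to lowest rank."""
    matching = [ce for ce in ces if requirements(ce)]
    return sorted(matching, key=rank, reverse=True)


def free_cpus(ce):
    """Counterpart of: Rank = other.GlueCEStateFreeCPUs;"""
    return float(ce["GlueCEStateFreeCPUs"])


def not_condor(ce):
    """Counterpart of: Requirements = other.GlueCEInfoLRMSType != "condor";"""
    return ce["GlueCEInfoLRMSType"] != "condor"
```

The first element of rank_matching_ces(ces, not_condor, free_cpus), if any, plays the role of the CE that would receive the job.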
Back to HelloWorld
Now our HelloWorld.jdl looks like this:
  [
    Executable = "/bin/echo";
    Arguments = "Hello World!";
    StdOutput = "HelloWorld.out";
    StdError = "HelloWorld.err";
    OutputSandbox = {"HelloWorld.out","HelloWorld.err"};
    Requirements = other.GlueCEInfoLRMSType != "condor";
  ]
  > glite-wms-job-submit --delegationid guidone HelloWorld.jdl
  > glite-wms-job-status https://egee-rb-01.mi.infn.it:9000/HyUGIcK5n6JdobvQU1kdFw
  *************************************************************
  BOOKKEEPING INFORMATION:
  Status info for the Job : https://egee-rb-01.mi.infn.it:9000/HyUGIcK5n6JdobvQU1kdFw
  Current Status: Done (Success)
  Exit code: 0
  Status Reason: Job terminated successfully
  Destination: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
  Submitted: Tue Sep 19 16:03:29 2006 CEST
  *************************************************************
Back to HelloWorld
Once a job has finished, you can retrieve the output files specified in the OutputSandbox back to your local UI:
  > glite-wms-job-output https://egee-rb-01.mi.infn.it:9000/HyUGIcK5n6JdobvQU1kdFw
  Connecting to the service https://193.205.78.5:7443/glite_wms_wmproxy_server
  ================================================================================
  JOB GET OUTPUT OUTCOME
  Output sandbox files for the job:
  https://egee-rb-01.mi.infn.it:9000/HyUGIcK5n6JdobvQU1kdFw
  have been successfully retrieved and stored in the directory:
  /tmp/negri_HyUGIcK5n6JdobvQU1kdFw
  ================================================================================
  > ls -l /tmp/negri_HyUGIcK5n6JdobvQU1kdFw
  total 4
  -rw-r--r-- 1 negri atlas 0 Sep 19 17:05 HelloWorld.err
  -rw-r--r-- 1 negri atlas 13 Sep 19 17:05 HelloWorld.out
  > cat /tmp/negri_HyUGIcK5n6JdobvQU1kdFw/HelloWorld.out
  Hello World!
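After retrieval it is worth checking that every OutputSandbox file actually arrived. In the example above the files landed in /tmp/<user>_<unique_string>; the sketch below derives that directory name from the JobID, but this is only the pattern observed in this one run, not a documented guarantee, so prefer the directory printed by glite-wms-job-output when you have it.

```python
from pathlib import Path


def expected_output_dir(job_id, user, base="/tmp"):
    """Guess the retrieval directory from the pattern seen above: <base>/<user>_<unique_string>."""
    unique = job_id.rstrip("/").rsplit("/", 1)[-1]
    return str(Path(base) / f"{user}_{unique}")


def missing_outputs(output_dir, sandbox):
    """Return the OutputSandbox file names that are not present in output_dir."""
    directory = Path(output_dir)
    return [name for name in sandbox if not (directory / name).is_file()]
```

An empty result from missing_outputs(dir, ["HelloWorld.out", "HelloWorld.err"]) means both files were retrieved.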
A more concrete case
A generic ATLAS job has these JDL attributes:
  Rank = (other.GlueCEStateWaitingJobs == 0)
         ? ( (other.GlueCEStateFreeCPUs * 100) / ((other.GlueCEStateRunningJobs == 0) ? 1 : other.GlueCEStateRunningJobs) )
         : ( -(other.GlueCEStateWaitingJobs * 100) / other.GlueCEStateRunningJobs );
  Requirements = (other.GlueCEStateStatus == "Production") &&
                 ((other.GlueCEPolicyMaxCPUTime * other.GlueHostBenchmarkSI00) >= 120016) &&
                 (other.GlueHostMainMemoryRAMSize >= 500) &&
                 (other.GlueHostNetworkAdapterOutboundIP == true) &&
                 (Member("VO-atlas-release-11.0.42", other.GlueHostApplicationSoftwareRunTimeEnvironment) ||
                  Member("VO-atlas-offline-11.0.42", other.GlueHostApplicationSoftwareRunTimeEnvironment));
The Rank uses two nested constructs of the form "condition ? value1 : value2" and says:
  • if a site has no waiting jobs, use
    - (number of free CPUs * 100) / 1, if there are no running jobs
    - (number of free CPUs * 100) / (number of running jobs), if there are running jobs
  • otherwise (there are waiting jobs), use -(number of waiting jobs * 100) / (number of running jobs); note that, unlike the first branch, this one does not guard against a site that has waiting jobs but no running jobs.
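To check the nested conditionals by hand, here is a Python transcription of the Rank expression above. It uses floating-point division for readability; ClassAd arithmetic on integer attributes may truncate instead, so exact values can differ. The original expression divides by zero when a site has waiting jobs but no running jobs; this sketch returns None in that case rather than guessing what the broker does.

```python
def atlas_rank(free_cpus, running_jobs, waiting_jobs):
    """Transcription of the ATLAS Rank expression (float division assumed)."""
    if waiting_jobs == 0:
        divisor = 1 if running_jobs == 0 else running_jobs
        return free_cpus * 100 / divisor
    if running_jobs == 0:
        return None  # the JDL expression would divide by zero here
    return -(waiting_jobs * 100) / running_jobs
```

For example, a site with 10 free CPUs and 20 running jobs but nothing waiting gets 50.0, while a site with 5 waiting and 50 running jobs gets -10.0: any site with a queue ranks below every site without one.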
Data management
The Storage Element is the service that allows a user or an application to store data for future retrieval. In gLite, every SE must have a GSIFTP server, offering basically the same functionality as FTP but enhanced to support GSI security.
Files copied to an SE should then be registered in a catalog. A catalog is basically a database that maps the name of a file (logical file name, LFN) to its physical location (physical file name). A file in a catalog may have more than one LFN (in principle unrelated to its real name), and it can have more than one replica (that is, the same file may be present on two different SEs). What uniquely identifies it is its GUID (Grid Unique IDentifier), a string of 40 bytes.
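The relationships just described (one GUID, many LFNs, many replicas) can be modelled as a small in-memory structure. This is a toy illustration only, not how LFC stores data: the "guid:" plus UUID format is illustrative and does not claim to match the real GUID layout, and the srm:// replica names used in any example would be hypothetical. The lookup methods loosely mirror what lcg-lg, lcg-la and lcg-lr report.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    guid: str
    lfns: set = field(default_factory=set)      # logical file names (aliases)
    replicas: set = field(default_factory=set)  # physical file names, one per SE copy


class ToyCatalog:
    """In-memory stand-in for a file catalog: GUID -> LFNs and replicas."""

    def __init__(self):
        self._entries = {}      # guid -> CatalogEntry
        self._guid_by_lfn = {}  # lfn -> guid

    def register(self, lfn, replica):
        """Create a new GUID with a first LFN and a first replica."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = "guid:" + str(uuid.uuid4())  # illustrative format only
        self._entries[guid] = CatalogEntry(guid)
        self.add_alias(guid, lfn)
        self.add_replica(guid, replica)
        return guid

    def add_alias(self, guid, lfn):
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        self._entries[guid].lfns.add(lfn)
        self._guid_by_lfn[lfn] = guid

    def add_replica(self, guid, replica):
        self._entries[guid].replicas.add(replica)

    def guid_of(self, lfn):
        return self._guid_by_lfn[lfn]

    def aliases(self, guid):
        return sorted(self._entries[guid].lfns)

    def replicas(self, lfn):
        return sorted(self._entries[self.guid_of(lfn)].replicas)
```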
LCG File Catalog
gLite supports two different types of catalog: LFC (LCG File Catalog) and RLS (Replica Location Service). In this overview we'll only deal with LFC, which is now the most used in ATLAS (the two catalogs are not synchronized!).
The catalog can be accessed from the UI using data management commands. Two environment variables must be set, the file catalog type and its address:
  export LCG_CATALOG_TYPE=lfc
  export LFC_HOST=lfc-atlas-test.cern.ch
There are several LFC hosts on LCG and they are not synchronized, so the user's choice must be consistent throughout his activity! Usually there is a central LFC per VO, so in practice there is little risk of this kind.
LFNs in the LFC have a particular form: they are organized in a hierarchical, directory-like structure that looks like this:
  lfn:/grid/<VO>/<dir>/<filename>
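A quick way to catch malformed names before calling any command is to validate them against the lfn:/grid/<VO>/... form described above. A minimal sketch; lfc_path strips the "lfn:" prefix to obtain the plain path form that lfc-ls takes (as in lfc-ls /grid/atlas).

```python
import re

LFN_RE = re.compile(r"^lfn:/grid/(?P<vo>[^/]+)/(?P<path>.+)$")


def parse_lfn(lfn):
    """Return (VO, path below the VO directory) for an LFN of the form lfn:/grid/<VO>/..."""
    match = LFN_RE.match(lfn)
    if match is None:
        raise ValueError(f"not an LFC-style LFN: {lfn!r}")
    return match.group("vo"), match.group("path")


def lfc_path(lfn):
    """Strip the 'lfn:' prefix, giving the path form used by lfc-ls."""
    parse_lfn(lfn)  # validate first
    return lfn[len("lfn:"):]
```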
LFC commands
On the UI there are some commands that interact directly with the LFC catalog. Thanks to its LFN structure, files in the LFC catalog can be browsed as if they were in a Unix filesystem. Try this:
  > lfc-ls /grid/atlas
The lfc-ls command works just like ls on a local filesystem (it also accepts the -l option). In the same way, lfc-mkdir, lfc-chmod and lfc-chown behave much like their Unix counterparts.
Despite the ease of use of LFC commands, usually only lfc-ls is used. Commands that act on the catalog, writing to it or deleting "directories" from it, should be used with great caution: the risk is to create inconsistencies between the catalog and the files on the SE. Data management commands ensure that such inconsistencies are not created: they write to the catalog but also check that no "harm" is done to the system.
Data management commands
Data management commands are of the form lcg-*. Some of them only access the catalog:
  lcg-aa   add alias
  lcg-ra   remove alias
  lcg-rf   register file
  lcg-uf   unregister file
  lcg-la   list aliases
  lcg-lg   list GUID
  lcg-lr   list replicas
Others perform real data movement operations, usually updating the catalog with the changes:
  lcg-cp   copy a file locally (this command does not write to the catalog)
  lcg-cr   copy a file to an SE and register it
  lcg-del  delete a file (physically) and its entry in the catalog
  lcg-rep  replicate a file from one SE to another
For these commands to work, besides the 2 catalog variables, another environment variable must be set:
  export LCG_GFAL_INFOSYS=<BDII_address>:2170
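A forgotten variable is a common reason for lcg-* failures, so a pre-flight check can save time. This hypothetical helper reports which of the three variables named in these slides are missing, flags a catalog type other than lfc (the only one used in this overview), and checks that LCG_GFAL_INFOSYS looks like <host>:<port>. It does not contact any server.

```python
import os

REQUIRED_VARS = ("LCG_CATALOG_TYPE", "LFC_HOST", "LCG_GFAL_INFOSYS")


def check_dm_environment(env=os.environ):
    """Return a list of human-readable problems with the data management environment."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    catalog_type = env.get("LCG_CATALOG_TYPE")
    if catalog_type and catalog_type != "lfc":
        problems.append("LCG_CATALOG_TYPE is not 'lfc'")
    infosys = env.get("LCG_GFAL_INFOSYS", "")
    if infosys:
        host, sep, port = infosys.rpartition(":")
        if not (sep and host and port.isdigit()):
            problems.append("LCG_GFAL_INFOSYS should look like <BDII_address>:<port>")
    return problems
```

An empty list from check_dm_environment() means the three variables at least look plausible.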
Low level commands
There are some "low-level" commands available to grid users that should be used with caution, since they work directly on the SE without updating the catalog. Still, 2 of them will prove to be real friends to anyone who has to look for files on the grid:
  edg-gridftp-ls gsiftp://<SE_address>/<dir>/
  globus-url-copy <src_file> <dest_file>
The first command lists the content of a directory on a remote SE; the second is the basis of every lcg tool that has to move data. <src_file> and <dest_file> must be in fully qualified form:
  file:///<abs_path>/<file_name>                 for local files
  gsiftp://<SE_address>/<abs_path>/<file_name>   for remote files
Other useful low-level commands (to be used carefully!) are
  edg-gridftp-rm <URL>
  edg-gridftp-rmdir <URL>
  edg-gridftp-rename <src_URL> <dest_URL>
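Getting the fully qualified forms right (especially the three slashes in file:///) is a frequent stumbling block. A small sketch that builds both URL forms from absolute paths and assembles the globus-url-copy argument list; se.example.org in any example is a hypothetical SE host.

```python
import posixpath


def local_url(abs_path):
    """file:///<abs_path>/<file_name> form for a local file (path must be absolute)."""
    if not abs_path.startswith("/"):
        raise ValueError("local paths must be absolute")
    return "file://" + posixpath.normpath(abs_path)


def gsiftp_url(se_host, abs_path):
    """gsiftp://<SE_address>/<abs_path>/<file_name> form for a file on an SE."""
    if not abs_path.startswith("/"):
        raise ValueError("remote paths must be absolute")
    return f"gsiftp://{se_host}{posixpath.normpath(abs_path)}"


def copy_command(src_url, dest_url):
    """Argument list for globus-url-copy, e.g. for subprocess.run(..., check=True)."""
    return ["globus-url-copy", src_url, dest_url]
```

For instance, copy_command(local_url("/home/user/data.txt"), gsiftp_url("se.example.org", "/atlas/data.txt")) gives the arguments for copying a local file to that SE.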
Want to know more…
You can find all the information presented in these slides, and much more, on the gLite documentation web page.