The gLite WMS and the Data Management System
Giuseppe LA ROCCA, INFN Catania, giuseppe.larocca@ct.infn.it
Master Class for Life Science, 4-6 May 2010, Singapore
Outline
• An introduction to the gLite WMS
• Job submission via WMS
• Command line interface
• Job status
• The Job Description Language overview
• JDL attributes
• The gLite DMS
• The Storage Resource Manager (SRM)
• Grid file referencing schemes
• LFC File Catalogue architecture
• LFC commands
• File & replica management client tools
• Run bioinformatics applications via a Grid portal
Overview of the WMS
The Workload Management System (WMS) is the gLite 3 component that allows users to submit jobs and performs all tasks required to execute them, without exposing the user to the complexity of the Grid. The WMS comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
• The Workload Manager (WM) accepts and satisfies requests for job management coming from its clients.
• The WM passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description.
• The decision of which resource should be used is the outcome of a matchmaking process.
• The Logging and Bookkeeping service tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of each job.
Job Submission via WMS
[Diagram, built up over four slides: the user creates a proxy on the GILDA User Interface (authorised against the VO Management Service, the DB of VO users), writes a JDL file and submits the job (executable plus small inputs) to the Workload Management System. The WMS queries the Information System, to which the Computing and Storage Elements of the Grid site publish their state, and submits the job to a matching Computing Element, where it is processed; every step is recorded by the Logging and Bookkeeping service. Finally, the user queries the job status and retrieves the (small) output files back on the User Interface.]
The Command Line Interface
The gLite WMS implements two different services to manage jobs: the Network Server and the WMProxy. The recommended way to manage jobs is through the WMProxy, because it gives the best performance and access to the most advanced functionality.
• The WMProxy implements several functionalities, among which:
• submission of job collections;
• faster authentication;
• faster match-making;
• faster response time for users;
• higher job throughput.
Proxy Delegation
To explicitly delegate a user proxy to the WMProxy, the command to use is:
glite-wms-job-delegate-proxy -d <delegID>
Example:
$ glite-wms-job-delegate-proxy -d mydelegID
Connecting to the service https://rb102.cern.ch:7443/glite_wms_wmproxy_server
======= glite-wms-job-delegate-proxy Success ========
Your proxy has been successfully delegated to the WMProxy:
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
with the delegation identifier: mydelegID
=====================================================
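As an alternative to explicit delegation, the job management commands also accept the -a option, which delegates a proxy automatically for that single operation. A minimal sketch (the JDL file name is illustrative); automatic delegation is convenient but adds the delegation overhead to every call, so explicit delegation is preferable for repeated submissions:
$ glite-wms-job-submit -a test.jdl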
Job Submission
Starting from a simple JDL file, we can submit it via the WMProxy with:
$ glite-wms-job-submit -d mydelegID test.jdl
Connecting to the service https://rb102.cern.ch:7443/glite_wms_wmproxy_server
======== glite-wms-job-submit Success ========
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://rb102.cern.ch:9000/vZKKk3gdBla6RySximq_vQ
==============================================
Listing the CE(s) that match a job
It is possible to see which CEs are eligible to run a job described by a given JDL using:
$ glite-wms-job-list-match -d mydelegID test.jdl
Connecting to the service https://rb102.cern.ch:7443/glite_wms_wmproxy_server
====================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms
- grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms
- gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
- grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms
====================================================
Retrieving the status of a job
$ glite-wms-job-status https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
=====================================================
BOOKKEEPING INFORMATION:
Status info for the Job : https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce1.inrne.bas.bg:2119/jobmanager-lcgpbs-cms
Submitted: Mon Dec 4 15:05:43 2006 CET
=====================================================
The verbosity level controls the amount of information provided. The value of the -v option ranges from 0 to 3. The command can take several jobIDs as arguments, i.e. glite-wms-job-status <jobID1> <jobID2> ... or, more conveniently, the -i <file path> option can be used to read the job identifiers from a file.
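At submission time, the -o <file path> option of glite-wms-job-submit appends the returned job identifier to a file, which can then be fed to the status command; a small sketch assuming the file name jobs.txt:
$ glite-wms-job-submit -d mydelegID -o jobs.txt test.jdl
$ glite-wms-job-status -i jobs.txt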
Retrieving the output(s)
$ glite-wms-job-output https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
Connecting to the service https://128.142.160.93:7443/glite_wms_wmproxy_server
=====================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
have been successfully retrieved and stored in the directory:
/tmp/doe_yabp72aERhofLA6W2-LrJw
=====================================================
The default location for storing the outputs (normally /tmp) is defined in the UI configuration, but it is possible to specify the directory in which to save the output using the --dir <path name> option.
Cancelling a job $ glite-wms-job-cancel https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA Are you sure you want to remove specified job(s) [y/n]y : y Connecting to the service https://128.142.160.93:7443/glite_wms_wmproxy_server ========== glite-wms-job-cancel Success ============ The cancellation request has been successfully submitted for the following job(s): - https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA ==================================================== If the cancellation is successful, the job will terminate in status CANCELLED
Job Submission with CLI
[Diagram: the same workflow as before, annotated with the commands issued on the GILDA User Interface against the Grid site (Computing Element, Storage Element) and the VO Management Service.]
voms-proxy-init --voms gilda
glite-wms-job-delegate-proxy -d delegID
glite-wms-job-list-match -d delegID hostname.jdl
glite-wms-job-submit -d delegID hostname.jdl   (returns the JobID)
glite-wms-job-status JobID
glite-wms-job-output JobID
Job Description Language
The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations. The JDL is used in WLCG/EGEE to specify the desired job characteristics and constraints, which are taken into account by the WMS to select the best resource to execute the job. A job description is a file (called a JDL file) consisting of lines with the format:
attribute = expression;
Expressions can span several lines, but only the last one must be terminated by a semicolon.
Job Description Language
The single-quote character ( ' ) cannot be used in the JDL. Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line. Multi-line comments must be enclosed between "/*" and "*/".
Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
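As an illustration of these rules, a small made-up JDL fragment combining both comment styles and a single expression split across two lines (only the last line is terminated by the semicolon):
# the script to run on the worker node
Executable = "test.sh";
/* constraints on the CE are combined
   in one multi-line expression */
Requirements = other.GlueCEInfoLRMSType == "PBS" &&
               other.GlueCEInfoTotalCPUs > 1;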
Simple JDL example
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
The Executable attribute specifies the command to be run by the job. If the command is already present on the WN, it must be expressed as an absolute path; if it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute.
Executable = "test.sh";
InputSandbox = {"/home/larocca/test.sh"};
StdOutput = "std.out";
StdError = "std.err";
The Arguments attribute can contain a string value, which is taken as the argument list for the executable:
Arguments = "fileA 10";
In the Executable and Arguments attributes it may be necessary to use special characters, such as &, \, |, >, <. These characters should either be preceded by a triple \ in the JDL, or specified inside quoted strings, e.g.:
Arguments = "-f file1\\\&file2";
The shell environment of the job can be modified using the Environment attribute:
Environment = {"CMS_PATH=$HOME/cms"};
If files have to be copied from the UI to the execution node, they must be listed in the InputSandbox attribute:
InputSandbox = {"test.sh", ... , "fileN"};
The files to be transferred back to the UI after the job has finished can be specified using the OutputSandbox attribute:
OutputSandbox = {"std.out", "std.err"};
Wildcards are allowed only in the InputSandbox attribute. Absolute paths cannot be specified in the OutputSandbox attribute. The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, as they would overwrite each other when transferred.
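Putting the attributes seen so far together, a complete illustrative JDL for a user script could look as follows (file names are only examples):
Executable = "test.sh";
Arguments = "fileA 10";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"/home/larocca/test.sh", "/home/larocca/fileA"};
OutputSandbox = {"std.out", "std.err"};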
The Requirements attribute can be used to express constraints on the resources where the job should run. Its value is a Boolean expression that must evaluate to true for the job to run on a given CE.
Note: only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be combined in a single Requirements attribute.
For example, suppose the user wants to run on a CE using PBS as batch system and whose WNs have at least two CPUs. The job description file will then contain:
Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
The WMS can also be asked to send a job to a particular queue on a CE with the following expression:
Requirements = other.GlueCEUniqueID == "lxshare0286.cern.ch:2119/jobmanager-pbs-short";
It is also possible to use regular expressions when expressing a requirement. Suppose, for example, that the user wants all his jobs to run on any CE in the domain cern.ch. This can be achieved by putting the following expression in the JDL file:
Requirements = RegExp("cern.ch", other.GlueCEUniqueID);
The opposite can be required by using:
Requirements = (!RegExp("cern.ch", other.GlueCEUniqueID));
If the job must run on a CE where a particular experiment software is installed and this information is published by the CE, something like the following must be written:
Requirements = Member("BLAST-1.0.3", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Note: the Member operator is used to test whether its first argument (a scalar value) is a member of its second argument (a list). The GlueHostApplicationSoftwareRunTimeEnvironment attribute is in fact a list of strings, used to publish any VO-specific information about the CE (typically, information on the VO software available on that CE).
Advanced job types
Job Collection: a set of independent jobs that the user can submit and monitor as if it were a single job.
[
  Type = "Collection";
  nodes = {
    [
      Executable = "/bin/hostname";
      Arguments = "-f";
      StdOutput = "hostname.out";
      StdError = "hostname.err";
      OutputSandbox = {"hostname.err", "hostname.out"};
    ],
    [
      Executable = "/bin/sh";
      Arguments = "start_povray_valve.sh";
      StdOutput = "povray.out";
      StdError = "povray.err";
      InputSandbox = {"start_povray_valve.sh"};
      OutputSandbox = {"povray.err", "povray.out"};
      Requirements = Member("POVRAY-3,5", other.GlueHostApplicationSoftwareRunTimeEnvironment);
    ]
  };
]
Advanced job types
Parametric Job: a job collection where the jobs are identical except for the value of a running parameter.
JobType = "Parametric";
Executable = "/bin/echo";
Arguments = "_PARAM_";
StdOutput = "myoutput_PARAM_.txt";
StdError = "myerror_PARAM_.txt";
Parameters = 3;
ParameterStep = 1;
ParameterStart = 1;
OutputSandbox = {"myoutput_PARAM_.txt"};
One job is generated for each value of the running parameter, and _PARAM_ is replaced by that value in the attributes above.
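The Parameters attribute can also be given as an explicit list of values, in which case ParameterStart and ParameterStep are not used; a sketch with made-up input file names, where _PARAM_ is replaced by each list element in turn:
JobType = "Parametric";
Executable = "/bin/cat";
Arguments = "input_PARAM_.txt";
InputSandbox = {"input_PARAM_.txt"};
StdOutput = "myoutput_PARAM_.txt";
StdError = "myerror_PARAM_.txt";
Parameters = {"alpha", "beta", "gamma"};
OutputSandbox = {"myoutput_PARAM_.txt"};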
Advanced job types
DAG: a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs. The jobs are the nodes (vertices) of the graph; the edges (arcs) identify the dependencies.
[Diagram: DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeF]
[
  Type = "dag";
  max_nodes_running = 5;
  InputSandbox = {"/tmp/foo/*.exe", "/home/larocca/bar",
                  "gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe", "file:///tmp/myconf"};
  InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";
  nodes = [
    nodeA = [ description = [
      JobType = "Normal";
      Executable = "a.exe";
      InputSandbox = {"/home/larocca/myfile.txt", root.InputSandbox};
    ]; ];
    nodeF = [ description = [
      JobType = "Normal";
      Executable = "b.exe";
      Arguments = "1 2 3";
      OutputSandbox = {"myoutput.txt", "myerror.txt"};
    ]; ];
    nodeD = [ description = [
      JobType = "Checkpointable";
      Executable = "b.exe";
      Arguments = "1 2 3";
      InputSandbox = {"file:///home/larocca/data.txt",
                      root.nodes.nodeF.description.OutputSandbox[0]};
    ]; ];
    nodeC = [ file = "/home/larocca/nodec.jdl"; ];
    nodeB = [ file = "foo.jdl"; ];
  ];
  dependencies = { {nodeA, nodeB}, {nodeA, nodeC}, {nodeA, nodeF},
                   { {nodeB, nodeC, nodeF}, nodeD } };
]
References
WMProxy User's Guide:
https://edms.cern.ch/file/674643/1/EGEE-JRA1-TEC-674643-WMPROXY-guide-v0-3.pdf
JDL Attributes Specification:
https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-8.pdf
https://edms.cern.ch/file/590869/1/EGEE-JRA1-TEC-590869-JDL-Attributes-v0-9.pdf
gLite 3.1 User's Guide:
https://edms.cern.ch/file/722398/1.2/gLite-3-UserGuide.pdf
Complex jobs:
https://grid.ct.infn.it/twiki/bin/view/GILDA/WmProxyUse
WMProxy API usage:
http://www.euasiagrid.org/wiki/index.php/WMProxy_Java_API
https://grid.ct.infn.it/twiki/bin/view/GILDA/WMProxyCPPAPI
Storage Elements
The Storage Element is the service which allows a user or an application to store data for later retrieval. The DMS provides services to locate, access and transfer files:
• the user does not need to know the physical location of a file, just its logical file name;
• files can be replicated or transferred to several locations (SEs) as needed;
• files are shared with all the members of a given VO.
Files stored on a SE are write-once, read-many: files cannot be changed, only removed or replaced.
Protocols
The GSIFTP protocol offers the functionality of FTP, but with support for GSI. It is responsible for secure, fast and efficient file transfers to/from Storage Elements.
RFIO was developed to access tape archiving systems, such as CASTOR (CERN Advanced STORage manager), and it comes in a secure and an insecure version.
The gsidcap protocol is the GSI-enabled version of the dCache native access protocol, dcap.
Grid file referencing schemes
• Logical File Name (LFN):
  lfn:/grid/gilda/tutorials/input-file
• Grid Unique IDentifier (GUID):
  guid:4d57edef-fa5c-4512-a345-1c838916b357
• Storage URL (SURL, for a specific replica on a specific Storage Element):
  srm://aliserv6.ct.infn.it/gilda/generated/2007-11-13/fileb366f371-b2c0-485d-b12c-c114edaf4db4
  sfn://se01.athena.hellasgrid.gr/data/dteam/doe/file1
• Transport URL (TURL, for a specific replica, on an SE, with a specific protocol):
  gsiftp://aliserv6.ct.infn.it/gilda/generated/2007-11-13/fileb366f371-b2c0-485d-b12c-c114edaf4db4
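The lcg-* Data Management tools described later can translate between these schemes; a short sketch, assuming the example LFN above is registered in the gilda catalogue:
$ lcg-lg --vo gilda lfn:/grid/gilda/tutorials/input-file    (LFN -> GUID)
$ lcg-lr --vo gilda lfn:/grid/gilda/tutorials/input-file    (LFN -> SURLs of all replicas)
$ lcg-gt <SURL> gsiftp                                      (SURL -> TURL for the gsiftp protocol)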
Needles in a haystack
How do I keep track of all the files I have on the Grid? How does the Grid keep track of the mapping between LFN(s), GUID and SURL(s)?
LFC File Catalogue: LFC = LCG File Catalogue, LCG = LHC Computing Grid, LHC = Large Hadron Collider.
The LCG File Catalogue is the service which maintains the mappings between LFN(s), GUID and SURL(s).
LFC File Catalogue
• It consists of a unique catalogue, in which the LFN is the main key. Further LFNs can be added as symlinks to the main LFN.
• It looks like a "top-level" directory of the Grid: for each supported VO a separate subdirectory exists under the /grid directory, and all the members of the VO have read/write permissions in it.
• System metadata are supported, while for user metadata only a single string entry is available.
• The catalogue publishes its endpoint in the Information Service so that it can be discovered by the Data Management tools and other services (the WMS, for example).
Listing the entries of an LFC directory
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory).
Remember that the LFC has a directory tree structure: /grid/<VO_name>/<you create it>. All members of a VO have read/write permissions under their VO directory. You can set LFC_HOME to use relative paths.
lfc-ls /grid/gilda/tutorials/taipei02
export LFC_HOME=/grid/gilda/tutorials
lfc-ls -l taipei02
lfc-ls -l -R /grid
lfc-mkdir: creating directories in the LFC
lfc-mkdir [-m mode] [-p] path...
where path specifies the LFC pathname.
Remember that when registering a new file (with lcg-cr, for example) the corresponding destination directory must already exist in the catalogue.
Example:
lfc-mkdir /grid/gilda/<YOUR_DIRECTORY>
lfc-ln: creating a symbolic link
lfc-ln -s file linkname
lfc-ln -s directory linkname
Creates a link to the specified file or directory with name linkname.
Example:
lfc-ln -s /grid/gilda/test /grid/gilda/aLink
Let's check the link using lfc-ls with a long listing:
lfc-ls -l aLink
lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink -> /grid/gilda/test
Adding metadata information
The lfc-setcomment and lfc-delcomment commands allow the user to associate a comment with a catalogue entry and to delete such a comment. This is the only user-defined metadata that can be associated with catalogue entries. The comments of the files can be listed using the --comment option of the lfc-ls command, as shown in the following example:
$ lfc-setcomment /grid/gilda/file1 "My metadata"
$ lfc-ls --comment /grid/gilda/file1
/grid/gilda/file1 My metadata
LCG Data Management Client Tools
The LCG Data Management tools (lcg-*) allow users to copy files between the UI, a WN and a SE, to register entries in the file catalogue and to replicate files between SEs.
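As an orientation before the detailed slides, a minimal end-to-end sketch of the typical file lifecycle with these tools (SE name, LFN and local paths are illustrative):
$ lcg-cr --vo gilda -d aliserv6.ct.infn.it -l lfn:/grid/gilda/myfile file:/home/larocca/file1   (upload and register)
$ lcg-rep --vo gilda -d <SECOND_SE> lfn:/grid/gilda/myfile                                      (replicate to a second SE)
$ lcg-lr --vo gilda lfn:/grid/gilda/myfile                                                      (list all replicas)
$ lcg-cp --vo gilda lfn:/grid/gilda/myfile file:/tmp/myfile                                     (copy back to a local file)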
Environment variables /1
The --vo <vo name> option, to specify the virtual organisation of the user, is present in all commands except lcg-gt. Its usage is mandatory unless the variable LCG_GFAL_VO is set (e.g. export LCG_GFAL_VO=gilda).
Timeouts
The commands lcg-cr, lcg-del, lcg-gt, lcg-rf, lcg-sd and lcg-rep all have timeouts implemented. By using the option -t, the user can specify the timeout in seconds. The default is 0 seconds, that is, no timeout. If an operation times out, all actions performed up to that moment are rolled back, so no broken files are left on a SE and no spurious entries remain in the catalogues.
Environment variables /2 For alllcg-*commands to work, the environment variable LCG_GFAL_INFOSYSmust be set to point to a top BDII in the format <hostname>:<port>, so that the commands can retrieve the necessary information export LCG_GFAL_INFOSYS=gilda-bdii.ct.infn.it:2170 TheVO_<VO>_DEFAULT_SEvariable specifies the default SE for the VO. export VO_GILDA_DEFAULT_SE=aliserv6.ct.infn.it
Uploading a file to the Grid /1
$ lcg-cr --vo gilda -d aliserv6.ct.infn.it \
file:/home/larocca/file1
guid:6ac491ea-684c-11d8-8f12-9c97cebf582a
where the only argument is the local file to be uploaded and the -d <destination> option indicates the SE used as the destination for the file. The command returns the file GUID. If no destination is given, the SE specified by the VO_<VO>_DEFAULT_SE environment variable is used. The -P option allows the user to specify a relative path name for the file on the SE. If no -P option is given, the relative path is generated automatically.
Uploading a file to the Grid /2
The following are examples of the different ways to specify a destination:
-d aliserv6.ct.infn.it
-d srm://aliserv6.ct.infn.it/data/gilda/my_file
-d aliserv6.ct.infn.it -P my_dir/my_file
The -l <lfn> option can be used to specify an LFN:
$ lcg-cr --vo gilda -d aliserv6.ct.infn.it \
-l lfn:/grid/gilda/myalias1 \
file:/home/larocca/file1
guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
Replicating a file $ lcg-rep -v --vo gilda -d <SECOND_SE> \ guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24 Source URL: sfn://aliserv6.ct.infn.it/data/gilda/larocca/file1 File size: 30 Destination specified: <SECOND_SE> Source URL for copy: gsiftp://aliserv6.ct.infn.it/data/gilda/larocca/file1 Destination URL for copy: gsiftp://<SECOND_SE>/data/gilda/generated/2004-07-09/ file50c0752c-f61f-4bc3-b48e-af3f22924b57 # streams: 1 Transfer took 2040 ms Destination URL registered in LRC: srm://<SECOND_SE>/data/gilda/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
Listing replicas
$ lcg-lr --vo gilda \
lfn:/grid/gilda/tutorials/larocca/my_alias1
srm://aliserv6.ct.infn.it/data/gilda/generated/2004-07-09/file79aee616-6cd7-4b75-8848-f091
srm://<SECOND_SE>/data/gilda/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930
Again, an LFN, the GUID or a SURL can be used to specify the file.
Copying files out of the Grid
$ lcg-cp --vo gilda -t 100 -v lfn:/grid/gilda/tutorials/mytext.txt file:/tmp/mytext.txt
Source URL: lfn:/grid/gilda/mytext.txt
File size: 104857600
Source URL for copy: gsiftp://aliserv6.ct.infn.it:/storage/gilda/2007-07-06/input2.dat.10.0
Destination URL: file:///tmp/myfile
# streams: 1
# set timeout to 100 (seconds)
85983232 bytes 8396.77 KB/sec avg 9216.11
Transfer took 12040 ms
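When a file is no longer needed it can be removed with lcg-del, listed above among the commands with timeouts; a minimal sketch, assuming the -a option (delete all replicas and unregister the catalogue entry) and the LFN used in the previous slides:
$ lcg-del --vo gilda -a lfn:/grid/gilda/tutorials/larocca/my_alias1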