Job Submission II. Marco Bencivenni, Daniele Cesini (INFN CNAF). Grid School Bologna, 21-24 February 2011.
Outline
• Summary
• JDL Advanced Attributes
• Multi-node Jobs
• More JDL attributes and UI client options
• Direct submission to CREAM CE
GLUESchema
• GLUESchema (GLUE: Grid Laboratory for a Uniform Environment)
• Provides a standardized description of the Grid
• Allows resources and services to be presented to users and external services in a uniform way
• The intended uses are:
  • resource discovery ("what is out there?")
  • selection ("what are the properties?")
  • monitoring ("what is the state of the system?")
• LDAP is used to publish GLUESchema information
• The published GLUESchema information can be used inside a JDL to write requirements and ranking expressions
Before submission
• First of all, the proxy:
    voms-proxy-init --voms <vo_name>
    voms-proxy-init --voms infngrid:/infngrid/Role=SoftwareManager
    voms-proxy-info --all
• Listing the resources that can execute the job and match the JDL requirements:
    glite-wms-job-list-match -a first.jdl
    glite-wms-job-list-match -a --rank -c wms_rb00.conf first.jdl
• Submitting a JDL:
    glite-wms-job-submit -a first.jdl
    glite-wms-job-submit -a -c wms_rb00.conf first.jdl
The JobID
• Upon submission each job is assigned a unique, virtually non-recyclable job identifier, written as a URL:
    https://<LB_hostname>[:<port>]/<unique_string>
• <LB_hostname> is the hostname of the Logging and Bookkeeping (LB) server for the job
• <unique_string> is a randomly generated sequence
• The JobID is used for every further operation on the job after submission
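
Since a JobID is an ordinary HTTPS URL, its parts can be recovered with a generic URL parser. The following is a minimal sketch, not part of the gLite tools; the helper name split_job_id is hypothetical, and the sample JobID is the one used elsewhere in these slides.

```python
from urllib.parse import urlparse


def split_job_id(job_id):
    """Split a JobID URL into (lb_hostname, port, unique_string).

    Hypothetical helper for illustration only. The port is None when
    the JobID does not carry an explicit one.
    """
    parts = urlparse(job_id)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not a JobID URL: {job_id!r}")
    return parts.hostname, parts.port, parts.path.lstrip("/")


if __name__ == "__main__":
    host, port, unique = split_job_id(
        "https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg"
    )
    print("LB server:", host)
    print("port:", port)
    print("unique string:", unique)
```

This can be handy, for instance, to group a list of JobIDs by the LB server that holds their logging information.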
After submission
• Retrieving the job status:
    glite-wms-job-status <Job_ID>
  where <Job_ID> looks like https://albalonga.cnaf.infn.it:9000/TWr2bZ0QlaWsBrd43zslAg
• When the status of the job is Done (Success) the Output Sandbox (OSB) can be retrieved with:
    glite-wms-job-output <Job_ID>
  • The output directory can be changed with --dir <OutDir>
  • Note that OSBs are periodically purged from the WMS: do not wait too long before retrieving them
• All the information stored on the LB about a job can be queried using:
    glite-wms-job-logging-info -v <1|2|3> <Job_ID>
  • Try increasing the verbosity up to -v 3
• A job can be cancelled after submission using:
    glite-wms-job-cancel <Job_ID>
Jobs State Machine
• Submitted: the job has been submitted from the UI but is still waiting to be accepted by the WMProxy
• Waiting: the job has been accepted by the WMProxy and is waiting to be processed by the WM
• Ready: the job has been processed by the WM but has not been transferred to the CE yet
• Scheduled: the job is waiting in the CE queue
• Running: the job is executing
• Done: the job has terminated, either successfully or with an error (e.g. due to unrecoverable errors on the CE side)
• Aborted: the job has been aborted by the WMS (too long in a queue on the WM or on the CE, expired credentials, etc.)
• Cancelled: the job has been cancelled by the user
• Cleared: the output has been retrieved by the user, or removed because of a timeout
A minimal JDL
• Executable = <string>;
• StdOutput = <string>; and StdError = <string>;
    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";
• InputSandbox = <string | list of strings>;
• OutputSandbox = <string | list of strings>;
    Executable = "test.sh";
    InputSandbox = {"/home/cesini/corso/test.sh"};
    StdOutput = "std.out";
    StdError = "std.err";
• Arguments = <string>;
  • Used to pass arguments to the executable, e.g. Arguments = "fileA 10";
    [cesini@lcg-ui cesini]$ cat first.jdl
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
Work in progress
    [cesini@lcg-ui DataReq]$ cat data-req.jdl
    ##################################
    #   JDL with Data Requirements   #
    ##################################
    Executable = "calc-pi.sh";
    # Arguments is the number of digits, must be < 1000000
    Arguments = "1000";
    StdOutput = "std.out";
    StdError = "std.err";
    Prologue = "prologue.sh";
    FuzzyRank = true;
    InputSandbox = {"calc-pi.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err", "out-PI.txt", "out-e.txt", "prologue.sh"};
    Requirements = other.GlueCEInfoHostName != "spacin-ce1.dma.unina.it";
    DataRequirements = {
      [
        DataCatalogType = "DLI";
        DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
        InputData = {"lfn:/grid/infngrid/cesini/PI_1M.txt", "lfn:/grid/infngrid/cesini/e-2M.txt"};
      ]
    };

    [cesini@lcg-ui cesini]$ cat first.jdl
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
JDL
• The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations.
• The JDL is used to specify the desired job characteristics and constraints, which the WMS takes into account to select the best resource to execute the job.
• A job description is a file (called a JDL file) consisting of lines of the form:
    attribute = expression;
• An expression can span several lines, but only the last one must be terminated by a semicolon. Literal strings are enclosed in double quotes.
JDL Hints
• A JDL is a set of key/value pairs
• A value can be:
  • a number
  • a string " "
  • a list { }
  • a ClassAd [ ]
• Statements end with a semicolon
• In general, special characters such as &, |, >, < are only allowed inside a quoted string or when preceded by a triple backslash (\\\). The single quote character (') cannot be used in the JDL
• Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line. Multi-line comments must be enclosed between "/*" and "*/"
• Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
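
The value types listed above (number, string, list, ClassAd) map naturally onto Python values, so a JDL file can be generated instead of typed by hand. The sketch below is only an illustration: render_jdl and jdl_value are hypothetical helpers, not part of the gLite UI. It handles literal values only, so attributes that hold expressions (such as Requirements or Rank) would have to be added as raw text. Strings containing quote characters are rejected rather than escaped, because the escaping rules are not covered here.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (hypothetical helper)."""
    if isinstance(value, bool):          # must be tested before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if "'" in value or '"' in value:
            raise ValueError(f"quote characters are not handled: {value!r}")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    if isinstance(value, dict):          # nested ClassAd
        body = " ".join(f"{k} = {jdl_value(v)};" for k, v in value.items())
        return f"[ {body} ]"
    raise TypeError(f"unsupported JDL value: {value!r}")


def render_jdl(attributes):
    """Render a dict as JDL text, one 'attribute = value;' per line.

    Nothing is emitted after the semicolon, since trailing blanks
    are reported to cause problems.
    """
    return "".join(f"{k} = {jdl_value(v)};\n" for k, v in attributes.items())


if __name__ == "__main__":
    print(render_jdl({
        "Executable": "test.sh",
        "Arguments": "fileA fileB",
        "StdOutput": "std.out",
        "StdError": "std.err",
        "InputSandbox": ["test.sh", "fileA", "fileB"],
        "OutputSandbox": ["std.out", "std.err"],
    }), end="")
```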
ClassAd Operators
[table of operators not preserved in this transcript]
InputSandbox
InputSandbox = <string | list of strings>
• Identifies the list of input files, which can be located on:
  • the UI local file system
  • a GridFTP server
  • an HTTPS server (this requires the GridSite htcp client command to be installed on the WN, which is not the case in the current CE standard configuration)
    Executable = "cms_sim.exe";
    InputSandbox = {"gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe", ……… };

    Executable = "cms_sim.exe";
    InputSandbox = {"/home/edguser/sim/cms_sim.exe", ……… };

    InputSandbox = {
      "/tmp/ns.log",
      "myscript.sh",
      "gsiftp://neo.datamat.it:5678/home/fpacini/cms_sim.exe"
    };
InputSandboxBaseURI
• InputSandboxBaseURI = <string>
• Sets a base URI (typically on a GridFTP server) against which the InputSandbox file names are resolved
• For example:
    InputSandbox = "myfile.dat";
    InputSandboxBaseURI = "gsiftp://gridit-se-01.cnaf.infn.it/tmp";
  is equivalent to:
    InputSandbox = "gsiftp://gridit-se-01.cnaf.infn.it/tmp/myfile.dat";
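
The effect of InputSandboxBaseURI can be pictured as a simple prefixing rule. The sketch below is an approximation written for illustration, not the WMS implementation: the function name resolve_input_sandbox is hypothetical, and it assumes that entries which are already URIs or absolute local paths are left untouched. That matches the hands-on remark in this deck that a complete path or a file:// URI forces a local file.

```python
def resolve_input_sandbox(entries, base_uri=None):
    """Prefix plain file names with the base URI (illustrative only).

    Assumption: entries that already contain a scheme ("gsiftp://",
    "file://", ...) or that are absolute paths are not modified.
    """
    resolved = []
    for entry in entries:
        is_uri = "://" in entry
        is_absolute = entry.startswith("/")
        if base_uri and not is_uri and not is_absolute:
            resolved.append(base_uri.rstrip("/") + "/" + entry)
        else:
            resolved.append(entry)
    return resolved


if __name__ == "__main__":
    base = "gsiftp://gridit-se-01.cnaf.infn.it/tmp"
    for path in resolve_input_sandbox(
        ["myfile.dat", "/home/cesini/corso/SandBox/test.sh"], base
    ):
        print(path)
```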
OutputSandbox
OutputSandbox = <string | list of strings>
• Identifies the list of files generated by the job on the WN at runtime which the user wants to retrieve
• The files are retrieved with glite-wms-job-output
    OutputSandbox = {
      "myjobOutput",
      "myjobError",
      "run1/event1",
      "run1/event2"
    };
OutputSandboxDestURI
OutputSandboxDestURI = <string | list of strings>
• Represents the URI(s) on a GridFTP/HTTPS server where the files listed in the OutputSandbox attribute have to be transferred at job completion.
• Allows the output to be copied directly to the specified locations, which must run:
  • a GridFTP server
  • an HTTPS server (this requires the GridSite htcp client command to be installed on the WN, which is not the case in the current WN standard configuration)
• Note that output files managed in this way are not retrieved by the glite-wms-job-output command.
• The OutputSandboxDestURI list must have the same cardinality as the OutputSandbox list, otherwise the JDL is considered invalid.
OutputSandboxBaseDestURI = <string>
• Represents the base URI on a GridFTP/HTTPS server
OutputSandboxDestURI example
    OutputSandbox = {"fileA", "data/fileB", "fileC"};
    OutputSandboxDestURI = {
      "gsiftp://lxb0707.cern.ch/cms/doe/fileA",
      "gsiftp://lxb0707.cern.ch/cms/doe/fileB",
      "fileC"
    };
• The first two files are copied to a GridFTP server, while the third file is copied back to the WMS with the usual mechanism.
• Consequently, glite-wms-job-output will retrieve only the third file.
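
The two rules stated for OutputSandboxDestURI (equal cardinality, and only non-URI destinations coming back through the WMS) can be checked before submission. The following sketch is illustrative only; split_output_sandbox is a hypothetical helper, and treating any destination that contains "://" as a remote URI is a simplifying assumption.

```python
def split_output_sandbox(output_sandbox, dest_uris):
    """Pair each OSB file with its destination (illustrative helper).

    Returns (remote, via_wms): a dict of files sent to a remote server
    and the list of files still retrievable with glite-wms-job-output.
    """
    if len(output_sandbox) != len(dest_uris):
        raise ValueError(
            "OutputSandboxDestURI must have as many entries as OutputSandbox"
        )
    remote, via_wms = {}, []
    for name, dest in zip(output_sandbox, dest_uris):
        if "://" in dest:        # assumption: a scheme marks a remote URI
            remote[name] = dest
        else:
            via_wms.append(name)
    return remote, via_wms


if __name__ == "__main__":
    remote, via_wms = split_output_sandbox(
        ["fileA", "data/fileB", "fileC"],
        [
            "gsiftp://lxb0707.cern.ch/cms/doe/fileA",
            "gsiftp://lxb0707.cern.ch/cms/doe/fileB",
            "fileC",
        ],
    )
    print("copied to remote servers:", sorted(remote))
    print("retrieved by glite-wms-job-output:", via_wms)
```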
InputSandbox hands-on
• Make sure that the remote files are available, or your jobs will remain stuck forever waiting for the ISB:
    globus-url-copy file:///home/cesini/corso/fileA gsiftp://sunstorm.cnaf.infn.it/tmp/fileA
    globus-url-copy file:///home/cesini/corso/fileB gsiftp://sunstorm.cnaf.infn.it/tmp/fileB
    globus-url-copy file:///home/cesini/corso/test.sh gsiftp://sunstorm.cnaf.infn.it/tmp/test.sh

    [cesini@lcg-ui SandBox]$ cat remote-ISB.jdl
    ####################################
    #  JDL with advanced ISB handling  #
    ####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "gsiftp://gridit-se-01.cnaf.infn.it/tmp/fileA", "gsiftp://gridit-se-01.cnaf.infn.it/tmp/fileB"};
    OutputSandbox = {"std.out", "std.err"};

    [cesini@lcg-ui SandBox]$ cat remote-ISB-BaseURI.jdl
    ####################################
    # JDL with advanced ISB handling 2 #
    ####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandboxBaseURI = "gsiftp://gridit-se-01.cnaf.infn.it/tmp";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    # You can force the use of a local file by indicating it explicitly with file://
    # InputSandbox = {"file:///home/cesini/corso/SandBox/test.sh", "fileA", "fileB"};
    # or simply with its complete path
    # InputSandbox = {"/home/cesini/corso/SandBox/test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
OutputSandbox handling
    [cesini@lcg-ui SandBox]$ cat remote-ISB-OSB.jdl
    ##############################################
    #   JDL with advanced ISB and OSB handling   #
    ##############################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandboxBaseURI = "gsiftp://sunstorm.cnaf.infn.it/tmp";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    OutputSandboxBaseDestURI = "gsiftp://sunstorm.cnaf.infn.it/tmp";
Prologue
• Prologue = <string>
• <string> is the name of a script/executable that is run as a prologue within the job wrapper before the user job is started
• It can be used for preliminary operations, e.g.:
  • data transfers
  • environment checks
  • DB updates
• If the prologue fails, the job is shallow-resubmitted when shallow resubmission is enabled, and deep-resubmitted otherwise.
• Use PrologueArguments = <string> to pass arguments to the prologue executable
    Prologue = "my_prologue_script";
    Prologue = "/bin/false";   # can be used to test shallow resubmission
Epilogue
• Epilogue = <string>
• <string> is an executable/script that is run within the WMS job wrapper after the user job has completed
• It can be used for post-processing operations, e.g.:
  • data transfers
  • DB updates
  • job functionality checks
• If the epilogue fails, the job is deep-resubmitted.
• Use EpilogueArguments = <string> to pass arguments to the epilogue executable
    Epilogue = "my_epilogue_script";
    Epilogue = "/bin/false";   # can be used to test deep resubmission
Job Resubmission
• The RetryCount and ShallowRetryCount attributes are used to handle failed jobs and their resubmission
• Deep resubmission happens:
  • when the user's job has started running on the WN and then the job itself or the WMS JobWrapper has failed
  • on every grid failure (even before the job has started on the WN) if shallow resubmission is disabled
• Shallow resubmission happens:
  • when the WMS JobWrapper has failed before starting the actual user's job
RetryCount/ShallowRetryCount
• RetryCount = <non-negative integer>
  • Sets how many deep resubmissions are attempted before the job is aborted
  • Limited by MaxRetryCount on the server side (the default for MaxRetryCount is 10)
  • Zeroes the shallow retry counter
  • 0 disables the deep retry
• ShallowRetryCount = <integer greater than or equal to -1>
  • Sets how many shallow resubmissions are attempted before the job is aborted
  • Limited by MaxShallowRetryCount on the server side
  • -1 disables the shallow retry (this is different from 0)
RetryCount & ShallowRetryCount
    [cesini@lcg-ui Retry]$ cat retry1.jdl
    ####################################
    # JDL with retry control activated #
    ####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    # This will resubmit deeply once
    RetryCount = 1;
    # This will resubmit shallowly twice
    ShallowRetryCount = 2;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";

    Status info for the Job : https://albalonga.cnaf.infn.it:9000/PSpMatGORSXfUk-P7pihEA
    Current Status: Aborted
    Status Reason: hit job shallow retry count (2)

    [cesini@lcg-ui Retry]$ grep -i shallow log_info-v3_retry1.txt
    ShallowRetryCount = 2;
    - result = SHALLOW
    - result = SHALLOW
    - reason = hit job shallow retry count (2)
RetryCount & ShallowRetryCount
• ShallowRetryCount = 0;
    # This will resubmit deeply once
    RetryCount = 1;
    # This will resubmit shallowly zero times
    ShallowRetryCount = 0;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";

    Status info for the Job : https://albalonga.cnaf.infn.it:9000/EXQTAHeA4kLOPo7XkATyHQ
    Current Status: Aborted
    Status Reason: hit job shallow retry count (0)
• ShallowRetryCount = -1;
    # This will resubmit deeply once
    RetryCount = 1;
    # This disables shallow resubmission
    ShallowRetryCount = -1;
    # This is a trick: the prologue always fails, so the job will be resubmitted
    Prologue = "/bin/false";

    Status info for the Job : https://albalonga.cnaf.infn.it:9000/D4DvhrfV1BL95fz38x12xA
    Current Status: Aborted
    Status Reason: hit job retry count (1)
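
The three runs with Prologue = "/bin/false" can be summarised as a small decision rule: while shallow resubmission is enabled (ShallowRetryCount >= 0) the failing prologue exhausts the shallow counter, and only with ShallowRetryCount = -1 is the deep counter the one that is hit. The sketch below merely encodes those three observations. It is not a model of the WMS in general, and the function name abort_reason_for_failing_prologue is hypothetical.

```python
def abort_reason_for_failing_prologue(retry_count, shallow_retry_count):
    """Abort reason for a job whose prologue always fails.

    Encodes only the behaviour observed in the three example runs;
    other failure modes are deliberately not covered.
    """
    if retry_count < 0:
        raise ValueError("RetryCount must be >= 0")
    if shallow_retry_count < -1:
        raise ValueError("ShallowRetryCount must be >= -1")
    if shallow_retry_count >= 0:
        # Shallow resubmission enabled: the shallow counter runs out first.
        return f"hit job shallow retry count ({shallow_retry_count})"
    # Shallow resubmission disabled (-1): failures are retried deeply.
    return f"hit job retry count ({retry_count})"


if __name__ == "__main__":
    for shallow in (2, 0, -1):
        print(shallow, "->", abort_reason_for_failing_prologue(1, shallow))
```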
ZippedISB
• ZippedISB = <string | list of strings>
• A string or a list of strings giving the file name(s) of the compressed (gzipped) tarball(s) that contain the input sandbox files for the job, e.g.:
    ZippedISB = "BossArchive_1_2_1.tgz";
• The WMProxy service takes the specified archive and unpacks it in the right locations.
• Note that this attribute MUST NOT be set when the submission is done through the WMProxy client commands and the AllowZippedISB attribute is set to true
AllowZippedISB
AllowZippedISB = <bool>
• When set to true, it makes the WMProxy client commands archive and compress all the job input sandbox files into a single gzipped tar file, which is then transferred to the WMS.
• Particularly useful when the job sandbox is composed of a large number of files
• Not mandatory. If not specified in the JDL it is assumed to be false.
• If AllowZippedISB is set to true, the ZippedISB attribute is set by the client command, irrespective of what it contains
AllowZippedISB
    [cesini@lcg-ui Retry]$ cat ../SandBox/allowZippedISB.jdl
    ############################################
    # Example JDL with Allow ZippedISB Enabled #
    ############################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "../fileA", "../fileB"};
    AllowZippedISB = true;
    OutputSandbox = {"std.out", "std.err"};
Requirements
• Requirements = <Boolean ClassAd expression>
• Describes which CEs in the IS are eligible to run the job.
• The attributes that can be used are those defined in the GLUESchema, written with the "other." prefix
• It is mandatory
• If this attribute is not included in the JDL the client sets it to:
    Requirements = other.GlueCEStateStatus == "Production";
  as "Production" is the nominal working state for a CE
• Examples:
    Requirements = other.GlueCEInfoHostName == "gridit-ce-001.cnaf.infn.it";
    Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
    Requirements = other.GlueCEPolicyMaxCPUTime >= 1800;
    Requirements = (other.GlueCEUniqueID == "gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert");
    Requirements = Member("INFN-CNAF", other.GlueHostApplicationSoftwareRunTimeEnvironment);
An ATLAS Requirement
    Requirements = ( ( ( ( ( ( other.GlueCEStateStatus == "Production" )
      && ( !( RegExp(".*blah.*",other.GlueCEUniqueID)
           || RegExp("ce02.grid.acad.bg.*",other.GlueCEUniqueID)
           || RegExp(".*jobmanager-condor.*",other.GlueCEUniqueID)
           || RegExp(".*jinr.ru.*",other.GlueCEUniqueID)
           || RegExp("mars-ce2.mars.lesc.doc.ic.ac.uk.*",other.GlueCEUniqueID)
           || RegExp(".*.na.infn.it.*",other.GlueCEUniqueID)
           || RegExp(".*.ph.liv.ac.uk.*",other.GlueCEUniqueID)
           || RegExp("atlasce.lnf.infn.it.*",other.GlueCEUniqueID)
           || RegExp("ce-iep-grid.saske.sk.*",other.GlueCEUniqueID)
           || RegExp("ce.phy.bg.ac.yu.*",other.GlueCEUniqueID)
           || RegExp("ce.polgrid.pl.*",other.GlueCEUniqueID)
           || RegExp("grid-ce.physik.uni-wuppertal.de.*",other.GlueCEUniqueID)
           || RegExp(".*.cern.ch.*",other.GlueCEUniqueID) ) ) )
      && ( Member("VO-atlas-cloud-ES",other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || RegExp("ce04.pic.es",other.GlueCEUniqueID)
           || RegExp("lcg2ce.ific.uv.es",other.GlueCEUniqueID)
           || RegExp("ce01.ific.uv.es",other.GlueCEUniqueID)
           || RegExp("ifaece01.pic.es",other.GlueCEUniqueID)
           || RegExp("grid003.ft.uam.es",other.GlueCEUniqueID)
           || RegExp("ce02.lip.pt",other.GlueCEUniqueID)
           || RegExp("grid006.lca.uc.pt",other.GlueCEUniqueID)
           || Member("VO-atlas-tier-T0",other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-tier-T1",other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-tier-T2",other.GlueHostApplicationSoftwareRunTimeEnvironment) ) )
      && ( Member("VO-atlas-release-12.0.7",other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-offline-12.0.7",other.GlueHostApplicationSoftwareRunTimeEnvironment)
           || Member("VO-atlas-production-12.0.7",other.GlueHostApplicationSoftwareRunTimeEnvironment) ) )
      && ( ( other.GlueCEPolicyMaxCPUTime * other.GlueHostBenchmarkSI00 ) >= 1333350 ) )
      && ( other.GlueHostMainMemoryRAMSize >= 800 ) )
      && ( other.GlueHostNetworkAdapterOutboundIP == true );
Requirements
    # No requirements, dteam VO
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a ../first.jdl | grep -c 2119
    522

    # Requirements = (other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 25); dteam VO
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req1.jdl | grep -c 2119
    119

    # With the previous ATLAS requirements
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req2.jdl | grep -c 2119
    28

    # False requirements
    [cesini@lcg-ui Requirements]$ glite-wms-job-list-match -a req3.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ==================== glite-wms-job-list-match failure ====================
    No Computing Element matching your job requirements has been found!
    ================================================================
• The number printed by grep -c is the number of matching CEs.
Rank
• Rank = <ClassAd floating-point expression>
• States how to rank the CEs that meet the Requirements
• The WMS submits the job to the CE with the highest rank.
• It is mandatory
• If it is not specified in the JDL, the clients on the UI add:
    Rank = -other.GlueCEStateEstimatedResponseTime;
  (the CE with the minimal estimated time for traversing the local batch system, as calculated by the CE itself)
• Other examples:
    Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;
  (the CE with the maximum number of free slots)
    Rank = <some constant>;
  (e.g. Rank = 1; with a constant value all CEs are treated in the same way by the WMS, and the CE the job is submitted to is chosen randomly)
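
Requirements and Rank work together as a filter followed by a maximisation. The sketch below mimics that two-step selection on a few in-memory records. It is a toy model, not the WMS matchmaker: the CE records, their host names and their values are invented for illustration, and the function name match is hypothetical. Ties are broken randomly, in line with the constant-rank case described above.

```python
import random


def match(ces, requirements, rank, rng=random):
    """Pick a CE: keep those satisfying `requirements`, take the top `rank`.

    Returns None when no CE is eligible. Ties are broken at random.
    """
    eligible = [ce for ce in ces if requirements(ce)]
    if not eligible:
        return None
    best = max(rank(ce) for ce in eligible)
    return rng.choice([ce for ce in eligible if rank(ce) == best])


# Hypothetical CE records (attribute names follow the GLUESchema ones
# used in the slides; hosts and numbers are made up).
CES = [
    {"id": "ce-a.example.org", "GlueCEStateStatus": "Production",
     "GlueCEInfoTotalCPUs": 10, "GlueCEPolicyMaxRunningJobs": 100,
     "GlueCEStateRunningJobs": 90},
    {"id": "ce-b.example.org", "GlueCEStateStatus": "Production",
     "GlueCEInfoTotalCPUs": 50, "GlueCEPolicyMaxRunningJobs": 200,
     "GlueCEStateRunningJobs": 50},
    {"id": "ce-c.example.org", "GlueCEStateStatus": "Closed",
     "GlueCEInfoTotalCPUs": 500, "GlueCEPolicyMaxRunningJobs": 900,
     "GlueCEStateRunningJobs": 0},
]


def requirements(other):
    # other.GlueCEStateStatus == "Production" && other.GlueCEInfoTotalCPUs > 2
    return (other["GlueCEStateStatus"] == "Production"
            and other["GlueCEInfoTotalCPUs"] > 2)


def free_slots(other):
    # other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs
    return other["GlueCEPolicyMaxRunningJobs"] - other["GlueCEStateRunningJobs"]


if __name__ == "__main__":
    chosen = match(CES, requirements, free_slots)
    print("selected CE:", chosen["id"])
```

Passing `rank=lambda other: 1` reproduces the constant-rank case, where any eligible CE may be returned.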
Rank
• Rank = other.GlueCEPolicyMaxRunningJobs - other.GlueCEStateRunningJobs;
  (the CE with the maximum number of free slots)
    [cesini@lcg-ui Rank]$ glite-wms-job-list-match --rank -a rank1.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    =======================================================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                          *Rank*
    - ce01-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t   22884
    - ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t   22884
    - gridce2.pi.infn.it:2119/jobmanager-lcglsf-cert4              1360
    - grid012.ct.infn.it:2119/jobmanager-lcglsf-cert               164
    - prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-cert            104
    - gridce.pi.infn.it:2119/jobmanager-lcglsf-cert                100
Rank
• Rank = 1;
    [cesini@lcg-ui Rank]$ glite-wms-job-list-match --rank -a rank2.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ======================================================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*                                                          *Rank*
    - atlasce.lnf.infn.it:2119/jobmanager-lcgpbs-cert               1
    - atlasce01.na.infn.it:2119/jobmanager-lcgpbs-cert              1
    - beagle14.ba.itb.cnr.it:2119/jobmanager-lcgpbs-cert            1
    - bogrid5.bo.infn.it:2119/jobmanager-lcgpbs-cert                1
    - ce.grid.unipg.it:2119/jobmanager-lcgpbs-cert                  1
    - ce01-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2   1
    - ce02-lhcb-t2.cr.cnaf.infn.it:2119/jobmanager-lcglsf-cert_t2   1
    - ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid      1
    - ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid      1
    - ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-infngrid      1
    - cex.grid.unipg.it:2119/jobmanager-lcgpbs-cert                 1
DataRequirements
DataRequirements = <list of ClassAds>
• Represents the data requirements for a job.
• Each ClassAd in the list contains three attributes:
  • InputData (the list of input data needed by the job)
  • DataCatalogType (the type of data catalog that has to be targeted to resolve logical names)
  • DataCatalog (the URI of the data catalog, if this is not the VO default one)
    DataRequirements = {
      [
        DataCatalogType = "...";
        DataCatalog = "https://...";
        InputData = {"lfn:…", "guid:…", "query:…"};
      ],
      …
    }
InputData
• InputData = <string | list of strings>
• Represents Logical File Names (LFN), Grid Unique IDentifiers (GUID), and/or generic queries.
• Used by the WMS to query the related Data Catalog and get back the list of Physical File Names (PFN) that the job needs as input.
• Listed names have to be prefixed with "lfn:", "guid:" or "query:" to indicate that they are LFNs, GUIDs or generic queries, respectively.
    InputData = {
      "lfn:/EO.test.file",
      "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70"
    };
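
The prefix convention for InputData entries is easy to validate before submission. The following is a small illustrative sketch; classify_input_data is a hypothetical helper, not part of the UI commands. It only checks the prefix and does not contact any catalog.

```python
def classify_input_data(entries):
    """Group InputData entries by prefix: lfn, guid or query.

    Raises ValueError for an entry with a missing or unknown prefix,
    or with nothing after the prefix.
    """
    kinds = {"lfn": [], "guid": [], "query": []}
    for entry in entries:
        prefix, sep, rest = entry.partition(":")
        if not sep or prefix not in kinds or not rest:
            raise ValueError(f"invalid InputData entry: {entry!r}")
        kinds[prefix].append(rest)
    return kinds


if __name__ == "__main__":
    grouped = classify_input_data([
        "lfn:/EO.test.file",
        "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70",
    ])
    for kind, names in grouped.items():
        print(kind, names)
```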
DataRequirements example
    [
      Type = "job";
      JobType = "normal";
      Executable = "Test.sh";
      Arguments = "lfn:/grid/eumed/alaa/satimage lfn:/grid/eumed/alaa/satimagejpg";
      InputSandbox = {"Test.sh"};
      StdOutput = "std.out";
      StdError = "std.err";
      OutputSandbox = {"std.out", "std.err"};
      DataRequirements = {
        [
          InputData = {"lfn:/grid/eumed/alaa/satimage"};
          DataCatalogType = "DLI";
          DataCatalog = "http://lfc.isabella.grnet.gr:8085";
        ]
      };
      Rank = 0;
    ]
OutputSE
• OutputSE = <string>
• Represents the URI of the Storage Element where the user wants to store the output data.
• Used by the WMS to find a CE that is "close" to this SE and to schedule the job there.
    OutputSE = "gridit-se-01.cnaf.infn.it";
OutputData
• Automatic upload and registration in the Replica Catalog of the datasets produced by the job on the WN
• For each output file it is possible to indicate the LFN to be used for registration and the SE to which the file has to be uploaded.
• OutputData is a list of ClassAds, each of which contains the following three attributes:
  • OutputFile
  • StorageElement
  • LogicalFileName
    OutputData = {
      [
        OutputFile = "dataset_2.out";
        StorageElement = "se001.cnaf.infn.it";
      ],
      [
        OutputFile = "cms/dataset_3.out";
        StorageElement = "se012.to.infn.it";
        LogicalFileName = "lfn:/cms/outfile1";
      ],
      [
        OutputFile = "dataset_4.out";
      ]
    };
OutputSE
    [cesini@lcg-ui DataReq]$ cat output-se.jdl
    #####################################
    #  JDL with OutputSE Requirements   #
    #####################################
    Executable = "test.sh";
    Arguments = "fileA fileB";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err"};
    OutputSE = "sunstorm.cnaf.infn.it";

    [cesini@lcg-ui DataReq]$ glite-wms-job-list-match -a output-se.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ==================================================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    - gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-cert
    ==================================================================
DataRequirements
    [cesini@lcg-ui DataReq]$ cat data-req.jdl
    ##################################
    #   JDL with Data Requirements   #
    ##################################
    Executable = "calc-pi.sh";
    # Arguments is the number of digits, must be < 1000000
    Arguments = "1000";
    StdOutput = "std.out";
    StdError = "std.err";
    Prologue = "prologue.sh";
    InputSandbox = {"calc-pi.sh", "fileA", "fileB"};
    OutputSandbox = {"std.out", "std.err", "out-PI.txt", "out-e.txt", "prologue.sh"};
    Requirements = other.GlueCEInfoHostName != "spacin-ce1.dma.unina.it";
    DataRequirements = {
      [
        DataCatalogType = "DLI";
        DataCatalog = "http://lfcserver.cnaf.infn.it:8085";
        InputData = {"lfn:/grid/infngrid/cesini/PI_1M.txt", "lfn:/grid/infngrid/cesini/e-2M.txt"};
      ]
    };
• lcg-lr shows the available replicas of a file:
    [cesini@lcg-ui DataReq]$ lcg-lr --vo infngrid lfn:/grid/infngrid/cesini/e-2M.txt

    [cesini@lcg-ui DataReq]$ glite-wms-job-list-match -a -c ../wms_rb00.conf data-req.jdl
    Connecting to the service https://glite-rb-00.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ====================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    - t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-cert
    - gridba2.ba.infn.it:2119/jobmanager-lcgpbs-cert
    - prod-ce-01.pd.infn.it:2119/jobmanager-lcglsf-cert
    ….
WMS Supported Job Types
• Batch-like
• DAG workflow (compound)
• Collection (compound)
• Parametric (compound)
• MPI
JobType and Type
• JobType = <string>
  • Normal: a simple job
  • Parametric: a series of jobs depending on a parameter
• Type = <string>
  • DAG: a Directed Acyclic Graph of jobs
  • Collection: a flat DAG
DAG Jobs
• Type = "dag";
• A DAG (directed acyclic graph) represents a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs.
• The jobs are the nodes of the graph and the edges identify the dependencies.
Nodes
• Nodes = <classad>
• It is the core of the DAG description and is used to specify the nodes and their dependencies.
    nodes = [
      a = [
        description = [
          JobType = "Normal";
          Executable = "a.exe";
          InputSandbox = {…};
        ];
      ];
      b = [
        file = "node_b.jdl";
      ];
      …
    ];
• For node "a", the classad contains a JDL which describes the node.
• max_nodes_running = <positive integer>
• Sets the maximum number of nodes that DAGMan can submit to CEs at a given time
DAG Dependencies
• dependencies = <list of string lists>
• Describes the dependencies between nodes; the strings are the node names
• Format: { { a, b }, { a, c }, { a, d }, { {a,b,c}, e } }
  • { a, b } means that "b" cannot start before "a" has completed its execution successfully
  • { {a,b,c}, e } means that node "e" depends on nodes "a", "b" and "c"
    dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, { nodeA, mynode }, { { nodeB, nodeC, mynode }, nodeD } };
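
A dependencies list defines a partial order on the nodes, and a valid execution order can be obtained with a topological sort. The sketch below shows this with Python's standard graphlib module (Python 3.9 or later). It is only an illustration of the ordering, not of how the WMS or DAGMan schedule nodes, and build_graph is a hypothetical helper. Each pair is written as (parents, child), where parents may be a single name or a list, mirroring the { a, b } and { {a,b,c}, e } forms.

```python
from graphlib import TopologicalSorter


def build_graph(dependencies):
    """Turn (parents, children) pairs into a node -> predecessors mapping."""
    graph = {}
    for parents, children in dependencies:
        parents = [parents] if isinstance(parents, str) else list(parents)
        children = [children] if isinstance(children, str) else list(children)
        for parent in parents:
            graph.setdefault(parent, set())
        for child in children:
            graph.setdefault(child, set()).update(parents)
    return graph


DEPENDENCIES = [
    ("nodeA", "nodeB"),
    ("nodeA", "nodeC"),
    ("nodeA", "mynode"),
    (["nodeB", "nodeC", "mynode"], "nodeD"),
]

if __name__ == "__main__":
    sorter = TopologicalSorter(build_graph(DEPENDENCIES))
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    print(" -> ".join(sorter.static_order()))
```

With these dependencies nodeA always comes first and nodeD last, while nodeB, nodeC and mynode may appear in any order between them.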
DAG example
    [cesini@lcg-ui dag]$ cat dag.jdl
    #################################
    #  Example of a simple DAG job  #
    #################################
    [
      Type = "dag";
      InputSandbox = {"test.sh"};
      nodes = [
        nodeA = [
          description = [
            JobType = "Normal";
            Executable = "test.sh";
            InputSandbox = {"isb_nodeA","test.sh"};
            Arguments = "isb_nodeA";
          ];
        ];
        mynode = [
          description = [
            JobType = "Normal";
            Executable = "test.sh";
            InputSandbox = {"isb_nodeMYNODE","test.sh"};
            Arguments = "isb_nodeMYNODE";
          ];
        ];
        nodeD = [
          description = [
            JobType = "Normal";
            Executable = "test.sh";
            Arguments = "isb_nodeD";
            InputSandbox = {"isb_nodeD","test.sh"};
          ];
        ];
        nodeC = [
          description = [
            JobType = "Normal";
            Executable = "test.sh";
            Arguments = "isb_nodeC";
            InputSandbox = {"isb_nodeC","test.sh"};
          ];
        ];
        nodeB = [
          description = [
            JobType = "Normal";
            Executable = "test.sh";
            Arguments = "isb_nodeB";
            InputSandbox = {"isb_nodeB",root.InputSandbox};
          ];
        ];
      ];
      dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, { nodeA, mynode }, { { nodeB, nodeC, mynode }, nodeD } };
    ];
DAG InputSandbox
• All nodes that do not contain the InputSandbox and/or the InputSandboxBaseURI attributes in their descriptions inherit the values of these attributes from those specified for the DAG.
• This gives a "shared sandbox", i.e. a sandbox that is common to multiple jobs (some of the nodes of the DAG) and that needs to be transferred to the WMS node only once.
    [
      Type = "dag";
      InputSandbox = {
        "/tmp/foo/*.exe",
        "/home/gliteuser/bar",
        "gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe",
        "file:///tmp/myconf"
      };
      InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";
      nodes = [
        nodeA = [
          description = [
            JobType = "Normal";
            Executable = "a.exe";
            InputSandbox = { "/home/data/myfile.txt", root.InputSandbox };
          ];
        ];
        nodeC = [
          file = "/home/test/c.jdl";
        ];
        mynode = [
          description = [
            JobType = "Normal";
            Executable = "b.exe";
            OutputSandbox = { "myoutput.txt", "myerror.txt" };
            OutputSandboxDestURI = "gsiftp://neo.datamat.it:5432/tmp";
          ];
        ];
        nodeD = [
          description = [
            JobType = "Normal";
            Executable = "b.exe";
            InputSandbox = { "file:///home/pippo", root.nodes.mynode.description.OutputSandbox[0] };
          ];
        ];
        nodeB = [
          file = "foo.jdl";
          node_retry_count = 2;
        ];
      ];
      dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, { nodeA, mynode }, { { nodeB, nodeC, mynode }, nodeD } };
    ];
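
The inheritance rule can be sketched as follows: a node without its own InputSandbox gets the DAG-level one, while a node that defines one gets exactly what it lists, with root.InputSandbox standing for the DAG-level list. This is an interpretation written for illustration, not the WMS code. The names ROOT_ISB and effective_input_sandbox are hypothetical, and the assumption that root.InputSandbox expands in place to the DAG's file list is mine.

```python
# Sentinel standing in for the JDL reference root.InputSandbox (assumption).
ROOT_ISB = object()


def effective_input_sandbox(dag_isb, node_isb=None):
    """Return the list of input files a node ends up with.

    dag_isb  -- InputSandbox defined at the DAG level
    node_isb -- InputSandbox of the node, or None if the node has none
    """
    if node_isb is None:
        return list(dag_isb)          # inherited as a whole
    files = []
    for item in node_isb:
        if item is ROOT_ISB:
            files.extend(dag_isb)     # explicit reference to the shared sandbox
        else:
            files.append(item)
    return files


if __name__ == "__main__":
    dag_isb = ["/home/gliteuser/bar", "file:///tmp/myconf"]
    print("no own sandbox:", effective_input_sandbox(dag_isb))
    print("own + shared  :", effective_input_sandbox(
        dag_isb, ["/home/data/myfile.txt", ROOT_ISB]))
    print("own only      :", effective_input_sandbox(
        dag_isb, ["file:///home/pippo"]))
```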