gLite job submission
Fokke Dijkstra
Donald Smits Centre for Information Technology, University of Groningen
Utrecht, Grid Tutorial 2008
Introduction
Components in the EGEE Grid
[Diagram labels: User Interface (UI), JDL, Workload Management System (WMS), Information System (BDII), LCG File Catalog (LFC), Storage Element (SE), Computing Element (CE)]
Workload Management System
• Tasks of the WMS
  • Find the best resource for your tasks (jobs)
  • Submit jobs to compute resources
  • Logging and bookkeeping
  • Delegated Grid credential management
Job preparation
• You need to provide
  • A complete (enough) job description
    • What program?
    • What data?
    • Any requirements on OS, installed software, etc.?
  • Possibly a program
    • You're submitting into unknown territory!
    • Program portably!
    • Don't rely on hard-coded paths or special locations
    • The program you send may not even be in $HOME!
  • Perhaps some input data
  • Perhaps instructions on what to do with the output
How to Write a Job Description
• Here is a minimal job description (call it hello.jdl):

    Executable = "/bin/echo";
    Arguments = "Goedemiddag";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    OutputSandbox = {"stderr.log", "stdout.log"};

• We specified
  • The program to run and its arguments
  • The files that receive the standard error and standard output streams
  • What to do with the output
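When many similar job descriptions are needed, it can help to generate the JDL text instead of typing it. The following is a minimal sketch in Python, not part of gLite: the helper names jdl_value and render_jdl are hypothetical, only string and list values are handled, and values are assumed not to contain double quotes.

```python
def jdl_value(value):
    """Format a Python value as a JDL expression (strings and lists only)."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    # Assumption: the value itself contains no double quotes.
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Return JDL text with one 'Attribute = expression;' per line."""
    return "".join(
        f"{name} = {jdl_value(value)};\n" for name, value in attributes.items()
    )


# The attributes of hello.jdl from the slide.
hello = {
    "Executable": "/bin/echo",
    "Arguments": "Goedemiddag",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
}

jdl_text = render_jdl(hello)
print(jdl_text)
```

Writing jdl_text to a file named hello.jdl gives a description equivalent to the one on the slide.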
Job Submission Example
• The user issues voms-proxy-init
  • Enters the certificate's pass phrase
  • Receives a valid Globus proxy
• The user issues
    glite-wms-job-submit -a mytest.jdl
  and gets back from the system a unique Job Identifier (JobId)
• The user issues
    glite-wms-job-status JobId
  to get logging information about the current status of the job
• When the "Done" status is reached, the user can issue
    glite-wms-job-output JobId
  and the system returns the name of the temporary directory on the UI machine where the job output can be found
Submitting it
    $ voms-proxy-init --voms tutor
    Cannot find file or dir: /admins/fokke/.glite/vomses
    Enter GRID pass phrase:
    Your identity: /O=dutchgrid/O=users/O=rug/OU=rc/CN=Fokke Dijkstra
    Creating temporary proxy ........................................... Done
    Contacting voms.grid.sara.nl:30007 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "tutor" Done
    Creating proxy ................................................ Done
    Your proxy is valid until Wed Nov 5 23:11:27 2008

    $ glite-wms-job-submit -a hello.jdl
    Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
    ====================== glite-wms-job-submit Success ======================
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
    ==========================================================================

(The URL printed after "Your job identifier is:" is the JobId.)
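Scripts that submit jobs usually need the JobId for the later status and output commands. A minimal sketch, assuming the submit output has the layout shown on this slide; the function name extract_job_id is hypothetical and the sample text is copied from the transcript above.

```python
import re


def extract_job_id(submit_output):
    """Return the URL that follows 'Your job identifier is:', or None."""
    match = re.search(r"Your job identifier is:\s*(https://\S+)", submit_output)
    return match.group(1) if match else None


# Sample output copied from the slide.
sample = """\
Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
==========================================================================
"""

job_id = extract_job_id(sample)
print(job_id)
```

Parsing the screen output is fragile if the layout changes, so treat this only as an illustration of where the JobId sits in the output.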
A Job Submission Example
[Diagram: components User Interface (UI), Workload Management System (WMS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC), Information System (IS); transfers labelled "JDL" and "Job + Input sandbox"; job status so far: submitted, waiting, ready, scheduled]
Checking the status
    $ glite-wms-job-status https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
    Current Status: Scheduled
    Status Reason: Job successfully submitted to Globus
    Destination: ce.grid.rug.nl:2119/jobmanager-pbs-long
    Submitted: Wed Nov 5 11:12:15 2008 CET
    *************************************************************
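To act on the status in a script (for example, to wait until a job is done), the "Key: value" lines of the status output can be collected into a dictionary. This is a sketch under the assumption that the output looks like the transcript on this slide; parse_status is a hypothetical helper, not a gLite tool.

```python
def parse_status(status_output):
    """Collect 'Key: value' lines of the status output into a dict."""
    fields = {}
    for line in status_output.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample output copied from the slide.
sample = """\
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: ce.grid.rug.nl:2119/jobmanager-pbs-long
Submitted: Wed Nov 5 11:12:15 2008 CET
*************************************************************
"""

status = parse_status(sample)
print(status["Current Status"], "at", status["Destination"])
```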
Check status using browser
A Job Submission Example
[Diagram: components User Interface (UI), Workload Management System (WMS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC), Information System (IS); transfers labelled "JDL" and "Output Sandbox"; job status so far: submitted, waiting, ready, scheduled, running, done]
Getting the Output
    $ glite-wms-job-output https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
    Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
    ==============================================================================
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
    have been successfully retrieved and stored in the directory:
    /tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ
    ===============================================================================

    $ cat /tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ
    Goedemiddag
A Job Submission Example
[Diagram: components User Interface (UI), Workload Management System (WMS), Computing Element (CE), Storage Element (SE), LCG File Catalog (LFC), Information System (IS); transfers labelled "JDL" and "Output Sandbox"; job status: submitted, waiting, ready, scheduled, running, done, cleared]
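The status labels in the diagram form a simple sequence for a job that runs successfully. The sketch below models only that sequence; failure states are not shown on the slide and are left out here. The names JOB_STATES, next_state and has_reached are hypothetical.

```python
# Status sequence of a successful job, in the order shown in the diagram.
JOB_STATES = [
    "submitted", "waiting", "ready", "scheduled", "running", "done", "cleared",
]


def next_state(state):
    """Return the status that follows `state`, or None after the last one."""
    index = JOB_STATES.index(state)
    return JOB_STATES[index + 1] if index + 1 < len(JOB_STATES) else None


def has_reached(current, target):
    """True if `current` is at or beyond `target` in the sequence."""
    return JOB_STATES.index(current) >= JOB_STATES.index(target)


print(next_state("scheduled"))
print(has_reached("running", "done"))
```

A script would typically fetch the output once has_reached(status, "done") is true; after the output has been retrieved the job is shown as cleared.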
Job Description Language
• The Job Description Language (JDL) is based on the Classified Advertisement (ClassAd) language
• Lines:
  • Attribute = expression;
  • An expression can span multiple lines; the semicolon is the separator
  • Use double quotes (") for strings
  • # and // start comments
  • No blanks after the semicolon!
Types of Attributes
• The supported attributes are grouped in two categories:
  • Job attributes
    • Define the job itself
  • Resource attributes
    • Taken into account by the WMS when carrying out the matchmaking algorithm
    • Computing resource attributes
      • Used by the user to build the expressions of the Requirements and/or Rank attributes
      • Have to be prefixed with "other."
    • Data and storage resource attributes
      • Input data to process, the SE where output data is to be stored, and the protocols spoken by the application when accessing SEs
Job Definition Attributes
• Executable (mandatory)
  • The command name
• Arguments (optional)
  • Job command line arguments
• StdInput, StdOutput, StdError (optional)
  • Standard input/output/error of the job
• Environment (optional)
  • List of environment settings
• InputSandbox (optional)
  • List of files on the UI local disk needed by the job for running
  • The listed files are staged from the UI to the remote CE
  • Wildcards allowed
  • Unique filenames required
• OutputSandbox (optional)
  • List of files, generated by the job, which have to be retrieved
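The "unique filenames required" rule for the InputSandbox is easy to check before submitting. The sketch below assumes the rule refers to the file name without its directory, since files from different directories on the UI would otherwise collide after staging; that reading is an assumption, and duplicate_sandbox_names is a hypothetical helper.

```python
import posixpath
from collections import Counter


def duplicate_sandbox_names(paths):
    """Return the file names that occur more than once in an InputSandbox list."""
    counts = Counter(posixpath.basename(path) for path in paths)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical sandbox: two different files that share the name gridTest.
sandbox = [
    "/home/joda/test/gridTest",
    "/home/joda/other/gridTest",
    "input.dat",
]
print(duplicate_sandbox_names(sandbox))
```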
Resource Attributes
• Requirements
  • Job requirements on computing resources
  • Specified using attributes of resources published in the Information System
  • other.GlueCEStateStatus == "Production" is always included (the resource has to be in the Production grid)
• Useful requirements:
  • Wallclock time and specific sites:
      Requirements = other.GlueCEPolicyMaxWallClockTime > 720 && RegExp("nikhef.nl", other.GlueCEUniqueID);
  • Specific tag published:
      Requirements = Member("VO-ncf-gromacs-3.3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
• Logical expressions:
  • &&: and
  • ||: or
  • !: not
Data Attributes
• InputData (optional)
  • Refers to data used as input by the job (these data are published in the Replica Catalog and stored in the SEs)
  • GUIDs and/or LFNs
  • The job must be sent to a CE that has the data nearby
• DataAccessProtocol (mandatory if InputData is specified)
  • The protocol, or the list of protocols, that the application is able to speak for accessing InputData on a given SE
WMS matchmaking and ranking
• The WMS has to find the most suitable CE on which the job will be executed
  • It interacts with the Data Management service and the Information System
• The CE chosen has to match the job requirements
• If two or more CEs satisfy all the requirements, the one with the best Rank is chosen
  • Rank is specified using attributes of resources published in the Information System
  • If Rank is not specified, a default is used that prefers the quickest response time:
      Rank = -other.GlueCEStateEstimatedResponseTime;
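The two steps, filter on Requirements and then pick the best Rank, can be illustrated with a small simulation. This is a sketch of the idea only, not the WMS implementation: the CE records are invented for illustration, the requirement is the wallclock and site example from the Resource Attributes slide, and the rank is the default shown above.

```python
import re


def meets_requirements(ce):
    """Production status plus the wallclock/site example requirement."""
    return (
        ce["GlueCEStateStatus"] == "Production"
        and ce["GlueCEPolicyMaxWallClockTime"] > 720
        and re.search("nikhef.nl", ce["GlueCEUniqueID"]) is not None
    )


def rank(ce):
    """Default rank: the shorter the estimated response time, the higher."""
    return -ce["GlueCEStateEstimatedResponseTime"]


def choose_ce(ces):
    """Return the matching CE with the highest rank, or None if none match."""
    candidates = [ce for ce in ces if meets_requirements(ce)]
    return max(candidates, key=rank) if candidates else None


# Hypothetical CEs; names and numbers are made up for this example.
ces = [
    {"GlueCEUniqueID": "ce-a.nikhef.nl:2119/jobmanager-pbs-long",
     "GlueCEStateStatus": "Production",
     "GlueCEPolicyMaxWallClockTime": 2160,
     "GlueCEStateEstimatedResponseTime": 900},
    {"GlueCEUniqueID": "ce-b.nikhef.nl:2119/jobmanager-pbs-long",
     "GlueCEStateStatus": "Production",
     "GlueCEPolicyMaxWallClockTime": 1440,
     "GlueCEStateEstimatedResponseTime": 120},
    {"GlueCEUniqueID": "ce-c.nikhef.nl:2119/jobmanager-pbs-short",
     "GlueCEStateStatus": "Production",
     "GlueCEPolicyMaxWallClockTime": 240,      # too short: filtered out
     "GlueCEStateEstimatedResponseTime": 10},
    {"GlueCEUniqueID": "ce-d.example.org:2119/jobmanager-pbs-long",
     "GlueCEStateStatus": "Production",
     "GlueCEPolicyMaxWallClockTime": 2160,
     "GlueCEStateEstimatedResponseTime": 5},   # wrong site: filtered out
]

chosen = choose_ce(ces)
print(chosen["GlueCEUniqueID"])
```

Note that the CE with the shortest response time overall is not chosen, because requirements are applied before ranking.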
Example JDL File

    Executable = "gridTest";
    StdError = "stderr.log";
    StdOutput = "stdout.log";
    InputSandbox = {"/home/joda/test/gridTest"};
    OutputSandbox = {"stderr.log", "stdout.log"};
    InputData = "lfn:/grid/tutor/testbed0-00019";
    DataAccessProtocol = "gridftp";
    Requirements = other.Architecture == "INTEL" &&
                   other.OpSys == "CentOS" && other.FreeCpus >= 4;
    Rank = other.GlueHostBenchmarkSF00;
Job Submission
• glite-wms-job-submit [-a] [-d <delegationid>] [-o <output file>] <job.jdl>
    -a  use automatic delegation
    -d  use an existing delegated proxy at the WMS, e.g. one generated using:
          glite-wms-job-delegate-proxy -d <delegationid>
    -o  the generated JobId is written to <output file>
• The output file is useful for other commands, e.g.:
    glite-wms-job-status -i <input file> (or JobId)
    -i  the status information about the JobIds contained in <input file> is displayed
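When the submit command is called from a script, the options above map naturally onto function parameters. The sketch below only builds the argument list and uses no options beyond those listed on this slide; submit_command is a hypothetical helper, and "mydelegation" and "jobids.txt" are made-up values.

```python
def submit_command(jdl_path, delegation_id=None, output_file=None):
    """Build the argument list for glite-wms-job-submit.

    Without a delegation id, automatic delegation (-a) is used.
    """
    command = ["glite-wms-job-submit"]
    if delegation_id is None:
        command.append("-a")
    else:
        command += ["-d", delegation_id]
    if output_file is not None:
        command += ["-o", output_file]
    command.append(jdl_path)
    return command


print(" ".join(submit_command("hello.jdl")))
print(" ".join(submit_command("hello.jdl", "mydelegation", "jobids.txt")))
```

On a UI machine the list could be passed to subprocess.run; the file written by -o can then be given to glite-wms-job-status -i.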
Other WMS UI Commands
• glite-wms-job-list-match
  • Lists resources matching a job description
  • Performs the matchmaking without submitting the job
• glite-wms-job-cancel
  • Cancels a given job
• glite-wms-job-status
  • Displays the status of the job
• glite-wms-job-output
  • Returns the job output (the OutputSandbox files) to the user
• glite-wms-job-logging-info
  • Displays logging information about submitted jobs (all the events "pushed" by the various components of the WMS)
  • Very useful for debugging purposes
Proxy Renewal
• Why?
  • To avoid job failure because the job outlived the validity of the initial proxy
  • To prevent long-term proxies from lying around
    • Use a safe system for storing long-term proxies
• The WMS supports an automatic proxy renewal mechanism as long as the user credentials are handled by a proxy server
  • Create a proxy using
      voms-proxy-init --voms <voname>
  • Register this proxy with the MyProxy server using
      myproxy-init -s <server> [-t <cred> -c <proxy>] -d -n
    • server is the server address (e.g. px.matrix.sara.nl)
    • cred is the number of hours the proxy should be valid on the server
    • proxy is the number of hours renewed proxies should be valid
• The proxy is renewed automatically by the WMS, without user intervention, for the whole lifetime of the job
Advanced Job types: Job Collection
• Set of independent jobs
• Collect the jobs in a single directory:
    glite-wms-job-submit -a --collection <directory>
• Advanced collection using a global set of attributes:

    [
      Type = "Collection";
      InputSandbox = {"myjob.exe", "fileA"};
      OutputSandboxBaseDestURI = "gsiftp://lxb0707.cern.ch/data/doe";
      DefaultNodeShallowRetryCount = 5;
      Nodes = {
        [
          Executable = "myjob.exe";
          InputSandbox = {root.InputSandbox, "fileB"};
          OutputSandbox = {"myoutput1.txt"};
          Requirements = other.GlueCEPolicyMaxWallClockTime > 1440;
        ],
        [
          NodeName = "mysubjob";
          Executable = "myjob.exe";
          OutputSandbox = {"myoutput2.txt"};
          ShallowRetryCount = 3;
        ],
        [
          File = "/home/doe/test.jdl";
        ]
      }
    ]
Advanced Job types: Parametric
• Identical jobs, except for the value of a parameter
• Parameters is either
  • A list of items, or
  • A number, in which case ParameterStart and ParameterStep are necessary
• _PARAM_ is replaced by the parameter value

    [
      JobType = "Parametric";
      Executable = "myjob.exe";
      StdInput = "input_PARAM_.txt";
      StdOutput = "output_PARAM_.txt";
      StdError = "error_PARAM_.txt";
      Parameters = 100;
      ParameterStart = 1;
      ParameterStep = 1;
      InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
      OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
    ]
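The _PARAM_ substitution can be pictured as a plain text replacement applied to every attribute, once per parameter value. The sketch below imitates this for an explicit list of values; it is not the WMS code, expand_parametric is a hypothetical name, and which values the WMS derives from a numeric Parameters, ParameterStart and ParameterStep is deliberately left out (check the JDL specification for the exact range).

```python
def expand_parametric(template, values):
    """Return one attribute dict per value, with _PARAM_ replaced by the value."""

    def substitute(item, value):
        if isinstance(item, list):
            return [substitute(element, value) for element in item]
        return item.replace("_PARAM_", str(value))

    return [
        {name: substitute(item, value) for name, item in template.items()}
        for value in values
    ]


# String and list attributes of the parametric job on the slide.
template = {
    "Executable": "myjob.exe",
    "StdInput": "input_PARAM_.txt",
    "StdOutput": "output_PARAM_.txt",
    "StdError": "error_PARAM_.txt",
    "InputSandbox": ["myjob.exe", "input_PARAM_.txt"],
    "OutputSandbox": ["output_PARAM_.txt", "error_PARAM_.txt"],
}

jobs = expand_parametric(template, [1, 2, 3])
for job in jobs:
    print(job["StdInput"], "->", job["StdOutput"])
```

Each generated job therefore needs its own input file (input1.txt, input2.txt, ...) on the UI before submission.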
Advanced job types: MPI
• For programs using the MPI parallel library
• JobType = "MPICH"
• NodeNumber = <n>
  • Requests n cores on the remote cluster
  • Scheduling is determined at the remote site
• Submit a script that starts up your program using MPI
  • You can use mpi-start for this
• Scheduling SMP nodes is not yet possible
Other Advanced Job types
• Directed Acyclic Graph (DAG)
  • The graph shows dependencies between jobs
  [Diagram: example graph with nodes nodeA, nodeB, nodeC, mynode, nodeD]
• Interactive
  • Opens a graphical window that connects to the job
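In a DAG job, a node can only start after all nodes it depends on have finished. The sketch below derives a valid execution order with Python's standard graphlib module (Python 3.9 or later). The edges between the nodes are hypothetical, since the slide's diagram did not survive in the transcript; only some of the node names are reused.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key lists the nodes it has to wait for.
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With these edges nodeB and nodeC do not depend on each other, so they could run in parallel once nodeA has finished.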
Pilot jobs
• Send an agent to a site first
  • It will fetch the workload from a central service
• Both single-user and multi-user frameworks exist
• Advantages
  • Hides problematic sites from the user
  • Probably less overhead per workload
  • Makes central scheduling possible
• Disadvantages
  • Less efficient scheduling at the site
  • Security concerns with multiple users
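The core of a pilot agent is a loop that keeps asking the central service for work until nothing is left. The following sketch shows that loop with an in-memory queue standing in for the central service; it does not correspond to any particular pilot framework, and run_pilot, fetch and the task names are hypothetical.

```python
from collections import deque


def run_pilot(fetch_workload, execute):
    """Fetch and execute workloads until the central service has none left."""
    results = []
    while True:
        task = fetch_workload()
        if task is None:  # nothing left to do: the pilot exits
            break
        results.append(execute(task))
    return results


# Stand-in for the central service: a simple in-memory queue.
central_queue = deque(["task-1", "task-2", "task-3"])


def fetch():
    """Hand out the next task, or None when the queue is empty."""
    return central_queue.popleft() if central_queue else None


results = run_pilot(fetch, lambda task: task.upper())
print(results)
```

Because one pilot runs several workloads in a row, the cost of getting a job slot at the site is paid once rather than per workload, which is the reduced overhead mentioned above.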
How to get your program on the Grid?
• Send it with the job
  • Binary package
  • Compile it on the fly
  • Advantages
    • You will be able to run anywhere
  • Disadvantages
    • Portability extremely important, otherwise jobs will fail
    • Overhead
• Use software manager accounts
  • Write permission on a special shared directory
  • Publish tags in the information system
  • Advantages
    • Software can be validated
    • Central management takes the burden from users
  • Disadvantages
    • Sites have to support this
    • Lot of work for the software manager
• Use preinstalled packages
  • Only possible for special collaborations
  • Example: VL-e software stack
  • Advantages
    • Very easy for users
  • Disadvantages
    • Sites have to support it
    • Lot of work for the software packager
Pointers to advanced topics
• Can be found in the gLite user guide: http://glite.web.cern.ch/glite/documentation/
• Advanced sandbox play
  • GridFTP instead of local files
  • No space on the WMS needed
• Brokerinfo
  • Information about the local environment (CE, SEs, etc.)
• Job perusal
  • Peek at the output while your job is running
• Automatic retries
  • RetryCount
  • ShallowRetryCount