Submitting and Monitoring jobs End User Training
Objectives
• LSF and IDN Grid environment
• Job cycle and Job states
• Submitting Jobs
• Monitoring Jobs
• Controlling Jobs
• Queues on IDN Grid
• Grid Enabled Display Manager Session
• Links to Documentation
Submitting to LSF
What is LSF?
• Load Sharing Facility
• Distributes work across existing heterogeneous IT resources
• Creates a shared, scalable, and fault-tolerant infrastructure
• Delivers more consistent, reliable workload performance while reducing cost
• Developed and maintained by Platform Computing (since acquired by IBM)
• Provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress
• Jobs always run according to host load and site policies
• With some effort (coding or a change in settings), most jobs can be submitted to the Grid
IDN Grid Cluster
LSF Terms 1/2
• Cluster
  • Group of computers (hosts) running LSF that work together as a single unit, combining computing power, workload, and resources
  • Provides a single-system image for a network of computing resources
  • IDN Cluster Name: IDN_GRID
  • Note: Do not confuse this with IMGRID, which is the login load balancer
• Hosts
  • Master host: An LSF server host that acts as the overall coordinator for the cluster, doing all job scheduling and dispatch.
  • Server host: A host that submits and executes jobs.
  • Client host: A host that only submits jobs and tasks.
  • Execution host: A host that executes jobs and tasks.
  • Submission host: A host from which jobs and tasks are submitted.

% bhosts
HOST_NAME           STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
sppma562.ipc.us.ae  ok      -     48   24     24   0      0      0
sppma563.ipc.us.ae  ok      -     96   22     22   0      0      0
sppma568.ipc.us.ae  ok      -     96   22     22   0      0      0
sppma569.ipc.us.ae  ok      -     96   27     27   0      0      0
sppma570.ipc.us.ae  ok      -     96   12     12   0      0      0
LSF Terms 2/2
• Job
  • Unit of work run in the LSF system
  • Jobs can be complex problems, simulation scenarios, extensive calculations, or anything that needs compute power
  • Command submitted to LSF for execution. LSF schedules, controls, and tracks the job according to configured policies.
• Job Slot
  • A job slot is a bucket into which a single unit of work is assigned in the LSF system
• Queue
  • Cluster-wide container for jobs. Jobs wait in queues until they are scheduled and dispatched to hosts.
  • Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts.
  • No need to specify an execution host. LSF dispatches the job to the best available execution host in the cluster to run that job.
  • Queues implement different job scheduling and control policies
  • Queue names in the IDN Grid: normal, short, express, sasplus, legacy (refer to slide 18 for additional details about queues)
• Resources
  • Objects in your cluster available to run work, for example machines, CPU slots, and licenses
IDN Grid Environment Architecture
Active LSF version on IDN Grid: LSF 7.0.5 (version 7 update 5)
Jobs on LSF
Job Life Cycle 1/2
• Submit job
  • From an LSF client or server with the bsub command
  • Submitted to the default queue “normal” if no queue is specified
  • Jobs held in a queue waiting to be scheduled have the PEND state
  • LSF assigns each job a unique job ID
  • You can also assign a name to the job with the -J option of bsub
  • Unlike the job ID, the job name is not necessarily unique
• Schedule job
  • LSF collects resource information and makes scheduling decisions
• Dispatch job
  • Jobs are dispatched to hosts
• Run job
  • Environment copied from submission host to execution host
    • Environment variables needed by the job
    • Working directory where the job begins running
    • Other system-dependent environment settings, for example on UNIX, resource limits and umask
  • Job runs under the user account that submitted the job
  • Job has the status RUN
Job Life Cycle 2/2
• Return output
  • DONE status if the job completed without any problems
  • EXIT status if errors prevented the job from completing
• Send email to client
  • Job output, job error, and job information are returned to the submission host through email
  • Use the -o and -e options of bsub to send job output and errors to a file
  • Note:
    • By default, email is sent to UNIXID@aexp.com
    • Explicitly specify EmailID@aexp.com to send output to your Outlook Inbox
    • Create a .forward file in your home folder on UNIX that contains your email ID, i.e. EmailID@aexp.com
• Job report is sent by email to the LSF client. It includes job information such as:
  • CPU use
  • Memory use
  • Name of the account that submitted the job
  • Job output
  • Errors
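The -J, -o and -e options mentioned on the job life cycle slides can be combined on one command line. The sketch below only builds and prints such a command line rather than running it, so it can be tried on any host. The helper name build_bsub_cmd and the file naming pattern are illustrative assumptions, not an IDN convention; %J is the placeholder that LSF replaces with the job ID in output file names.

```shell
#!/bin/sh
# Build (but do not run) a bsub command line that names the job and sends
# stdout and stderr to files. build_bsub_cmd is a hypothetical helper.
# $1 = job name, remaining arguments = the command to submit.
build_bsub_cmd() {
    job_name=$1
    shift
    # %%J prints a literal %J, which LSF expands to the job ID.
    printf 'bsub -J %s -o %s.%%J.out -e %s.%%J.err %s\n' \
        "$job_name" "$job_name" "$job_name" "$*"
}

# Print the command line for a SAS program submitted through the sas92 wrapper.
build_bsub_cmd nightly /idn/app/bin/sas92 p.sas
```

Once the printed line looks right, it can be pasted at the prompt or run with eval on a host where the LSF environment is set up.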
Job States
• PEND: Waiting in a queue for scheduling and dispatch
• RUN: Dispatched to a host and running
• DONE: Finished normally with zero exit value
• EXIT: Finished with non-zero exit value
• PSUSP: Suspended while pending
• USUSP: Suspended by user
• SSUSP: Suspended by the LSF system
• POST_DONE: Post-processing completed without errors
• POST_ERR: Post-processing completed with errors
• WAIT: Members of a chunk job that are waiting to run
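When a script reacts to job states, it usually helps to reduce them to a few categories. The following is a minimal sketch of that idea; the function name state_category and the five categories are this example's own grouping, not LSF terminology. It uses EXIT, the spelling LSF reports for a job that finished with a non-zero exit value.

```shell
#!/bin/sh
# Map an LSF job state to a coarse category.
# The grouping below is an illustrative choice, not part of LSF.
state_category() {
    case "$1" in
        PEND|WAIT)          echo waiting ;;
        RUN)                echo running ;;
        PSUSP|USUSP|SSUSP)  echo suspended ;;
        DONE|POST_DONE)     echo succeeded ;;
        EXIT|POST_ERR)      echo failed ;;
        *)                  echo unknown ;;
    esac
}

state_category RUN
state_category EXIT
```

A monitoring loop could, for instance, keep polling while the category is waiting, running or suspended, and stop on succeeded or failed.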
Methods of Submitting to IDN Grid
• Using the LSF bsub command (preferred method)
    bsub /idn/app/bin/sas92 programname.sas
  • Note: When used by itself, the /idn/app/bin/sas92 wrapper script executes code locally on the server you are logged onto, but when used in conjunction with bsub, the job gets submitted to the Grid
• Using the wrapper script that has the SAS grid-enabling statements bundled
    /idn/app/bin/sas programname.sas &
• Putting the SAS grid-enabling statements in your code yourself
Access and Environment setup
• Access
  • UNIX access to IMGRID is required
  • IMGRID credentials will be authenticated by LSF when submitting or scheduling jobs
• Environment setup
  • If you are using the ksh or bash shell, add the following commands to your .profile
      . /idn/sas/LSF/conf/profile.lsf
      . /idn/sas/JS/conf/profile.js
    Note: There is a space between the “.” and the path.
  • If you are using the csh shell, add the following commands to your .cshrc
      source /idn/sas/LSF/conf/cshrc.lsf
      source /idn/sas/JS/conf/cshrc.js
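For ksh or bash users, the two profile lines can be guarded so that a login on a host where the files are missing does not abort or print shell errors. This is a sketch only; the helper name source_if_readable is an assumption, and the paths are the ones given on the environment setup slide.

```shell
# Fragment for ~/.profile (ksh or bash).
# source_if_readable is a hypothetical helper: it sources a file only if it
# is readable and returns non-zero otherwise.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    else
        return 1
    fi
}

for lsf_profile in /idn/sas/LSF/conf/profile.lsf /idn/sas/JS/conf/profile.js; do
    source_if_readable "$lsf_profile" ||
        echo "warning: $lsf_profile not found; LSF commands may be unavailable" >&2
done
```

On a correctly configured host the loop is equivalent to the two plain "." lines; elsewhere it only prints a warning.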
Submitting jobs
Command: bsub 1/2
bsub is the command to submit a job in LSF

% bsub /idn/app/bin/sas92 p.sas
Job <114612> is submitted to default queue <normal>.
% bsub uname -a
Job <114619> is submitted to default queue <normal>.

• Job is submitted to a queue
  • If no queue is specified, the job is submitted to the default queue
  • The default queue name is normal on the IDN Grid
• Job is dispatched to a server
  • LSF chooses the best available server for you
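Scripts often need the job ID that bsub prints, for example to pass it to bjobs later. One way to capture it, assuming the confirmation message has the form shown on the bsub slide, is to filter that line with sed. The function name extract_job_id is hypothetical.

```shell
#!/bin/sh
# Read bsub's confirmation message on stdin and print only the job ID.
# Assumes the message format "Job <ID> is submitted to ...".
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Demonstration on the sample message from the slide (no LSF needed):
echo 'Job <114612> is submitted to default queue <normal>.' | extract_job_id

# On the Grid this could be used as:
#   job_id=$(bsub /idn/app/bin/sas92 p.sas | extract_job_id)
```

If bsub prints anything other than the expected message, the function prints nothing, so an empty job_id is a sign that the submission should be checked.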
Submitting jobs
Command: bsub 2/2
• Commonly used bsub options
Wrapper scripts and bsub in the IDN environment
• Do not use the /idn/app/bin/sas wrapper script with the bsub command.
  • This results in 2 jobs being submitted to the grid: (1) the bsub command itself and (2) the sas script, which in turn uses the grid-enabling code to submit the job. The sas wrapper script will be decommissioned when the new wrapper script is ready for use.
• Use the /idn/app/bin/sas92 wrapper script with the bsub command instead.
  • The sas92 wrapper, when used in conjunction with bsub, submits the job to the Grid.
• The nohup or & (ampersand) options are not necessary on the bsub command line
  • Unless you use these options for specific reasons
  • LSF returns the command prompt almost instantly after the command is submitted
• The -b option can be used for one-time scheduling only (for example, at 7am on 15 FEB)
  • NOT for periodic scheduling (every other day at 7am, etc.)
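For the one-time scheduling case, the -b option of bsub takes a begin time of the form [[month:]day:]hour:minute. The sketch below formats such a value for the slide's example of 7am on 15 February; the helper name begin_time is hypothetical, and the accepted format should be confirmed against the bsub documentation for the LSF version in use.

```shell
#!/bin/sh
# Format a begin time for bsub -b as month:day:hour:minute.
# begin_time is a hypothetical helper. Pass plain numbers without leading
# zeros (7, not 07), because printf %d treats a leading zero as octal.
# $1 = month, $2 = day, $3 = hour (0-23), $4 = minute
begin_time() {
    printf '%d:%d:%d:%02d\n' "$1" "$2" "$3" "$4"
}

# 7:00 on 15 February
begin_time 2 15 7 0

# On the Grid this could be used as:
#   bsub -b "$(begin_time 2 15 7 0)" /idn/app/bin/sas92 p.sas
```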
Monitoring jobs
Commands: bjobs, bhist, bpeek
• bjobs
  • Displays the current status of one or more jobs.
  • If used without any options, displays all of your own pending, running or suspended jobs.

% bjobs
JOBID   USER    STAT  QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME    SUBMIT_TIME
111763  userid  DONE  normal  sppma562.ip  sppma563.ip  *Grid:9887  Jun 12 14:15

  • Useful options for the bjobs command include:
      job_ID     Information about a particular job ID
      -a         Displays information about all of your jobs, including those which finished recently
      -l         Long format; multiple lines giving additional information
      -u userid  Information about a particular user, identified by userid; "-u all" displays information on all jobs in the system
      -w         Wide format; doesn't truncate host names
• bhist
  • Displays historical information about jobs.
      -a  Displays information about both finished and unfinished jobs
      -l  Most commonly used. Displays information in long format
      -d  Only displays information about finished jobs
      -e  Only displays information about exited jobs
• bpeek
  • Displays the stdout and stderr of a job while it is running
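To check one job from a script, the STAT column can be picked out of the default bjobs listing. The sketch below does this with awk, assuming the column layout shown in the bjobs sample (job ID in column 1, state in column 3); the function name job_status is hypothetical. It is demonstrated on the sample output so it runs without LSF.

```shell
#!/bin/sh
# Print the state of one job from bjobs-style output read on stdin.
# $1 = job ID. Assumes JOBID is column 1 and STAT is column 3.
job_status() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $3 }'
}

# Sample listing copied from the slide:
sample='JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
111763 userid DONE normal sppma562.ip sppma563.ip *Grid:9887 Jun 12 14:15'

printf '%s\n' "$sample" | job_status 111763

# On the Grid this could be used as:
#   bjobs -a 111763 | job_status 111763
```

The -a option is used in the last line because a finished job no longer appears in the default bjobs listing.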
Controlling jobs
Commands: bkill, bstop, bresume
• bkill
  • Kills a running, pending or suspended job
  • Causes LSF to send SIGINT and SIGTERM to a job to give it a chance to clean up, then LSF sends SIGKILL to kill the job
  • You can only kill your own jobs
• bstop
  • Suspends a job by sending it the SIGSTOP signal
• bresume
  • Resumes a suspended job by sending it the SIGCONT signal
Queue Attributes
Command: bqueues
• Name, which uniquely identifies the queue
• A list of the hosts which can run jobs from the queue
• Limits on the number of hosts, jobs, users, processors, etc.
• Standard UNIX limits such as memory, CPU, processes, etc.
• Scheduling policy used, such as first-come-first-served, fairshare, and exclusive
• Load-sharing thresholds
• UNIX nice value, which determines the relative priority of the processes in the job
• Command to list defined queues
    bqueues
Queues in IDN environment
• Priority: Specifies the relative queue priority for dispatching jobs. A higher value indicates a higher job dispatching priority, relative to other queues.
• CPU time: Limits the total CPU time the job can use.
Grid enabled Display Manager Session
Instructions for Grid enabling the SAS Display Manager (interactive SAS) Session
• Get the IP address of the client (PC)
  • Get the IP address for your laptop
    • Start, Run “cmd”, type “ipconfig” in the command window
    • Write down the IP address for your laptop
    • This changes frequently
• Set the DISPLAY variable in UNIX
  • csh users type at the shell prompt
      setenv DISPLAY <IP>:0.0
  • All others type
      export DISPLAY=<IP>:0.0
  • Replace <IP> with the IP for your laptop
  • This must be done for each session (window)
• Submit to LSF
  • To start a DM session using the normal queue
      bsub /idn/app/bin/sas92
  • To start a DM session using a specific queue (legacy queue for example)
      bsub -q legacy /idn/app/bin/sas92
Common LSF commands
Documentation and Help
• Location of documentation on the IDN Wiki
  • IDN Portal Documentation SAS Resources LSF Usage Resources LSF Documentation
  • Link: https://idnoae.idn.aexp.com/idnwiki/Wiki.jsp?page=Public__LSFDocumentation
  • FAQs
• Documents Available
  • IDN document: Submitting Jobs to LSF.docx
  • LSF Users Guide from Platform Computing: lsf_users_guide.pdf
• For questions, issues, concerns
  • Open a ticket to the IDN Service Desk: https://idnservicedesk.idn.aexp.com/Scripts/Texcel/ServiceWise/CLogin.dll
Thank you!
• Questions?