Scheduling Jobs on IDN Grid End User Training
Objectives
• LSF and Platform Process Manager
• Terms
• Calendars
• Define a Flow
• Exceptions
• Run a Flow
• Control a Flow
• Command Line Quick Reference
• In the IDN Grid Environment
• Documentation
LSF Scheduling
What is LSF?
• Load Sharing Facility
• Distributes work across existing heterogeneous IT resources
• Creates a shared, scalable, and fault-tolerant infrastructure
• Delivers more consistent, reliable workload performance while reducing cost
• Developed and maintained by Platform Computing (since acquired by IBM)
• Provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress
• Jobs always run according to host load and site policies
• With some effort (coding or a change in settings), most jobs can be submitted to the Grid
(Diagram: IDN Grid Cluster)
What is Platform Process Manager?
• A workload management tool that allows automation of business processes in UNIX and Windows environments
• Provides flexible scheduling capabilities
• Provides load balancing in an extensible, robust execution environment
• Comprises three client applications (on Windows and UNIX)
  • Process Manager Designer: Flow Editor
  • Calendar Editor
  • Flow Manager
• A server application
  • Process Manager Server on UNIX
  • Scheduling interface between the client applications and the execution agent, Process Manager
Installing Process Manager Client Software on Windows
• Installation
  • IDN Portal Documentation SAS Resources LSF Documentation and Client Software Process Manager Client Software
  • Follow the instructions in Platform Process Manager Client Installation and Verification.docx
• Access
  • UNIX access to IMGRID is required
  • IMGRID credentials are authenticated by LSF when you submit or schedule jobs
• Environment setup
  • If you use the ksh or bash shell, add the following commands to your .profile:
    . /idn/sas/LSF/conf/profile.lsf
    . /idn/sas/JS/conf/profile.js
    Note: There is a space between "." and the path that follows it.
  • If you use the csh shell, add the following commands to your .cshrc:
    source /idn/sas/LSF/conf/cshrc.lsf
    source /idn/sas/JS/conf/cshrc.js
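The ksh/bash setup above can be made tolerant of hosts where the grid file system is not mounted. The following is a minimal sketch, not the official setup: the helper name source_if_readable is hypothetical, and only the two profile paths come from the slide.

```shell
#!/bin/sh
# Source a profile script only when it is readable; report and return 1 otherwise.
# source_if_readable is a hypothetical helper name, not part of LSF.
source_if_readable() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "profile not readable, skipped: $1" >&2
        return 1
    fi
}

# Paths as given in the training material.
for profile in /idn/sas/LSF/conf/profile.lsf /idn/sas/JS/conf/profile.js; do
    source_if_readable "$profile" || true
done
```

Placed in .profile, this keeps a login from failing noisily on a machine that cannot see /idn/sas, while still loading both environments where they exist.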
Process Manager Terms 1/4
• Jobs
  • A program or command that is scheduled to run in a specific environment
  • Has attributes specifying its scheduling and execution requirements
  • Example: extracting data from IQAX; checking whether data in an IQAX table has been refreshed
• Dependencies
  • The order in which something happens within a flow
  • Shown as a line with an arrow
  • Indicates a dependency between flow elements
• Job dependencies
  • A dependency that a job (or job array or subflow) has on the completion of a predecessor job
  • A dependency can control a job's execution upon the completion, failure, or startup of other jobs
  • Example: execution of a SAS job that depends on the availability of a file on the mainframe (defined via an event trigger)
• Job arrays
  • A group of homogeneous jobs that share the same executable and resource requirements, but have different input files
  • Helps with submitting, controlling, and monitoring all of the jobs in the array as a single unit
  • Each job submitted from a job array shares the same job ID
  • Example: load data for three markets (Asia, Europe, and US) from input files mkt1, mkt2, mkt3
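To make the job array example concrete, here is a small sketch of what each array element might run. LSF sets the LSB_JOBINDEX environment variable for every element of a job array; the function name input_for_element and the fallback to index 1 outside LSF are assumptions made for illustration.

```shell
#!/bin/sh
# Print the input file name for one job array element.
# The mkt prefix follows the slide's mkt1, mkt2, mkt3 example.
input_for_element() (
    index="${1:?array index required}"
    printf 'mkt%s\n' "$index"
)

# LSB_JOBINDEX is set by LSF inside a job array; default to 1 for local testing (assumption).
input_file=$(input_for_element "${LSB_JOBINDEX:-1}")
echo "element ${LSB_JOBINDEX:-1} would load data from ${input_file}"
```

All three elements run the same script and differ only in the index, which is why they can be submitted, monitored, and controlled as a single unit.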
Process Manager Terms 2/4
• Job submission script
  • A shell script or batch file that you can define to submit a job
• Job array submission script
  • A group of submission scripts that share the same executable and resource requirements, but have different input files (for example script1, script2, script3, and so on)
• Manual jobs
  • A placeholder in a flow where some manual activity must take place
  • Successors cannot run until the manual job is explicitly completed
• Flow definitions
  • A container for a group of related jobs
  • Describes the jobs, their relationships to each other, and any dependencies the jobs have on files or on dates and times
  • Can be stored locally on your own machine or within a shared file system, as an .XML file
  • You can view and import flow definitions created by other users, but you cannot control them without administrative authority
• Flows
  • A particular occurrence of a flow definition, created when the flow definition is triggered
  • Process Manager assigns each occurrence a unique ID called the flow ID
Process Manager Terms 3/4
• Ad hoc flows
  • Run directly by the Process Manager Server without the server saving a copy of the flow definition
  • Run directly from the Flow Editor
• Subflows
  • A flow definition embedded within another flow definition
  • A simple method to share and reuse common routines
• Events
  • A change or occurrence in the system (creation of a specific file, a prior job completing with a particular exit code, or arrival of a file at a particular date and time) that can be used to trigger a flow or one or more jobs within a flow
  • Time events - points of time (defined by calendars and time expressions) that can be used to trigger the scheduling of jobs
  • File events - changes in a file's status
  • Proxy events - events used to represent another flow or a work item that runs within another flow
  • Link events - events used to consolidate the output of other events
• Exceptions
  • A specific error condition that is detected when a job does not process as expected
• Exception handlers
  • A function used to respond when an exception occurs
  • Jobs or flows can be defined as exception handlers
  • Process Manager's built-in exception handlers can be used (Kill, Rerun, Alarms)
Process Manager Terms 4/4
• Calendars
  • Consist of a sequence of days on which the calendar is considered valid
  • A job is scheduled when the calendar is valid and a time-of-day specification is met
  • Calendars are defined and manipulated independently of jobs, so multiple jobs and flows can share the same calendar
  • Each user can maintain a private set of calendars or use the calendars defined as system calendars
  • If a calendar is changed, jobs associated with the calendar automatically run according to the new definition
  • Calendars are stored within Process Manager's private storage and cannot be stored locally or edited outside of the Calendar Editor
• File naming conventions
  • Depend on the operating system (UNIX, Windows)
Flow Definitions and Flows 1/2
• Flow definition
  • A collection of Process Manager work items (jobs, job arrays, and subflows) and their relationships
  • Defined graphically in the Flow Editor
• What can I do with a flow definition?
  • Submit and run the flow immediately, where the definition of the flow is not stored in the Process Manager system. Process Manager is only aware of the specific, ad hoc occurrence of the flow.
  • Submit a flow definition to be triggered manually at a later time.
  • Submit a flow definition to run on a recurring basis, on a particular schedule.
  • Submit a flow definition to run when a file reaches a particular state.
  • Define specific routines as individual flow definitions, so that each can be reused like a subroutine within other flow definitions.
  • Set exit conditions on a flow definition that contains multiple branches, so that completion of any single branch constitutes completion of the flow, or require that all branches complete before the flow is complete.
  • Embed a flow definition as a subflow within another flow definition.
  • Use a flow to handle an exception in another flow.
• What can I do with a flow?
  • Kill, suspend, or resume an entire flow
  • Rerun a failed flow, starting at the first job that failed in each path through the flow or from any rerun starting points that you set in the flow
Flow Definitions and Flows 2/2
• What can I do with a job?
  • Kill a running job
  • Hold a waiting job in a running flow
  • Rerun a job in a completed flow
• Where do I store my flow definitions?
  • Locally on your own computer, or on a shared file system
  • Flow definitions that perform common routines should be on a shared file system so they can be shared with other users
  • A flow can be saved as an .XML file
  • The file name should reflect the contents of the flow
  • Good practice is to prefix or suffix the flow name with a short project name or the user ID associated with the flow
• What makes a flow Done?
  • A flow is successful, with a status of Done, only when all jobs in the flow complete successfully
  • A flow fails, with a status of Exit, if any job in the flow fails
• What happens if a job exits?
  • Under the default behavior, no additional jobs in the flow are dispatched
  • Any currently running jobs continue running until they complete
  • If you do not want the flow to exit when a job fails, specify an exit condition for the flow and handle the exit condition explicitly
• How does Process Manager know when my flow is complete?
  • Based on the job completion attributes you define for the flow
  • Default behavior: every work item in the flow has to complete successfully before the flow is considered complete
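The default completion rule can be sketched as a small function. This is a simplified illustration rather than Process Manager's actual logic: flow_state is a hypothetical name, and treating any state other than Done or Exit as "still running" is an assumption.

```shell
#!/bin/sh
# flow_state STATE...  prints the overall flow state under the default rule:
#   any job in Exit          -> Exit
#   every job in Done        -> Done
#   anything else (assumed)  -> Running
flow_state() (
    result="Done"
    for state in "$@"; do
        case "$state" in
            Exit) echo "Exit"; exit 0 ;;   # one failed job fails the whole flow
            Done) ;;
            *)    result="Running" ;;
        esac
    done
    echo "$result"
)

flow_state Done Done Exit
```

Custom exit conditions, such as treating the flow as complete when a single branch finishes, would replace this rule and are set in the flow's completion attributes.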
Process Manager Calendars
• Create calendars that define the dates on which you want some action to take place, using the Calendar Editor
• Required to create time events that trigger flows or dispatch jobs at a particular time
• Using the Process Manager Calendar Editor client
  • On your desktop: All Programs > Platform Computing > Process Manager > Calendar Editor
  • From UNIX: /idn/sas/JS/8/bin/caleditor
• Use a system calendar: CalendarName@Sys
• Use a calendar defined by another user: CalendarName@userid
• Define a new calendar
  • Clicking on dates, specifying a pattern, combining calendars
  • Defining expressions
• An existing calendar can be edited or deleted
Define a Flow
• Use the Flow Editor to create or edit flow definitions
• Group related jobs, job arrays, and subflows so that they can be triggered, run, and controlled as a unit
• Ways to create a flow definition
  • Completely define one job at a time
  • Draw all of the work items in the flow and then fill in the details
  • A combination of the above methods: draw some of the work items and define them, then draw more work items and define them, and so on
• Features worth reviewing that are useful when defining flows:
  • Details of a job
  • Job dependencies
  • Flow completion attributes (define a description of the flow, notification preferences)
  • Exception handling
  • Flow attributes
  • Saving the flow definition
  • Looping a flow or subflow
  • Process Manager exceptions
Define a Flow - Flow Attributes
• Some flow attributes worth considering when defining a flow
  • A description of the flow
  • Email notification about the flow
    • By default, Process Manager notifies you by email only if your flow exits. You can set the notification options to send an email to you or another user when the flow:
      • Exits
      • Ends, regardless of its success
      • Starts
      • Starts and exits
      • Starts and ends, regardless of its success
    • You can also turn off flow email notification entirely.
  • Preventing concurrent versions of the same flow
    • Useful when you need to run a flow repeatedly but each occurrence must have exclusive access to a resource such as a database
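The notification options above form a small decision table, sketched below. The setting labels (exits, ends, starts, starts_and_exits, starts_and_ends, off) and the function name should_notify are hypothetical names chosen for this illustration; in Process Manager the option is selected in the flow attributes dialog.

```shell
#!/bin/sh
# should_notify SETTING EVENT
#   EVENT is one of: start, done, exit
#   Returns 0 when an email would be sent, 1 otherwise.
should_notify() (
    case "$1:$2" in
        exits:exit) ;;
        ends:done|ends:exit) ;;
        starts:start) ;;
        starts_and_exits:start|starts_and_exits:exit) ;;
        starts_and_ends:start|starts_and_ends:done|starts_and_ends:exit) ;;
        *) exit 1 ;;   # includes "off" and any unknown combination
    esac
    exit 0
)

if should_notify exits exit; then
    echo "default setting: email is sent when the flow exits"
fi
```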
Process Manager Exceptions
• Failure of a job to process is indicated by an exception
• Process Manager lets you define what to do when these failures occur
  • Provides built-in exception handlers to automate the recovery process
  • Provides an alarm facility you can use to notify people of particular failures
• Process Manager monitors for the following exceptions
  • Misschedule
    • Occurs when a job, job array, flow, or subflow depends on a time event but is unable to start during the duration of that event
    • There can be many reasons; for example, a specified dependency was not satisfied while the time event was active
  • Overrun
    • Occurs when a job, job array, flow, or subflow exceeds its maximum allowable run time
    • Use this exception to detect runaway or hung jobs
  • Underrun
    • Occurs when a job, job array, flow, or subflow finishes sooner than its minimum expected run time
  • Start Failed
    • Occurs when a job or job array is unable to run because its execution environment could not be set up properly
    • Typical reasons include a lack of system resources, such as a full process table on the execution host, or a file system that was not mounted properly
  • Cannot Run
    • Occurs when a job or job array cannot proceed because of an error in submission
    • A typical reason is an invalid job parameter
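The Overrun and Underrun conditions reduce to comparing a run time against two limits. The sketch below illustrates only that comparison; classify_runtime is a hypothetical helper, times are assumed to be whole seconds, and in Process Manager the limits are set on the work item itself.

```shell
#!/bin/sh
# classify_runtime ACTUAL MIN MAX   (all in whole seconds)
# Prints Overrun, Underrun, or OK.
classify_runtime() (
    actual=$1 min=$2 max=$3
    if [ "$actual" -gt "$max" ]; then
        echo "Overrun"      # ran longer than the maximum allowable run time
    elif [ "$actual" -lt "$min" ]; then
        echo "Underrun"     # finished sooner than the minimum expected run time
    else
        echo "OK"
    fi
)

classify_runtime 7200 60 3600
```

An unexpectedly short run is often as informative as a long one: a load job that finishes in two seconds may have found an empty input file.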
Running a Flow – Triggering a Flow
• Triggering a flow is the act of telling Process Manager to take a flow definition and create a flow from it
• Manual triggers
  • For a flow that can be run more than once, with no schedule by which the flow should be run
  • Implicitly, when you run a flow immediately from the Flow Editor
  • Explicitly, by triggering any submitted flow from within the Flow Manager at any time
• Automatic triggers
  • Using a time event, which triggers the flow at a certain time on the specified dates
  • Using a file event, which triggers it when a certain file condition occurs
  • Using a proxy event, which triggers it when another flow, or a work item within another flow, reaches a certain state
  • Using an exception event, which triggers it when another flow generates a specific exception
Running a Flow - Scheduling
• Schedule a flow using an event
  • Run at a particular date and time, when a file arrives, or a combination of these
• Time driven
  • In the flow definition, define a time event as the trigger
  • On the command line: jsub -T time_event flow_file_name
  • Run at a specific time
  • Run multiple times on a single date
  • By specifying time expressions
• Based on file activity
  • If a file exists or does not exist, upon arrival of a file, some time
  • In the flow definition, define a file event as the trigger
  • On the command line: jsub -F "file_event" flow_file_name
    jsub -F "arrival(/tmp/*.tar)" testflow.xml
• Based on another flow
  • When another flow completes
  • When a proxy job completes
  • From the command line, trigger when a job fails: jsub -p "job(exit(userid:testflow:J2))"
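The file-event form is easy to get wrong because the event expression must reach jsub as a single quoted argument. The following sketch only prints the command it would run, so it can be checked without a Process Manager server; arrival_submit_cmd is a hypothetical helper, and the pattern and flow file are the ones from the slide.

```shell
#!/bin/sh
# arrival_submit_cmd PATTERN FLOW_FILE
# Prints (does not execute) a jsub command with a file-arrival trigger.
arrival_submit_cmd() (
    pattern=$1 flow_file=$2
    printf 'jsub -F "arrival(%s)" %s\n' "$pattern" "$flow_file"
)

# Single quotes keep the shell from expanding the * before it reaches the function.
arrival_submit_cmd '/tmp/*.tar' testflow.xml
```

Quoting matters here: without the double quotes around the event, the shell would try to interpret the parentheses and expand the wildcard itself.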
Running a Flow – Submitting a Flow Definition
• Until a flow definition is submitted, Process Manager is not aware of the flow
• Submitting places a flow under the control of Process Manager
• Once the flow is submitted, Process Manager determines when it is to run and triggers it as appropriate
• Save the flow before submitting it
• Submitting with versioning comments is optional but recommended as good practice
  • Makes it easier to track different versions of the flow
• To submit a flow, select Submit from the Action menu
• When the flow is submitted successfully, you receive a confirmation message with the version number of the flow
  • New flows are submitted as version 1.0
• A flow is not published by default after it is submitted to Process Manager
  • To publish a flow, run the Flow Manager as an administrator, right-click the flow, and select Publish
  • To unpublish the flow, right-click the published flow and select Unpublish
Controlling a Flow 1/6
• A copy of the flow definition is stored within the Process Manager system
  • When the flow definition is scheduled or submitted to be triggered manually
• A flow can be triggered at any time once Process Manager knows the definition
  • Can be triggered from the command line (jtrigger) or from the Flow Manager
• A flow is created when the flow definition is
  • Triggered manually
  • Run (Run Now)
  • Triggered automatically via an event
Controlling a Flow 2/6
• Using the Flow Manager (the Process Manager Server must be running)
  • View the status of flows, jobs, job arrays, and subflows that are currently in the system
  • Trigger a flow
  • Place a flow definition on hold, or release it from hold
  • Kill, suspend, resume, or rerun a flow
  • Kill, run, or rerun a job
  • Force a job complete
• Flow Manager user interface
  • The left-hand pane controls the flow data that is displayed in the right-hand pane. You can look at the data in the following views:
    • By User - flow definitions and flows are sorted by user ID
    • By State - flows are sorted by their current state
    • By Event - flows are sorted by their triggering events
  • When a view is selected, the right-hand pane shows a graphical illustration of the currently selected flow definition or flow.
  • The left-hand pane also contains an optional legend, which displays the meaning of each of the states you may see in the left pane.
  • You can also view the following information by selecting it in the left-hand pane:
    • Alarms - the current list of alarms that have been opened
    • Manual jobs - the list of manual jobs requiring acknowledgement
Controlling a Flow 3/6
• What can be done with the Flow Manager?
  • Real-time data - the data displayed in the Flow Manager is intended to reflect the real-time status of the flows in the system. The Flow Manager display is set to refresh automatically every 5 minutes.
    • You can refresh the displayed data manually
    • You can change the automatic refresh value
  • Print data - from any view in the Flow Manager, you can print the data displayed
  • Filter the data displayed in the tree view
    • Limit the flows displayed to those owned by a user
    • Limit the flows displayed to the last x hours
    • Limit the flows displayed to a time period
  • Trigger a flow
    • jtrigger flow_definition_name
  • View the flow definition and statistics
  • View inter-flow relationships
  • Determine the status of jobs in a flow
    • By the text displayed when you place your mouse over the work item icon
Controlling a Flow 4/6
    • By the state shown in the Runtime Attributes dialog
Controlling a Flow 5/6
• Kill a running job
  • From the command line: jjob -i flow_id -k flow_name[:subflow_name]:job_name
• Run or rerun a single job
  • To debug a flow, or to run a single job to fix a flow
  • From the command line: jjob -i flow_id -r flow_name[:subflow_name]:job_name
• Mark a job complete
  • You can mark a job complete without actually running the job
  • Used when you want a flow to continue running even though the job failed or did not run
  • Marking a job complete does not run the job; it only changes its state
  • You mark a job complete so that its successor jobs can run when you rerun the flow
  • You can only complete a job in a flow that has exited
  • From the command line: jjob -i flow_id -c flow_name[:subflow_name]:job_name
• Work with manual jobs
  • View the manual jobs awaiting completion (command line: jmanuals)
  • Complete a manual job (command line: jcomplete)
• Work with proxies - proxies are used to represent work items that run within another flow, or to represent another flow
  • See if any proxies of a work item exist
  • See a list of those work items that depend on them
  • Determine the impact that a work item has on other flows
  • Locate a proxy dependent
  • Manually complete a proxy dependency
  • View the inter-flow relationships established by defining proxies, using the global view
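The three jjob forms above differ only in one flag, which a small wrapper can make explicit. This sketch prints the command instead of running it; jjob_cmd is a hypothetical helper, and the flow ID 42 and job path myflow:J2 are made-up values for illustration.

```shell
#!/bin/sh
# jjob_cmd ACTION FLOW_ID JOB_PATH
#   ACTION is kill, run, or complete
#   JOB_PATH has the form flow_name[:subflow_name]:job_name
# Prints (does not execute) the matching jjob command.
jjob_cmd() (
    case "$1" in
        kill)     flag="-k" ;;
        run)      flag="-r" ;;
        complete) flag="-c" ;;
        *) echo "unknown action: $1" >&2; exit 2 ;;
    esac
    printf 'jjob -i %s %s %s\n' "$2" "$flag" "$3"
)

jjob_cmd kill 42 myflow:J2
```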
Controlling a Flow 6/6
• Kill a running flow (command line: jkill flow_id)
• Suspend a running flow (command line: jstop flow_id)
• Resume a suspended flow (command line: jresume flow_id)
• Rerun an exited flow (command line: jrerun flow_id)
  • Provided that the flow was not killed
  • If you need to rerun a flow that was killed, retrigger the flow
• Rerun an exited job array
• Hold a flow definition (command line: jhold flow_name)
• Release a flow definition from hold (command line: jrelease flow_name)
• View a flow definition and statistics
• Remove a flow definition (command line: jremove flow_name)
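As a memory aid, the flow-level commands listed above can be collected into one lookup. The sketch below only prints the command; flow_cmd is a hypothetical helper, and the action names on the left are labels invented for this example. The first four actions take a flow ID, the last three a flow name.

```shell
#!/bin/sh
# flow_cmd ACTION TARGET
# Prints (does not execute) the Process Manager command for a flow-level action.
flow_cmd() (
    case "$1" in
        kill)    cmd="jkill" ;;     # TARGET is a flow ID
        suspend) cmd="jstop" ;;     # TARGET is a flow ID
        resume)  cmd="jresume" ;;   # TARGET is a flow ID
        rerun)   cmd="jrerun" ;;    # TARGET is a flow ID
        hold)    cmd="jhold" ;;     # TARGET is a flow name
        release) cmd="jrelease" ;;  # TARGET is a flow name
        remove)  cmd="jremove" ;;   # TARGET is a flow name
        *) echo "unknown action: $1" >&2; exit 2 ;;
    esac
    echo "$cmd $2"
)

flow_cmd suspend 42
```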
Command Line Quick Reference 1/2
• Calendar commands
  • caleditor - start the Calendar Editor graphical user interface
  • jcadd - create a calendar
  • jcals - display a list of calendars
  • jcdel - delete a calendar
  • jcmod - edit a calendar
• Flow definition and execution commands
  • floweditor - start the Flow Editor graphical user interface
  • jrun - submit and run a flow immediately, without storing the flow definition in Process Manager
  • jsub - submit a flow definition to Process Manager
  • jtrigger - trigger the creation of a flow
  • jhold - place a flow definition on hold, preventing automatic triggering of the flow
  • jrelease - release a flow definition from hold, enabling automatic triggering of the flow
  • jdefs - display information about flow definitions
  • jremove - remove a flow definition from Process Manager
Command Line Quick Reference 2/2
• Flow monitor and control commands
  • flowmanager - start the Flow Manager graphical user interface
  • jalarms - list open alarms
  • jcomplete - complete a manual job
  • jflows - display information about a flow
  • jjob - kill or run a job, or mark a job complete
  • jkill - kill a flow
  • jmanuals - list all manual jobs waiting for completion
  • jrerun - rerun an exited flow
  • jstop - suspend a flow
  • jresume - resume a suspended flow
• Other commands
  • jid - verify the connection between the Process Manager Client and the Process Manager Server
  • jhist - view historic information about the server, flow definitions, flows, and jobs
Note: For a detailed description of the commands, see Chapter 8 in ppm_using.pdf. The location of the guide is given in the Documentation section below.
In the IDN Grid environment: Migrating cron jobs to LSF
• What is cron?
  • A time-based job scheduler in Unix-like operating systems
• Reason for disabling cron in IDN
  • Not Grid aware; available on one node only (IDNADHOC)
• What is LSF?
  • Load balancing of resources, flow definition, flow management, scheduling, reporting
• Advantages of using LSF / Process Manager
  • GUI for defining job flows
  • GUI and command line interfaces for running and monitoring flows
  • Reporting capabilities on executed jobs via the Flow Manager
  • Authentication to SAS Metadata is not required
  • Better fault tolerance and job exception management: if a job or node fails, the job can be rescheduled on the same node or a different node
  • More advanced job dependencies
  • Cost efficient to develop and maintain; no special skills required
  • Captures best practices, ensures process repeatability, and promotes modularity and reuse
  • Promotes collaboration
In the IDN Grid environment: Instructions and Tips
• Specific instructions for defining cron jobs in the Flow Editor
  • Identify or define an appropriate calendar
  • Define a "Time Event" trigger for the flow
    • Right-click in a blank area in the flow designer and select Flow Attribute, OR click Action > Add Flow Attribute
    • Triggering Events > Add > select "Time Event" in the "Select type of event" drop-down
    • Specify other details as required
  • Submit the flow for scheduling
    • Click Action > Submit or Submit with comments
    • From the command line: jsub myflowname.xml
    • Note: If you click "Run Now", the flow runs immediately and only once. It is not scheduled for repeated execution.
• Tips for defining flows
  • Define a trigger event
    • Time Event - a calendar for periodic scheduling
    • File Event - dependence on the existence of a file, for example
    • Proxy Event - completion of another flow, for example
  • Choose the queue if you want to submit to a queue other than the default queue (normal)
  • Set appropriate notification options so you receive emails as desired
    • Note: See section "4.3.2 Where is the output from this job?" in Submitting jobs to LSF.docx for more details about emails from LSF
  • Specify flow completion attributes to ensure that the flow executes as desired
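When migrating an existing crontab entry, it can help to pull its five time fields apart before re-entering them as a calendar and a time of day in the Time Event dialog. The sketch below does only that split; describe_cron_entry is a hypothetical helper, and the sample entry is invented. It does not generate a Process Manager time expression.

```shell
#!/bin/sh
# describe_cron_entry 'MIN HOUR DOM MONTH DOW COMMAND...'
# Splits one crontab line into its five time fields and the command.
describe_cron_entry() (
    set -f          # stop * in the cron fields from expanding to file names
    set -- $1       # split the entry on whitespace
    if [ $# -lt 6 ]; then
        echo "expected five time fields and a command" >&2
        exit 2
    fi
    minute=$1 hour=$2 dom=$3 month=$4 dow=$5
    shift 5
    echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
    echo "command=$*"
)

describe_cron_entry '30 2 * * 1-5 /home/user/load_data.sh daily'
```

In this sample, the day-of-week field (1-5, Monday to Friday) would map to a weekday calendar, and the minute and hour fields to the time of day on the Time Event.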
In the IDN Grid environment: Mainframe support in Process Manager
• How does it work?
  • The Process Manager daemon supports the mainframe by submitting an LSF proxy job that controls the FTP connection to the mainframe host
  • The LSF proxy job (through FTP) submits and monitors the mainframe job and retrieves its output
  • Mainframe jobs specify both mainframe and LSF details
• Requirements
  • A valid z/OS mainframe user ID
• Limitations
  • z/OS does not support suspending or resuming jobs
  • Job arrays for mainframe jobs are not supported
• Using the mainframe
  • Copy the template file
    • From: C:\Program Files\Platform Computing\Platform LSF Process Manager\8.0\examples\z/OS_Template.xml
    • To: C:\Program Files\Platform Computing\Platform LSF Process Manager\8.0\work\templates
  • Define your template job in the Flow Editor
• Status of jobs
  • The status of mainframe jobs is displayed in the Flow Manager just like any other job
• Killing a job (UNIX)
  • Mainframe jobs can be killed in the usual way if you are on a UNIX platform
For more details on defining the job and on exit codes, refer to the Platform document Using Process Manager (ppm_using.pdf) on the IDN SAS Wiki.
In the IDN Grid environment: Scheduling using the jsub command
• The flow definition (.XML file) has to exist in the IDN Grid environment
  • Define and save the flow using the UNIX Flow Editor client via the X-Windows interface
    • The command for invoking the Flow Editor client on UNIX is floweditor
  • Or copy a flow file created on your PC with the Windows Flow Editor client to the UNIX environment
• Example of the jsub command: jsub flowname
In the IDN Grid environment: Scheduling jobs via the Change Control Process
• Define the flow with your own user ID
• Specify the flow name in the properties file as follows
  • * FLOWNM=projectname_flow.xml
• Upload the flow file (.XML file) to SVN along with other code. Make sure the flow files reside in the project root folder in SVN
  • /trunk/SASGrid/yourproject
• Specify the test and production IDs that will be used for scheduling and executing the flow in the properties file. The flow will be submitted / scheduled as:
  • * RUNAS_TEST ID in the Test (E2) environment
  • * RUNAS_PROD ID in the Production (E3) environment
• Note: * Specifying these properties is a requirement. For more information on the SAS Change Control process and the properties file, see the documentation on IDN Wiki SAS Resources SAS Change Control Documentation
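Because the three starred properties are mandatory, a quick check before committing to SVN can catch an incomplete properties file. This is a sketch under assumptions: the slide shows only the FLOWNM line, so the KEY=value layout for RUNAS_TEST and RUNAS_PROD is assumed, and check_properties is a hypothetical helper rather than part of the Change Control tooling.

```shell
#!/bin/sh
# check_properties FILE
# Verifies that FLOWNM, RUNAS_TEST, and RUNAS_PROD each appear as KEY=value
# at the start of a line (assumed file layout). Returns 1 if any is missing or empty.
check_properties() (
    file=$1 missing=0
    for key in FLOWNM RUNAS_TEST RUNAS_PROD; do
        if ! grep -q "^${key}=..*" "$file"; then
            echo "missing or empty: $key"
            missing=1
        fi
    done
    exit "$missing"
)

# Example call (the file name is hypothetical):
#   check_properties myproject.properties
```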
In the IDN Grid environment: Flows defined in SAS Management Console (SMC)
• A flow defined in the SAS Management Console (SMC) can be saved and opened in the Flow Editor. To save a flow defined in SMC to an .XML file:
  • Set the Flow XML output directory to a local folder (PC or UNIX, depending on where you invoked SMC from)
    • Right-click "Schedule Manager" in the left panel > Options > Platform Computing Scheduler > Flow XML Output Directory
  • Schedule the flow; this saves the flow as an XML file in the location defined in the previous step
    • Right-click a flow under Schedule Manager > Schedule Flow > Options > New Time Event
    • For already scheduled flows, reschedule the flow
  • The saved XML file can be opened in the Flow Editor
Documentation and Help
• Location of documentation on the IDN Wiki
  • IDN Portal Documentation SAS Resources LSF Usage Resources LSF Documentation
  • Link: https://idnoae.idn.aexp.com/idnwiki/Wiki.jsp?page=Public__LSFDocumentation
• FAQs
• Documents available
  • IDN Deck - LSF scheduling.pptx
  • IDN Document - LSF scheduling.docx (FAQs in Section 12.1)
  • Platform Document: Using Process Manager - ppm_using.pdf
  • Platform Document: Guide to Using Templates - ppm_using_templates.pdf
• For questions, issues, or concerns
  • Open a ticket with the IDN Service Desk: https://idnservicedesk.idn.aexp.com/Scripts/Texcel/ServiceWise/CLogin.dll
Thank you!
• Questions?