
Never Lose a SAS Job

Never lose a SAS job again. When an unexpected reboot or system failure stops a long-running job before it completes, the job normally has to be restarted manually from step 1. SAS Grid avoids this by combining SAS checkpoint-restart features with LSF requeue capabilities.

adonis

Presentation Transcript


  1. Never Lose a SAS Job

  2. Not Again!!
     • Unexpected reboot or system failure
     • Long-running job didn’t complete
     • Must manually restart the job from step 1
     It can drive you crazy!!!

  3. SAS Grid Gets the Stars Aligned...
     SAS checkpoint-restart features
     + LSF requeue capabilities
     + SASGSUB batch submission utility
     = Completion of SAS jobs in minimal time
     Ideal for critical long-running SAS jobs

  4. SAS Checkpoint/Restart
     • Checkpoint mode
       • Records information about DATA and PROC steps in the checkpoint library
     • Restart mode
       • Global statements and macros are re-executed
       • SAS reads the data in the checkpoint library to determine which steps completed
       • Program execution resumes with the step that was executing when the failure occurred
       • DATA and PROC steps that completed successfully are not re-executed

  5. To Set Up for Checkpoint-Restart
     • Specify the following options on the batch SAS invocation:
       • STEPCHKPT: enables checkpoint mode
       • STEPRESTART: causes SAS to use the checkpoint-restart data
       • NOWORKINIT: does not initialize the WORK library when SAS starts
       • NOWORKTERM: saves the WORK library when SAS exits
       • ERRORCHECK STRICT: puts SAS in syntax check mode when an error occurs in LIBNAME, FILENAME, %INCLUDE, and LOCK statements
       • ERRORABEND: causes SAS to terminate for most errors
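
As a rough illustration of the option list above, the sketch below assembles a batch SAS command line in Python. The function name `build_sas_command`, the executable name `sas`, and the use of `-sysin` to name the program are assumptions made for this example and are not taken from the slides. The option names themselves are the ones listed on the slide, written as UNIX-style command-line flags.

```python
from typing import List, Optional

# Options named on the slide, written as command-line flags.
CHECKPOINT_RESTART_OPTIONS = [
    "-stepchkpt",            # enable checkpoint mode
    "-steprestart",          # use checkpoint-restart data if present
    "-noworkinit",           # do not initialize WORK at startup
    "-noworkterm",           # keep WORK when SAS exits
    "-errorcheck", "strict",
    "-errorabend",
]


def build_sas_command(program: str,
                      work_dir: Optional[str] = None,
                      sas_exe: str = "sas") -> List[str]:
    """Return an argv list for a checkpoint-restart batch run.

    `sas_exe` and `-sysin` are assumptions for this sketch; adjust them
    to match the local installation.
    """
    cmd = [sas_exe, "-sysin", program]
    cmd += CHECKPOINT_RESTART_OPTIONS
    if work_dir is not None:
        # A fixed WORK location lets a restarted run find the checkpoint data.
        cmd += ["-work", work_dir]
    return cmd
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.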

  6. The WORK Directory
     • WORK is the default location for the checkpoint library
     • STEPCHKPTLIB can point the checkpoint library to a permanent library instead
       • The LIBNAME statement for that library must be the first statement in the batch program
     • The WORK directory must be on shared storage
     • Example:
       sas92 -noworkinit -noworkterm -work abc

  7. Use of Both STEPCHKPT and STEPRESTART
     • Initial invocation
       • Results in checkpoint mode only
       • No data in the checkpoint library yet
     • Subsequent invocations
       • Use the data from the checkpoint library
       • Continue in checkpoint mode for the remainder of the program

  8. SAS Grid Manager – Queues
     [Diagram: SAS Application, SAS Grid Manager, the normal queue, and HOST A, HOST B, HOST C]

  9. Automatic Job Requeue
     • Configure a queue to automatically requeue jobs that end with specific exit values
       • REQUEUE_EXIT_VALUES=all ~0 ~1
       • Any exit code other than 0 or 1 (success and warnings) causes the job to be requeued
     • REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
       • Runs the requeued job on a different host
     • Jobs are requeued 5 times by default
       • MAX_JOB_REQUEUE sets the requeue limit, either globally for all queues or per queue
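
The requeue rule described above can be modeled in a few lines of Python. This is only a simulation of the decision the slide describes (requeue on any exit code other than 0 or 1, up to a limit of 5), not LSF code. The function names `should_requeue` and `run_with_requeue` are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def should_requeue(exit_code: int, requeue_count: int,
                   max_requeue: int = 5,
                   ok_codes: Tuple[int, ...] = (0, 1)) -> bool:
    """Model of REQUEUE_EXIT_VALUES=all ~0 ~1 with a requeue limit."""
    if exit_code in ok_codes:
        return False                    # success or warnings: job is done
    return requeue_count < max_requeue  # stop once the limit is reached


def run_with_requeue(attempt_exit_codes: Iterable[int],
                     max_requeue: int = 5) -> Tuple[int, Optional[int]]:
    """Replay a sequence of exit codes; return (dispatch_count, final_exit_code)."""
    requeues = 0
    dispatches = 0
    final = None
    for code in attempt_exit_codes:
        dispatches += 1
        final = code
        if not should_requeue(code, requeues, max_requeue):
            break
        requeues += 1
    return dispatches, final
```

For example, `run_with_requeue([137, 2, 0])` models a job that fails twice and then succeeds on its third dispatch.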

  10. Automatic Job Rerun
      • A job is automatically rerun when
        • The execution host becomes unavailable while the job is running
        • The system fails while the job is running
      • RERUNNABLE=yes

  11. LSF Queue Definition
      Begin Queue
      QUEUE_NAME   = sas_rerun
      PRIORITY     = 40
      NICE         = 10
      RERUNNABLE   = YES
      REQUEUE_EXIT_VALUES = all ~0 ~1
      DESCRIPTION  = Jobs submitted to this queue will be requeued automatically and also rerunnable.
      End Queue
      • RERUNNABLE: jobs dispatched from this queue will be rerun after system failures
      • REQUEUE_EXIT_VALUES: jobs with a fatal exit code will be requeued

  12. SASGSUB Capabilities
      Standalone utility that allows a user to
      • Submit a SAS program to the grid for processing
      • Display the status of the user’s jobs on the grid
      • Retrieve output from the user’s jobs to a local directory
      • Kill jobs

  13. Using SASGSUB
      Advantages
      • Submit and forget
      • View job output while the job is running
      • Eliminate the need for a full SAS install on the client
      • Make use of the SAS checkpoint/restart capability
      NOTE: requires a shared file system between the client and the grid

  14. Submitting a Job
      • Command line interface
        sasgsub -gridsubmitpgm <sas_pgm>
      • Example output
        Job ID: 6772
        Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
        Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"

  15. Submitting a Job for Checkpoint-Restart
      • GRIDRESTARTOK
        • Automatically adds the following options to the batch SAS invocation: STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
        • Sets the RERUNNABLE parameter on the job
      • Command line interface
        sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok

  16. Getting Job Status
      • Command line interface
        sasgsub -gridgetstatus <job_id | _ALL_>
      • Example output
        Current Job Information
        Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
        Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
        Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
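
If the status output needs to be consumed by a script, a small parser along the following lines could work. It is a sketch that assumes the output has only the two line shapes shown in the example (a finished job and a submitted job). The names `JobStatus` and `parse_status_line` are hypothetical, and real output may contain other states or fields.

```python
import re
from typing import NamedTuple, Optional


class JobStatus(NamedTuple):
    job_id: int
    name: str
    state: str
    submitted: str
    started: Optional[str]
    host: Optional[str]
    ended: Optional[str]


# Pattern derived only from the sample lines on the slide.
_STATUS_RE = re.compile(
    r"^Job (\d+) \((.+?)\) is (\w+): Submitted: (\S+?)"
    r"(?:, Started: (\S+) on Host (\S+?), Ended: (\S+))?$"
)


def parse_status_line(line: str) -> Optional[JobStatus]:
    """Parse one 'Job ... is ...' line; return None for any other line."""
    match = _STATUS_RE.match(line.strip())
    if match is None:
        return None
    job_id, name, state, submitted, started, host, ended = match.groups()
    return JobStatus(int(job_id), name, state, submitted, started, host, ended)
```

Timestamps are kept as strings here to avoid locale-dependent month parsing.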

  17. Retrieving Results
      • Command line interface
        sasgsub -gridgetresults <job_id | _ALL_>
      • Example output
        Current Job Information
        Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
        Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
        Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
        Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
        Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34

  18. Putting It All Together
      [Diagram: SAS Application, SAS Grid Manager, the normal queue and the sas_rerun queue, and HOST A, HOST B, HOST C]

  20. A simple solution
      • Record a checkpoint number and save it in WORK
      • If restarting, skip PROC / DATA steps up to that checkpoint
      • Tokenize everything
      • Execute all global statements
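
The idea on this slide can be sketched in Python as follows. This is a minimal model of step-level checkpointing, not how SAS implements it: the JSON checkpoint file, the function name `run_with_checkpoint`, and the representation of steps as Python callables are all assumptions made for illustration.

```python
import json
import os
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_with_checkpoint(global_setup: Callable[[], None],
                        steps: Sequence[Step],
                        checkpoint_path: str) -> List[str]:
    """Run steps in order, skipping those a previous run already completed.

    Returns the names of the steps executed in this invocation.
    """
    # Global statements are always executed, even on restart.
    global_setup()

    # Read the checkpoint number left by an earlier run, if any.
    completed = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            completed = json.load(handle)["completed_steps"]

    executed = []
    for index, (name, step) in enumerate(steps):
        if index < completed:
            continue  # finished in an earlier run: skip it
        step()        # an exception here leaves the checkpoint at `index`
        executed.append(name)
        # Record the new checkpoint number only after the step succeeds.
        with open(checkpoint_path, "w") as handle:
            json.dump({"completed_steps": index + 1}, handle)
    return executed
```

Writing the checkpoint only after a step succeeds means a restart resumes with the step that was running when the failure occurred, which matches the restart behavior described on slide 4.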
