Queues & Priority
Krishna Muriki
Oct 16, 2006
kmuriki@sdsc.edu
SDSC User Services
Backfill window – show_bf
• The queue drains so that big jobs can run.
• Many nodes sit idle while waiting for all the required nodes to finish their currently running jobs.
• Use the ‘show_bf’ command to identify these idle nodes and use them immediately.
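The idea behind a backfill window can be sketched as follows: a job can start right away if it asks for no more nodes than are currently idle and its requested wall-clock time ends before those nodes are needed by the big job. This is only an illustration of the reasoning. The `BackfillWindow` type, the `fits_backfill` function, and all numbers are hypothetical and do not reflect the actual output format of show_bf.

```python
from dataclasses import dataclass


@dataclass
class BackfillWindow:
    """Hypothetical description of one backfill window (not show_bf output)."""
    idle_nodes: int      # nodes sitting idle while the queue drains
    hours_free: float    # how long they stay idle before the big job starts


def fits_backfill(nodes_needed: int, hours_requested: float,
                  window: BackfillWindow) -> bool:
    """Return True if the job could run entirely inside the window."""
    return (nodes_needed <= window.idle_nodes
            and hours_requested <= window.hours_free)


# Illustrative numbers only.
window = BackfillWindow(idle_nodes=16, hours_free=3.0)
small_short = fits_backfill(8, 2.0, window)    # fits inside the window
small_long = fits_backfill(8, 18.0, window)    # requests more time than is free
```

Under these assumed numbers, the same 8-node job fits when it requests 2 hours but not when it requests 18 hours, which is why a realistic runtime estimate matters for backfill.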
Node Reservations
• In special situations users can request node reservations.
• SDSC consultants can make reservations on nodes using the Catalina tools.
• All guest accounts have an active reservation on the machine.
SDSC Job Priorities - 1
• Priorities are determined by a number of weighting factors:
  • Job size
    • Jobs larger than 64 nodes (512 procs) get the highest priority
    • Prevents wasted machine “dry-outs”
    • Favors jobs that can be run nowhere else
  • Allocation size
    • PIs with 1.2M hours need to run more jobs than those with 10k hours
SDSC Job Priorities - 2
• Priorities are determined by a number of weighting factors (cont.):
  • Priority
    • High, normal, and express queues
    • 4 nodes are reserved 24x7 just for express jobs
  • Wait time
    • Jobs increase in priority as they age
    • Big boost for normal jobs older than 4 days or high jobs older than 2 days
    • Boosted jobs are placed just under the large jobs
SDSC Job Priorities - 3
• How does this work?
  • RESOURCE_WEIGHT
  • EXPANSION_FACTOR_WEIGHT
  • SYSTEM_QUEUE_TIME_WEIGHT
  • SUBMIT_TIME_WEIGHT
  • LOCAL_USER_WEIGHT
  • LOCAL_ADMIN_WEIGHT
  • WALL_TIME_WEIGHT
  • QOS_PRIORITY_WEIGHT
  • QOS_TARGET_EXPANSION_FACTOR_WEIGHT
  • QOS_TARGET_QUEUE_WAIT_TIME_WEIGHT
• Priority = RESOURCE_WEIGHT * (job size in procs) + SYSTEM_QUEUE_TIME_WEIGHT * (queue time in seconds) + …
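To make the weighted sum concrete, here is a minimal sketch that evaluates only the two terms the slide writes out. The weight values are hypothetical and chosen purely for illustration; the real weights are site configuration and are not given here. The remaining terms (the "…" in the formula) are deliberately left out.

```python
# Hypothetical weight values, chosen only for illustration.
RESOURCE_WEIGHT = 10
SYSTEM_QUEUE_TIME_WEIGHT = 1


def partial_priority(procs: int, queue_seconds: int) -> int:
    """Sum only the two terms written out on the slide.

    The other weighted terms (the "..." in the formula) are omitted,
    so this is not the scheduler's full priority value.
    """
    return (RESOURCE_WEIGHT * procs
            + SYSTEM_QUEUE_TIME_WEIGHT * queue_seconds)


# A large job that was just submitted vs. a small job that has waited an hour.
big_new = partial_priority(procs=512, queue_seconds=0)
small_old = partial_priority(procs=8, queue_seconds=3600)
```

With these assumed weights the 512-proc job still outranks the small job after an hour of waiting, but the small job's value keeps growing with queue time, which is how aging jobs gain priority.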
Tips to reduce queue wait time
• Every time you submit a job, look for possible backfill windows (use show_bf).
• Estimate your job's runtime and request only the time you need [do not just ask for the maximum of 18 hrs].
• If possible, scale your job up to a larger number of processors/nodes.