Batch Scheduling at JLab
Sandra Philpott
Scientific Computing Manager
Physics Computer Center
HEPiX Spring 2005

Overview of Resources
Experimental Physics: Batch Farm + Mass Storage
• Raw Data Storage
• Data Replay and Analysis
• 200 dual Xeons
http://auger.jlab.org/scicomp
Theoretical Physics: High Performance Computing (HPC) - Lattice QCD
• 3 clusters of meshed machines
• 384 GigE, 256 GigE, 128 Myrinet
• parallel jobs
http://www.jlab.org/hpc

Schedulers
• LSF – Physics Offline Reconstruction and Analysis (jsub user command)
  • Auger locally developed front end
  • Tight integration with JASMine, our mass storage system
  • Consider PBS in time for Hall D and GlueX?
    • Cost savings, compatibility with HPC
• OpenPBS – Lattice QCD parallel computing (qsub user command)
  • Torque
  • UnderLord locally developed scheduler
    • Also provides trend analysis, projections, graphs, etc.
  • Considering Maui as a replacement for UnderLord

Queue Configuration - LSF
• Production – bulk of the jobs
• Priority – quick jobs, less than 30 min.
• Low priority – intended for simulations
• Idle – typically mprime
• Maintenance – for SysAdmin

Queue Configuration - PBS
Batch Queue Names:
• 2m: Master@qcd2madm
• 3g: Master@qcd3gadm
• 4g: Panel01@qcd4gadm, Panel23@qcd4gadm, Panel45@qcd4gadm
Queue & Machine Limits:
• 2m: 24 hours, 8 GB /scratch, 256 MB memory
• 3g: 24 hours, 20 GB /scratch, 256 MB memory
• 4g: 24 hours, 20 GB /scratch, 512 MB memory
Scheduling:
• Jobs that use the most nodes have the highest priority
• UnderLord scheduling policy defined by Admin
  • Job Age, Job Duration, Queue Priority, User Share, User Priority
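
As a rough illustration of the limits and the node-count rule listed above, the following Python sketch checks a request against the per-queue limits and orders waiting jobs so that the largest node counts come first. The names `QUEUE_LIMITS`, `Job`, `fits_queue`, and `order_by_nodes` are hypothetical, and so is the tie-break on submission order. This is not UnderLord's code: the slides say only that its policy draws on job age, job duration, queue priority, user share, and user priority, without giving the formula.

```python
from dataclasses import dataclass

# Limits copied from the slide: wall time in hours, /scratch in GB, memory in MB.
QUEUE_LIMITS = {
    "2m": {"hours": 24, "scratch_gb": 8, "memory_mb": 256},
    "3g": {"hours": 24, "scratch_gb": 20, "memory_mb": 256},
    "4g": {"hours": 24, "scratch_gb": 20, "memory_mb": 512},
}


@dataclass
class Job:
    name: str
    queue: str          # one of the keys of QUEUE_LIMITS
    nodes: int
    hours: float
    scratch_gb: float
    memory_mb: int
    submit_order: int   # hypothetical tie-breaker: lower means submitted earlier


def fits_queue(job):
    """Return True if the request stays within the limits of its queue."""
    limits = QUEUE_LIMITS[job.queue]
    return (job.hours <= limits["hours"]
            and job.scratch_gb <= limits["scratch_gb"]
            and job.memory_mb <= limits["memory_mb"])


def order_by_nodes(jobs):
    """Largest node count first; earlier submission breaks ties (an assumption)."""
    return sorted(jobs, key=lambda j: (-j.nodes, j.submit_order))
```

A real policy would fold the other listed factors into the sort key; the sketch isolates the one rule the slide states outright.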

Sample Script – LSF
    JOBNAME: job2
    PROJECT: clas
    COMMAND: /home/user/test/job2.script
    OPTIONS: -debug
    OS: solaris
    INPUT_FILES: /mss/clas/raw/run1001.data
                 /mss/clas/raw/run1002.data
                 /mss/clas/raw/run1003.data
    INPUT_DATA: fort.11
    OTHER_FILES: /home/xxx/yyy/exp.database
    TOTAPE
    OUTPUT_DATA: recon.run100
    OUTPUT_TEMPLATE: /mss/clas/prod1/OUTPUT_DATA
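
To make the structure of the sample above concrete, here is a minimal Python sketch that reads such a keyword file into a dictionary. It is not Auger's parser. The function name `parse_jsub` is hypothetical, and the grammar is assumed from the one sample on the slide: `KEYWORD: values` lines, a bare upper-case word such as `TOTAPE` treated as a flag, and any other line treated as a continuation of the previous keyword.

```python
import re

# Assumed grammar, inferred from the single sample on the slide.
KEYWORD = re.compile(r"^([A-Z_]+):\s*(.*)$")
FLAG = re.compile(r"^[A-Z_]+$")

SAMPLE = """\
JOBNAME: job2
PROJECT: clas
COMMAND: /home/user/test/job2.script
OPTIONS: -debug
OS: solaris
INPUT_FILES: /mss/clas/raw/run1001.data
             /mss/clas/raw/run1002.data
             /mss/clas/raw/run1003.data
INPUT_DATA: fort.11
OTHER_FILES: /home/xxx/yyy/exp.database
TOTAPE
OUTPUT_DATA: recon.run100
OUTPUT_TEMPLATE: /mss/clas/prod1/OUTPUT_DATA
"""


def parse_jsub(text):
    """Return (fields, flags): keyword -> list of values, plus bare keywords."""
    fields = {}
    flags = []
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = KEYWORD.match(line)
        if match:
            last_key = match.group(1)
            fields.setdefault(last_key, []).extend(match.group(2).split())
        elif FLAG.match(line):
            flags.append(line)
            last_key = None
        elif last_key is not None:
            # Continuation line: more values for the previous keyword.
            fields[last_key].extend(line.split())
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return fields, flags
```

Calling `parse_jsub(SAMPLE)` groups the three raw-data paths under `INPUT_FILES` and reports `TOTAPE` as a flag.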

Sample Script – PBS
    #! /bin/csh -f
    # Optional arguments: IDs of jobs that must finish successfully first.
    setenv DEPEND ""
    if ($#argv > 0) then
        setenv DEPEND "-W depend=afterok"
        foreach job ($argv)
            setenv DEPEND "${DEPEND}:$job"
        end
    endif
    qsub \
        -c n \
        -m ae -M akers@jlab.org \
        -l nodes=64:ppn=2,walltime=30 \
        -v SLEEPTIME=30 \
        -N MPI_CPI_Test \
        -p 1 \
        -q Master@qcdadm01 ${DEPEND} \
        /home/akers/TestJobs/MPI/cpi.csh
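
The dependency handling in the script above is the part most easily misread, so here is a small Python sketch of the same string construction. The helper names `depend_option` and `qsub_command` are hypothetical, and the sketch only builds the argument list; it does not run `qsub`. The job IDs in the usage note are made up.

```python
def depend_option(job_ids):
    """Return the qsub arguments for an afterok dependency, or [] if none.

    Mirrors the csh loop: "-W depend=afterok" followed by ":<id>" per job.
    """
    if not job_ids:
        return []
    return ["-W", "depend=afterok:" + ":".join(job_ids)]


def qsub_command(script, queue, job_ids=()):
    """Assemble a qsub argument list (a subset of the options on the slide)."""
    return ["qsub", "-q", queue, *depend_option(list(job_ids)), script]
```

For example, `depend_option(["101", "102"])` yields `["-W", "depend=afterok:101:102"]`, so the new job is held until both listed jobs exit successfully.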

Resource Utilization
ExpPhy
• Efficient data flow: data is prestaged before jobs are admitted to the farm
• Data is spread over multiple file servers transparently
• Keeps batch farm CPUs 100% busy; no waiting on data to arrive
• Workaround: request a specific resource to imply newer systems with more memory, e.g.
  DISK_SPACE: 125 GB
HPC/Lattice
• Jobs may have an optimal resource spec but can use other configs if the optimal one is not available
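
The HPC/Lattice point about falling back from an optimal resource spec can be sketched as a first-fit search over an ordered preference list. This Python fragment is an assumption about how such a fallback might look, not a description of UnderLord or OpenPBS. The function `choose_config` is hypothetical, the cluster labels reuse the 2m/3g/4g queue names, and the node counts are invented for illustration.

```python
def choose_config(preferences, free_nodes):
    """Return the first preferred (cluster, nodes) pair that fits, else None.

    preferences: list of (cluster, nodes) pairs, optimal spec first.
    free_nodes:  mapping of cluster name -> currently idle node count.
    """
    for cluster, nodes in preferences:
        if free_nodes.get(cluster, 0) >= nodes:
            return cluster, nodes
    return None


# Hypothetical request: ideally 128 nodes on 4g, otherwise 64 on 3g or 2m.
PREFERENCES = [("4g", 128), ("3g", 64), ("2m", 64)]
```

With `free_nodes = {"4g": 96, "3g": 80}` the optimal spec does not fit, so the call falls through to `("3g", 64)`.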

Summary
We would like
• Common job submission for users
  • for both experimental and LQCD jobs
  • for both experimental and LQCD clusters
  • for grid jobs
• Common set of resource descriptors; the user specifies only the ones required
We are collaborating with STAR at BNL on RDL – Request Description Language
http://www.star.bnl.gov/STAR/comp/Grid/scheduler/rdl/index.html
We will soon become an Open Science Grid site
http://www.opensciencegrid.org