
  1. Batch Scheduling at JLab
  Sandra Philpott
  Scientific Computing Manager, Physics Computer Center
  HEPiX Spring 2005

  2. Overview of Resources
  Experimental Physics: Batch Farm + Mass Storage
  • Raw Data Storage
  • Data Replay and Analysis
  • 200 dual Xeons
  http://auger.jlab.org/scicomp
  Theoretical Physics: High Performance Computing (HPC) - Lattice QCD
  • 3 clusters of meshed machines
  • 384 GigE, 256 GigE, 128 Myrinet
  • parallel jobs
  http://www.jlab.org/hpc

  3. Schedulers
  • LSF – Physics Offline Reconstruction and Analysis
    • Auger, locally developed front end
    • Tight integration with JASMine, our mass storage system
    • Consider PBS in time for Hall D and GlueX?
    • Cost savings, compatibility with HPC
    • jsub user command
  • OpenPBS – Lattice QCD parallel computing
    • Torque
    • UnderLord, locally developed scheduler; also provides trend analysis, projections, graphs, etc.
    • Considering Maui as a replacement for UnderLord
    • qsub user command

  4. Queue Configuration - LSF
  • Production – bulk of the jobs
  • Priority – quick jobs, less than 30 min.
  • Low priority – intended for simulations
  • Idle – typically mprime
  • Maintenance – for SysAdmin
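
  For illustration, a direct LSF submission to one of these queues might look like the sketch below. The queue names follow the list above (actual queue names may differ in case), the script paths and job names are hypothetical, and JLab users would normally submit through the Auger jsub front end rather than calling bsub themselves.

      # Sketch: a quick job in the priority queue, capped at 30 minutes of run time
      bsub -q priority -W 0:30 -J quicktest /home/user/test/quick.script
      # Bulk reconstruction work would go to the production queue instead
      bsub -q production -J recon_run1001 /home/user/test/job2.script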

  5. Queue Configuration - PBS
  Batch Queue Names:
  • 2m: Master@qcd2madm
  • 3g: Master@qcd3gadm
  • 4g: Panel01@qcd4gadm, Panel23@qcd4gadm, Panel45@qcd4gadm
  Queue & Machine Limits:
  • 2m: 24 hours, 8 GB /scratch, 256 MB memory
  • 3g: 24 hours, 20 GB /scratch, 256 MB memory
  • 4g: 24 hours, 20 GB /scratch, 512 MB memory
  • Jobs that use the most nodes have the highest priority
  • UnderLord scheduling policy defined by Admin: Job Age, Job Duration, Queue Priority, User Share, User Priority
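
  As an illustration only, limits like these are usually expressed with qmgr on the OpenPBS/Torque server. The sketch below restates the 3g limits from the slide for the Master queue on qcd3gadm; it is not the actual JLab configuration, and the /scratch limit has no standard PBS resource, so it would be enforced by local policy.

      # Sketch: Torque/OpenPBS qmgr commands matching the 3g limits listed above
      qmgr -c "create queue Master"
      qmgr -c "set queue Master queue_type = Execution"
      qmgr -c "set queue Master resources_max.walltime = 24:00:00"
      qmgr -c "set queue Master resources_max.mem = 256mb"
      qmgr -c "set queue Master enabled = True"
      qmgr -c "set queue Master started = True"
      # The 20 GB /scratch limit is assumed to be enforced outside PBS resource limits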

  6. Sample Script – LSF
  JOBNAME: job2
  PROJECT: clas
  COMMAND: /home/user/test/job2.script
  OPTIONS: -debug
  OS: solaris
  INPUT_FILES: /mss/clas/raw/run1001.data
               /mss/clas/raw/run1002.data
               /mss/clas/raw/run1003.data
  INPUT_DATA: fort.11
  OTHER_FILES: /home/xxx/yyy/exp.database
  TOTAPE
  OUTPUT_DATA: recon.run100
  OUTPUT_TEMPLATE: /mss/clas/prod1/OUTPUT_DATA
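
  A command file like the one above is consumed by the Auger front end rather than passed to LSF directly; Auger prestages the /mss tape files through JASMine before the job is dispatched (see slide 8). A plausible submission, with the file name and exact syntax assumed since the slides only note that jsub is the user command, would be:

      jsub job2.jsub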

  7. Sample Script – PBS
  #! /bin/csh -f
  # Build an optional dependency option from PBS job IDs passed as arguments
  setenv DEPEND ""
  if ($#argv > 0) then
      setenv DEPEND "-W depend=afterok"
      foreach job ($argv)
          setenv DEPEND "${DEPEND}:$job"
      end
  endif
  # Submit a 64-node (2 processors per node) MPI test job to the Master queue
  qsub \
      -c n \
      -m ae -M akers@jlab.org \
      -l nodes=64:ppn=2,walltime=30 \
      -v SLEEPTIME=30 \
      -N MPI_CPI_Test \
      -p 1 \
      -q Master@qcdadm01 ${DEPEND} \
      /home/akers/TestJobs/MPI/cpi.csh
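
  A usage sketch, assuming the script is saved under a hypothetical name: calling it with existing PBS job IDs makes the new job wait on them via afterok, while calling it with no arguments submits immediately.

      # Sketch: chain this job after two previously submitted jobs (IDs are hypothetical)
      ./submit_cpi.csh 1234.qcdadm01 1235.qcdadm01
      # Or submit with no dependencies
      ./submit_cpi.csh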

  8. Resource Utilization
  ExpPhy
  • Efficient data flow – prestaging of data before jobs are admitted to the farm
  • Spread data over multiple file servers transparently
  • Keeps batch farm CPUs 100% busy; no waiting on data to arrive
  • Workaround: request a specific resource, such as DISK_SPACE: 125 GB, to imply the newer systems with more memory
  HPC/Lattice
  • Jobs may have an optimal resource spec but can use other configurations if the optimal one is not available
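
  To make the workaround concrete: the DISK_SPACE keyword would be added to an Auger command file like the slide 6 example, so that only the newer, larger-memory farm nodes satisfy the request. The placement and surrounding keywords below are assumptions; only the DISK_SPACE: 125 GB line itself comes from the slide.

      JOBNAME: job2
      PROJECT: clas
      COMMAND: /home/user/test/job2.script
      DISK_SPACE: 125 GB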

  9. Summary
  We would like:
  • Common job submission for users
    • for both experimental and LQCD jobs
    • for both experimental and LQCD clusters
    • for grid jobs
  • A common set of resource descriptors; a user specifies only the ones required
  We are collaborating with STAR at BNL on RDL, the Request Description Language:
  http://www.star.bnl.gov/STAR/comp/Grid/scheduler/rdl/index.html
  We will soon become an Open Science Grid site:
  http://www.opensciencegrid.org
