An Alternative Batch System Ian Bird JLAB HEPiX-SLAC Oct 6-8, 1999
Rationale
• LSF is (too) expensive for large farms where relatively few of its capabilities are used
• The HPC group needs a job-submission scheme spanning labs
  • LQCD project – JLAB/MIT
  • Support for parallel jobs
• PPDG (NGI) needs a batch system as part of its tools – a meta-facility
• MIT (Nuclear Physics) needs a replacement for NQS on its central clusters
• Nice to have an open-source / free codebase that can be developed at will
System Goals
• Provide a local batch system that replaces most functions of LSF
• Provide a meta-facility framework
• Support parallel jobs (at least MPI and PVM)
• Interface with the current user layer (JOBS) to include those additional features
• Timescale
  • Simple replacement of LSF in JOBS – early 2000
  • Reasonable prototype – April 2000
  • First full version – mid 2000
JOBS
• Provide batch system + JOBS layer (optional)
• JOBS:
  • Job arrays (this is now in LSF)
  • Wide availability
    • Clients do not need an LSF license
    • Web interface
  • Transparency of file location (obtained from the MSS)
  • Expand to provide data pre-staging
    • Talks to the MSS
    • Does not bind the batch system to the MSS
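The slides do not say how JOBS implements job arrays. One common approach, assumed here, is for the front-end layer to expand an array request into individual batch submissions, substituting the array index into each job. The `JobSpec` type, the `{index}` placeholder, and `expand_job_array` are all hypothetical names for illustration; each expanded job would then be handed to the batch system's submit command (for PBS, `qsub`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobSpec:
    name: str
    command: str  # may contain the hypothetical placeholder "{index}"


def expand_job_array(template: JobSpec, first: int, last: int) -> List[JobSpec]:
    """Expand an array request into one JobSpec per index in [first, last]."""
    if last < first:
        raise ValueError("last index must be >= first index")
    return [
        JobSpec(
            name=f"{template.name}[{i}]",
            # str.replace leaves any other braces in the command untouched
            command=template.command.replace("{index}", str(i)),
        )
        for i in range(first, last + 1)
    ]


if __name__ == "__main__":
    array = JobSpec("recon", "recon_run --input run{index}.dat")
    for job in expand_job_array(array, 1, 3):
        # Each element would be passed to the batch system's submit step.
        print(job.name, job.command)
```

Doing the expansion in the user layer keeps the underlying batch system unchanged, which fits the goal of layering JOBS features over whichever batch system sits beneath it.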
Description
• Base code on PBS
  • Portable Batch System – NASA
• Additional facilities required
  • Schedulers – hierarchical
  • Authentication
    • particularly cross-site and remote access
    • the Globus security package is a strong candidate
  • Meta-facility including master scheduler
  • Replace basic accounting package
    • needed for realistic schedulers and good resource accounting
  • Tape and file pre-staging
    • Locally and networked
  • Full APIs
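As a rough illustration of a hierarchical scheme with a master scheduler, the sketch below assumes a two-level design: the master sees only a summary (free slots) from each site's local batch system and forwards each job to the site with the most free slots that can hold it, while each site's local scheduler (not modelled) handles placement within the site. The `Site`, `Job`, and `MasterScheduler` names, the site names, and the greedy most-free-slots policy are hypothetical, not the design these slides describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    name: str
    free_slots: int
    queued: List[str] = field(default_factory=list)


@dataclass
class Job:
    job_id: str
    slots: int  # CPUs requested; >1 for a parallel (e.g. MPI) job


class MasterScheduler:
    """Top level of a two-level hierarchy: it only chooses a site."""

    def __init__(self, sites: List[Site]):
        self.sites = sites

    def dispatch(self, job: Job) -> Optional[str]:
        candidates = [s for s in self.sites if s.free_slots >= job.slots]
        if not candidates:
            return None  # hold at the meta-level until capacity frees up
        # Greedy policy (an assumption): pick the site with the most free slots.
        best = max(candidates, key=lambda s: s.free_slots)
        best.free_slots -= job.slots
        best.queued.append(job.job_id)
        return best.name


if __name__ == "__main__":
    master = MasterScheduler([Site("jlab", 8), Site("mit", 4)])
    for job in [Job("lqcd-1", 6), Job("ana-1", 3), Job("ana-2", 4)]:
        print(job.job_id, "->", master.dispatch(job))
```

A real meta-facility would also need the cross-site authentication and accounting listed above before forwarding jobs between labs; the sketch ignores both.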
Current Status
• Fleshing out the requirements document
• Begin replacement of LSF in JOBS
  • To demonstrate the feasibility of PBS
• Start work on scheduling algorithms
• ~2 FTE at JLAB