This project aims to develop a low-cost batch system that meets the job-submission needs of several large-scale computing projects. It would replace LSF with a system based on the open-source Portable Batch System (PBS), support parallel jobs, and integrate with the existing JOBS user layer. Planned additions include a meta-facility framework spanning labs, hierarchical schedulers, and file pre-staging, with a first full version targeted for mid 2000.
An Alternative Batch System
Ian Bird, JLAB
HEPiX-SLAC, Oct 6-8, 1999
Rationale
• LSF is (too) expensive for large farms where relatively few of its capabilities are used
• HPC group needs a job-submission scheme spanning labs
  • LQCD project – JLAB/MIT
  • Supports parallel jobs
• PPDG (NGI) needs a batch system as part of the tools – meta-facility
• MIT (Nuclear Physics) needs a replacement for NQS on their central clusters
• Nice to have an open source / free codebase that can be developed at will
System Goals
• Provide a local batch system that replaces most functions of LSF
• Provide a meta-facility framework
• Support parallel jobs (MPI, PVM at least)
• Interface with current user layer (JOBS) to include those additional features
• Timescale
  • Simple replacement of LSF in JOBS – early 2000
  • Reasonable prototype – April 2000
  • First full version – mid 2000
JOBS
• Provide batch system + JOBS layer (optional)
• JOBS:
  • Job arrays (this is now in LSF)
  • Wide availability
  • Clients do not need LSF license
  • Web interface
  • Transparency of file location (obtain from mss)
  • Expand to provide data pre-staging
  • Talks to mss
  • Does not bind batch system to mss
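The job-array feature listed for the JOBS layer can be illustrated with a minimal sketch. It assumes a job array is a single request that expands into indexed sub-jobs sharing one command template. The names `SubJob` and `expand_job_array`, and the `{index}` placeholder, are hypothetical and are not taken from JOBS, LSF, or PBS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    """One element of a job array (hypothetical structure for illustration)."""
    array_id: str
    index: int
    command: str


def expand_job_array(array_id, command_template, first, last):
    """Expand one job-array request into individual sub-jobs.

    The template uses an assumed '{index}' placeholder that is replaced
    by each index in the inclusive range first..last.
    """
    if first > last:
        raise ValueError("first index must not exceed last index")
    return [
        SubJob(array_id, i, command_template.format(index=i))
        for i in range(first, last + 1)
    ]


# Illustrative request: three sub-jobs, one per input file.
jobs = expand_job_array("recon", "analyze run_{index}.dat", 1, 3)
for job in jobs:
    print(job.array_id, job.index, job.command)
```

Doing the expansion in the user layer, rather than in the batch system, is one way such a feature could stay available when the underlying batch system is swapped.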
Description
• Base code on PBS
  • Portable Batch System – NASA
• Additional facilities required
  • Schedulers – hierarchical
  • Authentication, particularly cross-site and remote access
    • Globus security package is strong candidate
  • Meta-facility including master scheduler
  • Replace basic accounting package
    • Needed for realistic schedulers and good resource accounting
  • Tape and file pre-staging
    • Locally and networked
  • Full APIs
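The idea of hierarchical schedulers with a meta-facility master scheduler can be sketched as follows. This is a toy illustration under stated assumptions, not the design described in the slides: the classes `SiteScheduler` and `MasterScheduler`, the "most free slots first" policy, and the slot counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SiteScheduler:
    """Local scheduler for one site (hypothetical model: a pool of slots)."""
    name: str
    free_slots: int
    queue: list = field(default_factory=list)

    def submit(self, job_name, slots):
        """Accept the job if enough slots are free; return True on success."""
        if slots > self.free_slots:
            return False
        self.free_slots -= slots
        self.queue.append(job_name)
        return True


class MasterScheduler:
    """Top-level scheduler that forwards jobs to site schedulers."""

    def __init__(self, sites):
        self.sites = list(sites)

    def dispatch(self, job_name, slots):
        """Try sites with the most free slots first (assumed policy).

        Returns the name of the accepting site, or None if no site fits.
        """
        by_free = sorted(self.sites, key=lambda s: s.free_slots, reverse=True)
        for site in by_free:
            if site.submit(job_name, slots):
                return site.name
        return None


# Illustrative slot counts; the site names only echo the labs in the slides.
master = MasterScheduler([SiteScheduler("jlab", 8), SiteScheduler("mit", 4)])
first = master.dispatch("lqcd-1", 6)
second = master.dispatch("lqcd-2", 4)
third = master.dispatch("lqcd-3", 4)
print(first, second, third)
```

A real master scheduler would also need the cross-site authentication and resource accounting listed above; the sketch leaves both out.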
Current Status
• Fleshing out requirements document
• Begin replacement of LSF in JOBS
  • To demonstrate feasibility of PBS
• Start work on scheduling algorithms
• ~2 FTE at JLAB