Jefferson Lab and the Portable Batch System • Walt Akers • High Performance Computing Group
Jefferson Lab and PBS: Motivating Factors • New Computing Cluster • Alpha-Based Compute Nodes • 16 XP1000 Single-Processor Nodes (LINPACK 5.61 GFlop/Sec) • 8 UP2000 Dual-Processor Nodes (LINPACK 7.48 GFlop/Sec) • Heterogeneous Job Mix • Combination of Parallel and Non-Parallel Jobs • Job execution times range from a few hours to weeks • Data requirements range from minimal to several gigabytes • Modest Budget • Much of our funding was from internal sources • Initial hardware expense was relatively high • Expandability • Can the product be expanded from a few nodes to hundreds?
Jefferson Lab and PBS: Alternative Systems • PBS - Portable Batch System • Open Source Product Developed at NASA Ames Research Center • DQS - Distributed Queuing System • Open Source Product Developed by SCRI at Florida State University • LSF - Load Sharing Facility • Commercial Product from Platform Computing • Already Deployed by the Computer Center at Jefferson Lab • Codine • Commercial Version of DQS from Gridware, Inc. • Condor • A Restricted Source ‘Cycle Stealing’ Product From The University of Wisconsin • Others Too Numerous To Mention
Jefferson Lab and PBS: Why We Chose PBS • Portability • The PBS distribution compiled and ran immediately on both the 64-bit Alpha and 32-bit Intel platforms. • Documentation • PBS comes with comprehensive documentation, including an Administrator’s Guide, External Reference, and Internal Reference. • Active Development Community • There is a large community worldwide that continues to improve and refine PBS. • Modularity • PBS is a component-oriented system. • A well-defined API is provided to allow components to be replaced with locally defined modules. • Open Source • The source code for the PBS system is available without restriction. • Price • Hey, it’s free…
Jefferson Lab and PBS: The PBS View Of The World • PBS Server • Mastermind of the PBS System • Central Point of Contact • PBS Scheduler • Prioritizes Jobs • Signals Server to Start Jobs • Machine Oriented Mini-Server (MOM) • Executes Scripts on Compute Nodes • Performs User File Staging
Jefferson Lab and PBS: The PBS Server • Routing Queues • Can move jobs between multiple PBS Servers • Execution Queues • Defines default characteristics for submitted jobs • Defines a priority level for queued jobs • Holds jobs before, during and after execution • Node Capabilities • The server maintains a table of nodes, their capabilities and their availability. • Job Requirements • The server maintains a table of submitted jobs that is independent of the queues. • Global Policy • The server maintains global policies and default job characteristics.
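The server’s node table can be pictured as a simple matchmaking step: job requirements are checked against node capabilities and availability. The sketch below is purely illustrative — the node names, dictionary fields, and `eligible_nodes` helper are invented for this example and are not actual PBS data structures.

```python
# Hypothetical sketch of matching job requirements against the
# server's table of nodes. Fields and names are illustrative only.

NODES = [
    {"name": "xp1000-01", "cpus": 1, "free": True},   # single-processor node
    {"name": "up2000-01", "cpus": 2, "free": True},   # dual-processor node
    {"name": "up2000-02", "cpus": 2, "free": False},  # busy node
]

def eligible_nodes(job_cpus, nodes):
    """Return the names of free nodes with enough processors for the job."""
    return [n["name"] for n in nodes if n["free"] and n["cpus"] >= job_cpus]

print(eligible_nodes(2, NODES))  # → ['up2000-01']
```

In the real system the server maintains this table and the scheduler consults it; the point here is only that node capabilities and job requirements are tracked independently of the queues.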
Jefferson Lab and PBS: The PBS Scheduler • Prioritizes Jobs • Called periodically by the PBS Server • Downloads job lists from the server and sorts them based on locally defined requirements. • Tracks Node Availability • Examines executing jobs to determine projected availability time for nodes. • Using this data, the scheduler can calculate future deployments and determine when back-filling should be performed. • Recommends Job Deployment • At the end of the scheduling cycle, the scheduler will submit a list of jobs that can be started immediately to the server. • The PBS Server is responsible for verifying that the jobs can be started, and then deploying them.
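The back-filling idea above can be sketched in a few lines: from the running jobs’ projected completion times, compute when enough nodes will be free for the highest-priority job, then allow a smaller job to start now only if it will finish before that time. This is a simplified model written for this summary — the function names and numbers are invented, not PBS code.

```python
# Hedged sketch of backfill scheduling: a short job may jump ahead of
# the top-priority job only if it cannot delay that job's start.

def earliest_start(free_now, releases, needed):
    """Projected time at which `needed` nodes are available.
    releases: list of (time, nodes_released), sorted by time."""
    avail, t = free_now, 0
    for when, n in releases:
        if avail >= needed:
            return t
        avail += n
        t = when
    return t if avail >= needed else None  # None: never enough nodes

def can_backfill(job_nodes, job_walltime, free_now, releases, top_needed):
    """True if the job fits in the currently free nodes and completes
    before the projected start of the highest-priority job."""
    top_start = earliest_start(free_now, releases, top_needed)
    return job_nodes <= free_now and (top_start is None or job_walltime <= top_start)
```

For example, with 4 nodes free now and 16 needed by the top job once two running jobs release nodes at t=3600 and t=7200, a 2-node job with a 3000-second walltime can backfill, while a 9000-second job cannot.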
Jefferson Lab and PBS: Machine Oriented Mini-Server • Executes Scripts • At the direction of the PBS Server, MOM executes the user-provided scripts • For parallel jobs, the primary MOM (Mother Superior) starts the jobs on itself and all other assigned nodes. • Stages Data Files • Prior to script execution, the MOM is responsible for remotely copying user-specified data files to Mother Superior. • Following execution, the resultant data files are remotely copied back to the user-specified host. • Tracks Resource Usage • MOM tracks the CPU time, wall time, memory and disk that has been used by the job. • Kills Rogue Jobs • Kills jobs at the PBS Server’s request
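The MOM’s stage-in / execute / stage-out lifecycle can be modeled as below. This is a toy model written for this summary, not the actual PBS MOM implementation: the `run_job` function and its parameters are invented, and real staging uses remote copies between hosts rather than local copies.

```python
# Simplified, local-only model of the MOM job lifecycle described above:
# stage inputs in, run the user script, record wall time, stage results out.
import os
import shutil
import subprocess
import time

def run_job(script_text, stagein, stageout, workdir):
    """stagein/stageout: lists of (src, dst) file paths to copy.
    Returns the wall time used by the script (one of the resources
    MOM tracks, alongside CPU time, memory, and disk)."""
    for src, dst in stagein:                     # stage inputs to execution host
        shutil.copy(src, os.path.join(workdir, dst))
    script = os.path.join(workdir, "job.sh")
    with open(script, "w") as f:
        f.write(script_text)
    start = time.time()
    subprocess.run(["sh", script], cwd=workdir, check=True)
    wall = time.time() - start                   # track resource usage
    for src, dst in stageout:                    # stage results back out
        shutil.copy(os.path.join(workdir, src), dst)
    return wall
```

As the slide notes, this is also where things go wrong in practice: a failed copy in either staging phase leaves the job in a bad state, which is why correct staging directives are the submitter’s responsibility.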
Jefferson Lab and PBS: What We’ve Learned So Far • PBS Is Reasonably Reliable, But Has Room For Improvement • PBS Server and PBS Scheduler components work well and behave predictably • PBS MOM works okay, but behaves bizarrely in certain situations • Disk full = chaos • Out of process slots = chaos • Improper file transfer or staging = chaos Note: The first two can be avoided by conscientious system management; the last is the responsibility of the job submitter. • Red Hat Linux 6.2 • We’ve seen many problems associated with NFS. After upgrading to kernel 2.2.16-3 many of these problems went away. • Klogd occasionally spins out of control and uses all available CPU cycles. • Sshd on SMP machines dies for no apparent reason. • Crontab works intermittently on SMP nodes. We’re considering experimenting with Tru64 UNIX to see whether these problems exist there. • Writing a Scheduler Is Hard Work • We have developed two interim schedulers and are now working on the ‘final’ implementation.
Jefferson Lab and PBS: Ongoing Development • Underlord Scheduling System • Built on the existing PBS Scheduler Framework • Plug-in replacement for the default scheduler • Uses an object-oriented interface to the PBS Server • Comprehensive matchmaking scheme • Starts from an ordered list of jobs • Works with a collection of homogeneous or heterogeneous nodes • Locates the optimal node or combination of nodes where a job should be deployed • Uses user-specified job parameters to project future job deployment • Uses future job scheduling in combination with backfilling to maximize system utilization. • Multi-layered job sorting algorithm • Time in queue • Projected execution time • Number of processors requested • Queue priority • Progressive user share (similar to the LSF scheme) • Generates a projection table • Allows users to determine when their job is projected to start
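A multi-layered sort like the one listed above amounts to ordering jobs by a tuple of keys. The sketch below is illustrative only — the field names, the particular key order, and the sample jobs are invented for this example, and it omits the progressive-user-share layer.

```python
# Illustrative multi-layered job sort: higher queue priority first,
# longer-waiting first, then shorter projected runtime and fewer
# processors as tie-breakers. Field names are invented.

def sort_jobs(jobs):
    return sorted(jobs, key=lambda j: (
        -j["queue_priority"],     # layer 1: queue priority (descending)
        -j["time_in_queue"],      # layer 2: time in queue (descending)
        j["projected_runtime"],   # layer 3: projected execution time
        j["nprocs"],              # layer 4: processors requested
    ))

jobs = [
    {"id": "a", "queue_priority": 1, "time_in_queue": 100, "projected_runtime": 60,  "nprocs": 4},
    {"id": "b", "queue_priority": 2, "time_in_queue": 10,  "projected_runtime": 600, "nprocs": 1},
    {"id": "c", "queue_priority": 1, "time_in_queue": 100, "projected_runtime": 30,  "nprocs": 2},
]
print([j["id"] for j in sort_jobs(jobs)])  # → ['b', 'c', 'a']
```

Because Python’s sort is stable and compares tuples element by element, each layer only breaks ties left by the layers before it, which is exactly the multi-layered behavior described.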
Jefferson Lab and PBS: Future Directions • Data Grid Server • In order to provide greater flexibility to the Batch System and allow it to accommodate data provided through the proposed Data Grid system, a Data Grid Server will be added to the existing system components. • This module will have the following capabilities • Will provide time projections for when data will be available • Will perform data migration to a script-accessible host • Will provide mechanisms to transfer resultant data to a specified location • Will replace the existing staging capabilities of the PBS Server and PBS MOM. • PBS Meta-Facility - The Overlord Scheduler • The Overlord Scheduler will be a centralized point to which jobs are submitted and from which they can be forwarded to other PBS Clusters for execution. The Overlord Scheduler will have the following capabilities. • Will prioritize and sort all jobs based on global Meta-Facility rules • Will consider job requirements, data location, and network throughput, and will forward each job to the PBS Server where it will be scheduled earliest. • Will not forward a job to one of the ‘Underlord’ systems until it is eligible for immediate execution there. • We don’t have all of this figured out yet… but, we are confident.
Jefferson Lab and PBS: Places On The Web • Jefferson Lab HPC Home Page • http://www.jlab.org/hpc • Currently we have most of the PBS documentation and some statistics about our cluster and its development. • PBS Home Page • http://www.openpbs.org • Register and download PBS and all documentation from this site.