Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting Aug 26-27, 2004 Argonne, IL. Resource Management and Accounting Working Group. Working group scope Progress since last face-to-face Future Work Other issues. Working Group Scope.
Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting Aug 26-27, 2004 Argonne, IL
Resource Management and Accounting Working Group • Working group scope • Progress since last face-to-face • Future Work • Other issues
Working Group Scope The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components: • Queue Manager • Scheduler • Accounting and Allocation Manager • Meta Scheduler Other critical resource management components are being developed in the Process Management and Monitoring Working Group: • Process Manager • Cluster Monitor
Resource Management Component Architecture (components shown in the diagram) • Infrastructure Services • Grid Scheduler • Discovery Service • Allocation Manager • Cluster Scheduler • Information Service • Queue Manager • Node Monitor • Event Manager • Security System • Process Manager • Node Manager
Resource Management Prototype Demonstration This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit. Components shown: Job Submission Client, Discovery Service, Queue Manager, Cluster Scheduler, Allocation Manager, Node Monitor, Process Manager. Message sequence: 0 Service-Lookup • 1 Submit-Job • 2 Query-Job • 3 Query-Node • 4 Create-Reservation • 5 Run-Job • 6 Exec-Process • 7 Query-Job • 8 Delete-Job • 9 Withdraw-Allocation
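The numbered messages in the demonstration can be written down as an ordered list, together with the kind of check that would trigger the Delete-Job step. This is a minimal sketch: the step names come from the slide, while `exceeded_wallclock` and its parameters are hypothetical names introduced for illustration and are not taken from Maui or Bamboo.

```python
# Message order as numbered on the demonstration slide (0 through 9).
DEMO_STEPS = [
    "Service-Lookup",       # 0
    "Submit-Job",           # 1
    "Query-Job",            # 2
    "Query-Node",           # 3
    "Create-Reservation",   # 4
    "Run-Job",              # 5
    "Exec-Process",         # 6
    "Query-Job",            # 7
    "Delete-Job",           # 8
    "Withdraw-Allocation",  # 9
]


def exceeded_wallclock(start_time: float, wallclock_limit: float, now: float) -> bool:
    """Hypothetical check: True once a job has run longer than its limit.

    All three values are in seconds; this is an assumed policy, not the
    scheduler's actual implementation.
    """
    return (now - start_time) > wallclock_limit
```

In the demo scenario, a check of this kind would become true sometime between the second Query-Job (step 7) and Delete-Job (step 8).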
General Progress • Updated and implemented SSSRMAP v3 specifications • SSSRMAP Wire Protocol v3.0.3 • Uses chunked HTTP transfer encoding • SSSRMAP Message Format v3.0.3 • Moved condition, assignment and option values into body of Element (instead of in value attribute) • SSS Job Object v3.0.3 • Added job properties in support of input/output, interactive jobs, dynamic jobs, suspend/resume, checkpoint/restart, resource limit enforcement, partitions, charges
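To illustrate what chunked HTTP transfer encoding means on the wire, here is a small sketch of the standard HTTP/1.1 chunk framing (hexadecimal chunk size, CRLF, chunk data, CRLF, terminated by a zero-length chunk). It shows only the generic HTTP framing; the function names and the sample payload are assumptions for illustration and do not reproduce the SSSRMAP implementation.

```python
def encode_chunked(payload: bytes, chunk_size: int = 16) -> bytes:
    """Frame payload as an HTTP/1.1 chunked body (no trailers)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # Each chunk: size in hex, CRLF, data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk ends the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the payload from a chunked body (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';'.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)
```

Chunking lets a sender start transmitting a response before its total length is known, which suits large query results.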
General Progress • Completed system testing for Second Alpha Release • on xtorc-sss, a RedHat 9.0 System • Included Maui, Bamboo, Warehouse, Gold, Process Manager, etc. • Released second alpha versions of RMWG components • Fully implements version 3 of the SSSRMAP specification • Bamboo Queue Manager v0.9.6 • Maui Scheduler v3.2.6p9 (production version) • Gold Accounting and Allocation Manager v1.0.a2.1 • Warehouse System Monitor v0.7.0 • RMWG Webpage updated with Second Alpha release • Updated info, docs, downloads, etc. • Added an interactive FAQ engine (FAQOMATIC)
Cluster Scheduler Progress • Completed merger of Maui 3.2 and Maui SSS • Further added intrinsic support for SSS messages • client-server, allocation manager, queue manager, resource manager interfaces, callbacks • Status object • Error codes • Enhanced support for SSS node and job objects • allocation manager, queue manager, resource manager interfaces • extended MCom library to support additional node and job object attributes • improved socket and XML call reliability and security (added buffer checking and detailed failure reporting) • Built the SSS integration guide and updated Maui documentation
Queue Manager Progress • Third release of Bamboo made available • Supports basic SSSRMAP v3 message format • Interactive job support finished and tested • New submission client to handle LoadLeveler job scripts • Packaging updated to separate out components required on the execution nodes • Added support for job dependencies (i.e., chained jobs are now supported)
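Job dependencies of the chained kind can be modeled as a directed graph in which a job becomes eligible only after its prerequisites finish. The sketch below uses Python's standard `graphlib` module; the job names and the helper functions `run_order` and `ready_jobs` are hypothetical and do not reflect Bamboo's internal data structures.

```python
from graphlib import TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order.

    dependencies maps each job to the set of jobs it must wait for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def ready_jobs(dependencies: dict[str, set[str]], completed: set[str]) -> list[str]:
    """Jobs not yet completed whose prerequisites have all completed."""
    return sorted(
        job for job, prereqs in dependencies.items()
        if job not in completed and prereqs <= completed
    )


# A three-job chain: stage_in -> compute -> post (illustrative names).
chain = {"stage_in": set(), "compute": {"stage_in"}, "post": {"compute"}}
```

For the chain above, `run_order(chain)` yields the jobs in submission-chain order, and `ready_jobs(chain, {"stage_in"})` reports that only `compute` may start.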
Queue Manager Progress • PM interface updated to use scoping of signal • Job termination code changed to implement a “soft” kill (i.e., SIGTERM followed later by SIGKILL, if needed) • SSS suite was updated on cluster in Ames in July • Appears to resolve most known problems
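The "soft" kill pattern, SIGTERM first and SIGKILL only if the process does not exit in time, can be sketched for a single local process as follows. This is an illustration on a POSIX system using Python's `subprocess` module; the function name and the grace period are assumptions, and the actual Bamboo and Process Manager code paths are not shown in the slides.

```python
import signal
import subprocess


def soft_kill(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Terminate proc gently, escalating only if needed.

    Sends SIGTERM, waits up to grace_seconds for the process to exit,
    then sends SIGKILL. Returns the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return signal.SIGTERM
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return signal.SIGKILL
```

The grace period gives a well-behaved job a chance to flush output and clean up before it is forcibly removed.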
Accounting and Allocation Manager Progress • Completed rewrite of Gold server and all business logic in Perl • Significantly improved account/allocation design • Created an account statement report • Implemented hierarchical account nesting and tested trickle down deposits and trickle up charges • Implemented and tested credit accounts • Added support for auto-creation of users, projects and machines • Implemented automatic recursive association deletion/undeletion • Added support for query row limit, object aliases
Accounting and Allocation Manager Progress • Made compliant with SSSRMAP v3 specification • Fully implemented response chunking • Updated clients and Gold User’s Guide • Completed Allocation, Reservation, Quotation, and ChargeRates portions of GUI • Further simplified dependent module installation • Updated Component and Application Binding docs (v3.0.3) • Released Second Alpha release of Gold • Regression and system tested on RedHat 9.0 (xtorc-sss) • Upgraded Gold on PNNL SGI cluster to the latest second alpha version
Grid Scheduler Progress • Migrated grid scheduler interface to use SSS message format for all scheduler-to-grid-scheduler interface calls • Migrated Silver client commands to use the SSS MCom XML library • Enhanced global queue management • Added diagnostic clients • Verified new job management state machine
Grid Scheduler Progress • Introduced three new SSS objects • Developed new SSS time range object • Defined and implemented support for cluster-to-grid-scheduler interface reservation object • Proposed new cluster/machine object for exchanging high-level policy and resource availability information
Future Work • Beta release of all components • Including new Silver Meta-scheduler • Portability testing for new components • Tier 1: Linux::RedHat (9.0) • Tier 2: Linux::SuSE, AIX, Tru64 • Tier 3: OS-X, Unicos • Tier 4: HP-UX, IRIX, Solaris • Fault Tolerance supporting 25% cluster loss • Complete Design Specification documents for new components
Future Work Cluster Scheduler • Convert to using SSS job object for job submission and resource queries • Integrate/test Checkpoint-Restart support • Extend and mature the resource manager and grid scheduler interfaces
Future Work Queue Manager • Add job group support (mainly for submission) • Add Task Group support (in progress) • Add Job Submission filter
Future Work Accounting and Allocation Manager • Complete and test design for distributed accounting and multi-organizational involvement in job startup • Add support for multi-site authentication/authorization (each site having its own symmetric key) • Complete alpha version of GUI (fully featured) • Beta release of Gold (fully functional multi-site version with GUI) • Production deployment of Gold on 11.8TF Linux cluster (as primary allocation system) and several other sites as beta testers • Documentation to include roles and custom objects • Port Gold to other OSes (Tiers 1 and 2) • Create regression test suite (w/ APITest when ready) • Performance and scalability testing
Future Work Grid Scheduler • First SSS release of Silver Grid Scheduler • Add additional statistics clients (global information gathering and global policies) • Fault tolerance improvements • Add improved cluster level job start time estimations • Initiate evaluation of peer-to-peer grid scheduling model • Test support for Globus 3.x
Resource Limit Enforcement • Bamboo: PBS JDL Specification, add support to PM • Maui: Scheduler policies • PM: Specification language and setting OS limits at job launch (Thanks!) • Warehouse: Measure the metrics by session and job • PM: Need session id/process id mapping • Maui-Bamboo: Initialization Phase
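Setting OS limits at job launch can be illustrated with the POSIX `setrlimit` call, applied in the child process just before the job's executable starts. The sketch below uses Python's standard `resource` and `subprocess` modules on a POSIX system; `launch_with_cpu_limit` and its parameters are hypothetical names, and the Process Manager's own specification language is not shown in the slides.

```python
import resource
import subprocess


def launch_with_cpu_limit(argv, cpu_seconds, **popen_kwargs):
    """Start argv with a soft CPU-time limit of cpu_seconds (POSIX only).

    The limit is applied in the child between fork and exec, so it affects
    the launched job and not the launcher itself.
    """
    def apply_limits():
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        # The soft limit may not exceed the hard limit inherited from the parent.
        if hard == resource.RLIM_INFINITY:
            soft = cpu_seconds
        else:
            soft = min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    return subprocess.Popen(argv, preexec_fn=apply_limits, **popen_kwargs)
```

A process that exceeds its soft CPU limit receives SIGXCPU from the kernel, so enforcement does not depend on the launcher polling the job.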
Dynamic Jobs • Malleable Jobs – Ability to change size and duration up until start • Dynamically Modifiable Jobs – Change attributes while idle or running • Dynamic Jobs – Job changes its size and duration itself while running • Bamboo: Needs to add support for opaque extension attributes and QOS as well as dynamically modifiable jobs • Maui: Policy support (growth bounds, QOS/queue support) • PM: For dynamic jobs, MPI needs to handle growth/shrinkage and have that information reported to QM • Warehouse: Aggregate statistics by session id, job id and process id • (We need to know the model for dynamic job support with MPI)
Checkpoint/Restart {Suspend/Resume, Preempt/Restart, Checkpoint/Continue}? {System Initiated, User Initiated}? • Bamboo: How to specify in JDL that a job is checkpointable (and possibly other parameters such as the filesystem) • Bamboo-Maui: Need to track how much walltime was used before the checkpoint, without counting checkpoint idle time • Maui: Policy handling • Needs to know which resources are released when a job is suspended • Checkpoint Manager: Status from Berkeley? Can we reattempt the checkpoint/restart test Thursday evening?
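The walltime bookkeeping raised for Bamboo-Maui can be sketched by recording only the intervals during which a job actually ran and summing those, so that time spent idle after a checkpoint is never charged against the limit. The `RunInterval` type and both functions are hypothetical names for illustration; how Bamboo and Maui would actually exchange this information is an open question in the slides.

```python
from dataclasses import dataclass


@dataclass
class RunInterval:
    """One stretch of execution, in seconds since job submission."""
    start: float
    end: float


def walltime_used(intervals: list[RunInterval]) -> float:
    """Total time spent running; gaps between intervals are not counted."""
    return sum(i.end - i.start for i in intervals)


def walltime_remaining(limit: float, intervals: list[RunInterval]) -> float:
    """Walltime still available to the job after a restart (never negative)."""
    return max(0.0, limit - walltime_used(intervals))


# Ran 10 minutes, sat checkpointed for 20 minutes, then ran 10 more minutes.
history = [RunInterval(0.0, 600.0), RunInterval(1800.0, 2400.0)]
```

With a one-hour limit, the job in `history` has used 1200 seconds and keeps 2400 seconds, even though 2400 seconds of real time have elapsed.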
Other Issues • Supercomputing demos