LoadLeveler 3.3 Dr. Roland Kunz, IT Specialist Roland_Kunz@de.ibm.com
LoadLeveler deliverables: LoadLeveler V3.3 (GA 4/05) • Highlights • Scheduler Improvements • Preemption using Backfill Scheduler • Advance Reservation • Job Launch Performance Improvements • Additional Policy Control and Usability Enhancements • Grid interaction
Advance Reservation • Overview of Advance Reservation • Administrator Externals • User Externals • Misc.
Overview • Advance Reservation == Reservation (Terms used interchangeably) • Satisfy Grid Computing Customer Requirements to Reserve Resources for Jobs • Can Reserve Computing Resources (Nodes) for • Workload • Maintenance • … • For BACKFILL Scheduler Only
Reservation Request • Reservation: A set of nodes reserved for a period of time • Unique Reservation ID: c94n04.2.r • Add creation time to make it truly unique • Start Time: 03/23/2005 10:00 • Duration: 120 Minutes • A List of Reserved Nodes: c94n04 c94n03
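As an illustration only (a sketch based on the llmkres options shown later in this deck; the exact date/time format accepted by -t may differ on your installation), a request like the one above could be created with:

    llmkres -t "03/23/2005 10:00" -d 120 -n 2

Here -t gives the start time, -d the duration in minutes, and -n the number of nodes; the reservation ID (c94n04.2.r) is assigned by LoadLeveler.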
Reservation Owner and Group • Owner: Peter • usually the user who made the reservation • can be changed by LoadLeveler administrators • can run jobs in the reservation • can modify or cancel the reservation • can allow other users to use the reservation • Group Owner: Research_Group • LoadLeveler group, not AIX/Linux group • For quota checking only
Reservations • Reservations Do Not Overlap • (Diagram: reservations c94n03.5.r, c94n04.3.r, and c94n04.5.r plotted on a nodes-versus-time chart, with no overlaps)
Setup State of the Reservation • To get reserved nodes ready: • Preempt running jobs, if any, on the reserved nodes • Uses DEFAULT_PREEMPT_METHOD • Check the status of reserved nodes • Send e-mail to the owner and LoadLeveler administrators if • A node is not usable • A non-preemptable job is running
Active State of the Reservation • Schedule Jobs to Run • No System Preemption in Reservation • Manual Preemption by llpreempt is allowed • Jobs should not require resources other than those on reserved machines.
Admin Control – User/Group • Must Set Up Quotas: • max_reservations in the user and group stanzas • Example 1: Every user can make one reservation: • default: type = user • max_reservations = 1 • Example 2: A group of users can make 10 reservations in total: • res_group: type = group • max_reservations = 10 • include_users = carol dave alex rich
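Putting the two examples together, a LoadL_admin sketch might look like this (stanza names and the user list are from the slide above; adapt to your site):

    default:   type = user
               max_reservations = 1

    res_group: type = group
               max_reservations = 10
               include_users = carol dave alex rich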
Config Control - RESERVATION_MIN_ADVANCE_TIME • Default: Reservation Can Start As Soon As Possible • RESERVATION_MIN_ADVANCE_TIME = 0 • No advance time is required • To Require a Reservation Be Made One Day In Advance: • RESERVATION_MIN_ADVANCE_TIME = 1440
Accounting • To Collect Reservation History: • Set ACCT = A_RES in LoadL_config to record reservation usage data • The RESERVATION_HISTORY keyword in LoadL_config can be used to define the name of a file containing the local history of reservations • The llacctmrg -R command can be used to merge reservation history files, similar to job history files
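A minimal LoadL_config sketch using the two keywords above (the history file path is only an illustration, not a default):

    ACCT                = A_RES
    RESERVATION_HISTORY = /var/loadl/reservation_history

Reservation history files collected this way can then be merged with llacctmrg -R, just as job history files are merged.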
Common Questions • How many reservations can a user make, and in which groups? • How long can a reservation be? • Can jobs expected to run beyond the reservation end time be allowed to run? • How far in advance must a reservation be made? • Note: floating consumable resources should not be used by jobs in a reservation
User Commands • Use llmkres to Make a New Reservation • Use llchres to Modify an Existing Reservation • Use llrmres to Remove an Existing Reservation • Use llbind to Bind Jobs to an Existing Reservation • Use LL_RES_ID=<Reservation ID> llsubmit to Submit Jobs to an Existing Reservation • Use llqres to Query Reservations • Use llq -l to Check Whether a Job Step is Bound to a Reservation
llmkres Examples • Reserve 2 nodes at 2pm today for 60 minutes • llmkres -t 14:00 -d 60 -n 2 • Reserve a Specific Node c94n04 • llmkres -t "03/23/2005 14:00" -d 120 -h c94n04 • Reserve All Available Nodes • llmkres -t 09:00 -d 60 -h all • Reserve Nodes to Run Job Step c94n04.3.0 • llmkres -t 14:00 -d 60 -j c94n04.3.0
llchres Examples • To Move the Start Time of Reservation c94n04.2.r Earlier by 60 Minutes • llchres -t -60 -R c94n04.2.r • To Add a Reserved Node: • llchres -n +1 -R c94n04.2.r • To Remove a Specific Reserved Node: • llchres -h -c94n04 -R c94n04.2.r
llrmres Examples • To Remove Reservation c94n04.2.r • llrmres -R c94n04.2.r • To Remove All Reservations: • llrmres -R all • Note: with -R all, regular users remove only their own reservations; LoadLeveler administrators remove all reservations
llbind and llsubmit Examples • To Submit a Job to Reservation c94n04.2.r • LL_RES_ID=c94n04.2.r llsubmit weather.cmd • To Bind an Idle Job Step to Reservation c94n04.2.r • llbind -R c94n04.2.r c94n03.1.0 • To Unbind Job Steps from Reservation c94n03.1.r • llbind -r c94n03.1.r c94n04.1
API • Reservation APIs • Sample: /usr/lpp/LoadL/full/samples/llres/res.c • RESERVATIONS query in Data Access APIs • /usr/lpp/LoadL/full/samples/lldata_access/qres.c • See LoadLeveler U&A Guide for More Information
Preemption with Backfill Scheduler • Similar concept to the Gang Scheduler • Options to terminate jobs in addition to suspending them • Support for AIX and Linux • Suspend not supported on Linux • Configuration keyword PREEMPTION_SUPPORT • Applicable to the Gang Scheduler only; default for the Gang Scheduler is FULL • When using the backfill scheduler, either do not set it or set it to NONE • Configuration keyword DEFAULT_PREEMPT_METHOD • Default is suspend (su) • Must be set explicitly on Linux, where suspend is not supported
Preemption Methods • Configuration Keyword (for Backfill Scheduler Only) - DEFAULT_PREEMPT_METHOD = rm | sh | su | vc | uh • Preemption Methods • Remove (rm) • System Hold (sh) • Suspend (su) • Vacate (vc) • User Hold (uh)
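A hedged LoadL_config sketch for a backfill setup based on the keywords above (SCHEDULER_TYPE is the usual keyword for selecting the scheduler and is assumed here; choosing vacate is only an example):

    SCHEDULER_TYPE         = BACKFILL
    DEFAULT_PREEMPT_METHOD = vc
    # PREEMPTION_SUPPORT is left unset (or set to NONE) with the backfill scheduler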
Preemption Commands and API • llpreempt command • llpreempt -? | -H | -v | [-q] [-r | -m method] { [-u userlist] [-h hostlist] | [joblist] } • The -r option applies only to jobs preempted by suspend • Use llhold -r to resume jobs preempted by system hold or user hold • Jobs preempted by remove must be resubmitted • Jobs preempted by vacate will restart when resources are available • ll_preempt_jobs API • int ll_preempt_jobs (int version, void *errObj, LL_preempt_param **param); • Replaces the ll_preempt API
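A few command sketches built from the synopsis above (the job step ID and user name are made up for illustration):

    llpreempt -m vc c94n04.5.0     # preempt one job step using the vacate method
    llpreempt -u peter             # preempt all job steps belonging to user peter
    llpreempt -r c94n04.5.0        # resume a job step that was preempted by suspend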
llmodify and llq • llmodify command • llmodify [-?] | [-H] | [-v] | [-q] | {-x <execution_factor> | -c <consumable_cpus> | -m <consumable_real_memory> | -W <wclimit_add_min> | -C <job_class> | -a <account_no> | -s <q_sysprio> | -p {preempt|nopreempt} } <jobstep> • llq command • llq -l from Central Manager • Preemptable: yes • llq -s • This job step is scheduled to run but is waiting for the following job steps to be preempted:
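For example (the job step ID is illustrative), a step can be excluded from preemption with -p and the setting checked from the central manager:

    llmodify -p nopreempt c94n04.7.0
    llq -l c94n04.7.0 | grep Preemptable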
Accounting • Correlating AIX and LoadLeveler Accounting Records • Find all AIX accounting records for all processes in a LoadLeveler job • Add a unique identifier to both AIX and LoadLeveler accounting • LoadLeveler invokes setsubproj() to set the accounting key in AIX • Supported in AIX 5.2 and AIX 5.3 • AIX library dynamically loaded: /usr/lib/libaacct.a • Use llsummary -l to find the accounting key in the LoadLeveler history file • Job Step Id: c188f2n07.ppd.pok.ibm.com.1.0 • Step Name: 0 • Queue Date: Wed Feb 9 15:12:15 EST 2005 • Job Accounting Key: 4758737585449234063 • …
Modifying System Priority • Modifying job priorities • llmodify [-?] | [-H] | [-v] | [-q] | {-x <execution_factor> | -c <consumable_cpus> | -m <consumable_real_memory> | -W <wclimit_add_min> | -C <job_class> | -a <account_no> | -s <q_sysprio> | -p {preempt|nopreempt} } <jobstep> • LL_MODIFY_SYSPRIO enum with the ll_modify API • The new system priority is fixed • llq -l from the Central Manager • System Priority: -1560 • q_sysprio: 97 • Previous q_sysprio: -1560
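A hedged example: assign a fixed system priority to a job step and check the result from the central manager (the step ID is made up):

    llmodify -s 100 c94n04.8.0
    llq -l c94n04.8.0 | grep -i prio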
Integration in a Grid environment • Running LoadLeveler jobs on the IBM Grid Toolbox (Globus) • LoadLeveler ships a GAR to deploy (llgrid.gar) • The GAR contains loadleveler.pm, globus-gram-loadleveler-provider, rips-loadleveler-provider.xml, mjs-ll-server-deploy.wsdd, server-deploy.wsdd, and deploy/loadleveler-preDeploy.sh • The IBM Grid Toolbox must be installed, and gars/mmjfs.gar and gars/gram-rips.gar must be deployed, before deploying the LoadLeveler GAR • Log in as ibmgrid, the owner ID of the Grid Toolbox • export GLOBUS_LOCATION=/opt/IBMGrid • Copy llgrid.gar into the $GLOBUS_LOCATION/gars directory: cp /usr/lpp/LoadL/full/lib/llgrid.gar $GLOBUS_LOCATION/gars/llgrid.gar • As ibmgrid, run the following commands to deploy llgrid.gar: cd $GLOBUS_LOCATION; . igt-setenv.sh
Integration in a Grid environment • igt-deploy-gar gars/llgrid.gar • After deployment, two new files exist: • $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm • $GLOBUS_LOCATION/etc/globus-gram-loadleveler-provider • The LoadlevelerManagedJobFactoryService service name is added in $GLOBUS_LOCATION/local-server-config.wsdd • The MasterLoadlevelerManagedJobFactoryService service is added in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/server-config.wsdd • The LoadLevelerInformation service data provider is enabled in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml
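The deployment steps from the two slides above, consolidated into a single shell sketch (paths and the ibmgrid user ID are as given in the slides):

    # run as ibmgrid, the owner ID of the IBM Grid Toolbox
    export GLOBUS_LOCATION=/opt/IBMGrid
    cp /usr/lpp/LoadL/full/lib/llgrid.gar $GLOBUS_LOCATION/gars/llgrid.gar
    cd $GLOBUS_LOCATION
    . igt-setenv.sh
    igt-deploy-gar gars/llgrid.gar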
Integration in a Grid environment • If $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml contains handler="jobDataHandler", remove that handler entry from the file