
LoadLeveler 3.3

LoadLeveler 3.3. Dr. Roland Kunz, IT Specialist, Roland_Kunz@de.ibm.com. LoadLeveler deliverables and highlights: scheduler improvements, preemption using the Backfill scheduler, Advance Reservation, job launch performance improvements, and additional policy control and usability enhancements.


Presentation Transcript


  1. LoadLeveler 3.3 • Dr. Roland Kunz, IT Specialist • Roland_Kunz@de.ibm.com

  2. LoadLeveler deliverables • Highlights • Scheduler Improvements • Preemption using Backfill Scheduler • Advance Reservation • Job Launch Performance Improvements • Additional Policy Control and Usability Enhancements • Grid interaction LoadLeveler V3.3 (GA 4/05)

  3. Linux Deliverables in 2004

  4. LL V3.3 on Linux – GA 8/05

  5. Backfill scheduler

  6. Advance Reservation

  7. Advance Reservation • Overview of Advance Reservation • Administrator Externals • User Externals • Misc.

  8. Overview • Advance Reservation == Reservation (Terms used interchangeably) • Satisfy Grid Computing Customer Requirements to Reserve Resources for Jobs • Can Reserve Computing Resources (Nodes) for • Workload • Maintenance • … • For BACKFILL Scheduler Only

  9. Reservation Request • Reservation: A set of nodes reserved for a period of time • Unique Reservation ID: c94n04.2.r • Add creation time to make it truly unique • Start Time: 03/23/2005 10:00 • Duration: 120 Minutes • A List of Reserved Nodes: c94n04 c94n03
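
The fields listed on this slide can be modeled as a small record. The sketch below is Python, not anything LoadLeveler ships: the class name `Reservation` and its attribute names are hypothetical, and only the example values are taken from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    """Hypothetical in-memory model of a reservation request (not a LoadLeveler type)."""
    res_id: str        # e.g. "c94n04.2.r"
    start: datetime    # reservation start time
    duration_min: int  # duration in minutes
    nodes: tuple       # names of the reserved nodes

    @property
    def end(self) -> datetime:
        # End time is the start time plus the duration.
        return self.start + timedelta(minutes=self.duration_min)


res = Reservation(
    res_id="c94n04.2.r",
    start=datetime.strptime("03/23/2005 10:00", "%m/%d/%Y %H:%M"),
    duration_min=120,
    nodes=("c94n04", "c94n03"),
)
print(res.end)  # 2005-03-23 12:00:00
```

Deriving the end time from start and duration, rather than storing it, keeps the record consistent when either value is later changed.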

  10. Reservation Owner and Group • Owner: Peter • usually the user who made the reservation • can be changed by LoadLeveler administrators • can run jobs in the reservation • can modify or cancel the reservation • can allow other users to use the reservation • Group Owner: Research_Group • LoadLeveler group, not AIX/Linux group • For quota checking only

  11. Reservations • Reservations Do Not Overlap • [Figure: reservations c94n03.5.r, c94n04.3.r, and c94n04.5.r plotted on axes of nodes versus time]
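
One way to read the non-overlap rule is that no two reservations may hold the same node during a common time span. The following Python sketch checks that condition; the function name `overlaps` is hypothetical, and treating back-to-back reservations as non-overlapping is an assumption of this sketch rather than something the slide states.

```python
from datetime import datetime, timedelta


def overlaps(start_a, dur_a_min, nodes_a, start_b, dur_b_min, nodes_b):
    """Return True if two reservations share a node during a common time span."""
    end_a = start_a + timedelta(minutes=dur_a_min)
    end_b = start_b + timedelta(minutes=dur_b_min)
    shares_time = start_a < end_b and start_b < end_a  # half-open intervals (assumption)
    shares_node = bool(set(nodes_a) & set(nodes_b))
    return shares_time and shares_node


ten = datetime(2005, 3, 23, 10, 0)
noon = datetime(2005, 3, 23, 12, 0)

# Same node, back-to-back in time: no overlap.
print(overlaps(ten, 120, ["c94n04"], noon, 60, ["c94n04"]))  # False
# Same time, different nodes: no overlap.
print(overlaps(ten, 120, ["c94n04"], ten, 120, ["c94n03"]))  # False
# Same node, intersecting time: overlap.
print(overlaps(ten, 120, ["c94n04"], ten + timedelta(minutes=30), 60, ["c94n04"]))  # True
```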

  12. Setup State of the Reservation • To get reserved nodes ready: • Preempt Running Jobs if any on the Reserved Nodes • Use DEFAULT_PREEMPT_METHOD • Check Status of Reserved Nodes • Send email to the owner and LoadLeveler administrators if • A node is not usable • A non-preemptable job is running

  13. Active State of the Reservation • Schedule Jobs to Run • No System Preemption in Reservation • Manual Preemption by llpreempt is allowed • Jobs should not require resources other than those on reserved machines.

  14. Administrator Externals

  15. Admin Control – User/Group • Must Setup Quotas: • max_reservations for User and Group stanzas • Example 1 : Every user can make a reservation: • default: type = user • max_reservations = 1 • Example 2: A group of users can make 10 reservations total • res_group: type = group • max_reservations = 10 • include_users = carol dave alex rich

  16. Config Control - RESERVATION_MIN_ADVANCE_TIME • Default: Reservation Can Start As Soon As Possible • RESERVATION_MIN_ADVANCE_TIME = 0 • No advance time is required • To Require a Reservation Be Made One Day In Advance: • RESERVATION_MIN_ADVANCE_TIME = 1440
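
The effect of the keyword can be illustrated with a small check. This is a Python sketch under stated assumptions: the function name `meets_min_advance` is hypothetical, the value is taken to be in minutes (as the one-day example of 1440 suggests), and the inclusive comparison at the boundary is a guess at how the limit is applied.

```python
from datetime import datetime, timedelta


def meets_min_advance(now, start, min_advance_min):
    """True if `start` lies at least `min_advance_min` minutes after `now`."""
    return start - now >= timedelta(minutes=min_advance_min)


now = datetime(2005, 3, 22, 9, 0)
start = datetime(2005, 3, 23, 10, 0)  # 25 hours after `now`

print(meets_min_advance(now, start, 0))     # True: no advance time required
print(meets_min_advance(now, start, 1440))  # True: 25 h is at least 24 h
print(meets_min_advance(now, datetime(2005, 3, 22, 14, 0), 1440))  # False: only 5 h ahead
```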

  17. Accounting • To Collect Reservation History: • Set ACCT = A_RES in LoadL_config to record reservation usage data • The RESERVATION_HISTORY keyword in LoadL_config can be used to define the name of a file containing the local history of reservations • The llacctmrg -R command can be used to merge reservation history files, similar to job history

  18. User Externals

  19. Common Questions • How Many Reservations One Can Make and in Which Groups • How Long the Reservation Duration Can Be • Can Jobs Expected to Run Beyond Reservation End Time be Allowed to Run • How Far In Advance a Reservation Needs to be Made • Floating Consumable Resources Should Not Be Used by Jobs in a Reservation

  20. User Commands • Use llmkres to Make a New Reservation • Use llchres to Modify an Existing Reservation • Use llrmres to Remove an Existing Reservation • Use llbind to Bind Jobs to an Existing Reservation • Use LL_RES_ID=<Reservation ID> llsubmit to Submit Jobs to an Existing Reservation • Use llqres to Query Reservations • Use llq -l to Check Whether a Job Step is Bound to a Reservation

  21. llmkres Examples • Reserve 2 nodes at 2pm today for 60 minutes • llmkres -t 14:00 -d 60 -n 2 • Reserve a Specific Node c94n04 • llmkres -t 03/23/2005 14:00 -d 120 -h c94n04 • Reserve All Available Nodes • llmkres -t 09:00 -d 60 -h all • Reserve Nodes to Run Job Step c94n04.3.0 • llmkres -t 14:00 -d 60 -j c94n04.3.0
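
When reservations are created from a script, the command lines above can be assembled programmatically. The Python sketch below only builds the argument list and never runs it; the helper name `llmkres_argv` is hypothetical, it uses only the flags shown on the slide (`-t`, `-d`, `-n`, `-h`, `-j`), and the rule that exactly one of node count, host list, or job step is given is an assumption of this sketch.

```python
def llmkres_argv(start, duration_min, *, num_nodes=None, hosts=None, job_step=None):
    """Build (but do not run) an llmkres argument list from the flags on the slide."""
    chosen = [x is not None for x in (num_nodes, hosts, job_step)]
    if sum(chosen) != 1:
        raise ValueError("give exactly one of num_nodes, hosts, job_step")
    # Split the start time the way the unquoted shell command line on the slide would.
    argv = ["llmkres", "-t", *start.split(), "-d", str(duration_min)]
    if num_nodes is not None:
        argv += ["-n", str(num_nodes)]
    elif hosts is not None:
        argv += ["-h", *hosts]
    else:
        argv += ["-j", job_step]
    return argv


print(" ".join(llmkres_argv("14:00", 60, num_nodes=2)))
# llmkres -t 14:00 -d 60 -n 2
print(" ".join(llmkres_argv("03/23/2005 14:00", 120, hosts=["c94n04"])))
# llmkres -t 03/23/2005 14:00 -d 120 -h c94n04
```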

  22. llchres Examples • To Move the Start Time of Reservation c94n04.2.r Earlier by 60 Minutes • llchres -t -60 -R c94n04.2.r • To Add a Reserved Node: • llchres -n +1 -R c94n04.2.r • To Remove a Specific Reserved Node: • llchres -h -c94n04 -R c94n04.2.r

  23. llrmres Examples • To Remove Reservation c94n04.2.r • llrmres -R c94n04.2.r • To Remove All Reservations: • llrmres -R all • Note: regular users can remove only their own reservations; LoadLeveler administrators can remove all reservations

  24. llbind and llsubmit Examples • To Submit a Job to Reservation c94n04.2.r • LL_RES_ID=c94n04.2.r llsubmit weather.cmd • To Bind an Idle Job Step to Reservation c94n04.2.r • llbind -R c94n04.2.r c94n03.1.0 • To Unbind Job Steps from Reservation c94n03.1.r • llbind -r c94n03.1.r c94n04.1
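
The submit example works by placing `LL_RES_ID` in the environment of the `llsubmit` process. A script can do the same thing, as in this Python sketch; the helper name `submit_command` is hypothetical, and nothing is executed here, so it runs on a machine without LoadLeveler.

```python
import os


def submit_command(res_id, job_file, base_env=None):
    """Return (argv, env) for submitting job_file bound to reservation res_id."""
    env = dict(os.environ if base_env is None else base_env)
    env["LL_RES_ID"] = res_id  # same effect as the LL_RES_ID=... prefix in the shell
    return ["llsubmit", job_file], env


argv, env = submit_command("c94n04.2.r", "weather.cmd", base_env={})
print(argv, env["LL_RES_ID"])
# On a LoadLeveler host, one would then call: subprocess.run(argv, env=env, check=True)
```

Setting the variable only in the child's environment mirrors the one-shot shell prefix, so later submissions from the same script are not bound to the reservation by accident.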

  25. API • Reservation APIs • Sample: /usr/lpp/LoadL/full/samples/llres/res.c • RESERVATIONS query in Data Access APIs • /usr/lpp/LoadL/full/samples/lldata_access/qres.c • See LoadLeveler U&A Guide for More Information

  26. Backfill Preemption

  27. Preemption with Backfill Scheduler • Similar concept as in Gang Scheduler • Options to terminate job in addition to suspend • Support for AIX and Linux • Suspend not supported on Linux • Configuration keyword PREEMPTION_SUPPORT • Applicable to Gang Scheduler only; default for Gang Scheduler is FULL • When using the Backfill Scheduler, either do not set it or set it to NONE • Configuration keyword DEFAULT_PREEMPT_METHOD • Default is suspend • Must be set for Linux

  28. Preemption Methods • Configuration Keyword (for Backfill Scheduler Only) - DEFAULT_PREEMPT_METHOD = rm | sh | su | vc | uh • Preemption Methods • Remove (rm) • System Hold (sh) • Suspend (su) • Vacate (vc) • User Hold (uh)

  29. Preemption Commands and API • llpreempt command • llpreempt -? | -H | -v | [-q] [-r | -m method] { [-u userlist] [-h hostlist] | [joblist] } • -r option only for jobs preempted by suspend • llhold -r to resume jobs preempted by system hold and user hold • Jobs preempted by remove must be resubmitted • Jobs preempted by vacate will restart when resources available • ll_preempt_jobs API • int ll_preempt_jobs (int version, void *errObj, LL_preempt_param **param); • Replaces ll_preempt API
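
The five preemption methods and the way a job gets going again after each one can be summarized in a lookup table. This Python sketch restates what the two slides above say; the names `PreemptMethod`, `RESUME_ACTION`, and `resume_action` are hypothetical and are not part of the LoadLeveler API.

```python
from enum import Enum


class PreemptMethod(Enum):
    """Abbreviations accepted by DEFAULT_PREEMPT_METHOD, per the slide."""
    REMOVE = "rm"
    SYSTEM_HOLD = "sh"
    SUSPEND = "su"
    VACATE = "vc"
    USER_HOLD = "uh"


# How a preempted job resumes, as stated on the slides.
RESUME_ACTION = {
    PreemptMethod.SUSPEND: "llpreempt -r",
    PreemptMethod.SYSTEM_HOLD: "llhold -r",
    PreemptMethod.USER_HOLD: "llhold -r",
    PreemptMethod.REMOVE: "resubmit the job",
    PreemptMethod.VACATE: "restarts when resources are available",
}


def resume_action(abbrev):
    """Look up the resume action for a two-letter method abbreviation."""
    return RESUME_ACTION[PreemptMethod(abbrev)]


print(resume_action("su"))  # llpreempt -r
print(resume_action("rm"))  # resubmit the job
```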

  30. llmodify and llq • llmodify command • llmodify [-?] | [-H] | [-v] | [-q] | {-x <execution_factor> | -c <consumable_cpus> | -m <consumable_real_memory> | -W <wclimit_add_min> | -C <job_class> | -a <account_no> | -s <q_sysprio> | -p {preempt|nopreempt} } <jobstep> • llq command • llq -l from Central Manager • Preemptable: yes • llq -s • This job step is scheduled to run but is waiting for the following job steps to be preempted:

  31. Misc. Enhancements

  32. Accounting • Correlating AIX and LoadLeveler Accounting Records • Find all AIX accounting records for all processes in LoadLeveler job • Add unique identifier in both AIX and LoadLeveler accounting • LoadLeveler invokes setsubproj() to set accounting key in AIX • Supported in AIX 5.2I and AIX 5.3 • AIX library dynamically loaded: /usr/lib/libaacct.a • Use llsummary -l to find accounting key in LoadLeveler history file • Job Step Id: c188f2n07.ppd.pok.ibm.com.1.0 • Step Name: 0 • Queue Date: Wed Feb 9 15:12:15 EST 2005 • Job Accounting Key: 4758737585449234063 • …
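
To correlate records, the accounting key has to be pulled out of the `llsummary -l` output for each job step. The Python sketch below does this for text shaped like the excerpt on the slide; the function name `accounting_keys` is hypothetical, and the assumption that each field appears on its own `Label: value` line is inferred from the excerpt, not from a full listing.

```python
SAMPLE = """\
Job Step Id: c188f2n07.ppd.pok.ibm.com.1.0
Step Name: 0
Queue Date: Wed Feb 9 15:12:15 EST 2005
Job Accounting Key: 4758737585449234063
"""


def accounting_keys(text):
    """Map each job step id to its accounting key, scanning 'Label: value' lines."""
    keys = {}
    step = None
    for line in text.splitlines():
        # Split at the first colon only, so values such as times stay intact.
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue
        label, value = label.strip(), value.strip()
        if label == "Job Step Id":
            step = value
        elif label == "Job Accounting Key" and step is not None:
            keys[step] = int(value)
    return keys


print(accounting_keys(SAMPLE))
# {'c188f2n07.ppd.pok.ibm.com.1.0': 4758737585449234063}
```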

  33. Modifying System Priority • Modifying job priorities • llmodify [-?] | [-H] | [-v] | [-q] | {-x <execution_factor> | -c <consumable_cpus> | -m <consumable_real_memory> | -W <wclimit_add_min> | -C <job_class> | -a <account_no> | -s <q_sysprio> | -p {preempt|nopreempt} } <jobstep> • LL_MODIFY_SYSPRIO enum with the ll_modify API • New priority is fixed • llq -l from Central Manager • System Priority: -1560 • q_sysprio: 97 • Previous q_sysprio: -1560

  34. Integration in a Grid environment • Running LoadLeveler jobs on the IBM Grid Toolbox (Globus) • LoadLeveler GAR to deploy: llgrid.gar • The GAR contains loadleveler.pm, globus-gram-loadleveler-provider, rips-loadleveler-provider.xml, mjs-ll-server-deploy.wsdd, server-deploy.wsdd, and deploy/loadleveler-preDeploy.sh • The IBM Grid Toolbox must be installed, and gars/mmjfs.gar and gars/gram-rips.gar must be deployed, before deploying the LoadLeveler GAR • Log in as ibmgrid, the owner ID of the Grid Toolbox • export GLOBUS_LOCATION=/opt/IBMGrid • Copy llgrid.gar into the $GLOBUS_LOCATION/gars directory: cp /usr/lpp/LoadL/full/lib/llgrid.gar $GLOBUS_LOCATION/gars/llgrid.gar • As ibmgrid, run the following commands to deploy llgrid.gar: cd $GLOBUS_LOCATION; . igt-setenv.sh

  35. Integration in a Grid environment • igt-deploy-gar gars/llgrid.gar • After deploying, two new files exist: $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm and $GLOBUS_LOCATION/etc/globus-gram-loadleveler-provider • The LoadlevelerManagedJobFactoryService service name is added in $GLOBUS_LOCATION/local-server-config.wsdd • The MasterLoadlevelerManagedJobFactoryService service is added in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/server-config.wsdd • The LoadLevelerInformation service data provider is enabled in $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml

  36. Integration in a Grid • If $GLOBUS_LOCATION/AppServer/installedApps/DefaultNode/IBMGrid.ear/ogsa.war/WEB-INF/etc/rips-service-config.xml contains handler="jobDataHandler", remove that attribute from the file.

  37. Questions
