Resource Management and Balancing

Resource Management and Balancing ECI, July 2005

RMS – Overview • Resource management • Job management • Monitoring • Resource balancing • Information dissemination

Job Management • The need • Operating system offers job and resource management service for a single computer • The batch job control on multi-user mainframes was performed outside the operating system • Main advantages are: • Structured resource utilization planning and control • Abstraction, easy-to-{understand,use} for user • Provide a vendor independent user interface

Manager vs. Scheduler • Resource manager • Locating and allocating resources • Authentication • Process creation and migration • Resource scheduler • Queuing applications • Drive manager (enforce policy)

Job Management - Requirements • A typical job management system offers • Heterogeneous support • Batch support • Parallel support • Interactive support • Checkpointing and process migration • Load balancing • Job run-time limits • GUI

RMS Architecture • Prerequisites • Multi-user & multitasking capabilities • A homogeneous OS is not a requirement • In practice • “Similar” operating systems run on all machines • UNIX (in all variants) is customary in the context of using RMS

Resource Description • Requirements • Simple descriptions are easy to generate • Powerful enough to express complex descriptions • Portable representation • Attributed components • RDL: Language to specify resources • Administrator: describe what’s available • User: describe what’s required • Hierarchical

RDL Example • A 1024-node transputer with a UNIX front end
DECLARATION
BEGIN PROC Transputer
  DYNAMIC; EXCLUSIVE;
  DECLARATION
    BEGIN PROC Backend
      DECLARATION
        { PROC; CPU=T8; MEMORY=4; SPEED=30; REPEAT=1024; }
        { PORT; REPEAT=4; }
    END PROC
    BEGIN PROC Frontend
      DECLARATION
        { PROC; OS=Unix; Repeat = 4; }
    END PROC
  CONNECTION
    FOR I = 0 to 3 DO
      Backend LINK i <=> Frontend LINK i
    OD
END PROC

RMS Components • User interface • At minimum, a command-line user interface • GUI becoming indispensable • Typical commands • Job submission to register for execution • Status display to monitor progress or failure of a job • Job deletion to cancel jobs no longer needed

RMS Components (contd) • Administrative environment • Specify node characteristics • Define feasible job classes and map to hosts • Define user access permissions • Specify resource limitations for users and jobs • Specify policies for the assignment of jobs • Control and ensure proper operation of the RMS • Analyze accounting data to tune the system

RMS Entities • Queues • Queues bound to hosts, jobs assigned to queues • Hosts • Compute hosts, control hosts • Users • Capabilities, permissions, priorities • Jobs • Resources • Policies

RMS Entities – Jobs • Job: collection of computational tasks • A single program, or several interacting programs • In the context of RMS • Batch Jobs: require no manual interaction once started • Interactive Jobs: require input during runtime • Parallel Jobs: subtasks spread across several hosts in a cluster • Check-pointing Jobs: periodically save status to the file system and can be aborted at any time
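The check-pointing idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `run_with_checkpoints`, the JSON state file, and the `stop_after` parameter (which simulates an abort) are all hypothetical and do not come from any particular RMS.

```python
import json
import os


def run_with_checkpoints(total_steps, path, every=10, stop_after=None):
    """Sum the integers 0..total_steps-1, saving state every `every` steps.

    `stop_after` (hypothetical) simulates the job being aborted after that
    many steps of the current run. Returns the final sum, or None if aborted.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        # Restart: resume from the last saved state instead of step 0.
        with open(path) as f:
            state = json.load(f)

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated abort; work since last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        if state["step"] % every == 0:
            # Write to a temporary file, then replace atomically, so an
            # abort during the write never leaves a corrupt checkpoint.
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)
    return state["acc"]
```

A job aborted after 25 steps restarts from the checkpoint taken at step 20, repeats the five lost steps, and still produces the correct result.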

RMS Entities – Jobs • Batch jobs • Dispatch jobs according to policy and availability • Suspend/Resume & checkpoint/restart • Interactive jobs • Need to maintain a terminal connection • “Watchdog” monitor withdraw from pool • Parallel jobs • Need to integrate with parallel environment • Scheduling policy is more complex

RMS Entities - Resources • Available memory, CPU time, network bandwidth, and peripheral devices, licenses • Jobs declare resource requirements • RMS enforces resource consumption • ensures quality of service • prevents over-subscription • detects over-usage

RMS Entities - Policies • Abstract mechanisms to automate control • imbalanced load is common in clusters • important/urgent work starved • unauthorized users may take advantage • users may exceed desired resource usage over time • Resource Utilization Policies • Monitor resource consumption • Dispatch of new jobs • Scheduling Policies • Dispatch of new jobs • Relocation of jobs

Resource Utilization Policies • Share based • Resource “credit” is assigned to users, depts… • Hierarchical share tree defines sharing • Establish entitlements within time frame • Fair distribution of resources • Functional • Assignment by functional importance (priority) • Past usage is not taken into account • Deadline • Time-critical applications • Manual override • Administrators like power…
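As a rough illustration of a hierarchical share tree, the sketch below flattens a tree of share "credits" into per-user entitlements. The tree layout (a dict mapping each name to a pair of shares and optional subtree), the function name `entitlements`, and the sample departments and users are assumptions made for this example only.

```python
def entitlements(tree, total=1.0):
    """Flatten a share tree into per-leaf fractions of `total`.

    tree: dict mapping name -> (shares, subtree or None).
    Leaf names are assumed unique across the whole tree.
    """
    result = {}
    share_sum = sum(shares for shares, _ in tree.values())
    for name, (shares, children) in tree.items():
        # Each node receives its share of the parent's portion.
        portion = total * shares / share_sum
        if children:
            result.update(entitlements(children, portion))
        else:
            result[name] = portion
    return result


# Hypothetical tree: two departments, users share within their department.
tree = {
    "physics": (60, {"alice": (1, None), "bob": (2, None)}),
    "chemistry": (40, {"carol": (1, None)}),
}
shares = entitlements(tree)
```

Here physics is entitled to 60% of the resources, split 1:2 between its two users, and chemistry's single user receives the remaining 40%.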

Scheduling Policies • Dispatch time – who, where • First-Come-First-Served • Select-Least-Loaded • Select-Fixed-Sequence • Combinations above • Relocation – who, when, where • Dynamic resource balancing
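The dispatch-time policies can be sketched as small selection functions. The names and signatures below are illustrative assumptions; in particular, reading Select-Fixed-Sequence as "first host in a fixed order whose load is below a limit" is one possible interpretation, not a definition taken from the slides.

```python
from collections import deque


def first_come_first_served(queue):
    """Who: take the job that has waited longest."""
    return queue.popleft()


def select_least_loaded(loads):
    """Where: the host with the smallest current load."""
    return min(loads, key=loads.get)


def select_fixed_sequence(hosts, loads, limit):
    """Where: the first host, in a fixed order, whose load is below `limit`.

    Returns None if every host is at or above the limit (assumed reading).
    """
    for host in hosts:
        if loads[host] < limit:
            return host
    return None


def dispatch(jobs, loads, cost=1.0):
    """Combine FCFS (who) with least-loaded (where).

    `cost` is the assumed load each job adds to its host.
    """
    loads = dict(loads)  # work on a copy; the caller's dict is untouched
    queue = deque(jobs)
    placement = {}
    while queue:
        job = first_come_first_served(queue)
        host = select_least_loaded(loads)
        placement[job] = host
        loads[host] += cost
    return placement
```

With initial loads `{"a": 2.0, "b": 0.5, "c": 1.0}`, three jobs go to `b`, `c`, and `b` again, since each placement raises the chosen host's load by `cost`.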

Scheduling of Parallel Processes • Gang scheduling • Requires tight coupling (MPPs) • Co-scheduling • Demand-based • False priority • Concurrent applications • Implicit • Busy-wait so as not to relinquish the CPU

RMS Challenges • Open Interfaces • Export load balancing/distributed capabilities • Export status info (load, job status, queues) • Control/assistance from application • Integration with other environments (MPI) • Extend functionality for special cases • API must be: simple, usable, abstract, robust

RMS Example: CODINE • CODINE/GRD • cod_qmaster: master daemon • cod_schedd: scheduler daemon • cod_execd: execution daemon • Continuously match utilization with policies • GRD monitors and adjusts resource usage correlated to all processes of a job • Feedback to adjust shares towards changing requirements

Static Scheduling Scheme

Dynamic Scheduling Scheme

RMS Example: PBS • Portable Batch System • Scheduler – job to node mapping, queues • Server – communications, logs • Control daemon (per node) – executive agent • Scope – single node • Job arrays • Task Management interface

RMS Example: Condor • Condor: a distributed job scheduler • Harvest idle workstations • Job scheduling and migration • Advertising mechanism • Both job and W/S advertise presence • Jobs advertise requirements (job description file) • W/S advertise their capabilities

Condor: Example JDF
universe = vanilla      # select runtime environment
executable = some_job
requirements = (Arch=="INTEL" && OpSys=="LINUX")
rank = (Memory * 10000) + KFlops    # target
arguments = -verbose
input = in.dat          # redirect to stdin
output = out.dat        # redirect to stdout
log = log.txt
Queue                   # add job to queue
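To show how `requirements` and `rank` interact, here is a deliberately simplified, one-sided matchmaking sketch. It is not Condor's implementation: the `match` function, the dict-based machine advertisements, and the sample workstations are hypothetical, and machine-side constraints are omitted.

```python
def match(job, machines):
    """Return the eligible machine with the highest rank, or None.

    `requirements` filters the advertised machines; `rank` orders the
    survivors by preference (higher is better).
    """
    candidates = [m for m in machines if job["requirements"](m)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Mirrors the requirements and rank expressions of the JDF above.
job = {
    "requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
    "rank": lambda m: m["Memory"] * 10000 + m["KFlops"],
}

# Hypothetical workstation advertisements.
machines = [
    {"Name": "ws1", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 256, "KFlops": 50000},
    {"Name": "ws2", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 512, "KFlops": 20000},
    {"Name": "ws3", "Arch": "SPARC", "OpSys": "SOLARIS", "Memory": 1024, "KFlops": 90000},
]
chosen = match(job, machines)
```

Although `ws3` has the most memory, it fails the requirements. Of the two remaining machines, `ws2` ranks higher because the rank expression weights memory heavily.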

RMS: Condor (contd) • Universe • Vanilla: sequential apps (shared FS) • MPI, PVM: integrated with parallel environment • Globus: grid computing environment • Standard: enables process migration • Process migration • Reschedule higher priority job • User reclaims her W/S • Must be linked with a special library

RMS: Condor (contd) • Access to data • Shared file system • Condor file transfer mechanism • Automatically prefetch, postfetch • Remote I/O calls (in standard universe) • Architecture • Central manager • Server on each node

Known Condor Pools

Monitoring • Design choices • Centralized vs. decentralized • Periodic vs. request driven • Flat vs. hierarchical • Resolution of information • Focused view

Monitoring Example: Parmon • Features: • Online creation of Node and Group database • Component, Node, Group, or entire Cluster level • Monitoring of CPU, memory, disk and network, processes, log files etc • Facility to define events & automatic notification • Misc: message broadcast, remote admin, GUI

Load Balancing • Application vs. system • Static vs. dynamic vs. adaptive • Centralized vs. decentralized • Receiver initiated vs. sender initiated • Parallel applications • On-line nature

LB: Application Level • Application level • Round robin • Randomized • Recursive bisection • Other optimization • Hard to estimate execution times • Indeterminate no. of steps • Unpredictable load • Communication delays

LB: System Level • System level • Round robin • Randomized • Estimate run time • Specified by job description • Estimate from past experience
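One simple way to combine the two run-time sources above is to prefer a value declared in the job description and otherwise fall back on a running estimate built from past runs. The class below is a sketch under assumptions: the name `RuntimeEstimator`, the per-class exponential moving average, and the `alpha` and `default` parameters are illustrative choices, not something the slides prescribe.

```python
class RuntimeEstimator:
    """Estimate job run time from a declaration or from past experience."""

    def __init__(self, alpha=0.5, default=60.0):
        self.alpha = alpha        # weight given to the newest observation
        self.default = default    # used when a job class has no history
        self.estimates = {}       # job class -> smoothed run time (seconds)

    def estimate(self, job_class, declared=None):
        # A run time specified by the job description takes precedence.
        if declared is not None:
            return declared
        return self.estimates.get(job_class, self.default)

    def record(self, job_class, actual):
        """Fold an observed run time into the estimate for its class."""
        old = self.estimates.get(job_class)
        if old is None:
            self.estimates[job_class] = actual
        else:
            self.estimates[job_class] = (
                self.alpha * actual + (1 - self.alpha) * old
            )
```

A moving average adapts to drifting run times while damping outliers; a plain mean or a per-user history would be equally valid choices.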

LB Example: MOSIX • Decentralized • Symmetric • Deterministic • Responsive • Stable • Competitive • Resources: CPU, memory, I/O

Load Balancing Over Network • Distribute workload or network traffic load across the cluster • Nodes may be interconnected among themselves • Must be connected to the balancing device • Processing nodes provide status information • current processor load • the application system load • number of active users • the availability of network protocol buffers • other specific resources

Load Balancing Over Network • Balancing device • monitors the status of all nodes • dictates where to direct the next job • a single unit or a group in tree hierarchy • use one or more algorithms or methods • static or dynamic setting • Decide which node gets the next incoming connection request

Factors in Network Balancing • Wire-speed processing • Node operating system limitation • Packet processing, no. of connections, interrupts • Balancing device limitations • Tables, memory • Session based traffic, non-session UDP • Application dependencies (affinity)

Simple Balancing Methods • Weighting • Assign weights to the nodes of different capacities • Randomization • Works well in an environment of identical nodes • Round-Robin • Commonly used by itself in DNS (address caching) • Effective where all the nodes in the cluster are identical in capacity and performance • Hashing • Packets from the same source address will always get assigned to the same server
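Weighting, round-robin, and hashing can be sketched as follows. The function names, the integer weights, and the use of SHA-256 as the hash are assumptions for illustration; a real balancing device would typically use a cheaper hash and a smoother interleaving of weighted nodes.

```python
import hashlib
from itertools import cycle


def weighted_round_robin(weights):
    """Cycle over nodes; each node appears `weight` times per cycle.

    weights: dict mapping node name -> positive integer weight.
    With all weights equal to 1 this is plain round-robin.
    """
    expanded = [node for node, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def hash_select(source_addr, nodes):
    """Map a source address to a node; the same address always maps
    to the same node as long as the node list is unchanged."""
    digest = hashlib.sha256(source_addr.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


rr = weighted_round_robin({"a": 2, "b": 1})
first_six = [next(rr) for _ in range(6)]
```

Node `a`, with twice the weight of `b`, receives two of every three requests.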

Simple Balancing Methods • Least Connections • Assigns to the node which currently has the fewest connections (≠ least load) • Minimum Misses • Assigns to the node which has processed the fewest incoming requests in its history • Fastest Response • Assigns to the node with the fastest response • Requires active monitoring of the individual nodes • Sending ICMP packets with the ‘ping’ command • Proprietary mechanism based upon UDP packets
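A minimal sketch of the Least Connections method follows. The class name `LeastConnections` and its `acquire`/`release` interface are hypothetical. Note that it counts open connections only, which is why the result need not coincide with the least-loaded node.

```python
class LeastConnections:
    """Track open connections per node and pick the node with the fewest."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def acquire(self):
        """Assign a new connection; ties go to the earliest-listed node."""
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        """Record that a connection to `node` has closed."""
        if self.active[node] > 0:
            self.active[node] -= 1
```

For example, with nodes `a` and `b`, three new connections go to `a`, `b`, `a`; once the connection to `b` closes, the next one is assigned to `b`.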

Advanced Balancing Methods • Primary optimization vectors • Node traffic – predict volume of traffic • Network traffic – monitor node state • Node-load based balancing – (which load?) • DNS load balancing – simple • Topology-based – reduce latency • Application-specific performance • Policy based optimization • Application, bandwidth, admin, security

Common Errors • There are four common errors • Overflow • Underflow • Routing errors • Induced network errors • May destabilize efficient network clustering

Information Dissemination • Central vs. decentralized • Load incurred on system • Processing load • Network load • Partial knowledge – gossip algorithms • Example: finding average load
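The average-load example can be illustrated with pairwise-averaging gossip, one common scheme of this kind; the slides do not say which algorithm is meant. In the sketch below, each node repeatedly contacts a random peer and both replace their values with the mean of the two. The function name `gossip_average`, the round structure, and the fixed seed are assumptions for illustration.

```python
import random


def gossip_average(loads, rounds, seed=0):
    """Simulate pairwise-averaging gossip over a list of node loads.

    Each exchange preserves the total, so the values converge towards
    the global average without any node seeing all the loads.
    """
    rng = random.Random(seed)
    values = [float(v) for v in loads]
    n = len(values)
    if n < 2:
        return values
    for _ in range(rounds):
        for i in range(n):
            # Pick a random peer different from node i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            mean = (values[i] + values[j]) / 2
            values[i] = values[j] = mean
    return values


estimates = gossip_average([0, 10, 20, 30], rounds=50)
```

After enough rounds every node holds an estimate close to the true average (15 for the loads above), using only pairwise messages and no central collector.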
