290 likes | 542 Views
Resource Management. Reading: “A Resource Management Architecture for Metacomputing Systems”. What is Resource Management?. Mechanisms for locating and allocating computational resources Authentication Process creation Remote job submission Scheduling
E N D
Resource Management Reading: “A Resource Management Architecture for Metacomputing Systems”
What is Resource Management? • Mechanisms for locating and allocating computational resources • Authentication • Process creation • Remote job submission • Scheduling • Other resources that can be managed: • Memory • Disk • Networks
Resource Management Issues for Grid Computing • Site autonomy • Resources owned by different organizations, in different administrative domains • Local policies for use, scheduling, security • Heterogeneous substrate • Different local resource management systems • Policy extensibility • Local sites need ability to customize their resource management policies
More Issues for Grid Computing • Co-allocation • May need resources at several sites • Mechanism for allocating multiple resources, initiating computation, monitoring and managing • On-line control • Adapt application requirements to resource availability
Specifying Resource and Job Requirements • Resource requirements: • Machine type • Number of nodes • Memory • Network • Job or scheduler parameters: • Directory • Executable • Arguments • Environment • Maximum time required
Resource and Job Specification • Globus: Resource Specification Language (RSL) • &(executable=myprog) (|(&(count=5)(memory>=64)) (&(count=10)(memory>=32))) • Condor: Classified ads • Resource owners advertise abilities and constraints • Applications advertise resource requests • Matchmaking: match offers & requests
Components of Globus Resource Management Architecture • Resource specification using RSL • Resource brokers: translate resource requirements into specifications • Co-allocators: break down requests for multiple sites • Local resource managers: apply local, site-specific resource management policies • Information about available compute resources and their characteristics
Resource Specification Language • Common notation for exchange of information between components • API provided for manipulating RSL
RSL Syntax • Elementary form: parenthesis clauses • (attribute op value [ value … ] ) • Operators Supported: • <, <=, =, >=, > , != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact,resourceManagerName • Unknown attributes are passed through • May be handled by subsequent tools
Constraints: “&” • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) • “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
Multirequest: “+” • A multirequest allows us to specify multiple resource needs, for example + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2)) • Execute 5 instances of p1 on a machine with at least 64M of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
Resource Broker • Takes high-level RSL specification • Transforms into concrete specifications through “specialization” process • Locate resources that meet requirements • Multiple brokers may service single request • Application-specific brokers translate application requirements • Output: complete specification of locations of resources; given to co-allocator
Examples of Resource Brokers • Nimrod-G • Automates creation and management of large parametric experiments • Run application under wide range of input conditions and aggregate results • Queries MDS to find resources • Generates number of independent jobs • GRAM allocates jobs to computational nodes • Higher-level broker: allows user to specify time and cost constraints
Examples of Resource Brokers • AppLeS • Application Level Scheduler • Map large number of independent tasks to dynamically varying pool of available computers • Use GRAM to locate resources and initiate and manage computation
Resource co-allocators • May request resources at multiple sites • Two or more computers and networks • Break multi-request into components • Pass each component to resource manager • Provide means for monitoring job status or terminating job • Complex: • Two or more resource managers • Global state like availability of resources difficult to determine
Different co-allocation services • Require all resources to be available before job proceeds; fail globally if failure occurs at any resource • Allocate at least N out of M resources and return • Return immediately, but gradually return more resources as they become available • Each useful for some class of applications
Concurrent Allocation • If advance reservations are available: • Obtain list of available time slots from each participating resource manager and choose timeslot • Without reservations: • Optimistically allocate resources • Hope desired set will be available at future time • Use information service (MDS) to determine current availability of resources • Construct RSL request that is likely to succeed • If allocation fails, all started jobs must be terminated
Disadvantages of Concurrent Allocation Scheme • Computational resources wasted while waiting for all requested resources to become available • Application must be altered to perform barrier to synchronize startup across components • Detecting failure of a resource is difficult, e.g. in queue-based local resource managers
Local Resource Managers • Implemented with Globus Resource Allocation Manager (GRAM) • Processing RSL specifications representing resource requests • Deny request • Create one or more processes (jobs) that satisfy request • Enable remote monitoring and management of jobs • Periodically update MDS information service with current availability and capabilities of resources
GRAM (cont.) • Interface between grid environment and entity that can create processes • E.g., Parallel scheduler or Condor pool • GRAM may schedule resource itself • More commonly, maps resource specification into a request to a local resource allocation mechanism • E.g., Condor, LoadLeveler, LSF • Co-exists with local mechanisms
GRAM (cont.) • GRAM API has functions for: • Submitting a job request: produces globally unique job handle • Canceling a job request • Asking when job request is expected to run • Upon submission, can request that progress be signaled asynchronously to callback URL
GRAM Scheduling Model • Jobs are either: • Pending: resources have not yet been allocated to the job • Active: resources allocated, job running • Done: when all processes have terminated and resources have been deallocated • Failed: job terminates due to : • explicit termination • error in request format • failure in resource management system • denial of access to resource
GRAM Components • Gatekeeper Responds to a request: • Performs mutual authentication of user and resource • Determines local user name for remote user • Starts a job manager that executes as local user and handles request
GRAM Components (cont.) • Job manager • Creates processes requested by user • Submits resource allocation requests to underlying resource management system (or does fork) • Monitors state of created processes • Notifies callback contact of state transitions • Implements control operations like termination
GRAM Components (cont.) • GRAM reporter Responsible for storing into MDS (information service) info about: • Scheduler structure • Support reservations? • Number of queues • Scheduler state • Currently active jobs • Expected wait time in queue • Total number of nodes and available nodes
Broker Co-allocator Resource Management Architecture RSL specialization RSL Application Information Service Queries & Info Ground RSL Simple ground RSL Local resource managers GRAM GRAM GRAM LSF EASY-LL NQE
Job Submission Interfaces • Globus Toolkit includes several command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure