Global Community. Slide Courtesy of Ian Foster. Resource Management in Grid Computing. AZIZOL ABDULLAH,PhD DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK. Resource Management. What needs to be managed: Resources
Global Community Slide Courtesy of Ian Foster
Resource Management in Grid Computing AZIZOL ABDULLAH,PhD DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK
Resource Management • What needs to be managed: Resources • Physical resources (computers, disks, databases, networks, scientific instruments). • Logical resources (jobs, executing applications, complex workflows, etc.). • What is the goal? • Resources must be available and meet performance criteria.
Resource Management (Cont.) • What is Management: • The process of locating various types of capability, arranging for their use, utilizing them and monitoring their state. • Maintenance of resources and environment • Monitoring their state and performance • Reacting to internal and external changes in resource or its environment • Initiating routine operations: initialization, start/stop and tuning
What is Resource Management? • Mechanisms for locating and allocating computational resources • Authentication • Process creation • Remote job submission • Scheduling • Other resources that can be managed: • Memory • Disk • Networks
Resource Management Issues for Grid Computing • Site autonomy • Resources owned by different organizations, in different administrative domains • Local policies for use, scheduling, security • Heterogeneous substrate • Different local resource management systems • Policy extensibility • Local sites need ability to customize their resource management policies
More Issues for Grid Computing • Co-allocation • May need resources at several sites • Mechanism for allocating multiple resources, initiating computation, monitoring and managing • On-line control • Adapt application requirements to resource availability
Manageability • The ability of a resource to be managed • Manageability interfaces support common operations (control and monitor) • Manageability standards specify standard interfaces • Problem: • Existing interfaces are generally resource-specific • Almost impossible to add standard interfaces to legacy resources • New standards may require additional interfaces • Solution: • Common standards • Based on Service orientation, integration and virtualization.
Service orientation • Software services • A service provides some capability to its clients through message exchanges • Services represent the physical manageable entities • Services understand the unique interfaces of the entities they represent • Services implement applicable standard interfaces • Integration • Applications encapsulated in services become integrable building blocks
Service orientation (Cont.) • The management process • Manager invokes the operation (service’s standard interface) • Service performs operation on managed entity (resource’s unique interface) • Service returns result to manager (through the standard interface) • Problem • Need a common way to implement service • Solution: Web Services
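The three-step management process above can be sketched as a small adapter. This is only an illustration: `ManageableService`, `LegacyDisk`, `DiskService` and `manager_start_all` are hypothetical names, not part of any web services standard or Grid toolkit.

```python
from abc import ABC, abstractmethod
from typing import List


class ManageableService(ABC):
    """Standard (common) management interface seen by the manager."""

    @abstractmethod
    def start(self) -> str: ...

    @abstractmethod
    def status(self) -> str: ...


class LegacyDisk:
    """Hypothetical resource with its own, resource-specific interface."""

    def __init__(self) -> None:
        self.spinning = False

    def spin_up(self) -> None:
        self.spinning = True


class DiskService(ManageableService):
    """Service that represents the disk and translates standard calls."""

    def __init__(self, disk: LegacyDisk) -> None:
        self._disk = disk

    def start(self) -> str:
        self._disk.spin_up()      # operation on the resource's unique interface
        return self.status()      # result returned through the standard interface

    def status(self) -> str:
        return "running" if self._disk.spinning else "stopped"


def manager_start_all(services: List[ManageableService]) -> List[str]:
    # The manager only ever invokes the standard interface.
    return [service.start() for service in services]
```

The manager never sees `spin_up`; adding a new resource type means writing one more service class, not changing the manager.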
Figure: Manager → common interfaces → web services and other service providers → resource-specific interfaces → physical resources (computers, telescopes, disks; cluster, blades, mainframe), with the resources (R) virtualized behind the services.
Traditional Resource Management • Batch schedulers, workflow engines, operating systems • Designed and operated under the assumption that: • They have complete control over a resource • They can implement the mechanisms and policies needed for effective use of that resource in isolation • This is not the case for Grid resource management • Separate administrative domains • Resource heterogeneity • Lack of control and differing policies
Grid Resource Management • What is Grid Resource Management? • Identifying application requirements, resource specification • Matching resources to applications • Allocating/scheduling and monitoring those resources and applications over time in order to run as effectively as possible.
Grid Resource Management (Cont.) • Challenges in Grid Resource Management • Resources are heterogeneous in nature • Processors, disks, data, networks, other services. • Application has to compete for resources • Lack of available data about current systems, needs of users, resource owners and administrators
Grid RM Mechanisms • Resource Information Dissemination • Published by the resource (push) or gathered by the GIS (pull) • On-demand dissemination (by agents) • Resource Discovery • Centralized or distributed queries, agents, distributed queries + agents • Resources are described in a schema/language or as objects • Resource Scheduling/Job execution • Assigning resources: centralized, hierarchical, distributed • Resource Monitoring and Re-Scheduling • Monitoring can be done by the application (polling) or by the resource (notification to the app or periodic status updates).
Grid Resource Brokerage • Discovering suitable resources for a user's job • Current scenario: manual or semi-manual • Users manually target their work at machines that are already known to them. • For larger grids, a manual solution is not feasible • Solution is a Grid Resource Broker: • The user describes their needs to a third party (software) • which searches for suitable resources and passes the result(s) back to the user.
Grid Resource Brokerage • Role of the Broker in a Management System • Resource discovery • Authorization filtering, application definition, minimum requirement filtering • System selection • Dynamic information gathering, system selection • Allocation and advance reservation • Grid Information System • Organizes a set of sensors on resources so that a client or broker can have easy access to data (static or dynamic)
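The broker's discovery and selection phases can be sketched as a chain of filters followed by a ranking step. The field names (`allowed_users`, `memory`, `load`) and the "lowest load wins" rule are assumptions made for this example, not a description of any particular broker.

```python
from typing import Dict, List, Optional


def broker_select(resources: List[Dict], user: str, min_memory: int) -> Optional[str]:
    """Return the name of the chosen resource, or None if nothing qualifies."""
    # Resource discovery, step 1: authorization filtering.
    authorized = [r for r in resources if user in r["allowed_users"]]
    # Resource discovery, step 2: minimum requirement filtering (static data).
    capable = [r for r in authorized if r["memory"] >= min_memory]
    if not capable:
        return None
    # System selection: rank candidates using dynamic information (current load).
    return min(capable, key=lambda r: r["load"])["name"]


# Hypothetical data a Grid Information System might expose.
resources = [
    {"name": "siteA", "allowed_users": {"alice"}, "memory": 64, "load": 0.9},
    {"name": "siteB", "allowed_users": {"alice"}, "memory": 128, "load": 0.2},
    {"name": "siteC", "allowed_users": {"bob"}, "memory": 256, "load": 0.0},
]
chosen = broker_select(resources, "alice", 64)
```

Allocation and advance reservation would follow as a separate step once a system has been selected.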
Matchmaking • Process of selecting resources based on application requirements • Symmetric matchmaking • Attribute-based matching • Resource provider and resource user have to agree on a schema, attribute names and value ranges • Syntax based like ClassAds • Asymmetric matchmaking • Ontology based matching • Ontologies, domain background knowledge, matchmaking rules
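A minimal sketch of symmetric, attribute-based matchmaking follows. It only mimics the idea behind ClassAds: the dictionary layout, the `requirements` callables and all attribute names are assumptions for illustration, not Condor's ClassAd syntax or API.

```python
from typing import Any, Dict, List

Ad = Dict[str, Any]


def matches(request: Ad, offer: Ad) -> bool:
    """Symmetric match: each side's requirements must accept the other ad."""
    return request["requirements"](offer) and offer["requirements"](request)


def matchmake(request: Ad, offers: List[Ad]) -> List[str]:
    """Names of all offers that match the request, in input order."""
    return [offer["name"] for offer in offers if matches(request, offer)]


# Both sides must agree on attribute names ("memory", "arch", "owner").
job = {
    "owner": "alice",
    "requirements": lambda m: m["memory"] >= 64 and m["arch"] == "x86_64",
}
machines = [
    {"name": "node1", "memory": 128, "arch": "x86_64",
     "requirements": lambda j: j["owner"] != "mallory"},
    {"name": "node2", "memory": 32, "arch": "x86_64",
     "requirements": lambda j: True},
]
matched = matchmake(job, machines)
```

The need for an agreed schema is visible here: if the job asked for `ram` instead of `memory`, no machine would match. Ontology-based matching is aimed at that limitation.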
Specifying Resource and Job Requirements • Resource requirements: • Machine type • Number of nodes • Memory • Network • Job or scheduler parameters: • Directory • Executable • Arguments • Environment • Maximum time required
Resource and Job Specification • Globus: Resource Specification Language (RSL) • &(executable=myprog) (|(&(count=5)(memory>=64)) (&(count=10)(memory>=32))) • Condor: Classified ads • Resource owners advertise abilities and constraints • Applications advertise resource requests • Matchmaking: match offers & requests
Components of Globus Resource Management Architecture • Resource specification using RSL • Resource brokers: translate resource requirements into specifications • Co-allocators: break down requests for multiple sites • Local resource managers: apply local, site-specific resource management policies • Information about available compute resources and their characteristics
Resource Specification Language • Common notation for exchange of information between components • API provided for manipulating RSL
RSL Syntax • Elementary form: parenthesis clauses • (attribute op value [ value … ] ) • Operators supported: • <, <=, =, >=, >, != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName • Unknown attributes are passed through • May be handled by subsequent tools
Constraints: “&” • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) • “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
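To make the semantics of a conjunction concrete, here is a small sketch that parses a flat `&` request and checks a resource description against it. This is not the Globus RSL API: `parse_conjunction` and `satisfies` are hypothetical helpers, and the sketch handles only single-valued clauses with no nesting.

```python
import operator
import re

OPS = {"<=": operator.le, ">=": operator.ge, "!=": operator.ne,
       "<": operator.lt, ">": operator.gt, "=": operator.eq}
# One clause: (attribute op value). Two-character operators are tried first.
CLAUSE = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*([^()\s]+)\s*\)")


def parse_conjunction(rsl: str):
    """Return [(attribute, op, value)] for a flat '&' request."""
    if not rsl.lstrip().startswith("&"):
        raise ValueError("only flat '&' requests are handled in this sketch")
    return [(attr, op, int(value) if value.isdigit() else value)
            for attr, op, value in CLAUSE.findall(rsl)]


def satisfies(resource: dict, clauses) -> bool:
    """True if the resource meets every clause whose attribute it defines.

    Attributes the resource does not know (e.g. executable) are skipped,
    loosely mirroring 'unknown attributes are passed through'.
    """
    return all(OPS[op](resource[attr], value)
               for attr, op, value in clauses if attr in resource)


request = "& (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)"
clauses = parse_conjunction(request)
```

With these clauses, a machine described as `{"memory": 128}` satisfies the request and one described as `{"memory": 32}` does not.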
Multirequest: “+” • A multirequest allows us to specify multiple resource needs, for example + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2)) • Execute 5 instances of p1 on a machine with at least 64M of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
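Since each sub-request of a multirequest is eventually handled separately, the first step is to split it at its top-level parentheses. The helper below is a hypothetical sketch of that step, not code from the Globus Toolkit.

```python
from typing import List


def split_multirequest(rsl: str) -> List[str]:
    """Split '+ (sub1) (sub2) ...' into its top-level sub-requests."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest")
    parts: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1          # first character inside a top-level clause
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                parts.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts


multi = "+ (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))"
components = split_multirequest(multi)
```

Each element of `components` is itself an `&` request that can be passed to a separate resource manager.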
Resource Broker • Takes high-level RSL specification • Transforms into concrete specifications through “specialization” process • Locate resources that meet requirements • Multiple brokers may service single request • Application-specific brokers translate application requirements • Output: complete specification of locations of resources; given to co-allocator
Examples of Resource Brokers • Nimrod-G • Automates creation and management of large parametric experiments • Run application under wide range of input conditions and aggregate results • Queries MDS to find resources • Generates number of independent jobs • GRAM allocates jobs to computational nodes • Higher-level broker: allows user to specify time and cost constraints
Examples of Resource Brokers • AppLeS • Application Level Scheduler • Map large number of independent tasks to dynamically varying pool of available computers • Use GRAM to locate resources and initiate and manage computation
Figure: Resource Management Architecture. Application → (RSL) → Broker (RSL specialization; queries and info exchanged with the Information Service) → (ground RSL) → Co-allocator → (simple ground RSL) → GRAM instances → local resource managers (LSF, EASY-LL, NQE).
Resource co-allocators • May request resources at multiple sites • Two or more computers and networks • Break multi-request into components • Pass each component to resource manager • Provide means for monitoring job status or terminating job • Complex: • Two or more resource managers • Global state like availability of resources difficult to determine
Different co-allocation services • Require all resources to be available before job proceeds; fail globally if failure occurs at any resource • Allocate at least N out of M resources and return • Return immediately, but gradually return more resources as they become available • Each useful for some class of applications
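Two of these policies can be contrasted in a few lines. In this sketch each site is represented by a hypothetical callable that returns True when the site grants its part of the request; real co-allocators also have to release resources that were already granted, which is only noted in a comment here.

```python
from typing import Callable, Dict, List, Optional

Allocator = Callable[[], bool]   # hypothetical stand-in for a per-site request


def allocate_all_or_nothing(sites: Dict[str, Allocator]) -> Optional[List[str]]:
    """Succeed only if every site grants its request; otherwise fail globally."""
    granted: List[str] = []
    for name, try_allocate in sites.items():
        if not try_allocate():
            # A real system would release everything in `granted` here.
            return None
        granted.append(name)
    return granted


def allocate_at_least(sites: Dict[str, Allocator], n: int) -> Optional[List[str]]:
    """Succeed if at least n of the M sites grant their request."""
    granted = [name for name, try_allocate in sites.items() if try_allocate()]
    return granted if len(granted) >= n else None


sites = {"siteA": lambda: True, "siteB": lambda: False, "siteC": lambda: True}
```

With one failing site out of three, the all-or-nothing policy fails while "at least 2 of 3" succeeds, which is why each policy suits a different class of applications.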
Concurrent Allocation • If advance reservations are available: • Obtain list of available time slots from each participating resource manager and choose timeslot • Without reservations: • Optimistically allocate resources • Hope desired set will be available at future time • Use information service (MDS) to determine current availability of resources • Construct RSL request that is likely to succeed • If allocation fails, all started jobs must be terminated
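When advance reservations are available, choosing a time slot reduces to intersecting the free slots reported by each resource manager. The sketch below assumes slots are comparable identifiers such as start hours and that the earliest common slot is preferred; both are assumptions made for illustration.

```python
from typing import Hashable, List, Optional, Sequence


def common_slots(slots_per_manager: Sequence[Sequence[Hashable]]) -> List:
    """Slots that every participating resource manager reports as free."""
    if not slots_per_manager:
        return []
    common = set(slots_per_manager[0])
    for slots in slots_per_manager[1:]:
        common &= set(slots)
    return sorted(common)


def choose_slot(slots_per_manager: Sequence[Sequence[Hashable]]) -> Optional[Hashable]:
    """Pick the earliest slot available everywhere, or None if there is none."""
    common = common_slots(slots_per_manager)
    return common[0] if common else None


# Three managers report their free start hours.
slot = choose_slot([[9, 10, 14], [10, 14, 16], [14, 10]])
```

Without reservations there is no such list to intersect, which is why the optimistic strategy has to terminate all started jobs if any allocation fails.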
Disadvantages of Concurrent Allocation Scheme • Computational resources wasted while waiting for all requested resources to become available • Application must be altered to perform barrier to synchronize startup across components • Detecting failure of a resource is difficult, e.g. in queue-based local resource managers
Local Resource Managers • Implemented with Globus Resource Allocation Manager (GRAM) • Processing RSL specifications representing resource requests • Deny request • Create one or more processes (jobs) that satisfy request • Enable remote monitoring and management of jobs • Periodically update MDS information service with current availability and capabilities of resources
GRAM (cont.) • Interface between grid environment and entity that can create processes • E.g., Parallel scheduler or Condor pool • GRAM may schedule resource itself • More commonly, maps resource specification into a request to a local resource allocation mechanism • E.g., Condor, LoadLeveler, LSF • Co-exists with local mechanisms
GRAM (cont.) • GRAM API has functions for: • Submitting a job request: produces globally unique job handle • Canceling a job request • Asking when job request is expected to run • Upon submission, can request that progress be signaled asynchronously to callback URL
GRAM Scheduling Model • Jobs are either: • Pending: resources have not yet been allocated to the job • Active: resources allocated, job running • Done: all processes have terminated and resources have been deallocated • Failed: job terminated due to: • explicit termination • error in request format • failure in resource management system • denial of access to resource
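The four job states can be modelled as a small state machine. The slide lists the states but not the permitted transitions, so the transition table below is an assumption (pending jobs may become active or fail, active jobs may finish or fail, and done/failed are terminal); none of these names come from the GRAM API.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"   # resources not yet allocated
    ACTIVE = "active"     # resources allocated, job running
    DONE = "done"         # processes terminated, resources deallocated
    FAILED = "failed"     # terminated, bad request, RM failure, or access denied


# Assumed transition table; terminal states have no successors.
TRANSITIONS = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


state = advance(JobState.PENDING, JobState.ACTIVE)
state = advance(state, JobState.DONE)
```

A job manager that notifies a callback on state transitions would call something like `advance` and then send the new state to the registered contact.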
GRAM Components • Gatekeeper Responds to a request: • Performs mutual authentication of user and resource • Determines local user name for remote user • Starts a job manager that executes as local user and handles request
GRAM Components (cont.) • Job manager • Creates processes requested by user • Submits resource allocation requests to underlying resource management system (or does fork) • Monitors state of created processes • Notifies callback contact of state transitions • Implements control operations like termination
GRAM Components (cont.) • GRAM reporter Responsible for storing into MDS (information service) info about: • Scheduler structure • Support reservations? • Number of queues • Scheduler state • Currently active jobs • Expected wait time in queue • Total number of nodes and available nodes
Job Submission Interfaces • Globus Toolkit includes several command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure
Scheduling in Grid • Optimize performance: execution time, throughput, fairness, etc. (QoS) • Load balancing. • Help to design an effective program model. • Ubiquity: process scheduling in operating systems, task scheduling in parallel computing, and scheduling in real life too.
Scheduling in GRID • Application level. • Resources, e.g. data, communication bandwidth. • Models: scheduling policy, program model, performance model, performance measurement. • Current performance measure: minimize execution time.
Requirements on GRID scheduling model • Adaptive to the dynamic environment. • Adaptive to performance metrics that vary over the course of application execution. • Performance predictions over time. • Coarse and fine tuning of the component parameters.
Techniques commonly employed • Parameterize the components in an application. • Make use of dynamic information, e.g. percentage of CPU slots available, percentage of network bandwidth available. • Compositional scheduling model: structural character of the application and its dynamic interaction with the grid environment.
Scheduling Policy • Choose a set of resources to achieve the performance goal. • First Come, First Served. • Preemptive. • Fair Queuing. • Etc.
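As a concrete instance of the simplest policy in this list, the sketch below runs jobs first come, first served on a single resource. The job tuple layout `(name, arrival, runtime)` is an assumption made for the example.

```python
from typing import Dict, List, Tuple

Job = Tuple[str, int, int]   # (name, arrival time, runtime)


def fcfs_schedule(jobs: List[Job]) -> Dict[str, Tuple[int, int]]:
    """Map each job name to (start, finish) on one non-preemptive resource."""
    clock = 0
    plan: Dict[str, Tuple[int, int]] = {}
    # Serve strictly in order of arrival; sorted() is stable for ties.
    for name, arrival, runtime in sorted(jobs, key=lambda job: job[1]):
        start = max(clock, arrival)    # wait for the resource or for the job
        clock = start + runtime
        plan[name] = (start, clock)
    return plan


plan = fcfs_schedule([("a", 0, 5), ("b", 1, 3), ("c", 10, 2)])
```

Job `b` arrives at time 1 but must wait until `a` finishes at time 5. A preemptive or fair-queuing policy would trade that simplicity for shorter waits.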
AppLeS: Application-Level Scheduler • Everything is evaluated in terms of its impact on the application, so resources are evaluated in terms of their predicted capacities and their potential for meeting the application's requirements. • No resource manager is assumed. • Runs at user level; no specific privilege required. • Heterogeneous and cross-organization. • Depends on the Network Weather Service for dynamic resource load and availability.
AppLeS (Cont’d) • Information gathered by the Network Weather Service is used to parameterize performance models and to predict the state of grid resources at the time the application will be scheduled. • Time balancing: all processors are assigned some possibly nonuniform amount of work, with the goal that they all finish at roughly the same time. • Compositional component models are deployed. • Adaptive scheduling scheme.
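Time balancing can be illustrated with a simple proportional split: if processor i is predicted to deliver speed s_i, giving it the share W * s_i / sum(s) of the total work W makes every processor finish after W / sum(s) time units. This is a simplified sketch under the assumption of perfectly divisible work and accurate predictions; it is not the AppLeS implementation.

```python
from typing import List


def time_balanced_shares(total_work: float, predicted_speeds: List[float]) -> List[float]:
    """Split work in proportion to predicted speed (work units per time unit)."""
    total_speed = sum(predicted_speeds)
    if total_speed <= 0:
        raise ValueError("at least one processor must have positive speed")
    return [total_work * speed / total_speed for speed in predicted_speeds]


def finish_times(shares: List[float], predicted_speeds: List[float]) -> List[float]:
    """Predicted completion time of each processor for its share."""
    return [share / speed for share, speed in zip(shares, predicted_speeds)]


# Hypothetical forecast: the second processor is twice as fast as the others.
speeds = [1.0, 2.0, 1.0]
shares = time_balanced_shares(100.0, speeds)
times = finish_times(shares, speeds)
```

In practice the predicted speeds would come from forecasts such as those of the Network Weather Service, and the split would be recomputed as the forecasts change.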
Conclusion • Scheduling is the key to performance in a grid environment. • Coordinating resources in a grid environment • Most advanced grid applications are targeted to specific resources. • High-performance scheduling evolution.
Open issues • Multiple layers of schedulers • The higher level scheduler has less information about the remote resources, local resource managers actually control the resources • Lack of control over resources • Grid scheduler does not have ownership or control over the resources • Shared resources and variance • No dedicated access to the resources (resources are shared) • This results in a high degree of variance and unpredictability • Conflicting performance goals • Many participants have different/conflicting preferences • Many different local policies, cost models, security