
Global Community

Global Community. Slide Courtesy of Ian Foster. Resource Management in Grid Computing. AZIZOL ABDULLAH,PhD DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK. Resource Management. What needs to be managed: Resources


Presentation Transcript


  1. Global Community Slide Courtesy of Ian Foster

  2. Resource Management in Grid Computing AZIZOL ABDULLAH, PhD, DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK

  3. Resource Management • What needs to be managed: resources • Physical resources (computers, disks, databases, networks, scientific instruments). • Logical resources (jobs, executing applications, complex workflows, etc.). • What is the goal? • Resources must be available and meet performance criteria.

  4. Resource Management (Cont.) • What is Management: • The process of locating various types of capability, arranging for their use, utilizing them and monitoring their state. • Maintenance of resources and environment • Monitoring their state and performance • Reacting to internal and external changes in resource or its environment • Initiating routine operations: initialization, start/stop and tuning

  5. What is Resource Management? • Mechanisms for locating and allocating computational resources • Authentication • Process creation • Remote job submission • Scheduling • Other resources that can be managed: • Memory • Disk • Networks

  6. Resource Management Issues for Grid Computing • Site autonomy • Resources owned by different organizations, in different administrative domains • Local policies for use, scheduling, security • Heterogeneous substrate • Different local resource management systems • Policy extensibility • Local sites need ability to customize their resource management policies

  7. More Issues for Grid Computing • Co-allocation • May need resources at several sites • Mechanism for allocating multiple resources, initiating computation, monitoring and managing • On-line control • Adapt application requirements to resource availability

  8. Manageability • The ability of a resource to be managed • Manageability interfaces support common operations (control and monitor) • Manageability standards specify standard interfaces • Problem: • Existing interfaces are generally resource-specific • Almost impossible to add standard interfaces to legacy resources • New standards may require additional interfaces • Solution: • Common standards • Based on Service orientation, integration and virtualization.

  9. Service orientation • Software services • A service provides some capability to its clients through message exchanges • Represent the physical manageable entities • Understand the unique interfaces for the entities they represent • Implement applicable standard interfaces • Integration • Applications encapsulated in services become integrable building blocks

  10. Service orientation (Cont.) • The management process • Manager invokes the operation (service’s standard interface) • Service performs operation on managed entity (resource’s unique interface) • Service returns result to manager (through the standard interface) • Problem • Need a common way to implement service • Solution: Web Services
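The three-step management process on this slide (manager calls the standard interface, the service drives the resource's own interface, the result comes back through the standard interface) can be sketched in Python. This is only an illustration under stated assumptions: `ManageableService`, `DiskService`, `LegacyDisk` and its `spin_up`, `spin_down` and `query_status_word` methods are hypothetical names, not part of any grid toolkit or Web Services standard.

```python
from typing import Protocol


class ManageableService(Protocol):
    """Hypothetical standard interface that any manager can call."""

    def start(self) -> str: ...
    def stop(self) -> str: ...
    def status(self) -> str: ...


class LegacyDisk:
    """Hypothetical managed entity with its own resource-specific interface."""

    def __init__(self) -> None:
        self._spinning = False

    def spin_up(self) -> None:
        self._spinning = True

    def spin_down(self) -> None:
        self._spinning = False

    def query_status_word(self) -> int:
        return 1 if self._spinning else 0


class DiskService:
    """Service that represents the disk and translates standard calls."""

    def __init__(self, disk: LegacyDisk) -> None:
        self._disk = disk

    def start(self) -> str:
        self._disk.spin_up()          # resource's unique interface
        return self.status()          # result through the standard interface

    def stop(self) -> str:
        self._disk.spin_down()
        return self.status()

    def status(self) -> str:
        return "running" if self._disk.query_status_word() == 1 else "stopped"


def manager_restart(service: ManageableService) -> list[str]:
    """The manager sees only the standard interface, never the disk itself."""
    return [service.stop(), service.start()]


print(manager_restart(DiskService(LegacyDisk())))
```

Because `manager_restart` depends only on the standard interface, a service wrapping a telescope or a cluster could be passed in without changing the manager.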

  11. [Figure: layered diagram. Labels: Manager; Common interfaces; Web services and other service providers; Resource-specific interfaces; Virtualization; Physical resources (computers: cluster, blades, mainframe; telescopes; disks).]

  12. Traditional Resource Management • Batch schedulers, workflow engines, operating systems • Designed and operated under the assumption that: • They have complete control over a resource • They can implement the mechanisms and policies needed for effective use of that resource in isolation • This is not the case for Grid resource management • Separate administrative domains • Resource heterogeneity • Lack of control and different policies

  13. Grid Resource Management • What is Grid Resource Management? • Identifying application requirements, resource specification • Matching resources to applications • Allocating/scheduling and monitoring those resources and applications over time in order to run as effectively as possible.

  14. Grid Resource Management (Cont.) • Challenges in Grid Resource Management • Resources are heterogeneous in nature • Processors, disks, data, networks, other services. • Applications have to compete for resources • Lack of available data about current systems, needs of users, resource owners and administrators

  15. Grid RM Mechanisms • Resource Information Dissemination • Published by the resource (push) or gathered by GIS (pull) • On-demand dissemination (by agents) • Resource Discovery • Centralized or distributed queries, agents, distributed queries + agents • Resources are described in schema/language or objects • Resource Scheduling/Job execution • Assigning resources: centralized, hierarchical, distributed • Resource Monitoring and Re-Scheduling • Monitoring can be done by application (polling) or by resource (notification to the app or periodic status updates).

  16. Grid Resource Brokerage • Discovering suitable resources for a user's job • Current scenario: manual or semi-manual • Users manually target their work at machines that are already known to them. • For larger grids, a manual solution is not feasible • Solution is a Grid Resource Broker: • The user describes their needs to a third party (software) • which searches for suitable resources, and passes the result(s) back to the user.

  17. Grid Resource Brokerage • Role of the Broker in a Management System • Resource discovery • Authorization filtering, Application definition, Minimum Requirement filtering • System Selection • Dynamic information gathering, system selection • Allocation and Advance reservation • Grid Information System • Organize a set of sensors on resources so that client or broker can have easy access to data (static or dynamic)

  18. Matchmaking • Process of selecting resources based on application requirements • Symmetric matchmaking • Attribute-based matching • Resource provider and resource user have to agree on a schema, attribute names and value ranges • Syntax based like ClassAds • Asymmetric matchmaking • Ontology based matching • Ontologies, domain background knowledge, matchmaking rules
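Attribute-based matching can be illustrated with a short Python sketch. It assumes that provider and user have already agreed on a schema; the attribute names (`arch`, `memory_mb`, `nodes`) and the three offers are hypothetical, and this is not the ClassAd language itself. It also checks only one direction (the request against each offer), whereas a real matchmaker would also evaluate the owner's constraints against the request.

```python
import operator

# Comparison operators allowed in a request constraint.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}


def matches(offer: dict, request: list[tuple[str, str, object]]) -> bool:
    """True if the offer satisfies every (attribute, op, value) constraint.

    An attribute missing from the offer counts as a failed constraint.
    """
    for attribute, op, wanted in request:
        if attribute not in offer:
            return False
        if not OPS[op](offer[attribute], wanted):
            return False
    return True


def matchmake(offers: dict[str, dict], request) -> list[str]:
    """Return the names of all offers that satisfy the request, sorted."""
    return sorted(name for name, offer in offers.items() if matches(offer, request))


# Hypothetical resource advertisements using the agreed schema.
offers = {
    "nodeA": {"arch": "x86_64", "memory_mb": 128, "nodes": 16},
    "nodeB": {"arch": "x86_64", "memory_mb": 32, "nodes": 64},
    "nodeC": {"arch": "sparc", "memory_mb": 256, "nodes": 8},
}
request = [("arch", "=", "x86_64"), ("memory_mb", ">=", 64)]
print(matchmake(offers, request))  # ['nodeA']
```

The sketch also shows why the schema agreement matters: a request that names an attribute no provider advertises can never match.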

  19. Specifying Resource and Job Requirements • Resource requirements: • Machine type • Number of nodes • Memory • Network • Job or scheduler parameters: • Directory • Executable • Arguments • Environment • Maximum time required

  20. Resource and Job Specification • Globus: Resource Specification Language (RSL) • &(executable=myprog) (|(&(count=5)(memory>=64)) (&(count=10)(memory>=32))) • Condor: Classified ads • Resource owners advertise abilities and constraints • Applications advertise resource requests • Matchmaking: match offers & requests

  21. Components of Globus Resource Management Architecture • Resource specification using RSL • Resource brokers: translate resource requirements into specifications • Co-allocators: break down requests for multiple sites • Local resource managers: apply local, site-specific resource management policies • Information about available compute resources and their characteristics

  22. Resource Specification Language • Common notation for exchange of information between components • API provided for manipulating RSL

  23. RSL Syntax • Elementary form: parenthesis clauses • (attribute op value [ value … ] ) • Operators Supported: • <, <=, =, >=, > , != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact,resourceManagerName • Unknown attributes are passed through • May be handled by subsequent tools
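The elementary clause form `(attribute op value [ value … ])` can be parsed with a few lines of Python. This is a simplified sketch, not the Globus RSL API: `parse_relation` is a hypothetical helper, it splits values on whitespace, and it ignores quoting and the other details of the full grammar.

```python
import re

# Two-character operators come first so ">=" is not read as ">" then "=".
RELATION = re.compile(r"^\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*\)$")


def parse_relation(clause: str) -> tuple[str, str, list[str]]:
    """Split one parenthesized clause into (attribute, operator, values)."""
    m = RELATION.match(clause.strip())
    if m is None:
        raise ValueError(f"not a relation clause: {clause!r}")
    attribute, op, values = m.groups()
    return attribute, op, values.split()


print(parse_relation("(memory>=64)"))
print(parse_relation("(arguments = a b c)"))
```

Because the parser does not check the attribute name against a fixed list, unknown attributes pass through unchanged, which matches the pass-through behaviour described on the slide.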

  24. Constraints: “&” • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) • “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”

  25. Multirequest: “+” • A multirequest allows us to specify multiple resource needs, for example + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2)) • Execute 5 instances of p1 on a machine with at least 64M of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
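Breaking a multirequest into its components is the first thing a co-allocator has to do. The sketch below does that by counting parenthesis depth. `split_multirequest` is a hypothetical helper written for this example rather than a Globus function, and it assumes values contain no quoted parentheses.

```python
def split_multirequest(rsl: str) -> list[str]:
    """Split '+ (a) (b) ...' into its top-level parenthesized components."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest: expected leading '+'")
    components: list[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text[1:], start=1):
        if ch == "(":
            if depth == 0:
                start = i  # opening parenthesis of a new component
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            if depth == 0:
                # Drop the component's outer parentheses.
                components.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced '('")
    return components


request = "+ (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))"
for part in split_multirequest(request):
    print(part)
```

Each returned string is an ordinary `&` conjunction that could be handed to a single resource manager.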

  26. Resource Broker • Takes high-level RSL specification • Transforms into concrete specifications through “specialization” process • Locate resources that meet requirements • Multiple brokers may service single request • Application-specific brokers translate application requirements • Output: complete specification of locations of resources; given to co-allocator

  27. Examples of Resource Brokers • Nimrod-G • Automates creation and management of large parametric experiments • Run application under wide range of input conditions and aggregate results • Queries MDS to find resources • Generates number of independent jobs • GRAM allocates jobs to computational nodes • Higher-level broker: allows user to specify time and cost constraints

  28. Examples of Resource Brokers • AppLeS • Application Level Scheduler • Map large number of independent tasks to dynamically varying pool of available computers • Use GRAM to locate resources and initiate and manage computation

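  29. Resource Management Architecture [Figure: the application passes RSL to a broker, which performs RSL specialization using queries to and information from the information service; the resulting ground RSL goes to a co-allocator, which passes simple ground RSL to GRAM instances in front of local resource managers (LSF, EASY-LL, NQE).]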

  30. Resource co-allocators • May request resources at multiple sites • Two or more computers and networks • Break multi-request into components • Pass each component to resource manager • Provide means for monitoring job status or terminating job • Complex: • Two or more resource managers • Global state like availability of resources difficult to determine

  31. Different co-allocation services • Require all resources to be available before job proceeds; fail globally if failure occurs at any resource • Allocate at least N out of M resources and return • Return immediately, but gradually return more resources as they become available • Each useful for some class of applications
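The three co-allocation behaviours listed on this slide can be sketched as three small Python functions. The function names and the `outcomes` data are hypothetical. Each takes the per-resource allocation results (True for granted, False for failed) and is meant only to show how the policies differ.

```python
from typing import Iterable, Iterator


def all_or_nothing(results: dict[str, bool]) -> list[str]:
    """Strategy 1: succeed only if every resource was allocated."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        raise RuntimeError("co-allocation failed at: " + ", ".join(failed))
    return sorted(results)


def at_least_n(results: dict[str, bool], n: int) -> list[str]:
    """Strategy 2: succeed if at least n of the requested resources were allocated."""
    granted = sorted(name for name, ok in results.items() if ok)
    if len(granted) < n:
        raise RuntimeError(f"only {len(granted)} of {len(results)} allocated, need {n}")
    return granted


def as_available(events: Iterable[tuple[str, bool]]) -> Iterator[str]:
    """Strategy 3: hand resources to the caller one by one as they are granted."""
    for name, ok in events:
        if ok:
            yield name


# Hypothetical outcome of three allocation attempts.
outcomes = {"siteA": True, "siteB": False, "siteC": True}
print(at_least_n(outcomes, 2))              # ['siteA', 'siteC']
print(list(as_available(outcomes.items()))) # ['siteA', 'siteC']
```

With the same outcomes, `all_or_nothing(outcomes)` raises, which is the "fail globally" behaviour of the first strategy.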

  32. Concurrent Allocation • If advance reservations are available: • Obtain list of available time slots from each participating resource manager and choose timeslot • Without reservations: • Optimistically allocate resources • Hope desired set will be available at future time • Use information service (MDS) to determine current availability of resources • Construct RSL request that is likely to succeed • If allocation fails, all started jobs must be terminated
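When advance reservations are available, choosing a time slot amounts to intersecting the free intervals reported by each resource manager. The sketch below assumes each manager returns a list of non-overlapping `(start, end)` intervals in hours. The function names and the sample availability data are hypothetical.

```python
Interval = tuple[float, float]


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    out: list[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def earliest_common_slot(availability: dict[str, list[Interval]], duration: float):
    """Return the earliest (start, end) window free at every manager, or None."""
    common = None
    for slots in availability.values():
        slots = sorted(slots)
        common = slots if common is None else intersect(common, slots)
    for start, end in common or []:
        if end - start >= duration:
            return (start, start + duration)
    return None


# Hypothetical free slots reported by three resource managers.
availability = {
    "A": [(0, 4), (6, 12)],
    "B": [(2, 8)],
    "C": [(3, 5), (7, 10)],
}
print(earliest_common_slot(availability, 1))  # (3, 4)
print(earliest_common_slot(availability, 2))  # None
```

The second call returning `None` corresponds to the case where no common slot exists and the request has to be relaxed or retried later.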

  33. Disadvantages of Concurrent Allocation Scheme • Computational resources wasted while waiting for all requested resources to become available • Application must be altered to perform barrier to synchronize startup across components • Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

  34. Local Resource Managers • Implemented with Globus Resource Allocation Manager (GRAM) • Processing RSL specifications representing resource requests • Deny request • Create one or more processes (jobs) that satisfy request • Enable remote monitoring and management of jobs • Periodically update MDS information service with current availability and capabilities of resources

  35. GRAM (cont.) • Interface between grid environment and entity that can create processes • E.g., Parallel scheduler or Condor pool • GRAM may schedule resource itself • More commonly, maps resource specification into a request to a local resource allocation mechanism • E.g., Condor, LoadLeveler, LSF • Co-exists with local mechanisms

  36. GRAM (cont.) • GRAM API has functions for: • Submitting a job request: produces globally unique job handle • Canceling a job request • Asking when job request is expected to run • Upon submission, can request that progress be signaled asynchronously to callback URL

  37. GRAM Scheduling Model • Jobs are either: • Pending: resources have not yet been allocated to the job • Active: resources allocated, job running • Done: when all processes have terminated and resources have been deallocated • Failed: job terminates due to: • explicit termination • error in request format • failure in resource management system • denial of access to resource
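The four job states can be modelled as a small state machine. The state names come from the slide, but the table of allowed transitions is an assumption inferred from the state descriptions, and the `Job` class with its optional callback is a hypothetical illustration rather than the GRAM API.

```python
from enum import Enum
from typing import Callable, Optional


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table: a job may fail before or after it starts running.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self, on_change: Optional[Callable[[JobState, JobState], None]] = None):
        self.state = JobState.PENDING
        self._on_change = on_change  # hypothetical stand-in for a callback contact

    def transition(self, new_state: JobState) -> None:
        """Move to new_state, rejecting transitions the model does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        old, self.state = self.state, new_state
        if self._on_change is not None:
            self._on_change(old, new_state)


log = []
job = Job(on_change=lambda old, new: log.append((old.name, new.name)))
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
print(log)  # [('PENDING', 'ACTIVE'), ('ACTIVE', 'DONE')]
```

Done and Failed are treated as terminal states here, so any further transition raises `ValueError`.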

  38. GRAM Components • Gatekeeper Responds to a request: • Performs mutual authentication of user and resource • Determines local user name for remote user • Starts a job manager that executes as local user and handles request

  39. GRAM Components (cont.) • Job manager • Creates processes requested by user • Submits resource allocation requests to underlying resource management system (or does fork) • Monitors state of created processes • Notifies callback contact of state transitions • Implements control operations like termination

  40. GRAM Components (cont.) • GRAM reporter Responsible for storing into MDS (information service) info about: • Scheduler structure • Support reservations? • Number of queues • Scheduler state • Currently active jobs • Expected wait time in queue • Total number of nodes and available nodes

  41. Job Submission Interfaces • Globus Toolkit includes several command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure

  42. Scheduling in Grid • Optimize performance: execution time, throughput, fairness, etc. (QoS). • Load balancing. • Helps to design an effective program model. • Ubiquity: process scheduling in operating systems, task scheduling in parallel computing, and scheduling in real life too.

  43. Scheduling in GRID • Application level. • Resources, e.g. data, communication bandwidth. • Models, scheduling policy, program model, performance model, performance measurement. • Current performance measure: minimize execution time.

  44. Requirements on GRID scheduling model • Adaptive to the dynamic environment. • Adaptive to performance metrics that vary over the course of application execution. • Performance predictions over time. • Coarse and fine tuning of the component parameters.

  45. Techniques commonly employed • Parameterize the components in an application. • Make use of dynamic information, e.g. percentage of CPU slots available, percentage of network bandwidth available. • Compositional scheduling model: structural character of the application and dynamic interaction with the grid environment.

  46. Scheduling Policy • Choose a set of resources to achieve the performance goal. • First Come, First Served. • Preemptive. • Fair Queuing. • Etc.

  47. AppLeS: Application-Level Scheduler • Everything is evaluated in terms of its impact on the application, so resources are evaluated in terms of their predicted capacities and their potential to meet requirements. • No resource manager is assumed. • Runs at user level; no specific privilege required. • Heterogeneous and cross-organization. • Depends on the Network Weather Service for dynamic resource load and availability.

  48. AppLeS (Cont’d) • Information gathered by the Network Weather Service is used to parameterize performance models and to predict the state of grid resources at the time the application will be scheduled. • Time balancing: all processors are assigned some, possibly nonuniform, amount of work with the goal that they will all finish at roughly the same time. • Compositional component models are deployed. • Adaptive scheduling scheme.
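Time balancing can be illustrated by giving each processor work in proportion to its predicted speed, so that work divided by speed is the same for all of them. This is a minimal sketch of that idea only, not the AppLeS implementation. The `time_balance` function and the speed figures are hypothetical, standing in for forecasts that a service like the Network Weather Service might supply.

```python
def time_balance(total_work: float, predicted_speed: dict[str, float]) -> dict[str, float]:
    """Split total_work so every processor is predicted to finish together.

    predicted_speed maps processor name to work units per second.
    """
    total_speed = sum(predicted_speed.values())
    if total_speed <= 0:
        raise ValueError("predicted speeds must sum to a positive value")
    return {
        name: total_work * speed / total_speed
        for name, speed in predicted_speed.items()
    }


# Hypothetical forecasts in work units per second.
speeds = {"fast": 30.0, "medium": 20.0, "slow": 10.0}
shares = time_balance(600.0, speeds)
for name, work in shares.items():
    print(name, work, work / speeds[name])  # share and predicted finishing time
```

With these numbers the shares are 300, 200 and 100 units, and each processor is predicted to finish after 10 seconds. If the forecasts turn out to be wrong the finishing times drift apart, which is why the slide pairs time balancing with an adaptive scheduling scheme.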

  49. Conclusion • Scheduling is the key to performance in a grid environment. • Coordinating resources in a grid environment • Most advanced grid applications are targeted to specific resources. • High-Performance Scheduling Evolution.

  50. Open issues • Multiple layers of schedulers • The higher level scheduler has less information about the remote resources, local resource managers actually control the resources • Lack of control over resources • Grid scheduler does not have ownership or control over the resources • Shared resources and variance • No dedicated access to the resources (resources are shared) • This results in a high degree of variance and unpredictability • Conflicting performance goals • Many participants have different/conflicting preferences • Many different local policies, cost models, security
