GRUBER: A Grid Resource Usage SLA Broker
Catalin L. Dumitrescu, The University of Chicago
Ian Foster, Argonne National Laboratory & The University of Chicago
Introduction
• Large distributed Grid systems pose new challenges:
  • Overwhelming resource characteristics
  • Complex workload characteristics
  • Complex interactions and resource allocations
• Automated resource discovery and usage SLA enforcement are important elements of such systems
Talk Outline / Part I
• Part I:
  • Introduction
  • Our Approach: GRUBER
  • Motivating Scenarios
  • Architecture
• Part II:
  • Evaluation Metrics
  • Experimental Results
  • Conclusions and Questions
Our Approach: GRUBER
• GRUBER: an architecture and toolkit for resource usage service level agreement (SLA) specification and enforcement in a grid environment
• GT3- and GT4-based implementations (Globus Toolkit 3 and 4)
• Able to handle as many clients (submission hosts) as the GTx container's performance permits
A Bit of History
• Started in the context of Grid3 as a monitoring engine
• Evolved into a simple site recommendation engine
• Later, additional capabilities were added, such as:
  • Enforcement components
  • Complex usage SLAs and specification interfaces
GRUBER Novelty
• Handles:
  • Sites with local resource managers (RMs)
  • VOs and groups
  • Submission hosts
• Models usage allocations (SLAs) at several levels
• Capacity:
  • to collect monitoring metrics from a grid
  • to make various decisions based on this information
  • to enforce complex SLAs by various means
Environment Overview
[Figure not preserved]
Environment Details
• Target environments with:
  • large numbers of resources and resource owners
  • VOs, where usage SLAs are required to manage resource utilization
• A few examples:
  • Grid3
  • OSG
  • TeraGrid
  • DataGrid
Research Problems
• "How are usage SLAs handled in grid environments?"
• "What is the gain from taking such usage SLAs into account?"
Motivating Scenario
• Controlled resource sharing is important because each participant wants to ensure that its goals are achieved
• Three dimensions in the usage policy space:
  • resource providers (sites, VOs, groups)
  • resource consumers (VOs, groups, users)
  • time
• Provider policies make resources available to consumers for specified time periods
Main Players & Elements
• Owners: want convenient and flexible mechanisms for expressing the policies that determine how many resources are allocated to different purposes
• User and group jobs: the main consumers of the resources provided by sites
• Algorithms and policies: capture how jobs are assigned to host machines
Problem Domain
• A grid consists of:
  • a set of resource provider sites: each contains a number of processors and some amount of disk space
  • a three-level hierarchy of users, groups, and VOs: each user is a member of exactly one group, and each group is a member of exactly one VO
  • a set of submit hosts and jobs: each job is specified by four attributes: VO, Group, Required-Processor-Time, Required-Disk-Space
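The entities above can be sketched as plain data types. This is a minimal illustration, not GRUBER's own data model: the class and field names and the units (hours, GB) are assumptions. Deriving a user's VO from its group encodes the "exactly one group, exactly one VO" hierarchy directly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Site:
    """A resource provider site (hypothetical unit for disk: GB)."""
    name: str
    processors: int
    disk_gb: float


@dataclass(frozen=True)
class Group:
    name: str
    vo: str  # each group is a member of exactly one VO


@dataclass(frozen=True)
class User:
    name: str
    group: Group  # each user is a member of exactly one group

    @property
    def vo(self) -> str:
        # The VO is implied by the group, so the two can never disagree.
        return self.group.vo


@dataclass(frozen=True)
class Job:
    """A job described by the four attributes listed on the slide."""
    vo: str
    group: str
    required_processor_time_h: float  # hypothetical unit: hours
    required_disk_gb: float  # hypothetical unit: GB


@dataclass
class SubmitHost:
    name: str
    jobs: list = field(default_factory=list)  # jobs waiting to be placed
```

For example, `User("alice", Group("g1", "VO0")).vo` yields `"VO0"` without storing the VO twice.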
Problem Domain – cont.
• A grid consists of (cont.):
  • Usage SLAs:
    • site policy statement: defines site usage SLAs by specifying the number of processors and amount of disk space that sites make available to different VOs
    • VO policy statement: defines VO usage SLAs by specifying the fraction of the VO's total processor and disk resources (i.e., the aggregate of contributions to that VO from all sites) that the VO makes available to different groups
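The two policy levels compose: a VO's total is the sum of what every site grants it, and a group's entitlement is its fraction of that total. The sketch below shows this for processors only (disk would be handled the same way); the dictionary layout and all numbers are hypothetical.

```python
from collections import defaultdict

# Site policy statements (hypothetical): site -> VO -> processors made available.
site_policy = {
    "siteA": {"VO0": 30, "VO1": 20},
    "siteB": {"VO0": 10, "VO1": 30},
}

# VO policy statements (hypothetical): VO -> group -> fraction of the VO's total.
vo_policy = {
    "VO0": {"g1": 0.75, "g2": 0.25},
    "VO1": {"g3": 1.0},
}


def vo_totals(site_policy):
    """Aggregate each VO's processors over the contributions of all sites."""
    totals = defaultdict(int)
    for per_vo in site_policy.values():
        for vo, cpus in per_vo.items():
            totals[vo] += cpus
    return dict(totals)


def group_allocations(site_policy, vo_policy):
    """Processors each (VO, group) may use: its fraction of the VO's aggregate."""
    totals = vo_totals(site_policy)
    return {
        (vo, group): fraction * totals.get(vo, 0)
        for vo, groups in vo_policy.items()
        for group, fraction in groups.items()
    }
```

With these numbers VO0 aggregates 40 processors, so group g1 is entitled to 30 and g2 to 10.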
GRUBER Architecture
• Engine: implements various algorithms for detecting available resources and maintains a generic view of resource utilization in the grid
• Site monitoring component: one of the data providers for the GRUBER engine
• Site selectors: tools that communicate with the GRUBER engine and answer the question "Which is the best site at which I can run this job?"
• Queue manager: a complex GRUBER client that must reside on a submitting host
GRUBER Picture
[Figure not preserved]
GRUBER Engine
• If a site has fewer waiting jobs than available CPUs, GRUBER assumes the job will start right away, provided an extensible usage policy is in place
• If there are more waiting jobs than available CPUs, or if no extensible SLA is in place, GRUBER considers the VO's allocation:
  • if the VO is under its allocation, GRUBER assumes that a new job can be started (after a delay that depends on the local resource manager type)
  • if the VO is over its allocation, GRUBER assumes that a new job cannot be started (the running time of the jobs already running is unknown)
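The engine's start-time assumption can be read as a small decision function. This is a sketch of the rules as stated on the slide, not GRUBER's code; the type and function names are hypothetical, and the case of exactly as many waiting jobs as available CPUs (which the slide leaves open) is treated here like the "more waiting jobs" case.

```python
from dataclasses import dataclass
from enum import Enum


class Estimate(Enum):
    STARTS_NOW = "job is assumed to start right away"
    STARTS_LATER = "job can start; delay depends on the local resource manager"
    CANNOT_START = "job is assumed unable to start"


@dataclass
class SiteState:
    waiting_jobs: int
    available_cpus: int
    extensible_policy: bool  # whether an extensible usage SLA is in place


def estimate_start(site: SiteState, vo_usage: float, vo_allocation: float) -> Estimate:
    # Rule 1: spare CPUs plus an extensible policy -> immediate start.
    if site.waiting_jobs < site.available_cpus and site.extensible_policy:
        return Estimate.STARTS_NOW
    # Rule 2 (assumption: also covers waiting_jobs == available_cpus):
    # fall back to whether the VO is under or over its allocation.
    if vo_usage < vo_allocation:
        return Estimate.STARTS_LATER
    return Estimate.CANNOT_START
```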
GRUBER QM/SiteSel
• The queue manager (QM) is responsible for determining how many jobs per VO or VO group can be scheduled at a given moment and when to release them
• Job assignment and enforcement components are part of GRUBER
• The site selector component answers "Where is it best to run next?", while the queue manager answers "How many jobs should group G_m of VO V_n be allowed to run?" and "When should these jobs start?"
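The two questions can be illustrated with two small functions. Both are hypothetical sketches: the site selector uses a "least-used" style rule (the site with the most free CPUs, ties broken by name), and the queue manager releases only as many queued jobs as fit under the group's processor allocation. GRUBER's actual selection and release policies may differ.

```python
from __future__ import annotations

import math


def select_site(free_cpus_by_site: dict[str, int]) -> str | None:
    """Answer "Where is it best to run next?": the site with the most free CPUs."""
    candidates = {site: n for site, n in free_cpus_by_site.items() if n > 0}
    if not candidates:
        return None
    # max() keeps the first maximum, so iterating in sorted order breaks ties by name.
    return max(sorted(candidates), key=lambda site: candidates[site])


def releasable_jobs(group_allocation_cpus: float, running: int, queued: int) -> int:
    """Answer "How many jobs may this group start now?" under its CPU allocation."""
    headroom = max(0, math.floor(group_allocation_cpus) - running)
    return min(queued, headroom)
```

A group entitled to 30 CPUs with 25 jobs running and 10 queued would get 5 jobs released.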
Disk Space Considerations
• Disk space introduces additional complexities:
  • A file that has been staged to a site cannot be "delayed"; it can only be deleted
  • Yet deleting a file that has been staged for a job can result in livelock if a job's files are repeatedly deleted before the job runs
• So far, we have considered a UNIX quota-like approach
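One way to read "quota-like" is a per-VO space reservation checked before staging: if the files do not fit, staging is refused up front instead of evicting files already staged for other jobs, which avoids the delete-and-restage livelock. This is a hypothetical sketch (class name, method names, and MB unit are assumptions), not the mechanism GRUBER implements.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DiskQuota:
    """Per-VO disk quota at one site (hypothetical unit: MB)."""
    limits_mb: Dict[str, float]
    used_mb: Dict[str, float] = field(default_factory=dict)

    def try_stage(self, vo: str, size_mb: float) -> bool:
        """Reserve space before staging; refuse rather than delete other jobs' files."""
        used = self.used_mb.get(vo, 0.0)
        if used + size_mb > self.limits_mb.get(vo, 0.0):
            return False
        self.used_mb[vo] = used + size_mb
        return True

    def release(self, vo: str, size_mb: float) -> None:
        """Free space once a job's files are no longer needed."""
        self.used_mb[vo] = max(0.0, self.used_mb.get(vo, 0.0) - size_mb)
```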
Usage SLA Language
• Based on Maui's semantics and WS-Agreement syntax
• Allocations are made for processor time, permanent storage, or network bandwidth resources, and there are at least two levels of resource assignment: to a VO, by a resource owner, and to a VO user or group, by a VO
• Example: VO0 15.5, VO1 10.0+, VO2 5.0-
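The compact example can be parsed into structured entries. The meaning of the suffixes is an assumption borrowed from Maui's fair-share target convention (no suffix = target, "+" = floor, "-" = ceiling); the slide does not define them, and the names below are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: suffixes follow Maui's fair-share target convention.
_KIND = {"": "target", "+": "floor", "-": "ceiling"}
_ENTRY = re.compile(r"^\s*(\S+)\s+(\d+(?:\.\d+)?)([+-]?)\s*$")


@dataclass(frozen=True)
class Allocation:
    consumer: str  # e.g. a VO name
    share: float  # numeric share as written in the SLA
    kind: str  # "target", "floor", or "ceiling"


def parse_allocations(text: str) -> List[Allocation]:
    """Parse a comma-separated list such as "VO0 15.5, VO1 10.0+, VO2 5.0-"."""
    allocations = []
    for entry in text.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"cannot parse allocation entry: {entry!r}")
        name, share, suffix = match.groups()
        allocations.append(Allocation(name, float(share), _KIND[suffix]))
    return allocations
```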
Screenshot: Site Selection
Screenshot: VO Usage SLA
Screenshot: VO Verifier
Talk Outline / Part II
• Part I:
  • Introduction
  • Our Approach: GRUBER
  • Motivating Scenarios
  • Architecture
• Part II:
  • Evaluation Metrics
  • Experimental Results
  • Conclusions and Questions
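Evaluation Metrics
• Comp: percentage of jobs completed successfully
• Replan: number of re-planning operations
• Time: total execution time for the workload
• Util: average resource utilization, $\mathrm{Util} = \frac{\sum_{i=1}^{N} ET_i}{\#\mathrm{cpus} \cdot \Delta t} \times 100$
• Delay: average delay per job, $\mathrm{Delay} = \frac{\sum_{i=1}^{N} DT_i}{\#\mathrm{jobs}}$
• where ET_i and DT_i are the execution time and the delay of job i, N is the number of jobs, and Δt is the length of the observation interval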
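The metric definitions translate directly into code. The function names are hypothetical and the inputs are assumed to be per-job execution times and delays in one consistent time unit.

```python
from typing import Sequence


def utilization(exec_times: Sequence[float], n_cpus: int, interval: float) -> float:
    """Util = sum(ET_i) / (#cpus * delta_t) * 100, a percentage."""
    return sum(exec_times) / (n_cpus * interval) * 100.0


def average_delay(delays: Sequence[float]) -> float:
    """Delay = sum(DT_i) / #jobs."""
    if not delays:
        raise ValueError("no jobs")
    return sum(delays) / len(delays)


def completion_rate(n_completed: int, n_submitted: int) -> float:
    """Comp: percentage of jobs completed successfully."""
    return n_completed / n_submitted * 100.0
```

For instance, two one-hour jobs on four CPUs over a one-hour interval give 50% utilization.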
Experimental Settings
• A single job type in all experiments: the sequence analysis program BLAST
• A single BLAST job has:
  • an execution time of about an hour
  • about 10-33 kilobytes of input reads
  • about 0.7-1.5 megabytes of output
• Various configurations:
  • 1x1K: 1000 independent BLAST jobs
  • 4x1K: the 1x1K workload run in parallel from four hosts
• Each job can be re-planned at most four times
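The re-planning limit can be sketched as a loop over candidate sites. This is a hypothetical illustration: it assumes sites arrive already ranked (for example by a site selector) and that a failed or rejected start triggers a re-plan to the next site; only the limit of four comes from the slide. The returned count is what the Replan metric would accumulate.

```python
from typing import Callable, List, Optional, Tuple

MAX_REPLANS = 4  # from the experimental settings


def run_with_replanning(
    sites: List[str],
    try_run: Callable[[str], bool],
    max_replans: int = MAX_REPLANS,
) -> Tuple[Optional[str], int]:
    """Try sites in ranked order; each move to another site counts as one re-plan.

    Returns (site the job ran on, or None, and the number of re-plans used).
    """
    replans = 0
    for attempt, site in enumerate(sites):
        if attempt > max_replans:  # initial attempt plus at most max_replans retries
            break
        if attempt > 0:
            replans += 1
        if try_run(site):
            return site, replans
    return None, replans
```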
Experimental Environment
• All experiments ran on Grid3 (December 2004)
• Grid3 comprises around 30 sites across the U.S., of which we used 15
• Each site is autonomous and managed by its own local resource manager, such as Condor, PBS, or LSF
• Each site enforces different usage policies, which are collected by our site SLA observation point and used in scheduling workloads
Results: Least-Used Site Assignment Policy
4x1K: Completion vs. Time
Variance of Results
SiteSel Comparisons
Related Work
• Fair-share scheduling strategies developed for mainframes
• SHARP
• SPHINX
• CREMONA
Conclusions about GRUBER
• Our experiments with several task assignment policies gave a first picture of GRUBER's performance in scheduling jobs
• GRUBER is an architecture and toolkit for resource usage SLA specification and enforcement in a grid-like environment
• Open problems:
  • over-subscribed local resources, e.g., a local policy stating that 40% of the local CPU power is available to VO1 and 80% to VO2
  • hierarchical grouping and allocation of resources based on policy
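The over-subscription example (40% + 80% = 120%) is easy to detect; resolving it is the open problem. The sketch below measures the excess and, purely as one naive hypothetical remedy (not GRUBER's approach), scales all shares down proportionally so they fit in 100%.

```python
from typing import Dict


def oversubscription(shares_pct: Dict[str, float]) -> float:
    """Percentage points by which per-VO CPU shares at one site exceed 100%."""
    return max(0.0, sum(shares_pct.values()) - 100.0)


def scale_to_capacity(shares_pct: Dict[str, float]) -> Dict[str, float]:
    """Naive hypothetical remedy: shrink all shares proportionally to sum to 100%."""
    total = sum(shares_pct.values())
    if total <= 100.0:
        return dict(shares_pct)
    return {vo: share * 100.0 / total for vo, share in shares_pct.items()}
```

For the slide's example, the excess is 20 points, and proportional scaling would give VO1 about 33.3% and VO2 about 66.7%; whether that is a sensible policy is exactly what remains open.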
Addressed Questions
• "How are usage SLAs handled in grid environments?"
• "What is the gain from taking such usage SLAs into account?"
Thanks
Questions?