Service Level Agreements and QoS: what do we measure and why? Omer F. Rana

Service Level Agreements and QoS: what do we measure and why?Omer F. Rana o.f.rana@cs.cardiff.ac.uk School of Computer Science, Cardiff University, UK Welsh eScience Centre, UK

QoS Management • QoS has been explored in: • Computer Networks • Bandwidth, Delay, Packet loss rate and Jitter. • Multimedia Applications • Frame rate and computation resource. • Grid Computing • Network QoS, computation and storage requirements.

Continue … • QoS management: • Covers a range of different activities, from resource specification, selection and allocation through to resource release. • QoS system should address the following: • Specifying QoS requirements • Mapping of QoS requirements to resource capability • Negotiating QoS with resource owners • Establishing contracts / SLAs with clients • Reserving and allocating resources • Monitoring parameters associated with QoS sessions • Adapting to varying resource quality characteristics • Terminating QoS sessions • User Expectations vs. Resource Management

Interactive sessions Computation steering (control parameters & data exchange) Interactive visualization (visualization & simulations services) Response within a limited time span Co-scheduling or co-location support When QoS is needed? From SCIRun, University of Utah • Application QoS • User perception, response time, appl. Security, etc. • Middleware QoS • Comp., Memory and Storage • Network QoS • BW, Packet loss, Delay, Jitter

What is a Service Level Agreement (SLA)? A relationship between a client and provider in the context of a particular capability (service) provision Provider Client Can you do X for me for Y in return? SLA SLA SLA-Offer SLA-Accept SLA-Reject Yes P2P Search, Directory Service Distinguish between: Discovery of suitable provider Establishment of an SLA

What is an SLA? Provider Client Can you do X for me for Y in return? SLA SLA SLA-Offer SLA-CounterOffer No, but I can do Z for Y SLA-Accept SLA-Reject Accept

What is an SLA? Provider Client Can you do X for me for Y in return? SLA SLA SLA-Offer SLA-CounterOffer No Negotiation Phase (Single or Multi-Round) Dependency SLA-Offer Can you do Z for me for Y in return?

Providers Client SLA Variations Providers Client SLA SLA SLA dependencies For an SLA to be valid, another SLA has to be agreed (e.g. co-allocation) Multi-provider SLA Single SLA is divided across multiple providers (e.g. workflow composition)

QoS Context • QoS: • Per Service • Per Workflow (Set of Services) • Workflow: • Aggregating metrics across services • Support through some workflow enactor

What is an SLA? • Dynamically established and managed relationship between two parties • Objective is “delivery of a service” by one of the parties in the context of the agreement • Delivery involves: • Functional • QoS Properties www.utilityserve.com SLA

Forming the Agreement • Distinguish between: • Agreement itself • Mechanisms that lead to the formation of the agreement • Mechanisms that lead to agreement: • Negotiation (single or multi-shot) • One-shot creation • Policy-based creation of agreements, etc.

WS-Agreement Agreement Information about Agreement Initiator Responder Expiration Time Name/ID Context Information about Service Service Description Terms (generally, these are domain dependent) Terms Composition Service Terms Information about Service Level Service Level Objectives, Qualifying Conditions for the agreement to be valid, Penalty Terms, etc Guarantee Terms

WS-Agreement Terms From: Viktor Yarmolenko (U Manchester)

SLA Life Cycle • Identify Provider • On completion of a discovery phase • Define SLA • Define what is being requested • Agree on SLA terms • Agree on Service Level Objectives • Monitor SLA Violation • Confirm whether SLO’s are being violated • Destroy SLA • Expire SLA • Penalty for SLA Violation

CATNETs: Metrics “Pyramid”

SLA Classes • Guaranteed • constraints to be exactly observed • SLA is precisely/exactly defined • adaptation algorithm/optimization heuristics • Controlled-load • some constraints may be observed • Range-oriented SLA • optimization heuristics • Best-effort • any resources will do • no adaptation support

Adaptation Algorithm • Assume a total capacity: C= CG + CA + CB where G, A & B denotes ‘guaranteed,’ ‘adaptive’ and ‘best effort’ • Adapt(c(u,t), g(u)) Net capacity NG(t) = CG(t) – If NG(t) < 0;(guarantees can’t be honoured) Then ADD ( - CG(t)) from A to G ADD (CA(t) – [ - CG(t)])from A to B • Before invoking the adaptive function: • Ensuring that the request at time (t) g(u) --- (SLA) • Ensuring that CG CG(t): Capacity in guaranteed block at time t; g(u): guarantee to user user “u”

G A B G A B G A B G A B G A B SLA Adaptation Aim: compensation for QoS degradation for ‘guaranteed’ class only • Assume capacityTotal: C= CG + CA + CB • ‘best effort’ can uses the adaptive capacity, as long as its not used by the ‘guaranteed’ • When QoS degrades for ‘guaranteed’ • Then adaptive is utilized to compensate for the degradation • ‘best effort’ can still utilize the remaining capacity of the adaptive, as long as its not used by the ‘guaranteed’ • When the congested capacity is restored, the adaptive capacity can be used entirely by the ‘best effort’ • Before invoking the adaptive function: • Ensuring that the request at time (t) the agreed upon in the SLA • Ensuring that the total capacities within all SLAs at time (t) CG

SLA Adaptation • If we define a set: QoS = { a1, a2, ..., an },where each ai represents a different parameter of interestsuch as CPU, network bandwidth, etc. • Then we could compare sets: QoSx = { a1x, ..., anx } and QoSy = {a1y, ..., any } • QoS values specified in SLA as: ay ai ax Or ai={ x, y, z } • Assume the existence of a cost function: Cost(ai) = ci* ai • Then: Service_ Cost (QoS) = • The proposed heuristic: Total Cost = max n is the total number of active services The QoS broker implements this heuristic by varying the resource quality selection, based on the agreed upon in the SLA, aiming to Optimize resource utilization and maximize the overall cost. This heuristic could be mapped to the Generalized Assignment Problem (GAP)

Resources Policy Manager Allocation Manager Reservation Manager QoS Grid Service Grid QoS service interface Grid Node

Main components • Policy Manager • To provide dynamic info about the domain-specific resource characteristics and policy • Reservation Manger • To provide advance/immediate resource reservation • Data structure contains reservation entries • Interact with policy manager for resource char. • Allocation Manger • To interact with the underlying resource manager for resource allocation (e.g DSRT, Bandwidth Broker)

Grid node 1 Grid node 2 Grid node 3 Policy Policy Policy Allocation Allocation Allocation Reservation Reservation Reservation QoS service QoS service QoS service UDDIe QoS Broker QoS Discovery Client's Appl. SLA SLA SLA Joint work with Argonne National Lab. (Gregor von Lazweski et al.)

Reservation Approaches • Resource reservation / allocation based on two strategies: • Time-domain: reserve the whole ‘compute’ power of Grid node. • Guaranteed exclusive access • Resource-domain: reserve a CPU slot of the Grid node. • Shared access – guaranteed resource capacity • Suitable for light weight applications/services.

Best Effort

Guaranteed

Service Level Agreements and QoS: what do we measure and why? Omer F. Rana