Job Scheduling: Market-based and Proportional Sharing
Richard Dutton
CSci 780 – P2P and Grid Systems
November 22, 2004
Importance • Computing and storage resources are being shared among many users • Grids, utility computing, “cluster utilities” • Sharing can improve resource efficiency and flexibility and bring more computing power • Difficulty comes in controlling resource usage
Outline • Market-based framework • Motivation/Problem • Characteristics of approach • Interposed Proportional Sharing • Goals • Methodology • Experimentation/Results • Conclusions • Discussion
Motivation • Grids enable sharing of resources • Users have access to more resources • Resource usage must be arbitrated for competing demands • Both parties (resource provider and consumer) must benefit • Provider – effective global use of resources • Consumer – fairness, predictable behavior, control over relative priority of jobs
Motivation (2) • Why a market-based approach? • Large scale of resources and participants in the Grid • Varying supply and demand • A market-based approach is attractive because • It provides decentralized resource management • Selfish consumers lead to the global goal • Users get jobs done quickly; providers delegate resources efficiently • Laissez-faire
Ideas from batch systems • Batch systems incorporate: • User priority • Weighted proportional sharing • SLAs for setting bounds on available resources • Additionally, market-based approach will use relative urgency and cost of jobs
Value-based system • Value-based (sometimes called user-centric) scheduling allows users to define a value (yield or utility) to each job • System is trying to maximize total value of jobs completed rather than simply meeting deadlines or reaching a certain throughput • Users bid for resources and pay with the value (value currency) • System sells to highest “bidder” to maximize profits
Risk vs. Reward • Focus here is scheduling in grid service sites • Since price is derived from completion time, scheduler must take into consideration length of a task with its value and opportunity cost • What this means: scheduler must balance the risk of deferring a task with the reward of scheduling the task
Example: Market-Based Task Service • Tasks are batch computation jobs • Self-contained units of work • Execute anywhere • Consume known resources • Tasks give some value upon completion • Tasks associated with value function – gives value as function of completion time
Example: Market-Based Task Service • Characteristics of a market-based task service • Negotiation between customers and providers • Value determines price; quality of service is measured by completion time • Form contracts for task execution • Not meeting the terms of the contract implies a penalty • Consumers look for the best deal and each site attempts to maximize its profits
Market Framework • [Diagram] A customer sends Bid (value, service demand) to multiple task service sites • Each site replies with Accept (completion time, price) or Reject • The customer confirms one accepted offer with Accept (contract)
Goals • Develop policy choices for the task service sites to maximize profits • Acceptance – admission control • Scheduling • Use value metric to balance risk and reward in bidding and scheduling • Not concerned with currency supply, pricing systems, incentive mechanisms, payment…
Value functions • Negotiation between site and bidder establishes agreement on price and QoS • Value function maps service quality (completion time) to value • Want the formulation to be “simple, rich, and tractable” • Generalization of linear decay value functions from Millennium • Expresses how the value of a task degrades with time – decayi
Value function • [Figure] A task's value holds at its maximum through the task's runtime, then decays linearly with completion time, eventually crossing zero and becoming a penalty
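As a concrete sketch of the linear-decay value function pictured above (all names and numbers are illustrative assumptions, not the paper's notation):

```python
def linear_decay_value(max_value, runtime, decay, completion_time):
    """Linear-decay value function: full value if the task completes by its
    runtime, then value falls by `decay` per time unit and may go negative
    (a penalty). All names are illustrative, not the paper's notation."""
    lateness = max(0.0, completion_time - runtime)
    return max_value - decay * lateness

# A task worth 100 with runtime 10, losing 5 units of value per unit of delay:
print(linear_decay_value(100, 10, 5, 10))  # 100.0 (on time, full value)
print(linear_decay_value(100, 10, 5, 14))  # 80.0  (4 units late)
print(linear_decay_value(100, 10, 5, 40))  # -50.0 (so late it is a penalty)
```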
Decisions • Which tasks to admit? • When to run an admitted task? • Wish to maximize profit • How much should a task be charged? • Based on value functions • Must find highest priced tasks and reject those which do not meet some minimum levels
Experimental Methodology • Simulator that accepts bids and builds schedules for a task service economy with linear value functions • Uses synthetic traces representative of real batch workloads • Compare against FirstPrice from Millennium • Concerned with relative performance and sensitivity analysis of using value and decay
Risk/Reward Heuristics • Discounting Future Gains • Leans toward shorter tasks – less likely to be preempted • Realizes gains more quickly with short tasks – risk-averse scheduler • Opportunity Cost – takes into account the slope of decay • Leans toward more urgent tasks • If all tasks must be completed, it is best to complete most urgent tasks first
Discounting Future Gains • Based on Present Value from finance • PVi = yieldi / (1 + (discount_rate * RPTi)) • PVi can be thought of as investment value • Interest is earned at discount_rate for the Remaining Processing Time (RPT) • A high discount_rate makes the system more risk-averse • The heuristic, called PV, selects jobs in order of discounted gain PVi/RPTi
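The PV heuristic can be sketched as follows; the task list and discount rate are made-up examples:

```python
def present_value(yield_i, rpt_i, discount_rate):
    # PV_i = yield_i / (1 + discount_rate * RPT_i)
    return yield_i / (1.0 + discount_rate * rpt_i)

# Hypothetical tasks: (name, yield, remaining processing time)
tasks = [("long", 100.0, 20.0), ("short", 60.0, 5.0)]
discount_rate = 0.1

# The PV heuristic selects jobs in descending order of PV_i / RPT_i,
# which favors short tasks that realize their gains quickly.
ranked = sorted(tasks,
                key=lambda t: present_value(t[1], t[2], discount_rate) / t[2],
                reverse=True)
print([name for name, *_ in ranked])  # ['short', 'long']
```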
Opportunity Cost • Extended heuristic to account for losses from opportunity cost • Loss in revenue from choosing some task i before task j • The opportunity cost to start i is given by the aggregate loss over all other competing tasks • Bounded penalties require O(n²) time • Unbounded penalties computed in O(log n)
Balancing Gains and Opportunity Cost • It is risky to defer gains from a high-value task based solely on opportunity cost • Solution: FirstReward • rewardi = (α*PVi – (1-α)*costi) / RPTi • The α parameter controls how much the system considers expected gains • α = 1 and discount_rate = 0 reduce FirstReward to PV
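A minimal sketch of the FirstReward formula; cost_i stands in for whatever opportunity-cost estimate is used, and all names are assumptions:

```python
def present_value(yield_i, rpt_i, discount_rate):
    return yield_i / (1.0 + discount_rate * rpt_i)

def first_reward(yield_i, rpt_i, cost_i, alpha, discount_rate):
    # reward_i = (alpha * PV_i - (1 - alpha) * cost_i) / RPT_i
    pv = present_value(yield_i, rpt_i, discount_rate)
    return (alpha * pv - (1.0 - alpha) * cost_i) / rpt_i

# With alpha = 1 and discount_rate = 0, FirstReward reduces to yield_i / RPT_i,
# i.e. the PV heuristic with no discounting:
print(first_reward(100.0, 20.0, cost_i=7.0, alpha=1.0, discount_rate=0.0))  # 5.0
```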
Bounded Penalties • Results show it is more important to consider costs than gains (low α) • Most effective around α = 0.3
Unbounded Penalties • Shows it is ONLY important to consider costs, not gains • Magnitude of improvements much greater
Negotiation • Client submits task bids • Site accepts/rejects bid • If site accepts, it negotiates to set a price and completion time
Admission Control • Steps for proposed tasks • Integrate task into candidate schedule according to FirstReward • Determine yield for the task if accepted • Apply acceptance heuristic to determine acceptability • If accepting, issue a bid to the client • If client accepts the contract, place task into schedule to execute • Acceptance heuristic based on amount of additional delay the task can allow before its value falls below some yield threshold
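The acceptance heuristic in the last bullet can be sketched under a linear-decay assumption: how much extra delay keeps the task's yield above the threshold? All names here are hypothetical:

```python
def acceptable(task_value, decay, min_yield, current_delay):
    """Can the task absorb more delay before its yield drops below the
    threshold? Assumes a linear-decay value function; names are made up."""
    # yield(delay) = task_value - decay * delay; solve for the delay at
    # which yield == min_yield, then subtract the delay already incurred.
    slack = (task_value - min_yield) / decay - current_delay
    return slack > 0, slack

print(acceptable(task_value=100.0, decay=5.0, min_yield=40.0, current_delay=4.0))
# The task can tolerate 8 more time units of delay before rejection
```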
Summary of Market-based Services • Develops heuristics for scheduling and admission control in market-based grid task service • Value-based scheduling allows user to specify the value and urgency of the job • Maximizing user value in turn maximizes yield globally • Approach based on computational economy
Overview • This paper deals with share-based scheduling algorithms for differentiated service in network services, in particular storage service utilities • Allows a server to be shared among many request flows with some probabilistic assurance of receiving some minimum share of resources
Situation • Sharing of resources must be fair • SLAs often define contractual obligations between client and service
Goals of This Research • Performance isolation • A surge from one flow should not degrade the performance of another flow • Differentiated application service quality • Performance should be predictable and configurable • Non-intrusive resource control • Designed to work without changes to existing services, like commercial storage servers • Views server as a black box • Control server resources externally
Idea in Words • As the name suggests, the idea is to interpose a request scheduler between the client and server • The scheduler will intercept requests to the server • Depending on the request and state of previous requests, it will delay, reorder, or simply dispatch the request
Interposed Request Scheduling • Scheduler intercepts requests • Dispatches according to some policies seeking to fairly share resources among all flows • Parameter D limits maximum number of outstanding requests • Each flow has separate queue • Scheduler dispatches from each queue on FIFO basis
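A minimal sketch of an interposed scheduler with per-flow FIFO queues and a depth limit D; the simple round-robin dispatch here is a stand-in for the fair-sharing policies discussed later, and all names are assumptions:

```python
from collections import deque

class InterposedScheduler:
    """Sketch of an interposed request scheduler: per-flow FIFO queues,
    at most D requests outstanding at the server (illustrative only)."""

    def __init__(self, depth):
        self.depth = depth        # D: max outstanding requests at the server
        self.outstanding = 0
        self.queues = {}          # flow id -> FIFO queue of requests
        self._next = 0            # round-robin cursor over active flows

    def submit(self, flow, request):
        self.queues.setdefault(flow, deque()).append(request)

    def dispatch(self):
        """Pick the next request to forward, or None if at depth D or idle."""
        if self.outstanding >= self.depth:
            return None
        active = [f for f in self.queues if self.queues[f]]
        if not active:
            return None
        flow = active[self._next % len(active)]
        self._next += 1
        self.outstanding += 1
        return self.queues[flow].popleft()

    def complete(self):
        """Called when the server finishes a request."""
        self.outstanding -= 1

sched = InterposedScheduler(depth=2)
sched.submit("a", "a1"); sched.submit("b", "b1"); sched.submit("a", "a2")
print(sched.dispatch(), sched.dispatch(), sched.dispatch())  # a1 b1 None
```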
Related Approaches • Façade proposes an interposed request scheduler that uses Earliest Deadline First (EDF) • Drawback: unfair – cannot provide performance isolation • Uses priority scheduling to achieve isolation • SLEDS – per-client network adapter • Uses a leaky bucket filter to shape and throttle I/O flows • Not work-conserving
Proportional Sharing • Proposes 3 proportional sharing algorithms • SFQ(D) – Start-time Fair Queuing • FSFQ(D) – Four-tag Start-time Fair Queuing • RW(D) – Request Windows • All are general, configurable solutions that provide • Performance isolation • Fairness • Work-conservation
Fair Queuing • Each flow is assigned a weight Φf • Resources allocated among active flows in proportion to weight • A flow is active if it has at least 1 outstanding request • Fair: Proven property bounding difference in work done for any pair of active flows (lag) • Work-conserving: surplus resources consumed by active flows without penalty
Start-time Fair Queuing • Start-time Fair Queuing (SFQ) is the basis for the scheduling algorithms because of its fairness properties • SFQ assigns a tag to each request upon arrival and dispatches the requests in ascending order of tags • Fairness stems from the method of computing and assigning tags
SFQ • Assigns a start tag and finish tag for each request • Start tag: S(rfi) = max{ v(A(rfi)), F(rfi-1) } – the later of the virtual arrival time and the finish tag of the flow's previous request • Finish tag: F(rfi) = S(rfi) + cfi/Φf, where cfi is the request's cost and Φf the flow's weight • Defines a system notion of virtual time v(t) that advances as active flows progress • For example, v(t) advances quickly with less competition in order to use surplus resources
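The standard SFQ tag computations can be sketched as follows (variable names are assumptions):

```python
def sfq_tags(v_arrival, prev_finish, cost, weight):
    """Standard SFQ tag assignment (variable names assumed):
    start  = max(v(t) at arrival, finish tag of the flow's previous request)
    finish = start + cost / weight
    """
    start = max(v_arrival, prev_finish)
    return start, start + cost / weight

# A flow with weight 2 sends two back-to-back unit-cost requests while v(t) = 0:
s1, f1 = sfq_tags(0.0, 0.0, 1.0, 2.0)
s2, f2 = sfq_tags(0.0, f1, 1.0, 2.0)
print((s1, f1), (s2, f2))  # (0.0, 0.5) (0.5, 1.0)
```

Doubling a flow's weight halves the gap between its consecutive finish tags, which is how weighted shares emerge from dispatching in tag order.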
SFQ • Start tag of a flow’s most recent request acts as the flow’s virtual clock • Flow with small tag value is behind and will receive priority • Flow with large tag value are ahead and may be held back • However, newly active flows will have their tag values set by v(t) so that there is fair competition between all active flows • Drawback: traditional SFQ [specifically v(t) ] does not work well in the face of concurrency
Interposed Proportional Sharing • Goal: use a variant of SFQ for interposed scheduling which handles up to D requests concurrently • Ideal goal: the interposed scheduler can dispatch enough jobs concurrently to completely use resources • Scheduler wants to always have D concurrent outstanding requests • This value D represents a tradeoff between server resource utilization and scheduler fairness • Example: large D allows server to always have jobs waiting, but also increases the wait time for incoming requests
Min-SFQ • Adaptation of SFQ that defines v(t) as the minimum start tag of any outstanding request • Issue: • v(t) advances according to the slowest active flow f • A sudden burst from f will penalize aggressive flows that were using the surplus from f's idle resources • If v(t) lags behind, Min-SFQ degenerates into the Virtual Clock algorithm • If v(t) gets too far ahead, it degenerates into FIFO • Both are known to be unfair
SFQ(D) • Goal is to advance v(t) fairly • Solution: derive v(t) from active flows, not lagging flows • v(t) is defined as the start tag of the queued request with the lowest start tag at the last dispatch • Still uses the initial rules for determining the tags
SFQ(D) • Since the algorithm for dispatching is strictly SFQ, earlier properties still hold • Fairness • Bound on lag for different flows • Authors prove that SFQ(D) has fairness and lag bounds for requests completed • Client’s view of fairness
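An illustrative model of SFQ(D) dispatch, not the authors' implementation: requests are dispatched in ascending start-tag order, at most D at a time, and v(t) is set to the start tag of the request chosen at each dispatch:

```python
import heapq

class SFQD:
    """Illustrative SFQ(D) model: dispatch the queued request with the
    lowest start tag, at most D outstanding at once; v(t) is the start
    tag of the request chosen at the last dispatch."""

    def __init__(self, depth):
        self.depth = depth
        self.outstanding = 0
        self.vt = 0.0            # virtual time v(t)
        self.heap = []           # (start_tag, arrival_seq, flow)
        self.last_finish = {}    # flow -> finish tag of its previous request
        self.seq = 0

    def arrive(self, flow, cost, weight):
        # SFQ tag rules: start = max(v(t), prev finish); finish = start + cost/weight
        start = max(self.vt, self.last_finish.get(flow, 0.0))
        self.last_finish[flow] = start + cost / weight
        heapq.heappush(self.heap, (start, self.seq, flow))
        self.seq += 1

    def dispatch(self):
        if self.outstanding >= self.depth or not self.heap:
            return None
        start, _, flow = heapq.heappop(self.heap)
        self.vt = start          # advance v(t) at dispatch
        self.outstanding += 1
        return flow

# Flow "a" has weight 1, flow "b" has weight 2; each submits two unit-cost requests.
sched = SFQD(depth=4)
sched.arrive("a", 1.0, 1.0); sched.arrive("a", 1.0, 1.0)
sched.arrive("b", 1.0, 2.0); sched.arrive("b", 1.0, 2.0)
print([sched.dispatch() for _ in range(4)])  # ['a', 'b', 'b', 'a']
```

Note how flow "b", with twice the weight, gets both of its requests out before flow "a"'s second request: its finish tags accumulate half as fast.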
SFQ(D) • Problems • v(t) advances monotonically on request dispatch events, but may not advance on every dispatch • Therefore, bursts of requests may get the same start tag regardless of being behind or ahead • It is most fair for the scheduler to be biased in these situations against flows that have been using surplus resources • Realization: MinSFQ doesn’t suffer from this
FSFQ(D) • Refinement of SFQ(D) that favors slow flows over ones that are ahead • Four-tag Start-time Fair Queuing • Combines fairness policies of SFQ(D) and MinSFQ(D) • Adds two new “adjusted” tags derived from MinSFQ(D) • The new tags are used to break ties in favor of lagging flows
Problems • SFQ(D) and FSFQ(D) require a central point of interposition to intercept and schedule all requests • Made for network switch or router • Introduces single point of failure and complexity • Scheduling overhead grows at best logarithmically – limits scalability • Authors propose simple decentralized approach called Request Windows (RW)
Request Windows • Credit-based server access scheme • Interposed at the client • Each flow i is given a number of credits ni • Each request from i uses a portion of i's credit allocation • For a given flow i, its portion of the total window D is ni = D · Φi / Σj Φj
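A sketch of the per-flow credit allocation, assuming each flow's share of the window D is proportional to its weight (an illustrative reading of the slide, not the authors' exact formula):

```python
def request_window(weights, flow, depth):
    """Flow's credit allocation n_i: its share of the total window D,
    proportional to its weight (illustrative assumption)."""
    return depth * weights[flow] / sum(weights.values())

weights = {"a": 1.0, "b": 3.0}  # hypothetical flow weights
print(request_window(weights, "a", depth=8))  # 2.0 credits
print(request_window(weights, "b", depth=8))  # 6.0 credits
```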
Request Windows • Pros • Under light load, a flow will encounter little congestion and complete quickly • Similar to self-clocking nature of TCP • Cons • RW is not fully work-conserving • Yields tight fairness bound, but may limit concurrency and ability to use surplus resources • As with SFQ(D), able to prove bound on lag between active flows