Hidra: History Based Dynamic Resource Allocation For Server Clusters
Jayanth Gummaraju¹ and Yoshio Turner²
¹ Stanford University, CA, USA
² Hewlett-Packard Labs., Palo Alto, CA, USA
ITA05, Wrexham, UK, September 2005
Why Dynamic Resource Allocation • High demand variation for an Internet service • Daily: peak load ~10 times average load during day • Variation over longer time scales (days, weeks) • Benefits of Dynamic Resource Allocation • Reduce operating costs for a service • Energy • Software license fees • Support more services on a shared infrastructure • Shift resources between services on-demand • Practical: fast server re-purposing • Blade server management • Networked storage • Virtual machine cloning/migration
Problem • Determine resource requirements for a service on-the-fly • Challenges: • Frequent service updates • Frequent changes in client interest set → Static a priori capacity planning won’t work
Approach: Hidra Hidra: History-based Dynamic Resource Allocation • “Black-box approach”: continuously build and update a model of system behavior from externally visible performance attributes, without knowledge of internal operation (e.g., what is the bottleneck resource) • Model updates: introduce freshness and confidence • Extrapolation: determine resource requirements with only a partial model
Scope • Large services requiring multiple servers • Multi-tier: each tier = a cluster of servers. Assumptions: • Identical servers within a tier • Servers in different tiers can be different • Allocation granularity = Server (ex: blade in a blade server) • Predictable client request rate • Reasonable if smoothly varying, or occasional discontinuities • Service and server behavior can change over time • Goal: Find minimum cost resource allocation that meets server response time requirement • Cost = sum of cost of servers allocated to each tier • Mean response time (may be generalized)
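The goal and cost model on this slide can be written compactly as a small optimization problem. The formalization below is my own restatement of the slide, and the symbols (c_i for per-tier server cost, N_i for servers allocated to tier i, T̄_i for the tier's mean response time) are not taken from the paper:

```latex
% Hypothetical formalization of the slide's goal (symbols are assumptions):
% choose per-tier server counts N_i to minimize total cost while keeping
% the end-to-end mean response time within a target T_max.
\begin{align*}
  \min_{N_1,\dots,N_T} \quad & \sum_{i=1}^{T} c_i \, N_i
      && \text{(total cost: servers per tier $\times$ per-tier server cost)} \\
  \text{s.t.} \quad & \sum_{i=1}^{T} \bar{T}_i(N_i) \;\le\; T_{\max}
      && \text{(summed mean response times meet the requirement)}
\end{align*}
```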
Outline • Single-tier history-based resource allocation • Constructing and updating history-based model (freshness and confidence) • Using the model to determine resource allocation (extrapolation) • Multi-tier history-based resource allocation • Summary
Single-Tier History-Based Model • Model represents the average behavior of a server in a tier • Consists of a collection of measured operating points (history) for the tier • Each history point: at least (request rate per server, mean response time) • Model provides an estimate of function F (): response time = F (request rate) (increasing function in range of interest) (per-server request rate)
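To make the history-based model concrete, here is a minimal sketch of how the per-tier history could be stored. The class and field names (HistoryPoint, HistoryModel, bucket_width, and so on) are illustrative choices, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class HistoryPoint:
    """One measured operating point for an average server in the tier."""
    rate: float         # per-server request rate (req/s)
    resp_time: float    # mean response time measured at that rate (s)
    confidence: float   # grows with repeated consistent observations
    last_update: float  # timestamp of the most recent update (for freshness)

class HistoryModel:
    """Collection of history points, bucketed by per-server request rate.
    Together the points approximate F: request rate -> mean response time."""
    def __init__(self, bucket_width: float):
        self.bucket_width = bucket_width
        self.points: dict[int, HistoryPoint] = {}

    def bucket(self, rate: float) -> int:
        return int(rate // self.bucket_width)
```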
Using the History-Based Model • Goal: find the fewest servers needed to meet a requirement on maximum mean response time • Extrapolate the model to find λ, the largest feasible average request rate per server • Given R = tier’s applied load (requests per second): resource allocation N = R/λ servers [Figure: response time vs. per-server request rate, with the response time threshold marking λ]
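The allocation rule on this slide reduces to one line of arithmetic; rounding up to a whole number of servers is my assumption, since the slide writes it simply as N = R/λ:

```python
import math

def servers_needed(total_rate_R: float, feasible_rate_lambda: float) -> int:
    """N = ceil(R / lambda): the fewest servers whose per-server load
    stays at or below the largest feasible rate lambda."""
    return max(1, math.ceil(total_rate_R / feasible_rate_lambda))

# Example (made-up numbers): R = 2500 req/s, lambda = 180 req/s -> 14 servers
print(servers_needed(2500, 180))
```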
Updating the Model • Response time function can change over time: • Service content or implementation • Client interest set • Number of allocated servers (request distribution, and non-linear performance scaling) • Nevertheless, history-based model is useful • Gradual changes → recent history is a good approximation • Occasional large changes → recent history is relevant except in immediate moments after a large change • Periodically update model based on current performance measurements • Balance responsiveness and accuracy: Incorporate new measurements quickly to model current behavior, but not so aggressively that transient glitches pollute the model
History Update: Freshness and Confidence • History point update as weighted average of stored value and new measurement: new stored value = α * old stored value + (1 – α) * new measurement • Older history is less likely to represent current behavior • Recent history can be obsolete after a sudden shift in behavior • Weighting factor α combines: • Freshness: value which decreases with time since last update • Confidence: value which increases with repeated confirmation of consistent behavior for the history point • Combination: EWMA (captures freshness) with a decay rate that slows with increasing confidence
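One way to read the weighted-average rule on this slide is sketched below. The exact way freshness and confidence combine into α, and all the constants, are assumptions; the slide only states that α should weaken as the point goes stale and strengthen with repeated confirmation:

```python
import time

def update_point(point, new_measurement: float, now: float | None = None,
                 freshness_halflife: float = 300.0,   # seconds (assumed)
                 conf_gain: float = 0.1, conf_max: float = 0.9):
    """Blend a new measurement into a stored history point.
    new stored value = alpha * old stored value + (1 - alpha) * new measurement,
    where alpha combines freshness (decays with time since the last update)
    and confidence (rises with repeated consistent observations)."""
    now = time.time() if now is None else now
    age = now - point.last_update
    freshness = 0.5 ** (age / freshness_halflife)   # 1.0 when fresh, -> 0 when stale
    alpha = point.confidence * freshness            # weight given to the old value
    point.resp_time = alpha * point.resp_time + (1 - alpha) * new_measurement
    point.confidence = min(conf_max, point.confidence + conf_gain)
    point.last_update = now
```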
Extrapolation: Determining Resource Allocation • Model has an incomplete view of the response time function • To find the optimal λ, Hidra extrapolates/interpolates a unique pair of history points • Only use points that match the general shape of a typical response time curve (positive slope) • Favor points with a high α value (ignore if α is very small) • If only one point exists (the current operating point), adjust the allocation differently • Limits on consecutive changes in resource allocation (fixed limit for decreases, growing limits for increases) [Figure: response time vs. applied load, showing history points X, Y, Z and the response time threshold]
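A minimal sketch of the extrapolation step, under the assumption that Hidra linearly extends a pair of trusted points (positive slope, sufficient confidence) up to the response time threshold; the exact pair-selection rule in the paper may differ:

```python
def feasible_rate(points, threshold: float, min_conf: float = 0.2):
    """Estimate lambda, the largest per-server request rate whose predicted
    mean response time stays at or below the threshold, by linear
    interpolation/extrapolation over a pair of trusted history points."""
    usable = sorted((p for p in points if p.confidence >= min_conf),
                    key=lambda p: p.rate)
    best = None
    for a, b in zip(usable, usable[1:]):
        if b.rate <= a.rate or b.resp_time <= a.resp_time:
            continue  # skip pairs that contradict the expected positive slope
        slope = (b.resp_time - a.resp_time) / (b.rate - a.rate)
        candidate = a.rate + (threshold - a.resp_time) / slope
        best = candidate if best is None else max(best, candidate)
    # None means fewer than two usable points: fall back to the slide's
    # rule of adjusting the current allocation incrementally instead.
    return best
```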
Single-Tier Evaluation: Overview • Approach: Apply Hidra to allocate resources for a simulated cluster • Simulation allows easy control of cluster behavior and determination of the optimal allocation • Each server modeled as a simple M/M/1 queue with time-varying arrival rate λ and service rate μ • Provides a response time function that varies over time • More complex models not needed for our purposes • Effectiveness of freshness and confidence • Effectiveness for clusters with non-linear cluster performance scaling
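The simulated servers follow the standard M/M/1 mean response time formula, W = 1/(μ − λ); a tiny helper makes the dependence on the time-varying rates explicit:

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: W = 1 / (mu - lambda),
    valid only while the server is not saturated (lambda < mu)."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

# Example: mu = 50 req/s, lambda = 40 req/s -> 0.1 s mean response time
print(mm1_response_time(40, 50))
```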
Effectiveness of Freshness • Increase μ steadily over time from 40 to 70 req/s • No freshness (red) uses obsolete information • Freshness (green) close to optimal (blue) allocation
Effectiveness of Confidence • Set μ constant over time except for periodic transients • Using Confidence, Hidra is less susceptible to short-term transients by preserving more commonly observed values [Figure: two panels comparing allocations with Freshness only (no Confidence) vs. Freshness and Confidence]
Non-Linear Cluster Scaling • Response time function may be sensitive to the resource allocation. Examples: • Caching effect: memory in each additional server adds to total effective content cache capacity if shared effectively → throughput scales faster than N • Communication effect: overhead of coordination between servers → throughput scales slower than N • Evaluate using request rates from hp.com logs for a 24-hour period • Caching: assume hit ratio increases linearly with N, causing an increase of service rate μ • Communication: increase service time (1/μ) linearly with N
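The two scaling effects can be expressed as simple functions of the allocation N. The linear forms follow the slide, but the constants and function names below are made up for illustration:

```python
def service_rate_caching(n_servers: int, mu_base: float = 40.0,
                         cache_gain: float = 2.0) -> float:
    """Caching effect: each added server enlarges the effective shared
    cache, so the per-server service rate mu grows linearly with N."""
    return mu_base + cache_gain * (n_servers - 1)

def service_rate_communication(n_servers: int, mu_base: float = 40.0,
                               coord_overhead: float = 0.001) -> float:
    """Communication effect: coordination overhead grows with N, so the
    per-server service *time* (1/mu) increases linearly with N."""
    return 1.0 / (1.0 / mu_base + coord_overhead * (n_servers - 1))
```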
Caching Effect Results • Wide variation in the average behavior of a server • Each server is more effective as the allocation is increased • Hidra adapts, achieving close to optimal allocation [Figure: panels showing Response Time, Resource Allocation, and Service Rate μ over the 24-hour trace]
Communication Effect Results • Opposite service rate behavior compared to caching • Each server is less effective as the allocation is increased • Hidra handles this case also [Figure: panels showing Response Time, Resource Allocation, and Service Rate μ over the 24-hour trace]
Multi-Tier Resource Allocation • Multi-Tier characteristics • A request to the first tier could trigger multiple secondary requests to other tiers • Average response time is the sum of the average response times of each tier • Resource cost can differ across tiers • Multi-Tier resource allocation as an extension of the single-tier case • Response time for each tier computed using the single-tier algorithm • Dynamically vary target response times for each tier to minimize the total cost of the resource allocation • Same client request rate used for all tiers
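One way to picture the multi-tier extension is as splitting the end-to-end response time budget across tiers and reusing the single-tier allocator within each tier. The brute-force search below only illustrates that idea; Hidra adjusts the per-tier targets dynamically rather than enumerating candidates:

```python
import itertools

def cheapest_budget_split(total_target: float, tiers, candidate_targets):
    """tiers: list of (cost_per_server, servers_needed_fn) pairs, where
    servers_needed_fn maps a per-tier response time target to a server
    count (e.g. via the single-tier model).  Returns the lowest-cost
    split of the end-to-end response time budget across tiers."""
    best_cost, best_split = None, None
    for split in itertools.product(candidate_targets, repeat=len(tiers)):
        if sum(split) > total_target:
            continue  # per-tier budgets must sum to at most the overall target
        cost = sum(c * fn(t) for (c, fn), t in zip(tiers, split))
        if best_cost is None or cost < best_cost:
            best_cost, best_split = cost, split
    return best_cost, best_split
```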
Two-Tier Results • Same effect in both tiers → results similar to the single-tier case, which are optimal • Different effects in each tier → optimal allocation has cost intermediate between the two extremes • Hidra adapts successfully to all these cases [Figure: total cost of allocated servers over time for the configurations Caching (both tiers), Communication (both tiers), and Caching (Tier 1) with Communication (Tier 2)]
Summary • Presented Hidra for history-based resource allocation of server clusters • Proposed use of freshness and confidence to update history-based model effectively • Developed extrapolation approach for finding operating point with incomplete model • Extended the model to multi-tier systems • Simulation-based results show scheme is promising for both single-tier and multi-tier systems