Power-aware Resource Allocation for Cpu- and Memory Intense Internet Services. Vlasia Anagnostopoulou (vlasia@cs.ucsb.edu), Susmit Biswas, Heba Saadeldeen, Ricardo Bianchini, Tao Yang, Diana Franklin, Frederic T. Chong. University of California, Santa Barbara
Power-aware Resource Allocation for Cpu- and Memory Intense Internet Services • Vlasia Anagnostopoulou (vlasia@cs.ucsb.edu), Susmit Biswas, Heba Saadeldeen, Ricardo Bianchini, Tao Yang, Diana Franklin, Frederic T. Chong • University of California, Santa Barbara • First E2DC Workshop, 08/05/2012
Cpu- and Memory Intense Internet-Services • Latency-bound • Intense computation (=> high cpu utilization) • Petascale data (MapReduce, Hadoop, …)
Challenges • Standard middleware algorithms are inefficient for cpu- and memory-intense internet services • Resource allocation operates at a fine granularity • But is oblivious to the SLA • Power management is SLA-aware • But is driven only by the CPU • Coarse-grained • Request distribution does not operate at a resource granularity
Overview of solution • Standard Middleware: Resource Allocation, Power Management, Request Distribution • Optimized Middleware: Power-aware Resource Allocation for cpu and memory, Adjusted Request Distribution • SLA-aware and fine-grained • Two steps: • Configure states of servers (basic power-aware resource allocation) • Allocate resources to servers (cpu and memory)
Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Experiments • Conclusion
Basic Power-aware Resource Allocation • Configure server states: • Active, off, low-power state • Problem of memory being inaccessible • Internet-services have high memory demand (for caching) • Solution: use a memory-active, low-power state (barely-alive) • Memory is on • Server is not operational, but memory can be remotely accessed • Memory contributes to global cache
Basic Power-aware Resource Allocation • Calculations: • Active servers to service load • N_cpu_act = Load_demand / Cpu_capacity • Memory-active servers to satisfy memory demand • Active or barely-alive • N_mem_act = Memory_demand / Mem_capacity • Configure to maximize energy savings, or to maximize memory allocation
Example • N=5 servers • Cpu-capacity = 1,000 conn. • Mem-capacity = 1GB • Load = 3,000 conn. • Target mem-alloc = 4GB • Maximize energy-savings: • Maximize memory alloc.: • Mem. usage: 0.8GB/server • How to control the memory allocation?
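The two calculations and the two configuration objectives can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `configure`, the `Config` type, the rounding up of the server counts, and the reading of "maximize memory allocation" as "keep every server memory-active" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Config:
    active: int               # servers that service load (memory is on)
    barely_alive: int         # memory on and remotely accessible, server not operational
    off: int                  # fully powered down
    mem_per_server_gb: float  # uniform share of the memory demand per memory-active server


def configure(n_servers, cpu_capacity, mem_capacity_gb, load, mem_demand_gb, objective):
    """Hypothetical helper: choose server states for one cluster.

    Assumption: server counts are rounded up and capped at the cluster size.
    """
    n_cpu_act = min(n_servers, math.ceil(load / cpu_capacity))
    n_mem_act = min(n_servers, math.ceil(mem_demand_gb / mem_capacity_gb))

    if objective == "energy":
        # Keep only as many memories on as the demand requires.
        memory_active = max(n_cpu_act, n_mem_act)
    elif objective == "memory":
        # Assumed reading: keep every memory on, none of the servers fully off.
        memory_active = n_servers
    else:
        raise ValueError("objective must be 'energy' or 'memory'")

    barely_alive = memory_active - n_cpu_act
    off = n_servers - memory_active
    share = mem_demand_gb / memory_active if memory_active else 0.0
    return Config(n_cpu_act, barely_alive, off, share)


# Numbers from the example slide: 5 servers, 1,000 conn. and 1GB each,
# 3,000 conn. of load, 4GB target memory allocation.
energy = configure(5, 1000, 1.0, 3000, 4.0, "energy")
memory = configure(5, 1000, 1.0, 3000, 4.0, "memory")
```

Under these assumptions the "memory" objective spreads the 4GB target over five memory-active servers, which matches the 0.8GB/server usage quoted on the slide.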
Memory Allocation for SLA • Two objectives: • 1) Allocate memory for SLA • 2) Share memory among services with SLA guarantees • Must be fair; accept priority • Guarantee minimum performance • Characteristics: • Uniform allocation per server (to avoid imbalance) • Memory performance monitoring capability which is SLA-aware
Memory allocation for SLA • Utilize stack algorithm [Mattson] • Measures contribution of memory size to the hit-rate • Hit-rate is used as proxy of performance • Server-level: Calculate alloc for target-hit-rate • Attach SLA mapping • Cluster-level: calculate avg size for target hit-rate • How to allocate memory when constrained?
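The stack algorithm can be illustrated with a small LRU stack-distance computation. This is a simplified sketch under stated assumptions: the cache is LRU, sizes are counted in cached items rather than bytes, and the function names (`stack_distances`, `hit_rate_curve`, `min_size_for_hit_rate`) are hypothetical. One pass over a trace gives the hit rate for every cache size at once, from which the smallest allocation reaching a target hit rate can be read off.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = []   # least recently used first, most recently used last
    dists = []
    for key in trace:
        if key in stack:
            idx = stack.index(key)
            dists.append(len(stack) - idx)  # 1 means "most recently used"
            stack.pop(idx)
        else:
            dists.append(None)              # cold miss at any cache size
        stack.append(key)
    return dists


def hit_rate_curve(trace, max_size):
    """Hit rate of an LRU cache for every size from 1 to max_size items."""
    dists = stack_distances(trace)
    curve = []
    for size in range(1, max_size + 1):
        hits = sum(1 for d in dists if d is not None and d <= size)
        curve.append(hits / len(dists))
    return curve


def min_size_for_hit_rate(trace, target, max_size):
    """Smallest cache size (in items) reaching the target hit rate, or None."""
    for size, rate in enumerate(hit_rate_curve(trace, max_size), start=1):
        if rate >= target:
            return size
    return None


trace = ["a", "b", "a", "c", "b", "a"]
size_needed = min_size_for_hit_rate(trace, 0.5, max_size=4)
```

An access hits in a cache of a given size exactly when its stack distance is at most that size, which is why a single pass is enough to evaluate all sizes.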
SLA/Memory Sharing • Aggregate metric of performance • Sum of allocations which yield performance closest to SLA • Linear optimization problem to maximize aggregate performance: • At each step, allocate memory so as to minimize the aggregate distance to the SLA • Subject to the memory capacity constraint • Guarantee min SLA for each app • [Figure: {app1, app2} => Target SLA {#2, #2}; dist_to_SLA_alloc for each app goes from ∞ to 1 to 0]
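The sharing objective can be made concrete with a tiny example. The sketch below is not the linear optimization from the slide: it swaps in a brute-force search over SLA levels, which is only practical for a handful of applications. The function name `share_memory`, the per-level memory tables, and the definition of distance as "achieved SLA level minus target level" are all assumptions for illustration; an app that cannot reach even its weakest listed level (distance ∞) makes the whole allocation infeasible.

```python
from itertools import product


def share_memory(mem_needed, targets, capacity):
    """Hypothetical helper: pick one SLA level per app.

    mem_needed[app][level] is the memory needed to reach that SLA level
    (level 1 is the strictest). Returns (aggregate distance, {app: level}),
    or None when the minimum SLA cannot be guaranteed for every app.
    """
    apps = sorted(mem_needed)
    # Each app may land on its target level or on any weaker one.
    choices = [
        [lvl for lvl in sorted(mem_needed[app]) if lvl >= targets[app]]
        for app in apps
    ]
    best = None
    for levels in product(*choices):
        total = sum(mem_needed[app][lvl] for app, lvl in zip(apps, levels))
        if total > capacity:
            continue  # violates the memory capacity constraint
        dist = sum(lvl - targets[app] for app, lvl in zip(apps, levels))
        if best is None or dist < best[0]:
            best = (dist, dict(zip(apps, levels)))
    return best


# Illustrative numbers only (GB needed per SLA level), not measured data.
mem_needed = {
    "app1": {1: 6, 2: 4, 3: 2},
    "app2": {1: 5, 2: 3, 3: 1},
}
targets = {"app1": 2, "app2": 2}

enough = share_memory(mem_needed, targets, capacity=7)
constrained = share_memory(mem_needed, targets, capacity=5)
infeasible = share_memory(mem_needed, targets, capacity=2)
```

With 7GB both apps reach their target (aggregate distance 0); with 5GB one app has to drop to a weaker level (distance 1); with 2GB the minimum SLA cannot be guaranteed for both.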
Request Distribution Processing…
Adjusted Request Distribution Processing…
Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Simulator • Traces • Experiments • Conclusion
Methodology • Datacenter-cluster simulator: • 1 rack • trace-based functional simulator • Simulate all standard and proposed middleware algorithms • Traces: • Internet-search “snippet” generator
Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Simulator • Traces • Experiments • Basic Algorithm • Shared Cluster • Conclusion
Experiments – Basic Algorithm • Evaluate various configuration objectives: • Barely-alive: maximize memory allocation; Mixed: maximize energy savings • Fix SLA, evaluate energy savings only. Also, evaluate residual memory. • SLA #1, #2, #3: Response time degradation 1-2%, 2-3%, 3-4% • Aggressiveness of consolidation: 50, 70, 85%
Results – basic algorithm • Mixed system has highest energy savings; up to 42% (24% over On/Off) • BA: up to 34% (20% over On/Off)
Results – basic algorithm • Mixed system is most stable • In barely-alive system savings depend on the SLA level; can push the parameter for savings aggressiveness • On/off system savings are influenced by both parameters. Degrade significantly at high SLA levels
Results – basic algorithm • BA: up to 7.5GB of extra memory, which can be allocated to another application, transitioned to a low-power state, etc.
Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Simulator • Traces • Experiments • Basic Algorithm • Shared Cluster • Conclusion
Conclusion • Combine power management and resource allocation => power-aware resource allocation • SLA-driven, fine grained management of datacenter clusters • Performance guarantees + energy savings • Flexibility to different optimizations for datacenter scenarios • Achieve deep energy savings or potential for more memory utility out of cluster • Holistic design of middleware software
Thank you for your attention!!! Questions? Contact: vlasia@cs.ucsb.edu URL: www.cs.ucsb.edu/~vlasia