Power-aware Resource Allocation for CPU- and Memory-Intense Internet Services

Power-aware Resource Allocation for CPU- and Memory-Intense Internet Services Vlasia Anagnostopoulou (vlasia@cs.ucsb.edu), Susmit Biswas, Heba Saadeldeen, Ricardo Bianchini, Tao Yang, Diana Franklin, Frederic T. Chong University of California, Santa Barbara First E2DC Workshop 08/05/2012

CPU- and Memory-Intense Internet Services • Latency-bound • Intense computation (=> high CPU utilization) • Petascale data • MapReduce, Hadoop, …

Datacenter clusters

Datacenter cluster operation

Challenges • Standard middleware algorithms are inefficient for CPU- and memory-intense Internet services • Resource allocation operates at a fine granularity, but is oblivious to the SLA • Power management is SLA-aware, but is coarse-grained and driven only by the CPU • Request distribution does not operate at a resource granularity

Overview of solution • Standard middleware: Resource Allocation, Power Management, Request Distribution • Optimized middleware: Power-aware Resource Allocation for CPU and memory, Adjusted Request Distribution • SLA-aware and fine-grained • Two steps: • Configure states of servers (basic power-aware resource allocation) • Allocate resources to servers (CPU and memory)

Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Experiments • Conclusion

Basic Power-aware Resource Allocation • Configure server states: • Active, off, low-power state • Problem: memory becomes inaccessible when a server is off or in a low-power state • Internet services have high memory demand (for caching) • Solution: use a memory-active, low-power state (barely-alive) • Memory is on • Server is not operational, but memory can be remotely accessed • Memory contributes to global cache

Details of Barely-alive state

Basic Power-aware Resource Allocation • Calculations: • Active servers to service the load: N_cpu_act = Load_demand / Cpu_capacity • Memory-active servers (active or barely-alive) to satisfy the memory demand: N_mem_act = Memory_demand / Mem_capacity • Configure to maximize energy savings, or to maximize memory allocation

Example • N = 5 servers • CPU capacity = 1,000 connections per server • Memory capacity = 1 GB per server • Load = 3,000 connections • Target memory allocation = 4 GB • Maximize energy savings: • Maximize memory allocation: • Memory usage: 0.8 GB/server • How to control the memory allocation?
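The calculation on the two slides above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `configure_cluster`, the rounding up of fractional server counts, and the uniform spreading of the target memory over all memory-active servers are assumptions made for this sketch.

```python
import math


def configure_cluster(n_servers, cpu_capacity, mem_capacity_gb,
                      load, mem_demand_gb, objective):
    """Hypothetical helper: pick server states for one configuration objective.

    objective is "energy" (maximize energy savings) or "memory"
    (maximize memory allocation).
    """
    # Active servers needed to service the load (rounded up: an assumption).
    n_cpu_act = math.ceil(load / cpu_capacity)
    # Memory-active servers (active or barely-alive) needed for the memory demand.
    n_mem_act = math.ceil(mem_demand_gb / mem_capacity_gb)
    if n_cpu_act > n_servers:
        raise ValueError("load exceeds cluster CPU capacity")

    active = n_cpu_act
    if objective == "energy":
        # Keep only as many barely-alive servers as the memory demand requires.
        barely_alive = max(0, min(n_mem_act, n_servers) - active)
    elif objective == "memory":
        # Keep every remaining server memory-active.
        barely_alive = n_servers - active
    else:
        raise ValueError("objective must be 'energy' or 'memory'")

    off = n_servers - active - barely_alive
    mem_active = active + barely_alive
    # Uniform per-server share of the target allocation, capped by capacity.
    mem_per_server_gb = min(mem_capacity_gb, mem_demand_gb / mem_active)
    return {"active": active, "barely_alive": barely_alive, "off": off,
            "mem_per_server_gb": mem_per_server_gb}


# Numbers from the example slide: 5 servers, 1,000 conn. and 1 GB per server,
# a load of 3,000 conn., and a 4 GB target memory allocation.
energy = configure_cluster(5, 1000, 1.0, 3000, 4.0, "energy")
memory = configure_cluster(5, 1000, 1.0, 3000, 4.0, "memory")
print(energy)
print(memory)
```

Under these assumptions, the energy-saving configuration keeps three servers active, one barely-alive, and one off, while the memory-maximizing configuration keeps three active and two barely-alive, so the 4 GB target spreads to 0.8 GB per server.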

Memory Allocation for SLA • Two objectives: • 1) Allocate memory for SLA • 2) Share memory among services with SLA guarantees • Must be fair; accept priority • Guarantee minimum performance • Characteristics: • Uniform allocation per server (to avoid imbalance) • Memory performance monitoring capability which is SLA-aware

Memory allocation for SLA • Use the stack algorithm [Mattson] • Measures the contribution of memory size to the hit rate • Hit rate is used as a proxy for performance • Server level: calculate the allocation for a target hit rate • Attach SLA mapping • Cluster level: calculate the average size for the target hit rate • How to allocate memory when constrained?
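A minimal sketch of the stack-distance idea for an LRU cache is given below. It shows how one pass over a trace yields the hit rate for every cache size at once, and how the smallest size meeting a target hit rate can then be read off. The function names, the measurement of memory in cache entries, and the toy trace are assumptions for illustration; the slides do not give the actual implementation or the SLA mapping.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = []          # most recently used key at index 0
    distances = []
    for key in trace:
        if key in stack:
            depth = stack.index(key) + 1   # 1-based position in the LRU stack
            stack.remove(key)
            distances.append(depth)
        else:
            distances.append(None)         # cold miss at any cache size
        stack.insert(0, key)
    return distances


def hit_rate_curve(trace, max_size):
    """curve[i] is the hit rate of an LRU cache holding i + 1 entries."""
    if not trace:
        return [0.0] * max_size
    hits_at_depth = [0] * (max_size + 1)
    for depth in stack_distances(trace):
        if depth is not None and depth <= max_size:
            hits_at_depth[depth] += 1
    curve, hits = [], 0
    for size in range(1, max_size + 1):
        # An access with stack distance d hits in every cache of size >= d.
        hits += hits_at_depth[size]
        curve.append(hits / len(trace))
    return curve


def min_size_for_hit_rate(curve, target):
    """Smallest cache size (in entries) reaching the target, or None."""
    for i, rate in enumerate(curve):
        if rate >= target:
            return i + 1
    return None


trace = ["a", "b", "a", "c", "b", "a"]     # toy trace, not from the slides
curve = hit_rate_curve(trace, max_size=3)
print(curve, min_size_for_hit_rate(curve, 0.5))
```

A cluster-level estimate in the spirit of the slide could average the per-server sizes returned by `min_size_for_hit_rate`, though how the authors aggregate them is not stated here.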

SLA/Memory Sharing • Aggregate metric of performance • sum of allocations which yield performance closest to SLA • Linear optimization problem to maximize aggregate performance: • at each step, allocate memory so as to minimize the aggregate distance to the SLA • subject to the memory capacity constraint • guarantee a minimum SLA for each app • [Figure: {app1, app2} => target SLA {#2, #2}; for each app, dist_to_SLA_alloc takes the values ∞, 1, 0]
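The step-by-step allocation described on this slide can be illustrated with a simple greedy heuristic. Note that this is a swapped-in sketch, not the linear optimization the slide refers to: the function names, the use of hit rate as the SLA metric, the per-unit hit-rate tables, and the example numbers are all assumptions. A one-unit-at-a-time greedy also only behaves well when each curve has diminishing returns.

```python
def distance_to_sla(curve, alloc, target):
    """Shortfall of an app's hit rate from its target (0 once the target is met).

    curve[k] is the (hypothetical) hit rate the app achieves with k memory units.
    """
    return max(0.0, target - curve[alloc])


def share_memory(curves, targets, minimums, capacity):
    """Greedy sketch: guarantee minimums, then reduce aggregate distance to SLA."""
    alloc = {}
    # Step 1: give each app the smallest allocation meeting its minimum SLA.
    for app, curve in curves.items():
        need = next((k for k, rate in enumerate(curve)
                     if rate >= minimums[app]), None)
        if need is None:
            raise ValueError(f"minimum SLA unreachable for {app}")
        alloc[app] = need
    if sum(alloc.values()) > capacity:
        raise ValueError("capacity too small to guarantee every minimum SLA")

    # Step 2: hand out remaining units where they shrink the distance most.
    remaining = capacity - sum(alloc.values())
    while remaining > 0:
        best_app, best_gain = None, 0.0
        for app, curve in curves.items():
            k = alloc[app]
            if k + 1 >= len(curve):
                continue                     # no data beyond this allocation
            gain = (distance_to_sla(curve, k, targets[app])
                    - distance_to_sla(curve, k + 1, targets[app]))
            if gain > best_gain:
                best_app, best_gain = app, gain
        if best_app is None:
            break                            # every app already meets its SLA
        alloc[best_app] += 1
        remaining -= 1
    return alloc


# Hypothetical hit-rate tables for two apps sharing the cluster memory.
curves = {"app1": [0.0, 0.5, 0.6, 0.8, 0.85],
          "app2": [0.0, 0.3, 0.6, 0.8, 0.9]}
targets = {"app1": 0.8, "app2": 0.8}
minimums = {"app1": 0.5, "app2": 0.5}
print(share_memory(curves, targets, minimums, capacity=5))
print(share_memory(curves, targets, minimums, capacity=8))
```

With a capacity of 8 units the loop stops once both apps reach their targets, leaving the remaining units unallocated, which corresponds to the residual memory evaluated later in the experiments.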

Request Distribution Processing…

Adjusted Request Distribution Processing…

Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Simulator • Traces • Experiments • Conclusion

Methodology • Datacenter-cluster simulator: • 1 rack • trace-based functional simulator • Simulate all standard and proposed middleware algorithms • Traces: • Internet-search “snippet” generator

Contents • Introduction • Power-aware Resource Allocation • Basic • With Support for Multiple Applications • Adjusted Request Distribution • Methodology • Simulator • Traces • Experiments • Basic Algorithm • Shared Cluster • Conclusion

Experiments – Basic Algorithm • Evaluate various configuration objectives: • Barely-alive: maximize memory allocation; Mixed: maximize energy savings • Fix SLA, evaluate energy savings only. Also, evaluate residual memory. • SLA #1, #2, #3: Response time degradation 1-2%, 2-3%, 3-4% • Aggressiveness of consolidation: 50, 70, 85%

Results – basic algorithm • Mixed system has highest energy savings; up to 42% (24% over On/Off) • BA: up to 34% (20% over On/Off)

Results – basic algorithm • Mixed system is the most stable • In the barely-alive system, savings depend on the SLA level; the savings-aggressiveness parameter can be pushed further • On/Off system savings are influenced by both parameters and degrade significantly at high SLA levels

Results – basic algorithm • BA: up to 7.5 GB of extra memory, which can be allocated to another application, transitioned to low-power, etc.

Results – Cluster Sharing

Results – Cluster sharing

Conclusion • Combine power management and resource allocation => power-aware resource allocation • SLA-driven, fine grained management of datacenter clusters • Performance guarantees + energy savings • Flexibility to different optimizations for datacenter scenarios • Achieve deep energy savings or potential for more memory utility out of cluster • Holistic design of middleware software

Thank you for your attention!!! Questions? Contact: vlasia@cs.ucsb.edu URL: www.cs.ucsb.edu/~vlasia
