Vlasia Anagnostopoulou (vlasia@cs.ucsb),

Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers
Vlasia Anagnostopoulou (vlasia@cs.ucsb.edu), Susmit Biswas, Alan Savage, Ricardo Bianchini*, Tao Yang, Frederic T. Chong
Department of Computer Science, UC Santa Barbara
*Department of Computer Science, Rutgers University

Dependence on Internet-services… • More online services • New Internet-services, email, informational sites, social networks… • Example: growth of web search [Figure: growth of web search; values shown: 450,000,000, 200,000,000, 500,000, 10,000]

Environmental impact • Internet-services live in datacenters • Thousands of machines per datacenter • Many datacenters across the globe • Energy consumption: ~1.2% of the US total [Ref: EPA] • Energy consumption growth [Figure]

Are datacenters efficient? • Strict performance standards for Internet-services through SLAs • Over-provisioning • Machines are under-utilized most of the time • Servers are inefficient at low or average utilization [Ref: Barroso and Hölzle]

Current techniques for efficiency • Under low load, reconfigure the cluster • Consolidate load onto fewer machines • Turn the rest off • Or transition them to a low-power idle state • Memory is not accessible in these states • Operate at lower frequency (voltage scaling, VS) • Performance problems • For Internet-services the working set typically doesn’t shrink with load! • Because of the reboot, machines are very slow to restart (~sec) • Memory has to be warmed up again

For web-search, memory is particularly critical • Search dataset doesn’t change much with load! • Searches have temporal locality -> Zipf distribution [Ref: Adamic] • Intense database search • May search up to hundreds of servers at a time • But processing a search query is a fairly light CPU task Memory can and should be managed wisely, in order not to lose performance!
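To illustrate why Zipf-like locality makes memory so valuable, here is a minimal sketch that estimates the share of requests covered by keeping only the most popular objects in memory. The object count, the cache sizes, and the exponent of 1.0 are assumptions chosen for illustration; none of them come from the talk.

```python
def zipf_hit_ratio(num_objects, cache_size, exponent=1.0):
    """Fraction of requests served from memory when the `cache_size` most
    popular objects are cached and popularity follows a Zipf law.

    Assumption: the object at popularity rank r is requested with
    probability proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return sum(weights[:cache_size]) / total


if __name__ == "__main__":
    # Hypothetical catalogue of 10,000 distinct queries.
    for size in (10, 100, 1000):
        print(size, round(zipf_hit_ratio(10_000, size), 3))
```

Under these assumptions a small fraction of the objects covers a large share of the requests, which is the property a memory-centric scheme relies on.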

Our technique for efficiency + performance: • Barely-Alive state: • CPU is turned off, memory is kept on • Much lower power consumption • Distributed middleware: • Request distribution • Transition servers to BA state • Manage server memory content locally • Allocate optimal memory to services globally • Do not degrade performance (respect SLA) Hardware requirements?

How would it be implemented? • Memory is accessed through the Memory Controller (MC) • MC is on the CPU (bummer!) • Install a small CPU on the NIC • Memory accessible like DMA via new CPU + MC • Turn the main CPU off Software requirements?

Basic request distribution algorithm in a self-managed cluster [Figure: servers compare load, e.g. "I am less loaded than Server 3"] • LARD: locality-aware request distribution [Ref: V. Pai + others] • PRESS: its distributed version [Ref: Carrera + Bianchini] • Main idea: • Exploit locality in references by forwarding the same requests to the same machine • But balance load evenly among machines • Challenges of integrating BA servers into the request distribution scheme: • Transition from Active to BA and vice versa • Stale content of BA servers
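The main idea above (send the same request to the same machine, but move it when that machine is overloaded) can be sketched as follows. This is a simplified, centralized illustration in the spirit of basic LARD, not the PRESS implementation; the class name and the threshold values `t_low` and `t_high` are placeholders chosen for this example.

```python
class LardDispatcher:
    """Toy locality-aware dispatcher: one preferred server per request target."""

    def __init__(self, servers, t_low=2, t_high=6):
        self.load = {s: 0 for s in servers}   # active requests per server
        self.assigned = {}                     # request target -> server
        self.t_low = t_low                     # "has spare capacity" threshold
        self.t_high = t_high                   # "is overloaded" threshold

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        server = self.assigned.get(target)
        if server is None:
            # First time this target is seen: pick the least loaded server.
            server = self._least_loaded()
        else:
            overloaded = self.load[server] > self.t_high
            spare_exists = min(self.load.values()) < self.t_low
            # Give up locality only when the imbalance is significant.
            if (overloaded and spare_exists) or self.load[server] >= 2 * self.t_high:
                server = self._least_loaded()
        self.assigned[target] = server
        self.load[server] += 1
        return server

    def complete(self, server):
        """Call when a request finishes on `server`."""
        self.load[server] -= 1
```

A BA-aware variant would additionally have to exclude barely-alive servers from `_least_loaded`, which is one of the integration challenges listed on the slide.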

Self-managed cluster with BA servers • Transition Active to BA • Application decides on global level • Locally, if there are no processes or requests • Make sure not to over-utilize active servers! • Stale content of BA servers • A store installs a new object (immutability) • Application may invalidate old objects at will • On activation, a BA server updates its Directory from an active server • Periodic activation or state swapping • Space of obsolete objects can be reclaimed Optimal memory allocation? Multiple services? Energy efficiency?
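One possible reading of the stale-content handling is sketched below: objects are immutable, so an update installs a new object id, and a server that returns from the BA state drops whatever the application invalidated in the meantime. The class, the function, and the id format are hypothetical; the slides do not specify the Directory's actual structure.

```python
class ServerCache:
    """Toy in-memory object store with immutable objects."""

    def __init__(self):
        self.objects = {}  # object id -> payload

    def store(self, object_id, payload):
        # Immutability: an existing id is never overwritten.
        if object_id in self.objects:
            raise ValueError("objects are immutable; install a new id instead")
        self.objects[object_id] = payload


def refresh_on_activate(ba_cache, invalidated_ids):
    """Drop objects invalidated while the server was barely alive.

    `invalidated_ids` stands in for what an active server's Directory
    would report. Returns the number of objects whose space was reclaimed.
    """
    stale = [oid for oid in ba_cache.objects if oid in invalidated_ids]
    for oid in stale:
        del ba_cache.objects[oid]
    return len(stale)
```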

Middleware for efficient memory management • Optimal memory allocation • Dynamically size memory to meet the SLA requirement exactly • Translate SLA requirement -> target hit-ratio • Use stack algorithm to predict optimal size from target hit-ratio • Stack algorithm overview • Measures contribution of cache size to the hit-ratio • In a single pass, it calculates the cumulative hit-ratio as a function of size • How to adapt the stack algorithm for resizing the global cluster memory optimally?
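The stack algorithm overview can be made concrete with a small sketch. It assumes an LRU replacement policy and unit-sized objects (both assumptions; the slides do not state them here) and computes, in one pass over a trace, the hit ratio for every cache size, from which the smallest size meeting a target hit ratio follows. Function names are illustrative.

```python
def stack_distance_histogram(trace):
    """One pass over the trace; returns {stack depth: number of hits}."""
    stack = []  # LRU stack, most recently used object at index 0
    hist = {}
    for obj in trace:
        if obj in stack:
            depth = stack.index(obj) + 1   # smallest cache size that would hit
            stack.remove(obj)
            hist[depth] = hist.get(depth, 0) + 1
        stack.insert(0, obj)
    return hist


def hit_ratio_curve(trace):
    """curve[i] is the cumulative hit ratio of an LRU cache of size i + 1."""
    hist = stack_distance_histogram(trace)
    curve, hits = [], 0
    for size in range(1, max(hist, default=0) + 1):
        hits += hist.get(size, 0)
        curve.append(hits / len(trace))
    return curve


def min_size_for_target(trace, target):
    """Smallest cache size reaching `target` hit ratio, or None if unreachable."""
    for size, ratio in enumerate(hit_ratio_curve(trace), start=1):
        if ratio >= target:
            return size
    return None
```

The list-based stack keeps the sketch short; a production version would use a more efficient structure for long traces.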

Optimal memory allocation • Distributed stack algorithm • For each server: • Keep track of memory size + hit-ratio information • At each time window, broadcast the size needed for the desired hit-ratio • Resize local stack to the global average size Extension for BA servers, variable-sized objects, multiple services…
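One round of the distributed scheme might look like the sketch below: each server derives the size it needs from its local hit-ratio curve, the proposals are exchanged, and every server adopts the global average. The broadcast is modeled as a plain dictionary, and rounding up as well as falling back to the largest known size are assumptions made for this example.

```python
import math


def size_for_target(curve, target):
    """curve[i] is the predicted hit ratio with i + 1 memory units."""
    for size, ratio in enumerate(curve, start=1):
        if ratio >= target:
            return size
    return len(curve)  # assumption: fall back to the largest size observed


def resize_round(local_curves, target):
    """Simulate one time window: broadcast proposals, adopt the average."""
    proposals = {
        server: size_for_target(curve, target)
        for server, curve in local_curves.items()
    }
    global_average = sum(proposals.values()) / len(proposals)
    # Round up so the cluster does not undershoot the target (assumption).
    return {server: math.ceil(global_average) for server in proposals}
```

For instance, two servers proposing 3 and 2 units would both settle on 3 units in this sketch.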

Extensions of the distributed stack algorithm • Include BA servers • Contribute a fixed amount of memory (passive) • Multiple-size objects • Separate stack for each object size • This leverages directory look-up • Multiple services • Each service keeps its own stack in memory • Memory partitioned across services Energy efficiency of the BA state (excluding the savings from memory management)

Power savings potential

Cumulative power savings • Synthetic search trace over 1 day (24h)

Future work • Currently looking into more on-line and off-line apps (e.g. web translation, sorting) • Extend power consumption breakdown • Sensitivity analysis of power savings to the simulation’s parameters (e.g. memory capacity, network assumptions, component access times, etc.) • Evaluation of the distributed algorithm

Conclusions • Datacenters have a growing impact on the environment • Machines in datacenters are inefficient • Memory is critical to the performance of applications run on a cluster • Exploit memory without degrading performance with the Barely-Alive state + middleware • Potential power savings up to 49%, without loss of performance

Questions? • Thank you for your attention! • vlasia@cs.ucsb.edu • www.cs.ucsb.edu/~arch
