Blink: Managing Server Clusters on Intermittent Power Navin Sharma, Sean Barker, David Irwin, and Prashant Shenoy
Energy’s Impact • Datacenters are growing in size • 100k servers + millions of cores possible • Energy demands also growing • Cost of energy is increasing • Estimates of >30% of TCO and rising • ≈ ½ the emissions of the airline industry
Reducing Energy’s Impact • Financial: optimize energy budget • Regulate usage for variable prices • Environmental: use green energy • Leverage more renewables, e.g., solar and wind • Must regulate energy footprint
Challenge • How do we design server clusters that run on intermittent power? • Power fluctuates independently of the workload • Maximize performance subject to available energy
Outline • Motivation • Overview: Blink Abstraction • Application Example: Blinking Memcached • Implementation: Blink Prototype • Evaluation: Power/Workload Traces • Related Work • Conclusion
Running Clusters on Intermittent Power • Short-term fluctuations (~minutes) • Smooth power using UPSes • Long-term fluctuations • Increase/decrease power consumption • One approach: activate/deactivate servers (sketched below) • But servers maintain memory/disk state… • …that will be unavailable if not transferred/replicated
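A minimal sketch of the activation approach; the function name and the per-server wattage here are hypothetical, chosen only to illustrate the idea of keeping as many servers fully active as the current power budget allows:

```python
PEAK_WATTS_PER_SERVER = 10.0  # assumed peak draw of one low-power node

def active_server_count(available_watts, num_servers):
    """Number of servers the current power budget can keep fully active."""
    n = int(available_watts // PEAK_WATTS_PER_SERVER)
    return max(0, min(n, num_servers))

print(active_server_count(available_watts=55, num_servers=10))  # -> 5
```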
Server Blinking • Blinking == a duty cycle for servers • Continuous active-to-inactive transitions • Extends PowerNap (ASPLOS ’09) • Goal: cap energy use over short intervals • Feasible Today: ACPI S3 (Suspend-to-RAM) • Fast transitions (~seconds) • Large power reductions (>90% peak) [Figure: blink intervals; an always-on server draws 100% power, while a server blinking at a 50% duty cycle draws 50% power]
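A conceptual sketch of a blink loop, assuming a fixed blink interval; `suspend` and `resume` are hypothetical stand-ins for the platform's ACPI S3 entry/exit:

```python
import time

BLINK_INTERVAL_S = 60.0  # assumed blink interval length

def blink_loop(duty_cycle, suspend, resume):
    """Blink one server at `duty_cycle` (fraction of each interval spent
    active). suspend/resume stand in for ACPI S3 suspend-to-RAM
    transitions, which take on the order of seconds."""
    while True:
        active_s = duty_cycle * BLINK_INTERVAL_S
        resume()                                 # wake for the active part
        time.sleep(active_s)
        suspend()                                # sleep for the remainder
        time.sleep(BLINK_INTERVAL_S - active_s)
```

In practice a suspended node cannot run its own loop, so in a real system an external power manager would drive the transitions; this is only a conceptual sketch of the duty-cycle idea.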
Blinking Abstraction • Blinking policies: coordinate blinking across servers • Activation: vary active servers • Synchronous: blink servers in tandem • Asymmetric: blink servers at different rates [Figure: four-node examples. Activation: Node 1 (100%), Node 2 (100%), Node 3 (0%), Node 4 (0%). Synchronous: all four nodes at 50%. Asymmetric: Node 1 (100%), Node 2 (50%), Node 3 (35%), Node 4 (15%)]
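Minimal sketches (hypothetical helper names) of how two of these policies could translate an energy budget into per-node duty cycles; the asymmetric case appears later as load-proportional blinking:

```python
def synchronous(budget_fraction, n):
    """Synchronous policy: all n nodes blink at the same duty cycle."""
    return [budget_fraction] * n

def activation(budget_fraction, n):
    """Activation policy: fully power as many nodes as the budget
    allows; any leftover budget becomes one partial duty cycle."""
    total = budget_fraction * n          # total node-intervals of energy
    full = int(total)
    cycles = [1.0] * full + [0.0] * (n - full)
    if full < n:
        cycles[full] = total - full
    return cycles

print(synchronous(0.5, 4))  # [0.5, 0.5, 0.5, 0.5]
print(activation(0.5, 4))   # [1.0, 1.0, 0.0, 0.0]
```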
Proof-of-Concept Example: Memcached • Distributed in-memory cache • Popular optimization for web applications • E.g., Facebook, LiveJournal, Flickr • Smart clients / dumb servers • Simple client hash function maps keys to servers (sketched below) [Figure: a web app issues get(key); the memcached client hashes the key to one of N memcached nodes, e.g., hash(key) = 2, and falls back to get_from_db(key) on a miss]
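A sketch of the "smart client" mapping, assuming a simple modulo hash (real memcached clients typically use modulo or consistent hashing; the names here are illustrative):

```python
import hashlib

def server_for(key, servers):
    """Client-side hash: map a key to one of the memcached servers."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["node1:11211", "node2:11211", "node3:11211"]
print(server_for("user:42", servers))  # same key always maps to the same node
```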
Activation for Memcached • Initial approach: no memcached modifications • Keys randomly distributed across servers • Problem: which node to deactivate? • Arbitrarily favors keys on active servers [Figure: the client maps key1 to node 1 and key2 to node 2; get(key1) returns obj1 from the active Node 1, while get(key2) misses because Node 2 is inactive]
Reducing Activation’s Performance Penalty • Popularity-based key migration • Group similarly popular objects on same server • Deactivate “least popular” servers • Invalidates least popular objects • Benefit: higher hit rates • Problem: unfair if many keys have same popularity • No server is “least popular” • Still arbitrarily favors keys on active servers
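A minimal sketch of popularity-based key placement under assumed structures: rank keys by observed hits and pack them onto servers from most to least popular, so the servers at the tail hold the least popular keys and become the deactivation candidates:

```python
def place_by_popularity(key_hits, servers):
    """Assign similarly popular keys to the same server; servers at the
    end of the list are the 'least popular' deactivation candidates."""
    ranked = sorted(key_hits, key=key_hits.get, reverse=True)
    per_server = -(-len(ranked) // len(servers))  # ceiling division
    return {key: servers[i // per_server] for i, key in enumerate(ranked)}

hits = {"a": 900, "b": 850, "c": 40, "d": 10}
print(place_by_popularity(hits, ["node1", "node2"]))
# {'a': 'node1', 'b': 'node1', 'c': 'node2', 'd': 'node2'}
```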
Synchronous for Memcached • Benefits for Uniform Popularity • Fairness: all keys equally available • Performance: same hit rate as activation • Problem: poor hit rate if keys not equally popular [Figure: two-node examples at 50% power. Skewed popularity (Node 1 serves 80% of requests, Node 2 serves 20%): activation hit rate = 80%, synchronous hit rate = 50%. Uniform popularity (50%/50% of requests): activation hit rate = 50%, synchronous hit rate = 50%]
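The slide's hit rates can be reproduced with a couple of lines; this assumes, as in the figure, two nodes at 50% power, with activation keeping the most popular node fully on:

```python
def activation_hit_rate(request_shares, power_fraction):
    """Activation keeps the most popular nodes fully on."""
    n_active = int(power_fraction * len(request_shares))
    return sum(sorted(request_shares, reverse=True)[:n_active])

def synchronous_hit_rate(request_shares, duty_cycle):
    """Each node serves its share of requests only while awake."""
    return duty_cycle * sum(request_shares)

skewed, uniform = [0.8, 0.2], [0.5, 0.5]
print(activation_hit_rate(skewed, 0.5), synchronous_hit_rate(skewed, 0.5))
# -> 0.8 0.5 (activation wins under skew)
print(activation_hit_rate(uniform, 0.5), synchronous_hit_rate(uniform, 0.5))
# -> 0.5 0.5 (equal under uniform popularity)
```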
Best of Both Worlds: Load-Proportional • Blink servers in proportion to load (sketch below) • Balance performance and fairness • Works well for Zipf popularity distributions • Few popular keys, many (equally) unpopular keys • High hit rate for (mostly) active popular servers • Fair for (mostly) synchronously blinking less popular servers
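A minimal sketch of computing load-proportional duty cycles (the function name is hypothetical; it caps duty cycles at 100% without redistributing the small leftover budget):

```python
def load_proportional(budget_fraction, loads):
    """Blink each node in proportion to its share of the request load,
    scaled so total energy use matches the budget."""
    total = budget_fraction * len(loads)  # node-intervals of energy
    return [min(1.0, total * l / sum(loads)) for l in loads]

# Zipf-like load: one hot node, a tail of less popular ones.
print(load_proportional(0.5, [50, 25, 15, 10]))
# -> [1.0, 0.5, 0.3, 0.2]
```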
Blink Prototype [Figure: experimental deployment vs. field deployment. A programmable power supply replays solar/wind power traces (fed via serial port) to charge a battery; a power manager runs a cluster of low-power nodes off the battery. The field deployment uses live solar and wind sources.]
BlinkCache Implementation • Place a proxy between clients and servers • Proxy maintains a key-to-server table (sketch below) • Tracks key popularity and migrates keys • Modest implementation complexity • No client/server memcached modifications • Added ~300 LOC to an existing proxy [Figure: architecture. PHP application servers with MCD clients route requests through the MCD proxy to backend MCD servers; the power manager and UPS-attached power clients control node power state.]
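A sketch of the proxy's routing state under assumed structures: a key-to-server table plus per-key hit counts that a placement policy could use to migrate popular keys. The names are illustrative, not the actual BlinkCache code:

```python
from collections import Counter

class BlinkCacheProxy:
    """Toy proxy state: routes keys and tracks popularity."""

    def __init__(self, servers):
        self.servers = servers
        self.key_table = {}      # key -> currently assigned server
        self.hits = Counter()    # key -> observed request count

    def route(self, key):
        """Return the server for `key`, assigning one on first sight."""
        self.hits[key] += 1
        if key not in self.key_table:
            self.key_table[key] = self.servers[hash(key) % len(self.servers)]
        return self.key_table[key]

    def migrate(self, key, server):
        """Called by the popularity-based placement policy."""
        self.key_table[key] = server
```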
Experimental Setup • Test Deployment • Cluster of 10 low-power nodes • AMD Geode LX (433MHz) CPU, 256 MB RAM • Power consumption well-matched to production • ~100 watts peak production/consumption • Workloads and Metrics • Solar + wind traces from deployment • Zipf popularity distributions (α = 0.6) • Hit Rate (Performance), Standard Deviation (Fairness)
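For reference, a small sketch of sampling key ranks from a Zipf distribution with α = 0.6 as in the evaluation; numpy's built-in Zipf sampler requires an exponent greater than 1, so the weights are computed directly:

```python
import random

def zipf_rank(n_keys, alpha=0.6):
    """Sample a key rank (1 = most popular) from Zipf with exponent alpha."""
    weights = [r ** -alpha for r in range(1, n_keys + 1)]
    return random.choices(range(1, n_keys + 1), weights=weights)[0]

requests = [zipf_rank(1000) for _ in range(10)]
print(requests)  # low ranks (popular keys) appear disproportionately often
```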
S3 Transition Overhead [Figure: measured S3 transition overhead on the Blink prototype]
Balancing Performance and Fairness • Activation with key migration (green) • Best hit rate • Load-proportional (red) • Slightly lower hit rate, but more fair
Case Study: Tag Clouds in GlassFish • GlassFish: Java application server hosting a tag-cloud application • Cache dynamically generated HTML pages in memcached • Each HTML page requires 20 requests to a MySQL database
Related Work • Sensor and Mobile Research • Use duty-cycling to reduce energy footprint • Lexicographic (SenSys ’09), AdaptiveCycle (SECON ’07) • Blink: power delivery infrastructure is shared • Energy-efficient Computing • Minimize energy to satisfy a workload • FAWN (SOSP ’09), PowerNap (ASPLOS ’09) • Blink: optimize workload to satisfy an energy budget • Dealing with Variability • Similar to churn in DHTs • Chord (SIGCOMM ’01), Bamboo (USENIX ’04) • Blink: introduce regulated/controllable churn
Conclusions • Blink: new abstraction for regulating energy footprint • Blinking is feasible in modern servers • Highlight differences in blinking policies • Modify example application (Memcached) for blinking • Modest implementation overhead • Ongoing work • Explore more applications • Distributed storage is more important/challenging • Explore more power profiles • Variable pricing, battery capacities
Activation for Memcached • Next approach: alter the client hash function • Add/remove servers from the hash • Problem: penalizes power fluctuations • Invalidates keys on every change • Consistent hashing: only ~1/n of keys invalidated per single addition/removal (sketch below) [Figure: after Node 2 deactivates, the client remaps key2 from node 2 to node 1; get(key1) still returns obj1, but get(key2) misses until obj2 is re-cached]
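A minimal consistent-hashing sketch (no virtual nodes) illustrating the 1/n property mentioned above; all names are illustrative:

```python
import bisect
import hashlib

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHash:
    """Keys map to the first server clockwise on a hash ring, so
    adding/removing one of n servers remaps only ~1/n of the keys."""

    def __init__(self, servers):
        self.ring = sorted((_h(s), s) for s in servers)

    def server_for(self, key):
        points = [h for h, _ in self.ring]
        i = bisect.bisect(points, _h(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHash(["node1", "node2", "node3"])
print(ring.server_for("key1"), ring.server_for("key2"))
```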
Uniform Popularity • Synchronous • Same hit rate as activation • More fair than activation
Proxy Overhead • Memcached Proxy • Imposes modest overhead