“ElasticTree: Saving Energy in Data Center Networks” by Brandon Heller, Seetharaman, Mahadevan, Yiakoumis, Sharma, Banerjee, McKeown; presented by Nicoara Talpes, Kenneth Wade
About the paper Published in April 2010 at Networked Systems Design & Implementation (NSDI)
Motivation 1 Efforts so far have focused on servers and cooling; our focus is the network (10-20% of total data-center power). The Environmental Protection Agency estimates that in 2011 data-center networks will consume 12 billion kWh, roughly 6,542,640 tons of CO2 [1]
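The CO2 figure follows from a grid emissions factor applied to the kWh estimate. A rough sanity check (the ~0.545 kg CO2/kWh factor is inferred from the slide's own numbers, not stated in the paper):

```python
# Rough sanity check of the slide's CO2 estimate.
# Assumption: an emissions factor of ~0.545 kg CO2 per kWh,
# back-calculated from the slide's figures.
kwh = 12e9                # EPA estimate for data-center networks in 2011
kg_co2_per_kwh = 0.545    # assumed grid emissions factor
tons_co2 = kwh * kg_co2_per_kwh / 1000
print(f"{tons_co2:,.0f} tons CO2")  # ≈ 6,540,000 tons
```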
Motivation 2 Goal: energy proportionality
Motivation 3 Today's hardware cannot reach the energy-proportional (green) line. The common network goal of balancing traffic evenly among all links means power is constant regardless of load. 'Data centers are provisioned to run at peak workload, but operate below capacity most of the time.' Today's network elements are not energy proportional: switches and transceivers waste power at low loads; a switch consumes about 70% of full power when idle
Existing networks 2N: fault tolerant
Wasted power Servers draw near-constant power independent of traffic; demands vary over time, yet networks are provisioned for peak
ElasticTree • Goal: build a network that is energy proportional even if its switches are not • Uses traffic management and control of switches: turning a switch on consumes most of its power (going from zero to full traffic adds only ~8%), so turning a switch off saves most of its power • Care is needed to minimize the effects on performance and fault tolerance • Must work at scale to make an impact • ElasticTree does the opposite of balanced networks: it concentrates traffic on a few links, lowering power at low loads (e.g., the middle of the night)
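The contrast above can be illustrated with a toy model (the power numbers and subset rule below are illustrative assumptions, not the paper's measurements): a balanced network keeps every switch on, so power stays near-flat, while ElasticTree powers only the subset of switches the current load needs.

```python
import math

# Illustrative figures only: a switch idles at ~70% of its full power.
IDLE_W, FULL_W = 70.0, 100.0
N_SWITCHES = 20

def balanced_power(load):
    """All switches stay on regardless of load (load in [0, 1])."""
    return N_SWITCHES * (IDLE_W + (FULL_W - IDLE_W) * load)

def elastictree_power(load):
    """Only the minimum subset of switches stays powered (toy rule)."""
    active = max(1, math.ceil(N_SWITCHES * load))
    return active * (IDLE_W + (FULL_W - IDLE_W) * load)

for load in (0.1, 0.5, 1.0):
    print(load, balanced_power(load), elastictree_power(load))
```

At low load the subset approach uses a small fraction of the balanced network's power; at full load the two converge, matching the energy-proportionality goal.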
Existing networks: scale-out Ex: Fat-tree; incremental degradation
Implementation 1 • Optimizer: finds the minimum-power network subset that satisfies the current traffic. Inputs: topology, traffic matrix, switch power models, fault-tolerance constraints. Output: the new topology • Continually re-computes the subset as traffic changes • Power control: toggles the power states of ports, linecards, and entire switches • Routing: chooses paths for all flows and pushes the routes into the network
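One iteration of the loop tying these three modules together can be sketched as follows (a minimal sketch; the function names and signatures are assumptions, not the authors' code):

```python
# Sketch of one control-loop step: measure traffic, optimize the subset,
# apply power states, then push routes. All callables are supplied by the
# caller; this only fixes the data flow between the modules.
def control_step(topology, power_model, fault_constraints,
                 measure_traffic, optimizer, power_control, routing):
    traffic_matrix = measure_traffic()
    subset, flow_routes = optimizer(topology, traffic_matrix,
                                    power_model, fault_constraints)
    power_control(subset)    # toggle ports, linecards, whole switches
    routing(flow_routes)     # push the chosen paths into the network
    return subset, flow_routes
```

In deployment this step would run continually, re-computing the subset as the traffic matrix changes.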
Optimizer methods: formal model • Outputs the subset and the flow assignments • Used to evaluate the solution quality of the other optimizers • Optimal • (con) Solution time scales as roughly (number of hosts)^3.5, so it does not scale to large topologies
Power savings: data centers • With 30% of traffic staying inside the data center, the greedy bin-packing optimizer, scaled up, yields energy reductions of 25-60%: the network is energy elastic!
Need for redundancy • Nice property: the added cost of redundancy drops as the network grows, since the minimum spanning tree (MST) is a smaller fraction of the total
Optimizer methods: greedy bin-packing • Scales better, but an optimal solution is not guaranteed and not all flows can be assigned • Used to understand power savings for larger topologies
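The bin-packing idea can be sketched as follows (a simplified sketch, not the paper's implementation: real paths have per-link capacities, and the flow ordering and leftmost-first rule here are assumptions):

```python
# Greedy bin-packing sketch: pack each flow onto the leftmost path with
# spare capacity, so traffic concentrates on few paths and the rest of
# the network can be powered down. Flows that fit nowhere stay unassigned,
# mirroring the slide's caveat that not all flows can be assigned.
def greedy_assign(flows, paths, capacity):
    """flows: list of (flow_id, demand); paths: list of path ids."""
    load = {p: 0.0 for p in paths}
    assignment, unassigned = {}, []
    for fid, demand in sorted(flows, key=lambda f: -f[1]):  # big flows first
        for p in paths:                  # leftmost-first keeps the subset small
            if load[p] + demand <= capacity:
                load[p] += demand
                assignment[fid] = p
                break
        else:
            unassigned.append(fid)
    return assignment, unassigned

# e.g. three flows onto two unit-capacity paths
print(greedy_assign([("a", 0.6), ("b", 0.5), ("c", 0.3)], [0, 1], 1.0))
```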
Optimizer methods: topology-aware heuristic • Quickly finds subsets in networks with regular structure (fat tree) • Requires less information: only the cross-layer traffic totals, not the full traffic matrix • Routing independent: does not compute the set of flow routes; (con) assumes divisible flows; can be applied with any fat-tree routing algorithm (e.g., PortLand) and any full-bisection-bandwidth topology with any number of layers (e.g., 1 Gb/s at the edge, 10 Gb/s at the core) • Simple additions to this heuristic yield quality solutions in a fraction of the time
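The core of the heuristic, as described above, reduces to simple per-layer arithmetic; a minimal sketch under the divisible-flows assumption (the function and parameter names are illustrative):

```python
import math

def links_needed(layer_traffic_gbps, link_capacity_gbps, min_links=1):
    """Minimum links (hence switches) a layer needs, computed from the
    cross-layer traffic total alone -- no full traffic matrix required.
    Assumes flows are divisible, as the slide notes."""
    return max(min_links, math.ceil(layer_traffic_gbps / link_capacity_gbps))

# e.g. 12 Gb/s crossing edge->aggregation over 1 Gb/s links
print(links_needed(12, 1))    # 12 links must stay powered
# the same total over 10 Gb/s core links
print(links_needed(25, 10))   # 3 links
```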
Optimizer comparison • The formal model is intractable for large topologies; the greedy optimizer scales further • Un-optimized single-core Python implementation: 20 s
Control software • ElasticTree requires traffic data and control over flow paths; we use OpenFlow to collect traffic measurements and push application-level flow routes to the switches
Implementation 2 • OpenFlow: measures the traffic matrix and controls the routing of flows • OpenFlow is vendor neutral, so no code changes are needed when using HP/ECR switches • Experiments show savings of 25-40% are feasible: roughly 1 billion kWh in annual savings, plus a proportional reduction in cooling costs
Experiments • Topologies: two 3-layer k=4 fat trees; one 3-layer k=6 fat tree • Measurements: NetFPGA traffic generators, each emulating four servers • Latency monitor
Power savings results • Formal method: savings depend on network utilization • With traffic kept inside the data center and near (local) traffic at low utilization: up to 60% reduction
Power savings: sine-wave demand • Reduction up to 64%
Robustness: safety margins • MST disadvantages: it gives up path redundancy and fault tolerance • The added cost of fault tolerance is insignificant for large networks
Performance • Uniform traffic shows spikes and large packet delays
Safety margins • Safety margins defer the points of loss but degrade latency • Margins are adjustable
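How a margin enters the subset computation can be sketched like this (a hedged illustration: the margin value and formula are assumptions, not the paper's exact mechanism): reserving headroom on each link keeps more of the network powered, deferring loss under bursts at the cost of some savings.

```python
import math

def links_with_margin(traffic_gbps, link_capacity_gbps, margin=0.0):
    """Links to keep powered when only (1 - margin) of each link's
    capacity may be planned for; margin in [0, 1) is the reserved headroom."""
    usable = link_capacity_gbps * (1.0 - margin)
    return max(1, math.ceil(traffic_gbps / usable))

print(links_with_margin(10, 1.0))              # 10 links, no headroom
print(links_with_margin(10, 1.0, margin=0.2))  # 13 links with 20% headroom
```

Raising the margin is the adjustable knob the slide refers to: more active links absorb traffic spikes, trading power savings for robustness.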
Topology-aware optimizer • Better robustness through tweaks: lower the target link utilization in the equations to absorb overloads and reduce delay • Increase the switch degree to add redundancy and improve fault tolerance • Constraints: response times are dominated by switch boot time (30 s to 3 min) • Fault tolerance: run the topology-aware optimizer on a separate host so that its crashes cannot affect routing • Traffic-prediction experiments are encouraging; the greedy algorithm can be used
Discussion • During low to mid utilization, ElasticTree respects the constraints while lowering costs
References Some images are borrowed from the authors' presentation, available online at http://www.usenix.org/events/nsdi10/tech/ [1] http://www.nef.org.uk/greencompany/co2calculator.htm