ElasticTree Michael Fruchtman
Background Theory • Solving the multi-commodity flow problem • NP-complete for integer flows • Graph G(V,E) and commodity set K • Set S of sources and set T of sinks • Objective: maximize the minimal fraction of each commodity's flow to its demand
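The objective on this slide matches the standard maximum concurrent flow formulation, which can be written out as a sketch. The symbols c(u,v) for link capacity, d_i for the demand of commodity i, f_i(u,v) for its flow on a link, s_i and t_i for its source and sink, and λ for the common fraction of demand served are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
\text{maximize}\quad & \lambda \\
\text{subject to}\quad
& \sum_{i \in K} f_i(u,v) \le c(u,v)
  && \forall (u,v) \in E \\
& \sum_{w:(u,w) \in E} f_i(u,w) - \sum_{w:(w,u) \in E} f_i(w,u) = 0
  && \forall i \in K,\ \forall u \in V \setminus \{s_i, t_i\} \\
& \sum_{w:(s_i,w) \in E} f_i(s_i,w) - \sum_{w:(w,s_i) \in E} f_i(w,s_i) \ge \lambda\, d_i
  && \forall i \in K \\
& f_i(u,v) \ge 0
  && \forall i \in K,\ \forall (u,v) \in E
\end{aligned}
```

With fractional flows this is a linear program; the hardness noted on the slide applies when flows must be integral or unsplittable.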
Question and Hypothesis • Data center network devices are always on • Networking devices draw near-constant power regardless of load • Can a data center network be made energy proportional using components that are not energy proportional? • Heller et al. propose three ways to compute a subset of the network topology that meets the current demand.
Current State of Technology • I am HPC oriented • Data centers and HPC clusters use the same topology • Ubiquitous Fat Tree
Data Center Networking • Diurnal Cycle • Networking hardware is not energy proportional • Can we lower the power on the downcycle? • Network traffic is only 8% of power cost
Simple Attempts and Modeling • Switch power modeling • Constant minimum • 3W per port, 1W to turn it on • Minimum Spanning Tree • Connect all nodes • No redundancy • No fault tolerance • Is there anything in between?
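A rough sketch of the kind of switch power model this slide describes: a powered switch pays a constant chassis cost plus a cost per active port, and a switch with no active ports is assumed to be turned off. The function name and both wattage parameters are hypothetical; the caller supplies the values, since the slide's figures can be read more than one way.

```python
def network_power(active_ports_per_switch, chassis_w, port_w):
    """Total power in watts for a set of switches.

    active_ports_per_switch: one integer per switch, the number of enabled ports.
    chassis_w: constant cost of a powered-on switch (assumed parameter).
    port_w: cost of each enabled port (assumed parameter).
    A switch with zero active ports is treated as powered off.
    """
    total = 0.0
    for ports in active_ports_per_switch:
        if ports < 0:
            raise ValueError("port count cannot be negative")
        if ports > 0:
            total += chassis_w + port_w * ports
    return total


# Placeholder wattages, chosen only to show the shape of the comparison.
full_network = network_power([4, 4, 4, 4], chassis_w=10.0, port_w=1.0)
subset = network_power([4, 2, 0, 0], chassis_w=10.0, port_w=1.0)
print(full_network, subset)
```

Because the chassis term dominates in this kind of model, turning whole switches off saves far more than disabling individual ports, which is why the approaches that follow aim to shrink the set of active switches.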
Approaches • Formal Model • Traffic Matrix, assume a data rate • Solve for the minimum number of switches • Output: Subset of fat tree topology • Results • Scales to only 1000 nodes • Very slow, O(n^3.5) • Cannot deal with traffic spikes
Approaches • Greedy bin packing • For each flow move to leftmost switch • Keep moving flow until each switch is full • Not all flows will resolve, some arbitrary decisions made • Output: Return fat tree subset.
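The leftmost-first packing on this slide can be sketched as follows. This is a simplified illustration, not the authors' implementation: each candidate path is modeled as a single bin with one capacity, paths are indexed left to right, and when no path has room the flow goes to the least-loaded path. That fallback is an assumption standing in for the "arbitrary decisions" the slide mentions, and the function name is hypothetical.

```python
def greedy_leftmost(flows, num_paths, capacity):
    """Assign each flow to the leftmost path with enough spare capacity.

    flows: list of (name, demand) pairs, demand in the same unit as capacity.
    Returns (assignment, active_paths), where assignment maps a flow name to a
    path index and active_paths lists the paths that carry any traffic.
    """
    load = [0] * num_paths
    assignment = {}
    for name, demand in flows:
        for path in range(num_paths):
            if load[path] + demand <= capacity:
                choice = path
                break
        else:
            # No path fits: fall back to the least-loaded path (assumed policy).
            choice = min(range(num_paths), key=lambda p: load[p])
        load[choice] += demand
        assignment[name] = choice
    active_paths = sorted(set(assignment.values()))
    return assignment, active_paths


# Demands in Mbps on 1000 Mbps paths.
flows = [("a", 600), ("b", 300), ("c", 500), ("d", 100)]
assignment, active_paths = greedy_leftmost(flows, num_paths=4, capacity=1000)
print(assignment, active_paths)
```

Packing to the left concentrates traffic on as few paths as possible, so the unused right-hand paths, and the switches on them, are candidates for powering down.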
Approaches • Topology-Aware Heuristic • Minimize the number of switches • Compute the minimum • Ports and switches needed • Take the total flow up the tree and divide by the link data rate to find the necessary number of ports • Do the same for the down flow • Assign the minimum number of switches that achieves the required bandwidth • As reliable as a minimum spanning tree
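The port calculation above reduces to a ceiling division, sketched here under stated assumptions. The function names are hypothetical. Keeping at least one link so that every switch stays reachable, and taking the aggregation-switch count as the largest per-edge-switch link count (one link from each edge switch to each aggregation switch, as in a fat tree pod), are assumptions about the wiring rather than details given on the slide.

```python
import math


def links_needed(up_gbps, down_gbps, link_rate_gbps):
    """Links one switch needs toward the next layer for the given traffic."""
    up = math.ceil(up_gbps / link_rate_gbps)
    down = math.ceil(down_gbps / link_rate_gbps)
    # Keep at least one link so the switch stays connected (assumption).
    return max(up, down, 1)


def agg_switches_needed(edge_traffic, link_rate_gbps):
    """Aggregation switches to keep on in one pod.

    edge_traffic: list of (up_gbps, down_gbps), one pair per edge switch.
    Assumes each edge switch has exactly one link to each aggregation switch.
    """
    return max(links_needed(up, down, link_rate_gbps) for up, down in edge_traffic)


print(links_needed(2.5, 0.4, 1.0))
print(agg_switches_needed([(0.2, 0.1), (1.5, 0.3)], 1.0))
```

Only aggregate traffic per switch is needed, not a full traffic matrix, which is what makes this heuristic cheap compared with the formal model.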
Results • No power-savings results were given for the topology-aware heuristic • Formal model • 48 node fat tree • Plateau • Constant minimum • Locality Concerns • Performs better when traffic is localized
Approach Flaws • Long computation time produces cycles • On traffic increase, switches get overloaded • On traffic decrease, too many switches stay on • “Following” algorithms • One node continually calculates the network • What to do if the optimizer goes down?
Latency • Safety margins can be introduced by layering additional MSTs over the solution
Redundancy • None of the solutions is fault tolerant • Add fault tolerance by adding MSTs over the solution • Each MST adds 1% of the original network’s power cost • Exponential increase in reliability
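The trade-off on this slide can be illustrated with a little arithmetic. The 1% power cost per added MST comes from the slide. The reliability side is a sketch under an assumption that is not in the slides: each tree fails independently with the same probability, so connectivity is lost only when every tree fails. The function name and the failure probability are hypothetical.

```python
def redundancy_tradeoff(extra_trees, tree_failure_prob, power_per_tree=0.01):
    """Return (extra power fraction, outage probability) for added MSTs.

    extra_trees: number of MSTs layered on top of the base solution.
    tree_failure_prob: assumed independent failure probability of one tree.
    power_per_tree: extra power per MST as a fraction of the original
        network's power (1% per the slide).
    """
    if extra_trees < 0:
        raise ValueError("extra_trees cannot be negative")
    if not 0.0 <= tree_failure_prob <= 1.0:
        raise ValueError("tree_failure_prob must be a probability")
    extra_power = power_per_tree * extra_trees
    # The base tree plus every extra tree must all fail for an outage.
    outage_prob = tree_failure_prob ** (extra_trees + 1)
    return extra_power, outage_prob


for k in range(4):
    power, outage = redundancy_tradeoff(k, tree_failure_prob=0.1)
    print(k, power, outage)
```

Under that independence assumption the power cost grows linearly with the number of trees while the outage probability shrinks geometrically, which is one way to read the slide's "exponential increase in reliability".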
Methodology Flaws • Used a testbed of routers with network simulators • No large data center • The largest testbed of routers with network simulators had 48 hosts; data centers have thousands of nodes • No TCP testing • TCP has flow control built in • Will TCP back off before the optimizer activates more ports and switches? • Cannot handle traffic spikes due to multi-minute switch and router boot times • Will only reduce power in idle systems
Conclusion • Might be useful for data centers due to lower utilization • Perfect for diurnal cycle • Useless for most HPC clusters • http://saguaro.fulton.asu.edu • Needs physical testing • TCP could make this approach useless