
ElasticTree

ElasticTree. Michael Fruchtman. Background Theory: solving the multi-commodity flow problem, which is NP-complete for integer flows, on a graph G(V,E) with commodity set K, source set S, and sink set T; the objective is to maximize the minimal fraction of flow. Topologies. Question and Hypothesis.



  1. ElasticTree Michael Fruchtman

  2. Background Theory • Solving the multi-commodity flow problem • NP-complete for integer flows • Graph G(V,E) and commodity set K • Set S as sources and set T as sinks • Objective: maximize the minimal fraction of flow
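
The objective on this slide matches what is usually called the maximum concurrent flow problem. One common way to write it down is sketched below. The symbols c(u,v) (link capacity), d_k (demand of commodity k), f_k (flow of commodity k), s_k in S and t_k in T (source and sink of commodity k), and λ (the common fraction of demand that is served) are assumptions introduced here and do not appear on the slide.

```latex
\begin{align*}
\text{maximize}\quad & \lambda \\
\text{subject to}\quad
& \sum_{k \in K} f_k(u,v) \le c(u,v)
  && \forall (u,v) \in E \\
& \sum_{w:(v,w)\in E} f_k(v,w) - \sum_{w:(w,v)\in E} f_k(w,v) = 0
  && \forall k \in K,\ v \in V \setminus \{s_k, t_k\} \\
& \sum_{w:(s_k,w)\in E} f_k(s_k,w) - \sum_{w:(w,s_k)\in E} f_k(w,s_k) = \lambda\, d_k
  && \forall k \in K \\
& f_k(u,v) \ge 0
  && \forall k \in K,\ (u,v) \in E
\end{align*}
```

With fractional flows this is a linear program. The hardness noted on the slide applies once each flow must be integral or routed along a single path.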

  3. Topologies

  4. Question and Hypothesis • Data center network devices are always on • Networking devices draw near-constant power • Can a data center network be made energy proportional with non-energy-proportional components? • Heller et al. propose three ways to calculate a network topology subset for the current demand.

  5. Current State of Technology • I am HPC oriented • Data centers and HPC clusters use the same topology • Ubiquitous Fat Tree

  6. Data Center Networking • Diurnal Cycle • Networking hardware is not energy proportional • Can we lower the power on the downcycle? • Network traffic is only 8% of power cost

  7. Simple Attempts and Modeling • Switch power modeling • Constant minimum • 3W per port, 1W to turn it on • Minimum Spanning Tree • Connect all nodes • No redundancy • No fault tolerance • Is there anything in between?
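
The switch power model on this slide can be read as a fixed draw whenever the switch is on, plus a per-port cost. The sketch below is one possible reading, not the model from the talk: `chassis_watts` is a hypothetical parameter with no value given on the slide, the 3 W per-port figure is taken from the slide, and the "1W to turn it on" figure is left out because the slide does not say how it combines with the per-port cost.

```python
def switch_power_watts(powered_on, active_ports, chassis_watts, port_watts=3.0):
    """Estimate one switch's draw under a constant-minimum-plus-ports model.

    chassis_watts is a hypothetical constant minimum (not given on the slide);
    port_watts defaults to the 3 W per-port figure from the slide.
    """
    if active_ports < 0:
        raise ValueError("active_ports must be non-negative")
    if not powered_on:
        # Only a switch that is fully off draws nothing.
        return 0.0
    # An idle but powered switch still pays the full constant minimum.
    return chassis_watts + port_watts * active_ports


if __name__ == "__main__":
    # Hypothetical 100 W chassis: idle and 4-port draw differ by only 12 W.
    print(switch_power_watts(True, 0, chassis_watts=100.0))
    print(switch_power_watts(True, 4, chassis_watts=100.0))
```

Under this reading, disabling ports saves little; the large savings come from powering whole switches off, which is why the approaches that follow look for a subset of the topology.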

  8. Approaches • Formal Model • Traffic Matrix, assume a data rate • Solve for the minimum number of switches • Output: Subset of fat tree topology • Results • Scales to only 1000 nodes • Very slow, O(n^3.5) • Cannot deal with traffic spikes

  9. Approaches • Greedy bin packing • For each flow move to leftmost switch • Keep moving flow until each switch is full • Not all flows will resolve, some arbitrary decisions made • Output: Return fat tree subset.
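
The greedy step above can be sketched as a leftmost-first placement. This is a simplified illustration, not the authors' implementation: it treats each switch as a single bin with one capacity, ignores the actual fat tree paths, and uses hypothetical names (`greedy_leftmost`, `flows`, `capacity`). Flows that fit nowhere are returned as unplaced, mirroring the slide's remark that not all flows resolve.

```python
def greedy_leftmost(flows, num_switches, capacity):
    """Place each flow on the leftmost switch that still has room.

    flows: list of (name, demand) pairs; demand and capacity share a unit.
    Returns (placement, active_switches, unplaced_flows).
    """
    load = [0] * num_switches
    placement = {}
    unplaced = []
    for name, demand in flows:
        for i in range(num_switches):  # scan left to right
            if load[i] + demand <= capacity:
                load[i] += demand
                placement[name] = i
                break
        else:
            # No switch had room; a real system would need a fallback here.
            unplaced.append(name)
    # Only switches that carry at least one flow need to stay powered.
    active = sorted(set(placement.values()))
    return placement, active, unplaced


if __name__ == "__main__":
    demo = [("a", 600), ("b", 500), ("c", 400), ("d", 700)]  # Mbps, hypothetical
    print(greedy_leftmost(demo, num_switches=4, capacity=1000))
```

Packing to the left concentrates traffic on as few switches as possible, so the switches on the right are the ones that can be turned off.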

  10. Approaches • Topology Aware Heuristic • Minimize switch number • Compute minimum • Ports and switches needed • Take total flow up tree and divide by data link rate to find necessary number of ports • Do the same for the down flow • Assign the minimum number of switches to achieve the required bandwidth. • As reliable as minimum spanning tree
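
The port calculation above amounts to a ceiling division done once for upward and once for downward traffic. The sketch below is a minimal illustration with hypothetical names (`min_uplinks`, `up_mbps`, `down_mbps`, `link_rate_mbps`). Keeping at least one link active even with zero traffic is an assumption added here so that the result stays connected, in line with the slide's comparison to a minimum spanning tree.

```python
import math


def min_uplinks(up_mbps, down_mbps, link_rate_mbps):
    """Minimum number of active uplinks for one switch.

    Divides total upward and total downward traffic by the link rate,
    rounds each up, and keeps the larger count. The floor of 1 is an
    assumption so the switch never becomes disconnected.
    """
    if link_rate_mbps <= 0:
        raise ValueError("link_rate_mbps must be positive")
    up_links = math.ceil(up_mbps / link_rate_mbps)
    down_links = math.ceil(down_mbps / link_rate_mbps)
    return max(up_links, down_links, 1)


if __name__ == "__main__":
    # 2.5 Gbps up and 0.7 Gbps down over 1 Gbps links (hypothetical numbers).
    print(min_uplinks(2500, 700, 1000))
```

Assuming each uplink of a switch leads to a different switch in the layer above, the link count also bounds how many switches in that layer must stay on. Because only traffic totals are needed, not the full traffic matrix, this is much cheaper than the formal model.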

  11. Results • No power-savings results were given for the topology-aware heuristic • Formal model • 48-node fat tree • Plateau • Constant minimum • Locality concerns • Performs better when traffic is localized

  12. Approach Flaws • Long computation time produces cycles • On traffic increase switches get overloaded • On traffic decrease too many switches • “Following” algorithms • One node continually calculates the network • What to do if the optimizer goes down?

  13. Latency • Safety margins can be introduced by layering additional MSTs over the solution

  14. Redundancy • None of the solutions is fault tolerant • Add fault tolerance by adding MSTs over the solution • Each MST adds 1% of the original network’s power cost • Exponential increase in reliability
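
The trade-off on this slide can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each spanning tree fails independently with the same probability; that independence assumption and the name `tree_failure_prob` are not from the talk. The 1% power cost per extra tree is the figure from the slide.

```python
def redundancy_tradeoff(extra_trees, tree_failure_prob, overhead_per_tree=0.01):
    """Return (outage_probability, power_overhead_fraction).

    Assumes independent, identically likely tree failures (an illustrative
    assumption). overhead_per_tree defaults to the slide's 1% figure.
    """
    if extra_trees < 0:
        raise ValueError("extra_trees must be non-negative")
    if not 0.0 <= tree_failure_prob <= 1.0:
        raise ValueError("tree_failure_prob must be in [0, 1]")
    # Connectivity is lost only if the base tree and every extra tree fail.
    outage_prob = tree_failure_prob ** (extra_trees + 1)
    # Power grows linearly with the number of extra trees.
    power_overhead = overhead_per_tree * extra_trees
    return outage_prob, power_overhead


if __name__ == "__main__":
    for extra in range(4):
        print(extra, redundancy_tradeoff(extra, tree_failure_prob=0.1))
```

Under these assumptions the power cost rises linearly while the outage probability falls geometrically, which is one way to read the slide's "exponential increase in reliability".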

  15. Methodology Flaws • Used testbed of routers with network simulators • No large datacenter • Largest testbed of routers with network simulators was 48 hosts; data centers have thousands of nodes • No TCP testing • TCP protocol has flow control built in • Will TCP back off before the optimizer activates more ports and switches? • Cannot handle traffic spikes due to multi-minute switch and router boot times • Will only reduce power in idle systems

  16. Conclusion • Might be useful for data centers due to lower utilization • Perfect for diurnal cycle • Useless for most HPC clusters • http://saguaro.fulton.asu.edu • Needs physical testing • TCP could make this approach useless

  17. Questions
