1 / 33

MicroTE : Fine G rained Traffic Engineering for Data Centers

Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang + *University of Wisconsin, Madison + Microsoft Research. MicroTE : Fine G rained Traffic Engineering for Data Centers. Why are Data Centers Important?. Host a variety of applications Web portals Web services

pascal
Download Presentation

MicroTE : Fine G rained Traffic Engineering for Data Centers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang+ *University of Wisconsin, Madison + Microsoft Research MicroTE: Fine Grained Traffic Engineering for Data Centers

  2. Why are Data Centers Important? • Host a variety of applications • Web portals • Web services • Multimedia applications • Chat/IM • DC performance  app performance • Congestion impacts app latency

  3. Why are Data Centers Important? • Poor performance  loss of revenue • Traffic engineering is crucial

  4. Contributions • Study effectiveness of current approach • Developed guidelines for ideal approach • Designed and implemented MicroTE • Showed effectiveness in mitigating congestion

  5. Road Map • Background • Traffic Engineering in data centers • Design goals for ideal TE • MicroTE • Evaluation • Conclusion

  6. Options for TE in Data Centers? • Current supported techniques • Equal Cost MultiPath (ECMP) • Spanning Tree Protocol (STP) • Proposed • Fat-Tree, VL2 • Other existing WAN techniques • COPE,…, OSPF link tuning

  7. How do we evaluate TE? • Simulator • Input: Traffic matrix, topology, traffic engineering • Output: Link utilization • Optimal TE • Route traffic using knowledge of future TM • Data center traces • Cloud data center (CLD) • Map-reduce app • ~1500 servers • University data center (UNV) • 3-Tier Web apps • ~500 servers

  8. Draw Backs of Existing TE • STP does not use multiple path • 40% worst than optimal • ECMP does not adapt to burstiness • 15% worst than optimal

  9. Draw Backs of Proposed TE • Fat-Tree • Rehash flows • Local opt. != global opt. • VL2 • ECMP on VLB architecture • Coarse grained flow assignment • VL2 & Fat-Tree do not adapt to burstiness

  10. Draw Backs of Other Approaches • COPE ….OSPF link tuning • Requires hours or days of predictable traffic • Traffic unpredictable over 100 secs(Maltz. et al) Egress Ingress x

  11. Design Goals for Ideal TE

  12. Design Requirements for TE • Calculate paths & reconfigure network • Use all network paths • Use global view • Avoid local optimals • Must react quickly • React to burstiness • How predictable is traffic? ….

  13. Is Data Center Traffic Predictable? 27% • YES! 27% or more of traffic matrix is predictable • Manage predictable traffic more intelligently 99%

  14. How Long is Traffic Predictable? • Different patterns of predictability • 1 second of historical data able to predict future 1.5 – 5.0 1.6 - 2.5

  15. MicroTE

  16. MicroTE: Architecture Monitoring Component Routing Component • Global view: • Created by network controller • React to predictable traffic: • Routing component tracks demand history • All N/W paths: • Routing component creates routes using all paths Network Controller

  17. Architectural Questions • Efficiently gather network state? • Determine predictable traffic? • Generate and calculate new routes? • Install network state?

  18. Architectural Questions • Efficiently gather network state? • Determine predictable traffic? • Generate and calculate new routes? • Install network state?

  19. Monitoring Component • Efficiently gather TM • Only one server per ToR monitors traffic • Transfer changed portion of TM • Compress data • Tracking predictability • Calculate EWMA over TM (every second) • Empirically derived alpha of 0.2 • Use time-bins of 0.1 seconds

  20. Routing Component New Global View Determine predictable ToRs Calculate network routes for predictable traffic Set ECMP for unpredictable traffic Install routes

  21. Routing Predictable Traffic • LP formulation • Constraints • Flow conservation • Capacity constraint • Use K-equal length paths • Objective • Minimize link utilization • Bin-packing heuristic • Sort flows in decreasing order • Place on link with greatest capacity

  22. Implementation • Changes to data center • Switch • Install OpenFlow firmware • End hosts • Add kernel module • New component • Network controller • C++ NOX modules

  23. Evaluation

  24. Evaluation: Motivating Questions • How does MicroTE Compare to Optimal? • How does MicroTE perform under varying levels of predictability? • How does MicroTE scale to large DCN? • What overheard does MicroTE impose?

  25. Evaluation: Motivating Questions • How does MicroTE Compare to Optimal? • How does MicroTE perform under varying levels of predictability? • How does MicroTE scale to large DCN? • What overheard does MicroTE impose?

  26. How do we evaluate TE? • Simulator • Input: Traffic matrix, topology, traffic engineering • Output: Link utilization • Optimal TE • Route traffic using knowledge of future TM • Data center traces • Cloud data center (CLD) • Map-reduce app • ~1500 servers • University data center (UNV) • 3-Tier Web apps • ~400 servers

  27. Performing Under Realistic Traffic • Significantly outperforms ECMP • Slightly worse than optimal (1%-5%) • Bin-packing and LP of comparable performance

  28. Performance Versus Predictability • Low predictability  performance is similar to ECMP

  29. Performance Versus Predictability • Low predictability  performance is similar to ECMP • High predictability  performance is comparable to Optimal • MicroTE adjusts according to predictability

  30. Scaling to Large DC • Reactiveness to traffic changes • Gathering network state • Bounded by network transfer time • Route computation • Flow installation • Bounded by switch installation time

  31. Other Related Works • SPAIN/MultiPath TCP • Use multiple paths  better DC utilization • Eliminating congestion isn’t goal • Helios/Flyways • Require augmenting existing DC • Hedera/Fat-Tree • Require forklift upgrade: full bisection

  32. Conclusion • Study existing TE • Found them lacking (15-40%) • Study data center traffic • Discovered traffic predictability (27% for 2 secs) • Developed guidelines for ideal TE • Designed and implemented MicroTE • Brings state of the art within 1-5% of Ideal • Efficiently scales to large DC (16K servers)

  33. Thank You • MicroTE adopts to low level properties of traffic. • Can we develop techniques that cooperate with applications?

More Related