340 likes | 517 Views
Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang + *University of Wisconsin, Madison + Microsoft Research. MicroTE : Fine G rained Traffic Engineering for Data Centers. Why are Data Centers Important?. Host a variety of applications Web portals Web services
E N D
Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang+ *University of Wisconsin, Madison + Microsoft Research MicroTE: Fine Grained Traffic Engineering for Data Centers
Why are Data Centers Important? • Host a variety of applications • Web portals • Web services • Multimedia applications • Chat/IM • DC performance app performance • Congestion impacts app latency
Why are Data Centers Important? • Poor performance loss of revenue • Traffic engineering is crucial
Contributions • Study effectiveness of current approach • Developed guidelines for ideal approach • Designed and implemented MicroTE • Showed effectiveness in mitigating congestion
Road Map • Background • Traffic Engineering in data centers • Design goals for ideal TE • MicroTE • Evaluation • Conclusion
Options for TE in Data Centers? • Current supported techniques • Equal Cost MultiPath (ECMP) • Spanning Tree Protocol (STP) • Proposed • Fat-Tree, VL2 • Other existing WAN techniques • COPE,…, OSPF link tuning
How do we evaluate TE? • Simulator • Input: Traffic matrix, topology, traffic engineering • Output: Link utilization • Optimal TE • Route traffic using knowledge of future TM • Data center traces • Cloud data center (CLD) • Map-reduce app • ~1500 servers • University data center (UNV) • 3-Tier Web apps • ~500 servers
Draw Backs of Existing TE • STP does not use multiple path • 40% worst than optimal • ECMP does not adapt to burstiness • 15% worst than optimal
Draw Backs of Proposed TE • Fat-Tree • Rehash flows • Local opt. != global opt. • VL2 • ECMP on VLB architecture • Coarse grained flow assignment • VL2 & Fat-Tree do not adapt to burstiness
Draw Backs of Other Approaches • COPE ….OSPF link tuning • Requires hours or days of predictable traffic • Traffic unpredictable over 100 secs(Maltz. et al) Egress Ingress x
Design Requirements for TE • Calculate paths & reconfigure network • Use all network paths • Use global view • Avoid local optimals • Must react quickly • React to burstiness • How predictable is traffic? ….
Is Data Center Traffic Predictable? 27% • YES! 27% or more of traffic matrix is predictable • Manage predictable traffic more intelligently 99%
How Long is Traffic Predictable? • Different patterns of predictability • 1 second of historical data able to predict future 1.5 – 5.0 1.6 - 2.5
MicroTE: Architecture Monitoring Component Routing Component • Global view: • Created by network controller • React to predictable traffic: • Routing component tracks demand history • All N/W paths: • Routing component creates routes using all paths Network Controller
Architectural Questions • Efficiently gather network state? • Determine predictable traffic? • Generate and calculate new routes? • Install network state?
Architectural Questions • Efficiently gather network state? • Determine predictable traffic? • Generate and calculate new routes? • Install network state?
Monitoring Component • Efficiently gather TM • Only one server per ToR monitors traffic • Transfer changed portion of TM • Compress data • Tracking predictability • Calculate EWMA over TM (every second) • Empirically derived alpha of 0.2 • Use time-bins of 0.1 seconds
Routing Component New Global View Determine predictable ToRs Calculate network routes for predictable traffic Set ECMP for unpredictable traffic Install routes
Routing Predictable Traffic • LP formulation • Constraints • Flow conservation • Capacity constraint • Use K-equal length paths • Objective • Minimize link utilization • Bin-packing heuristic • Sort flows in decreasing order • Place on link with greatest capacity
Implementation • Changes to data center • Switch • Install OpenFlow firmware • End hosts • Add kernel module • New component • Network controller • C++ NOX modules
Evaluation: Motivating Questions • How does MicroTE Compare to Optimal? • How does MicroTE perform under varying levels of predictability? • How does MicroTE scale to large DCN? • What overheard does MicroTE impose?
Evaluation: Motivating Questions • How does MicroTE Compare to Optimal? • How does MicroTE perform under varying levels of predictability? • How does MicroTE scale to large DCN? • What overheard does MicroTE impose?
How do we evaluate TE? • Simulator • Input: Traffic matrix, topology, traffic engineering • Output: Link utilization • Optimal TE • Route traffic using knowledge of future TM • Data center traces • Cloud data center (CLD) • Map-reduce app • ~1500 servers • University data center (UNV) • 3-Tier Web apps • ~400 servers
Performing Under Realistic Traffic • Significantly outperforms ECMP • Slightly worse than optimal (1%-5%) • Bin-packing and LP of comparable performance
Performance Versus Predictability • Low predictability performance is similar to ECMP
Performance Versus Predictability • Low predictability performance is similar to ECMP • High predictability performance is comparable to Optimal • MicroTE adjusts according to predictability
Scaling to Large DC • Reactiveness to traffic changes • Gathering network state • Bounded by network transfer time • Route computation • Flow installation • Bounded by switch installation time
Other Related Works • SPAIN/MultiPath TCP • Use multiple paths better DC utilization • Eliminating congestion isn’t goal • Helios/Flyways • Require augmenting existing DC • Hedera/Fat-Tree • Require forklift upgrade: full bisection
Conclusion • Study existing TE • Found them lacking (15-40%) • Study data center traffic • Discovered traffic predictability (27% for 2 secs) • Developed guidelines for ideal TE • Designed and implemented MicroTE • Brings state of the art within 1-5% of Ideal • Efficiently scales to large DC (16K servers)
Thank You • MicroTE adopts to low level properties of traffic. • Can we develop techniques that cooperate with applications?