
Practical TDMA for Datacenter Ethernet


Presentation Transcript


  1. Practical TDMA for Datacenter Ethernet Bhanu C. Vattikonda, George Porter, Amin Vahdat, Alex C. Snoeren

  2. A variety of applications are hosted in datacenters, with gather/scatter and all-to-all communication patterns • Performance depends on throughput-sensitive traffic in the shuffle phase • These applications also generate latency-sensitive traffic

  3. The network is treated as a black box • Applications like Hadoop MapReduce perform inefficiently • Applications like Memcached experience high latency Why does the lack of coordination hurt performance?

  4. Example datacenter scenario • Bulk transfers • Latency-sensitive traffic [figure: two bulk-transfer senders and one latency-sensitive sender sharing a single traffic receiver]

  5. Drops and queuing lead to poor performance • Bulk transfer traffic experiences packet drops • Latency-sensitive traffic gets queued in the buffers [figure: the same senders converging on the traffic receiver]

  6. Current solutions do not take a holistic approach • Facebook uses a custom UDP-based transport protocol • Alternative transport protocols like DCTCP address TCP shortcomings • InfiniBand and Myrinet offer boutique hardware solutions to these problems but are expensive Since the demand can be anticipated, can we coordinate hosts?

  7. Taking turns to transmit packets: Time Division Multiple Access [figure: the bulk-transfer and latency-sensitive senders take turns transmitting to the receiver]

  8. TDMA: An old technique

  9. Enforcing TDMA is difficult • It is not practical to task hosts with keeping track of time and controlling transmissions • End host clocks quickly go out of synchronization

  10. Existing TDMA solutions need special support • Since end host clocks cannot be synchronized, special support is needed from the network • FTT-Ethernet, RTL-TEP, and TT-Ethernet require modified switching hardware • Even with special support, the hosts need to run real-time operating systems to enforce TDMA (e.g., FTT-Ethernet, RTL-TEP) Can we do TDMA with commodity Ethernet?

  11. TDMA using Pause Frames • Flow control packets (pause frames) can be used to control Ethernet transmissions • Pause frames are processed in hardware, so flow control is handled very efficiently [experiment: blast UDP packets at a sender, send 802.3x pause frames, and measure the time the sender takes to react]

  12. TDMA using Pause Frames • Pause frames are processed in hardware, giving very efficient processing of flow control packets • Reaction time to pause frames is 2–6 μs, with low variance (measurement done using 802.3x pause frames)
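
The following is a minimal sketch of how such a reaction-time measurement could be set up, assuming a Linux host with raw-socket (AF_PACKET) access; the interface name and MAC address are placeholders, and this is not the authors' measurement tool:

```python
#!/usr/bin/env python3
# Hypothetical sketch: emit an IEEE 802.3x pause frame and timestamp it, so the
# gap until the blasted UDP stream stops arriving can be measured.
import socket
import struct
import time

IFACE = "eth0"                                  # placeholder interface name
SRC_MAC = bytes.fromhex("001122334455")         # placeholder source MAC
PAUSE_DST = bytes.fromhex("0180c2000001")       # reserved MAC-control multicast
ETH_P_MACCTL = 0x8808                           # MAC control EtherType
OPCODE_PAUSE = 0x0001                           # 802.3x PAUSE opcode
PAUSE_QUANTA = 0xFFFF                           # maximum pause time (512-bit times)

def build_pause_frame():
    header = PAUSE_DST + SRC_MAC + struct.pack("!H", ETH_P_MACCTL)
    body = struct.pack("!HH", OPCODE_PAUSE, PAUSE_QUANTA)
    # Pad to the 60-byte minimum frame size (the NIC appends the FCS).
    return header + body + b"\x00" * (60 - len(header) - len(body))

def main():
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    sock.bind((IFACE, 0))
    t_sent = time.monotonic()
    sock.send(build_pause_frame())
    # A packet capture of the UDP blast records when the sender's packets stop;
    # reaction time = (timestamp of last UDP packet) - t_sent.
    print(f"pause frame sent at t={t_sent:.9f} s (monotonic)")

if __name__ == "__main__":
    main()
```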

  13. TDMA using commodity hardware • TDMA is imposed over Ethernet using a centralized fabric manager, which repeatedly: collects demand information from the end hosts, computes the schedule for communication, and controls end host transmissions

  14. TDMA example • The fabric manager collects demand information from sender S: S -> D1: 1 MB, S -> D2: 1 MB • It computes the schedule for communication and controls end host transmissions • Schedule: round 1: S -> D1, round 2: S -> D2, round 3: S -> D1, round 4: S -> D2, … [figure: fabric manager coordinating S, D1, and D2]
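
A minimal sketch of the kind of round-robin schedule computation shown above; the demand representation and slot sizing are illustrative assumptions, not the paper's fabric manager implementation:

```python
# Hypothetical sketch: given the demand a sender reports (bytes queued per
# destination), assign one destination per TDMA round in round-robin order
# until the demand is drained.
SLOT_BYTES = 375_000  # illustrative: roughly 300 us worth of a 10 Gbps link

def round_robin_schedule(demand):
    """demand: dict mapping destination -> bytes to send.
    Returns a list of destinations, one per round."""
    remaining = dict(demand)
    schedule = []
    while any(v > 0 for v in remaining.values()):
        for dst in sorted(remaining):
            if remaining[dst] > 0:
                schedule.append(dst)
                remaining[dst] = max(0, remaining[dst] - SLOT_BYTES)
    return schedule

# The example from the slide: S has 1 MB queued for D1 and 1 MB for D2.
print(round_robin_schedule({"D1": 1_000_000, "D2": 1_000_000}))
# -> ['D1', 'D2', 'D1', 'D2', ...] alternating until both demands are drained
```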

  15. More than one host • Control packets should be processed with low variance • Control packets should arrive at the end hosts synchronously [figure: fabric manager sending round 1 and round 2 control packets to several hosts]

  16. Synchronized arrival of control packets • We cannot directly measure synchronous arrival • Instead, we measure the difference in arrival times of a pair of control packets at 24 hosts

  17. Synchronized arrival of control packets • The difference in arrival of a pair of control packets at 24 hosts varies by ~15 μs across different sending rates at the end hosts

  18. Ideal scenario: control packets arrive synchronously [figure: hosts A and B move from round 1 to round 2 to round 3 at exactly the same instants]

  19. Experiments show that control packets do not arrive synchronously [figure: hosts A and B start each round out of sync by <15 μs]

  20. Guard times to handle lack of synchronization • Guard times (15 μs) handle out-of-sync control packets [figure: a stop control packet ends each round on hosts A and B before the next round's control packet arrives]

  21. TDMA for Datacenter Ethernet • Use flow control packets to control end host transmissions with low variance • Guard times adjust for variance in control packet arrival

  22. Encoding scheduling information • We use IEEE 802.1Qbb priority flow control frames to encode scheduling information • Using iptables rules, traffic for different destinations can be classified into different Ethernet classes • 802.1Qbb priority flow control frames can then be used to selectively start transmission of packets to a destination
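
A minimal sketch of the 802.1Qbb priority flow control frame layout this relies on. Pausing every class except the one a destination's traffic was classified into selects which destination may transmit; the MAC address and the class-to-destination mapping are placeholder assumptions:

```python
# Hypothetical sketch: build an IEEE 802.1Qbb priority flow control (PFC) frame
# that pauses a chosen set of traffic classes and leaves the rest running.
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")   # reserved MAC-control multicast
ETH_P_MACCTL = 0x8808                       # MAC control EtherType
OPCODE_PFC = 0x0101                         # 802.1Qbb PFC opcode

def build_pfc_frame(src_mac, paused_classes, quanta=0xFFFF):
    """paused_classes: iterable of traffic-class indices (0..7) to pause."""
    enable_vector = 0
    times = [0] * 8
    for c in paused_classes:
        enable_vector |= 1 << c
        times[c] = quanta
    payload = struct.pack("!HH8H", OPCODE_PFC, enable_vector, *times)
    frame = PAUSE_DST + src_mac + struct.pack("!H", ETH_P_MACCTL) + payload
    return frame + b"\x00" * max(0, 60 - len(frame))  # pad to minimum size

# Pause classes 0-6 and leave class 7 (say, the class carrying traffic to D1)
# free to transmit:
frame = build_pfc_frame(bytes.fromhex("001122334455"), paused_classes=range(7))
```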

  23. Methodology to enforce TDMA slots • Pause all traffic • Un-pause traffic to a particular destination • Pause all traffic to begin the guard time
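
A minimal sketch of what a per-slot control loop could look like, reusing build_pfc_frame() from the previous sketch. The 300 μs slot and 15 μs guard time follow the setup described later; the raw socket, class mapping, and use of time.sleep() are simplifying assumptions (sleep is far too coarse for real microsecond-scale slots):

```python
# Hypothetical sketch of the fabric manager's per-slot loop: un-pause one
# destination's class, let the slot run, then pause everything for the guard time.
import time

SLOT_S = 300e-6     # TDMA slot length
GUARD_S = 15e-6     # guard time
ALL_CLASSES = range(8)
SRC_MAC = bytes.fromhex("001122334455")  # placeholder source MAC

def run_tdma(sock, schedule, class_of):
    """schedule: list of destinations, one per round.
    class_of: dict mapping destination -> traffic class (0..7)."""
    for dst in schedule:
        # 1. Un-pause only the class carrying traffic to this destination
        #    (equivalently: pause every other class).
        paused = [c for c in ALL_CLASSES if c != class_of[dst]]
        sock.send(build_pfc_frame(SRC_MAC, paused))
        time.sleep(SLOT_S)      # let the slot run
        # 2. Pause all traffic to begin the guard time.
        sock.send(build_pfc_frame(SRC_MAC, ALL_CLASSES))
        time.sleep(GUARD_S)
```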

  24. Evaluation • MapReduce shuffle phase: all-to-all transfer • Memcached-like workloads: latency between nodes in a mixed environment in the presence of background flows • Hybrid electrical and optical switch architectures: performance in dynamic network topologies

  25. Experimental setup • 24 servers (HP DL380) • Dual Myricom 10G NICs with kernel bypass to access packets • 1 Cisco Nexus 5000 series 10G 96-port switch, 1 Cisco Nexus 5000 series 10G 52-port switch • 300 μs TDMA slot and 15 μs guard time, for an effective overhead of 5% (15/300)

  26. All-to-all transfer in a multi-hop topology • 10 GB all-to-all transfer [figure: three groups of 8 hosts connected across multiple hops]

  27. All-to-all transfer in a multi-hop topology • 10 GB all-to-all transfer • We use a simple round-robin scheduler at each level • 5% inefficiency owing to the guard time [figure: TCP all-to-all vs. TDMA all-to-all completion; ideal transfer time: 1024 s]

  28. Latency in the presence of background flows • Start both bulk transfers • Measure latency between nodes using UDP [figure: two bulk-transfer senders and one latency-sensitive sender sharing a receiver]
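
A minimal sketch of the kind of UDP probe that could be used to sample latency between two nodes while the bulk transfers run; the peer address is a placeholder and the peer is assumed to echo each datagram back:

```python
# Hypothetical sketch: send small UDP datagrams to an echoing peer and record
# round-trip times in microseconds.
import socket
import time

PEER = ("10.0.0.2", 9000)  # placeholder address of the echoing peer

def probe_rtt(n=100):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    samples = []
    for seq in range(n):
        t0 = time.monotonic()
        sock.sendto(seq.to_bytes(4, "big"), PEER)
        try:
            sock.recvfrom(64)
        except socket.timeout:
            continue  # treat a timeout as a lost probe
        samples.append((time.monotonic() - t0) * 1e6)
    return samples

if __name__ == "__main__":
    rtts = probe_rtt()
    if rtts:
        print(f"median RTT: {sorted(rtts)[len(rtts) // 2]:.1f} us")
```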

  29. Latency in the presence of background flows • Latency between the nodes in the presence of TCP flows is high and variable • The TDMA system achieves lower latency [figure: latency for TCP, TDMA, and TDMA with kernel bypass]

  30. Adapting to dynamic network configurations [figure: hosts connected through both an optical circuit switch and an electrical packet switch]

  31. Adapting to dynamic network configurations • Link capacity between the hosts is varied between 10 Gbps and 1 Gbps every 10 ms [figure: sender and receiver; ideal performance]

  32. Adapting to dynamic network configurations • Link capacity between the hosts is varied between 10 Gbps and 1 Gbps every 10 ms [figure: sender and receiver; TCP performance]

  33. Adapting to dynamic network configurations • TDMA is better suited since it prevents packet losses [figure: performance over time; TCP performance shown for comparison]

  34. Conclusion • TDMA can be achieved using commodity hardware by leveraging existing Ethernet standards • TDMA can lead to performance gains in current networks: 15% shorter finish times for all-to-all transfers and 3x lower latency • TDMA is well positioned for emerging network architectures that use dynamic topologies: 2.5x throughput improvement in dynamic network settings

  35. Thank You
