Practical TDMA for Datacenter Ethernet. Bhanu C. Vattikonda, George Porter, Amin Vahdat, Alex C. Snoeren. Variety of applications hosted in datacenters. Gather/Scatter. All-to-all. Performance depends on throughput sensitive traffic in shuffle phase. Generate latency sensitive traffic.
Practical TDMA for Datacenter Ethernet • Bhanu C. Vattikonda, George Porter, Amin Vahdat, Alex C. Snoeren
Variety of applications hosted in datacenters • Gather/Scatter • All-to-all • Performance depends on throughput-sensitive traffic in shuffle phase • Generate latency-sensitive traffic
Network is treated as a black box • Applications like Hadoop MapReduce perform inefficiently • Applications like Memcached experience high latency • Why does the lack of coordination hurt performance?
Example datacenter scenario • Bulk transfer • Latency sensitive [Figure: a traffic receiver with one latency-sensitive flow and two bulk transfers]
Drops and queuing lead to poor performance • Bulk transfer traffic experiences packet drops • Latency-sensitive traffic gets queued in the buffers [Figure: a traffic receiver with one latency-sensitive flow and two bulk transfers]
Current solutions do not take a holistic approach • Facebook uses a custom UDP-based transport protocol • Alternative transport protocols like DCTCP address TCP shortcomings • InfiniBand and Myrinet offer boutique hardware solutions to these problems but are expensive • Since the demand can be anticipated, can we coordinate hosts?
Taking turns to transmit packets • Time Division Multiple Access [Figure: a receiver with two bulk transfers and one latency-sensitive flow taking turns]
Enforcing TDMA is difficult • It is not practical to task hosts with keeping track of time and controlling transmissions • End host clocks quickly go out of synchronization
Existing TDMA solutions need special support • Since end host clocks cannot be synchronized, special support is needed from the network (FTT-Ethernet, RTL-TEP, and TT-Ethernet require modified switching hardware) • Even with special support, the hosts need to run real-time operating systems to enforce TDMA (FTT-Ethernet, RTL-TEP) • Can we do TDMA with commodity Ethernet?
TDMA using Pause Frames • Flow control packets (pause frames) can be used to control Ethernet transmissions • Pause frames are processed in hardware • Very efficient processing of the flow control packets [Figure: a sender blasts UDP packets and receives 802.3x pause frames; the time taken by the sender to react to the pause frames is measured]
TDMA using Pause Frames • Pause frames processed in hardware • Very efficient processing of the flow control packets • Reaction time to pause frames is 2–6 μs, with low variance (measured using 802.3x pause frames)
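As an illustration of what such a flow control packet looks like on the wire, here is a minimal Python sketch that builds an 802.3x pause frame. The field layout (reserved multicast destination 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0001, a 16-bit pause time counted in quanta of 512 bit times) follows the standard; the function names and the source MAC are hypothetical, and this is not the authors' code.

```python
import struct

# Standard values for IEEE 802.3x MAC Control PAUSE frames.
PAUSE_DST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a 60-byte pause frame (without the trailing FCS).

    quanta is the requested pause time in units of 512 bit times;
    a value of 0 tells the peer to resume immediately.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros up to the minimum Ethernet frame size (60 bytes before FCS).
    return (header + payload).ljust(60, b"\x00")


def pause_duration_us(quanta: int, link_bps: float) -> float:
    """Convert pause quanta to microseconds for a given link speed."""
    return quanta * 512 / link_bps * 1e6
```

On a 10 Gbps link one quantum is 51.2 ns, so the largest pause value (0xFFFF) holds a sender off for roughly 3.4 ms unless a later frame resumes it sooner.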
TDMA using commodity hardware • TDMA imposed over Ethernet using a centralized fabric manager • Collect demand information from the end hosts • Compute the schedule for communication • Control end host transmissions
TDMA example • Collect demand information from the end hosts • Compute the schedule for communication • Control end host transmissions • Demand: S -> D1: 1MB, S -> D2: 1MB • Schedule: round1: S -> D1 • round2: S -> D2 • round3: S -> D1 • round4: S -> D2 • … [Figure: fabric manager, sender S, and destinations D1 and D2, with round1 and round2 control packets]
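The alternating schedule in this example can be reproduced with a simple round-robin pass over the outstanding demand. The sketch below is only an illustration under stated assumptions: one flow is served per round, the function name is hypothetical, and the per-round capacity of 375,000 bytes assumes a 10 Gbps link and a 300 μs slot. It is not necessarily the scheduler the fabric manager runs.

```python
from collections import deque


def round_robin_schedule(demand_bytes, bytes_per_slot):
    """Return the list of (src, dst) flows served in each round.

    demand_bytes maps (src, dst) -> bytes still to send.
    Assumption: exactly one flow transmits per round.
    """
    pending = deque(
        (flow, remaining) for flow, remaining in demand_bytes.items() if remaining > 0
    )
    schedule = []
    while pending:
        flow, remaining = pending.popleft()
        schedule.append(flow)            # this flow owns the current round
        remaining -= bytes_per_slot      # it can drain at most one slot's worth
        if remaining > 0:
            pending.append((flow, remaining))  # back of the queue for a later round
    return schedule


# Assumed slot capacity: 10 Gbps * 300 us / 8 bits per byte = 375,000 bytes.
demand = {("S", "D1"): 1_000_000, ("S", "D2"): 1_000_000}
for round_no, (src, dst) in enumerate(round_robin_schedule(demand, 375_000), start=1):
    print(f"round{round_no}: {src} -> {dst}")
```

With these assumed numbers each 1MB flow needs three slots, so the sender alternates between D1 and D2 for six rounds.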
More than one host • Control packets should be processed with low variance • Control packets should arrive at the end hosts synchronously [Figure: fabric manager sending round1 and round2 control packets to multiple hosts]
Synchronized arrival of control packets • We cannot directly measure the synchronous arrival • Instead, measure the difference in arrival of a pair of control packets at 24 hosts
Synchronized arrival of control packets • Difference in arrival of a pair of control packets at 24 hosts • Variation of ~15μs for different sending rates at end hosts
Ideal scenario: control packets arrive synchronously [Figure: timelines for Host A and Host B; the round2 and round3 control packets arrive at the same time, so Rounds 1, 2 and 3 line up on both hosts]
Experiments show that packets do not arrive synchronously • Out of sync by <15μs [Figure: timelines for Host A and Host B; the round2 control packet arrives at different times, so the rounds are offset]
Guard times to handle lack of synchronization • Guard times (15μs) handle out-of-sync control packets [Figure: timelines for Host A and Host B; a Stop control packet precedes each round2 control packet]
TDMA for Datacenter Ethernet • Control end host transmissions • Use flow control packets to achieve low variance • Guard times adjust for variance in control packet arrival
Encoding scheduling information • We use IEEE 802.1Qbb priority flow control frames to encode scheduling information • Using iptables rules, traffic for different destinations can be classified into different Ethernet classes • 802.1Qbb priority flow control frames can then be used to selectively start transmission of packets to a destination
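To make the encoding concrete, the following Python sketch builds an 802.1Qbb priority flow control (PFC) frame. The layout used here (MAC Control EtherType 0x8808, opcode 0x0101, a 16-bit class-enable vector, then eight 16-bit pause timers) is the standard one, and a timer of 0 for an enabled class resumes that class. The helper names are hypothetical, and the assumption that each destination has already been mapped to one of the eight classes is ours; the slides do not spell out the mapping.

```python
import struct

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, quanta_by_class: dict) -> bytes:
    """Return a 60-byte PFC frame (without FCS).

    quanta_by_class maps a priority class (0-7) to a pause time in quanta;
    classes not listed are left untouched by the receiver.
    """
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    enable_vector = 0
    timers = [0] * 8
    for cls, quanta in quanta_by_class.items():
        if not 0 <= cls <= 7:
            raise ValueError("priority class must be 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in 16 bits")
        enable_vector |= 1 << cls   # mark this class's timer as valid
        timers[cls] = quanta
    header = MAC_CONTROL_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    body = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    return (header + body).ljust(60, b"\x00")


def pause_all(src_mac: bytes) -> bytes:
    """Pause every class for the maximum time."""
    return build_pfc_frame(src_mac, {cls: 0xFFFF for cls in range(8)})


def unpause_class(src_mac: bytes, cls: int) -> bytes:
    """Resume one class (hypothetically, the class mapped to one destination)."""
    return build_pfc_frame(src_mac, {cls: 0})
```

Because only eight classes exist, a scheme like this can distinguish at most eight destinations per host at a time unless the classification rules are changed between rounds.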
Methodology to enforce TDMA slots • Pause all traffic • Un-pause traffic to a particular destination • Pause all traffic to begin the guard time
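The three steps can be laid out as a timeline of control actions. The sketch below is an illustration only: the function name and event tuples are hypothetical, and the default 300 μs slot and 15 μs guard time are the values quoted for the experimental setup.

```python
def control_timeline(destinations, slot_us=300, guard_us=15):
    """Return (time_us, action, destination) events for a sequence of rounds.

    Each round un-pauses one destination for slot_us, then pauses
    everything again for guard_us before the next round starts.
    """
    events = [(0, "pause_all", None)]             # step 1: start from a fully paused host
    t = 0
    for dst in destinations:
        events.append((t, "unpause", dst))        # step 2: open the slot for one destination
        t += slot_us
        events.append((t, "pause_all", None))     # step 3: pause all to begin the guard time
        t += guard_us
    return events


for event in control_timeline(["D1", "D2"]):
    print(event)
```

Each round therefore occupies slot plus guard time on the wire, which is where the guard-time overhead comes from.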
Evaluation • MapReduce shuffle phase: all-to-all transfer • Memcached-like workloads: latency between nodes in a mixed environment in the presence of background flows • Hybrid electrical and optical switch architectures: performance in dynamic network topologies
Experimental setup • 24 servers (HP DL380) • Dual Myricom 10G NICs with kernel bypass to access packets • 1 Cisco Nexus 5000 series 10G 96-port switch, 1 Cisco Nexus 5000 series 10G 52-port switch • 300μs TDMA slot and 15μs guard time • Effective 5% overhead
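The 5% figure can be checked with a line of arithmetic. The slides do not say whether the 300 μs slot includes the guard time, so the sketch below computes both readings; the function names are hypothetical.

```python
def overhead_relative_to_slot(slot_us: float, guard_us: float) -> float:
    """Guard time as a fraction of the transmit slot alone."""
    return guard_us / slot_us


def overhead_relative_to_round(slot_us: float, guard_us: float) -> float:
    """Guard time as a fraction of a full round (slot plus guard)."""
    return guard_us / (slot_us + guard_us)


print(f"{overhead_relative_to_slot(300, 15):.1%}")
print(f"{overhead_relative_to_round(300, 15):.1%}")
```

The first reading gives 15/300 = 5.0% and the second 15/315, about 4.8%, so either way the overhead rounds to the quoted 5%.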
All-to-all transfer in multi-hop topology • 10GB all-to-all transfer [Figure: multi-hop topology with three groups of 8 hosts]
All-to-all transfer in multi-hop topology • 10GB all-to-all transfer • We use a simple round-robin scheduler at each level • 5% inefficiency owing to guard time • Ideal transfer time: 1024s [Figure: transfer times for TCP all-to-all and TDMA all-to-all; topology with three groups of 8 hosts]
Latency in the presence of background flows • Start both bulk transfers • Measure latency between nodes using UDP [Figure: a receiver with two bulk transfers and one latency-sensitive flow]
Latency in the presence of background flows • Latency between the nodes in the presence of TCP flows is high and variable • TDMA system achieves lower latency [Figure: latency for TCP, TDMA, and TDMA with kernel bypass]
Adapting to dynamic network configurations [Figure: optical circuit switch and electrical packet switch]
Adapting to dynamic network configurations • Link capacity between the hosts is varied between 10Gbps and 1Gbps every 10ms [Figure: sender and receiver; plot of ideal performance]
Adapting to dynamic network configurations • Link capacity between the hosts is varied between 10Gbps and 1Gbps every 10ms [Figure: sender and receiver; plot of TCP performance]
Adapting to dynamic network configurations • TDMA is better suited since it prevents packet losses [Figure: plot of TCP performance]
Conclusion • TDMA can be achieved using commodity hardware: leverage existing Ethernet standards • TDMA can lead to performance gains in current networks: 15% shorter finish times for all-to-all transfers, 3x lower latency • TDMA is well positioned for emerging network architectures which use dynamic topologies: 2.5x throughput improvement in dynamic network settings