Time measurement of network data transfer R. Fantechi, G. Lamanna 25/5/2011
Outline • Motivations • Hardware setup • Software tools • Measurements and their (possible) interpretation • Prospects
Motivations • Network transfers to L1 and L2 need low latency • For both TEL62-PC and PC-PC transfers, do we know how much it is? • For which network protocol is it best? • How does it depend on the computer HW? • How does it depend on the network interface? • How large are the latency fluctuations? GPUs are sensitive… • The knowledge of fluctuations is important to stay within the 1 ms budget • Standard software monitoring tools give only averages • Try to use hardware signals, generated at strategic points inside the software • Correlate signals from a sender with those from a receiver
Hardware setup • Two PCs with GbE interfaces • A is a Pentium 4 2.4 GHz, called PCATE • B is a 2×4-core Xeon, called PCGPU • Direct Ethernet connection on a hidden network • Each PC is equipped with a parallel-port interface, used to generate timing pulses • LeCroy scope for time measurements, histograms, and storage of screenshots [Photos: PCATE, PCGPU, adapter for the parallel port]
Software tools • Investigate three “protocols” • Raw Ethernet packets (socket PF_PACKET, SOCK_RAW) • IP packets (socket PF_INET, SOCK_RAW) • TCP packets (socket PF_INET, SOCK_STREAM) • Three pairs of simple senders/receivers • The sender • Gets from the command line the packet size, number of packets, delay between packets, and downscaling factor (see later) • Initializes the socket and enters a tight loop, with a delay inside • Inside the loop, before and after the send call, writes a pulse on the parallel port • The receiver • After initialization, enters a receive loop and writes a pulse on the parallel port after each received packet
Code example

Sender (sends a pulse around each sendto; the final empty loop is the inter-packet delay):

```c
/* Create raw socket */
sock = socket(AF_INET, SOCK_RAW, PROBE_PROT);
if (sock < 0) { perror("opening raw socket"); exit(1); }
…
if (iloop < 0) iloop = 1000000000;
for (i = 0; i < iloop; i++) {
    if (i % 50 == 0) {                  /* downscaling: mark one packet in 50 */
        buf[0] = 0x01;
        out = 0x01; outb(out, 0x378);   /* send a pulse */
        out = 0x00; outb(out, 0x378);
    } else buf[0] = 0x00;
    if (sendto(sock, buf, buflen, 0, (struct sockaddr *)&server,
               sizeof(struct sockaddr_in)) < 0)
        perror("writing on stream socket");
    out = 0x02; outb(out, 0x378);       /* pulse after sendto */
    out = 0x00; outb(out, 0x378);
    for (k = 0; k < conv_time; k++);    /* delay loop */
}
```

Receiver (pulses the parallel port after each complete packet):

```c
/* Create socket */
sock = socket(AF_INET, SOCK_RAW, PROBE_PROT);
if (sock < 0) { perror("opening stream socket"); exit(1); }
…
serv_size = sizeof(server);
do {
    if ((rval = recvfrom(sock, buf, BUFFER_SIZE, 0,
                         (struct sockaddr *)&server, &serv_size)) < 0)
        perror("reading stream message");
    if (rval == 0) printf("Ending connection\n");
    else {
        if (rval == BUFFER_SIZE) {
            outb(0x01, 0x378);          /* pulse on receive */
            outb(0x00, 0x378);
        }
        printf("-->%d\n", rval);
    }
} while (rval != 0);
```
Software tools • Maximum rate • On the sender, some time is spent in code execution • The minimum achievable repetition period between packets varies from ~6 µs to ~10 µs, depending on machine speed, type of protocol, etc. • Downscaling factor • Needed to operate the scope properly at high rates • If the loop index modulo the downscaling factor is 0, the packet carries the pattern to be written by the receiver on the parallel port, otherwise 0 • Packets are sent at the specified rate, but the scope registers only a fraction of them • Additional tools used • Wireshark and tcpdump to check packet arrival • ifconfig and /proc/interrupts to count packet and interrupt losses
Basic method check • Are these pulses reliable? • A simple check: histogram the width of the pulse generated by the sender • Pulse width: ~1.22 µs, sdev 0.04 µs; watch out for the maximum
Parameters used in the tests • Packet size • Small packets (200 bytes) or large packets (1300 bytes) • Protocols • The 3 mentioned before • Delay between packets • Usually from 10 ms down to the minimum • Typical sequence: 10, 5, 2, 1 ms, then 100, 50, 20, 10 µs • Measurements • Store interesting screenshots • Record time difference, sigma, max value • Time difference = time of rx pulse − time of tx pulse
Lost packets and interrupts • No lost packets observed at any rate • Checked with ifconfig at source and destination • Interrupt behaviour via /proc/interrupts • At high rates the number of interrupts decreases • Well-known phenomenon of “interrupt coalescence” in the driver • Packets received too fast are buffered and the CPU is interrupted only once • For TCP at high rates and 200-byte buffers, interrupts are reduced also because TCP packs many buffers into one Ethernet packet • Anyway, measuring TCP performance is harder, as the protocol is free to segment user buffers as it likes (i.e. flow control)
Interrupt coalescence [Scope screenshots: two examples, at 15 µs (left) and 12 µs (right) inter-packet delay; 1300 bytes, PCATE->PCGPU]
CPU usage [Screenshots: CPU usage on sender and receiver]
Time across sendto [Scope screenshot: time difference between the pulse after sendto and the one before; both pulses are on the same machine]
Time across sendto - Fluctuations • Count how many times the time is over 20 µs (with respect to all times) • Raw: ~5/26000 • IP: ~13/26000 • TCP: min ~8/20000 (1 ms delay), max ~402/20000 (100 µs delay) at 1300 bytes; 18/26000 at 200 bytes • On PCATE as sender [Screenshots: a quiet example, and the effect of moving the mouse… only 15 entries > 4500]
Transfer time [Plots: transfer time as a function of time, for different buffer sizes; critical zone marked]
Transfer time [Plots: transfer time as a function of packet size, for different inter-packet delays, PCATE->PCGPU]
Transfer time [Scope screenshots: PCATE->PCGPU, raw, 1300 bytes, at delays of 5 ms, 2 ms, 1 ms, 200 µs, 100 µs, 500 µs]
Transfer time [Scope screenshots: PCGPU->PCATE at 5 ms delay, ~8 µs; 200 bytes and 1300 bytes]
Transfer time trending [Plots: PCGPU->PCATE, raw: 200 bytes at 50 µs; 1000 bytes at 50 µs; 1300 bytes at 40 µs; 1300 bytes at 20 µs; 200 bytes at 20 µs; 1000 bytes at 20 µs]
Summary • Hardware timing system • Reliable, not interfering with the measurement (at the level of max 10 µs) • Time spent in the sender • A fraction (<10%) of the total transfer time • Varies with the protocol type • Stable with the packet rate • Transfer time • Down to 50 µs delay it varies little as a function of packet rate • Between 50 and 120 µs • Below 20 µs delay it increases (up to 2 ms) for raw, but not for IP • This setup does not work below ~10 µs, where we are most interested
To be done • Complete the measurements • Both directions • All protocols (TCP, maybe new ones) • Performance as a function of CPU power • Use different PCs • Add load on the machines • Test multiple interfaces and switches • Change the sender to an object driven by an FPGA • TEL62 or TALK • Investigate different protocol features • New protocols or switch features of the old ones • Test more complex transfer software (e.g. TDBIO) • Some work hopefully to be done by USA summer students…