
Richard Hughes-Jones The University of Manchester hep.man.ac.uk/~rich/ then “Talks”

The Performance of High Throughput Data Flows for e-VLBI in Europe: Multiple vlbi_udp Flows, Constant Bit-Rate over TCP & Multi-Gigabit over GÉANT2. Richard Hughes-Jones, The University of Manchester, www.hep.man.ac.uk/~rich/ then “Talks”.


Presentation Transcript


  1. The Performance of High Throughput Data Flows for e-VLBI in Europe: Multiple vlbi_udp Flows, Constant Bit-Rate over TCP & Multi-Gigabit over GÉANT2. Richard Hughes-Jones, The University of Manchester, www.hep.man.ac.uk/~rich/ then “Talks”. TERENA Networking Conference, Lyngby, 21-24 May 2007, R. Hughes-Jones Manchester

  2. What is VLBI? • VLBI signal wave front • Data wave front sent over the network to the Correlator • Resolution depends on the baseline • Sensitivity: bandwidth B is as important as time τ, so we can use as many Gigabits as we can get!
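
The formulas behind slide 2 did not survive in the transcript. As a hedged reminder, the standard textbook relations are sketched below; the symbols λ (observing wavelength), D (baseline length) and ΔS (noise level) are introduced here for illustration and are not taken from the slides.

```latex
% Angular resolution improves with baseline length D
\theta \approx \frac{\lambda}{D}
% Noise falls as the square root of bandwidth times integration time
\Delta S \propto \frac{1}{\sqrt{B\,\tau}}
```

Because B and τ enter only as the product Bτ, doubling the recorded bandwidth buys the same noise reduction as doubling the observing time, which is why the network data rate matters as much as telescope time.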

  3. European e-VLBI Test Topology [diagram] • Sites: Metsähovi (Finland); Chalmers University of Technology, Gothenburg; Jodrell Bank (UK); Onsala (Sweden); Torun (Poland); Dwingeloo (Netherlands); Medicina (Italy) • Link labels: Gbit link; Gbit link; 2 × 1 Gbit links; dedicated DWDM link

  4. vlbi_udp: UDP on the WAN • iGrid2002 monolithic code • Convert to use pthreads: control, data input, data output • Work done on vlbi_recv: • Output thread polled for data in the ring buffer – burned CPU • Input thread signals output thread when there is work to do – else wait on semaphore – had packet loss at high rate, variable throughput • Output thread uses sched_yield() when no work to do • Multi-flow network performance – set up in Dec06 • 3 sites to JIVE: Manc UKLight; Manc production; Bologna GEANT PoP • Measure: throughput, packet loss, re-ordering, 1-way delay
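
The ring-buffer hand-off described on slide 4 can be illustrated with a minimal sketch. This is not the vlbi_udp code (which uses pthreads); it is a Python illustration of the semaphore-signalled variant, and the names `RingBuffer` and `run_demo` are hypothetical. The slide reports that this variant lost packets at high rate and that the final design had the output thread call sched_yield() instead of sleeping on a semaphore.

```python
import threading
import time
from collections import deque


class RingBuffer:
    """Bounded packet queue shared by one input and one output thread."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()
        self.lock = threading.Lock()
        self.ready = threading.Semaphore(0)  # counts packets waiting for output
        self.dropped = 0

    def put(self, packet):
        """Input side: store a packet, or drop it if the buffer is full."""
        with self.lock:
            if len(self.slots) >= self.capacity:
                self.dropped += 1
                return False
            self.slots.append(packet)
        self.ready.release()  # signal the output thread that there is work
        return True

    def get(self):
        """Output side: sleep on the semaphore until a packet is available."""
        self.ready.acquire()
        with self.lock:
            return self.slots.popleft()


def run_demo(n_packets=1000, capacity=2048):
    """Move n_packets sequence numbers through the ring with two threads."""
    ring = RingBuffer(capacity)
    received = []
    result = {"dropped": 0}
    stop = object()  # sentinel marking the end of the stream

    def input_thread():
        for seq in range(n_packets):
            ring.put(seq)  # real-time data: never block, drop if full
        result["dropped"] = ring.dropped  # data packets lost so far
        while not ring.put(stop):  # the sentinel itself must get through
            time.sleep(0.001)

    def output_thread():
        while True:
            packet = ring.get()
            if packet is stop:
                break
            received.append(packet)

    threads = [threading.Thread(target=output_thread),
               threading.Thread(target=input_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, result["dropped"]
```

With a buffer larger than the stream nothing is dropped; with a small buffer, every packet is either delivered in order or counted as dropped, which mirrors the loss the slide attributes to the output thread not waking up fast enough.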

  5. vlbi_udp: Some of the Problems • JIVE made Huygens, mark524 (.54) and mark620 (.59) available • Within minutes of Arpad leaving, the Alteon NIC of mark524 lost the data network! • OK used mark623 (.62) – faster CPU • Firewalls needed to allow vlbi_udp ports • Aarrgg (!!!) Huygens is SUSE Linux • Routing – well this ALWAYS needs to be fixed !!! • AMD Opteron did not like sched_getaffinity() sched_setaffinity() • Comment out this bit • udpmon flows Onsala to JIVE look good • udpmon flows JIVE mark623 to Onsala & Manc UKL don’t work • Firewall down: stops after 77 udpmon loops • Firewall up: udpmon can’t communicate with Onsala • CPU load issues on the MarkV systems • Don’t seem to be able to keep up with receiving UDP flow AND emptying the ring buffer • Torun PC / Link lost as the test started

  6. Multiple vlbi_udp Flows • Gig7 → Huygens, UKLight, 15 µs spacing • 816 Mbit/s, sigma < 1 Mbit/s, step 1 Mbit/s • Zero packet loss • Zero re-ordering • Gig8 → mark623, Academic Internet, 20 µs spacing • 612 Mbit/s • 0.6 falling to 0.05 % packet loss • 0.02 % re-ordering • Bologna → mark620, Academic Internet, 30 µs spacing • 396 Mbit/s • 0.02 % packet loss • 0 % re-ordering
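
The spacing and rate figures on slide 6 are tied together by simple arithmetic: rate × spacing gives the number of bits sent per packet slot. A small sketch, assuming only the numbers quoted on the slide (the function name `bytes_per_packet` is hypothetical, and the slide does not say whether the rates count user data or wire bytes):

```python
def bytes_per_packet(rate_mbit_s, spacing_us):
    """Bytes sent per packet slot: Mbit/s times microseconds gives bits."""
    return rate_mbit_s * spacing_us / 8.0


# (rate in Mbit/s, packet spacing in microseconds) as quoted on slide 6
flows = {
    "Gig7 to Huygens": (816, 15),
    "Gig8 to mark623": (612, 20),
    "Bologna to mark620": (396, 30),
}

for name, (rate, spacing) in flows.items():
    print(f"{name}: {bytes_per_packet(rate, spacing):.0f} bytes per slot")
```

All three flows come out near 1500 bytes per slot, so the different rates are consistent with the same packet size sent at different spacings.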

  7. The Impact of Multiple vlbi_udp Flows • Gig7 → Huygens, UKLight, 15 µs spacing, 800 Mbit/s • Gig8 → mark623, Academic Internet, 20 µs spacing, 600 Mbit/s • Bologna → mark620, Academic Internet, 30 µs spacing, 400 Mbit/s • Plots: SJ5 access link; SURFnet access link; GARR access link

  8. e-VLBI: Driven by Science • Microquasar GRS1915+105 (11 kpc) on 21 April 2006 at 5 GHz using 6 EVN telescopes, during a weak flare (11 mJy), just resolved in jet direction (PA 140 deg). (Rushton et al.) • 128 Mbit/s from each telescope • 4 TBytes raw samples data over 12 hours • 2.8 GBytes of correlated data • Microquasar Cygnus X-3 (10 kpc) on 20 April (a) and 18 May 2006 (b). The source was in a semi-quiescent state in (a) and in a flaring state in (b). The core of the source is probably ~20 mas to the N of knot A. (Tudose et al.)

  9. RR001: The First Rapid Response Experiment (Rushton, Spencer) The experiment was planned as follows: • Operate 6 EVN telescopes in real time on 29th Jan 2007 • Correlate and analyse results in double quick time • Select sources for follow up observations • Observe selected sources 1 Feb 2007 The experiment worked: we successfully observed and analysed 16 sources (weak microquasars), ready for the follow up run, but we found that none of the sources were suitably active at that time – a perverse universe!

  10. Constant Bit-Rate Data over TCP/IP

  11. CBR Test Setup

  12. Moving CBR over TCP • When there is packet loss TCP decreases the rate • TCP buffer 0.9 MB (BDP), RTT 15.2 ms • Effect of loss rate on message arrival time • TCP buffer 1.8 MB (BDP), RTT 27 ms • Timely arrival of data: can TCP deliver the data on time?
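
The buffer sizes on slide 12 are labelled as the bandwidth-delay product (BDP), i.e. rate × RTT. A short sketch of that arithmetic, assuming the 525 Mbit/s CBR data rate quoted on slides 14 and 15 applies here as well (the function name `bdp_bytes` is hypothetical):

```python
def bdp_bytes(rate_bit_s, rtt_s):
    """Bandwidth-delay product in bytes: data in flight during one RTT."""
    return rate_bit_s * rtt_s / 8.0


RATE = 525e6  # bit/s, CBR rate quoted on slides 14 and 15

for rtt_ms in (15.2, 27.0):
    mb = bdp_bytes(RATE, rtt_ms / 1000.0) / 1e6
    print(f"RTT {rtt_ms} ms: BDP about {mb:.2f} MB")
```

This gives roughly 1.0 MB at 15.2 ms and 1.8 MB at 27 ms, close to the 0.9 MB and 1.8 MB buffers on the slide; the small difference at 15.2 ms may simply reflect rounding or a different megabyte convention.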

  13. Resynchronisation [plot: arrival time against message number / time; annotations: packet loss, delay in stream, expected arrival time at CBR]

  14. CBR over TCP – Large TCP Buffer • Message size: 1448 Bytes • Data Rate: 525 Mbit/s • Route: Manchester - JIVE • RTT 15.2 ms • TCP buffer 160 MB • Drop 1 in 1.12 million packets • Throughput increases • Peak throughput ~ 734 Mbit/s • Min. throughput ~ 252 Mbit/s
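
To put the loss rate on slide 14 in context, the quoted message size and data rate give the time between drops. A minimal sketch using only the slide's figures, and assuming one 1448-byte message per packet (the function name `seconds_between_drops` is hypothetical):

```python
def seconds_between_drops(rate_bit_s, message_bytes, packets_per_drop):
    """Mean time between dropped packets for a constant bit-rate stream."""
    packets_per_second = rate_bit_s / (message_bytes * 8)
    return packets_per_drop / packets_per_second


interval = seconds_between_drops(525e6, 1448, 1.12e6)
rtts = interval / 15.2e-3  # number of round trips between drops
print(f"one drop every {interval:.1f} s, about {rtts:.0f} RTTs")
```

At 525 Mbit/s this is one drop roughly every 25 seconds, so each loss event on this route has well over a thousand round-trip times in which TCP can recover before the next one.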

  15. CBR over TCP – Message Delay • Message size: 1448 Bytes • Data Rate: 525 Mbit/s • Route: Manchester - JIVE • RTT 15.2 ms • TCP buffer 160 MB • Drop 1 in 1.12 million packets • OK you can recover BUT: • Peak Delay ~2.5 s • TCP buffer  RTT4

  16. Multi-gigabit tests over GÉANT • But will 10 Gigabit Ethernet work on a PC?

  17. High-end Server PCs for 10 Gigabit • Boston/Supermicro X7DBE • Two Dual Core Intel Xeon Woodcrest 5130 • 2 GHz • Independent 1.33 GHz FSBuses • 530 MHz FD Memory (serial) • Parallel access to 4 banks • Chipsets: Intel 5000P MCH – PCIe & Memory; ESB2 – PCI-X, GE etc. • PCI: • 3 8-lane PCIe buses • 3 * 133 MHz PCI-X • 2 Gigabit Ethernet • SATA

  18. 10 GigE Back2Back: UDP Latency • Motherboard: Supermicro X7DBE • Chipset: Intel 5000P MCH • CPU: 2 Dual Intel Xeon 5130 2 GHz with 4096k L2 cache • Mem bus: 2 independent 1.33 GHz • PCI-e 8 lane • Linux Kernel 2.6.20-web100_pktd-plus • Myricom NIC 10G-PCIE-8A-R Fibre • myri10ge v1.2.0 + firmware v1.4.10 • rx-usecs=0 Coalescence OFF • MSI=1 • Checksums ON • tx_boundary=4096 • MTU 9000 bytes • Latency 22 µs & very well behaved • Histogram FWHM ~1-2 µs • Latency slope 0.0028 µs/byte • B2B expect: 0.00268 µs/byte (Mem 0.0004 + PCI-e 0.00054 + 10GigE 0.0008 + PCI-e 0.00054 + Mem 0.0004)
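
The expected back-to-back slope on slide 18 is the sum of the per-byte costs of each stage a packet crosses: memory and PCI-e in the sender, the 10 GigE wire, then PCI-e and memory in the receiver. A small sketch that adds up the slide's figures and checks the wire term from first principles (variable names are hypothetical):

```python
# Per-byte transfer cost of each stage in microseconds, as listed on slide 18
stages_us_per_byte = [
    ("sender memory", 0.0004),
    ("sender PCI-e", 0.00054),
    ("10GigE wire", 0.0008),
    ("receiver PCI-e", 0.00054),
    ("receiver memory", 0.0004),
]

expected_slope = sum(cost for _, cost in stages_us_per_byte)

# The wire term follows directly from the line rate: 8 bits at 10 Gbit/s
wire_us_per_byte = 8 / 10e9 * 1e6

print(f"expected slope {expected_slope:.5f} us/byte")
print(f"wire term      {wire_us_per_byte:.4f} us/byte")
```

The sum reproduces the 0.00268 µs/byte on the slide, within about 5 % of the measured 0.0028 µs/byte.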

  19. 10 GigE Back2Back: UDP Throughput • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • rx-usecs=25 Coalescence ON • MTU 9000 bytes • Max throughput 9.4 Gbit/s • Notice rate for 8972 byte packet • ~0.002 % packet loss in 10M packets in receiving host • Sending host: 3 CPUs idle • For <8 µs packets, 1 CPU is >90 % in kernel mode inc ~10 % soft int • Receiving host: 3 CPUs idle • For <8 µs packets, 1 CPU is 70-80 % in kernel mode inc ~15 % soft int

  20. 10 GigE UDP Throughput vs packet size • Motherboard: Supermicro X7DBE • Linux Kernel 2.6.20-web100_pktd-plus • Myricom NIC 10G-PCIE-8A-R Fibre • myri10ge v1.2.0 + firmware v1.4.10 • rx-usecs=0 Coalescence ON • MSI=1 • Checksums ON • tx_boundary=4096 • Steps at 4060 and 8160 bytes, within 36 bytes of 2^n boundaries • Model data transfer time as t = C + m*Bytes • C includes the time to set up transfers • Fit reasonable: C = 1.67 µs, m = 5.4e-4 µs/byte • Steps consistent with C increasing by 0.6 µs • The Myricom driver segments the transfers, limiting the DMA to 4096 bytes – PCI-e chipset dependent!
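
The transfer-time model on slide 20 can be sketched as a small function. This is an illustration under stated assumptions, not the author's fit: the garbled exponent in the transcript is read as m = 5.4e-4 µs/byte (which matches the PCI-e slope of 0.00054 µs/byte on slide 18), and each step is assumed to add 0.6 µs to C for packets larger than the step size. The names `transfer_time_us` and `STEP_SIZES` are hypothetical.

```python
C_US = 1.67               # fixed set-up time per transfer, microseconds
M_US_PER_BYTE = 5.4e-4    # per-byte cost, assumed reading of the slide
STEP_US = 0.6             # extra set-up time added at each step
STEP_SIZES = (4060, 8160)  # packet sizes in bytes where steps were seen


def transfer_time_us(nbytes):
    """Model t = C + m*Bytes, with C growing by 0.6 us after each step."""
    steps = sum(1 for size in STEP_SIZES if nbytes > size)
    return C_US + STEP_US * steps + M_US_PER_BYTE * nbytes


for n in (1000, 4000, 4100, 8972):
    print(f"{n:5d} bytes: {transfer_time_us(n):.3f} us")
```

Under these assumptions a packet just above 4060 bytes costs about 0.6 µs more than one just below, which is the signature of the driver splitting the DMA into a second segment.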

  21. 10 GigE X7DBEX7DBE: TCP iperf Web100 plots of TCP parameters • No packet loss • MTU 9000 • TCP buffer 256k BDP=~330k • Cwnd • SlowStart then slow growth • Limited by sender ! • Duplicate ACKs • One event of 3 DupACKs • Packets Re-Transmitted • Iperf TCP throughput 7.77 Gbit/s TERENA Networking Conference, Lyngby, 21-24 May 2007, R. Hughes-Jones Manchester

  22. OK so it works !!!

  23. ESLEA-FABRIC: 4 Gbit flows over GÉANT2 • Set up 4 Gigabit Lightpath between GÉANT2 PoPs • Collaboration with DANTE • GÉANT2 Testbed London – Prague – London • PCs in the DANTE London PoP with 10 Gigabit NICs • VLBI Tests: • UDP Performance • Throughput, jitter, packet loss, 1-way delay, stability • Continuous (days) Data Flows – VLBI_UDP and udpmon • Multi-Gigabit TCP performance with current kernels • Multi-Gigabit CBR over TCP/IP • Experience for FPGA Ethernet packet systems • DANTE Interests: • Multi-Gigabit TCP performance • The effect of (Alcatel 1678 MCC 10GE port) buffer size on bursty TCP using BW limited Lightpaths

  24. The GÉANT2 Testbed • 10 Gigabit SDH backbone • Alcatel 1678 MCCs • GE and 10GE client interfaces • Node locations: London, Amsterdam, Paris, Prague, Frankfurt • Can do lightpath routing, so make paths of different RTT • Locate the PCs in London

  25. Provisioning the lightpath on ALCATEL MCCs • Some jiggery-pokery needed with the NMS to force a “looped back” lightpath London-Prague-London • Manual XCs (using element manager) possible but hard work • 196 needed + other operations! • Instead used RM to create two parallel VC-4-28v (single-ended) Ethernet private line (EPL) paths • Constrained to transit DE • Then manually joined paths in CZ • Only 28 manually created XCs required

  26. Provisioning the lightpath on ALCATEL MCCs • Paths come up • (Transient) alarms clear • Result: provisioned a path of 28 virtually concatenated VC-4s, UK-NL-DE-NL-UK • Optical path ~4150 km • With dispersion compensation ~4900 km • RTT 46.7 ms

  27. Photos at the PoP [photos: Test-bed SDH; Production SDH; 10 GE; Production Router; Optical Transport]

  28. 4 Gig Flows on GÉANT: UDP Throughput • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • rx-usecs=25 Coalescence ON • MTU 9000 bytes • Max throughput 4.199 Gbit/s • Sending host: 3 CPUs idle • For <8 µs packets, 1 CPU is >90 % in kernel mode inc ~10 % soft int • Receiving host: 3 CPUs idle • For <8 µs packets, 1 CPU is ~37 % in kernel mode inc ~9 % soft int

  29. 4 Gig Flows on GÉANT: 1-way delay • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • Coalescence OFF • 1-way delay stable at 23.435 µs • Peak separation 86 µs • ~40 µs extra delay • Lab tests: peak separation 86 µs, ~40 µs extra delay • Lightpath adds no unwanted effects

  30. 4 Gig Flows on GÉANT: Jitter hist • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • Coalescence OFF • Plots: packet separation 300 µs; packet separation 100 µs • Peak separation ~36 µs • Factor 100 smaller • Lab tests: lightpath adds no effects

  31. 4 Gig Flows on GÉANT: UDP Flow Stability • Kernel 2.6.20-web100_pktd-plus • Myricom 10G-PCIE-8A-R Fibre • Coalescence OFF • MTU 9000 bytes • Packet spacing 18 µs • Trials send 10 M packets • Ran for 26 hours • Throughput very stable: 3.9795 Gbit/s • Occasional trials have packet loss ~40 in 10M - investigating • Our thanks go to all our collaborators • DANTE really provided “Bandwidth on Demand” • A record 6 hours! including • Driving to the PoP • Installing the PCs • Provisioning the Light-path
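
The stability figures on slide 31 can be cross-checked with a few lines of arithmetic. The sketch below uses only the quoted numbers; the trial count assumes trials ran back to back with no gaps, which the slide does not state, and the variable names are hypothetical.

```python
SPACING_S = 18e-6           # packet spacing
PACKETS_PER_TRIAL = 10e6    # packets sent per trial
RATE_BIT_S = 3.9795e9       # measured throughput
RUN_HOURS = 26

trial_seconds = PACKETS_PER_TRIAL * SPACING_S
trials_in_run = RUN_HOURS * 3600 / trial_seconds  # assumes no gaps between trials
bytes_per_slot = RATE_BIT_S * SPACING_S / 8
loss_fraction = 40 / PACKETS_PER_TRIAL

print(f"trial length   {trial_seconds:.0f} s")
print(f"trials in run  {trials_in_run:.0f}")
print(f"bytes per slot {bytes_per_slot:.0f}")
print(f"loss fraction  {loss_fraction:.0e}")
```

Each trial lasts about three minutes, the run holds on the order of 500 trials, and the rate corresponds to roughly 8950 bytes per 18 µs slot, in line with packets that fill the 9000-byte MTU. A loss of ~40 packets in 10M is a fraction of about 4 in a million.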

  32. Any Questions?

  33. Introduction: What is EXPReS? • EXPReS = Express Production Real-time e-VLBI Service • Three year project, started March 2006, funded by the European Commission (DG-INFSO), Sixth Framework Programme, Contract #026642 • Objective: to create a distributed, large-scale astronomical instrument of continental and inter-continental dimensions • Means: high-speed communication networks operating in real-time and connecting some of the largest and most sensitive radio telescopes on the planet • Additional Information: http://expres-eu.org/ [note: only one “s”] http://www.jive.nl

  34. Introduction: EXPReS Partners Radio Astronomy Institutes: • Joint Institute for VLBI in Europe (Coordinator), The Netherlands • Arecibo Observatory, National Astronomy and Ionosphere Center, Cornell University, USA • Australia Telescope National Facility, a Division of CSIRO, Australia • Institute of Radioastronomy, National Institute for Astrophysics (INAF), Italy • Jodrell Bank Observatory, University of Manchester, United Kingdom • Max Planck Institute for Radio Astronomy (MPIfR), Germany • Metsähovi Radio Observatory, Helsinki University of Technology (TKK), Finland • National Center of Geographical Information, National Geographic Institute (CNIG-IGN), Spain • Hartebeesthoek Radio Astronomy Observatory, National Research Foundation, South Africa • Netherlands Foundation for Research in Astronomy (ASTRON), NWO, The Netherlands • Onsala Space Observatory, Chalmers University of Technology, Sweden • Shanghai Astronomical Observatory, Chinese Academy of Sciences, China • Torun Centre for Astronomy, Nicolaus Copernicus University, Poland • Transportable Integrated Geodetic Observatory (TIGO), University of Concepción, Chile • Ventspils International Radio Astronomy Center, Ventspils University College, Latvia National Research Networks: • AARNet, Australia • DANTE, United Kingdom • Poznan Supercomputing and Networking Center, Poland • SURFnet, The Netherlands

  35. Introduction: Participating EXPReS Telescopes

  36. Provisioning the lightpath on ALCATEL MCCs • Create a virtual network element to a planned port (non-existing) in Prague: VNE2 • Define end points • Out: port 3 in UK & VNE2 CZ • In: port 4 in UK & VNE2 CZ • Add constraint: to go via DE • Or does OSPF • Set capacity (28 VC-4s) • Alcatel Resource Manager allocates routing of EXPReS_out VC-4 trails • Repeat for EXPReS_ret • Same time slots used in CZ for EXPReS_out & EXPReS_ret paths
