The Performance of High Throughput Data Flows for e-VLBI in Europe: Multiple vlbi_udp Flows, Constant Bit-Rate over TCP & Multi-Gigabit over GÉANT2
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
TERENA Networking Conference, Lyngby, 21-24 May 2007, R. Hughes-Jones Manchester
What is VLBI?
• VLBI signal wave front
• Data wave front sent over the network to the Correlator
• Resolution
• Baseline
• Sensitivity: bandwidth B is as important as time τ. Can use as many Gigabits as we can get!
European e-VLBI Test Topology
• Sites: Metsähovi (Finland); Chalmers University of Technology, Gothenburg; Jodrell Bank (UK); Onsala (Sweden); Torun (Poland); Dwingeloo (Netherlands); Medicina (Italy)
• Link labels on the map: Gbit link; Gbit link; 2* 1 Gbit links; Dedicated DWDM link
vlbi_udp: UDP on the WAN
• iGrid2002 monolithic code
• Convert to use pthreads: control, data input, data output
• Work done on vlbi_recv:
• Output thread polled for data in the ring buffer – burned CPU
• Input thread signals output thread when there is work to do, else wait on semaphore – had packet loss at high rate, variable throughput
• Output thread uses sched_yield() when no work to do
• Multi-flow network performance – set up in Dec06
• 3 sites to JIVE: Manc UKLight; Manc production; Bologna GEANT PoP
• Measure: throughput, packet loss, re-ordering, 1-way delay
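The input/output thread hand-off described above can be sketched as follows. This is a minimal Python illustration of the semaphore variant only, not the vlbi_udp C/pthreads code; the names `RingBuffer`, `run_demo` and `STOP` are hypothetical, and a full buffer is simply counted as a dropped packet.

```python
import threading
from collections import deque

STOP = object()  # hypothetical end-of-data marker for the demo


class RingBuffer:
    """Bounded buffer: the input side posts a semaphore for every item stored."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()
        self.lock = threading.Lock()
        self.items = threading.Semaphore(0)  # counts filled slots
        self.dropped = 0

    def put(self, packet):
        with self.lock:
            if len(self.slots) >= self.capacity:
                self.dropped += 1            # buffer full: treat as packet loss
                return False
            self.slots.append(packet)
        self.items.release()                 # signal: there is work to do
        return True

    def get(self, timeout=1.0):
        # Wait on the semaphore instead of polling the buffer (no busy loop).
        if not self.items.acquire(timeout=timeout):
            return None
        with self.lock:
            return self.slots.popleft()


def run_demo(n_packets=1000, capacity=2048):
    ring = RingBuffer(capacity)
    received = []

    def output_thread():
        while True:
            item = ring.get()
            if item is STOP:
                break
            if item is not None:             # None means the wait timed out
                received.append(item)

    consumer = threading.Thread(target=output_thread)
    consumer.start()
    for seq in range(n_packets):             # input thread role: store packets
        ring.put(seq)
    ring.put(STOP)
    consumer.join()
    return received, ring.dropped
```

With the default capacity larger than the number of packets, nothing is dropped; shrinking `capacity` shows how a slow output thread turns into loss at the buffer.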
vlbi_udp: Some of the Problems
• JIVE made Huygens, mark524 (.54) and mark620 (.59) available
• Within minutes of Arpad leaving, the Alteon NIC of mark524 lost the data network!
• OK, used mark623 (.62) – faster CPU
• Firewalls needed to allow vlbi_udp ports
• Aarrgg (!!!) Huygens is SUSE Linux
• Routing – well this ALWAYS needs to be fixed !!!
• AMD Opteron did not like sched_getaffinity() / sched_setaffinity(): comment out this bit
• udpmon flows Onsala to JIVE look good
• udpmon flows JIVE mark623 to Onsala & Manc UKL don’t work
• Firewall down: stops after 77 udpmon loops
• Firewall up: udpmon can’t communicate with Onsala
• CPU load issues on the MarkV systems: don’t seem to be able to keep up with receiving UDP flow AND emptying the ring buffer
• Torun PC / link lost as the test started
Multiple vlbi_udp Flows
• Gig7 Huygens UKLight, 15 us spacing
• 816 Mbit/s, sigma <1 Mbit/s, step 1 Mbit/s
• Zero packet loss
• Zero re-ordering
• Gig8 mark623 Academic Internet, 20 us spacing
• 612 Mbit/s
• 0.6 falling to 0.05% packet loss
• 0.02% re-ordering
• Bologna mark620 Academic Internet, 30 us spacing
• 396 Mbit/s
• 0.02% packet loss
• 0% re-ordering
The Impact of Multiple vlbi_udp Flows
• Gig7 Huygens UKLight, 15 us spacing, 800 Mbit/s
• Gig8 mark623 Academic Internet, 20 us spacing, 600 Mbit/s
• Bologna mark620 Academic Internet, 30 us spacing, 400 Mbit/s
• Plots: SJ5 access link; SURFnet access link; GARR access link
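The packet spacings and rates quoted for the three flows are tied together by simple arithmetic. The sketch below works out the bytes per packet slot implied by each rate and spacing; the slides do not state the packet size or which headers the rates include, so the result is only a consistency check, and `bytes_per_packet` is a hypothetical helper.

```python
def bytes_per_packet(rate_mbit_s, spacing_us):
    """Mbit/s times microseconds gives bits per packet slot; divide by 8 for bytes."""
    return rate_mbit_s * spacing_us / 8.0


# (flow, spacing in us, nominal Mbit/s, measured Mbit/s) as quoted on the slides
FLOWS = [
    ("Gig7 Huygens UKLight", 15, 800, 816),
    ("Gig8 mark623 Academic Internet", 20, 600, 612),
    ("Bologna mark620 Academic Internet", 30, 400, 396),
]

for name, spacing, nominal, measured in FLOWS:
    print(name,
          bytes_per_packet(nominal, spacing),
          bytes_per_packet(measured, spacing))
```

All three nominal rates correspond to the same number of bytes per slot, which suggests a single packet size was used and only the spacing was varied.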
e-VLBI: Driven by Science
Microquasar GRS1915+105 (11 kpc) on 21 April 2006 at 5 GHz using 6 EVN telescopes, during a weak flare (11 mJy), just resolved in jet direction (PA 140 deg). (Rushton et al.)
• 128 Mbit/s from each telescope
• 4 TBytes raw samples data over 12 hours
• 2.8 GBytes of correlated data
Microquasar Cygnus X-3 (10 kpc) on 20 April (a) and 18 May 2006 (b). The source was in a semi-quiescent state in (a) and in a flaring state in (b). The core of the source is probably ~20 mas to the N of knot A. (Tudose et al.)
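The raw data volume quoted for the GRS1915+105 observation follows from the per-telescope rate. A short check, assuming all 6 telescopes sent 128 Mbit/s continuously for the full 12 hours (the helper name is hypothetical):

```python
def raw_volume_tbytes(rate_mbit_s, n_telescopes, hours):
    """Total raw sample volume in TBytes (1 TByte taken as 1e12 bytes)."""
    bytes_per_second = rate_mbit_s * 1e6 / 8 * n_telescopes
    return bytes_per_second * hours * 3600 / 1e12


print(raw_volume_tbytes(128, 6, 12))  # compare with the ~4 TBytes on the slide
```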
RR001: The First Rapid Response Experiment (Rushton Spencer)
The experiment was planned as follows:
• Operate 6 EVN telescopes in real time on 29th Jan 2007
• Correlate and analyse results in double quick time
• Select sources for follow up observations
• Observe selected sources 1 Feb 2007
The experiment worked: we successfully observed and analysed 16 sources (weak microquasars), ready for the follow up run, but we found that none of the sources were suitably active at that time – a perverse universe!
Constant Bit-Rate Data over TCP/IP
CBR Test Setup
Moving CBR over TCP
• When there is packet loss TCP decreases the rate.
• Effect of loss rate on message arrival time: TCP buffer 0.9 MB (BDP), RTT 15.2 ms; TCP buffer 1.8 MB (BDP), RTT 27 ms
• Timely arrival of data: can TCP deliver the data on time?
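The TCP buffer sizes above are labelled as the bandwidth-delay product (BDP), that is rate times round-trip time. The sketch below shows the relation in both directions; the slide does not state the rate used for these two buffers, so the implied rates are only what the quoted figures work out to, and both helper names are hypothetical.

```python
def bdp_bytes(rate_mbit_s, rtt_ms):
    """Bandwidth-delay product in bytes: bytes in flight needed to fill the path."""
    return rate_mbit_s * 1e6 / 8 * rtt_ms * 1e-3


def implied_rate_mbit_s(buffer_mbytes, rtt_ms):
    """Rate in Mbit/s that a buffer of this size can sustain over the given RTT."""
    return buffer_mbytes * 8 / (rtt_ms * 1e-3)


print(implied_rate_mbit_s(0.9, 15.2))  # buffer 0.9 MB, RTT 15.2 ms
print(implied_rate_mbit_s(1.8, 27.0))  # buffer 1.8 MB, RTT 27 ms
```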
Resynchronisation
• Plot: arrival time against message number / time
• Labels: expected arrival time at CBR; packet loss; delay in stream
CBR over TCP – Large TCP Buffer
• Message size: 1448 Bytes
• Data Rate: 525 Mbit/s
• Route: Manchester - JIVE
• RTT 15.2 ms
• TCP buffer 160 MB
• Drop 1 in 1.12 million packets
• Throughput increases
• Peak throughput ~734 Mbit/s
• Min. throughput ~252 Mbit/s
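The drop rate above can be turned into an average time between losses. A minimal sketch, assuming each 1448-byte message travels in one packet and ignoring header overhead (the helper name is hypothetical):

```python
def seconds_between_drops(rate_mbit_s, message_bytes, packets_per_drop):
    """Average interval between dropped packets at a constant bit rate."""
    packets_per_second = rate_mbit_s * 1e6 / (message_bytes * 8)
    return packets_per_drop / packets_per_second


# 525 Mbit/s, 1448-byte messages, 1 drop in 1.12 million packets
print(seconds_between_drops(525, 1448, 1.12e6))
```

Under these assumptions a loss occurs roughly every 25 seconds, so each throughput dip and recovery in the test has tens of seconds to play out before the next drop.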
CBR over TCP – Message Delay
• Message size: 1448 Bytes
• Data Rate: 525 Mbit/s
• Route: Manchester - JIVE
• RTT 15.2 ms
• TCP buffer 160 MB
• Drop 1 in 1.12 million packets
• OK you can recover BUT:
• Peak Delay ~2.5 s
• TCP buffer RTT4
Multi-gigabit tests over GÉANT
But will 10 Gigabit Ethernet work on a PC?
High-end Server PCs for 10 Gigabit
• Boston/Supermicro X7DBE
• Two Dual Core Intel Xeon Woodcrest 5130, 2 GHz
• Independent 1.33 GHz FSBuses
• 530 MHz FD Memory (serial)
• Parallel access to 4 banks
• Chipsets: Intel 5000P MCH – PCIe & Memory; ESB2 – PCI-X, GE etc.
• PCI: 3 8-lane PCIe buses; 3* 133 MHz PCI-X
• 2 Gigabit Ethernet
• SATA
10 GigE Back2Back: UDP Latency
• Motherboard: Supermicro X7DBE
• Chipset: Intel 5000P MCH
• CPU: 2 Dual Intel Xeon 5130 2 GHz with 4096k L2 cache
• Mem bus: 2 independent 1.33 GHz
• PCI-e 8 lane
• Linux Kernel 2.6.20-web100_pktd-plus
• Myricom NIC 10G-PCIE-8A-R Fibre
• myri10ge v1.2.0 + firmware v1.4.10
• rx-usecs=0 Coalescence OFF
• MSI=1
• Checksums ON
• tx_boundary=4096
• MTU 9000 bytes
• Latency 22 µs & very well behaved
• Histogram FWHM ~1-2 us
• Latency slope 0.0028 µs/byte
• B2B expect: 0.00268 µs/byte (Mem 0.0004 + PCI-e 0.00054 + 10GigE 0.0008 + PCI-e 0.00054 + Mem 0.0004)
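The expected back-to-back slope is the sum of the per-byte times of each stage a packet crosses. The sketch below adds up the components quoted on the slide and checks the 10GigE term from the line rate; the variable and function names are hypothetical.

```python
# Per-byte transfer times in µs/byte, in path order, as quoted on the slide
STAGES_US_PER_BYTE = [
    ("Mem", 0.0004),      # sender memory
    ("PCI-e", 0.00054),   # sender PCI-e
    ("10GigE", 0.0008),   # wire
    ("PCI-e", 0.00054),   # receiver PCI-e
    ("Mem", 0.0004),      # receiver memory
]


def expected_slope_us_per_byte(stages=STAGES_US_PER_BYTE):
    """Sum of per-byte times over every stage of the path."""
    return sum(cost for _, cost in stages)


def wire_time_us_per_byte(line_rate_gbit_s):
    """Time to serialise one byte (8 bits) at the given line rate."""
    return 8 / (line_rate_gbit_s * 1e3)


print(expected_slope_us_per_byte())   # compare with the measured 0.0028 µs/byte
print(wire_time_us_per_byte(10))      # the 10GigE term
```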
10 GigE Back2Back: UDP Throughput
• Kernel 2.6.20-web100_pktd-plus
• Myricom 10G-PCIE-8A-R Fibre
• rx-usecs=25 Coalescence ON
• MTU 9000 bytes
• Max throughput 9.4 Gbit/s
• Notice rate for 8972 byte packet
• ~0.002% packet loss in 10M packets in receiving host
• Sending host: 3 CPUs idle
• For <8 µs packets, 1 CPU is >90% in kernel mode inc ~10% soft int
• Receiving host: 3 CPUs idle
• For <8 µs packets, 1 CPU is 70-80% in kernel mode inc ~15% soft int
10 GigE UDP Throughput vs packet size
• Motherboard: Supermicro X7DBE
• Linux Kernel 2.6.20-web100_pktd-plus
• Myricom NIC 10G-PCIE-8A-R Fibre
• myri10ge v1.2.0 + firmware v1.4.10
• rx-usecs=0 Coalescence ON
• MSI=1
• Checksums ON
• tx_boundary=4096
• Steps at 4060 and 8160 bytes, within 36 bytes of 2^n boundaries
• Model data transfer time as t = C + m*Bytes
• C includes the time to set up transfers
• Fit reasonable: C = 1.67 µs, m = 5.4e-4 µs/byte
• Steps consistent with C increasing by 0.6 µs
• The Myricom driver segments the transfers, limiting the DMA to 4096 bytes – PCI-e chipset dependent!
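The transfer-time model above can be written out as a small function. This sketch reads the fitted slope as 5.4e-4 µs/byte (an assumption; it matches the PCI-e figure of 0.00054 µs/byte quoted for the latency test) and adds 0.6 µs to C for each step passed; the function name and the way the steps are applied are illustrative only.

```python
STEP_SIZES_BYTES = (4060, 8160)  # packet sizes where the slide reports steps


def transfer_time_us(nbytes, c_us=1.67, m_us_per_byte=5.4e-4, step_us=0.6):
    """t = C + m*Bytes, with C raised by step_us for every step size exceeded."""
    steps_passed = sum(1 for size in STEP_SIZES_BYTES if nbytes > size)
    return c_us + step_us * steps_passed + m_us_per_byte * nbytes


for size in (1000, 4000, 5000, 8972):
    print(size, transfer_time_us(size))
```

The extra set-up cost per step is what one would expect if each additional 4096-byte DMA segment adds a fixed overhead, which is the explanation the slide gives.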
10 GigE X7DBE to X7DBE: TCP iperf
• Web100 plots of TCP parameters
• No packet loss
• MTU 9000
• TCP buffer 256k, BDP ~330k
• Cwnd: SlowStart then slow growth
• Limited by sender!
• Duplicate ACKs: one event of 3 DupACKs
• Packets re-transmitted
• Iperf TCP throughput 7.77 Gbit/s
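A TCP buffer smaller than the BDP caps throughput at roughly buffer/BDP of the line rate. The sketch below applies that simple window model to the figures on the slide, assuming the quoted BDP refers to the 10 Gbit/s line rate and that both sizes are in the same units; the helper name is hypothetical.

```python
def window_limited_gbit_s(buffer_size, bdp, line_rate_gbit_s=10.0):
    """Throughput when at most buffer_size bytes can be in flight per RTT."""
    return min(1.0, buffer_size / bdp) * line_rate_gbit_s


print(window_limited_gbit_s(256, 330))  # compare with iperf's 7.77 Gbit/s
```

The estimate lands close to the measured iperf figure, which is consistent with the flow being limited by the sender's window rather than by loss.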
OK so it works !!!
ESLEA-FABRIC: 4 Gbit flows over GÉANT2
• Set up 4 Gigabit Lightpath between GÉANT2 PoPs
• Collaboration with DANTE
• GÉANT2 Testbed London – Prague – London
• PCs in the DANTE London PoP with 10 Gigabit NICs
• VLBI Tests:
• UDP performance: throughput, jitter, packet loss, 1-way delay, stability
• Continuous (days) data flows – VLBI_UDP and udpmon
• Multi-Gigabit TCP performance with current kernels
• Multi-Gigabit CBR over TCP/IP
• Experience for FPGA Ethernet packet systems
• DANTE Interests:
• Multi-Gigabit TCP performance
• The effect of (Alcatel 1678 MCC 10GE port) buffer size on bursty TCP using BW limited Lightpaths
The GÉANT2 Testbed
• 10 Gigabit SDH backbone
• Alcatel 1678 MCCs
• GE and 10GE client interfaces
• Node locations: London, Amsterdam, Paris, Prague, Frankfurt
• Can do lightpath routing, so make paths of different RTT
• Locate the PCs in London
Provisioning the lightpath on ALCATEL MCCs
• Some jiggery-pokery needed with the NMS to force a “looped back” lightpath London-Prague-London
• Manual XCs (using element manager) possible but hard work
• 196 needed + other operations!
• Instead used RM to create two parallel VC-4-28v (single-ended) Ethernet private line (EPL) paths
• Constrained to transit DE
• Then manually joined paths in CZ
• Only 28 manually created XCs required
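The capacity of a VC-4-28v path follows from the number of virtually concatenated VC-4s. The sketch below uses 149.76 Mbit/s as the payload rate of one VC-4, which is the standard SDH figure and not a number taken from these slides; the helper name is hypothetical.

```python
VC4_PAYLOAD_MBIT_S = 149.76  # standard SDH C-4 payload rate (assumption, not from the slides)


def vcat_capacity_mbit_s(n_vc4):
    """Payload capacity of n virtually concatenated VC-4s (VC-4-nv)."""
    return n_vc4 * VC4_PAYLOAD_MBIT_S


print(vcat_capacity_mbit_s(28))  # payload of the VC-4-28v path, roughly 4.2 Gbit/s
```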
Provisioning the lightpath on ALCATEL MCCs
• Paths come up
• (Transient) alarms clear
• Result: provisioned a path of 28 virtually concatenated VC-4s, UK-NL-DE-NL-UK
• Optical path ~4150 km
• With dispersion compensation ~4900 km
• RTT 46.7 ms
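The path lengths above can be compared with the measured RTT through the propagation delay in fibre. A rough sketch, assuming a refractive index of about 1.468 for the fibre (a typical value, not from the slides) and ignoring equipment delay; the helper name is hypothetical.

```python
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum
FIBRE_INDEX = 1.468           # assumed refractive index of the fibre


def one_way_delay_ms(path_km, index=FIBRE_INDEX):
    """Propagation time in ms for one traversal of a fibre path of this length."""
    return path_km / (C_VACUUM_KM_S / index) * 1e3


for km in (4150, 4900):
    print(km, one_way_delay_ms(km), 2 * one_way_delay_ms(km))
```

Two traversals of the ~4900 km path come out within a few percent of the 46.7 ms RTT, so under these assumptions the delay is dominated by propagation in the fibre.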
Photos at The PoP
• Test-bed SDH
• Production SDH
• 10 GE
• Production Router
• Optical Transport
4 Gig Flows on GÉANT: UDP Throughput
• Kernel 2.6.20-web100_pktd-plus
• Myricom 10G-PCIE-8A-R Fibre
• rx-usecs=25 Coalescence ON
• MTU 9000 bytes
• Max throughput 4.199 Gbit/s
• Sending host: 3 CPUs idle
• For <8 µs packets, 1 CPU is >90% in kernel mode inc ~10% soft int
• Receiving host: 3 CPUs idle
• For <8 µs packets, 1 CPU is ~37% in kernel mode inc ~9% soft int
4 Gig Flows on GÉANT: 1-way delay
• Kernel 2.6.20-web100_pktd-plus
• Myricom 10G-PCIE-8A-R Fibre
• Coalescence OFF
• 1-way delay stable at 23.435 ms
• Peak separation 86 µs
• ~40 µs extra delay
• Lab Tests: peak separation 86 µs, ~40 µs extra delay
• Lightpath adds no unwanted effects
4 Gig Flows on GÉANT: Jitter hist
• Kernel 2.6.20-web100_pktd-plus
• Myricom 10G-PCIE-8A-R Fibre
• Coalescence OFF
• Peak separation ~36 µs
• Factor 100 smaller
• Plot panels: packet separation 300 µs; packet separation 100 µs
• Lab Tests: Lightpath adds no effects
4 Gig Flows on GÉANT: UDP Flow Stability
• Kernel 2.6.20-web100_pktd-plus
• Myricom 10G-PCIE-8A-R Fibre
• Coalescence OFF
• MTU 9000 bytes
• Packet spacing 18 us
• Trials send 10 M packets
• Ran for 26 hours
• Throughput very stable: 3.9795 Gbit/s
• Occasional trials have packet loss ~40 in 10M - investigating
Our thanks go to all our collaborators
• DANTE really provided “Bandwidth on Demand”
• A record 6 hours! Including driving to the PoP, installing the PCs and provisioning the Light-path
Any Questions?
Introduction: What is EXPReS?
• EXPReS = Express Production Real-time e-VLBI Service
• Three year project, started March 2006, funded by the European Commission (DG-INFSO), Sixth Framework Programme, Contract #026642
• Objective: to create a distributed, large-scale astronomical instrument of continental and inter-continental dimensions
• Means: high-speed communication networks operating in real-time and connecting some of the largest and most sensitive radio telescopes on the planet
• Additional information: http://expres-eu.org/ [note: only one “s”] and http://www.jive.nl
Introduction: EXPReS Partners
Radio Astronomy Institutes
• Joint Institute for VLBI in Europe (Coordinator), The Netherlands
• Arecibo Observatory, National Astronomy and Ionosphere Center, Cornell University, USA
• Australia Telescope National Facility, a Division of CSIRO, Australia
• Institute of Radioastronomy, National Institute for Astrophysics (INAF), Italy
• Jodrell Bank Observatory, University of Manchester, United Kingdom
• Max Planck Institute for Radio Astronomy (MPIfR), Germany
• Metsähovi Radio Observatory, Helsinki University of Technology (TKK), Finland
• National Center of Geographical Information, National Geographic Institute (CNIG-IGN), Spain
• Hartebeesthoek Radio Astronomy Observatory, National Research Foundation, South Africa
• Netherlands Foundation for Research in Astronomy (ASTRON), NWO, The Netherlands
• Onsala Space Observatory, Chalmers University of Technology, Sweden
• Shanghai Astronomical Observatory, Chinese Academy of Sciences, China
• Torun Centre for Astronomy, Nicolaus Copernicus University, Poland
• Transportable Integrated Geodetic Observatory (TIGO), University of Concepción, Chile
• Ventspils International Radio Astronomy Center, Ventspils University College, Latvia
National Research Networks
• AARNet, Australia
• DANTE, United Kingdom
• Poznan Supercomputing and Networking Center, Poland
• SURFnet, The Netherlands
Introduction: Participating EXPReS Telescopes
Provisioning the lightpath on ALCATEL MCCs
• Create a virtual network element to a planned port (non-existing) in Prague: VNE2
• Define end points
• Out: port 3 in UK & VNE2 CZ
• In: port 4 in UK & VNE2 CZ
• Add constraint: to go via DE
• Or does OSPF
• Set capacity (28 VC-4s)
• Alcatel Resource Manager allocates routing of EXPReS_out VC-4 trails
• Repeat for EXPReS_ret
• Same time slots used in CZ for EXPReS_out & EXPReS_ret paths