ATLAS Networking & T2UK
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
T2UK RAL 15 Mar 2006, R. Hughes-Jones Manchester
Remote Computing Farms
• Discussion at CERN to establish a work-plan for 2006
• Valuable for Monitoring and Calibration
• MOU: Alberta, CERN, Krakow, Manchester
• New network topology with all links carried by GÉANT and the NRNs
• Planned Investigations
  • Characterise the new network links and end-host performance
    • Tools: iperf, udpmon, thrulay, YATM
  • Measure the ATLAS request-response behaviour
    • Tools: tcpmon, Web100, tcpdump
  • Set up the WAN emulator with the measured conditions
  • Compare network and ATLAS traffic observations
  • Install and test the ATLAS application gateway (as used at the pit)
  • Test deployment of Online TDAQ HLT releases
  • Measure performance of Online TDAQ HLT releases
  • Consider how to link Real-Time T/DAQ to remote Grid farms
• First draft of Work Plan document circulated
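Measuring request-response behaviour comes down to timing how long each request waits for its reply. The sketch below is not tcpmon; it is a minimal Python illustration that assumes a local `socket.socketpair()` stands in for the WAN path, and uses hypothetical fixed sizes (64-byte request, 1024-byte response). It records the round-trip time of every exchange.

```python
import socket
import statistics
import threading
import time


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def responder(sock: socket.socket, count: int, req_size: int, resp_size: int) -> None:
    """Hypothetical server side: answer each fixed-size request with a fixed-size reply."""
    reply = b"r" * resp_size
    for _ in range(count):
        recv_exactly(sock, req_size)
        sock.sendall(reply)


def measure_request_response(count: int = 100, req_size: int = 64,
                             resp_size: int = 1024) -> list[float]:
    """Return the round-trip time, in microseconds, of each request-response exchange."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=responder,
                              args=(server, count, req_size, resp_size), daemon=True)
    worker.start()
    request = b"q" * req_size
    rtts_us = []
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            client.sendall(request)
            recv_exactly(client, resp_size)
            rtts_us.append((time.perf_counter() - t0) * 1e6)
    finally:
        client.close()   # also unblocks the responder if the loop exited early
        worker.join()
        server.close()
    return rtts_us


if __name__ == "__main__":
    samples = measure_request_response()
    print(f"min {min(samples):.1f} us, median {statistics.median(samples):.1f} us")
```

Over a real WAN the same loop would use a TCP connection to the remote farm; each sequential exchange then costs at least one network rtt, so N back-to-back requests take at least N × rtt.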
Network Operation & Performance
• Analysis of Fault Tolerance in ATLAS T/DAQ Networks
  • Document the action of the switches
  • Fate of the packets
  • Effect on T/DAQ applications
• Networks considered:
  • Front End (DataFlow) Network
  • Back End Network
  • Controls Network (run control, services, some monitoring)
• Consider questions like:
  • “Failure of a link between the ROS and the ROS Concentrator Switch”
• Draft document being discussed
• Performance tests discussed
  • The Silicom PEG4 PCI-e 4 × 1 GE NIC
    • Throughput with simple and trunked links
  • ROS SuperMicro motherboard
    • 6 PCI slots, one 4-lane PCI-e slot, one 3.4 GHz Xeon (dual socket)
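A quick check of whether a 4-lane PCI-e slot can feed all four Gigabit ports of the NIC at once. This arithmetic sketch assumes first-generation PCI-e (2.5 GT/s per lane with 8b/10b coding, i.e. 2 Gbit/s of data per lane per direction) and ignores PCI-e packet and Ethernet framing overheads, so the real headroom is smaller than shown.

```python
PCIE_GEN1_GTPS_PER_LANE = 2.5   # giga-transfers/s per lane (assumed PCI-e 1.x)
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b line coding


def pcie_data_rate_gbps(lanes: int) -> float:
    """Data rate per direction, before PCI-e protocol (TLP/DLLP) overheads."""
    return lanes * PCIE_GEN1_GTPS_PER_LANE * ENCODING_EFFICIENCY


def nic_demand_gbps(ports: int, port_rate_gbps: float = 1.0) -> float:
    """Aggregate line rate of all NIC ports in one direction."""
    return ports * port_rate_gbps


if __name__ == "__main__":
    bus = pcie_data_rate_gbps(4)   # 8 Gbit/s per direction
    nics = nic_demand_gbps(4)      # 4 Gbit/s per direction
    print(f"bus {bus:.1f} Gbit/s vs NIC {nics:.1f} Gbit/s, ratio {bus / nics:.1f}")
```

Note that trunking (link aggregation) usually hashes each flow onto a single member link, so one TCP stream may stay at about 1 Gbit/s even when the bus has room for more.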
Network Monitoring in ATLAS T/DAQ
• Levels of Monitoring
  • SNMP statistics: MRTG, RRD; YATM at a higher sample rate
    • Traffic patterns, bytes and packets, but NOT dropped packets
  • Network test programs: udpmon, iperf
    • Throughput, loss, 1-way delay, rtt
  • Standalone ATLAS test programs speaking the TDAQ application protocol (Richard)
  • ATLAS test programs speaking the TDAQ application protocol using the TDAQ APIs (Stefan)
  • Monitoring by the TDAQ application itself
• Integration of Message Passing Libraries
  • DataFlow (Reiner) and EF (Mario): the main difference is in the instantiation of buffers
  • Integrate both over a common thin shim over the socket calls
  • Idea to put monitoring into the (common) message passing layer
    • What can be observed?
    • Question of keeping state – the application would be the best place!
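MRTG- and RRD-style monitoring derives rates from the difference between successive SNMP octet counters. A minimal sketch, assuming 32-bit counters such as `ifInOctets` that wrap at 2^32, shows why the sample interval matters: on a saturated 1 Gbit/s link a 32-bit octet counter wraps in about 34 s, so a poller sampling less often than that cannot tell how many wraps occurred.

```python
COUNTER32_MODULUS = 2 ** 32


def octet_rate_bps(prev: int, curr: int, interval_s: float,
                   modulus: int = COUNTER32_MODULUS) -> float:
    """Bit rate between two counter samples, assuming at most one wrap in between."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta_octets = (curr - prev) % modulus   # modular difference handles a single wrap
    return delta_octets * 8 / interval_s


def wrap_time_s(line_rate_bps: float, modulus: int = COUNTER32_MODULUS) -> float:
    """Time for the octet counter to wrap when the link runs at line_rate_bps."""
    return modulus * 8 / line_rate_bps


if __name__ == "__main__":
    print(f"32-bit counter wraps after {wrap_time_s(1e9):.1f} s at 1 Gbit/s")
    # The counter wrapped from near 2**32 back to a small value during a 10 s interval.
    print(octet_rate_bps(2**32 - 1_000_000, 249_000_000, 10.0))
```

The 64-bit `ifHCInOctets`/`ifHCOutOctets` counters avoid this problem at gigabit rates; a higher sample rate (as with YATM) also gives finer time resolution of traffic patterns.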
Related Work: RAID, ATLAS Grid
• RAID0 and RAID5 tests
  • 4th-year MPhys project last semester
  • Throughput and CPU load
  • Different RAID parameters
    • Number of disks
    • Stripe size
    • User read/write size
  • Different file systems
    • ext2, ext3, XFS
  • Sequential file write, read
  • Sequential file write, read with continuous background read or write
• Status
  • Need to check some results & document
  • Independent RAID controller tests planned
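Stripe size and number of disks determine how a user read or write is spread across the array. A minimal sketch of the usual RAID0 address mapping, assuming a simple round-robin layout with no controller-specific offsets or metadata, maps a logical byte offset to a disk and an offset on that disk, and lists the disks touched by one request.

```python
def raid0_map(logical_offset: int, stripe_size: int, n_disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    if logical_offset < 0 or stripe_size <= 0 or n_disks <= 0:
        raise ValueError("offset must be >= 0; stripe size and disk count must be > 0")
    chunk, within = divmod(logical_offset, stripe_size)  # stripe unit number, offset inside it
    disk = chunk % n_disks                                # round-robin across disks
    row = chunk // n_disks                                # stripe row on that disk
    return disk, row * stripe_size + within


def disks_touched(offset: int, length: int, stripe_size: int, n_disks: int) -> set[int]:
    """Disks involved in one user request of `length` bytes starting at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    if last_chunk - first_chunk + 1 >= n_disks:
        return set(range(n_disks))                        # request spans every disk
    return {c % n_disks for c in range(first_chunk, last_chunk + 1)}
```

For example, with a 64 kB stripe and 4 disks, an aligned 64 kB user write lands on one disk while a 256 kB write touches all four; this is one reason the results depend on the combination of user read/write size and stripe size.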
ESLEA: ATLAS Grid on UKLight
• Demonstration of the benefits of dedicated links
  • 1 Gbit lightpath Lancaster-Manchester
  • Disk-to-disk transfers
  • Storage Element with SRM using distributed disk pools: dCache & xrootd
Check out the end host: bbftp
• What is the end host doing with your application protocol?
• Transatlantic bbftp over TCP/IP
• Look at the PCI-X buses
  • 3Ware 9000 controller, RAID0
  • 1 Gbit Ethernet link
  • 2.4 GHz dual Xeon
  • ~660 Mbit/s
Figure: PCI-X bus activity. PCI-X bus with RAID controller: read from disk for 44 ms every 100 ms. PCI-X bus with Ethernet NIC: write to network for 72 ms.
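A rough consistency check on the bus observations. It assumes, beyond what the slide states, that the 72 ms of network writes recur in the same 100 ms cycle as the disk reads and that the NIC sends at the full 1 Gbit/s line rate while active. Under those assumptions the average is bounded by line rate × duty cycle, about 720 Mbit/s, which is consistent with (and above) the ~660 Mbit/s achieved.

```python
def duty_cycle_throughput_mbps(active_ms: float, period_ms: float,
                               line_rate_mbps: float = 1000.0) -> float:
    """Upper bound on average throughput for a device busy active_ms out of every period_ms."""
    if period_ms <= 0 or not 0 <= active_ms <= period_ms:
        raise ValueError("need period_ms > 0 and 0 <= active_ms <= period_ms")
    return line_rate_mbps * active_ms / period_ms


if __name__ == "__main__":
    # Assumed: 72 ms of network writes per 100 ms cycle, at 1 Gbit/s line rate.
    bound = duty_cycle_throughput_mbps(72, 100)
    print(f"upper bound {bound:.0f} Mbit/s vs ~660 Mbit/s observed")
```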
Any Questions?
Backup Slides
TCP Stacks & CPU Load
• Real user problem!
• End-host TCP flow at 960 Mbit/s with rtt 1 ms falls to 770 Mbit/s when rtt is 15 ms
• 1.2 GHz PIII, rtt 1 ms
  • TCP iperf 980 Mbit/s
  • Kernel mode 95%, idle 1.3%
  • CPULoad with nice priority
  • Throughput falls as priority increases
  • No loss, no timeouts
  • Not enough CPU power
• 2.8 GHz Xeon, rtt 1 ms
  • TCP iperf 916 Mbit/s
  • Kernel mode 43%, idle 55%
  • CPULoad with nice priority
  • Throughput constant as priority increases
  • No loss, no timeouts
• Kernel mode includes TCP stack and Ethernet driver
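The slide does not say why the flow slows at 15 ms rtt. One quantity worth checking in that situation is the bandwidth-delay product: the data TCP must keep in flight to sustain a rate over a given rtt. A minimal sketch, assuming throughput is capped at window/rtt whenever the send or receive window is the limit; the 1 MiB window in the example is hypothetical.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps over rtt_s."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Maximum TCP throughput when at most window_bytes may be unacknowledged."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    for rtt_ms in (1, 15):
        need = bdp_bytes(960e6, rtt_ms / 1000)
        print(f"rtt {rtt_ms} ms: {need:,.0f} bytes in flight for 960 Mbit/s")
    # Hypothetical 1 MiB window at 15 ms rtt
    print(f"{window_limited_rate_bps(2**20, 0.015) / 1e6:.0f} Mbit/s")
```

Going from 1 ms to 15 ms multiplies the required window fifteenfold (about 120 kB to 1.8 MB), so buffer settings that suffice on a LAN can become the bottleneck on a longer path; CPU load, as the tests above show, is another factor.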
A Few Items for Discussion
• Achievable throughput
• Sharing link capacity (OK, what is sharing?)
• Convergence time
• Responsiveness
• rtt fairness (OK, what is fairness?)
• mtu fairness
• TCP friendliness
• Link utilisation (by this flow or all flows)
• Stability of achievable throughput
• Burst behaviour
• Packet loss behaviour
• Packet re-ordering behaviour
• Topology – maybe some “simple” setups
• Background or cross traffic – how realistic does it need to be? What protocol mix?
• Reverse traffic
• Impact on the end host – CPU load, bus utilisation, offload
• Methodology – simulation, emulation and real links ALL help
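“What is fairness?” has several answers. One widely used metric, not necessarily the one intended here, is Jain's fairness index, J = (Σ x_i)² / (n · Σ x_i²), which equals 1 when all n flows get equal throughput and 1/n when one flow takes everything. A minimal sketch:

```python
from typing import Sequence


def jain_fairness(throughputs: Sequence[float]) -> float:
    """Jain's fairness index for per-flow throughputs (any consistent unit)."""
    if not throughputs:
        raise ValueError("need at least one flow")
    if any(x < 0 for x in throughputs):
        raise ValueError("throughputs must be non-negative")
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("index undefined when every flow has zero throughput")
    return total * total / (len(throughputs) * sum_sq)


if __name__ == "__main__":
    print(jain_fairness([500, 500]))   # equal share
    print(jain_fairness([900, 100]))   # one flow dominates
```

For rtt fairness the inputs would be flows with different rtts sharing one bottleneck; for mtu fairness, flows with different MTUs.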
More Information: Some URLs
• UKLight web site: http://www.uklight.ac.uk
• MB-NG project web site: http://www.mb-ng.net/
• DataTAG project web site: http://www.datatag.org/
• UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/net
• Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
  • “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special Issue 2004, http://www.hep.man.ac.uk/~rich/
• TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
• TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
• PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
• Dante PERT: http://www.geant2.net/server/show/nav.00d00h002