10Gbit between GridKa and openlab (and their obstacles) Forschungszentrum Karlsruhe GmbH Institute for Scientific Computing P.O. Box 3640 D-76021 Karlsruhe, Germany http://www.gridka.de Bruno Hoeft
Outline
• LAN of GridKa: structure; projection of the installation in 2008
• WAN: history (1 Gbit, 2003); current 10G testbed; challenges of crossing multiple NRENs (National Research and Education Networks)
• Quality and quantity evaluation of the GridKa–openlab network connection
• File transfer
• Caching effect
LHC Computing Grid Project (LCG): the LHC multi-tier computing model
[Diagram] Tier 0: centre at CERN. Tier 1 centres: Germany (FZK), France (IN2P3), Italy (CNAF), UK (RAL), USA (FermiLab, BNL). Tier 2: university and lab computing centres. Tier 3: institute computers. Tier 4: desktops. Virtual organizations (ATLAS, CMS, LHCb, ...) and working groups cut across the tiers. GridKa is the Grid Computing Centre Karlsruhe, the German Tier 1.
Projects at GridKa: computing jobs with "real" data
LHC experiments (CERN) and non-LHC experiments (ATLAS; SLAC, USA; FermiLab, USA): 1.5 million jobs and 4.2 million hours of computation in 2004.
Gradual extension of GridKa resources (* Internet connection for the service challenge)
As of April 2005:
• biggest Linux cluster within the German science community
• largest online storage at a single installation in Germany
• strongest Internet connection in Germany
• available on the Grid with over 100 installations in Europe
Network installation
[Diagram] Link legend: Ethernet 100 Mbit, 320 Mbit, 1 Gbit and 10 Gbit; FibreChannel 2 Gbit. WAN: DFN 10 Gbit/s (service challenge only) plus 1 Gbit/s for GridKa production, terminating on a router. The SC nodes have direct internet access at 1 Gbit/s; everything else (login servers, NAS file servers, SAN-attached file servers, compute nodes) sits behind a PIX firewall on a private switched network.
Network installation including the management network
[Diagram] Same topology as above, with a master controller (running Ganglia, Nagios and Cacti) connected to remote-management (RM) modules on the switches, file servers and SAN.
Projection of the installation in 2008
[Diagram] Four blocks (A to D), each with its own block administration, file servers (FS) and compute-node (CN) racks, joined by four backbone routers over 10 Gbit links. One block was completed end of 2005, the next follows in 2006, further blocks in 2007(?). External connectivity: 10 Gbit to the Internet and a 10 Gbit light path to CERN.
WAN history 2003/04: Gigabit GridKa – CERN (DataTag)
[Diagram] GridFTP server at Karlsruhe – DFN (2.4 Gbps) via Frankfurt – Géant (10 Gbps) via Geneva – GridFTP server at CERN; 2x 1 Gbps access links. GridFTP was tested over 1 Gbps and reached 98% of 1 Gbit.
10 Gigabit WAN SC: GridKa – CERN (openlab)
[Diagram] Default path, 10 Gbps end to end: GridFTP server at GridKa – r-internet.fzk.de – ar-karlsruhe1-ge5-2-700.g-win.dfn.de – cr-karlsruhe1-po12-0.g-win.dfn.de (DFN, Karlsruhe/Frankfurt) – dfn.de1.de.geant.net – de.it1.it.geant.net – it.ch1.ch.geant.net (Géant, routed via Italy) – swiCE2-P6-1.switch.ch – swiCE3-G4-3.switch.ch (Geneva) – GridFTP server at CERN.
10 Gigabit WAN SC: GridKa – CERN (openlab), LSP routing
[Diagram] With an LSP, the Géant segment runs via France instead of Italy: dfn.de1.de.geant.net – de.fr1.fr.geant.net – fr.ch1.ch.geant.net – swiCE3-G4-3.switch.ch; the DFN and FZK hops are unchanged.
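The hop lists above read like traceroute output. A minimal sketch of how such a path check can be run from a GridKa SC node (the fully qualified openlab host name is an assumption; only "oplapro73" appears in these slides):

# -n prints hop IPs without reverse lookups, so the Géant routers
# that reveal the Italy vs. France detour are listed directly.
traceroute -n oplapro73.cern.ch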
10 Gigabit WAN SC: GridKa – CERN (openlab), evaluation programme (Karlsruhe – Frankfurt – Geneva, DFN and Géant at 10 Gbps)
• bandwidth evaluation (TCP/UDP)
• MPLS via France (MPLS: MultiProtocol Label Switching)
• LBE (Less than Best Effort)
• GridFTP server pool: HD to HD, storage to storage
• SRM
Hardware
• various dual Xeon 2.8 and 3.0 GHz IBM xSeries servers (Intel and Broadcom NICs)
• recently added: 3.0 GHz EM64T (800 MHz FSB)
• Cisco 6509 with 4x 10 Gb ports and lots of 1 Gb ports
• storage: DataDirect S2A8500 with 16 TB, GPFS
• Linux RH ES 3.0 (U2 and U3)
• 10 GE link to Géant via DFN (Less than Best Effort)
• TCP/IP stack: 4 MB buffer, 2 MB window size
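The slides give only the target values (4 MB buffer, 2 MB window). A minimal sketch of how such settings are commonly applied on Linux; the exact sysctl keys and values used at GridKa are an assumption:

# Raise the socket buffer ceilings so a 2 MB TCP window fits with headroom.
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
# min / default / max buffer sizes for the TCP receive and send paths.
sysctl -w net.ipv4.tcp_rmem="4096 2097152 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 2097152 4194304"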
Quality evaluation (UDP stream), Reno
Summary: the LAN is symmetric at 957 Mbit/s in both directions with no reordering; the WAN is asymmetric: 953 Mbit/s GridKa to CERN (jitter 0.019 ms, 215 datagrams out of order, 0.008%) but only 884 Mbit/s CERN to GridKa (jitter 0.015 ms, 4239 out of order, 0.018%).

LAN (10gtk111, 10gtk113), symmetric:
[10gtk111] iperf -s -u
[10gtk113] iperf -c 192.108.46.111 -u -b 1000M
[ 3] 0-10 sec 1141 MBytes 957 Mbits/sec 0.028 ms 0/813885 (0%)
[10gtk113] iperf -s -u
[10gtk111] iperf -c 192.108.46.113 -u -b 1000M
[ 3] 0-30 sec 3422 MBytes 957 Mbits/sec 0.022 ms 0/2440780 (0%)

WAN (10gtk113, oplapro73), asymmetric:
[oplapro73] iperf -s -u
[10gtk113] iperf -c 192.16.160.13 -u -b 1000M
[ 3] 0-30 sec 3408 MBytes 953 Mbits/sec 0.019 ms 7299/2438606 (0.3%)
[ 3] 0-30 sec 215 datagrams received out-of-order
[10gtk113] iperf -s -u
[oplapro73] iperf -c 192.108.46.113 -u -b 1000M
[ 3] 0-30 sec 3162 MBytes 884 Mbits/sec 0.015 ms 130855/2386375 (5.5%)
[ 3] 0-30 sec 4239 datagrams received out-of-order

(ooo = out of order; columns: interval, transfer, bandwidth, jitter, lost/total datagrams)
Bandwidth evaluation (TCP stream), Reno
[Plot] Single-stream iperf over the WAN, 10gtk10[1-5] to oplapro73: 348 Mbit/s per stream.
Bandwidth evaluation (TCP, parallel streams), Reno
[Plot] 5 node pairs, 10gtk10[1-5] to oplapro7[1-5], 24 parallel iperf streams: 112 MByte/s per node.
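A sketch of how one of these multi-stream tests can be launched (stream count from this slide, window size from the hardware slide; duration is assumed):

# 24 parallel TCP streams from one GridKa node to its openlab peer,
# 2 MB window, 60 seconds; repeated for each pair 10gtk10[1-5] -> oplapro7[1-5].
iperf -c oplapro71 -P 24 -w 2M -t 60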
Evaluation of maximum throughput
[Plot] Aggregate throughput (Mbit/s, scale up to 7000) over time, 18:00 to 20:00.
• 9 nodes on each site
• 8 nodes at 845 Mbit/s, 1 node at 540 Mbit/s
• driving a single stream faster results in packet loss
GridFTP SC1 throughput
[Plot] Throughput (Mbit/s, scale up to 4900) to hd/SAN and to /dev/null, 4th to 8th Feb. 05.
SC1: 500 MByte/s sustained with 19 nodes:
• 15 worker nodes x 20 MByte/s (IDE/ATA HD)
• 1 file server x 50 MByte/s (SCSI HD)
• 3 file servers x 50 MByte/s (SAN)
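The slides do not show the transfer command itself; a minimal sketch of one disk-to-disk GridFTP copy with globus-url-copy (paths and the full host name are assumptions):

# 4 parallel streams, 2 MB TCP buffer, GridKa file server -> openlab node.
globus-url-copy -p 4 -tcp-bs 2097152 \
    file:///data/sc1/testfile \
    gsiftp://oplapro73.cern.ch/data/sc1/testfile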
SC2
• five nodes at GridKa
• GridFTP to GPFS
SC2, part 2
• troubleshooting with Radiant
• balancing the different host performances (load balancing)
• parallel threads did not perform better
• best performance: 20 parallel file copies on equal nodes (no performance difference); see the sketch below
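A sketch of the "20 parallel file copies" pattern as a plain shell loop (file layout and host name are assumptions):

# Start 20 independent GridFTP copies instead of one multi-threaded
# transfer, then wait for all of them to finish.
for i in $(seq 1 20); do
    globus-url-copy file:///data/sc2/file$i \
        gsiftp://oplapro73.cern.ch/gpfs/sc2/file$i &
done
wait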
Caching effect
[Plot] GridFTP to /dev/null: throughput (MByte/s, 30 to 80) vs. TCP window size (1.0 to 1.5 MByte); cache effect visible.
[Plot] GridFTP to GPFS: throughput (MByte/s, 0 to 80) vs. TCP window size (0.5 to 1.5 MByte).
[Plot] Both curves side by side for comparison.
[Plot] GridFTP of an 8 GByte file, current (*) and average throughput (MByte/s, 20 to 80) across the measurement points: the first 2 GByte pass in 16% of the time, the cache inflating the apparent rate.
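The window-size scans above can be reproduced by looping over the GridFTP TCP buffer size; a sketch, with step values chosen to match the plotted range (file and destination are hypothetical):

# Time an 8 GB transfer for TCP buffer sizes from 0.5 MB to 2 MB;
# throughput per step = file size / elapsed time.
for bs in 524288 1048576 1572864 2097152; do
    /usr/bin/time -f "tcp-bs=$bs: %e s" \
        globus-url-copy -tcp-bs $bs \
        file:///data/sc2/8gb-testfile \
        gsiftp://oplapro73.cern.ch/dev/null
done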
Conclusion
• multi-NREN 10 Gbit link: up to 7.0 Gbit/s usable
• SC2 part 1: approx. 1/5 of the aggregated load at CERN
• SC2 part 2: 250 MByte/s stable, peaks over 400 MByte/s
• TCP for the WAN: modified or unmodified?
Future work
• GPFS as data destination: since SC2 (end of March), single stream up to 115 MByte/s sustained, multi-stream 60 to 70 MByte/s
• digging into hardware details to discover bottlenecks (packet drops due to bad PCI timing): some solved, but still ongoing
• stabilise the transport: since SC2 no longer approaches the edge, the impression is of far more stability
• installation of SRM and dCache for the service challenge, then migrate to production
• planned for 2006: light path via X-WiN (DFN) / Géant2 to CERN
Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft
Thanks for your attention. Questions?