DataTAG Project Update
CGW’2003 workshop, Cracow (Poland), October 28, 2003
Olivier Martin, CERN, Switzerland
DataTAG partners http://www.datatag.org
Funding agencies Cooperating Networks
DataTAG Mission: TransAtlantic Grid
• EU-US Grid network research
  • High-performance transport protocols
  • Inter-domain QoS
  • Advance bandwidth reservation
• EU-US Grid interoperability
• Sister project to EU DataGRID
Main DataTAG achievements (EU-US Grid interoperability)
• GLUE interoperability effort with DataGrid, iVDGL & Globus
• GLUE testbed & demos
• VOMS design and implementation in collaboration with DataGrid
• VOMS evaluation within iVDGL underway
• Integration of GLUE-compliant components in DataGrid and VDT middleware
Main DataTAG achievements (Advanced networking)
• Internet land speed records have been beaten one after the other by DataTAG project members and/or teams closely associated with DataTAG:
  • Atlas Canada lightpath experiment (iGRID2002)
  • New Internet2 land speed record (I2LSR) by Nikhef/Caltech team (SC2002)
  • Scalable TCP, HSTCP, GridDT & FAST experiments (DataTAG partners & Caltech)
  • Intel 10GigE tests between CERN (Geneva) and SLAC (Sunnyvale) (Caltech, CERN, Los Alamos NL, SLAC)
  • New I2LSR (Feb 27-28, 2003): 2.38 Gb/s sustained rate, single TCP/IPv4 flow, 1 TB in one hour (Caltech-CERN)
• Latest IPv4 & IPv6 I2LSR were awarded live from Indianapolis during Telecom World 2003:
  • May 6, 2003: 987 Mb/s, single TCP/IPv6 stream
  • Oct 1, 2003: 5.44 Gb/s sustained rate, single TCP/IPv4 stream, 1.1 TB in 26 minutes, i.e. about one 680 MB CD per second
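The record figures quoted on this slide can be cross-checked with simple arithmetic. The sketch below is only an illustration, not part of the original talk: the helper names are hypothetical, and it assumes decimal units (1 Gb = 10^9 bits, 1 MB = 10^6 bytes, 1 TB = 10^12 bytes).

```python
# Hypothetical helpers for sanity-checking the quoted record figures.
# Assumption: decimal units (1 Gb = 1e9 bits, 1 MB = 1e6 bytes, 1 TB = 1e12 bytes).

def transferred_terabytes(rate_gbps: float, seconds: float) -> float:
    """Volume moved at a sustained rate, in terabytes."""
    bits = rate_gbps * 1e9 * seconds
    return bits / 8 / 1e12


def megabytes_per_second(rate_gbps: float) -> float:
    """Convert a rate in Gb/s to MB/s."""
    return rate_gbps * 1e9 / 8 / 1e6


if __name__ == "__main__":
    # Feb 27-28, 2003 record: 2.38 Gb/s for one hour
    print(f"{transferred_terabytes(2.38, 3600):.2f} TB in one hour")
    # Oct 1, 2003 record: 5.44 Gb/s for 26 minutes
    print(f"{transferred_terabytes(5.44, 26 * 60):.2f} TB in 26 minutes")
    print(f"{megabytes_per_second(5.44):.0f} MB/s")
```

Under these assumptions both records come out slightly above 1 TB, and 5.44 Gb/s corresponds to roughly 680 MB/s, which is where the "one CD per second" comparison comes from.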
Significance of I2LSR to the Grid?
• Essential to establish the feasibility of multi-Gigabit/second single-stream IPv4 & IPv6 data transfers:
  • Over dedicated testbeds in a first phase
  • Then across academic & research backbones
  • Last but not least across campus networks
  • Disk to disk rather than memory to memory
  • Study impact of high-performance TCP over disk servers
• Next steps:
  • Above 6 Gb/s expected soon between CERN and Los Angeles (Caltech/CENIC PoP) across DataTAG & Abilene
  • Goal is to reach 10 Gb/s with new PCI Express buses
  • Study alternatives to standard TCP
    • Non-TCP transport
    • HSTCP, FAST, Grid-DT, etc.
Impact of high-performance flows across A&R backbones?
Possible solutions:
• Use of “TCP friendly” non-TCP (i.e. UDP) transport
• Use of Scavenger (i.e. less than best effort) services
Layer 1/2/3 networking (1)
• Conventional layer 3 technology is no longer fashionable because of:
  • High associated costs, e.g. 200-300 KUSD for a 10G router interface
  • Implied use of shared backbones
• The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.
  • 1500-byte Ethernet frame size (layer 1)
  • Protocol transparency (layers 1 & 2)
  • Minimum functionality, hence, in theory, much lower costs (layers 1 & 2)
Layer 1/2/3 networking (2)
• So-called « lambda Grids » are becoming very popular
• Pros:
  • Circuit-oriented model like the telephone network, hence no need for complex transport protocols
  • Lower equipment costs (typically a factor of 2 or 3 per layer)
  • The concept of a dedicated end-to-end light path is very elegant
• Cons:
  • « End to end » is still very loosely defined, i.e. site to site, cluster to cluster or really host to host
  • High cost, scalability, and the additional middleware required to deal with circuit setup, etc.
Multi-vendor 2.5 Gb/s layer 2/3 testbed
[Network diagram. Labels: Layer 1, Layer 2, Layer 3; CERN; STARLIGHT; L2 servers; L3 servers; routers; GigE switches; A1670 multiplexer; A-7770; C-7606; P-8801; C-ONS15454; J-M10; links marked 2.5G, 10G, 2*GigE, 8*GigE; INRIA, VTHD, Super-Janet, UvA, GEANT, Abilene, PPARC, GARR, ESNet, Canarie, INFN/CNAF]
State of 10G deployment and beyond
• Still little deployed because of lack of demand, hence:
  • Lack of products
  • High costs, e.g. 150 KUSD for a 10GigE port on a Juniper T320 router
• Even switched, layer 2, 10GigE ports are expensive; however, prices should come down to 10 KUSD/port towards the end of 2003
• 40G deployment, although more or less technologically ready, is unlikely to happen in the near future, i.e. before LHC starts
10G DataTAG testbed extension to Telecom World 2003 and Abilene/Cenic
On September 15, 2003, the DataTAG project was the first transatlantic testbed offering direct 10GigE access using Juniper’s VPN layer2/10GigE emulation.
Sponsors: Cisco, HP, Intel, OPI (Geneva’s Office for the Promotion of Industries & Technologies), Services Industriels de Geneve, Telehouse Europe, T-Systems
Impediments to high E2E throughput across LAN/WAN infrastructure
• For many years the Wide Area Network has been the bottleneck. This is no longer the case in many countries, which in principle makes the deployment of data-intensive Grid infrastructure possible!
• Recent I2LSR records show for the first time ever that the network can be truly transparent and that throughputs are limited by the end hosts
• The dream of abundant bandwidth has now become a reality in large, but not all, parts of the world!
• The challenge has shifted from getting adequate bandwidth to deploying adequate LANs and cybersecurity infrastructure, as well as making effective use of it!
• Major transport protocol issues still need to be resolved; however, there are many encouraging signs that practical solutions may now be in sight.
Single TCP stream performance under periodic losses
• TCP throughput is much more sensitive to packet loss in WANs than in LANs
• TCP’s congestion control algorithm (AIMD) is not suited to gigabit networks
• Poor, limited feedback mechanisms
• The effect of even very small packet loss rates is disastrous
• TCP is inefficient in high bandwidth*delay networks
• The future performance of data-intensive grids looks grim if we continue to rely on the widely deployed TCP Reno stack
[Figure annotations: bandwidth available = 1 Gbps; at a loss rate of 0.01%, LAN bandwidth utilization = 99% and WAN bandwidth utilization = 1.2%]
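The slide does not say how the LAN and WAN utilization figures were obtained. As a rough illustration only, the sketch below uses the well-known Mathis et al. steady-state approximation for Reno-style TCP under periodic loss, throughput ≈ (MSS/RTT) · C/√p with C = √(3/2). The round-trip times (120 ms for the WAN, 0.04 ms for the LAN) and the 1460-byte MSS are assumptions chosen for illustration, not values taken from the talk. The second helper estimates how long AIMD needs to regain full rate after a single loss, which is one way to see why Reno struggles on high bandwidth*delay paths.

```python
import math

# Assumed parameters (not from the slides): MSS and round-trip times.
LINK_BPS = 1e9          # available bandwidth quoted on the slide
LOSS_RATE = 1e-4        # 0.01% loss rate quoted on the slide
MSS_BYTES = 1460        # assumption: typical Ethernet MSS
WAN_RTT_S = 0.120       # assumption: transatlantic round-trip time
LAN_RTT_S = 0.00004     # assumption: local round-trip time


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Mathis approximation: (MSS / RTT) * C / sqrt(p), in bits per second."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


def utilization(link_bps, mss_bytes, rtt_s, loss_rate):
    """Fraction of the link a single flow can use, capped at 1.0."""
    return min(1.0, mathis_throughput_bps(mss_bytes, rtt_s, loss_rate) / link_bps)


def aimd_recovery_time_s(link_bps, mss_bytes, rtt_s):
    """Time to grow the window from W/2 back to W at one segment per RTT."""
    window_segments = link_bps * rtt_s / (mss_bytes * 8)
    return (window_segments / 2) * rtt_s


if __name__ == "__main__":
    wan = utilization(LINK_BPS, MSS_BYTES, WAN_RTT_S, LOSS_RATE)
    lan = utilization(LINK_BPS, MSS_BYTES, LAN_RTT_S, LOSS_RATE)
    print(f"WAN utilization: {wan:.1%}")
    print(f"LAN utilization: {lan:.1%}")
    print(f"WAN recovery after one loss: "
          f"{aimd_recovery_time_s(LINK_BPS, MSS_BYTES, WAN_RTT_S):.0f} s")
```

With these assumed values the model puts WAN utilization at about 1%, in line with the slide, while the LAN flow is limited by the link rather than by loss. The capped model cannot reproduce the slide's exact 99% LAN figure. Recovery from a single loss on the WAN path takes on the order of ten minutes in this estimate.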