Why you shouldn’t take the network for granted
Freek Dijkstra, University of Amsterdam
Outline
What’s wrong with the Internet?
  Short intro to TCP flaws
  Hint: Your computer may be the culprit
Paradigm: Optical networks
  What’s new is not really the “optical” part
  Find out about Hybrid networks
Problem-driven research: Optical networks in practice
  IP addressing in Optical networks
  iGrid demo using Zeroconf techniques
If time allows: short demo
What’s wrong with the Internet?
Nothing really, unless you’re using “big fat pipes”
TCP is flawed for these connections
  It has scaled extremely well, but not enough
  Most end hosts are not optimally tuned for those connections
  You can’t just replace TCP on the Internet
Using the routed Internet is not always efficient
  Your application may depend on specific QoS settings, like guaranteed bandwidth or low jitter
  If you send a lot of data between A and B, why do you need a router?
TCP Tuning for Big Fat Pipes (a.k.a. What’s wrong with your computer)
Bandwidth-delay product (BDP) = bandwidth × one-way delay = the amount of data “in flight”
All data must be buffered until the ACK comes back (TCP is reliable), so you need 2 × BDP (bandwidth × round-trip time) as buffer size
  Download from Sweden: 43 ms RTT, 100 Mbit/s: 0.043 s × 12.5 MByte/s = 0.538 MByte = 538 kByte = 525 KiByte
  iGrid: 182 ms, 1 Gbit/s: 21.7 MiByte
Linux default send/receive buffer size: 64 KiByte, default max.: 256 KiByte
Set the following sysctl parameters:
    # increase TCP max buffer size
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    # increase Linux autotuning TCP buffer limits
    # min, default, and max number of bytes to use
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    # see http://dsd.lbl.gov/TCP-Tuning/
Source: Brian Tierney – TCP Tuning Tutorial
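The buffer arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name `buffer_bytes` is made up for illustration, and it assumes the slide's convention that the required buffer is bandwidth times round-trip time (that is, 2 × BDP with a one-way delay).

```python
KIB = 1024
MIB = 1024 * 1024


def buffer_bytes(bandwidth_bit_per_s, rtt_s):
    """Buffer needed to keep the pipe full: bandwidth x round-trip time.

    With BDP defined on the one-way delay, this equals 2 x BDP.
    """
    return bandwidth_bit_per_s / 8 * rtt_s


sweden = buffer_bytes(100e6, 0.043)  # 100 Mbit/s, 43 ms RTT
igrid = buffer_bytes(1e9, 0.182)     # 1 Gbit/s, 182 ms

print(f"Sweden: {sweden / KIB:.0f} KiByte")  # 525 KiByte, as on the slide
print(f"iGrid:  {igrid / MIB:.1f} MiByte")   # 21.7 MiByte, as on the slide
```

Note that 16777216 bytes is 16 MiByte, so with these numbers the iGrid path would need a larger maximum than the sysctl values shown.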
TCP overview
TCP is very friendly towards other traffic: it backs off at the first sign of congestion
Congestion window = the number of packets the sender is allowed to send without waiting for an acknowledgement
Throughput = Window Size / Round Trip Time
Source: Brian Tierney – TCP Tuning Tutorial
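To see why the window size matters, the throughput formula can be applied to a 64 KiByte window on the two paths from the tuning slide. A minimal sketch; the function name is an assumption for illustration, and the result is an upper bound that ignores loss and protocol overhead.

```python
def max_throughput_bit_per_s(window_bytes, rtt_s):
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_s


window = 64 * 1024  # 64 KiByte window

sweden_mbit = max_throughput_bit_per_s(window, 0.043) / 1e6
igrid_mbit = max_throughput_bit_per_s(window, 0.182) / 1e6

print(f"43 ms RTT:  at most {sweden_mbit:.1f} Mbit/s")
print(f"182 ms RTT: at most {igrid_mbit:.1f} Mbit/s")
```

With an untuned window the achievable rate is a small fraction of a 100 Mbit/s or 1 Gbit/s link, which is the point of the buffer tuning.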
Congestion Avoidance
On congestion: congestion window decreased by 50%
Otherwise: congestion window increased by 1 packet per round trip time (RTT)
Maximum Transmission Unit (MTU) = MSS + IP & TCP headers
Sources: Bill Allcock, Richard Hughes-Jones
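The additive-increase, multiplicative-decrease behaviour described here can be sketched as a toy simulation. It is a simplification under stated assumptions: the function `aimd` and its parameters are hypothetical, the window is counted in whole packets, and slow start, timeouts and fast recovery are left out.

```python
def aimd(initial_window, rounds, loss_rounds):
    """Return the congestion window (in packets) at the start of each RTT.

    loss_rounds is the set of round numbers in which congestion is detected.
    """
    window = initial_window
    history = []
    for rtt in range(rounds):
        history.append(window)
        if rtt in loss_rounds:
            window = max(1, window // 2)  # decrease by 50%
        else:
            window += 1                   # increase by 1 packet per RTT
    return history


print(aimd(10, 6, {2}))  # [10, 11, 12, 6, 7, 8]
```

After a halving, the window needs roughly half its previous size in round trips to recover, which is slow when the window is large and the RTT is long.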
Alternatives to TCP
About 200 variants of TCP
  BSD (1983-1986), BSD Tahoe (1988), BSD Reno (1990), Vegas (1994), NewReno (1999), BIC TCP (2004)
  More experimental variants: S-TCP, FAST, HS-TCP, H-TCP, HSTCP-LP, PTCP, CUBIC, TCP Westwood+, SACK, XCP, SCTCP
UDP-Based Transport Mechanisms
  RBUDP, UDT (SABUL), Tsunami, DCCP
A protocol needs to be Fair and TCP-Friendly
[Figure: comparison of 100xTCP, 81xTCP, RBUDP and UDT]
Sources: Les Cottrell, Steven Low, Eric He, Hans Blom
TCP-Tuning Guide
For more information, see (amongst others): TCP Tuning Guide by Brian Tierney, Lawrence Berkeley National Laboratory, http://dsd.lbl.gov/TCP-tuning/
Beware of pitfalls:
  WAN/LAN PHY capacity difference
  Slow hardware bus
  Small MTU
  Slow TCP recovery
  TCP buffer size
  Buffer in switches
User Classes
A. Lightweight users, browsing, mailing, home use
   Need full Internet routing, one to many
B. Business apps, multicast, streaming, VPNs, mostly LAN
   Need VPN services and full Internet routing, several to several + uplink
C. Scientific applications, distributed data processing, grids
   Need very fat pipes, limited Virtual Organizations, few to few, p2p
[Figure: number of users versus bandwidth requirements (ADSL to GigE) for classes A, B and C; ΣA ≈ 40 Gb/s, ΣB ≈ 30 Gb/s, ΣC >> 100 Gb/s]
Source: Cees de Laat
Paradigm: Optical Networks
Optical components are relatively cheap
Moore’s law:
  Processor speed doubles every 18 months
  Disk capacity doubles every 12 months
  Network capacity doubles every 9 months
The network is no longer the bottleneck
[Figure: LambdaRAM (Mmap I/O, LRAM, Prefetch)]
Source: Jason Leigh (EVL)
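The doubling periods quoted on this slide compound quickly. As a rough illustration (the function name `growth_factor` and the three-year horizon are chosen here, not taken from the talk):

```python
def growth_factor(years, doubling_months):
    """Growth after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


for name, months in [("Processor speed", 18),
                     ("Disk capacity", 12),
                     ("Network capacity", 9)]:
    print(f"{name}: x{growth_factor(3, months):.0f} in 3 years")
```

Over three years this gives factors of 4, 8 and 16 respectively, so network capacity pulls ahead of the end hosts attached to it.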
Paradigm: Hybrid Networks
Routers are expensive
  Optical Cross Connect (OSI layer 1)   ~ 1 k€/port
  Ethernet Switch (OSI layer 2)         ~ 7 k€/port
  IP Router (OSI layer 3)               ~ 75 k$/port
Give each packet in the network the service it needs, but no more.
Look for a hybrid architecture which serves all classes in a cost-effective way
Source: Cees de Laat
Hybrid Networks in practice
Use dedicated lightpaths to create optical private networks
Examples: SURFnet6, GLIF community http://www.glif.is/
[Figure: network diagram with DAS routers, VLBI, Tier-1 and a core router]
GLIF (Global Lambda Integrated Facility)
Source: http://www.glif.is
The daily practice: Obtaining a Lightpath
Manual: e-mail your NRN netmaster-optical@surfnet.nl
Lightpaths flyer (“Flyer Lichtpaden”): http://www.surfnet.nl/staging/attachment.db?907629
Multidomain problem
Path discovery
  More complex than routing on the Internet. Need intelligence.
Automatic path setup
  SURFnet orders SARA (or Telindus) to manually configure the path.
Fault detection
  Something is wrong, but there is no traceroute. SURFnet pays a lot just to see bit error rates at each device (SONET/SDH equipment)
End node configuration
  Which IP range to use?
[Timeline figure: 2 days, 5 days, 5 days, 3 days, 1 day, + a long time]
What’s the Big Deal about Path Discovery?
We already have a Dijkstra algorithm...
  Link-state algorithms assume you have full topology knowledge. We don’t. That’s why BGP uses a distance-vector style (path-vector) algorithm in the spirit of Bellman-Ford
We already have Bellman-Ford...
  Path discovery in optical networks requires usage of multiple parameters, or massive overprovisioning
Multiple parameters:
  Route (regular metric)
  Availability of links
  Usage permission
  Technology match
TU Delft is doing research on routing algorithms with multiple metrics (multi-constraint algorithms); however, those algorithms still assume full topology knowledge. See: SAMCRA, DIMCRA by Fernando Kuipers and Piet van Mieghem
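To make the "multiple parameters" concrete, here is a minimal sketch that prunes links failing the availability, permission and technology checks and then runs an ordinary Dijkstra search on the rest. It is not SAMCRA or DIMCRA, it assumes full topology knowledge (which the slide says is unrealistic across domains), and the link tuple layout, node names and user names are all hypothetical.

```python
import heapq


def find_path(links, source, target, user, technology):
    """Cheapest path using only links that are free, permitted and compatible.

    Each link is (a, b, cost, available, allowed_users, technology).
    """
    graph = {}
    for a, b, cost, available, allowed_users, tech in links:
        if available and user in allowed_users and tech == technology:
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None  # no usable path


links = [
    ("Amsterdam", "Chicago", 1, False, {"uva"}, "SONET"),       # in use
    ("Amsterdam", "NewYork", 2, True, {"uva"}, "SONET"),
    ("NewYork", "Chicago", 2, True, {"uva"}, "SONET"),
    ("Amsterdam", "Geneva", 1, True, {"cern"}, "SONET"),        # no permission
    ("Geneva", "Chicago", 1, True, {"uva", "cern"}, "Ethernet"),  # wrong technology
]

print(find_path(links, "Amsterdam", "Chicago", "uva", "SONET"))
# (4, ['Amsterdam', 'NewYork', 'Chicago'])
```

The shortest route by plain metric is rejected three times over for non-metric reasons, which is why a single-metric routing algorithm is not enough here.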
Network Description Language
ITU-T recommendation G.805
  Only describes network connections, not networks
  Excellent notion of layering
DMTF standard CIM (Common Information Model)
  Is very geared towards hardware, not towards networks and network device capabilities (hardly a notion of layering)
Existing GLIF ideas
  Central database: does not scale
  Put it in DNS: (ab)use of DNS?
Our idea: use RDF and Semantic Web
  Everyone publishes their own network descriptions, pointing to other descriptions.
  Readable by computers and humans
  Allows generation of maps, facilitates path discovery
So far described physical network.
  Extend with domain abstraction
  Extend with layering, based on G.805 model
See also: http://www.science.uva.nl/~vdham/research/ndl/
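The idea of distributed descriptions that point to each other can be sketched with plain subject-predicate-object triples. This is an illustration only: the predicate names `connectedTo` and `seeAlso`, the device names and the example.org-style URLs are placeholders and should not be read as the actual NDL schema, and a dictionary stands in for fetching and parsing RDF documents.

```python
def collect_links(descriptions, start_url):
    """Follow seeAlso pointers between descriptions and gather all links."""
    seen = set()
    todo = [start_url]
    links = set()
    while todo:
        url = todo.pop()
        if url in seen or url not in descriptions:
            continue  # already read, or description not retrievable
        seen.add(url)
        for subject, predicate, obj in descriptions[url]:
            if predicate == "connectedTo":
                links.add(frozenset((subject, obj)))
            elif predicate == "seeAlso":
                todo.append(obj)
    return links


# Two domains, each publishing only its own description (hypothetical data).
descriptions = {
    "http://domain-a.example/ndl": [
        ("a:switch1", "connectedTo", "a:switch2"),
        ("a:switch2", "connectedTo", "b:switch1"),
        ("a:switch2", "seeAlso", "http://domain-b.example/ndl"),
    ],
    "http://domain-b.example/ndl": [
        ("b:switch1", "connectedTo", "b:switch2"),
        ("b:switch1", "seeAlso", "http://domain-a.example/ndl"),
    ],
}

topology = collect_links(descriptions, "http://domain-a.example/ndl")
print(len(topology), "links discovered")
```

Starting from one domain's description, a client ends up with a multi-domain map without any central database.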
Automatic Configuration
Existing tools: UCLP (four flavours)
  Targets long-lived connections
  Currently only supports single-domain SONET/SDH networks
StarPlane
  New VU/UvA project
  Targets short-lived connections
  Looks at both the control plane and how applications can benefit from this flexibility
See also: http://www.starplane.org
End Node Configuration
Problem: After a LightPath has been created, time is spent manually configuring IP addresses. DHCP will not work out-of-the-box, since it is not clear which domain should run it.
Automated solution shown at iGrid 2005
Used Zero Configuration: automatic configuration of a network without a central authority
Solution:
  Use self-assigned IP addresses
  Use multicast DNS to link IP address and hostname and vice versa
  Use DNS Service Discovery to find out the host name where services are running
[Figure: Cluster domain 1 and Cluster domain 2 connected by a LightPath]
Technologies and Implementations
Use Zero Configuration protocols
  Automatic configuration of IP addresses
    RFC 3927 for IPv4 or RFC 2462 for IPv6
  Name lookup of hosts
    Multicast DNS (mDNS) or Link-Local Multicast Name Resolution (LLMNR)
  Discovery of services
    DNS Service Discovery (DNS-SD), or Simple Service Discovery Protocol (SSDP, in UPnP), or Service Location Protocol (SLP) (or even UDDI, SDP, Salutation, or Jini)
Three software components, multiple implementations used for each:
  RFC 3927: ZCIP and autoip for Linux, native in OS X and Windows
  mDNS: mDNSResponder, tmdns, and Porchdog mDNS
  Hooking gethostby*() to use mDNS: tmdns and libnss_mdns
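The self-assignment step of RFC 3927 can be sketched as follows. The RFC has a host pick a pseudo-random address in 169.254.1.0 to 169.254.254.255, seeded from something host-specific such as the MAC address, and probe for conflicts before using it. In this sketch the function names are hypothetical and the set `in_use` merely stands in for ARP probing on the link; it does not reflect how ZCIP, autoip or any other implementation is written.

```python
import random


def candidate_address(mac, attempt=0):
    """Pick a pseudo-random link-local address, reproducible per host."""
    rng = random.Random(f"{mac}-{attempt}")  # seed from MAC and attempt number
    third = rng.randint(1, 254)   # 169.254.0.x and 169.254.255.x are reserved
    fourth = rng.randint(0, 255)
    return f"169.254.{third}.{fourth}"


def self_assign(mac, in_use):
    """Try candidates until one is not claimed by another host."""
    attempt = 0
    while True:
        address = candidate_address(mac, attempt)
        if address not in in_use:  # stand-in for an ARP probe
            return address
        attempt += 1               # conflict: pick a new candidate


mac = "00:11:22:33:44:55"          # example MAC address
first = self_assign(mac, set())
second = self_assign(mac, {first})  # simulate a collision on the first choice
print(first, second)
```

Because the seed is host-specific, a host tends to come back with the same address after a reboot, while a collision simply moves it to the next candidate.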
iGrid 2005 Demonstration
Used broadcast ping to discover hosts
Used multicast DNS and gethostbyaddr() hook to discover hostnames
Tested IP collisions
Also demonstrated service discovery through DNS
Encountered a few implementation bugs...
AIR Group Research
Research topics: Optical Networks, Throughput, AAA (Authentication, Authorisation and Accounting), Security
Group members: Niels Roosen, Bert Andree, Li Xu, Freek Dijkstra, Jeroen van der Ham, Karst Koymans, Cees de Laat, Paola Grosso, Hans Blom, Leon Gommans, Arie Taal, Yuri Demchenko, Fred Wan, Martijn Steenbakkers, Jaap van Ginkel
[Figure legend: This presentation]
That’s All Folks! (for today...)
http://www.science.uva.nl/~fdijkstr/
http://www.science.uva.nl/research/air/