School of Computing Science Simon Fraser University

CMPT 771/471: Internet Architecture & Protocols
Network Layer
Instructor: Dr. Mohamed Hefeeda

Review of Basic Networking Concepts
• Internet structure
• Protocol layering and encapsulation
• Internet services and socket programming
• Network Layer
  • Network types: Circuit switching, Packet switching
  • Addressing, Forwarding, Routing
• Transport layer
  • Reliability and congestion control
  • TCP, UDP
• Link Layer
  • Multiple Access Protocols
  • Ethernet

The Network Core
• Mesh of interconnected routers
• The fundamental question: how is data transferred through the network?
  • circuit switching: dedicated circuit per call (e.g., telephone network)
  • packet switching: data sent through the network in discrete “chunks”

Network Core: Circuit Switching
• Network resources (e.g., bandwidth) divided into “pieces” using
  • Frequency division multiplexing (FDM)
  • Time division multiplexing (TDM)
• Pieces allocated to “calls” (connections) → guaranteed performance
• Resource piece is idle if not used by owning call (no sharing)
• Connection setup is required
• Example: (traditional) telephone network

Circuit Switching: Dedicated Circuits

Network Core: Packet Switching
• Each end-to-end data stream is divided into packets
• Packets from different users share network resources
• Each packet uses the full link bandwidth
• Resources are used as needed; unlike circuit switching, there is no bandwidth division into “pieces”, no dedicated allocation, and no resource reservation
• Resource contention:
  • aggregate resource demand can exceed amount available
  • congestion: packets queue, wait for link use
• Store and forward: packets move one hop at a time
  • a node receives the complete packet before forwarding

Packet Switching: Statistical Multiplexing
• The sequence of A and B packets has no fixed pattern; the link is shared on demand → statistical multiplexing
• In contrast, in TDM each host gets the same slot in a revolving TDM frame
[Figure: hosts A, B, C, D, E; a 10 Mb/s Ethernet feeding a 1.5 Mb/s link via statistical multiplexing; queue of packets waiting for the output link]

Packet Switching: Efficiency
• 1 Mb/s link shared by N users
• Each user: 100 kb/s when “active”, active 10% of the time
• Circuit switching: 10 users
• Packet switching: with 35 users, the probability that more than 10 are active at the same time is less than 0.0004
• Packet switching allows more users to use the network!
• Q: how did we get the value 0.0004?
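One way to arrive at the 0.0004 figure is a binomial tail, sketched below. It assumes the 35 users are active independently of each other, each with probability 0.1; the function name `prob_more_than` is illustrative and not from the slides.

```python
from math import comb

def prob_more_than(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p): more than k of n users active at once."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

# 35 users, each active 10% of the time; the link supports 10 simultaneous users.
p_overload = prob_more_than(10, 35, 0.1)
print(f"P(more than 10 of 35 active) = {p_overload:.6f}")
```

The printed value is a little above 0.0004, which is the figure the slide rounds to.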

Packet Switching
• Advantages
  • no call setup → simpler
  • resource sharing (statistical multiplexing) → better resource utilization
    • more users or faster transfer (a single user can use the entire bandwidth)
  • well suited for bursty traffic (typical in data networks)
• Disadvantages
  • congestion may occur → packet delay and loss
  • need protocols to control congestion and ensure reliable data transfer

Packet Switching: Two Classes
• Datagram network
  • Example: the Internet
• Virtual-circuit network
  • Examples: ATM (Asynchronous Transfer Mode), frame relay, X.25

Packet-switched Datagram Networks
• No call setup at network layer
• Routers: no state about end-to-end connections
  • no network-level concept of “connection”
• Packets forwarded using destination host address
  • packets between same source-dest pair may take different paths
[Figure: two hosts, each with application, transport, network, data link, and physical layers; 1. Send data, 2. Receive data]

Packet-switched VC Networks
• Source-to-dest path behaves much like a telephone circuit, performance-wise
• Connection setup and teardown for each call before data can flow
• Each packet carries a VC identifier (not the destination address)
• Every router on the source-dest path maintains state for each passing connection
• Link and router resources (bandwidth, buffers) may be allocated to a VC
• Examples: ATM (Asynchronous Transfer Mode), frame relay, X.25

VC Networks: Connection Setup
• Signaling protocols are used to set up, maintain, and tear down VCs
• Note: not widely used in the current Internet
[Figure: two hosts with full protocol stacks exchanging signaling messages: 1. Initiate call, 2. Incoming call, 3. Accept call, 4. Call connected, 5. Data flow begins, 6. Receive data]

Network Taxonomy
• Telecommunication networks
  • Circuit-switched networks
    • FDM
    • TDM
  • Packet-switched networks
    • Networks with VCs
    • Datagram networks

Network Layer
• Network layer protocols in every host and router
• Network layer’s goal: transport data from sending host to receiving host
• We focus on datagram networks (Internet)
[Figure: the two end hosts run application, transport, network, data link, and physical layers; the nodes between them run only network, data link, and physical layers]

Network Layer in the Internet
Host, router network layer functions:
• ICMP protocol
  • error reporting
  • router “signaling”
• IP protocol
  • addressing conventions
  • datagram format
  • packet handling conventions
• Routing protocols
  • path selection
  • RIP, OSPF, BGP
[Figure: the network layer, holding these protocols and the forwarding table, sits below the transport layer (TCP, UDP) and above the link layer and physical layer]

Routing vs. Forwarding
• Routing
  • determine route taken by packets from source to destination
  • routing algorithms, e.g., RIP, OSPF, BGP
• Forwarding
  • move packets from router’s input to appropriate output
  • use forwarding table populated by routing algorithm
  • e.g., IP forwarding function

Local forwarding table (filled in by the routing algorithm):
  header value   output link
  0100           3
  0101           2
  0111           2
  1001           1

[Figure: a router with output links 1, 2, 3; the value in the arriving packet’s header is 0111]
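The forwarding step on this slide amounts to a table lookup. A minimal sketch, using the four table entries from the slide; the names `FORWARDING_TABLE` and `forward` are illustrative, and in a real router the table would be written by a routing protocol rather than hard-coded:

```python
# Forwarding table from the slide: header value -> output link.
# In practice the routing algorithm populates this table.
FORWARDING_TABLE = {"0100": 3, "0101": 2, "0111": 2, "1001": 1}

def forward(header_value: str) -> int:
    """Return the output link for the value found in an arriving packet's header."""
    return FORWARDING_TABLE[header_value]

print(forward("0111"))  # the arriving packet in the figure
```

With the slide's table, the packet carrying 0111 leaves on output link 2.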

IP Datagram Format (IP version 4)
Header fields, laid out in rows of 32 bits:
• ver: IP protocol version number
• head. len: header length (encoded in 32-bit words)
• type of service: provides some QoS
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number of remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• Internet checksum
• 32-bit source IP address
• 32-bit destination IP address
• Options (if any): e.g., timestamp, record route taken, specify list of routers to visit
• data (variable length, typically a TCP or UDP segment)
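To make the field layout concrete, the sketch below unpacks the fixed 20-byte part of an IPv4 header with Python's `struct` module. The sample header is fabricated for illustration (its checksum is left at 0 and is not valid), and the function and dictionary key names are assumptions, not part of the slides.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (options, if any, are ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                  # high 4 bits of first byte
        "header_len_bytes": (ver_ihl & 0x0F) * 4, # field counts 32-bit words
        "type_of_service": tos,
        "total_length": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,                # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,   # low 13 bits
        "time_to_live": ttl,
        "upper_layer": proto,                     # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# Illustrative header: version 4, 5-word header, 40-byte datagram, TTL 64, TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("223.1.1.1"), socket.inet_aton("223.1.2.2"))
header = parse_ipv4_header(sample)
print(header)
```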

IP Addressing: Introduction
• IP address: 32-bit identifier for each host or router network interface
• Represented in dotted-decimal notation:
  223.1.1.1 = 11011111 00000001 00000001 00000001
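The dotted-decimal form is just the four bytes of the 32-bit value written in decimal. A small sketch of the conversion, with illustrative helper names `to_binary` and `to_int`:

```python
def to_binary(dotted: str) -> str:
    """Render a dotted-decimal IPv4 address as four 8-bit binary groups."""
    return " ".join(f"{int(octet):08b}" for octet in dotted.split("."))

def to_int(dotted: str) -> int:
    """Pack the four octets into a single 32-bit integer."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value

print(to_binary("223.1.1.1"))
print(hex(to_int("223.1.1.1")))
```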

IP Addressing
• Network interface: connection between host/router and physical link
  • routers typically have multiple interfaces
  • a host typically has one interface
  • a unique IP address is associated with each interface
• How do we assign IPs?
  • Divide the network into subnets, each with a common ID
[Figure: network with interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]

Subnets
• A subnet is a group of devices that
  • can reach each other without an intervening router
  • are identified by the high-order bits of their IP addresses
• Example subnets: 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24
• Address 223.1.1.1 in subnet 223.1.1.0/24:
  • Subnet ID: 11011111 00000001 00000001 (first 24 bits)
  • Host ID: 00000001 (last 8 bits)
• /24: # bits in subnet portion of address (the subnet mask)
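Whether two interfaces share a subnet can be checked by comparing their high-order bits. The sketch below leans on Python's standard `ipaddress` module for this; the helper name `same_subnet` is illustrative.

```python
import ipaddress

subnet = ipaddress.ip_network("223.1.1.0/24")

def same_subnet(address: str) -> bool:
    """True if the address shares the subnet's 24 high-order bits."""
    return ipaddress.ip_address(address) in subnet

print(subnet.netmask)            # the /24 written as a mask
print(same_subnet("223.1.1.4"))  # interface on 223.1.1.0/24
print(same_subnet("223.1.2.9"))  # interface on a different subnet
```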

Subnets
• How many subnets? 6 subnets
• Recipe: detach each interface from its host or router, creating isolated networks
  • each isolated network is a subnet
[Figure: network with interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2]

IP Addressing: CIDR
• CIDR: Classless InterDomain Routing
  • subnet portion of address of arbitrary length
  • address format: a.b.c.d/x, where x is # bits in subnet portion of address
  • Example: 200.23.16.0/23 = 11001000 00010111 0001000|0 00000000 (subnet part: first 23 bits; host part: last 9 bits)
• Old classful addressing:
  • Subnet length had to be /8 (class A), /16 (class B), or /24 (class C)
• Why CIDR?
  • Finer control over address allocation → reduce waste of addresses
  • Ex: a company with 2000 machines would have to get a class B, wasting 63,000+ addresses
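The waste figure in the example can be checked with a few lines of arithmetic. This is a rough sketch: it ignores the network and broadcast addresses that are normally reserved in each block, and `smallest_prefix` is an illustrative helper, not something from the slides.

```python
import ipaddress

def smallest_prefix(hosts: int) -> int:
    """Longest prefix length whose block still holds `hosts` addresses."""
    host_bits = 0
    while 2 ** host_bits < hosts:
        host_bits += 1
    return 32 - host_bits

class_b_size = 2 ** (32 - 16)       # a /16 holds 65,536 addresses
wasted = class_b_size - 2000        # addresses left unused by 2000 machines
cidr_prefix = smallest_prefix(2000) # what CIDR could hand out instead
example_block = ipaddress.ip_network("200.23.16.0/23")

print(wasted, cidr_prefix, example_block.num_addresses)
```

Under these assumptions a class B leaves 63,536 addresses unused, while a /21 block (2,048 addresses) would already cover the 2000 machines.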

IP Addresses: How to Get One?
Q: How does a host get an IP address?
• Hard-coded by system admin in a file
  • WIN: control-panel->network->configuration->tcp/ip->properties
  • UNIX: /etc/rc.config
• DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
  • “plug-and-play”

IP Addresses: How to Get One?
Q: How does a network get the subnet part of its IP address?
A: It gets allocated a portion of its provider ISP’s address space

  ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
  Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
  Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
  Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
  ...
  Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23

• ISPs get their address space from ICANN
  • ICANN: Internet Corporation for Assigned Names and Numbers
  • allocates addresses, manages DNS, and assigns domain names
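The split of the ISP's /20 into eight /23 blocks shown above can be reproduced with the standard `ipaddress` module. A minimal sketch; the variable names are illustrative:

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Extending the prefix from 20 to 23 bits yields 2**3 = 8 equal-sized blocks.
org_blocks = list(isp_block.subnets(new_prefix=23))

for number, block in enumerate(org_blocks):
    print(f"Organization {number}: {block}")
```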

Hierarchical Addressing: Route Aggregation
Hierarchical addressing allows efficient advertisement of routing information:
• Fly-By-Night-ISP serves Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23)
  • it advertises to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”
• ISPs-R-Us advertises to the Internet: “Send me anything with addresses beginning 199.31.0.0/16”
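Aggregation is the reverse of splitting a block: contiguous customer prefixes collapse into the single prefix the ISP advertises. The sketch below uses `ipaddress.collapse_addresses` from the standard library and assumes all eight organizations (0 through 7) hold consecutive /23 blocks, as in the address table.

```python
import ipaddress

# The eight /23 blocks: 200.23.16.0/23, 200.23.18.0/23, ..., 200.23.30.0/23.
org_blocks = [ipaddress.ip_network(f"200.23.{third}.0/23")
              for third in range(16, 32, 2)]

# Merge adjacent blocks into the shortest list of covering prefixes.
aggregate = list(ipaddress.collapse_addresses(org_blocks))
print(aggregate)

# Any address inside an organization's block matches the advertised prefix.
covered = ipaddress.ip_address("200.23.18.7") in aggregate[0]
print(covered)
```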

Graph Abstraction
• Graph: G = (N,E)
  • N = set of routers = {u, v, w, x, y, z}
  • E = set of links = {(u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z)}
• Cost of link (x1, x2):
  • metric value, e.g., c(w,z) = 5
  • could be 1 (typical), or
  • inversely related to bandwidth, or
  • inversely related to congestion
• Routing algorithm: find the least-cost path
[Figure: graph with nodes u, v, w, x, y, z and link costs 5, 3, 5, 2, 2, 1, 3, 1, 2, 1]

Classification of Routing Algorithms
Global or local information?
• Global:
  • all routers have complete topology, link cost info
  • “link state” algorithms
• Local:
  • each router knows physically-connected neighbors, link costs to neighbors
  • “distance vector” algorithms

A Link-State Routing Algorithm
Dijkstra’s algorithm
• Net topology, link costs known to all nodes
  • accomplished via “link state broadcast”
  • all nodes have same info
• Computes least cost paths from one node (source) to all other nodes
  • gives forwarding table for that node

A Link-State Routing Algorithm
Notation:
• c(x,y): link cost from node x to y; c(x,y) = ∞ if not direct neighbors
• D(v): current value of cost of path from source to dest. v
• p(v): predecessor node along path from source to v
• N': set of nodes whose least cost path is definitively known

Dijkstra’s Algorithm

Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min { D(v), D(w) + c(w,v) }
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'

Dijkstra’s algorithm: example

  Step  N'       D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
  0     u        2,u        5,u        1,u        ∞          ∞
  1     ux       2,u        4,x                   2,x        ∞
  2     uxy      2,u        3,y                              4,y
  3     uxyv                3,y                              4,y
  4     uxyvw                                                4,y
  5     uxyvwz

[Figure: graph with nodes u, v, w, x, y, z and link costs 5, 3, 5, 2, 2, 1, 3, 1, 2, 1]

Dijkstra’s algorithm: example (2)
Resulting shortest-path tree from u:
[Figure: tree rooted at u reaching v, w, x, y, z]

Resulting forwarding table in u:
  destination   link
  v             (u,v)
  x             (u,x)
  y             (u,x)
  w             (u,x)
  z             (u,x)
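The example can be replayed in code. The sketch below is not a line-by-line port of the pseudocode: it uses a priority queue (`heapq`) in place of the linear "find w with minimum D(w)" scan. The per-link costs in `GRAPH` are an assumption, namely one assignment of the figure's ten cost labels that is consistent with the costs quoted in these slides and with the example table; the function names are illustrative.

```python
import heapq

# Assumed link costs for the six-node example graph (undirected).
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def dijkstra(graph, source):
    """Return (D, p): least costs from source and predecessor of each node."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    pred = {}
    done = set()                      # plays the role of N'
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)
        if w in done:
            continue                  # stale queue entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist[v]:
                dist[v] = d + cost    # D(v) = min{D(v), D(w) + c(w,v)}
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

def first_hop(pred, source, dest):
    """Walk predecessors back from dest to find the neighbor of source to use."""
    node = dest
    while pred[node] != source:
        node = pred[node]
    return node

dist, pred = dijkstra(GRAPH, "u")
table = {dest: first_hop(pred, "u", dest) for dest in GRAPH if dest != "u"}
print(dist)
print(table)
```

With these costs, the first hop is v for destination v and x for every other destination, which is the forwarding table shown for u.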

Distance Vector Algorithm
Bellman-Ford Equation (dynamic programming)
• Define dx(y) := cost of least-cost path from x to y
• Then dx(y) = min_v { c(x,v) + dv(y) }, where the min is taken over all neighbors v of x

Bellman-Ford example
• Determine du(z)
• u has 3 neighbors: v, x, w, with dv(z) = 5, dx(z) = 3, dw(z) = 3
• B-F equation says:
  du(z) = min { c(u,v) + dv(z), c(u,x) + dx(z), c(u,w) + dw(z) }
        = min { 2 + 5, 1 + 3, 5 + 3 } = 4
• How would you use the B-F equation to construct shortest paths?
[Figure: graph with nodes u, v, w, x, y, z and link costs 5, 3, 5, 2, 2, 1, 3, 1, 2, 1]
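A single application of the B-F equation for du(z) can be written out directly, using the link costs and neighbor distances quoted in the example. The neighbor that achieves the minimum is also the natural next hop toward z, which is one way to answer the closing question. The names below are illustrative.

```python
# Link costs from u to its neighbors, and each neighbor's least cost to z.
cost_from_u = {"v": 2, "x": 1, "w": 5}
dist_to_z = {"v": 5, "x": 3, "w": 3}

def bellman_ford_step(link_costs, neighbor_dists):
    """d_x(y) = min over neighbors v of c(x,v) + d_v(y)."""
    return min(link_costs[v] + neighbor_dists[v] for v in link_costs)

d_u_z = bellman_ford_step(cost_from_u, dist_to_z)

# The minimizing neighbor is the next hop on a least-cost path from u to z.
next_hop = min(cost_from_u, key=lambda v: cost_from_u[v] + dist_to_z[v])

print(d_u_z, next_hop)
```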

Distance Vector Algorithm: Idea
Basic idea:
• Each node periodically sends its own distance vector estimate to neighbors
• When a node x receives a new DV estimate from a neighbor, it updates its own DV using the B-F equation:
  Dx(y) ← min_v { c(x,v) + Dv(y) } for each node y ∈ N
• Under minor, natural conditions, the estimates Dx(y) converge to the actual least cost dx(y)

Distance Vector Algorithm: Notes
• Dx(y) = estimate of least cost from x to y
• Distance vector: Dx = [Dx(y): y ∈ N]
• Node x knows cost to each neighbor v: c(x,v)
• Node x maintains Dx = [Dx(y): y ∈ N]
• Node x also maintains its neighbors’ distance vectors, that is:
  • x maintains Dv = [Dv(y): y ∈ N] for every neighbor v

Distance Vector Algorithm
• Iterative
  • continues until no more info is exchanged
  • each iteration caused by:
    • local link cost change
    • DV update message from neighbor
• Asynchronous
  • nodes do not operate in lockstep
• Distributed
  • each node receives info only from its directly attached neighbors
  • NO global info

Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute estimates
  if DV to any dest has changed, notify neighbors

Example
Three nodes with link costs c(x,y) = 2, c(y,z) = 1, c(x,z) = 7. Each table shows the distance vectors a node holds (rows: from; columns: cost to), at three successive times.

Node x table
          time 0         time 1         time 2
  from    x   y   z      x   y   z      x   y   z
  x       0   2   7      0   2   3      0   2   3
  y       ∞   ∞   ∞      2   0   1      2   0   1
  z       ∞   ∞   ∞      7   1   0      3   1   0

Node x's update at time 1:
  Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0, 7+1} = 2
  Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1, 7+0} = 3

Node y table
          time 0         time 1         time 2
  from    x   y   z      x   y   z      x   y   z
  x       ∞   ∞   ∞      0   2   7      0   2   3
  y       2   0   1      2   0   1      2   0   1
  z       ∞   ∞   ∞      7   1   0      3   1   0

Node z table
          time 0         time 1         time 2
  from    x   y   z      x   y   z      x   y   z
  x       ∞   ∞   ∞      0   2   7      0   2   3
  y       ∞   ∞   ∞      2   0   1      2   0   1
  z       7   1   0      3   1   0      3   1   0
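The three-node exchange can be simulated in a few lines. Note the simplification: the algorithm on the slides is asynchronous, whereas this sketch runs synchronous rounds in which every node applies the B-F equation to its neighbors' vectors from the previous round. The link costs are the ones from the example; all function names are illustrative.

```python
INF = float("inf")

# Link costs from the example: c(x,y) = 2, c(y,z) = 1, c(x,z) = 7.
COST = {"x": {"y": 2, "z": 7}, "y": {"x": 2, "z": 1}, "z": {"x": 7, "y": 1}}
NODES = sorted(COST)

def initial_vectors():
    """Each node starts out knowing only the cost to its direct neighbors."""
    return {n: {d: 0 if d == n else COST[n].get(d, INF) for d in NODES}
            for n in NODES}

def dv_round(vectors):
    """One synchronous round of D_x(y) = min_v { c(x,v) + D_v(y) }."""
    updated = {}
    for x in NODES:
        updated[x] = {}
        for y in NODES:
            if y == x:
                updated[x][y] = 0
            else:
                updated[x][y] = min(COST[x][v] + vectors[v][y] for v in COST[x])
    return updated

def run_until_stable(vectors):
    """Repeat rounds until no vector changes; return vectors and rounds used."""
    rounds = 0
    while True:
        updated = dv_round(vectors)
        if updated == vectors:
            return vectors, rounds
        vectors, rounds = updated, rounds + 1

final, rounds = run_until_stable(initial_vectors())
print(final, rounds)
```

After one round, x and z both learn the cost-3 path through y, and a further round changes nothing, matching the final tables of the example.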

Comparison of LS and DV algorithms
• Message complexity
  • LS: with n nodes, E links, O(nE) msgs sent
  • DV: exchange between neighbors only, but send entire table
• Speed of convergence
  • LS: O(n^2) algorithm requires O(nE) msgs
    • may have oscillations
  • DV: convergence time varies
    • may be routing loops
    • count-to-infinity problem
• Robustness: what happens if a router malfunctions?
  • LS:
    • node can advertise incorrect link cost
    • each node computes only its own table → some degree of robustness
  • DV:
    • node can advertise incorrect path cost
    • each node’s table used by others → error propagates through the network
• In the Internet:
  • LS: OSPF (recent, more features)
  • DV: RIP (old, small nets)

Hierarchical Routing
Our routing study thus far is an idealization:
• all routers identical
• network “flat”
… not true in practice
• Scale: with 200 million destinations:
  • can’t store all dest’s in routing tables!
  • routing table exchange would swamp links!
• Administrative autonomy
  • internet = network of networks
  • each network admin may want to control routing in its own network

Hierarchical Routing
• Aggregate routers into regions, “autonomous systems” (AS)
• Routers in same AS run same routing protocol
  • “intra-AS” routing protocol
  • routers in different ASes can run different intra-AS routing protocols
• Gateway router
  • has a direct link to a router in another AS
  • must use the same inter-AS routing protocol

Interconnected ASes
• Forwarding table is configured by both intra- and inter-AS routing protocols
  • intra-AS sets entries for internal destinations
  • inter-AS and intra-AS together set entries for external destinations
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c); inside a router, the intra-AS and inter-AS routing protocols both feed the forwarding table]

Inter-AS tasks
• Suppose a router in AS1 receives a datagram whose dest is outside of AS1
  • Router should forward packet towards one of the gateway routers, but which one?
• AS1 needs:
  • to learn which dests are reachable through AS2 and which through AS3
  • to propagate this reachability info to all routers in AS1
• Job of inter-AS routing!
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c)]

Example: Choosing among multiple ASes
• Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2
• To configure its forwarding table, router 1d must determine towards which gateway it should forward packets for dest x
• Hot potato routing: send packet towards the closest of the two routers

Steps:
1. Learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. Use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. Hot potato routing: choose the gateway that has the smallest least cost
4. Determine from forwarding table the interface I that leads to the least-cost gateway
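The hot potato choice can be sketched as a minimum over intra-AS costs. Everything numeric here is hypothetical: the slides give no costs or interface names, so the values for router 1d below are made up purely to show the selection step, and `hot_potato` is an illustrative name.

```python
# Hypothetical intra-AS least costs from router 1d to AS1's two gateways
# (1c peers with AS3, 1b peers with AS2). Values are invented for illustration.
gateway_costs = {"1c": 1, "1b": 2}

# Hypothetical interfaces on 1d that lead toward each gateway.
gateway_interface = {"1c": "if0", "1b": "if1"}

def hot_potato(costs, interfaces):
    """Pick the gateway with the smallest intra-AS cost and its interface."""
    gateway = min(costs, key=costs.get)
    return gateway, interfaces[gateway]

gateway, interface = hot_potato(gateway_costs, gateway_interface)

# Forwarding table entry for external subnet x: (x, I).
forwarding_entry = ("x", interface)
print(gateway, forwarding_entry)
```

The choice depends only on costs inside AS1, not on how far subnet x is beyond the gateway, which is what gives hot potato routing its name.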

Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• BGP provides each AS a means to:
  • obtain subnet reachability information from neighboring ASes (reachability = AS path)
  • propagate the reachability information to all routers internal to the AS
  • determine “good” routes to subnets based on reachability information and policy
• BGP allows a subnet to advertise its existence to the rest of the Internet: “I am here”

BGP basics
• Pairs of routers (BGP peers) exchange routing info over semi-permanent TCP connections: BGP sessions
• Note: BGP sessions do not correspond to physical links
• When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix
  • AS2 can aggregate prefixes in its advertisement
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c), with eBGP sessions between ASes and iBGP sessions within an AS]

Distributing reachability info
• With the eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1
• 1c can then use iBGP to distribute this new prefix reachability info to all routers in AS1
• 1b can then re-advertise the new reachability info to AS2 over the 1b-to-2a eBGP session
• When a router learns about a new prefix, it creates an entry for the prefix in its forwarding table
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c), and AS3 (3a, 3b, 3c), with eBGP sessions between ASes and iBGP sessions within an AS]
