This lecture covers the network layer functions, implementation of the network layer with IP, router architecture overview, routing versus forwarding, input port functions, queuing, and forwarding engine.
CSE 524: Lecture 9 Network layer (Part 4)
Administrative • Homework #3 due Wednesday (10/24) • E-mail confirmation of research paper topic due next Monday (10/29) • Midterm next Monday (10/29)
Network layer (so far) • Network layer functions • Network layer implementation (IP) • Today • Network layer devices (routers) • Network processors • Input/output port functions • Forwarding functions • Switching fabric • Advanced network layer topics • Routing problems • Routing metric selection • Overlay networks
NL: Router Architecture Overview Key router functions: • Run routing algorithms/protocol (RIP, OSPF, BGP) and construct routing table • Switch/forward datagrams from incoming to outgoing link based on route
NL: Routing vs. Forwarding • Routing: process by which the forwarding table is built and maintained • One or more routing protocols • Procedures (algorithms) to convert routing info to forwarding table. • Forwarding: the process of moving packets from input to output • The forwarding table • Information in the packet
NL: What Does a Router Look Like? • Network processor/controller • Handles routing protocols, error conditions • Line cards • Network interface cards • Forwarding engine • Fast path routing (hardware vs. software) • Backplane • Switch or bus interconnect
NL: Network Processor • Runs routing protocol and downloads forwarding table to forwarding engines • Use two forwarding tables per engine to allow easy switchover (double buffering) • Typically performs “slow” path processing • ICMP error messages • IP option processing • IP fragmentation • IP multicast packets
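The double-buffered forwarding table can be sketched as follows. This is a minimal illustration, not the design of any particular router: the class name `ForwardingEngine`, its methods, and the dictionary-based table are all hypothetical.

```python
class ForwardingEngine:
    """Holds two forwarding tables; lookups only ever read the active one."""

    def __init__(self):
        self.tables = [{}, {}]
        self.active = 0

    def lookup(self, prefix):
        # Fast path: read from the active table only.
        return self.tables[self.active].get(prefix)

    def download(self, new_table):
        # Slow path: the network processor fills the standby table,
        # then switches over in a single step.
        standby = 1 - self.active
        self.tables[standby] = dict(new_table)
        self.active = standby


engine = ForwardingEngine()
engine.download({"128.2/16": "port1"})
engine.download({"128.2/16": "port2"})
print(engine.lookup("128.2/16"))
```

Because lookups never touch the standby table, an update in progress cannot expose a half-written table to the fast path.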
NL: Fast-path router processing • Packet arrives at inbound line card • Header transferred to forwarding engine • Forwarding engine determines output interface • Forwarding engine signals result to line card • Packet copied to outbound line card
NL: Input Port Functions • Physical layer: bit-level reception • Data link layer: e.g., Ethernet (see chapter 5) • Decentralized switching: • given datagram dest., look up output port using routing table in input port memory • goal: complete input port processing at ‘line speed’ • queuing: if datagrams arrive faster than forwarding rate into switch fabric
NL: Input Port Queuing • Fabric slower than input ports combined => queuing may occur at input queues • Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward • queueing delay and loss due to input buffer overflow!
NL: Input Port Queuing • Possible solution • Virtual output buffering • Maintain per output buffer at input • Solves head of line blocking problem • Each of MxN input buffer places bid for output • Crossbar connect • Challenge: map of bids to schedule for crossbar
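The difference between a single FIFO per input and virtual output buffering can be seen in a one-slot toy model. This is a sketch under simplifying assumptions: each input queue is a list of destination output numbers, each output accepts one packet per slot, and the scheduler is a simple greedy pass in input order (real crossbar schedulers are more elaborate). The function names are hypothetical.

```python
def fifo_slot(queues):
    """One time slot with a single FIFO per input: only head packets compete."""
    taken, sent = set(), []
    for inp, queue in enumerate(queues):
        if queue and queue[0] not in taken:
            taken.add(queue[0])
            sent.append((inp, queue[0]))
    return sent


def voq_slot(queues):
    """One time slot with per-output buffers: any queued packet may bid."""
    taken, sent = set(), []
    for inp, queue in enumerate(queues):
        for out in queue:
            if out not in taken:  # first destination whose output is free
                taken.add(out)
                sent.append((inp, out))
                break
    return sent


# Input 0 holds packets for outputs 0 then 1; input 1 for outputs 0 then 2.
queues = [[0, 1], [0, 2]]
print(fifo_slot(queues))  # input 1 is blocked behind its head packet
print(voq_slot(queues))   # input 1 sends its packet for output 2 instead
```

With the FIFO model only one packet crosses the fabric in this slot, even though output 2 is idle; with per-output buffers two packets cross.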
NL: Forwarding Engine • General purpose processor + software • Packet trains help route hit rate • Packet train = sequence of packets for same/similar flows • Similar to idea behind IP switching (ATM/MPLS) where long-lived flows map into single label • Example • Partridge et al., “A 50-Gb/s IP Router”, IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998. • 8KB L1 Icache • Holds full forwarding code • 96KB L2 cache • Forwarding table cache • 16MB L3 cache • Full forwarding table x 2 - double buffered for updates
NL: Binary trie • Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111* • [Figure: binary trie for these prefixes, branching on 0 or 1 at each bit; nodes where a prefix ends are labeled with its route]
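A longest-prefix match over the route prefixes A through I on this slide can be sketched with a one-bit-per-level trie. This is a minimal illustration; the names `Node`, `build`, and `longest_prefix_match` are hypothetical, and addresses are given as bit strings for readability.

```python
PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}


class Node:
    def __init__(self):
        self.child = {}    # maps "0" or "1" to the next Node
        self.route = None  # route name if a prefix ends at this node


def build(prefixes):
    root = Node()
    for name, bits in prefixes.items():
        node = root
        for bit in bits:
            node = node.child.setdefault(bit, Node())
        node.route = name
    return root


def longest_prefix_match(root, addr_bits):
    """Walk one bit at a time, remembering the last route seen on the path."""
    node, best = root, None
    for bit in addr_bits:
        node = node.child.get(bit)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best


trie = build(PREFIXES)
for addr in ("01000111", "0101", "0111", "1011"):
    print(addr, longest_prefix_match(trie, addr))
```

Each lookup costs one memory access per address bit in the worst case, which is the motivation for the compressed and multi-bit variants on the following slides.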
NL: Path-compressed binary trie • Eliminate single branch point nodes • Variants include PATRICIA and BSD tries • Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111* • [Figure: path-compressed trie for these prefixes; each internal node records which bit to test (Bit=1 at the root, then Bit=2, Bit=3, or Bit=4 below)]
NL: Patricia tries and variable prefix match • Patricia Tree • Arrange route entries into a series of bit tests • Worst case = 32 bit tests • Problem: memory speed is a bottleneck • Used in older BSD Unix routing implementations • [Figure: Patricia tree; each node gives the bit to test (0 = left child, 1 = right child); nodes test bits 0, 10, 16, and 19; entries are default 0/0, 128.2/16, 128.32/16, 128.32.130/24, and 128.32.150/24]
NL: Multi-bit tries • Compare multiple bits at a time • Reduces memory accesses • Forces table expansion for prefixes falling in between strides • Variable-length multi-bit tries • Fixed-length multi-bit tries • Most route entries are Class C • Cut prefix tree at 16 bit depth • Many prefixes 8, 16, 24 bits in length • 64K bit mask • Bit = 1 if tree continues below cut (root head) • Bit = 1 if leaf at depth 16 or less (genuine head) • Bit = 0 if part of range covered by leaf
NL: Variable stride multi-bit trie • Single level has variable stride lengths • Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111* • [Figure: variable-stride trie for these prefixes; the root uses a 2-bit stride (00 A, 01 A, 10 D, 11 D) and lower nodes use 1-bit or 2-bit strides]
NL: Fixed stride multi-bit trie • Single level has equal strides • Route prefixes: A 0*, B 01000*, C 011*, D 1*, E 100*, F 1100*, G 1101*, H 1110*, I 1111* • [Figure: fixed-stride trie for these prefixes; 3-bit first level (000 A, 001 A, 010 A, 011 C, 100 E, 101 D, 110 D, 111 D) and 2-bit second-level nodes holding B, F F G G, and H H I I]
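The table expansion that a fixed-stride trie forces can be sketched as below, using the slide's prefixes. The strides of 3 and 2 bits are an assumption read off the figure, the function names are hypothetical, and the sketch only handles prefixes no longer than the sum of the strides.

```python
PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}
STRIDES = [3, 2]  # assumed strides; together they cover the longest prefix


def build_fixed_stride(prefixes, strides):
    """Build nested tables; each slot is [route, child_table]."""
    root = {}
    # Insert shorter prefixes first so longer ones overwrite expanded slots.
    for name, bits in sorted(prefixes.items(), key=lambda kv: len(kv[1])):
        node, pos = root, 0
        for stride in strides:
            rest = bits[pos:]
            if len(rest) <= stride:
                pad = stride - len(rest)
                # Expand the prefix into every slot it covers at this level.
                for i in range(2 ** pad):
                    suffix = format(i, "0{}b".format(pad)) if pad else ""
                    node.setdefault(rest + suffix, [None, None])[0] = name
                break
            slot = node.setdefault(rest[:stride], [None, None])
            if slot[1] is None:
                slot[1] = {}
            node, pos = slot[1], pos + stride
    return root


def lookup(root, addr_bits, strides):
    """Consume one stride per memory access, keeping the last route seen."""
    node, best, pos = root, None, 0
    for stride in strides:
        slot = node.get(addr_bits[pos:pos + stride]) if node else None
        if slot is None:
            break
        if slot[0] is not None:
            best = slot[0]
        node, pos = slot[1], pos + stride
    return best


table = build_fixed_stride(PREFIXES, STRIDES)
first_level = [table[format(i, "03b")][0] for i in range(8)]
print(first_level)
print(lookup(table, "01000", STRIDES), lookup(table, "11010", STRIDES))
```

A 5-bit lookup now takes at most two memory accesses instead of five, at the cost of storing A and D in several expanded slots.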
NL: Other data structures • Ruiz-Sanchez, Biersack, Dabbous, “Survey and Taxonomy of IP address Lookup Algorithms”, IEEE Network, Vol. 15, No. 2, March 2001 • LC trie • Lulea trie • Full expansion/compression • Binary search on prefix lengths • Binary range search • Multiway range search • Multiway range trees • Binary search on hash tables (Waldvogel – SIGCOMM 97)
NL: Prefix Match issues • Scaling • IPv6 • Stride choice • Tuning stride to route table • Bit shuffling
NL: Speeding up Prefix Match - Alternatives • Route caches • Temporal locality • Many packets to same destination • Protocol acceleration • Add clue (5 bits) to IP header • Indicate where IP lookup ended on previous node (Bremler-Barr SIGCOMM 99) • Content addressable memory (CAM) • Hardware based route lookup • Input = tag, output = value associated with tag • Requires exact match with tag • Multiple cycles (1 per prefix searched) with single CAM • Multiple CAMs (1 per prefix) searched in parallel • Ternary CAM • 0,1,don’t care values in tag match • Priority (i.e. longest prefix) by order of entries in CAM
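Ternary CAM matching with priority by entry order can be modeled in software as below. This is only a sketch: a real TCAM compares all entries in parallel and a priority encoder picks the first hit, whereas this loop checks entries one by one. The 5-bit key width, the reuse of the earlier example prefixes, and the function names are assumptions for illustration.

```python
PREFIXES = {"A": "0", "B": "01000", "C": "011", "D": "1", "E": "100",
            "F": "1100", "G": "1101", "H": "1110", "I": "1111"}
WIDTH = 5  # assumed key width in bits


def build_tcam(prefixes, width):
    """Pad each prefix with don't-care symbols; longest prefixes go first."""
    entries = [(bits + "*" * (width - len(bits)), name)
               for name, bits in prefixes.items()]
    entries.sort(key=lambda entry: entry[0].count("*"))
    return entries


def tcam_lookup(entries, key):
    """Return the value of the first entry whose pattern matches the key."""
    for pattern, value in entries:
        if all(p in ("*", k) for p, k in zip(pattern, key)):
            return value
    return None


tcam = build_tcam(PREFIXES, WIDTH)
print(tcam[0], tcam_lookup(tcam, "01001"), tcam_lookup(tcam, "11011"))
```

Ordering the entries from fewest to most don't-care bits is what turns "first match" into "longest prefix match".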
NL: Types of network switching fabrics • Memory • Crossbar interconnection • Multistage interconnection • Bus
NL: Types of network switching fabrics • Issues • Switch contention • Packets arrive faster than switching fabric can switch • Speed of switching fabric versus line card speed determines input queuing vs. output queuing
NL: Switching Via Memory • First generation routers: • packet copied by system’s (single) CPU • 2 bus crossings per datagram • speed limited by memory bandwidth • Modern routers: • input port processor performs lookup, copy into memory • Cisco Catalyst 8500 • [Figure: input port, memory, and output port connected by the system bus]
NL: Switching Via Bus • Datagram from input port memory to output port memory via a shared bus • Bus contention: switching speed limited by bus bandwidth • 1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)
NL: Switching Via An Interconnection Network • Overcome bus bandwidth limitations • Crossbar networks • Fully connected (n² elements) • All one-to-one, invertible permutations supported
NL: Switching Via An Interconnection Network • Crossbar with n² elements hard to scale • Multi-stage interconnection networks (Banyan) • Initially developed to connect processors in multiprocessor • Typically O(n log n) elements • Datagram fragmented into fixed-length cells • Cells switched through the fabric • Cisco 12000: Gbps through an interconnection network • Blocking possible (not all one-to-one, invertible permutations supported) • [Figure: multistage network connecting inputs A, B, C, D to outputs W, X, Y, Z]
NL: Output Ports • Output contention • Datagrams arrive from fabric faster than output port’s transmission rate • Buffering required • Scheduling discipline chooses among queued datagrams for transmission
NL: Output port queueing • buffering when arrival rate via switch exceeds output line speed • queueing (delay) and loss due to output port buffer overflow!
NL: Advanced topics • Routing synchronization • Routing instability • Routing metrics • Overlay networks
NL: Routing Update Synchronization • Another interesting robustness issue to consider... • Even apparently independent processes can eventually synchronize • Intuitive assumption that independent streams will not synchronize is not always valid • Periodic routing protocol messages from different routers • Abrupt transition from unsynchronized to synchronized system states
NL: Examples/Sources of Synchronization • TCP congestion windows • Cyclical behavior shared by flows through gateway • Periodic transmission by audio/video applications • Periodic downloads • Synchronized client restart • After a catastrophic failure • Periodic routing messages • Manifests itself as periodic packet loss on pings • Pendulum clocks on same wall • Automobile traffic patterns
NL: How Synchronization Occurs • Weak coupling: A’s behavior is triggered off of B’s message arrival • Weak coupling can result in eventual synchronization • [Figure: router A’s timer period T and the arrival of a message from B]
NL: Routing Source of Synchronization • Router resets timer after processing its own and incoming updates • Creates weak coupling among routers • Solutions • Set timer based on clock event that is not a function of processing other routers’ updates, or • Add randomization, or reset timer before processing update • With increasing randomization, abrupt transition from predominantly synchronized to predominantly unsynchronized • Most protocols now incorporate some form of randomization
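The randomization fix can be sketched as a jittered timer that is scheduled from the moment the timer fires, not from the end of update processing. The function name, the 30-second period, and the jitter range of plus or minus half a period are illustrative assumptions, not values from the lecture.

```python
import random


def next_update_time(fired_at, period, jitter_fraction, rng=random):
    """Schedule the next periodic update with uniform random jitter.

    fired_at is when the timer expired, so the time spent processing other
    routers' updates does not shift the schedule (no weak coupling).
    """
    jitter = rng.uniform(-jitter_fraction, jitter_fraction) * period
    return fired_at + period + jitter


rng = random.Random(1)  # seeded only to make the demo repeatable
times = [next_update_time(0.0, 30.0, 0.5, rng) for _ in range(5)]
print(times)
```

Each router draws its own jitter, so two routers that happen to fire together are unlikely to stay aligned on the next round.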
NL: Routing Instability • References • C. Labovitz, R. Malan, F. Jahanian, “Internet Routing Stability”, SIGCOMM 1997. • Record of BGP messages at major exchanges • Discovered orders of magnitude more updates than expected • Bulk were duplicate withdrawals • Stateless implementation of BGP – did not keep track of information passed to peers • Impact of few implementations • Strong frequency (30/60 sec) components • Interaction with other local routing/links etc.
NL: Route Flap Storm • Overloaded routers fail to send Keep_Alive message and marked as down • BGP peers find alternate paths • Overloaded router re-establishes peering session • Must send large updates • Increased load causes more routers to fail!
NL: Route Flap Dampening • Routers now give higher priority to BGP/Keep_Alive to avoid problem • Associate a penalty with each route • Increase when route flaps • Exponentially decay penalty with time • When penalty reaches threshold, suppress route
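The penalty scheme above can be sketched as follows. The class name and the numeric values (half-life of 900 seconds, 1000 penalty per flap, suppress threshold of 2000) are assumptions chosen for illustration; deployed implementations also use a separate, lower reuse threshold, which this sketch omits.

```python
class DampedRoute:
    """Tracks a per-route penalty that grows on flaps and decays over time."""

    def __init__(self, half_life, flap_penalty, suppress_threshold):
        self.half_life = half_life
        self.flap_penalty = flap_penalty
        self.suppress_threshold = suppress_threshold
        self.penalty = 0.0
        self.last_update = 0.0

    def _decay(self, now):
        # Exponential decay: the penalty halves every half_life seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        self._decay(now)
        self.penalty += self.flap_penalty

    def suppressed(self, now):
        self._decay(now)
        return self.penalty >= self.suppress_threshold


route = DampedRoute(half_life=900.0, flap_penalty=1000.0,
                    suppress_threshold=2000.0)
route.flap(0.0)
route.flap(10.0)
after_two = route.suppressed(10.0)
route.flap(20.0)
after_three = route.suppressed(20.0)
after_quiet = route.suppressed(920.0)
print(after_two, after_three, after_quiet)
```

Two quick flaps stay just under the threshold because the first penalty has already decayed slightly; the third flap pushes the route over, and a quiet half-life later it is usable again.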
NL: BGP Oscillations • Can possibly explore every possible path through the network: (n-1)! combinations • Minimum interval between update messages (MinRouteAdver) reduces exploration • Forces router to process all outstanding messages • Typical Internet failover times • New/shorter link 60 seconds • Results in simple replacement at nodes • Down link 180 seconds • Results in search of possible options • Longer link 120 seconds • Results in replacement or search based on length
NL: Routing Metrics • Choice of link cost defines traffic load • Low cost = high probability link belongs to SPT and will attract traffic, which increases cost • Main problem: convergence • Avoid oscillations • Achieve good network utilization
NL: Metric Choices • Static metrics (e.g., hop count) • Good only if links are homogeneous • Definitely not the case in the Internet • Static metrics do not take into account • Link delay • Link capacity • Link load (hard to measure)
NL: Original ARPANET Metric • Cost proportional to queue size • Instantaneous queue length as delay estimator • Problems • Did not take into account link speed • Poor indicator of expected delay due to rapid fluctuations • Delay may be longer even if queue size is small due to contention for other resources
NL: Metric 2 - Delay Shortest Path Tree • Delay = (depart time - arrival time) + transmission time + link propagation delay • (Depart time - arrival time) captures queuing • Transmission time captures link capacity • Link propagation delay captures the physical length of the link • Measurements averaged over 10 seconds • Update sent if difference > threshold, or every 50 seconds
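The delay formula and the update rule can be written out as below. Transmission time is computed as packet length divided by link bandwidth; the function names, the 560-bit packet, and the 0.05-second threshold are illustrative assumptions, while the 50-second maximum interval comes from the slide.

```python
def packet_delay(arrival, depart, packet_bits, link_bps, propagation):
    """Delay = queuing (depart - arrival) + transmission + propagation."""
    transmission = packet_bits / link_bps
    return (depart - arrival) + transmission + propagation


def should_report(avg_delay, last_reported, threshold, since_last_report,
                  max_interval=50.0):
    """Send an update on a large change, or when the interval has elapsed."""
    changed = abs(avg_delay - last_reported) > threshold
    return changed or since_last_report >= max_interval


# A 560-bit packet on a 56 kbps link: 0.25 s queued, 0.5 s propagation.
delay = packet_delay(0.0, 0.25, 560, 56000, 0.5)
print(delay)
print(should_report(0.10, 0.09, 0.05, 10.0))
print(should_report(0.20, 0.09, 0.05, 10.0))
```

On a lightly loaded link the queuing term is near zero, so the static transmission and propagation terms dominate, which matches the behavior described on the next slide.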
NL: Performance of Metric 2 • Works well for light to moderate load • Static values dominate • Oscillates under heavy load • Queuing dominates
NL: Specific Problems • Range is too wide • 9.6 Kbps highly loaded link can appear 127 times costlier than 56 Kbps lightly loaded link • Can make a 127-hop path look better than 1-hop • No limit to change between reports • All nodes calculate routes simultaneously • Triggered by link update
NL: Example • [Figure: Net X and Net Y connected via A and B]
NL: Example • After everyone re-calculates routes: • [Figure: the same topology, Net X and Net Y connected via A and B] • Oscillations!
NL: Consequences • Low network utilization (50% in example) • Congestion can spread elsewhere • Routes could oscillate between short and long paths • Large swings lead to frequent route updates • More messages • Frequent SPT re-calculation
NL: Revised Link Metric • Better metric: packet delay = f(queueing, transmission, propagation) • When lightly loaded, transmission and propagation are good predictors • When heavily loaded, queueing delay is dominant, so transmission and propagation are bad predictors
NL: Normalized Metric • If a loaded link looks very bad then everyone will move off of it • Want some to stay on to load balance and avoid oscillations • It is still an OK path for some • Hop normalized metric diverts routes that have an alternate that is not too much longer • Also limit relative values and the range of values advertised, so that change is gradual
NL: Revised Metric • Limits on relative change • Measured link delay is taken over 10 sec period • Link utilization is computed as 0.5 * current sample + 0.5 * last average • Max change limited to slightly more than ½ hop • Min change limited to slightly less than ½ hop • Bounds oscillations • Normalized according to link type • Satellite should look good when queueing on other links increases
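The smoothing and change-limiting steps can be sketched as below. The averaging weights come from the slide; everything else is an assumption for illustration: the function names, the reading of the two limits as separate bounds on upward and downward movement, and the numbers (a hop worth 30 cost units, with bounds of 17 up and 13 down standing in for "slightly more" and "slightly less" than half a hop).

```python
def smooth_utilization(sample, last_average):
    """Average the current sample with the previous average, equally weighted."""
    return 0.5 * sample + 0.5 * last_average


def limit_change(new_cost, last_cost, max_up, max_down):
    """Clamp how far the advertised cost may move in one reporting period."""
    delta = new_cost - last_cost
    delta = max(-max_down, min(max_up, delta))
    return last_cost + delta


print(smooth_utilization(1.0, 0.5))
print(limit_change(100, 30, max_up=17, max_down=13))
print(limit_change(10, 30, max_up=17, max_down=13))
```

Bounding each step means a suddenly congested link sheds only the routes with nearly equal alternatives, instead of losing all of its traffic at once and then attracting it all back.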