
Network Address Translation ✧ Inside Internet Routers


Presentation Transcript


  1. Network Address Translation ✧ Inside Internet Routers GZ01 Networked Systems, Lecture 7 Kyle Jamieson Department of Computer Science, University College London

  2. Today • Network address translation (NAT) • Inside internet routers • Architecture • Crossbar scheduling: iSLIP algorithm • Longest-prefix lookup: Luleå algorithm

  3. Network Address Translation (NAT) • Motivation • IP address space exhaustion • Home users don’t want to and can’t manage IP addresses • Often most communication is within a network (e.g. intranet) • NAT: Main idea • Create a private network or realm with its own IP address space • 10.0.0.0−10.255.255.255 (10/8 prefix) • 172.16.0.0−172.31.255.255 (172.16/12 prefix) • 192.168.0.0−192.168.255.255 (192.168/16 prefix) • Private addresses only have meaning within their realm • NAT-enabled router (NAT box) allows communication out
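The three RFC 1918 private ranges listed above can be checked mechanically. A minimal Python sketch using the standard `ipaddress` module; the function name is my own, not from the lecture:

```python
import ipaddress

# The private realms from the slide: 10/8, 172.16/12, 192.168/16
PRIVATE_REALMS = [ipaddress.ip_network(p) for p in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_private_realm(addr: str) -> bool:
    # True if addr only has meaning within a private realm
    a = ipaddress.ip_address(addr)
    return any(a in net for net in PRIVATE_REALMS)
```

Note that 172.16/12 runs up to 172.31.255.255, so 172.32.0.0 is already public.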

  4. NAT in action • LAN side: hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT-enabled router; WAN address 138.76.29.7; an ordinary Internet router beyond it • Outgoing packet S=10.0.0.1:3345, D=128.119.40.186:80 is rewritten to S=138.76.29.7:5001, D=128.119.40.186:80 • Choice of any source port number not already in the table • Transparent to the web server on the Internet • Reply S=128.119.40.186:80, D=138.76.29.7:5001 is indexed into the translation table using destination IP:port and rewritten to D=10.0.0.1:3345 • NAT translation table: WAN side 138.76.29.7:5001 ↔ LAN side 10.0.0.1:3345
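The translation table in this example can be sketched as a small Python class. This is an illustrative toy, not the lecture's code; the class and method names are my own, and the port pool simply counts up from an unused port:

```python
class NatBox:
    """Minimal sketch of a NAT translation table (WAN side <-> LAN side)."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port   # any source port not already in the table
        self.out = {}    # (lan_ip, lan_port) -> wan_port
        self.back = {}   # wan_port -> (lan_ip, lan_port)

    def translate_out(self, src_ip, src_port):
        # Rewrite an outgoing packet's source to the WAN address:port
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.wan_ip, self.out[key]

    def translate_in(self, dst_port):
        # Index into the table using the reply's destination port
        return self.back[dst_port]
```

With the slide's numbers: an outgoing packet from 10.0.0.1:3345 leaves as 138.76.29.7:5001, and the reply to port 5001 maps back to 10.0.0.1:3345.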

  5. NAT: Discussion • Huge impact on the Internet • America Online (AOL) used to run a NAT serving 10+ million users • Now most household Internet routers are NAT boxes • Hosts behind a NAT cannot be servers (although NAT traversal algorithms exist) • Objections to NAT • Routers should process packets only up to L3 • NAT violates the end-to-end argument: hosts should talk directly with each other, without intermediate nodes modifying IP addresses and port numbers • We should use IPv6 (more addresses) rather than a stopgap solution like NAT • What if applications put IP addresses inside the packet payload? • Breaks applications: e.g. FTP, P2P • Breaks end-to-end transparency

  6. Today • Network address translation (NAT) • Inside internet routers • Architecture • Crossbar scheduling: iSLIP algorithm • Longest-prefix lookup: Luleå algorithm [Image: Cisco Gigabit Switch Router 12816]

  7. The forwarding problem • SONET optical fiber links • OC-48 @ 2.4 Gbits/s: backbones of secondary ISPs • OC-192 @ 10 Gbits/s: widespread in the core • OC-768 @ 40 Gbits/s: deployed in a few core links • Have to handle minimum-sized packets (40−64 bytes) • At 10 Gbits/s have 32−51 ns to decide what to do with each packet • DRAM latency ≈ 50 ns; SRAM latency ≈ 5 ns
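The 32−51 ns budget quoted above is just the serialization time of a minimum-sized packet at line rate. A quick check (the helper name is my own):

```python
def packet_time_ns(pkt_bytes, link_gbps):
    # Serialization time of one packet in nanoseconds:
    # bits on the wire divided by the link rate in Gbit/s
    return pkt_bytes * 8 / link_gbps
```

At OC-192 (10 Gbit/s) a 40-byte packet gives 32 ns and a 64-byte packet 51.2 ns, matching the slide; at OC-768 (40 Gbit/s) the budget shrinks to 8 ns, well under a single DRAM access.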

  8. Router architecture • Data path: functions performed on each datagram • Forwarding decision • Switching fabric (backplane) • Output link scheduling • Control plane: functions performed relatively infrequently • Routing table information exchange with others • Configuration and management

  9. Input port functionality • IP address lookup • CIDR longest-prefix match • Copy of forwarding table from control processor • Check IP header, decrement TTL, recalculate checksum, prepend next-hop link-layer address • Input queuing if switch fabric can’t handle n × R bits/second (n input ports, each at rate R)
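The per-hop header processing above (decrement TTL, recompute the header checksum) can be sketched in Python. This is a simplified sketch of standard IPv4 processing, not router source code: it recomputes the full checksum rather than updating it incrementally, and it assumes TTL > 0 (a packet whose TTL reaches 0 is dropped, not forwarded):

```python
def ipv4_checksum(header):
    # One's-complement sum of the header's 16-bit words
    # (the checksum field itself must be zeroed first)
    s = 0
    for i in range(0, len(header), 2):
        s += (header[i] << 8) | header[i + 1]
        s = (s & 0xFFFF) + (s >> 16)   # end-around carry
    return (~s) & 0xFFFF

def forward_header(header):
    # Per-hop processing: decrement TTL, recompute the checksum.
    # Assumes TTL > 0 on entry.
    h = bytearray(header)
    h[8] -= 1                          # TTL is byte 8 of the IPv4 header
    h[10:12] = b"\x00\x00"             # zero the checksum field
    h[10:12] = ipv4_checksum(h).to_bytes(2, "big")
    return h
```

A receiver validates a header by summing all words including the checksum; a correct header folds to 0xFFFF, so `ipv4_checksum` over it returns 0.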

  10. Switching fabric

  11. Switching via memory • First generation routers: traditional computers with switching under direct control of CPU • Packet copied from input port across shared bus to RAM • Packet copied from RAM across shared bus to output port • Simple design • All ports share queue memory in RAM • Speed limited by CPU: must process every packet [Image: N. McKeown]

  12. Switching via shared bus • Datagram moves from input port memory to output port memory via a shared bus • e.g. Cisco 5600: 32 Gbit/s bus; sufficient speed for access routers • Eliminate CPU bottleneck • Bus contention: switching speed limited by bus bandwidth • CPU speed still a factor [Image: N. McKeown]

  13. Crossbar interconnect • Why do we need switched backplanes? • Shared buses divide bandwidth among contenders • Electrical reason: speed of a bus is limited by its number of connectors • Replaces shared bus • 2n connections join n inputs to n outputs • Multiple input ports communicate simultaneously [Image: N. McKeown]

  14. Switching via crossbar • Datagram moves from input port memory to output port memory via the crossbar • e.g. Cisco 12000 family: 60 Gbit/s; sufficient speed for core routers • Eliminates bus bottleneck • Custom ASIC forwarding engines replace general-purpose CPUs • Requires algorithm to determine crossbar configuration [Image: N. McKeown]

  15. Switching via an interconnection network • Overcome bus bandwidth limitations • Banyan network • 2×2 switching elements • Self-routing header: use ith bit for ith stage • Blocks if two packets arriving at the same switching element want the same output (same header bit value) • Banyan is collision-free if packets are presented in ascending order • First layer moves packets to the correct upper or lower half based on 1st bit (0↗, 1↘) [Figure: Banyan with four arriving packets]
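The self-routing rule above (stage i consumes bit i of the destination, 0 selecting the upper output and 1 the lower) can be sketched as follows; the function name is hypothetical, and n is assumed to be a power of two:

```python
def self_route_bits(dest, n):
    # An n-port banyan has log2(n) stages; stage s consumes bit s of the
    # destination port number, MSB first: 0 = upper output, 1 = lower output
    stages = n.bit_length() - 1        # log2(n) for power-of-two n
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]
```

For destination 5 (binary 101) in an 8-port banyan the packet goes lower, upper, lower through the three stages.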

  16. Sorting networks • Comparator notation: y1 = x1, y2 = x2 if x1 ≤ x2; y1 = x2, y2 = x1 otherwise • Insertion sort by recursive definition • Batcher network: an efficient sorter • Batcher-Banyan architecture for collision-free switching [Figure: sorting network for n elements built from 2-input comparators]
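A comparator as defined above routes the smaller value to its upper output. As a minimal sketch, here is a known 5-comparator sorting network for 4 inputs (Batcher's odd-even construction); the function names are my own:

```python
def compare_exchange(v, i, j):
    # Comparator: the smaller value exits on the upper wire (index i)
    if v[i] > v[j]:
        v[i], v[j] = v[j], v[i]

def sort4(values):
    # 5-comparator sorting network for 4 inputs (Batcher odd-even merge):
    # sort the two halves, then merge them
    v = list(values)
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        compare_exchange(v, i, j)
    return v
```

Because the comparator sequence is fixed and data-independent, the same wiring sorts every input, which is what makes such networks implementable directly in hardware ahead of a banyan stage.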

  17. Output port functionality • Output queuing required when datagrams arrive from fabric faster than line transmission rate • Switch fabric forwarding rate ≥ R at any output • Scheduling discipline chooses among output-queued datagrams for transmission at each output port

  18. Where does queuing occur? • Central issue in switch design: three choices • At input ports (input queuing) • At output ports (output queuing) • Some combination of the above

  19. Output queuing • Multiple packets may arrive in one cycle • Output port buffers all packets • Worst case: output port rate required = n × R • Aggregate output rate required n2 × R

  21. Input port queuing • Send at most one packet per cycle to an output • Output port rate required: R • Switch fabric forwarding rate required: n × R • Queuing may occur at input ports • Problem: Queued datagram at front of queue prevents others in queue from moving forward • Result: Queuing delay and loss due to input buffer overflow!

  22. Input queuing: Head-of-line blocking • One packet per cycle sent to any output • Blue packet blocked despite available capacity at output ports and in switch fabric

  23. Input queuing: Head-of-line blocking • One packet per cycle sent to each output • Blue packet still blocked despite available capacity

  24. Input queuing: Head-of-line blocking • Suppose switch fabric supports one packet per cycle sent to any output • Blue packet still blocked despite available capacity

  25. Virtual output queuing • On each input port, one input queue per output port • Input port places packet in the virtual output queue (VOQ) corresponding to the output port of the forwarding decision • No head-of-line blocking, no output queuing • Need to schedule the fabric [Figure: input port with three VOQs, one per output port]
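The VOQ structure described above is just an array of per-output FIFOs at each input port. A minimal Python sketch (class and method names are my own):

```python
from collections import deque

class VOQInputPort:
    """One FIFO per output port: a packet headed for a busy output never
    blocks packets headed for idle outputs, so no head-of-line blocking."""

    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, pkt, out_port):
        # Called after the forwarding decision determines out_port
        self.voq[out_port].append(pkt)

    def dequeue(self, out_port):
        # Called when the fabric scheduler matches this input to out_port
        return self.voq[out_port].popleft() if self.voq[out_port] else None
```

The price of eliminating head-of-line blocking is that the fabric now needs a scheduler (e.g. iSLIP, next) to decide which VOQ each input serves in each cell time.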

  26. Virtual output queuing [Image: N. McKeown]

  27. Today • Network address translation (NAT) • Inside internet routers • Architecture • Crossbar scheduling: iSLIP algorithm • Longest-prefix lookup: Luleå algorithm

  28. Crossbar scheduling algorithm: goals • High throughput • Low queue occupancy in VOQs • Sustain 100% of rate R on all n inputs, n outputs • Starvation-free • Don’t allow any one virtual output queue to be unserved indefinitely • Speed of execution • Should not be the performance bottleneck in the router • Simplicity of implementation • Will likely be implemented on a special purpose chip

  29. iSLIP algorithm: Introduction • McKeown, 1999 • Model problem as a bipartite graph • Input port = graph node on left • Output port = graph node on right • Edge (i, j) indicates packets in VOQ Q(i, j) at input port i • Scheduling = a bipartite matching (no two edges connected to the same node) [Figures: request graph; bipartite matching]

  30. iSLIP: High-level overview • iSLIP computes a maximal bipartite matching • Every packet time, the algorithm restarts • Runs some number of iterations per cell (packet) time • Each iteration consists of three phases: • Request phase: all inputs send requests to outputs • Grant phase: all outputs grant requests to some input • Accept phase: each input chooses one output’s grant to accept

  31. iSLIP: Accept and grant counters • Each input port i has a round-robin accept counter ai • Each output port j has a round-robin grant counter gj • Round-robin counter: 1, 2, 3, …, n, 1, 2, … [Figure: four-port example with accept counters a1…a4 and grant counters g1…g4]

  32. iSLIP: One iteration in detail • Request phase • Each input sends a request to all backlogged outputs • Grant phase • Output j grants the next request its grant pointer gj points to • Accept phase • Input i accepts the next grant its accept pointer ai points to • For all inputs k that have accepted, then increment ak [Figure: four-port example with counters a1…a4, g1…g4]

  33. iSLIP example • Two inputs, two outputs • Input 1 always has traffic for outputs 1 and 2 • Input 2 always has traffic for outputs 1 and 2 • All accept and grant counters initialized to 1 • One iteration per cell time [Figure: 2×2 request graph with counters a1, a2, g1, g2]

  34. iSLIP example: Cell time 1 • Request phase: inputs 1 and 2 each request outputs 1 and 2 • Grant phase: g1 = g2 = 1, so both outputs grant input 1 • Accept phase: a1 = 1, so input 1 accepts output 1; input 2 receives no grant • Counters a1 and g1 advance to 2; g2 is unchanged since its grant was not accepted

  35. iSLIP example: Cell time 2 • Request phase: both inputs again request both outputs • Grant phase: output 1 (g1 = 2) grants input 2; output 2 (g2 = 1) grants input 1 • Accept phase: input 1 accepts output 2 and input 2 accepts output 1 (a full match) • Counters advance: a1 → 1, a2 → 2, g1 → 1, g2 → 2

  36. iSLIP example: Cell time 3 • Request phase: both inputs again request both outputs • Grant phase: output 1 (g1 = 1) grants input 1; output 2 (g2 = 2) grants input 2 • Accept phase: input 1 accepts output 1 and input 2 accepts output 2 (again a full match) • The counters have desynchronized, so every subsequent cell time yields a full match
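The three phases and the 2×2 example above can be reproduced in a short simulation. This is a sketch, not McKeown's code; it assumes, as is standard for iSLIP though not spelled out on the slides, that an accepted grant also advances the output's grant pointer gj, and it runs one iteration per cell time to match the example:

```python
def rr_pick(candidates, ptr, n):
    # Round-robin choice: first candidate at or after ptr, wrapping over 1..n
    for k in range(n):
        p = (ptr - 1 + k) % n + 1
        if p in candidates:
            return p
    return None

def islip_cell(backlog, g, a, n):
    """One iSLIP iteration for one cell time.
    backlog[i][j]: input i has traffic for output j
    g, a: grant and accept pointers (dicts keyed by port, values in 1..n)
    Returns the match as {input: output}."""
    # Request + grant phase: each output grants one requesting input
    grants = {}
    for j in range(1, n + 1):
        reqs = {i for i in range(1, n + 1) if backlog[i][j]}
        pick = rr_pick(reqs, g[j], n)
        if pick is not None:
            grants[j] = pick
    # Accept phase: each input accepts one of its grants
    match = {}
    for i in range(1, n + 1):
        offered = {j for j, who in grants.items() if who == i}
        pick = rr_pick(offered, a[i], n)
        if pick is not None:
            match[i] = pick
            a[i] = pick % n + 1    # advance to one beyond the accepted output
            g[pick] = i % n + 1    # advance to one beyond the matched input
    return match
```

Running three cell times with both inputs always backlogged reproduces the slides: cell 1 matches only (1,1), cell 2 matches (1,2) and (2,1), and from cell 3 on the desynchronized pointers give a full match every time.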

  37. Implementing iSLIP • Request vector rij (rij = 1 when input i requests output j) feeds a bank of grant arbiters, one per output • Grant results feed a bank of accept arbiters, one per input, producing the decision vector [Figure: request vector → grant arbiters → accept arbiters → decision vector, shown for a 2×2 example with r11 = r21 = r12 = r22 = 1]

  38. Implementing iSLIP: General circuit

  39. Implementing iSLIP: Inside an arbiter [Figure: priority encoder selecting the highest-priority request, plus an incrementer that updates the pointer]

  40. Today • Network address translation (NAT) • Inside internet routers • Architecture • Crossbar scheduling: iSLIP algorithm • Longest-prefix IP lookup: Luleå algorithm

  41. The IP lookup problem • Given an incoming packet with IP address x, choose output port number outport(x) to deliver the packet • Then configure the switching fabric to connect inport(x) → outport(x)

  42. Radix tree • Binary tree; internal nodes indicate which bit positions to test • Leaves contain key (IP address) and mask (# of significant bits) • NetBSD PATRICIA trees similar [Figure: tree testing bit positions 18, 29, 31; leaves key=0.0.0.0 mask=0x0000 0000, key=127.0.0.0 mask=0xff00 0000, key=127.0.0.1 (host), key=128.32.0.0 mask=0xffff 0000, key=128.32.33.0 mask=0xffff ff00, key=128.32.33.5 (host)]

  43. Radix tree • Search for 127.0.0.1 (0x7f00 0001) leads to the matching route for host 127.0.0.1 • Tests the minimum number of bits required to differentiate [Figure: same tree as the previous slide]

  44. Radix tree • Search for 128.32.33.7 (0x8020 2107) leads to a route specific to host 128.32.33.5 • For longest-prefix match, need to backtrack from the leaf and test under mask 0xffff ff00 [Figure: same tree as the previous slide]
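The longest-prefix search can be illustrated with an uncompressed binary trie; the radix/PATRICIA trees above are a path-compressed version of this structure, and recording the best match on the way down (as below) replaces their backtracking step. A sketch with hypothetical class names, using the prefixes from the figure:

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]
        self.port = None          # next-hop/port if a prefix ends here

class BinaryTrie:
    """Uncompressed binary trie over 32-bit IPv4 prefixes."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, plen, port):
        node = self.root
        for i in range(plen):
            bit = (prefix >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.port = port

    def lookup(self, addr):
        node, best = self.root, None
        for i in range(32):
            if node.port is not None:
                best = node.port  # remember the longest match seen so far
            nxt = node.children[(addr >> (31 - i)) & 1]
            if nxt is None:
                return best
            node = nxt
        return node.port if node.port is not None else best

def ip2int(s):
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d
```

With 128.32.0.0/16 and 128.32.33.0/24 installed, looking up 128.32.33.7 returns the /24's port while 128.32.1.1 falls back to the /16.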

  45. Luleå algorithm: Motivation • Degermark et al., “Small forwarding tables for fast routing lookups”, in Proc. of SIGCOMM ’97 • Large routing tables • Patricia (NetBSD) and radix (4.4 BSD) trees • 24 bytes for leaves • Size: 2 Mbytes → 12 Mbytes • Naïve binary tree is huge, won’t fit in fast CPU cache memory • Memory accesses are the bottleneck of lookup • Goal: minimize memory accesses and size of data structure • Design for 2^14 ≈ 16K different next-hops • Method for compressing the radix tree using bit vectors

  46. Luleå algorithm • CIDR longest-prefix-match rule: the more specific prefix e2 supersedes e1 • Divide a complete binary tree into three levels • Level 1: one big node representing the entire tree to depth ≤ 16 bits • Levels 2 and 3: chunks describe portions of the tree • The binary tree is sparse, and most accesses fall into levels 1 and/or 2 [Figure: IP address space of 2^32 possible addresses, bit offsets 0−32, with prefix e2 nested inside e1]

  47. Luleå algorithm: Level 1 • Covers all prefixes of length ≤ 16 • Cut across the tree at depth 16 ➛ bit vector of length 2^16 • Root head = 1, genuine head = 1, member of a genuine head = 0 • Divide the bit vector into 2^12 bit masks, each 16 bits long [Figure: bit vector with genuine heads and root heads marked; one 16-bit bit mask highlighted]

  48. Luleå algorithm: Level 1 • One 16-bit pointer per bit set (=1) in the bit mask • Pointer composed of 2 bits of type info and 14 bits of indexing info • Genuine heads: index into the next-hop table • Root heads: index into the array of Level 2 (L2) chunks • Problem: given an IP address, find the index pix into the pointer array [Figure: pointer array entries (2 + 14 bits) referencing the next-hop table and L2 chunks]

  49. Luleå: Finding pointer group • Pointers are grouped by 16-bit bit mask; how many bit masks’ pointers must be skipped? • Recall: the bit vector is 2^16 bits in total • Code word array code (2^12 entries) • One entry per 16-bit bit mask, so indexed by the top 12 bits of the IP address • 6-bit offset six: number of pointers to skip to find the 1st pointer for that bit mask in the pointer array • Within a group of four bit masks, at most 3 × 16 = 48 pointers precede a bit mask, so 0 ≤ six ≤ 63 suffices, but an offset accumulated over the whole vector would be too big • Base index array base (2^10 entries) • One base index per four code words: number of pointers to skip for those four bit masks • Indexed by the top 10 bits of the IP address [Figure: base (2^10 16-bit entries) and code (2^12 entries with fields ten and six); example bit vector 1000100010000000 1011100010000101 1000000000000000 1000000010000000 1000000010101000 …]

  50. Luleå: Finding pointer group • Extract the top 10 bits of the IP address: bix • Extract the top 12 bits of the IP address: ix • Skip code[ix].six + base[bix] pointers in the pointer table
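The index computation on this slide is just two shifts and two array reads. A sketch with hypothetical function names and toy arrays (a full Luleå lookup would additionally count set bits within the target bit mask via a maptable, which these slides do not cover):

```python
def lulea_indices(addr):
    # addr: 32-bit IPv4 address as an int
    bix = addr >> 22   # top 10 bits -> base index array
    ix = addr >> 20    # top 12 bits -> code word array
    return bix, ix

def pointers_to_skip(addr, base, six):
    # base: 2**10 base indices (one per four code words)
    # six:  the 6-bit offsets of the 2**12 code words
    bix, ix = lulea_indices(addr)
    return base[bix] + six[ix]
```

For example, 128.32.33.7 (0x80202107) yields bix = 512 and ix = 2050, so the lookup reads base[512] and six[2050] and adds them.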
