Computer Networks (CS 778)
Chapter 3: Packet Switching (moving data over long distances, not just across a single link)
• Two packet-switching approaches: connection-oriented and connectionless.
• Forwarding (or switching): moving each packet from an input to the correct output.
• The key problem a packet switch must deal with is the finite bandwidth of its outputs.
• Contention: packets arrive for an output faster than the output can transmit them (the excess is buffered).
• Congestion: the switch runs out of buffer space.
• This chapter deals with forwarding and contention in packet switches.
• LAN switching and ATM switching are the two main packet-switching technologies covered.
Switching/Forwarding (which layer? OSI: network layer; Internet: IP; ATM)
• Switch: a multi-input, multi-output device that transfers each packet from an input to one output (called switching or forwarding).
• Assume bidirectional links (in a wired world, a link is an inPort-outPort pair).
• Switches add the star topology to the topologies seen so far (point-to-point, bus, ring). A star allows a hierarchy and virtually unlimited size.
• Stars are scalable: a host can be added to a switch without decreasing performance for the others, assuming the switch "backplane" bandwidth is the sum of the link bandwidths.
• How does a switch decide which output to place a packet on? Several approaches:
  • Datagrams (connectionless approach)
  • Virtual circuits (connection-oriented approach)
  • Source routing (a simple approach, less common than the other two)
(The switch in the figure has two T3 links and one STS-1 SONET link.)
Datagrams (and datagram networks)
• No setup phase (connectionless model).
• Each packet contains the full destination address.
• Hosts never know whether the network can deliver a packet, or even whether the destination can receive it.
• Forwarding tables: e.g., at SW2: A, C, D → port 3; B, G, H → port 0; E → port 2.
• Table creation (Chapter 4) is hard: the topology may change, and there may be multiple paths.
  • E.g., successive packets from A to B can follow different paths.
• A switch or link failure need not prevent communication: tables may be updated to route around the failure.
  • This capability goes back to the ARPANET (forerunner of the Internet); since it was a military network, the capability was essential.
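A minimal sketch of datagram forwarding at SW2, assuming the table is a plain dictionary keyed by host name (a stand-in for full destination addresses). The entries are copied from the SW2 example; the function name `forward` is hypothetical. Each datagram is looked up on its own, with no per-connection state.

```python
# Forwarding table for SW2 from the example: destination host -> output port.
SW2_TABLE = {"A": 3, "C": 3, "D": 3, "B": 0, "G": 0, "H": 0, "E": 2}


def forward(table, dest):
    """Return the output port for a datagram addressed to `dest`.

    Every datagram carries its full destination address and is looked up
    independently; the switch keeps no per-connection state. Returns None
    for an unknown destination (a real switch might drop the packet or use
    a default route).
    """
    return table.get(dest)


if __name__ == "__main__":
    for dest in ["A", "B", "E", "F"]:
        print(dest, "->", forward(SW2_TABLE, dest))
    # A -> 3, B -> 0, E -> 2, F -> None (F is not in the example table)
```

Routing around a failure then amounts to the routing process rewriting entries in this table; the forwarding step itself does not change.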
Virtual Circuit (VC) Switching (connection-oriented)
• Setup phase: establishes connection state in each switch along the path, either by
  • the system administrator, for a long-lived permanent virtual circuit (PVC), or by
  • the host, which sends a setup request into the network to create a switched virtual circuit (SVC); this is called signaling.
• The setup phase is followed by a transfer phase and a teardown phase.
• All packets follow the same circuit (analogous to a phone call).
• Each switch keeps a VC table of per-VC state entries: | inPort | inVCI | outPort | outVCI | (VCI = virtual circuit identifier).
• The combination (inPort, inVCI) uniquely identifies a VC on a particular link.
• VCIs are not globally unique (in fact, inVCI and outVCI usually differ).
• VCIs have link-local scope. Why?
Virtual Circuit (continued)
• For a PVC, the network administrator picks an unused VCI for each link (e.g., 5, 11, 7, 4).
• The resulting VC-table entries at each switch are:

Switch | inPort | inVCI | outPort | outVCI
SW1    | 2      | 5     | 1       | 11
SW2    | 3      | 11    | 0       | 7
SW3    | 0      | 7     | 3       | 4
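The tables above can be exercised with a short sketch that follows one cell from hostA to hostB and records the VCI it carries on each link. The VC tables are the ones from the PVC example; the `LINKS` wiring (which switch port connects to which) is a hypothetical assumption chosen to be consistent with those tables, and the function names are illustrative.

```python
# VC tables from the PVC example: (inPort, inVCI) -> (outPort, outVCI).
VC_TABLES = {
    "SW1": {(2, 5): (1, 11)},
    "SW2": {(3, 11): (0, 7)},
    "SW3": {(0, 7): (3, 4)},
}

# Hypothetical wiring for this sketch: (switch, outPort) -> (next node, its inPort).
LINKS = {
    ("SW1", 1): ("SW2", 3),
    ("SW2", 0): ("SW3", 0),
    ("SW3", 3): ("hostB", None),
}


def switch_cell(switch, in_port, vci):
    """Look up (inPort, inVCI) and return (outPort, outVCI).

    The switch rewrites the VCI because a VCI only has meaning on one link.
    """
    return VC_TABLES[switch][(in_port, vci)]


def trace(first_switch, in_port, vci):
    """Follow a cell hop by hop; return the final node and the VCI used on each link."""
    vcis = [vci]
    node, port = first_switch, in_port
    while node in VC_TABLES:
        out_port, vci = switch_cell(node, port, vci)
        vcis.append(vci)
        node, port = LINKS[(node, out_port)]
    return node, vcis


if __name__ == "__main__":
    print(trace("SW1", 2, 5))  # ('hostB', [5, 11, 7, 4])
```

Note that the lookup key is only the small (inPort, VCI) pair; no destination address appears in the data cells.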
Virtual Circuit (continued)
• How is signaling done for SVCs (setting up the connection)?
• hostA sends a setup message (SM) to SW1 (containing at least the addresses of hostA and hostB).
• The SM flows on SW2 → SW3 → hostB. (How? Routing details later.)
• Each switch creates a partial table entry (inPort, inVCI, outPort, __), choosing an unused inVCI.
• Note that each switch (or, for the last link, the host) chooses the inVCI for the link coming into it.
• The VC-table entries would then be:

Switch | inPort | inVCI | outPort | outVCI
SW1    | 2      | 5     | 1       | __
SW2    | 3      | 11    | 0       | __
SW3    | 0      | 7     | 3       | __
Virtual Circuit (continued)
• hostB receives the setup message. If it is willing to accept the connection, it attaches outVCI = 4 (the VCI it chose for the SW3-hostB link) to an acknowledgment.
• The ack travels back along the reverse path: hostB → SW3 → SW2 → SW1 → hostA.
• Each switch completes its VC-table entry and forwards the ack carrying the VCI for the previous link. The VC-table entries are now:

Switch | inPort | inVCI | outPort | outVCI
SW1    | 2      | 5     | 1       | 11
SW2    | 3      | 11    | 0       | 7
SW3    | 0      | 7     | 3       | 4

• SW1 sends the ack to hostA specifying VCI = 5. The setup phase is complete.
• The second stage is data transfer. The third stage is connection teardown (when hostA is done sending):
  • hostA sends a teardown message (TD) to SW1, and SW1 removes its table entry.
  • The TD is sent SW1 → SW2 → SW3 → hostB, and each switch does the same.
Virtual Circuit (versus Datagram)
Virtual Circuit Model:
• Typically one RTT (for setup) elapses before the first data packet is sent.
• Data packets carry only a small identifier (only the setup message carries the full destination address), so per-packet header overhead is small.
• If a switch or link fails, the connection is broken and a new one must be set up.
• The host can reserve resources at setup and learns a lot (that the network is able to transmit and the destination is able to receive).
• VCI assignment is local (no global server, which would involve constant communication overhead).
• The most popular VC technologies are:
  • X.25, which uses the VC model in a three-part strategy:
    • Buffers are allocated along the VC when the circuit is initialized.
    • A sliding-window protocol is run between each pair of adjacent nodes along the VC for error correction (and flow control); this is called hop-by-hop flow control.
    • Circuit setup is rejected by any node with insufficient buffer space.
    • Thus, there is contention, but never congestion.
  • Frame Relay, a straightforward implementation of VC technology. It is extremely popular due to its simplicity (Frame Relay PVCs provide nearly leased-line-like service). Some basic QoS and congestion avoidance is provided, but it is minimal.
  • ATM (coming up in Section 3.3).
Datagram (versus Virtual Circuit)
Datagram Model:
• There is no round-trip delay waiting for setup (a host can send data as soon as it is ready).
• The source doesn't know whether the network can deliver a packet, or even whether the intended destination is up and accepting packets.
• Since packets are treated independently, it is possible to route around link/node failures.
• Since every packet must carry the full destination address, per-packet overhead is higher than in the connection-oriented model.
Source Routing
• Uses neither virtual circuits nor conventional datagrams.
• The address contains the entire sequence of outPorts on the source-to-destination path.
• The list is rotated at each switch so the next outPort is always in front.
• Problems? It may be difficult for the source to know the route, and the header must be of variable size.
• Alternatives to rotating the outPort list:
  • Stripping: each switch strips off its outPort (e.g., 3, 0, 1 becomes 3, 0 at SW1).
  • OutPort pointer in a fixed position in the header: the list is unchanged and each switch advances the pointer (e.g., with header |ptr| 3 | 0 | 1 |, at SW1 the pointer moves from port 1 to port 0).
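The three header-handling schemes can be compared in a small sketch. Following the stripping example (3, 0, 1 becomes 3, 0 at SW1), it assumes the outPort for the current switch is the last element of the list; the function names are hypothetical. All three schemes choose the same ports; they differ only in how the header changes.

```python
# Assumption (from the stripping example 3,0,1 -> 3,0 at SW1): the outPort
# for the current switch is the LAST element of the list.


def rotate(header):
    """Rotation: use the last outPort, then move it to the other end.

    The header keeps its length, and the next outPort is always at the end.
    """
    port = header[-1]
    return port, [port] + header[:-1]


def strip(header):
    """Stripping: use the last outPort and remove it (the header shrinks)."""
    return header[-1], header[:-1]


def advance_pointer(ptr, header):
    """Pointer: the list is unchanged; only the index moves."""
    return header[ptr], ptr - 1


def walk(header, hops):
    """Ports chosen at each of `hops` switches under each scheme."""
    rot, stripped = list(header), list(header)
    ptr = len(header) - 1
    out = {"rotate": [], "strip": [], "pointer": []}
    for _ in range(hops):
        p, rot = rotate(rot)
        out["rotate"].append(p)
        p, stripped = strip(stripped)
        out["strip"].append(p)
        p, ptr = advance_pointer(ptr, header)
        out["pointer"].append(p)
    return out


if __name__ == "__main__":
    print(walk([3, 0, 1], 3))
    # {'rotate': [1, 0, 3], 'strip': [1, 0, 3], 'pointer': [1, 0, 3]}
```

Rotation keeps a fixed-size header but rewrites it at every hop; stripping shrinks it; the pointer variant only updates one small field, at the cost of carrying the whole list to the destination.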
Source Routing (continued)
• Source routing can be used in both datagram networks and VC networks.
• The Internet Protocol includes a source-routing option: selected packets can be source routed, but the majority are switched as datagrams.
• Some VC networks use source routing to carry the VC setup request along the path.
• Source routing suffers from poor scalability (it is hard for a host to know the complete route in a large network).
[Figure: a workstation used as a switch, with the CPU, main memory, and Interfaces 1-3 attached to a shared I/O bus]
Performance
A switch can be built from a general-purpose workstation; in fact, Unix provides this capability in the kernel. (Special-purpose switch hardware is considered later.)
• Install multiple NICs (network interface cards).
• Use DMA to transfer packets between main memory and the NICs.
• Build and manage your own buffers.
• The CPU needs to inspect only the header to determine the outPort.
• Usually the bottleneck is I/O bus bandwidth: every packet must cross the I/O bus, in fact twice (NIC → memory, then memory → NIC).
• Such a switch has a severe limit on aggregate backplane bandwidth.
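The I/O bus limit can be made concrete with a little arithmetic. Because each packet is DMA'd into main memory and then out again, it crosses the bus twice, so the switch can forward at most half the bus bandwidth. The bus and link speeds below are hypothetical, and the sketch ignores per-packet CPU cost, which can be the real limit for small packets.

```python
def max_switch_throughput_bps(bus_bps):
    """Aggregate forwarding rate of a workstation-based switch.

    Each packet crosses the I/O bus twice (input NIC -> memory and
    memory -> output NIC), so at most half the bus bandwidth is usable.
    """
    return bus_bps / 2


def full_rate_ports(bus_bps, link_bps):
    """How many links can receive at full rate before the bus saturates."""
    return int(max_switch_throughput_bps(bus_bps) // link_bps)


if __name__ == "__main__":
    # Hypothetical numbers: a 1 Gbps I/O bus and 100 Mbps links.
    print(max_switch_throughput_bps(1e9))  # 500000000.0
    print(full_rate_ports(1e9, 100e6))     # 5
```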
Forwarding vs. Routing
Forwarding: selecting an outPort based on the destination address and the forwarding table.
Routing: the process by which the forwarding table is built.
• Bridge: a forwarding switch between LANs (e.g., Ethernets); also known as a LAN switch or LAN bridge.
• For Ethernets, one could use a repeater (to forward the signal), but repeaters impose size limitations.
• A bridge could be implemented as a node in promiscuous mode between two Ethernets that forwards all packets.
• An intelligent (learning) bridge does not forward all packets; it uses a forwarding table (Host → Port):
  • The table starts empty; for each packet received, the bridge records the sender's port.
  • If the destination host is not in the table, the packet is forwarded to all ports (the table is only a filter).
  • All entries time out after a fixed time, to protect against inaccuracies due to host removal.
• Loops can form (causing frames to circulate forever), so bridges run a distributed spanning tree algorithm:
  • Think of the bridge-extended LAN as a graph (vertices = bridges, edges = connections).
  • A spanning tree is an acyclic subgraph that covers (spans) all vertices.
[Figure: the network as a graph]
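The learning-bridge rules above can be sketched as a small class: learn the sender's port, flood unknown destinations, filter frames whose destination is on the arrival port, and age out stale entries. The class name, the `ttl` value, and the injectable `clock` are assumptions for illustration; the sketch floods to every port except the arrival port, which is the usual practice.

```python
import time


class LearningBridge:
    """Sketch of a learning bridge: a host -> port table with timeouts.

    `ttl` (seconds) and the pluggable `clock` are illustrative assumptions.
    """

    def __init__(self, ports, ttl=300.0, clock=time.monotonic):
        self.ports = set(ports)
        self.ttl = ttl
        self.clock = clock
        self.table = {}  # host -> (port, time last seen)

    def receive(self, src, dst, in_port):
        """Return the set of ports the frame should be sent out on."""
        now = self.clock()
        # Learn: the sender is reachable through the arrival port.
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.ttl:
            del self.table[dst]  # stale: the host may have moved or been removed
            entry = None
        if entry is None:
            # Unknown destination: flood to every port except the arrival port.
            return self.ports - {in_port}
        out_port = entry[0]
        # Destination is on the sender's own segment: filter (do not forward).
        return set() if out_port == in_port else {out_port}


if __name__ == "__main__":
    t = [0.0]  # fake clock so the timeout is easy to demonstrate
    br = LearningBridge(ports=[1, 2, 3], ttl=10, clock=lambda: t[0])
    print(br.receive("A", "B", 1))  # {2, 3}  B unknown: flood
    print(br.receive("B", "A", 2))  # {1}     A was learned on port 1
    print(br.receive("C", "A", 1))  # set()   A is on the arrival port: filter
    t[0] = 20.0
    print(br.receive("C", "B", 1))  # {2, 3}  B's entry timed out: flood again
```

Flooding is exactly what makes loops dangerous: a flooded frame in a looped topology is re-flooded indefinitely, which is why the spanning tree is needed.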
Asynchronous Transfer Mode (ATM)
• Connection-oriented, packet-switched network (virtual circuits).
• Used for both WANs and LANs (but predominantly in long-haul WANs today).
• Specified by the ATM Forum (www.atmforum.org).
• Commonly transmitted over SONET at the physical level (but this is not a requirement).
• QoS capabilities are one of its strong selling points.
• Fixed-length packets, called cells, of 53 bytes: a 5-byte header plus a 48-byte payload.
• When any VC is set up, the destination address must appear in the signaling message.
• ATM uses one of several destination address formats (different from MAC addresses in LANs). Two examples (details later):
  • NSAP (Network Service Access Point)
  • E.164
• The 48-byte payload was a compromise (the US bid for 64 bytes and Europe bid for 32 bytes).
A little history on ATM
Part of the ITU's B-ISDN (Broadband ISDN) standard (1984). B-ISDN was motivated by PCs demanding higher bandwidth and lower error rates. It was intended to:
- replace the separate telephone network infrastructure and data networks,
- allow integration on one digital network fabric,
- scale to gigabit speeds,
- provide a flexible way to divide bandwidth into chunks for different kinds of traffic.
1988: the ITU chose ATM as the underlying switching/multiplexing technology for B-ISDN.
1991: the ATM Forum was founded to accelerate ATM standardization alongside the ITU.
Planned benefits of ATM:
- Efficient use of network bandwidth (bandwidth on demand)
- Scalability (LAN to WAN, number of users, speed)
- Low latency and low latency variation (virtual circuits and pre-negotiated QoS)
- Transparency to existing applications
- Integrated service
- Internetworkable with existing WANs
- Support for both constant and variable bit rates
Cells (Variable vs. Fixed Length? What Size?)
Fixed-length cells are easier to switch in hardware and simpler, but there is no optimal length:
• If small: the header-to-data overhead is high.
• If large: utilization is low for small messages.
• A small size provides a finer-grained preemption point for scheduling a link, e.g.:
  • maximum packet = 4 KB = 4096 bytes
  • link speed = 100 Mbps
  • transmission time = 4096 × 8 bits/packet ÷ 100 Mbps = 327.68 μs per packet
  • thus, a high-priority packet may sit in the queue for 327.68 μs;
  • in contrast, 53 × 8 bits ÷ 100 Mbps = 4.24 μs per cell for ATM.
• Near cut-through behavior, e.g.:
  • two 4 KB packets arrive at the same time;
  • the outgoing link is idle for 327.68 μs while both arrive;
  • at the end of 327.68 μs, 8 KB still remain to be transmitted;
  • in contrast, with 53-byte cells the switch can transmit the first cell after 4.24 μs, and at the end of 327.68 μs just over 4 KB would be left in the queue.
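The arithmetic above can be checked with a few lines. Like the slide's estimate, the cut-through part treats every transmitted byte as equivalent and ignores the 5-byte header in each cell; the helper name `tx_time_us` is hypothetical.

```python
def tx_time_us(size_bytes, link_bps):
    """Serialization time of one packet or cell, in microseconds."""
    return size_bytes * 8 / link_bps * 1e6


LINK = 100e6  # 100 Mbps, as in the example

packet_us = tx_time_us(4096, LINK)  # wait behind one 4 KB packet
cell_us = tx_time_us(53, LINK)      # wait behind one 53-byte cell
print(f"4 KB packet: {packet_us:.2f} us, 53-byte cell: {cell_us:.2f} us")

# Cut-through comparison: two 4 KB packets finish arriving at t = packet_us.
# With whole packets nothing is sent before then, so 8 KB remain queued.
# With cells, the output starts once the first cell has arrived (cell_us)
# and then sends at link rate (header overhead ignored, as in the slide).
sent_bytes = (packet_us - cell_us) * 1e-6 * LINK / 8
remaining_bytes = 2 * 4096 - sent_bytes
print(f"cells: about {remaining_bytes:.0f} bytes still queued at t = {packet_us:.2f} us")
# about 4149 bytes, i.e., just over 4 KB
```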
Cell Format
• User-Network Interface (UNI) (cell format shown in the figure): host-to-switch format.
  • GFC: Generic Flow Control (intended for traffic control across the user-network interface; not used).
  • VCI: Virtual Circuit Identifier.
  • VPI: Virtual Path Identifier (grows to 12 bits at the NNI, where the GFC field goes away).
  • Type:
    • 1st bit: management vs. data cell.
    • 2nd bit (data cells): EFCI (Explicit Forward Congestion Indication), set by switches that are about to become congested.
    • 3rd bit: user signaling (used with AAL-5 to delineate frames).
  • CLP: Cell Loss Priority, set by the source host if the cell can be dropped without serious damage to the message.
  • HEC: Header Error Check (CRC-8).
• Network-Network Interface (NNI): switch-to-switch format; the GFC becomes part of a larger VPI field.
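A sketch of packing and parsing the 5-byte UNI header. The field widths used here (GFC 4, VPI 8, VCI 16, Type 3, CLP 1, HEC 8 bits) are the standard UNI layout rather than something stated on the slide, and the HEC computation (CRC-8 with generator x^8 + x^2 + x + 1 over the first four bytes, then XOR with 01010101) follows our understanding of ITU-T I.432. Function names are hypothetical.

```python
def crc8_atm(data: bytes) -> int:
    """CRC-8 with generator x^8 + x^2 + x + 1 (0x07), MSB first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pack_uni_header(gfc, vpi, vci, pt, clp):
    """Build a UNI header: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) HEC(8)."""
    if not (gfc < 16 and vpi < 256 and vci < 65536 and pt < 8 and clp < 2):
        raise ValueError("field out of range")
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pt << 1) | clp
    first4 = word.to_bytes(4, "big")
    hec = crc8_atm(first4) ^ 0x55  # coset 01010101 added to the CRC remainder
    return first4 + bytes([hec])


def unpack_uni_header(hdr: bytes):
    """Split a 5-byte UNI header into its fields, checking the HEC first."""
    if crc8_atm(hdr[:4]) ^ 0x55 != hdr[4]:
        raise ValueError("HEC mismatch")
    word = int.from_bytes(hdr[:4], "big")
    return {
        "gfc": word >> 28,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,
        "clp": word & 0x1,
    }


if __name__ == "__main__":
    print(pack_uni_header(0, 0, 0, 0, 1).hex())                # 0000000152 (idle-cell header)
    print(unpack_uni_header(pack_uni_header(0, 5, 11, 0, 0)))  # {'gfc': 0, 'vpi': 5, 'vci': 11, 'pt': 0, 'clp': 0}
```

At the NNI the same 32 bits would simply be split as VPI(12) VCI(16) PT(3) CLP(1), with no GFC.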
ATM Model
  Voice | Video | Data
  ATM Adaptation Layer (AAL)
  ATM Layer
  Physical Layer
• Physical layer: physical interfaces and framing protocols.
• Several ATM Forum specifications for physical connectivity between devices:
  • DS-1 (T1) at 1.544 Mbps
  • DS-3 (T3) at 45 Mbps
  • 100 Mbps access using the FDDI standard
  • 155 Mbps access using the Fibre Channel standard on multimode fiber
  • SONET (SDH, Synchronous Digital Hierarchy, outside the US): single-mode or multimode fiber at N × 51.84 Mbps
• SONET is the predominant physical layer:

Level | Line rate
OC-1  | 51.84 Mbps
OC-3  | 155.52 Mbps
OC-12 | 622.08 Mbps
OC-48 | 2488.32 Mbps
ATM Adaptation Layer (AAL)
The AAL is the interface between user applications and the ATM layer.
• Performs SAR: segmentation of packets into ATM cells and reassembly of ATM cells into packets.
• Also detects and handles out-of-order or lost cells.
• Supports the ATM application-level service classes:
  • CBR (Constant Bit Rate): reserves a set bandwidth end to end.
  • VBR (Variable Bit Rate): bursty traffic (real-time and non-real-time); reserves an amount of variable bandwidth.
  • ABR (Available Bit Rate): a minimum bandwidth, plus bursts above it without cell loss.
  • UBR (Unspecified Bit Rate): best-effort service, similar to the Internet.

Service class | Traffic descriptors (at call setup) | QoS parameters | Intended uses
CBR    | PCR (Peak Cell Rate) | CTD (Cell Transfer Delay), CDV (Cell Delay Variation), CLR (Cell Loss Ratio) | real-time video, voice
rt-VBR | PCR, SCR (Sustained Cell Rate), MBS (Maximum Burst Size) | maximum CTD, peak-to-peak CDV, CLR | compressed voice, compressed video, rt-OLTP
ABR    | PCR, MCR (Minimum Cell Rate) | CLR | RPC, NFS/DDBMS
UBR    | PCR | - | FTP (file transfer)

Four AAL protocols were originally defined (AAL-1, AAL-2, AAL-3, AAL-4); AAL-3 and AAL-4 were later merged into AAL-3/4, and then AAL-5 was added.
Segmentation and Reassembly
[Figure: user packets pass through the ATM Adaptation Layer (AAL) and become ATM cells]
• AAL-1 and AAL-2 are designed for applications needing a guaranteed rate (voice, video; CBR, rt-VBR).
• AAL-3/4 is designed for packet data (nrt-VBR).
• AAL-5 is an alternative standard for packet data (LAN traffic; connection-oriented or connectionless VBR).
Segmentation and Reassembly (details)
• The convergence sublayer (CS) of the AAL provides the interface to the application.
• The SAR sublayer converts messages to cells.
AAL-1
• AAL-1 is the protocol used for real-time, constant-bit-rate, connection-oriented traffic, e.g., uncompressed audio and video.
• Bits are fed in by the application at a constant rate and must be delivered at the same rate with minimum delay, jitter (variation in rate), and overhead.
• One byte (or two) of the ATM payload is used for control information.
• P-cells are used when message boundaries must be preserved (the pointer gives the offset, in bytes, to the start of the next message).
• SN is the cell sequence number.
• SNP is the sequence number protection: a CRC-3 checksum over SN, plus an even parity bit that further reduces the likelihood of an undetected bad SN.
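A sketch of building and checking the one-byte AAL-1 SAR header. The slide does not give bit widths; this assumes the layout from ITU-T I.363.1 as we understand it: a 4-bit SN field (1-bit CSI plus a 3-bit sequence count), a 3-bit CRC with generator x^3 + x + 1, and one even-parity bit over the first seven bits. Function names are hypothetical.

```python
def crc3(sn4: int) -> int:
    """Remainder of sn4 * x^3 divided by x^3 + x + 1 (binary 1011)."""
    rem = sn4 << 3                    # append three zero bits
    for bit in range(6, 2, -1):       # bits 6..3 hold the message bits
        if rem & (1 << bit):
            rem ^= 0b1011 << (bit - 3)
    return rem & 0b111


def aal1_sar_header(csi: int, seq: int) -> int:
    """Build the SAR header byte: SN(4) | CRC-3 | even parity."""
    sn = (csi << 3) | (seq & 0b111)
    seven = (sn << 3) | crc3(sn)
    parity = bin(seven).count("1") & 1  # 1 if the seven bits contain an odd number of 1s
    return (seven << 1) | parity


def check_sar_header(byte: int) -> bool:
    """True if both the CRC-3 and the even-parity check pass."""
    sn = byte >> 4
    crc = (byte >> 1) & 0b111
    return crc3(sn) == crc and bin(byte).count("1") % 2 == 0


if __name__ == "__main__":
    print(hex(aal1_sar_header(0, 1)))  # 0x17
    for seq in range(8):
        print(seq, format(aal1_sar_header(0, seq), "08b"))
```

The parity bit alone catches any single-bit error; the CRC-3 adds detection of many multi-bit errors in the sequence number.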
AAL-2
• AAL-2 is the protocol used for compressed, variable-bit-rate, connection-oriented real-time traffic, e.g., compressed audio and video.
• The bit rate can vary strongly over time.
• Three bytes of the ATM payload are used for control information (a 1-byte header and a 2-byte trailer), leaving up to 45 bytes for data.
• SN is the cell sequence number.
• IT (Information Type) indicates whether the cell is the start, middle, or end of a message.
• LI is the length indicator: it tells how big the payload is in bytes (it could be less than 45).
• CRC is a checksum over the entire cell.
[Figure: AAL-2 cell format]
AAL-3/4
Convergence Sublayer Protocol Data Unit (CS-PDU = AAL-3/4 packet):

CPI (8 bits) | Btag (8) | BASize (16) | User data (< 64 KB) | Pad (0-24 bits) | 0 (8) | Etag (8) | Length (16)

• CPI: common part indicator (CS-PDU version).
• Btag/Etag: begin/end tags.
• BASize: buffer allocation size (a buffer-size hint for the receiver).
• User data: the variable-length AAL payload.
• Length: PDU size.

Originally the ITU had different protocols for connection-oriented and connectionless data transport (i.e., traffic that is sensitive to loss and errors but not time dependent). It then discovered there was no need for two protocols, so they were combined into AAL-3/4. AAL-3/4 can operate in stream mode (no message boundaries maintained) or message mode, and can provide both reliable and unreliable transport, as well as multiplexing (not available in any of the other AALs). Multiplexing gives a host the option of carrying multiple sessions on one VC, which saves money, since charging is done per VC.

ATM cell in AAL-3/4 format:

ATM header (40 bits) | Type (2) | SEQ (4) | MID (10) | Payload (352 bits = 44 bytes) | Length (6) | CRC-10 (10)

• Type: BOM/EOM (beginning/end of message), COM (continuation of message).
• SEQ: sequence number.
• MID: message identifier (used for multiplexing).
• Length: number of PDU bytes in this cell.
• The AAL-3/4 cell payload is 44 bytes: 4 bytes of the standard 48-byte ATM payload are taken by the five special AAL-3/4 fields (Type, SEQ, MID, Length, CRC-10).

[Figure: segmentation of a CS-PDU (CS-PDU header, user data, CS-PDU trailer) into cells, each consisting of an ATM header, an AAL header, a cell payload, and an AAL trailer; the last cell's payload is padded]
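A sketch of how many cells AAL-3/4 needs for a given amount of user data, and which segment type each cell carries. It assumes the pad rounds the user data up to a multiple of 4 bytes (consistent with the 0-24-bit pad field) and uses the 4-byte CS-PDU header and trailer and 44-byte cell payload from the formats above. SSM (single-segment message), used when everything fits in one cell, is a type value not listed on the slide. For comparison it also counts AAL-5 cells (8-byte trailer, 48-byte payload per cell). Names are hypothetical.

```python
import math

AAL34_HEADER = 4      # CPI, Btag, BASize
AAL34_TRAILER = 4     # 0, Etag, Length
CELL_PAYLOAD_34 = 44  # 48-byte ATM payload minus 4 bytes of per-cell AAL-3/4 fields


def aal34_cs_pdu_len(data_len):
    """CS-PDU length: header + data padded to a multiple of 4 bytes + trailer."""
    pad = (-data_len) % 4  # 0-3 bytes, i.e., 0-24 bits
    return AAL34_HEADER + data_len + pad + AAL34_TRAILER


def aal34_segment_types(data_len):
    """Segment type of each cell that carries the CS-PDU."""
    n = math.ceil(aal34_cs_pdu_len(data_len) / CELL_PAYLOAD_34)
    if n == 1:
        return ["SSM"]
    return ["BOM"] + ["COM"] * (n - 2) + ["EOM"]


def aal5_cells(data_len):
    """AAL-5 for comparison: data + 8-byte trailer, 48 payload bytes per cell."""
    return math.ceil((data_len + 8) / 48)


if __name__ == "__main__":
    types = aal34_segment_types(1000)
    print(f"1000 bytes: AAL-3/4 needs {len(types)} cells, AAL-5 needs {aal5_cells(1000)}")
    # 1000 bytes: AAL-3/4 needs 23 cells, AAL-5 needs 21
    print(types[0], types[1], types[-1])  # BOM COM EOM
```

The difference comes from the 4 bytes of per-cell AAL-3/4 fields, which is part of why the computer industry found AAL-3/4 inefficient.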
AAL-5
CS-PDU format (AAL-5 packet):

User data (< 64 KB) | Pad (0-47 bytes) | Reserved (16 bits) | Length (16) | CRC-32 (32)

• Pad: chosen so that the trailer falls at the end of an ATM cell.
• Reserved: for higher-layer sequencing/multiplexing.
• Length: size of the user data (the whole PDU, including pad and trailer, is a multiple of 48 bytes).
• CRC-32: detects missing or misordered cells.
• Cell format: unlike AAL-3/4, there are no per-cell AAL fields; each cell is an ATM header plus a 48-byte payload, and the end-of-PDU bit in the Type field of the ATM header marks the last cell.

[Figure: segmentation of an AAL-5 CS-PDU (user data, pad, trailer) into cells, each consisting of an ATM header and a cell payload]

AAL-1 through AAL-3/4 were designed by the telecom industry without much input from the computer industry. When the computer industry woke up and realized the implications (the complexity and inefficiency of two headers, i.e., two layers, and the short 10-bit checksum), it invented its own AAL protocol, AAL-5. It was originally called SEAL, for Simple Efficient Adaptation Layer. It offers two service options:
1. Reliable service (guaranteed delivery and flow control).
2. Unreliable service (no guaranteed delivery; best effort).
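A sketch of AAL-5 encapsulation and reassembly using the layout above: pad the user data so that data + pad + 8-byte trailer is a multiple of 48 bytes, fill the trailer with 16 reserved bits, the 16-bit length, and a CRC-32, then cut the result into 48-byte cell payloads. The CRC parameterization (generator 0x04C11DB7, MSB first, initial value and final XOR 0xFFFFFFFF) is our understanding of the AAL-5 CRC; Python's `zlib.crc32` uses the bit-reflected variant and would give different values. Function names are hypothetical.

```python
def crc32_aal5(data: bytes) -> int:
    """CRC-32: generator 0x04C11DB7, MSB first, init and final XOR 0xFFFFFFFF (assumed AAL-5 form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def aal5_segment(user_data: bytes) -> list:
    """Build the CS-PDU and cut it into 48-byte cell payloads."""
    if len(user_data) >= 65536:
        raise ValueError("user data must be < 64 KB")
    # Pad so that data + pad + 8-byte trailer is a multiple of 48 bytes.
    pad = (-(len(user_data) + 8)) % 48
    body = user_data + bytes(pad) + bytes(2) + len(user_data).to_bytes(2, "big")
    pdu = body + crc32_aal5(body).to_bytes(4, "big")
    # The last payload's cell would carry the end-of-PDU bit in its ATM header.
    return [pdu[i:i + 48] for i in range(0, len(pdu), 48)]


def aal5_reassemble(payloads: list) -> bytes:
    """Check the CRC-32 and Length, then strip pad and trailer."""
    pdu = b"".join(payloads)
    if crc32_aal5(pdu[:-4]) != int.from_bytes(pdu[-4:], "big"):
        raise ValueError("CRC-32 mismatch (lost, misordered, or corrupted cells)")
    length = int.from_bytes(pdu[-6:-4], "big")
    if length > len(pdu) - 8:
        raise ValueError("bad Length field")
    return pdu[:length]


if __name__ == "__main__":
    data = bytes(range(100))
    cells = aal5_segment(data)
    print(len(cells), [len(c) for c in cells])  # 3 [48, 48, 48]
    print(aal5_reassemble(cells) == data)       # True
```

Because the CRC covers the whole PDU, a single 32-bit check per packet replaces the 10-bit per-cell checks of AAL-3/4.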
VPI/VCI
• Host: treats VPI/VCI together as a 24-bit circuit identifier (8-bit VPI plus 16-bit VCI at the UNI).
• A switch that routes many VCs between company sites can use one VPI instead of many VCIs.
  • This makes the virtual circuit tables smaller and lookups faster.
• Network: a VP aggregates multiple circuits into one path.
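A sketch of why virtual paths shrink switch tables. The ports, VPI values (7 and 9), and the 1000 circuits are hypothetical. A VC switch needs one entry per circuit, while a VP switch keys only on (inPort, VPI) and passes the VCI through unchanged; to make the two comparable, the VC table here also leaves the VCI unchanged.

```python
# Hypothetical switch carrying 1000 circuits between two company sites.
# VC switching: one entry per circuit, keyed on (inPort, VPI, VCI).
vc_table = {(1, 7, vci): (2, 9, vci) for vci in range(32, 1032)}

# VP switching: one entry for the whole path, keyed on (inPort, VPI).
vp_table = {(1, 7): (2, 9)}


def vc_switch(in_port, vpi, vci):
    """Full lookup on the 24-bit VPI/VCI (plus the input port)."""
    return vc_table[(in_port, vpi, vci)]


def vp_switch(in_port, vpi, vci):
    """Look up only the VPI; the VCI passes through untouched."""
    out_port, out_vpi = vp_table[(in_port, vpi)]
    return out_port, out_vpi, vci


if __name__ == "__main__":
    print(vc_switch(1, 7, 500), vp_switch(1, 7, 500))  # (2, 9, 500) (2, 9, 500)
    print(len(vc_table), len(vp_table))                # 1000 1
```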
ATM in the LAN
• Problem: in common shared-media LANs (e.g., Ethernet, Token Ring), multicast/broadcast is easy, since every node is connected to the same link.
• Protocols were built to take advantage of easy broadcast (e.g., the Address Resolution Protocol, ARP).
• Two solutions:
  • Redesign protocols that make LAN assumptions that are not true of ATM (e.g., ATMARP does not depend on broadcast).
  • Make ATM behave more like a shared-media LAN (e.g., support broadcast/multicast without losing the performance advantages of a switched network); i.e., add functionality to ATM LANs so that anything that runs over a shared-media LAN runs over an ATM LAN. This is called LAN Emulation, or LANE.
LANE
• LANE terms and addresses are confusing (hosts, bridges, and routers are all LAN Emulation Clients, LECs).
• LANE must provide, e.g., 48-bit MAC addresses to emulate Ethernet.
• A VCI is very different from an address (an address is needed for setup; then the VCI is used in transit).
• For LANE, ATM switches don't change; LANE adds servers (at hosts?):
  • LECS (LAN Emulation Configuration Server): a new LEC finds the LECS and gets LANE information (e.g., frame size and the LES address).
  • LES (LAN Emulation Server): the new LEC sends its MAC and ATM addresses to the LES; the LES gives it the ATM address of the BUS.
  • BUS (Broadcast and Unknown Server): maintains a point-to-multipoint VC to all clients, for broadcasting.
[Figure: four input ports connected through a switch fabric to four output ports]
Switching Hardware Overview
• Terminology: an n × m switch has n inputs and m outputs (usually n = m, but not always).
• Design goals:
  • High throughput
  • Scalability (with respect to n)
• Ports and fabrics:
  • Port: contains electrical or optical receivers and transmitters, provides buffers for packets (cells) waiting to be switched or transmitted, and contains other circuitry.
    • The inPort determines and attaches the outPort number (in the predominant case of a self-routing fabric).
    • The inPort is the first place to look for performance bottlenecks.
    • The inPort deals with the complexities of the outside world, so the fabric has a simple job.
  • Fabric: delivers each presented packet to the right output (as simply as possible); it may also do buffering (an internally buffered fabric).
Buffering (and Head-of-Line Blocking)
• Head-of-line blocking: e.g., when inPort buffers are FIFO queues whose head-of-line cells are destined for the same outPort, the cells behind them (destined for other outPorts) wait unnecessarily.
• This can reduce throughput to about 59% (assuming saturated inputs with uniformly distributed destinations and a large number of ports).
• The majority of switches use pure outPort buffering or mixed internal/outPort buffering.
• Buffering is also important with respect to QoS (simple FIFO can't always be used; Chapter 6).
• Buffering is needed wherever contention is possible:
  • input ports (contending for the fabric),
  • internal fabric buffers (contending for an output port),
  • output ports (contending for links).
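The 59% figure can be reproduced with a small Monte Carlo sketch of an input-queued switch. Assumptions: every input always has a cell waiting (saturation), each new head-of-line cell picks its outPort uniformly at random, and each outPort serves one randomly chosen contender per slot while the losers stay at the head of their queues. Under these assumptions throughput is exactly 0.75 for n = 2 and approaches 2 - √2 ≈ 0.586 as n grows. The function name and parameters are hypothetical.

```python
import random


def hol_throughput(n_ports, slots, seed=1):
    """Saturation throughput of an n x n switch with FIFO input queues."""
    rng = random.Random(seed)
    head = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL cell destinations
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)        # each outPort serves one cell
            delivered += 1
            head[winner] = rng.randrange(n_ports)  # next cell moves to the head
        # Losers keep their HOL cell and block everything queued behind it.
    return delivered / (n_ports * slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_throughput(n, 20000), 3))
```

With output queuing, by contrast, a saturated switch with uniform traffic can approach 100% throughput, which is one reason most switches buffer at the outPorts.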
Crossbar Switch
[Figure: a 4 × 4 crossbar]
• Conceptually simple: every input is connected to every output.
• The only possible contention problem is outPort contention.
• The complexity of an outPort grows with the number of inPorts (an outPort may have to accept packets from all inPorts at once).
• The complexity of the switch is O(n²) (n² crosspoints).
• Designing a switch with low outPort complexity is difficult; the knockout switch is one such design (next slide).
Knockout Switch (a not-quite-perfect crossbar)
[Figure: knockout switch with inputs and outputs 1-4, and an 8-to-4 knockout concentrator]
• A perfect crossbar can route packets from all n inPorts to one outPort concurrently.
• n-by-l knockout concentrator:
  • Each outPort can accept l packets at once.
  • Pick l small enough to keep costs low.
  • Pick l large enough to handle hot spots: outPorts where arrivals concentrate (e.g., a popular website).
• Each outPort has three parts:
  • Filters (recognize packets for this port).
  • Concentrator (picks l packets and discards the rest). This is a hard job, since it must be fair:
    • the losers of one section go on to the next section;
    • the winner beats all others in its section (section 1, section 2, section 3, section 4).
  • A queue of length l at each outPort, for accepted packets that are not yet transmitted.
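A software sketch of the knockout concentrator's tournament idea. Section 1 runs a single-elimination tournament among all packets for this outPort; its winner takes output 1 and every loser moves on to section 2, and so on for l sections. Packets that lose in every section are knocked out. Deciding each 2 × 2 contest by a fair coin flip, and giving a bye to an odd packet out, are assumptions for illustration; the hardware achieves the same effect with contention elements.

```python
import random


def knockout_concentrator(packets, l, rng=random):
    """Pick at most l of the packets that arrived for one outPort in a slot.

    Returns (accepted, dropped).
    """
    contenders = list(packets)
    accepted = []
    for _ in range(l):                  # one section per concentrator output
        if not contenders:
            break
        round_ = contenders
        losers = []
        while len(round_) > 1:          # single-elimination rounds
            nxt = []
            for i in range(0, len(round_) - 1, 2):
                a, b = round_[i], round_[i + 1]
                win, lose = (a, b) if rng.random() < 0.5 else (b, a)
                nxt.append(win)
                losers.append(lose)
            if len(round_) % 2:         # odd packet out gets a bye
                nxt.append(round_[-1])
            round_ = nxt
        accepted.append(round_[0])      # section winner
        contenders = losers             # losers go on to the next section
    return accepted, contenders


if __name__ == "__main__":
    acc, dropped = knockout_concentrator(range(8), 4, random.Random(0))
    print(len(acc), len(dropped))  # 4 4
```

If at most l packets arrive for the port, nobody is knocked out; only hot-spot slots with more than l arrivals lose packets.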
Knockout Switch: OutPort Buffer
[Figure: shifter and buffers in three states, (a), (b), and (c)]
• Each outPort has l separate buffers.
• Buffers are filled in round-robin order (by a shifter), so occupancy levels are always within 1 of each other.
• Buffers are emptied in round-robin order, preserving arrival order.
• (a) 3 packets arrive. (b) 3 packets arrive, 1 leaves. (c) 1 packet arrives, 1 leaves.
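A sketch of the shifter-based outPort buffer: l FIFOs written round-robin and read round-robin in the same cyclic order, which keeps their lengths within 1 of each other and returns packets in arrival order. The class name is hypothetical, and the demo uses l = 4 (an assumption) to replay steps (a) to (c).

```python
from collections import deque


class ShifterBuffers:
    """l FIFOs filled and drained round-robin, as in the knockout outPort."""

    def __init__(self, l):
        self.fifos = [deque() for _ in range(l)]
        self.write = 0  # next FIFO the shifter writes into
        self.read = 0   # next FIFO the transmitter reads from

    def arrive(self, packets):
        """Up to l packets may arrive in one slot; spread them round-robin."""
        for p in packets:
            self.fifos[self.write].append(p)
            self.write = (self.write + 1) % len(self.fifos)

    def depart(self):
        """Send one packet; the cyclic read order preserves arrival order."""
        fifo = self.fifos[self.read]
        if not fifo:
            return None  # with cyclic reads and writes, this means all FIFOs are empty
        self.read = (self.read + 1) % len(self.fifos)
        return fifo.popleft()

    def occupancy(self):
        return [len(f) for f in self.fifos]


if __name__ == "__main__":
    b = ShifterBuffers(4)
    b.arrive([1, 2, 3])                  # (a) 3 packets arrive
    print(b.occupancy())                 # [1, 1, 1, 0]
    b.arrive([4, 5, 6])                  # (b) 3 packets arrive, 1 leaves
    print(b.depart(), b.occupancy())     # 1 [1, 2, 1, 1]
    b.arrive([7])                        # (c) 1 packet arrives, 1 leaves
    print(b.depart(), b.occupancy())     # 2 [1, 1, 2, 1]
```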
Shared-Media Switches
• Examples include switches built from PCs (sharing the PC bus and memory).
• They tend to scale poorly (shared resources get overloaded as the switching task grows).
• A nice aspect is the large shared buffer space, built from COTS (commercial off-the-shelf) parts; better buffer utilization is possible.
• Only one packet is written to memory at a time, so the mux-to-memory bus must be n times faster than the link speed.
• An arriving packet's header is stripped and goes to the write-control logic, which gets a memory address from a free list, writes the packet to that address, and adds the address to the appropriate outPort list.
• Read control takes packets from the outPort lists, sends them to the outPorts through the demux, and returns their memory addresses to the free list.
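A sketch of the write-control and read-control logic: one shared memory, a free list of addresses, and one address queue per outPort. The class name and sizes are hypothetical, and the outPort is assumed to have already been determined from the stripped header. The demo shows the sharing benefit: any outPort can use any free slot.

```python
from collections import deque


class SharedMemorySwitch:
    """Shared-buffer switch: one memory, a free list, and per-outPort address queues."""

    def __init__(self, n_ports, n_slots):
        self.memory = [None] * n_slots
        self.free = deque(range(n_slots))  # free memory addresses
        self.out_lists = [deque() for _ in range(n_ports)]

    def write(self, out_port, packet):
        """Write control: store the packet and queue its address. False means dropped."""
        if not self.free:
            return False                   # shared buffer is full
        addr = self.free.popleft()
        self.memory[addr] = packet
        self.out_lists[out_port].append(addr)
        return True

    def read(self, out_port):
        """Read control: fetch the next packet for out_port and free its address."""
        if not self.out_lists[out_port]:
            return None
        addr = self.out_lists[out_port].popleft()
        packet, self.memory[addr] = self.memory[addr], None
        self.free.append(addr)
        return packet


if __name__ == "__main__":
    sw = SharedMemorySwitch(n_ports=4, n_slots=3)
    print(sw.write(0, "a"), sw.write(2, "b"), sw.write(0, "c"))  # True True True
    print(sw.write(1, "d"))  # False (all 3 slots in use)
    print(sw.read(0))        # a
    print(sw.write(1, "d"))  # True (the freed slot is reused by another port)
```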
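Self-Routing Fabrics
[Figure: banyan network built from 2 × 2 switching elements; at each stage a cell is routed by one bit of its outPort number, 0 = up and 1 = down, using the left bit at the first stage, the middle bit at the second, and the right bit at the third]
• Banyan network:
  • Constructed from simple 2 × 2 switching elements (as above).
  • The inPort attaches a self-routing header = the binary outPort number; the outPort removes it.
  • Only one path exists from a given input to a given output.
  • No collisions occur if the inputs are pre-sorted into ascending order.
  • Complexity: O(n log₂ n), i.e., (n/2) · log₂ n switching elements (n/2 per stage, log₂ n stages).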
Banyan Switch Example
The route two cells take through the switch:
• 6 = 110 (down, down, up)
• 1 = 001 (up, up, down)
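A sketch of self-routing in a banyan: the header is just the binary outPort number, read most significant bit first, one bit per stage, with 0 meaning up and 1 meaning down. It also compares element counts with a crossbar. Note that this compares 2 × 2 elements with single crosspoints; since each 2 × 2 element holds 4 crosspoints, the banyan uses 2n log₂ n crosspoints, still far fewer than n² for large n. Function names are hypothetical.

```python
import math


def banyan_route(dest, stages):
    """Per-stage directions for a cell bound for outPort `dest` (MSB first)."""
    bits = format(dest, f"0{stages}b")
    return ["down" if b == "1" else "up" for b in bits]


def element_counts(n):
    """2 x 2 elements in an n x n banyan vs. crosspoints in an n x n crossbar."""
    stages = int(math.log2(n))
    return {"banyan_elements": (n // 2) * stages, "crossbar_crosspoints": n * n}


if __name__ == "__main__":
    print(banyan_route(6, 3))    # ['down', 'down', 'up']
    print(banyan_route(1, 3))    # ['up', 'up', 'down']
    print(element_counts(8))     # {'banyan_elements': 12, 'crossbar_crosspoints': 64}
    print(element_counts(1024))  # {'banyan_elements': 5120, 'crossbar_crosspoints': 1048576}
```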
Banyan Switch Examples
• Left: cell collisions occur, e.g., 5 & 7, 0 & 3, 6 & 4, and 2 & 1, plus two more in the middle stage, because the inputs are not ordered (assume the lower-numbered cell wins).
• Right: collision-free routing (the inputs are ordered). If the cells are sorted by destination and presented on input lines 0, 2, 4, 6, 1, 3, 5, 7, there are no collisions.
Batcher Network
• Built from switching elements that sort their inputs (there is one path from each input to each output).
  • Some elements sort into ascending order and others into descending order (indicated by arrows in the figure); if only one cell is present, it goes in the direction opposite the arrow.
• The elements are arranged to implement a merge sort.
• Complexity: on the order of n (log₂ n)² elements.
• Common design: the Batcher-Banyan switching fabric.
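A sketch of Batcher's bitonic sorting network, one of Batcher's merge-sort constructions. The function lists every compare-exchange element as (line i, line j, ascending); `run_network` then applies them in order. For n = 8 it uses 24 elements, and in general (n/4) · log₂ n · (log₂ n + 1), which is the n (log₂ n)² growth noted above. By the 0-1 principle, checking all 0/1 inputs of a given size is enough to verify that the network sorts. Function names are hypothetical.

```python
def bitonic_network(n):
    """Compare-exchange elements (i, j, ascending) of a bitonic sorter; n a power of 2."""
    elements = []
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between the two compared lines
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    elements.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return elements


def run_network(values, elements):
    """Apply each element: ascending puts the smaller value on line i, descending the larger."""
    v = list(values)
    for i, j, ascending in elements:
        if (v[i] > v[j]) == ascending:
            v[i], v[j] = v[j], v[i]
    return v


if __name__ == "__main__":
    net = bitonic_network(8)
    print(len(net))                                        # 24
    print(run_network([5, 7, 0, 3, 6, 4, 2, 1], net))      # [0, 1, 2, 3, 4, 5, 6, 7]
```

In a Batcher-Banyan fabric, the sorted output (by outPort number) is what lets the following banyan route cells without internal collisions.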
Batcher-Banyan Switches
A plain Batcher-Banyan fabric would have to drop packets whenever two are headed for the same outPort. Some switches deal with this problem: first came Starlite (1984), then the Moonshine switch (1987) and the Sunshine switch (1991). They differ only in the way their trap component works. The trap identifies the extras (cells beyond those that can be accepted for the same outPort) for the selector to recycle. With l parallel banyans, up to l packets destined for any one port can be accepted at a time: the selector makes sure they each go to a different banyan and sends any extras to a delay for recycling.
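A sketch of the trap and selector steps that follow the Batcher sort. Python's `sorted` stands in for the Batcher network. Because cells for the same outPort end up adjacent after sorting, the trap only has to compare neighbors to rank each cell among those sharing its destination. The selector then sends the k-th cell for a port to banyan k (for k < l) and recycles the rest. Function and variable names are hypothetical, and the sketch models only which cells go where, not the banyan wiring.

```python
def trap_and_select(dests, l):
    """Assign sorted cells to l banyans; return (per-banyan destinations, recycled)."""
    sorted_dests = sorted(dests)  # stands in for the Batcher sorting network
    to_banyan = [[] for _ in range(l)]
    recycled = []
    rank = 0
    for idx, d in enumerate(sorted_dests):
        # Trap: duplicates are adjacent, so compare with the previous cell only.
        rank = rank + 1 if idx > 0 and sorted_dests[idx - 1] == d else 0
        if rank < l:
            to_banyan[rank].append(d)  # selector: k-th cell for a port -> banyan k
        else:
            recycled.append(d)         # extras go to the delay and retry later
    return to_banyan, recycled


if __name__ == "__main__":
    banyans, recycled = trap_and_select([3, 1, 3, 0, 3, 3, 2, 1], l=2)
    print(banyans)   # [[0, 1, 2, 3], [1, 3]]
    print(recycled)  # [3, 3]
```

Each banyan therefore receives at most one cell per outPort, in sorted order, which is the situation in which a banyan avoids collisions.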
High-Speed IP Routers
[Figure: a router built around a switch fabric, with several line cards (forwarding, buffering), a routing CPU running routing software on a router OS, and buffer memory]
• Switch (possibly ATM)
• Line cards + forwarding engines:
  • link interface
  • router lookup (input)
  • common IP path (input)
  • packet queue (output)
• Network processor:
  • routing protocol(s)
  • exceptional cases
Alternative Design
[Figure: PCs, each with a CPU, memory, and a network interface (NI) with an on-board microprocessor (uP), interconnected by a crossbar switch]