CSE 30264 Computer Networks. Prof. Aaron Striegel Department of Computer Science & Engineering University of Notre Dame Lecture 14 – February 23, 2010. Today’s Lecture. Advanced Routing Multicast MPLS Transport UDP TCP. Application. Transport. Network. Data. Physical.
CSE 30264 Computer Networks Prof. Aaron Striegel Department of Computer Science & Engineering University of Notre Dame Lecture 14 – February 23, 2010
Today’s Lecture
• Advanced Routing
  • Multicast
  • MPLS
• Transport
  • UDP
  • TCP
[Figure: protocol stack: Application, Transport, Network, Data, Physical]
CSE 30264
Multicast & MPLS
Outline
• Multicast for LS
• Multicast for DV
• Protocol Independent Multicast
• MPLS
Lecture notes unplugged
Process Groups
• Any set of processes that want to cooperate
• Processes can join/leave a group
• A process can belong to many groups
• Groups can be either open or closed
• Use multicast rather than point-to-point messages
  • group name (address) provides a useful level of indirection
• Example uses
  • data dissemination (e.g., news)
  • replicated servers
Multicast Addresses
• A subrange of the IP address space is reserved for multicast (class D in IPv4)
• IPv4: 28 bits of possible multicast addresses
• Ethernet: uses 23 bits for multicast
• Mapping 28 bits onto 23 bits: 32 IP addresses map onto each Ethernet address
• An Ethernet host joins an IP multicast group by configuring its device to receive the corresponding Ethernet multicast address. Because several groups share one Ethernet address, IP at the host must still check whether each packet is actually directed to this host
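The 28-to-23-bit mapping can be sketched in a few lines. This is a minimal illustration, not part of the lecture: the function name `multicast_mac` is hypothetical, and the fixed `01:00:5e` prefix comes from the standard IPv4-over-Ethernet multicast mapping rather than from the slide.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF        # keep only the low-order 23 bits
    mac = 0x01005E000000 | low23        # fixed 01:00:5e prefix (assumed from the standard mapping)
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# Two different groups that differ only in the 5 discarded bits collide.
print(multicast_mac("224.0.0.1"))
print(multicast_mac("224.128.0.1"))
```

Both calls yield the same Ethernet address, which is why the host's IP layer still has to filter on the full group address.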
Multicast Routing: LS
• Each host on a LAN periodically announces the groups it belongs to using IGMP
• Augment the update message (LSP) to include the set of groups that have members on a particular LAN
• Each router uses Dijkstra’s algorithm to compute a shortest-path spanning tree for each source/group pair
• Each router caches the tree for currently active source/group pairs
Multicast Routing: DV
• Reverse Path Broadcast
• Each router already knows that the shortest path to S goes through router N
• On receiving a multicast packet from S, forward it on all outgoing links (except the one it arrived on) if and only if the packet arrived from N
• Eliminate duplicate broadcast packets by letting only the “parent” for a LAN (relative to S) forward
  • parent is the router with the shortest path to S (learned from the distance vector)
  • smallest address breaks ties
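The forwarding test for reverse path broadcast can be sketched as follows. This is a simplified illustration under assumed names: `rpb_forward`, `arrival_link`, and `next_hop_to_source` are hypothetical, and the parent election for shared LANs is left out.

```python
def rpb_forward(arrival_link, next_hop_to_source, links):
    """Return the links a router floods a multicast packet on under RPB.

    arrival_link:       link the packet from S arrived on
    next_hop_to_source: link toward N, the router's own next hop to S
    links:              all links attached to this router
    """
    if arrival_link != next_hop_to_source:
        return []  # packet did not come along the reverse shortest path: drop
    return [link for link in links if link != arrival_link]


print(rpb_forward("to_N", "to_N", ["to_N", "lan_a", "lan_b"]))
print(rpb_forward("lan_a", "to_N", ["to_N", "lan_a", "lan_b"]))
```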
DV (cont)
• Reverse Path Multicast
• Goal: prune networks that have no hosts in group G
• Step 1: determine if the LAN is a leaf without members in G
  • leaf if the parent is the only router on the LAN
  • determine if any hosts are members of G using IGMP
• Step 2: propagate “no members of G here” information
  • augment the (destination, cost) update sent to neighbors with the set of groups for which this network is interested in receiving multicast packets
  • only happens when a multicast address becomes active
Protocol Independent Multicast
• PIM: sparse mode (PIM-SM) and dense mode
• Routers join/leave groups: Join/Prune messages
• Rendezvous Point (RP) for each group
• Shared trees and source-specific trees
PIM
Multiprotocol Label Switching
• MPLS is used to:
  • enable IP capabilities on devices that cannot forward IP datagrams in the normal manner
  • forward IP packets along “explicit routes”
  • support certain types of virtual private network services
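Destination-Based Forwarding
[Figure: R1 connects to R2 on interface 0; R2 reaches 10.1.1/24 through R3 on interface 1 and 10.3.3/24 through R4 on interface 0]

R1 forwarding table
Prefix    Interface
10.1.1    0
10.3.3    0

R2 forwarding table
Prefix    Interface
10.1.1    1
10.3.3    0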
Label Distribution Protocol
[Figure (a): same topology; R2 advertises "Label = 15, Prefix = 10.1.1" to R1]

R1 table (a)
Prefix    Interface
10.1.1    0
10.3.3    0

R2 table (a)
Label    Prefix    Interface
15       10.1.1    1
16       10.3.3    0

[Figure (b): R1 has stored the labels advertised by R2]

R1 table (b)
Prefix    Interface    Remote Label
10.1.1    0            15
10.3.3    0            16

R2 table (b)
Label    Prefix    Interface
15       10.1.1    1
16       10.3.3    0
Label Distribution Protocol
[Figure: R3 advertises "Label = 24, Prefix = 10.1.1" to R2]

R1 table
Prefix    Interface    Remote Label
10.1.1    0            15
10.3.3    0            16

R2 table
Label    Prefix    Interface    Remote Label
15       10.1.1    1            24
16       10.3.3    0
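The forwarding tables from the label distribution example can be exercised with a small sketch. The dictionary and function names (`R1_TABLE`, `R2_TABLE`, `ingress`, `label_switch`) are hypothetical, and the missing remote label for 10.3.3 at R2 is modeled as `None` because the figure does not show one.

```python
# R1: prefix -> (outgoing interface, remote label learned from R2)
R1_TABLE = {"10.1.1": (0, 15), "10.3.3": (0, 16)}

# R2: incoming label -> (outgoing interface, remote label learned downstream)
R2_TABLE = {15: (1, 24), 16: (0, None)}


def ingress(prefix):
    """At R1: look up the prefix once and attach the label R2 advertised."""
    return R1_TABLE[prefix]


def label_switch(label):
    """At R2: forward on the label alone and swap it for the downstream label."""
    return R2_TABLE[label]


iface, label = ingress("10.1.1")      # R1 sends on interface 0 with label 15
print(iface, label)
print(label_switch(label))            # R2 sends on interface 1 with label 24
```

Note that R2 never inspects the IP destination here; the incoming label is the only lookup key.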
Label Switching Routers
[Figure (a): ATM cell header fields GFC | VPI | VCI | PTI | CLP | HEC | DATA, with the label carried in the cell header]
[Figure (b): “shim” header (for PPP, Ethernet, etc.): PPP header | Label header | Layer 3 header]
Benefits
Explicit Routing
• “Fish” network
• Resource Reservation Protocol (RSVP)
Mid-Term Exam
• </FirstHalfMaterials>
• Everything through multicast / MPLS on first exam
• Exam brief discussion
• Shift two page notes, front / back
• Key points from Wiki could be very helpful
• Take home quiz
• Short answer
• Computation / work
• Discussion / ponder
Reliable Byte-Stream (TCP)
Outline
• Connection Establishment/Termination
• Sliding Window Revisited
• Flow Control
• Adaptive Timeout
End-to-End Protocols
• Underlying best-effort network
  • drops messages
  • re-orders messages
  • delivers duplicate copies of a given message
  • limits messages to some finite size
  • delivers messages after an arbitrarily long delay
• Common end-to-end services
  • guarantee message delivery
  • deliver messages in the same order they are sent
  • deliver at most one copy of each message
  • support arbitrarily large messages
  • support synchronization
  • allow the receiver to flow control the sender
  • support multiple application processes on each host
Simple Demultiplexor (UDP)
• Unreliable and unordered datagram service
• Adds multiplexing
• No flow control
• Endpoints identified by ports
  • servers have well-known ports
  • see /etc/services on Unix
• Header format
• Optional checksum
  • computed over pseudo header + UDP header + data
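The checksum coverage (pseudo header + UDP header + data) can be sketched as below. This is an illustrative sketch for IPv4 only: the function names are hypothetical, and the pseudo-header layout (source address, destination address, zero byte, protocol 17, UDP length) is taken from the UDP specification rather than from the slide.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum over pseudo header + UDP header (checksum field zeroed) + data."""
    length = 8 + len(payload)                # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo + header + payload)
    return checksum or 0xFFFF                # zero on the wire means "no checksum"


print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 1234, 53, b"hi")))
```

A receiver can verify a datagram by running the same sum with the transmitted checksum in place; a correct datagram sums to zero.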
UDP
TCP Overview
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network
• Connection-oriented
• Byte-stream
  • app writes bytes
  • TCP sends segments
  • app reads bytes
[Figure: the sending application process writes bytes into TCP's send buffer, TCP transmits segments, and the receiving application process reads bytes from TCP's receive buffer]
Segment Format
Segment Format (cont)
• Each connection identified with 4-tuple:
  • (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
  • ACK, SequenceNum, AdvertisedWindow
• Flags
  • SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
  • pseudo header + TCP header + data
Connection Establishment
Active participant (client), passive participant (server)
1. Client -> Server: SYN, SequenceNum = x
2. Server -> Client: SYN+ACK, SequenceNum = y, Acknowledgment = x+1
3. Client -> Server: ACK, Acknowledgment = y+1
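Connection Termination
1. First participant -> Second participant: FIN, SequenceNum = x
2. Second participant -> First participant: ACK, Acknowledgment = x+1
3. Second participant -> First participant: FIN, SequenceNum = y, Acknowledgment = x+1
4. First participant -> Second participant: ACK, Acknowledgment = y+1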
State Transition Diagram
Sliding Window Revisited
• Sending side
  • LastByteAcked <= LastByteSent
  • LastByteSent <= LastByteWritten
  • buffer bytes between LastByteAcked and LastByteWritten
• Receiving side
  • LastByteRead < NextByteExpected
  • NextByteExpected <= LastByteRcvd + 1
  • buffer bytes between LastByteRead and LastByteRcvd
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
  • LastByteRcvd - LastByteRead <= MaxRcvBuffer
  • AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
• Sending side
  • LastByteSent - LastByteAcked <= AdvertisedWindow
  • EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
  • LastByteWritten - LastByteAcked <= MaxSendBuffer
  • block the sending application if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes it is trying to write
• Always send an ACK in response to an arriving data segment
• Persist (keep probing the receiver) when AdvertisedWindow = 0
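The window formulas above can be checked with a small worked sketch. The class names `Receiver` and `Sender` and the numbers in the example are made up for illustration; only the formulas come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    max_rcv_buffer: int
    last_byte_read: int
    next_byte_expected: int

    def advertised_window(self) -> int:
        # AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
        return self.max_rcv_buffer - ((self.next_byte_expected - 1) - self.last_byte_read)


@dataclass
class Sender:
    max_send_buffer: int
    last_byte_acked: int
    last_byte_sent: int
    last_byte_written: int

    def effective_window(self, advertised_window: int) -> int:
        # EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
        return advertised_window - (self.last_byte_sent - self.last_byte_acked)

    def must_block(self, y: int) -> bool:
        # Block the writer if y more bytes would overflow the send buffer.
        return (self.last_byte_written - self.last_byte_acked) + y > self.max_send_buffer


rcv = Receiver(max_rcv_buffer=4096, last_byte_read=1000, next_byte_expected=3001)
snd = Sender(max_send_buffer=8192, last_byte_acked=1000, last_byte_sent=2500, last_byte_written=6000)
window = rcv.advertised_window()          # 4096 - 2000 bytes still buffered
print(window, snd.effective_window(window), snd.must_block(4000))
```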
Protection Against Wrap Around
• 32-bit SequenceNum

Bandwidth            Time Until Wrap Around
T1 (1.5 Mbps)        6.4 hours
Ethernet (10 Mbps)   57 minutes
T3 (45 Mbps)         13 minutes
FDDI (100 Mbps)      6 minutes
STS-3 (155 Mbps)     4 minutes
STS-12 (622 Mbps)    55 seconds
STS-24 (1.2 Gbps)    28 seconds
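The table entries follow from dividing the 32-bit sequence space (2^32 bytes) by the link rate. A minimal sketch, with a hypothetical function name and assuming the sender transmits continuously at the full link rate:

```python
def wrap_time_seconds(bandwidth_bps: float) -> float:
    """Seconds until a 32-bit byte sequence number wraps at the given bit rate."""
    sequence_space_bits = 2**32 * 8      # each sequence number covers one byte
    return sequence_space_bits / bandwidth_bps


for name, bps in [("T1", 1.5e6), ("Ethernet", 10e6), ("STS-12", 622e6)]:
    print(f"{name}: {wrap_time_seconds(bps):.0f} s")
```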
Silly Window Syndrome
• How aggressively does sender exploit open window?
• Receiver-side solutions
  • after advertising zero window, wait for space equal to a maximum segment size (MSS)
  • delayed acknowledgements
[Figure: segments exchanged between Sender and Receiver]
Nagle’s Algorithm
• How long does sender delay sending data?
  • too long: hurts interactive applications
  • too short: poor network utilization
  • strategies: timer-based vs self-clocking

    when application produces data to send
        if both the available data and the window >= MSS
            send a full segment
        else
            if there is unACKed data in flight
                buffer the new data until an ACK arrives
            else
                send all the new data now
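The decision rule in the pseudocode translates directly into a small function. This is a sketch of the decision only, with hypothetical names and string results; it does not model buffering or ACK arrival.

```python
def nagle_decision(available: int, window: int, mss: int, unacked_in_flight: bool) -> str:
    """Decide what the sender does when the application produces data."""
    if available >= mss and window >= mss:
        return "send_full_segment"
    if unacked_in_flight:
        return "buffer_until_ack"    # self-clocking: the next ACK triggers the send
    return "send_now"


print(nagle_decision(available=2000, window=4000, mss=1460, unacked_in_flight=True))
print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=True))
print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=False))
```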
Adaptive Retransmission
• Round-Trip Time Estimation:
  • wait at least one RTT before retransmitting
  • importance of accurate RTT estimators:
    • RTT estimate too low -> unneeded retransmissions
    • RTT estimate too high -> poor throughput
  • RTT estimator must adapt to change in RTT
    • But not too fast, or too slow!
  • problem: if the instantaneously measured RTT is 10, 20, 5, 12, 3, 5, 6, what RTT should we use for calculations?
• EstimatedRTT = a * EstimatedRTT + (1 - a) * SampleRTT
  • recommended value for a: 0.8 - 0.9
  • retransmit timer set to b * EstimatedRTT, where b = 2
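One way to explore the sample sequence from the slide is to run the weighted average over it. This is a sketch under assumptions: the function name is hypothetical, the estimate is seeded with the first sample (the slide does not say how to initialize it), and a = 0.8 is picked from the recommended range.

```python
def ewma_rtt(samples, a=0.8):
    """Return the running EstimatedRTT after each sample.

    EstimatedRTT = a * EstimatedRTT + (1 - a) * SampleRTT
    The first sample seeds the estimate (an assumption of this sketch).
    """
    estimate = samples[0]
    history = [estimate]
    for sample in samples[1:]:
        estimate = a * estimate + (1 - a) * sample
        history.append(estimate)
    return history


samples = [10, 20, 5, 12, 3, 5, 6]
for sample, estimate in zip(samples, ewma_rtt(samples)):
    # Retransmit timer is b * EstimatedRTT with b = 2.
    print(f"sample={sample:>2}  estimate={estimate:6.2f}  timeout={2 * estimate:6.2f}")
```

A larger a makes the estimate smoother but slower to follow the drop from 20 toward 5.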
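Retransmission Ambiguity
[Figure: two timelines between A and B, each showing an original transmission, a retransmission, and an ACK; the measured SampleRTT depends on whether the ACK is matched to the original transmission or to the retransmission]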
Karn/Partridge Algorithm
• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
  • don’t count RTT sample on ACKs for this segment
  • reuse RTT estimate only after one successful transmission
  • double timeout after each retransmission
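The three rules can be sketched as a small timer object. The class and method names are hypothetical, and the estimator reuses the weighted-average form with assumed constants a = 0.8 and b = 2.

```python
class KarnPartridgeTimer:
    """Retransmission timer that ignores ambiguous RTT samples."""

    def __init__(self, initial_rtt: float, a: float = 0.8, b: float = 2.0):
        self.a = a
        self.b = b
        self.estimated_rtt = initial_rtt
        self.timeout = b * initial_rtt

    def on_timeout(self) -> None:
        # Double the timeout after each retransmission (exponential backoff).
        self.timeout *= 2

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return  # ambiguous sample: keep the estimate and the backed-off timeout
        # A clean sample: update the estimate and recompute the timeout from it.
        self.estimated_rtt = self.a * self.estimated_rtt + (1 - self.a) * sample_rtt
        self.timeout = self.b * self.estimated_rtt


timer = KarnPartridgeTimer(initial_rtt=10.0)
timer.on_timeout()                                   # segment lost, timeout doubles
timer.on_ack(sample_rtt=30.0, was_retransmitted=True)
print(timer.estimated_rtt, timer.timeout)            # estimate unchanged
```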
Jacobson/Karels Algorithm
• Key observation:
  • using b * RTT for the timeout doesn’t work
  • at high loads round trip variance is high
• Solution:
  • if D denotes mean variation
  • timeout = RTT + 4D
Jacobson/Karels Algorithm
• New calculations for average RTT
  • Diff = SampleRTT - EstimatedRTT
  • EstimatedRTT = EstimatedRTT + (d * Diff)
  • Dev = Dev + d * (|Diff| - Dev)
  • where d is a factor between 0 and 1
• Consider variance when setting timeout value
  • TimeOut = m * EstimatedRTT + f * Dev
  • where m = 1 and f = 4
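The update equations above can be run over a list of samples as follows. This is a sketch: the function name is hypothetical, the starting values for EstimatedRTT and Dev are caller-supplied assumptions, and d = 0.125 is an assumed choice within the stated range of 0 to 1.

```python
def jacobson_karels(samples, estimated_rtt, dev=0.0, d=0.125, m=1.0, f=4.0):
    """Apply the Jacobson/Karels updates and return (EstimatedRTT, Dev, TimeOut)."""
    for sample_rtt in samples:
        diff = sample_rtt - estimated_rtt
        estimated_rtt = estimated_rtt + d * diff
        dev = dev + d * (abs(diff) - dev)
    timeout = m * estimated_rtt + f * dev
    return estimated_rtt, dev, timeout


# Sample sequence from the adaptive retransmission slide, seeded with its first value.
print(jacobson_karels([20, 5, 12, 3, 5, 6], estimated_rtt=10.0))
```

Because the timeout includes 4 * Dev, a noisy sample sequence widens the timeout even when the average stays roughly constant.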
Record Boundaries
• Byte-stream protocol: write 8+2+20 bytes and read 5+5+5+5+5+5 (loop).
• TCP offers two features to insert record boundaries:
  • URG flag
  • push operation
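The 8+2+20 versus 5+5+5+5+5+5 behavior can be observed with a local stream socket pair. This sketch uses `socket.socketpair()` as a stand-in for a TCP connection (an assumption made to keep the example self-contained); the function name is hypothetical.

```python
import socket


def read_in_fives():
    """Write 8, 2, and 20 bytes, then read the stream back 5 bytes at a time."""
    writer, reader = socket.socketpair()      # connected pair of stream sockets
    with writer, reader:
        for size in (8, 2, 20):
            writer.sendall(b"x" * size)       # three separate writes
        writer.shutdown(socket.SHUT_WR)       # signal end of stream to the reader
        sizes = []
        while True:
            chunk = reader.recv(5)            # ask for at most 5 bytes
            if not chunk:
                break
            sizes.append(len(chunk))
    return sizes


print(read_in_fives())
```

The read sizes bear no relation to the write sizes, which is why an application that needs records has to frame them itself or rely on the features listed above.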
TCP Extensions
• Implemented as header options
• Better way to measure RTT (use actual system clock for sending time and add timestamp to segment).
• 64-bit sequence numbers: 32-bit sequence number in low-order 32 bits, timestamp in high-order 32 bits.
• Shift (scale) advertised window.