Summary of best-effort network capabilities, RPC, TCP, sliding window protocol, and more. Learn about connection establishment, flow control, sliding window algorithm, and adaptive retransmission in transport protocols.
Chapter 5: End-to-End (Transport) Protocols
• Summary of underlying best-effort network capabilities (host-to-host)
  • drops packets or datagrams
  • reorders packets or datagrams (send order versus receive order)
  • delivers duplicate copies of a given packet or datagram
  • limits the size of packets or datagrams
  • delivers packets or datagrams after arbitrarily long delays
• End-to-end services desired (process-to-process)
  • guaranteed delivery of messages
  • same-order delivery of messages (in the order they were sent)
  • delivery of one copy of each message
  • support for arbitrarily large messages
  • sender-to-receiver synchronization (i.e., connection-oriented)
  • flow control (allowing the receiver to regulate the sender's rate)
  • support for multiple application processes on each host
End-to-End (Transport) Protocols continued
• RPC = Remote Procedure Call
Simple Demultiplexing Support (UDP)
• Unreliable, unordered datagram service
• Adds demultiplexing
• No flow control
• Endpoints identified by ports
  • servers have well-known ports
  • see /etc/services on Unix
• Optional checksum
• Header format
Reliable Byte-Stream (TCP) Overview
• Connection-oriented, byte-stream
  • sending process writes a stream of bytes
  • TCP breaks the stream into segments and sends them as IP datagrams
  • receiving process reads a stream of bytes
  • example: (a) four 512-byte segments sent as IP datagrams, (b) 2048 bytes read as a stream
• Full-duplex channel
• Flow control provided (to keep the sender from overrunning the receiver)
• Congestion control provided (to keep the sender from overrunning the network)
End-to-End Issues
TCP is based on a sliding window protocol similar to that used at the data link level, but the situation is very different.
• Potentially connects many different hosts
  • need explicit connection establishment and termination
• Potentially different RTT
  • need adaptive timeout mechanism
• Potentially long delay in network
  • need to be prepared for arrival of very old packets
• Potentially different capacity at destination
  • need to accommodate different amounts of buffering
• Potentially different network capacity
  • need to be prepared for network congestion
Segment Format
• Each connection is identified by a 4-tuple used as a demux key: <SrcPort, SrcIPAddr, DestPort, DestIPAddr>
• The sliding window algorithm and flow control involve the following fields: AcknowledgmentNumber, SequenceNumber, AdvertisedWindowSize
• HdrLen is measured in 32-bit words
• 6-bit Flags field: used to relay control information
  • URG (urgent): set when urgent data is pointed to by UrgPtr
  • ACK: set when AckNum is valid
  • PSH (push): signifies that the sender has flushed its buffers
  • RST (reset): says the receiver has become confused (start over!)
  • SYN (synchronize), FIN (finish): set to establish and terminate a connection, respectively
• Checksum: computed over pseudo header (SrcAdr + DestAdr + Lengths) + TCP header + data
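The demux key and flag handling described on this slide can be sketched in C. This is a minimal illustration, not any particular stack's code: the `ConnKey` type and the helper names are hypothetical, IPv4 addresses are assumed, and the flag bit values follow the standard TCP header layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values of the six flags within the TCP header's flag field. */
enum {
    TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
    TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20
};

/* Hypothetical demux key: the 4-tuple identifying one connection (IPv4 assumed). */
typedef struct {
    uint16_t src_port;
    uint32_t src_ip;
    uint16_t dst_port;
    uint32_t dst_ip;
} ConnKey;

/* Two segments belong to the same connection only if all four fields match. */
bool conn_key_equal(const ConnKey *a, const ConnKey *b)
{
    return a->src_port == b->src_port && a->src_ip == b->src_ip &&
           a->dst_port == b->dst_port && a->dst_ip == b->dst_ip;
}

/* HdrLen counts 32-bit words, so the header length in bytes is HdrLen * 4. */
unsigned tcp_header_bytes(unsigned hdr_len_words)
{
    return hdr_len_words * 4u;
}

/* True when both SYN and ACK are set, as in the second handshake segment. */
bool is_syn_ack(uint8_t flags)
{
    return (flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK);
}
```

For example, a header with HdrLen = 5 (no options) is 20 bytes long, and a segment carrying only SYN does not satisfy `is_syn_ack`.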
Connection Establishment and Termination
3-way handshake between the active client and the passive server:
• Client sends a segment to the server with its starting sequence number (SYN=1, SeqNum=x)
• Server sends a segment with (SYN=1, ACK=1, AckNum=x+1, SeqNum=y), where y is its own starting sequence number
• Client sends an ACK segment with (ACK=1, AckNum=y+1)
[Figure: connection timelines for the normal case and for call collision]
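The three handshake segments can be traced with a small C sketch. The `Segment` type and the `make_*` helpers are hypothetical names introduced for illustration. The slide does not give the SeqNum of the final ACK; the sketch assumes x+1, which is what standard TCP uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the header fields that matter for the handshake. */
typedef struct {
    bool     syn;
    bool     ack;
    uint32_t seq_num;
    uint32_t ack_num;
} Segment;

/* Step 1: client announces its starting sequence number x. */
Segment make_syn(uint32_t x)
{
    Segment s = { true, false, x, 0u };
    return s;
}

/* Step 2: server acknowledges x+1 and announces its own start y. */
Segment make_syn_ack(const Segment *syn, uint32_t y)
{
    Segment s = { true, true, y, syn->seq_num + 1u };
    return s;
}

/* Step 3: client acknowledges y+1. SeqNum = x+1 is an assumption
 * taken from standard TCP behavior, not from the slide. */
Segment make_ack(const Segment *syn_ack)
{
    Segment s = { false, true, syn_ack->ack_num, syn_ack->seq_num + 1u };
    return s;
}
```

Because the fields are unsigned 32-bit integers, x+1 wraps to 0 when x is the largest sequence number, which matches the modular sequence space.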
Sliding Window
1. Guarantees the reliable delivery of data
2. Ensures that data is delivered in order
3. Enforces flow control (the sender does not overrun the receiver)
For item 1 (guaranteed reliable delivery), TCP's algorithm is basically the same as the sliding window algorithm at the link level. Where the TCP sliding window differs is that it folds flow control in as well. Rather than using a fixed-size window, the receiver advertises a window size through the AdvertisedWindow field (based on available buffers). The sender is then limited to having no more than that many bytes of unacknowledged data outstanding. Treatment of sequence number wraparound is essentially the same as at the link level.
Sliding Window
Each byte has a sequence number. ACKs are cumulative.
• Sending side
  • LastByteAcked <= LastByteSent
  • LastByteSent <= LastByteWritten
  • bytes between LastByteAcked and LastByteWritten must be buffered
• Receiving side
  • LastByteRead < NextByteExpected
  • bytes between LastByteRead and LastByteRcvd must be buffered
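The way the receiver's advertised window limits the sender can be sketched as follows. The formulas are the usual textbook ones rather than anything stated on the slide, so treat them as an assumption: the receiver advertises its buffer size minus the in-order bytes not yet read, and the sender may send that amount minus what is already in flight. The struct and function names, and `max_rcv_buffer`, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t last_byte_acked;
    uint32_t last_byte_sent;
    uint32_t last_byte_written;
} SendState;

typedef struct {
    uint32_t last_byte_read;
    uint32_t next_byte_expected;
    uint32_t max_rcv_buffer;   /* assumed receive buffer size in bytes */
} RecvState;

/* Receiver side: free space left after the bytes that have arrived
 * in order but that the application has not read yet. */
uint32_t advertised_window(const RecvState *r)
{
    uint32_t buffered = (r->next_byte_expected - 1u) - r->last_byte_read;
    return buffered >= r->max_rcv_buffer ? 0u : r->max_rcv_buffer - buffered;
}

/* Sender side: how many more bytes may be sent right now, given the
 * advertised window and the bytes sent but not yet acknowledged. */
uint32_t effective_window(const SendState *s, uint32_t advertised)
{
    uint32_t in_flight = s->last_byte_sent - s->last_byte_acked;
    return advertised > in_flight ? advertised - in_flight : 0u;
}
```

With a 4096-byte buffer holding 500 unread bytes, the receiver advertises 3596 bytes; a sender with 1000 bytes in flight may then send 2596 more.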
Keeping the Pipe Full
| Bandwidth | Delay x Bandwidth Product | Time until Wraparound |
| --- | --- | --- |
| T1 (1.5 Mbps) | 18 KB | 6.4 hours |
| Ethernet (10 Mbps) | 122 KB | 57 minutes |
| T3 (45 Mbps) | 549 KB | 13 minutes |
| FDDI (100 Mbps) | 1.2 MB | 6 minutes |
| STS-3 (155 Mbps) | 1.8 MB | 4 minutes |
| STS-12 (622 Mbps) | 7.4 MB | 55 seconds |
| STS-24 (1.2 Gbps) | 14.8 MB | 28 seconds |
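The figures on this slide can be reproduced with two short formulas. The sketch below assumes a 32-bit sequence space (one sequence number per byte) and a round-trip time of 100 ms; the slide does not state the RTT, but its delay x bandwidth values are consistent with that choice. The function names are hypothetical.

```c
#include <assert.h>

/* Bytes that must be in flight to keep the pipe full:
 * bandwidth (bits/s) * RTT (s), converted to bytes. */
double delay_bandwidth_bytes(double bandwidth_bps, double rtt_seconds)
{
    return bandwidth_bps * rtt_seconds / 8.0;
}

/* Seconds needed to send 2^32 bytes at the given bandwidth, i.e. the
 * time until a 32-bit byte sequence number wraps around. */
double wraparound_seconds(double bandwidth_bps)
{
    return 4294967296.0 * 8.0 / bandwidth_bps;
}
```

For a T1 link, `delay_bandwidth_bytes(1.5e6, 0.1)` gives 18750 bytes (about 18 KB) and `wraparound_seconds(1.5e6)` gives roughly 22906 seconds, or about 6.4 hours.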
Adaptive Retransmission
• Original Algorithm
  • Measure SampleRTT for each segment/ACK pair
  • Compute weighted average of RTT
    • EstimatedRTT = a * EstimatedRTT + b * SampleRTT
    • where a + b = 1
    • a between 0.8 and 0.9
    • b between 0.1 and 0.2
  • Set timeout based on EstimatedRTT
    • TimeOut = 2 * EstimatedRTT
• Karn/Partridge Algorithm
  • Do not sample RTT when retransmitting
  • Double timeout after each retransmission
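The original algorithm and the Karn/Partridge refinements fit in a few lines of C. This is a sketch under stated assumptions: the `RttEstimator` type and function names are hypothetical, the first sample is used to seed the estimate, and b is derived as 1 - a.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    double estimated_rtt;
    double timeout;
} RttEstimator;

/* Seeding the estimate with the first sample is an assumption of this sketch. */
void rtt_init(RttEstimator *e, double first_sample)
{
    e->estimated_rtt = first_sample;
    e->timeout = 2.0 * first_sample;
}

/* Called when an ACK arrives. 'a' is the weight on the old estimate
 * (0.8 to 0.9 on the slide); b is 1 - a. */
void rtt_on_ack(RttEstimator *e, double sample_rtt, double a, bool was_retransmitted)
{
    if (was_retransmitted)
        return;  /* Karn/Partridge: do not sample RTT when retransmitting */
    e->estimated_rtt = a * e->estimated_rtt + (1.0 - a) * sample_rtt;
    e->timeout = 2.0 * e->estimated_rtt;
}

/* Karn/Partridge: double the timeout after each retransmission. */
void rtt_on_retransmit(RttEstimator *e)
{
    e->timeout *= 2.0;
}
```

Starting from an estimate of 100 ms, a 200 ms sample with a = 0.875 moves the estimate to 112.5 ms and the timeout to 225 ms; a subsequent retransmission doubles the timeout to 450 ms without touching the estimate.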
Jacobson/Karels Algorithm (used today)
• New calculation for average RTT
  • Diff = SampleRTT - EstimatedRTT
  • EstimatedRTT = EstimatedRTT + (d * Diff)
  • Deviation = Deviation + d * (|Diff| - Deviation)
  • where d is a fraction between 0 and 1
• Setting timeout value
  • TimeOut = m * EstimatedRTT + f * Deviation
  • where m = 1 and f = 4
• Notes
  • the algorithm is only as good as the granularity of the clock (500 ms on Unix)
  • an accurate timeout mechanism is important to congestion control (later)
• TCP extensions proposed
  • Store timestamp in outgoing segments
  • Use 32-bit timestamp
  • Make modifications to advertised window
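The Jacobson/Karels update translates directly into code. The sketch below follows the three formulas on the slide with m = 1 and f = 4; the `JkEstimator` type and `jk_update` name are hypothetical, and the initial values are left to the caller.

```c
#include <assert.h>

typedef struct {
    double estimated_rtt;
    double deviation;
} JkEstimator;

/* Applies one SampleRTT and returns the new timeout.
 * 'd' is the gain, a fraction between 0 and 1. */
double jk_update(JkEstimator *e, double sample_rtt, double d)
{
    double diff = sample_rtt - e->estimated_rtt;
    double abs_diff = diff < 0.0 ? -diff : diff;

    e->estimated_rtt += d * diff;
    e->deviation += d * (abs_diff - e->deviation);

    /* TimeOut = m * EstimatedRTT + f * Deviation, with m = 1 and f = 4 */
    return 1.0 * e->estimated_rtt + 4.0 * e->deviation;
}
```

With EstimatedRTT = 100, Deviation = 10, and d = 0.125, a sample of 180 yields EstimatedRTT = 110, Deviation = 18.75, and TimeOut = 185. A stable RTT shrinks the deviation term, so the timeout tightens toward the estimate; a jittery RTT widens it.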
Remote Procedure Call (RPC) Protocol Stack
[Figure: caller (client) and callee (server) exchange arguments and return values through client and server stubs; the stubs exchange request and reply messages through the RPC protocol]
Simple RPC Protocol Stack (top to bottom: SELECT, CHAN, BLAST, IP, ETH)
• BLAST: fragments and reassembles large messages
• CHAN: synchronizes request and reply messages
• SELECT: dispatches message to correct process
Bulk Transfer (BLAST)
Unlike AAL and IP, BLAST tries to recover from lost fragments
• Strategy
  • accumulates acks
  • selective retransmission
  • aka partial acknowledgements
[Figure: timeline in which the sender transmits Fragments 1 to 6, the receiver returns an SRR, the sender retransmits Fragments 3 and 5, and the receiver returns a final SRR]
• BLAST header format
  • MID protects against wraparound
  • NumFrags = number of fragments
  • TYPE = DATA or SRR
  • FragMask distinguishes fragments
    • if Type=DATA, identifies this fragment
    • if Type=SRR, identifies missing fragments
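The role of FragMask in selective retransmission can be illustrated with bit operations. The encoding here is an assumption made for the sketch, not a statement of the actual BLAST wire format: a 32-bit mask in which bit i stands for fragment i+1, with the receiver tracking which fragments have arrived and deriving the missing set for its SRR. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only the bit for fragment (index + 1) set; index is 0-based. */
uint32_t frag_bit(unsigned index)
{
    assert(index < 32u);
    return (uint32_t)1u << index;
}

/* Mask with one bit set for each of the message's num_frags fragments. */
uint32_t all_frags_mask(unsigned num_frags)
{
    assert(num_frags >= 1u && num_frags <= 32u);
    return num_frags == 32u ? 0xFFFFFFFFu : (((uint32_t)1u << num_frags) - 1u);
}

/* Fragments still missing, given the mask of fragments received so far.
 * This is what an SRR would report in this sketch's encoding. */
uint32_t missing_frags(uint32_t received_mask, unsigned num_frags)
{
    return all_frags_mask(num_frags) & ~received_mask;
}
```

In the six-fragment exchange from the slide, where Fragments 3 and 5 are lost, the receiver holds bits for fragments 1, 2, 4, and 6 (mask 0x2B), so the first SRR reports 0x14. Once the two retransmissions arrive, the missing mask is 0 and the transfer is complete.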
Request/Reply (CHAN)
• Guarantees message delivery
• Synchronizes client with server
• Supports at-most-once semantics
[Figure: two client/server timelines. Simple timeline: Request, ACK, Reply, ACK. Timeline using implicit acks: Request 1, Reply 1, Request 2, Reply 2, …]
CHAN Header Format
    typedef struct {
        u_short Type;      /* REQ, REP, ACK, PROBE */
        u_short CID;       /* unique channel id */
        int     MID;       /* unique message id */
        int     BID;       /* unique boot id */
        int     Length;    /* length of message */
        int     ProtNum;   /* high-level protocol */
    } ChanHdr;

CHAN Session State
    typedef struct {
        u_char    type;       /* CLIENT or SERVER */
        u_char    status;     /* BUSY or IDLE */
        int       retries;    /* number of retries */
        int       timeout;    /* timeout value */
        XkReturn  ret_val;    /* return value */
        Msg       *request;   /* request message */
        Msg       *reply;     /* reply message */
        Semaphore reply_sem;  /* client semaphore */
        int       mid;        /* message id */
        int       bid;        /* boot id */
    } ChanState;
Dispatcher (SELECT)
• Dispatch to appropriate procedure
• Synchronous counterpart to UDP
• Address space for procedures
  • flat: unique id for each possible procedure
  • hierarchical: program + procedure number
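A hierarchical address space can be pictured as a lookup keyed on the pair (program, procedure number). The sketch below is purely illustrative: the table contents, the `Procedure` signature, and `select_lookup` are hypothetical and do not reflect the real SELECT implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical procedure signature: one int in, one int out. */
typedef int (*Procedure)(int arg);

typedef struct {
    unsigned  program;    /* first level of the hierarchy */
    unsigned  procedure;  /* second level, unique only within a program */
    Procedure fn;
} DispatchEntry;

static int proc_double(int x) { return 2 * x; }
static int proc_negate(int x) { return -x; }

/* Example registrations: program 1 exports procedures 1 and 2. */
static const DispatchEntry dispatch_table[] = {
    { 1u, 1u, proc_double },
    { 1u, 2u, proc_negate },
};

/* Returns the registered procedure, or NULL if the pair is unknown. */
Procedure select_lookup(unsigned program, unsigned procedure)
{
    size_t n = sizeof dispatch_table / sizeof dispatch_table[0];
    for (size_t i = 0; i < n; i++) {
        if (dispatch_table[i].program == program &&
            dispatch_table[i].procedure == procedure)
            return dispatch_table[i].fn;
    }
    return NULL;
}
```

A flat address space would instead key the table on a single globally unique id. The hierarchical form lets each program number its own procedures independently.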
SunRPC
• IP implements a rough BLAST equivalent
• SunRPC implements a rough CHAN equivalent
• UDP + SunRPC implement a SELECT equivalent
  • UDP dispatches to program (ports bound to programs)
  • SunRPC dispatches to procedure within program
• SunRPC header
  • Request (32-bit words): XID, MsgType = CALL, RPCVersion = 2, Program, Version, Procedure, Credentials (variable), Verifier (variable), Data
  • Reply (32-bit words): XID, MsgType = REPLY, Status = ACCEPTED, Data
  • XID (transaction id) is similar to CHAN's MID
  • Server does not remember the last XID it serviced
  • Problem if the client retransmits a request while the reply is in transit
Presentation Formatting
[Figure: application data is encoded by the presentation layer on the sender, carried in messages, and decoded by the presentation layer on the receiver]
• Data types considered
  • integers
  • floats
  • strings
  • arrays
  • structs
• Types of data not considered
  • images
  • video
  • multimedia documents