End-to-End Protocols Outline Simple Demultiplexer Reliable Byte-Stream Remote Procedure Call Performance
End-to-End Protocols • Common end-to-end services • guarantee message delivery • deliver messages in the same order they are sent • deliver at most one copy of each message • support arbitrarily large messages • support synchronization • allow the receiver to apply flow control to the sender • support multiple application processes on each host • Underlying best-effort network • drops messages • reorders messages • delivers duplicate copies of a given message • limits messages to some finite size • delivers messages after an arbitrarily long delay
Simple Demultiplexer (UDP) • User Datagram Protocol (UDP): an unreliable, unordered datagram service • Adds multiplexing so that multiple application processes on each host can share the network • A port is the abstraction of a communication endpoint. • A <port, host> pair identifies a process (the port acts as a mailbox) • Endpoints are identified by ports • Servers have well-known ports, e.g. DNS: 53, talk: 517 • See /etc/services on Unix
Simple Demultiplexer (UDP) • A port is implemented by a message queue. • UDP has no flow control. • UDP header format (each row is 32 bits, bits 0-15 then bits 16-31) • SrcPort, DstPort • Length, Checksum • Data • The optional checksum covers pseudo header + UDP header + data • pseudo header: protocol number, source IP address, destination IP address, and UDP length field • The pseudo header lets the receiver verify that the message has been delivered between the correct two endpoints.
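The checksum described on this slide can be sketched as follows. This is a minimal illustration, not a production implementation: the function names `sum16` and `udp_checksum` are hypothetical, IPv4 addresses are assumed to be passed as host-order 32-bit integers, and the Checksum field inside the datagram is assumed to be zero while the value is computed. The arithmetic is the 16-bit one's-complement sum used by the Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)p[0] << 8;   /* pad a trailing odd byte with zero */
    return acc;
}

/* udp points at the UDP header followed by the data. The Checksum field
 * (bytes 6 and 7) must be zero while the checksum is computed. */
uint16_t udp_checksum(uint32_t src_ip, uint32_t dst_ip,
                      const uint8_t *udp, uint16_t udp_len)
{
    assert(udp_len >= 8);             /* at least the 8-byte UDP header */

    /* Pseudo header: source IP, destination IP, zero, protocol, UDP length. */
    const uint8_t pseudo[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8),  (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8),  (uint8_t)dst_ip,
        0, 17,                        /* protocol number 17 = UDP */
        (uint8_t)(udp_len >> 8), (uint8_t)udp_len
    };

    uint32_t acc = sum16(pseudo, sizeof pseudo, 0);
    acc = sum16(udp, udp_len, acc);
    while (acc >> 16)                 /* fold carries back into 16 bits */
        acc = (acc & 0xFFFFu) + (acc >> 16);

    uint16_t result = (uint16_t)~acc;
    /* A computed value of zero is sent as all ones, because a zero
     * Checksum field means "no checksum" in UDP. */
    return result == 0 ? 0xFFFFu : result;
}
```

Because the pseudo header contains both IP addresses, a datagram that is misdelivered to another host fails this check even if its UDP header and data arrive intact.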
Reliable Byte-Stream (TCP) Outline Connection Establishment/Termination Sliding Window Revisited Flow Control Adaptive Timeout
TCP Overview • Transmission Control Protocol (TCP) is a reliable, connection-oriented, and byte-stream service. • A byte-stream service • application writes bytes • TCP sends segments • application reads bytes • TCP is a full-duplex protocol. • TCP supports a demultiplexing mechanism.
TCP Overview • [Figure: the sending application process writes bytes into TCP's send buffer, TCP transmits segments, and the receiving application process reads bytes from TCP's receive buffer] • Flow control: keep sender from overrunning receiver • Congestion control: keep sender from overrunning network • TCP uses the sliding window algorithm.
Data Link Versus Transport • Potentially have many connections between different hosts • need explicit connection establishment and termination • Potentially different RTT • need adaptive timeout mechanism • Potentially long delay in network • need to be prepared for arrival of very old packets • Potentially different capacity at destination • need to accommodate different node capacity • Potentially different network capacity • need to be prepared for network congestion
TCP Segment Format • The packets exchanged between TCP peers are called segments. • How does TCP decide that it has enough bytes to send a segment? • TCP maintains a variable called the maximum segment size (MSS) and sends a segment as soon as it has collected MSS bytes from the sending process. • TCP supports a push operation; the sending process invokes it to flush the buffer of unsent bytes. • The final trigger is a timer that fires periodically.
TCP Header Format • SrcPort: source port; DstPort: destination port • The Acknowledgment, SequenceNum, and AdvertisedWindow fields are all involved in TCP’s sliding window algorithm. • The 6-bit Flags field is used to relay control information between TCP peers: • SYN, FIN: establish and terminate a TCP connection • RESET: abort the connection • PUSH: the sender invoked the push operation • URG: the segment carries urgent data, up to UrgPtr bytes into the segment body • ACK: the Acknowledgment field is valid
Segment Format (cont) • [Figure: the sender transmits Data (SequenceNum); the receiver returns Acknowledgment + AdvertisedWindow] • Each connection is identified by a 4-tuple: • (SrcPort, SrcIPAddr, DstPort, DstIPAddr) • Sliding window + flow control • Acknowledgment, SequenceNum, AdvertisedWindow • Flags • SYN, FIN, RESET, PUSH, URG, ACK • Checksum • pseudo header + TCP header + data
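Demultiplexing on the 4-tuple can be sketched as a table lookup. The sketch below is illustrative only: `struct conn_key`, `same_conn`, and `demux` are hypothetical names, and a real implementation would use a hash table rather than a linear scan. The flag constants use the bit positions of the six flags in the TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values of the 6-bit Flags field. */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

/* The 4-tuple that identifies one connection. */
struct conn_key {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

static int same_conn(const struct conn_key *a, const struct conn_key *b)
{
    return a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

/* Return the index of the connection matching the segment's 4-tuple,
 * or -1 if no connection matches. */
int demux(const struct conn_key *table, size_t n, const struct conn_key *seg)
{
    assert(seg != NULL);
    for (size_t i = 0; i < n; i++)
        if (same_conn(&table[i], seg))
            return (int)i;
    return -1;
}
```

Two clients on the same host talking to the same server port differ only in SrcPort, which is why all four fields are needed to tell their connections apart.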
Three-Way Handshake • The algorithm used by TCP to establish and terminate a connection is called a three-way handshake. • A timer is scheduled for each of the first two segments. • The client and server each select an initial sequence number at random and exchange these starting sequence numbers at connection setup time. • This protects against the chance that a segment from an earlier connection might interfere with a later one. • TCP can be specified as a state-transition diagram.
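Connection Establishment and Termination • Three-way handshake between the active participant (client) and the passive participant (server) • 1. Client → server: SYN, SequenceNum = x • 2. Server → client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1 • 3. Client → server: ACK, Acknowledgment = y + 1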
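The sequence-number bookkeeping of the handshake can be sketched in a few lines. Everything here is an illustrative assumption rather than real stack code: `struct segment` carries only the fields the handshake needs, and the function names are hypothetical. Unsigned 32-bit arithmetic gives the wrap-around from the largest sequence number back to zero for free.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_SYN = 0x02, FLAG_ACK = 0x10 };

/* Only the header fields that matter for the handshake. */
struct segment {
    uint8_t  flags;
    uint32_t seq;
    uint32_t ack;
};

/* Step 1: the client sends SYN carrying its initial sequence number x. */
struct segment make_syn(uint32_t x)
{
    struct segment s = { FLAG_SYN, x, 0 };
    return s;
}

/* Step 2: the server replies SYN + ACK with its own initial sequence
 * number y and acknowledges x + 1. */
struct segment make_syn_ack(struct segment syn, uint32_t y)
{
    assert(syn.flags == FLAG_SYN);
    struct segment s = { FLAG_SYN | FLAG_ACK, y, syn.seq + 1 };
    return s;
}

/* Step 3: the client checks that the server acknowledged x + 1 and
 * answers with ACK, Acknowledgment = y + 1.
 * Returns 0 on success, -1 if the reply does not match. */
int make_final_ack(uint32_t x, struct segment syn_ack, struct segment *out)
{
    if (syn_ack.flags != (FLAG_SYN | FLAG_ACK) || syn_ack.ack != x + 1)
        return -1;
    out->flags = FLAG_ACK;
    out->seq = x + 1;
    out->ack = syn_ack.seq + 1;
    return 0;
}
```

The Acknowledgment field always names the next sequence number expected, which is why each side acknowledges the other's initial number plus one.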
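State Transition Diagram (each arc is labeled event/action)
• CLOSED → LISTEN: passive open
• CLOSED → SYN_SENT: active open / SYN
• LISTEN → CLOSED: close
• LISTEN → SYN_RCVD: SYN / SYN + ACK
• LISTEN → SYN_SENT: send / SYN
• SYN_SENT → SYN_RCVD: SYN / SYN + ACK
• SYN_SENT → ESTABLISHED: SYN + ACK / ACK
• SYN_SENT → CLOSED: close
• SYN_RCVD → ESTABLISHED: ACK
• SYN_RCVD → FIN_WAIT_1: close / FIN
• ESTABLISHED → FIN_WAIT_1: close / FIN
• ESTABLISHED → CLOSE_WAIT: FIN / ACK
• FIN_WAIT_1 → FIN_WAIT_2: ACK
• FIN_WAIT_1 → CLOSING: FIN / ACK
• FIN_WAIT_1 → TIME_WAIT: ACK + FIN / ACK
• FIN_WAIT_2 → TIME_WAIT: FIN / ACK
• CLOSING → TIME_WAIT: ACK
• TIME_WAIT → CLOSED: timeout after two segment lifetimes
• CLOSE_WAIT → LAST_ACK: close / FIN
• LAST_ACK → CLOSED: ACK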
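The state-transition diagram maps naturally onto a transition function. The sketch below is a simplified illustration: the enum and function names are hypothetical, the actions (which segment to send) appear only as comments, and an event with no arc in the current state is assumed to leave the state unchanged.

```c
#include <assert.h>

enum tcp_state { CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED,
                 FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT,
                 CLOSE_WAIT, LAST_ACK };

enum tcp_event { EV_ACTIVE_OPEN, EV_PASSIVE_OPEN, EV_CLOSE, EV_SEND,
                 EV_RCV_SYN, EV_RCV_SYN_ACK, EV_RCV_ACK, EV_RCV_FIN,
                 EV_RCV_ACK_FIN, EV_TIMEOUT };

/* Return the state reached from s when event e occurs. */
enum tcp_state next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case CLOSED:
        if (e == EV_ACTIVE_OPEN)  return SYN_SENT;    /* send SYN */
        if (e == EV_PASSIVE_OPEN) return LISTEN;
        break;
    case LISTEN:
        if (e == EV_RCV_SYN) return SYN_RCVD;         /* send SYN + ACK */
        if (e == EV_SEND)    return SYN_SENT;         /* send SYN */
        if (e == EV_CLOSE)   return CLOSED;
        break;
    case SYN_RCVD:
        if (e == EV_RCV_ACK) return ESTABLISHED;
        if (e == EV_CLOSE)   return FIN_WAIT_1;       /* send FIN */
        break;
    case SYN_SENT:
        if (e == EV_RCV_SYN_ACK) return ESTABLISHED;  /* send ACK */
        if (e == EV_RCV_SYN)     return SYN_RCVD;     /* send SYN + ACK */
        if (e == EV_CLOSE)       return CLOSED;
        break;
    case ESTABLISHED:
        if (e == EV_CLOSE)   return FIN_WAIT_1;       /* send FIN */
        if (e == EV_RCV_FIN) return CLOSE_WAIT;       /* send ACK */
        break;
    case FIN_WAIT_1:
        if (e == EV_RCV_ACK)     return FIN_WAIT_2;
        if (e == EV_RCV_FIN)     return CLOSING;      /* send ACK */
        if (e == EV_RCV_ACK_FIN) return TIME_WAIT;    /* send ACK */
        break;
    case FIN_WAIT_2:
        if (e == EV_RCV_FIN) return TIME_WAIT;        /* send ACK */
        break;
    case CLOSING:
        if (e == EV_RCV_ACK) return TIME_WAIT;
        break;
    case TIME_WAIT:
        if (e == EV_TIMEOUT) return CLOSED;           /* two segment lifetimes */
        break;
    case CLOSE_WAIT:
        if (e == EV_CLOSE) return LAST_ACK;           /* send FIN */
        break;
    case LAST_ACK:
        if (e == EV_RCV_ACK) return CLOSED;
        break;
    }
    return s;   /* no arc for this event: stay in the current state */
}
```

A client that opens, transfers data, and closes first walks CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, and back to CLOSED.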
Sliding Window • TCP’s sliding window algorithm serves several purposes: • It guarantees the reliable delivery of data. • It ensures that data is delivered in order. • It enforces flow control between the sender and the receiver. • In order to keep the sender from overrunning the receiver’s buffer, the receiver advertises a window size to the sender by specifying the AdvertisedWindow field in the TCP header.
Sliding Window Revisited • [Figure: the sending TCP's buffer with pointers LastByteAcked, LastByteSent, LastByteWritten; the receiving TCP's buffer with pointers LastByteRead, NextByteExpected, LastByteRcvd] • Sending side • LastByteAcked <= LastByteSent • LastByteSent <= LastByteWritten • buffer bytes between LastByteAcked and LastByteWritten • Receiving side • LastByteRead < NextByteExpected • NextByteExpected <= LastByteRcvd + 1 • buffer bytes between LastByteRead and LastByteRcvd
Flow Control • Send buffer size: MaxSendBuffer • Receive buffer size: MaxRcvBuffer • Receiving side • LastByteRcvd - LastByteRead <= MaxRcvBuffer • AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead) • Sending side • LastByteSent - LastByteAcked <= AdvertisedWindow • EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked) • LastByteWritten - LastByteAcked <= MaxSendBuffer • block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application tries to write • Always send ACK in response to arriving data segment • Sender persists (keeps probing the receiver) when AdvertisedWindow = 0
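The flow-control formulas translate directly into code. This is a minimal sketch with hypothetical function names; byte positions are plain counters rather than wrapping sequence numbers, and clamping the effective window at zero when more data is in flight than the window allows is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Receiver: AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead) */
uint32_t advertised_window(uint32_t max_rcv_buffer,
                           uint32_t last_byte_rcvd, uint32_t last_byte_read)
{
    uint32_t buffered = last_byte_rcvd - last_byte_read;
    assert(buffered <= max_rcv_buffer);
    return max_rcv_buffer - buffered;
}

/* Sender: EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked),
 * clamped at zero (assumption: the window may have shrunk below in-flight). */
uint32_t effective_window(uint32_t advertised,
                          uint32_t last_byte_sent, uint32_t last_byte_acked)
{
    uint32_t in_flight = last_byte_sent - last_byte_acked;
    return in_flight >= advertised ? 0 : advertised - in_flight;
}

/* Sender: block the application if writing y more bytes would overflow
 * the send buffer. Returns 1 if the write must block, 0 otherwise. */
int write_would_block(uint32_t max_send_buffer, uint32_t last_byte_written,
                      uint32_t last_byte_acked, uint32_t y)
{
    return (last_byte_written - last_byte_acked) + y > max_send_buffer;
}
```

For example, a receiver with a 4096-byte buffer that has received up to byte 3000 but whose application has read only up to byte 1000 advertises 4096 - 2000 = 2096 bytes.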
Protection Against Wrap Around • 32-bit SequenceNum
Bandwidth             Time Until Wrap Around
T1 (1.5 Mbps)         6.4 hours
Ethernet (10 Mbps)    57 minutes
T3 (45 Mbps)          13 minutes
FDDI (100 Mbps)       6 minutes
STS-3 (155 Mbps)      4 minutes
STS-12 (622 Mbps)     55 seconds
STS-24 (1.2 Gbps)     28 seconds
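The wrap-around times follow from one division: a 32-bit sequence number counts 2^32 bytes, so the time to wrap is 2^32 × 8 bits divided by the bandwidth. A small sketch, with the hypothetical function name `wrap_seconds`:

```c
#include <assert.h>

/* Seconds until a 32-bit byte sequence number wraps around when the
 * sender transmits continuously at the given bandwidth. */
double wrap_seconds(double bits_per_second)
{
    const double sequence_space_bytes = 4294967296.0;   /* 2^32 */
    assert(bits_per_second > 0.0);
    return sequence_space_bytes * 8.0 / bits_per_second;
}
```

At 10 Mbps this gives about 3436 seconds, or roughly 57 minutes, matching the Ethernet row.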
Keeping the Pipe Full • 16-bit AdvertisedWindow
Bandwidth             Delay x Bandwidth Product
T1 (1.5 Mbps)         18 KB
Ethernet (10 Mbps)    122 KB
T3 (45 Mbps)          549 KB
FDDI (100 Mbps)       1.2 MB
STS-3 (155 Mbps)      1.8 MB
STS-12 (622 Mbps)     7.4 MB
STS-24 (1.2 Gbps)     14.8 MB
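The table values can be reproduced with a short calculation. The slide does not state the round-trip time; the figures are consistent with an RTT of 100 ms and 1 KB = 1024 bytes, so the sketch below treats that RTT as an assumption. The function names are hypothetical. A 16-bit AdvertisedWindow can describe at most 65,535 bytes, which the second function compares against.

```c
#include <assert.h>

/* Bytes that must be in flight to keep the pipe full. */
double delay_bandwidth_bytes(double bits_per_second, double rtt_seconds)
{
    assert(bits_per_second >= 0.0 && rtt_seconds >= 0.0);
    return bits_per_second * rtt_seconds / 8.0;
}

/* 1 if a 16-bit AdvertisedWindow (at most 65535 bytes) can cover the
 * given number of bytes, 0 otherwise. */
int fits_in_16bit_window(double bytes)
{
    return bytes <= 65535.0;
}
```

With a 100 ms RTT, a T1 link needs about 18,750 bytes in flight, which fits in the 16-bit window, while 10 Mbps Ethernet needs about 125,000 bytes, which does not.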
Adaptive Retransmission (Original Algorithm) • Measure SampleRTT for each segment/ACK pair • Compute weighted average of RTT • EstRTT = α × EstRTT + β × SampleRTT • where α + β = 1 • α between 0.8 and 0.9 • β between 0.1 and 0.2 • Set timeout based on EstRTT • TimeOut = 2 × EstRTT
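The original algorithm is an exponentially weighted moving average. A minimal sketch, with hypothetical function names and β taken as 1 - α, as the slide's constraint α + β = 1 implies:

```c
#include <assert.h>

/* EstRTT = alpha * EstRTT + (1 - alpha) * SampleRTT */
double update_est_rtt(double est_rtt, double sample_rtt, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    return alpha * est_rtt + (1.0 - alpha) * sample_rtt;
}

/* TimeOut = 2 * EstRTT */
double timeout_from_est(double est_rtt)
{
    return 2.0 * est_rtt;
}
```

With α = 0.875, an estimate of 100 ms and a new sample of 200 ms give 0.875 × 100 + 0.125 × 200 = 112.5 ms, so the timeout becomes 225 ms. A large α makes the estimate stable but slow to follow real changes in RTT.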
Karn/Partridge Algorithm • [Figure: two sender/receiver timelines, each showing an original transmission, a retransmission, an ACK, and the resulting SampleRTT] • Do not sample RTT when retransmitting • Double timeout after each retransmission
Jacobson/Karels Algorithm • New calculations for average RTT • Diff = SampleRTT - EstRTT • EstRTT = EstRTT + (δ × Diff) • Dev = Dev + δ × (|Diff| - Dev) • where δ is a factor between 0 and 1 • Consider variance when setting timeout value • TimeOut = μ × EstRTT + φ × Dev • where μ = 1 and φ = 4 • Notes • algorithm only as good as granularity of clock (500 ms on Unix) • accurate timeout mechanism important to congestion control (later)
TCP Extensions • Implemented as header options • Store timestamp in outgoing segments • Extend sequence space with 32-bit timestamp (PAWS) • Shift (scale) advertised window
Remote Procedure Call Outline Basics Protocol Stack Presentation Formatting
Remote Procedure Call Basics • Problems with sockets • Socket programming uses the read/write (input/output) mechanism. • This differs from the procedure calls that programs normally use. • Explicit input/output is not the best way to make computing location-transparent.
Remote Procedure Call Basics • A procedure call is a standard abstraction in local computation. • Remote Procedure Call (RPC) extends procedure calls to distributed computation, as shown in Figure 5.11. • A caller invokes execution of a procedure in the callee via the local stub procedure. • The implicit network programming hides all network I/O code from the programmer. • Objectives are simplicity and ease of use.
Remote Procedure Call Basics • The concept is to provide a transparent mechanism that enables the user to utilize remote services through standard procedure calls. • Client sends request, then blocks until a remote server sends a response (reply). • Advantages: user may be unaware of remote implementation (handled in a stub in library); uses standard mechanism. • Disadvantages: prone to failure of components and network; different address spaces; separate process lifetimes.
RPC Components • [Figure: the caller (client) passes arguments to the client stub and gets the return value back from it; the callee (server) does the same with the server stub; the two stubs exchange Request and Reply messages through the RPC protocol on each side] • Protocol Stack • BLAST: fragments and reassembles large messages • CHAN: synchronizes request and reply messages • SELECT: dispatches request to the correct process • Stubs
RPC Timeline • [Figure: the client sends a Request and stays blocked; the server, blocked until the request arrives, computes and sends the Reply, then blocks again]
SunRPC • IP implements BLAST-equivalent • except no selective retransmit • SunRPC implements CHAN-equivalent • except not at-most-once • UDP + SunRPC implement SELECT-equivalent • UDP dispatches to program (ports bound to programs) • SunRPC dispatches to procedure within program
Sun RPC • It was designed for client-server communication in the Sun Network File System (NFS). • UDP or TCP can be used. With UDP, the message length is limited to 64 KB in theory, but to 8-9 KB in practice. • Sun XDR was originally intended for external data representation. • Valid data types supported by XDR include int, unsigned int, long, structure, fixed array, string (null-terminated char *), and binary encoded data (for other data types such as lists).
Sun XDR • A program number and a version number are supplied. • Each procedure definition is given a procedure number that identifies it. • Only a single input parameter and a single output result are passed.
Files interface in Sun XDR
    const MAX = 1000;
    typedef int FileIdentifier;
    typedef int FilePointer;
    typedef int Length;
    struct Data {
        int length;
        char buffer[MAX];
    };
    struct writeargs {
        FileIdentifier f;
        FilePointer position;
        Data data;
    };
    struct readargs {
        FileIdentifier f;
        FilePointer position;
        Length length;
    };
    program FILEREADWRITE {
        version VERSION {
            void WRITE(writeargs) = 1;
            Data READ(readargs) = 2;
        } = 2;
    } = 9999;
Sun RPC • The interface compiler rpcgen generates the following from the interface definition: • client stub procedures • server main procedure, dispatcher, and server stub procedures • XDR marshalling and unmarshalling procedures used by the dispatcher and by the client and server stub procedures • Binding: • the port mapper records program number, version number, and port number. • If there are multiple instances running on different machines, clients make multicast remote procedure calls by broadcasting them to all the port mappers.
Example (Sun RPC) • long sum(long) example • client localhost 10 • result: 55 • Need RPC specification file (sum.x) • defines procedure name, arguments & results • Run (interface compiler) rpcgen sum.x • generates sum.h, sum_clnt.c, sum_xdr.c, sum_svc.c • sum_clnt.c & sum_svc.c: Stub routines for client & server • sum_xdr.c: XDR (External Data Representation) code takes care of data type conversions
RPC XDR File (sum.x)
    struct sum_in {
        long arg1;
    };
    struct sum_out {
        long res1;
    };
    program SUM_PROG {
        version SUM_VERS {
            sum_out SUMPROC(sum_in) = 1;   /* procedure number = 1 */
        } = 1;                             /* version number = 1 */
    } = 0x32123000;                        /* program number */
Example (Sun RPC) • Program-number is usually assigned as follows: • 0x00000000 - 0x1fffffff defined by SUN • 0x20000000 - 0x3fffffff defined by user • 0x40000000 - 0x5fffffff transient • 0x60000000 - 0xffffffff reserved
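The ranges on this slide can be checked with a small classifier. The enum and function names below are hypothetical; only the range boundaries come from the slide.

```c
#include <assert.h>
#include <stdint.h>

enum prog_range { PROG_SUN, PROG_USER, PROG_TRANSIENT, PROG_RESERVED };

/* Classify a Sun RPC program number by the range it falls in. */
enum prog_range classify_program(uint32_t prog)
{
    if (prog <= 0x1fffffffu) return PROG_SUN;         /* defined by Sun */
    if (prog <= 0x3fffffffu) return PROG_USER;        /* defined by user */
    if (prog <= 0x5fffffffu) return PROG_TRANSIENT;   /* transient */
    return PROG_RESERVED;                             /* reserved */
}
```

The program number 0x32123000 used in sum.x falls in the user-defined range.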
RPC Client Code (rsum.c)
    #include <stdio.h>
    #include <stdlib.h>
    #include "sum.h"                /* generated by rpcgen */
    int main(int argc, char *argv[])
    {
        CLIENT *cl;
        sum_in in;
        sum_out *outp;
        /* create RPC client handle; need to know server's address */
        cl = clnt_create(argv[1], SUM_PROG, SUM_VERS, "tcp");
        if (cl == NULL) {
            clnt_pcreateerror(argv[1]);
            exit(1);
        }
        in.arg1 = atol(argv[2]);    /* n: the server returns 1 + 2 + ... + n */
        /* call RPC; note convention of RPC function naming */
        if ((outp = sumproc_1(&in, cl)) == NULL) {
            fprintf(stderr, "%s\n", clnt_sperror(cl, argv[1]));
            exit(1);
        }
        printf("result: %ld\n", outp->res1);
        return 0;
    }
RPC Server Code (sum_serv.c)
    #include "sum.h"
    /* server function has a different name than the client call */
    sum_out *sumproc_1_svc(sum_in *inp, struct svc_req *rqstp)
    {
        /* static: the server stub reads the result after this function returns */
        static sum_out out;
        long i;
        out.res1 = inp->arg1;
        for (i = inp->arg1 - 1; i > 0; i--)
            out.res1 += i;
        return &out;
    }
    /* server's main() is generated by rpcgen */
Compilation and Linking
    rpcgen sum.x
    cc -c rsum.c -o rsum.o
    cc -c sum_clnt.c -o sum_clnt.o
    cc -c sum_xdr.c -o sum_xdr.o
    cc -o client rsum.o sum_clnt.o sum_xdr.o
    cc -c sum_serv.c -o sum_serv.o
    cc -c sum_svc.c -o sum_svc.o
    cc -o server sum_serv.o sum_svc.o sum_xdr.o
Internal Details of Sun RPC • Initialization • Server runs: it registers the RPC program with the port mapper on the server host (check with rpcinfo -p) • Client runs: clnt_create contacts the server's port mapper and establishes a TCP connection with the server (or a UDP socket) • Client • Client calls the local procedure (client stub: sumproc_1) that is generated by rpcgen. The client stub packages the arguments, puts them in standard format (XDR), and prepares network messages (marshalling). • Network messages are sent to the remote system by the client stub. • Network transfer is accomplished with TCP or UDP.
Internal Details of Sun RPC • Server • Server stub (generated by rpcgen) unmarshals arguments from network messages. Server stub executes local procedure (sumproc_1_svc) passing arguments received from network messages. • When server procedure is finished, it returns to server stub with return values. • Server stub converts return values (XDR), marshals them into network messages, and sends them back to client • Back to Client • Client stub reads network messages from kernel • Client stub returns results to client function
SunRPC Header Format
• Request header (32-bit words, bits 0-31): XID, MsgType = CALL, RPCVersion = 2, Program, Version, Procedure, Credentials (variable), Verifier (variable), Data
• Reply header (32-bit words, bits 0-31): XID, MsgType = REPLY, Status = ACCEPTED, Data
• XID (transaction id) is similar to CHAN’s MID • Server does not remember last XID it serviced • Problem if client retransmits request while reply is in transit
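The request header can be sketched as a sequence of big-endian 32-bit words, which is how XDR encodes integers. The sketch below is an illustration, not the code rpcgen produces: `put_u32` and `marshal_call_header` are hypothetical names, and it assumes null credentials and a null verifier (authentication flavor 0 with a zero-length body), so those two variable-length fields take two words each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at p in big-endian byte order and return the next position. */
static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* Write a SunRPC call header into buf, which must hold at least 40 bytes.
 * Returns the number of bytes written; the procedure's arguments (Data)
 * would follow immediately after. */
size_t marshal_call_header(uint8_t *buf, uint32_t xid,
                           uint32_t prog, uint32_t vers, uint32_t proc)
{
    assert(buf != NULL);
    uint8_t *p = buf;
    p = put_u32(p, xid);    /* XID: transaction id */
    p = put_u32(p, 0);      /* MsgType = CALL (0) */
    p = put_u32(p, 2);      /* RPCVersion = 2 */
    p = put_u32(p, prog);   /* Program */
    p = put_u32(p, vers);   /* Version */
    p = put_u32(p, proc);   /* Procedure */
    p = put_u32(p, 0);      /* Credentials: flavor 0 (none) */
    p = put_u32(p, 0);      /* Credentials: body length 0 */
    p = put_u32(p, 0);      /* Verifier: flavor 0 (none) */
    p = put_u32(p, 0);      /* Verifier: body length 0 */
    return (size_t)(p - buf);
}
```

For the sum example, a call would carry Program = 0x32123000, Version = 1, and Procedure = 1. Since the server does not remember the last XID it serviced, a retransmitted request with the same XID is executed again.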