Elementary TCP Sockets The presentation will provide sufficient information to build a COMPLETE TCP client and server. In addition the topic of concurrency will be developed.
Elementary TCP Sockets • The presentation will provide sufficient information to build a COMPLETE TCP client and server. • In addition the topic of concurrency will be developed. • This is an important design pattern in that it may be adapted for other software development where concurrent behavior is required. • It must be noted that in the current era the concurrent server model has been somewhat modified in that a connection 'pool' algorithm has been adopted. • The 'fork' is expensive in terms of time and space. • Apache uses a connection pool architecture. • JCA employs a connection pool approach.
socket function • The first action a process must take to perform network I/O is to call the 'socket' function. • This function specifies the type of communication protocol desired. • TCP (IPv4) • UDP • Unix Domain Stream protocol #include <sys/socket.h> int socket (int family, int type, int protocol); Returns a nonnegative descriptor if OK, -1 (0xffffffff as a 32-bit int) on error.
int socket (int family, int type, int protocol); family specifies the protocol family. One of the constants below: AF_INET (IPv4 protocol) AF_INET6 (IPv6 protocol) AF_LOCAL (Unix domain protocols) AF_ROUTE (routing sockets) AF_KEY (key sockets, the interface into keys for security) type is one of the constants found below: SOCK_STREAM (stream socket) SOCK_DGRAM (datagram socket) SOCK_RAW (raw socket) protocol is set to 0 except for raw sockets (Chapter 25)
Socket function • Returns a small, nonnegative integer similar to a file descriptor. This is called a socket descriptor or sockfd. • The socket function does not specify the local protocol address or the foreign protocol address. • Connect function • Used by a TCP client to establish a connection with a TCP server. #include <sys/socket.h> int connect (int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
connect Function int connect (int sockfd, const struct sockaddr *servaddr, socklen_t addrlen); • servaddr is a pointer to a socket address structure. • addrlen is the size of the socket address structure. • The socket address structure must contain the IP address and port number of the server. • The client does NOT have to call bind before calling connect. • The kernel will choose both an ephemeral port number and the source IP address if necessary.
connect Function • The connect function initiates TCP's 3-way handshake. • The function only returns when the connection is established or an error occurs. • ETIMEDOUT is returned if there is no response to the SYN segment after 75 seconds (4.4BSD). • If the server responds with an RST then no service is waiting on that port and the ECONNREFUSED error is returned. • RST is a TCP segment that is transmitted on an error. • RST is generated when: a SYN arrives for a port that has no listening server, TCP wants to abort an existing connection, or TCP receives a segment for a connection that does not exist. • If the SYN elicits an ICMP destination unreachable error from some intermediate router, this is considered a soft error: the client kernel keeps retransmitting the SYN, and only if no response arrives within the timeout is the error returned to the process as EHOSTUNREACH or ENETUNREACH.
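The error cases listed above can be told apart by inspecting errno after connect fails. The following is a minimal sketch rather than the textbook's code: the helper name connect_ipv4 is hypothetical, and it assumes the server is given as a dotted-decimal IPv4 string.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the slides): connect to ip:port over
 * TCP/IPv4. Returns a connected descriptor, or -1 with errno set. */
int connect_ipv4(const char *ip, uint16_t port)
{
    struct sockaddr_in servaddr;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1) {
        errno = EINVAL;            /* not a dotted-decimal IPv4 address */
        return -1;
    }

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    /* No bind: the kernel picks the source address and an ephemeral port. */
    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        int saved = errno;         /* close() below may overwrite errno */

        if (saved == ECONNREFUSED)
            fprintf(stderr, "RST received: no server listening on port %u\n",
                    (unsigned)port);
        else if (saved == ETIMEDOUT)
            fprintf(stderr, "no response to the SYN\n");
        else if (saved == EHOSTUNREACH || saved == ENETUNREACH)
            fprintf(stderr, "destination unreachable\n");
        close(sockfd);             /* a failed socket must not be reused */
        errno = saved;
        return -1;
    }
    return sockfd;
}
```

A caller would write something like `int fd = connect_ipv4("127.0.0.1", 7);` and check for -1. Closing the descriptor after a failed connect matters because the socket is no longer usable for another connect attempt.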
bind Function • Assigns a local protocol address to a socket (for IPv4, a 32-bit address and a 16-bit port number). #include <sys/socket.h> int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen); • Servers bind their well-known port at startup. • Clients are normally assigned an ephemeral port by the kernel. • RPC servers are an exception: they use ephemeral ports and register them with the RPC port mapper. • A process can bind a specific IP address to its socket. • Normally a TCP client does NOT bind an IP address to its socket.
bind Function • If a TCP server does not bind an IP address to its socket the kernel uses the destination IP address of the client's SYN as the server's source IP address. • If a port number of 0 is specified the kernel chooses an ephemeral port number. • If an ephemeral port number is chosen bind does NOT return the port number (the myaddr argument is const, so bind cannot store the chosen value in it). • Must use getsockname to return the protocol address. • Example of binding a specific IP address: provision of web servers to multiple organizations. • Each org has its own domain name. • Each org name is mapped to an IP (typically on the same subnet). • All the IPs are then aliased onto a single network interface (use the alias option of the ifconfig command).
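The port-0 case can be sketched as follows. This is only an illustration under the assumption of an IPv4 socket; bind_ephemeral is a hypothetical helper name, not a library function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind sockfd to the wildcard address with port 0
 * and return the ephemeral port the kernel chose (host byte order),
 * or -1 on error. */
int bind_ephemeral(int sockfd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    addr.sin_port = htons(0);                  /* 0: kernel chooses the port */

    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* bind takes a const pointer and cannot report the choice,
     * so the bound address has to be fetched with getsockname. */
    if (getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

The returned port is in host byte order, ready to print or to hand to a client.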
listen Function • Only called by a TCP server • listen converts an unconnected socket into a passive socket thereby indicating to the kernel to accept incoming connections to that socket. • call to listen moves the socket from CLOSED to LISTEN in the TCP state diagram. • The second argument specifies the maximum number of connections that the kernel should queue for this socket. int listen (int sockfd, int backlog); Called after the socket and bind functions and MUST be called before the accept function.
Queues: • For each listening socket the kernel maintains two queues. • An incomplete connection queue contains an entry for each SYN that has arrived from a client for which the server is awaiting the completion of the 3 way handshake. • These sockets are in the SYN_RCVD state. • A completed connection queue which contains an entry for each client with whom the TCP 3-way handshake has been completed. • These sockets are in the ESTABLISHED state. • When a process calls accept the first entry on the completed queue is returned to the process - if the queue is empty the process is put to sleep.
listen Function • There has never been a formal definition of what backlog means. • The 4.2BSD manual page says that it "defines the maximum length the queue of pending connections may grow to". • That definition does not say whether a pending connection is one in the SYN_RCVD state or one in the ESTABLISHED state that has not yet been accepted. • Common size of backlog is now around 100. • Stevens has some wrapper code which allows the backlog value to be overridden by an environment variable (no recompiling the server). • An entry will remain on the incomplete connection queue for approximately one RTT (round trip time).
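A wrapper of the kind mentioned above might look like the sketch below. The environment variable name LISTENQ is an assumption recalled from the textbook's wrapper, and the function names backlog_from_env and listen_env are hypothetical.

```c
#include <stdlib.h>
#include <sys/socket.h>

/* Return the backlog to use: the value of the LISTENQ environment
 * variable (assumed name) if it is a positive integer, otherwise the
 * compiled-in default. */
int backlog_from_env(int default_backlog)
{
    const char *s = getenv("LISTENQ");

    if (s != NULL) {
        int v = atoi(s);
        if (v > 0)
            return v;
    }
    return default_backlog;
}

/* Hypothetical wrapper around listen that applies the override. */
int listen_env(int sockfd, int default_backlog)
{
    return listen(sockfd, backlog_from_env(default_backlog));
}
```

With this in place, running the server as `LISTENQ=512 ./server` changes the backlog without a rebuild.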
Backlog queues: • If a client SYN arrives when the queues are full TCP ignores the SYN and does NOT send an RST. • The condition is considered temporary; the client TCP will retransmit its SYN. • Data that arrives after the connection is established but before accept is called is queued in the receive buffer. • backlog should specify the maximum number of completed connections allowed in the queue. • SYN flooding with IP spoofing: sending a flood of SYNs with a bogus source IP address. • This overruns the incomplete connection queue and effects a denial of service.
accept Function • accept is called by a TCP server to return the next completed connection from the front of the completed connection queue. • If the queue is empty the process is put to sleep (assuming a blocking socket). #include <sys/socket.h> int accept (int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen); • The cliaddr and addrlen arguments are used to return the protocol address of the connected peer process. • addrlen is a value-result argument. • Prior to the call the integer pointed to by addrlen is set to the size of the socket address structure pointed to by cliaddr. • On return *addrlen holds the actual number of bytes stored in the socket address structure.
accept Function (continued) • If accept is successful its return value is a brand new descriptor that was automatically created by the kernel. • This fd refers to the TCP connection with the client. • For discussion purposes the first arg to accept is the listening socket (the fd created by socket). • It is also used as an arg in the calls to both bind and listen. • The returned value from accept is referred to as the connected socket. • The use of value-result arguments is common in Unix kernel invocations.
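The value-result behavior of accept can be observed on a single host. The sketch below is an illustration, not the textbook's code: the function name accept_reports_peer is hypothetical, and it assumes a loopback interface at 127.0.0.1. It works in one process because the kernel completes the handshake and queues the connection before accept is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer port reported by accept equals the client's own
 * local port, 0 if it does not, and -1 on any socket error. */
int accept_reports_peer(void)
{
    struct sockaddr_in srv, cli, peer;
    socklen_t len, clen;
    int result = -1;
    int listenfd, clifd, connfd = -1;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    clifd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0 || clifd < 0)
        goto out;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);                   /* kernel picks the port */
    if (bind(listenfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(listenfd, 5) < 0)
        goto out;
    len = sizeof(srv);
    if (getsockname(listenfd, (struct sockaddr *)&srv, &len) < 0)
        goto out;

    /* The handshake finishes in the kernel; the connection then waits
     * on the completed connection queue. */
    if (connect(clifd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    len = sizeof(peer);                        /* value: size of the buffer */
    connfd = accept(listenfd, (struct sockaddr *)&peer, &len);
    if (connfd < 0)
        goto out;
    /* result: len now holds the number of bytes stored in peer */

    clen = sizeof(cli);
    if (getsockname(clifd, (struct sockaddr *)&cli, &clen) < 0)
        goto out;
    result = (len == sizeof(struct sockaddr_in) &&
              peer.sin_port == cli.sin_port);
out:
    if (connfd >= 0) close(connfd);
    if (clifd >= 0) close(clifd);
    if (listenfd >= 0) close(listenfd);
    return result;
}
```

Note that connfd is a different descriptor from listenfd: the listening socket stays open for further calls to accept.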
fork Function • Used to create a new process (the only way in Unix) pid_t fork(void); • fork is called once but it returns twice. • Returns once to the parent with the PID of the newly created process • Also returns to the child with a value of zero. • Therefore the process can determine if it is a parent or a child. • Child has only one parent. • A parent can have many children and therefore must have the PIDs to distinguish them.
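The two returns of fork can be sketched as follows; child_exit_status is a hypothetical name chosen for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with `code`. Returns the exit
 * status observed by the parent, or -1 on error. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();            /* called once, returns twice */

    if (pid < 0)
        return -1;                 /* fork failed, no child exists */
    if (pid == 0)
        _exit(code);               /* child: fork returned 0 */

    /* parent: fork returned the child's PID, which identifies
     * this child among possibly many */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers inherited from the parent.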
fork (continued) • All descriptors (fd) open in the parent before the fork are shared with the child after fork() returns. • This is desirable for our server model in that it allows the child to access the socket (read/write) while the parent can still close its own copy. • A server uses fork() to allow another process to handle the connection while the parent returns to handling requests for connections. • A fork() can be used to execute another program. • This approach requires the use of the exec() function.
exec function • The only way in Unix to load an executable program file on disk is via the exec function to be called by an extant process. • exec replaces the current process image with the new program file (arguments). • This executable will start at its main function. • PID does NOT change on the invocation of an exec. • Reference to the exec function is generic as there are six exec functions.
exec function (continued) • Difference in the six exec functions: • whether the called program is referenced by its pathname or filename. • whether the arguments to the new program are listed sequentially or referenced through an array of pointers (argv, argc). • Whether the environment of the calling process is passed to the new program or whether a new environment is specified. • Normally execve is a system call within the kernel which the other five exec functions call. • Must terminate any argv arrays with a null pointer.
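A common pairing of fork and exec is sketched below, using execvp: the "v" means the arguments are passed as an array, the "p" means the program is looked up by filename along PATH, and the caller's environment is inherited. The helper name run_program is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] in a child process and return its
 * exit status, or -1 on error. argv must end with a null pointer. */
int run_program(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);     /* replaces the child's process image */
        _exit(127);                /* reached only if the exec failed */
    }
    /* The child kept its PID across the exec, so the parent can
     * still wait for it by the value fork returned. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with `char *argv[] = { "sh", "-c", "exit 3", NULL };` the call `run_program(argv)` reports the shell's exit status. The value 127 for a failed exec follows the shell convention and is a choice made for this sketch.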
Concurrent Server
    int sockfd;                                   /* listening socket */
    struct sockaddr_in myaddr;                    /* local server address */

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error_handling();
    bzero(&myaddr, sizeof(myaddr));               /* initialize the structure */
    myaddr.sin_family = AF_INET;                  /* address family */
    myaddr.sin_port = htons(80);                  /* port number */
    myaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* take any interface */
    if (bind(sockfd, (struct sockaddr *) &myaddr, sizeof(myaddr)) < 0)
        error_handling();
    if (listen(sockfd, 100) < 0)                  /* choose a large value here */
        error_handling();
Concurrent server (continued)
    /* loop to accept connections and process requests concurrently */
    while (1) {
        struct sockaddr_in client_addr;
        socklen_t client_len = sizeof(client_addr);   /* value-result argument */
        int newSockfd = accept(sockfd, (struct sockaddr *) &client_addr, &client_len);
        if (newSockfd < 0)
            error_handling();
        if (fork() == 0) {              /* child: handle this request */
            close(sockfd);              /* close the listening socket (decrease the reference count) */
            process_request(newSockfd);
            exit(0);
        } else {                        /* parent: continue to accept connections */
            close(newSockfd);           /* decrease the reference count */
        }
    }   /* end while */
Reference Counts • Every file or socket has a reference count. • The reference count is maintained in the file table entry. • This is a count of the number of descriptors that are currently open that refer to the particular file or socket. • After socket returns the file table entry associated with listenfd has a reference count of 1. • After accept returns the file table entry associated with connfd has a reference count of 1. • After fork() returns both descriptors are shared (duplicated) between the parent and the child. • This means that the file table entries for both have a reference count of 2
Reference Counts • This means that when the parent closes connfd the kernel decrements the reference count from 2 to 1. • A real close on the socket does NOT take place until the reference count is 0. [Figure: status of client-server after return from accept. The client's connect() is joined by a connection to the server's connfd; the server also holds listenfd.]
[Figure: status of client-server after fork returns. The server parent and the child each hold listenfd and connfd; the client's connection is shared by both copies of connfd.]
[Figure: status of client-server after parent and child close the appropriate sockets. The parent holds only listenfd; the child holds only connfd, which carries the connection to the client.] This is the desired final state of the sockets. The child is handling the connection with the client and the parent can call accept again on the listening socket to handle the next client connection.
Concurrent Server • close Function int close (int sockfd); • The default action of close with a TCP socket is to mark the socket as closed and return to the process immediately. • This renders the socket descriptor unusable by the process. • However close does NOT guarantee that a TCP FIN will be sent: it only decrements the reference count, and the FIN is sent when the count reaches 0. • The only way to ensure a FIN is to call shutdown(). • It is imperative that the server code be aware of this situation: if the parent never closes its copy of each connected socket, the connection to the client is never terminated and the parent can eventually exceed the maximum allowable number of fd for a given process.
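The difference between close and shutdown can be observed without a network. The sketch below makes two substitutions that should be kept in mind: it uses a Unix domain socketpair instead of a TCP connection, and dup instead of fork to create the second reference to the socket. End-of-file on the peer stands in for the arrival of a FIN. The function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a read on peerfd reports end-of-file, 0 if nothing is
 * pending, and -1 on error. The descriptor is made non-blocking. */
static int peer_sees_eof(int peerfd)
{
    char c;
    ssize_t n;
    int flags = fcntl(peerfd, F_GETFL, 0);

    if (flags < 0 || fcntl(peerfd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    n = read(peerfd, &c, 1);
    if (n == 0)
        return 1;                          /* EOF: the other end is done */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                          /* still open, no data yet */
    return -1;
}

/* use_shutdown == 0: only close one of two descriptors.
 * use_shutdown == 1: call shutdown(SHUT_WR) before the close.
 * Returns what the peer observes (see peer_sees_eof). */
int eof_after(int use_shutdown)
{
    int sv[2], copy, eof;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    copy = dup(sv[0]);                     /* reference count is now 2 */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (use_shutdown)
        shutdown(sv[0], SHUT_WR);          /* acts regardless of the count */
    close(sv[0]);                          /* count drops from 2 to 1 */

    eof = peer_sees_eof(sv[1]);
    close(copy);
    close(sv[1]);
    return eof;
}
```

Here `eof_after(0)` reports that the peer sees no end-of-file, because a second descriptor still refers to the socket, while `eof_after(1)` reports end-of-file immediately.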
getsockname & getpeername Functions • getsockname returns the local protocol address associated with a socket. • getpeername returns the foreign protocol address associated with a socket. int getsockname (int sockfd, struct sockaddr *localaddr, socklen_t *addrlen); int getpeername (int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen); • Both functions fill in the socket address structure pointed to by localaddr or peeraddr.
getsockname & getpeername • raison d'être • After a connect without a bind, getsockname returns the local IP address and port number assigned by the kernel. • After a bind with a port number of 0 (which tells the kernel to choose a local port), getsockname returns the local port number chosen. • getsockname can be used to determine the address family of a socket. • In a TCP server that binds the wildcard IP address, getsockname can be used on the connected socket to obtain the local IP address assigned to the connection. • When a server is exec'ed by the process that calls accept, the ONLY way the server can obtain the identity of the client is to call getpeername (this is what happens with inetd).
Assignment: • Undergraduates: Problems 4.1, 4.2, 4.3 • Graduates: Problems 4.1 through 4.5 • Due next week. • Printed, stapled, name on each sheet. • ALL: read Chapter Five. Be prepared
Chapter Five • Developing an echo server • Client reads a line of text from stdin and writes the line to the server • Server reads the line from network input and echoes the line back to the client. • Client reads the echo'ed line and prints it on stdout. • While this is simplistic the problem covers all the components necessary to build a 'real' server. • Can use this model to examine boundary conditions: • Startup • Client crash • Server crash
Chapter Five • The code on slides 19 and 20 represents the basic structure of a TCP server with the following change:
    while (1) {
        clilen = sizeof(cliaddr);
        connfd = accept(listenfd, (SA *) &cliaddr, &clilen);
        if ( (childpid = fork()) == 0) {
            close(listenfd);
            str_echo(connfd);       /* process the request */
            exit(0);
        }
        close(connfd);              /* parent closes the connected socket */
    }
TCP Server • str_echo function (Figure 5.3)
    void str_echo(int sockfd)
    {
        ssize_t n;
        char line[MAXLINE];

        for ( ; ; ) {
            if ( (n = Readline(sockfd, line, MAXLINE)) == 0)
                return;             /* connection closed by the other end */
            Writen(sockfd, line, n);
        }
    }
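str_echo relies on the Writen wrapper from the textbook's library. A helper of that kind has to loop, because write on a socket may accept fewer bytes than requested. The following is a sketch of what such a helper might look like, not the library's source; the name writen_sketch is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly n bytes from buf to fd, retrying after short writes
 * and interrupted calls. Returns n on success, -1 on error. */
ssize_t writen_sketch(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);

        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;             /* real error */
        }
        p += w;                    /* advance past the bytes written */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

A matching read helper would loop in the same way until a newline or end-of-file is seen.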
TCP Echo Client processing loop
    void str_cli(FILE *fp, int sockfd)
    {
        char sendline[MAXLINE], recvline[MAXLINE];

        while (Fgets(sendline, MAXLINE, fp) != NULL) {
            Writen(sockfd, sendline, strlen(sendline));
            if (Readline(sockfd, recvline, MAXLINE) == 0)
                err_quit("str_cli: server terminated prematurely");
            Fputs(recvline, stdout);
        }
    }
TCP Echo Client
    int main(int argc, char **argv)
    {
        int sockfd;
        struct sockaddr_in servaddr;

        if (argc != 2)
            err_quit("Usage: tcpcli <IPaddress>");
        sockfd = Socket(AF_INET, SOCK_STREAM, 0);
        bzero(&servaddr, sizeof(servaddr));
        servaddr.sin_family = AF_INET;
        servaddr.sin_port = htons(SERV_PORT);
        Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);
        Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));
        str_cli(stdin, sockfd);     /* do it all */
        exit(0);
    }
Startup of server/client • Start the server. • Run netstat to verify the server's listening socket. • Use the -a option so that listening sockets are included (by default netstat does not show them). • Start the client on the same host. • After the 3-way handshake: • The client calls str_cli, which blocks in the call to fgets (no input on stdin). • When accept returns, the server calls fork and the child calls str_echo, which calls readline, which calls read, which blocks waiting for a line to be sent from the client. • The server parent is now blocked in accept again. • How many processes? • Do netstat -a
TCP Client/Server • Normal Termination • Type in two lines followed by a terminal EOF character (Control-D). • Upon receipt of the EOF fgets returns a null pointer and str_cli returns. • Then client main calls exit(). • exit closes all open descriptors, so the client socket is closed and the client TCP sends a FIN to the server. • At this point the server socket is in the CLOSE_WAIT state and the client socket is in the FIN_WAIT_2 state. • When the server receives the FIN the server child is blocked in a call to readline. Readline then returns 0 and str_echo returns to server child main. • The server child terminates by calling exit. • All open descriptors in the server child are closed. This causes the final two segments of the TCP termination to be sent: • a FIN from the server and an ACK from the client.
TCP Client/Server termination • When the server child terminates, SIGCHLD is sent to the parent. • The child enters the zombie state. • Posix Signal Handling. • A signal is an indication that an event has happened (in the old world they were called software interrupts). • Signals are usually asynchronous. • Signals can be sent from one process to another • or by the kernel to a process. • Every signal has a disposition, which is the action associated with the signal (like the vector table for Intel hardware interrupts). • SIGCHLD is sent by the kernel to the parent of a process whenever that process terminates.
POSIX signal handling • In Unix a function can be tied to a specific signal. This function is called the signal handler which ‘catches’ the signal. • void handler ( int signumber) • A signal can be ignored by setting its disposition to SIG_IGN. • SIGKILL and SIGSTOP cannot be ignored. • The default disposition for a signal is achieved by setting its disposition to SIG_DFL • The default is normally to terminate the process on receipt. • This is how you can get core dumps (abend). Some signals have a default action of generating a core image of the process in its current working directory. • A few signals (SIGCHLD and SIGURG) have a default action of being ignored.
POSIX signal handling • Stevens provides a nifty way to set a signal disposition, meet Posix standards and maintain backward compatibility. • He defines a function called ‘signal’ which calls the Posix sigaction function. • The first arg to signal is the signal name and the second arg is either a pointer to a function or one of the constants SIG_IGN or SIG_DFL. • This avoids some of the trickiness of calling sigaction directly. • Signal masks: sigemptyset(&act.sa_mask); // sa_mask is part of the struct used by sigaction. The mask specifies a set of additional signals that will be blocked while the signal handler is running. A blocked signal cannot be delivered to the process until it is unblocked. The example uses the empty set so that no additional signals are blocked while the handler runs. Posix guarantees that the signal being handled is itself blocked during execution of the signal handler.
POSIX signal semantics • Once a signal handler is installed it remains installed. • While a signal handler is executing the signal being delivered is blocked. • If a signal is generated one or more times while it is blocked it is NOT queued but is delivered once after unblocking. • POSIX 1003.1b defines a set of reliable signals that are queued (not used in this course). • Sets of signals can be selectively masked and unmasked to protect critical regions. This is a technique commonly used in the world of designing and implementing software that will run directly on hardware without benefit of some OS or ‘kernel’. • To block or unblock selectively use the sigprocmask function.
Back to zombies and our hanging child. • Whenever we fork children we must wait for them to prevent them from becoming zombies. • To implement the wait, establish a signal handler to catch SIGCHLD and call wait within it:
    Signal(SIGCHLD, sig_chld);
and the function
    void sig_chld(int signo)
    {
        pid_t pid;
        int stat;

        pid = wait(&stat);
        return;
    }
A significant problem with the example of Figure 5.2 (pg 122). • The parent is blocked in its call to accept when the SIGCHLD signal occurs. The signal handler executes (sig_chld), wait fetches the child’s PID, and the signal handler returns. • Since the signal was caught by the parent while the parent was blocked in a slow system call (accept) the kernel causes the accept to return an error (EINTR - interrupted system call). • A slow system call is any call that can block forever. • The parent does not handle this error, so it aborts. • Therefore must be aware of interrupted system calls and must provide a means to handle them. • This is the purpose of the SA_RESTART flag: to automatically restart interrupted system calls.
Handling Interrupted System Calls • Basic rule: when a process is blocked in a slow system call and the process catches a signal and the signal handler returns, the system call can return an EINTR error. • Some kernels automatically restart some interrupted system calls. • To handle the interrupted accept:
    for ( ; ; ) {
        if ( (connfd = accept(……)) < 0) {
            if (errno == EINTR)
                continue;           /* back to for(): call accept again */
            /* any other error is a real failure */
        }
    }
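The restart loop above can be packaged as a helper. This is a sketch with a hypothetical name, accept_retry; it also re-arms the value-result length before each attempt, on the assumption that a failed call may have left it in an unspecified state.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Call accept, retrying when the call is interrupted by a signal.
 * Returns the connected descriptor, or -1 with errno set for any
 * error other than EINTR. cliaddr and addrlen may both be NULL. */
int accept_retry(int listenfd, struct sockaddr *cliaddr, socklen_t *addrlen)
{
    socklen_t initial = (addrlen != NULL) ? *addrlen : 0;

    for ( ; ; ) {
        int connfd;

        if (addrlen != NULL)
            *addrlen = initial;    /* reset the value-result argument */
        connfd = accept(listenfd, cliaddr, addrlen);
        if (connfd >= 0)
            return connfd;
        if (errno == EINTR)
            continue;              /* a signal handler ran: try again */
        return -1;                 /* real error: let the caller decide */
    }
}
```

Restarting by hand like this works whether or not the kernel honors SA_RESTART for accept.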
Handling Interrupted System Calls • connect cannot simply be restarted: if connect is interrupted by a caught signal and is not automatically restarted, calling it again returns an error. Must use select to wait for the connection to complete. • select can be used to check for a successful or unsuccessful completion of the connection. • select can also impose a time limit on connection establishment (or wait forever, which blocks indefinitely). • This is typically used with a non-blocking TCP socket on which connect is called. • Non-blocking TCP sockets allow multiple connections to be established at the same time; some Web browsers use this technique.
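The combination of a non-blocking connect and select might be sketched as follows. The helper name connect_timeout is hypothetical, the timeout is given in whole seconds for simplicity, and the descriptor is assumed to be smaller than FD_SETSIZE.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Connect sockfd to addr, waiting at most `seconds` for completion.
 * Returns 0 on success, -1 with errno set on failure (ETIMEDOUT if
 * the time limit expired). */
int connect_timeout(int sockfd, const struct sockaddr *addr,
                    socklen_t len, int seconds)
{
    int error = 0;
    int flags = fcntl(sockfd, F_GETFL, 0);

    if (flags < 0 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(sockfd, addr, len) < 0) {
        if (errno != EINPROGRESS) {
            error = errno;                 /* immediate failure */
        } else {
            struct timeval tv = { seconds, 0 };
            fd_set wset;
            int n;

            do {                           /* select itself may be interrupted */
                FD_ZERO(&wset);
                FD_SET(sockfd, &wset);
                n = select(sockfd + 1, NULL, &wset, NULL, &tv);
            } while (n < 0 && errno == EINTR);

            if (n == 0) {
                error = ETIMEDOUT;         /* time limit expired */
            } else if (n < 0) {
                error = errno;
            } else {
                /* Writable: fetch the outcome of the connection attempt. */
                socklen_t elen = sizeof(error);
                if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &elen) < 0)
                    error = errno;
            }
        }
    }
    fcntl(sockfd, F_SETFL, flags);         /* restore the original mode */
    if (error != 0) {
        errno = error;
        return -1;
    }
    return 0;
}
```

After a failure the socket should be closed rather than reused. Whether select updates the timeout after an interruption differs between systems, so the total wait can exceed `seconds` on some of them.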
Wait function • The wait function returns two values: one as its return value and one through a pointer (value-result) argument. • The return value is the process ID of the terminated child. • The other value is an integer that represents the termination status of the child (normal termination, killed by a signal, or stopped by job control). • The waitpid function provides more control, allowing a choice of which process to wait for. A variety of options are also available to further define the wait behavior. • waitpid addresses a shortcoming of the simple approach: establishing a signal handler that calls wait once does not prevent zombies when several children terminate at about the same time.
When multiple children terminate, a number of termination signals can be generated before the signal handler executes. • The signal handler may then execute only once, since Unix normally does not queue signals. • Hence the signal handler will only execute one or two times (depending on timing, e.g. whether client and server run on the same machine or different ones), leaving N-1 or N-2 zombies. • The solution is to run waitpid in a loop, which obtains the status of every child that has terminated. • Must use the WNOHANG option (3rd argument). This tells waitpid not to block if there are children still running that have not yet terminated.