I/O Models Satish Krishnan
I/O Models • Blocking I/O • Non-blocking I/O • I/O Multiplexing • Signal driven I/O • Asynchronous I/O
Blocking I/O Model • Application issues the recvfrom system call; the kernel has no datagram ready. • The kernel waits for data while the process blocks in the call to recvfrom. • Datagram ready: the kernel copies the data from kernel to user space. • Copy complete: recvfrom returns OK and the application processes the datagram.
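The blocking model can be sketched in a few lines of C. This is a minimal illustration, not code from the slides: the helper name `blocking_receive` is hypothetical, and the sender's address is ignored by passing NULL for the last two arguments.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: receive one datagram in the default (blocking) mode.
 * The call does not return until a datagram has been copied into buf,
 * or an error occurs. Returns the datagram length, or -1 with errno set.
 */
static ssize_t blocking_receive(int sockfd, void *buf, size_t len)
{
    /* NULL, NULL: we are not interested in who sent the datagram. */
    return recvfrom(sockfd, buf, len, 0, NULL, NULL);
}
```

A caller would typically create and bind a datagram socket first, then loop on `blocking_receive`; the process sleeps inside the call whenever no datagram is queued.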
Non-blocking I/O Model • Application calls recvfrom; the kernel has no datagram ready and returns EWOULDBLOCK immediately. • The process repeatedly calls recvfrom, waiting for an OK return (polling), while the kernel waits for data. • Datagram ready: the next recvfrom copies the data from kernel to user space. • Copy complete: recvfrom returns OK and the application processes the datagram.
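One way to express the polling step in C is shown below. It is a sketch under assumptions: the helper names `set_nonblocking` and `try_receive` are hypothetical, and the `no_data` out-parameter is just one possible way of telling "nothing queued yet" apart from a real error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode,
 * preserving its other file status flags. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: one polling attempt.
 * Returns the datagram length on success. Returns -1 on failure, with
 * *no_data set to 1 when the only problem is that no datagram is ready
 * (EWOULDBLOCK/EAGAIN), and to 0 for any other error.
 */
static ssize_t try_receive(int sockfd, void *buf, size_t len, int *no_data)
{
    ssize_t n = recvfrom(sockfd, buf, len, 0, NULL, NULL);
    *no_data = (n == -1 && (errno == EWOULDBLOCK || errno == EAGAIN));
    return n;
}
```

A polling loop would call `try_receive` repeatedly and do other work whenever `no_data` is set. This burns CPU time, which is why the multiplexing model is usually preferred.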
I/O Multiplexing Model • Application calls select; no datagram is ready, so the process blocks in the call to select, waiting for one of possibly many sockets to become readable. • Datagram ready: select returns readable. • Application calls recvfrom; the process blocks while the data is copied from kernel to user space (into the application buffer). • Copy complete: recvfrom returns OK and the application processes the datagram.
Signal Driven I/O Model • Application establishes a SIGIO signal handler with the sigaction system call, which returns immediately; the process continues executing while the kernel waits for data. • Datagram ready: the kernel delivers SIGIO. • The signal handler calls recvfrom; the process blocks while the data is copied from kernel to user space (into the application buffer). • Copy complete: recvfrom returns OK and the application processes the datagram.
Asynchronous I/O Model • Application calls aio_read, which returns immediately; the process continues executing while the kernel waits for data. • Datagram ready: the kernel copies the data from kernel to user space without blocking the process. • Copy complete: the kernel delivers the signal specified in aio_read. • The signal handler processes the datagram.
I/O Multiplexing • The capability to tell the kernel that we want to be notified when one or more I/O conditions are ready. • Provided by the select and poll functions. • Is required when • a client handles multiple descriptors (e.g. interactive input and a network socket). • a client handles multiple sockets. • a server handles both TCP and UDP transport. • a server handles multiple services or protocols (e.g. inetd).
select Function • It tells the kernel to wait for one or more descriptors to become ready (for reading, writing or an exception condition) and how long to wait. int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout); • Returns a positive count of ready descriptors, 0 on timeout, -1 on error.
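select Function Contd... • maxfdp1 specifies the number of descriptors to be tested: the highest-numbered descriptor of interest plus one. • readset, writeset and exceptset are value-result arguments: on return they contain only the descriptors that are ready, so they must be re-initialized before each call. • timeout can make select wait forever (null pointer), wait for a specified amount of time, or return immediately (both fields zero). • select is not automatically restarted in Berkeley implementations if it is interrupted by a signal; it returns -1 with errno set to EINTR. • Linux implementations can modify the value of timeout, so it should be re-initialized before each call.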
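The points above can be put together in a short sketch that waits on two descriptors with a timeout. The helper name `wait_readable` and its return convention (-1 for timeout, -2 for error) are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd_a or fd_b to become
 * readable. Returns the ready descriptor (fd_a wins if both are ready),
 * -1 on timeout, -2 on error (including EINTR).
 */
static int wait_readable(int fd_a, int fd_b, long timeout_ms)
{
    fd_set readset;
    struct timeval tv;
    int maxfdp1 = (fd_a > fd_b ? fd_a : fd_b) + 1;  /* highest fd plus one */
    int n;

    /* The set and the timeout are rebuilt on every call because select
     * overwrites the set and may modify the timeout. */
    FD_ZERO(&readset);
    FD_SET(fd_a, &readset);
    FD_SET(fd_b, &readset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfdp1, &readset, NULL, NULL, &tv);
    if (n < 0)
        return -2;
    if (n == 0)
        return -1;
    return FD_ISSET(fd_a, &readset) ? fd_a : fd_b;
}
```

A typical client would pass the descriptor for standard input and its network socket, then call read or recvfrom on whichever descriptor is returned.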
When is a socket ready for read? • The number of bytes in the socket receive buffer is >= the low water mark for the socket receive buffer. It defaults to 1 for TCP and UDP sockets. • The read half of the connection is closed (FIN received); read returns 0. • The socket is a listening socket and the number of completed connections is non-zero (accept returns the connection). • A socket error is pending. read will return an error (-1) with errno set.
When is a socket ready for write? • The number of bytes of available space in the socket send buffer is >= the low water mark for the socket send buffer. It defaults to 2048 for TCP and UDP sockets. • The write half of the connection is closed. SIGPIPE generated on write. • A socket error is pending. write will return an error (-1) with errno set.
Exception Condition • A socket has an exception condition pending if out-of-band data exists for the socket. • Out-of-band (OOB) data is high priority data. • TCP terms it urgent mode. OOB data is placed in the next available set of bytes in the send buffer. The next TCP segment sent will have the URG flag set in the TCP header and the urgent offset pointing to the OOB byte. • The receiving process is notified by the SIGURG signal.
pselect Function int pselect(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timespec *timeout, const sigset_t *sigmask); returns count of ready descriptors, 0 on timeout, -1 on error • timespec structure can specify time in nanoseconds.
pselect Function Contd... • sigmask allows the program to block or unblock certain signals for the duration of the call. The mask applies only while pselect is executing; when pselect returns, the previous signal mask is restored.
poll Function int poll(struct pollfd *fdarray, nfds_t nfds, int timeout); returns count of ready descriptors, 0 on timeout, -1 on error. • Each element of fdarray is a pollfd structure that specifies the conditions to be tested for a given descriptor. • pollfd contains: the file descriptor fd, the events of interest on fd (events), and the events that occurred on fd (revents). • timeout is given in milliseconds; a negative value waits forever and 0 returns immediately. • To switch off a descriptor, set the fd member of the pollfd structure to a negative value.
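The pollfd usage described above might look like the following sketch. The helper name `poll_two` and its bitmask return value are assumptions for this example; passing a negative descriptor shows how an entry is switched off.

```c
#include <poll.h>

/*
 * Hypothetical helper: test two descriptors for readability.
 * Pass a negative fd_b to switch the second entry off.
 * Returns a bitmask (bit 0: fd_a readable, bit 1: fd_b readable),
 * 0 on timeout, or -1 on error.
 */
static int poll_two(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN, .revents = 0 },
        { .fd = fd_b, .events = POLLIN, .revents = 0 },
    };
    int ready = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return -1;

    /* revents is filled in by the kernel; entries with a negative fd
     * are ignored and come back with revents == 0. */
    if (fds[0].revents & POLLIN)
        ready |= 1;
    if (fds[1].revents & POLLIN)
        ready |= 2;
    return ready;
}
```

Unlike select, the array is not overwritten in a way that requires rebuilding it: events stays intact across calls and only revents changes.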
Socket Options Satish Krishnan
Agenda • Functions to get and set socket options • Socket Options • Generic socket options • Protocol specific socket options
Socket Options • There are 3 ways to get and set options affecting sockets - • the getsockopt and setsockopt functions. • the fcntl function • the ioctl function
getsockopt() int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen); returns 0 if OK, -1 on error. • level indicates whether the socket option is generic or protocol specific. • optval is a pointer to a variable into which the current value of the option is stored by getsockopt(). • optlen is the size of optval. It is a value-result argument.
setsockopt() int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen); returns 0 if OK, -1 on error. • level indicates whether the socket option is generic or protocol specific. • optval is a pointer to a variable from which the new value of the option is fetched by setsockopt(). • optlen is the size of optval.
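As an illustration of the two calls, the sketch below wraps them for generic (SOL_SOCKET) options that take an int. The helper names `set_flag_option` and `get_int_option` are hypothetical; they are not part of the sockets API.

```c
#include <sys/socket.h>

/* Hypothetical helper: turn a flag-type generic option (for example
 * SO_KEEPALIVE or SO_BROADCAST) on or off. Returns 0 if OK, -1 on error. */
static int set_flag_option(int sockfd, int optname, int on)
{
    return setsockopt(sockfd, SOL_SOCKET, optname, &on, sizeof on);
}

/* Hypothetical helper: fetch an int-valued generic option (for example
 * SO_RCVBUF) into *value. Returns 0 if OK, -1 on error. */
static int get_int_option(int sockfd, int optname, int *value)
{
    /* optlen is value-result: on entry the size of the buffer,
     * on return the number of bytes the kernel stored. */
    socklen_t len = sizeof *value;
    return getsockopt(sockfd, SOL_SOCKET, optname, value, &len);
}
```

For example, `set_flag_option(fd, SO_KEEPALIVE, 1)` enables keepalive probes, and `get_int_option(fd, SO_RCVBUF, &size)` reads back the receive buffer size the kernel actually assigned.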
Socket Options • Two basic types of options - • Flags - binary options that enable or disable a feature. • Values - options that set or fetch specific values. • Not all options are supported by all implementations. • Socket options fall into 4 main categories - • Generic socket options • SO_RCVBUF, SO_SNDBUF, SO_BROADCAST, etc. • IPv4 • IP_TOS, IP_MULTICAST_IF, etc. • IPv6 • IPV6_HOPLIMIT, IPV6_NEXTHOP, etc. • TCP • TCP_MAXSEG, TCP_KEEPALIVE, etc.
Socket States • Some options can be set or fetched only when the socket is in a particular state. • Some socket options are inherited by the connected sockets from the listening socket on the server side. • E.g. SO_RCVBUF and SO_SNDBUF. These options have to be set on the socket before calling listen() on the server side and before calling connect() on the client side.
Generic Socket Options • SO_BROADCAST • Enables or disables the ability of a process to send broadcast messages. • It is supported only for datagram sockets. • Its default value is off. • SO_ERROR • Pending Error - When an error occurs on a socket, the kernel sets the so_error variable. • The process can be notified of the error in two ways - • If the process is blocked in select for either read or write, it returns with either or both conditions set. • If the process is using signal driven I/O, the SIGIO signal is generated for the process.
Generic Socket Options Contd... • SO_KEEPALIVE • Purpose of this option is to detect if the peer host crashes. The SO_KEEPALIVE option will detect half-open connections and terminate them. • If this option is set and no data has been exchanged for 2 hours, then TCP sends a keepalive probe to the peer. One of three things happens: • Peer responds with ACK. Another probe will be sent only after 2 more hours of inactivity. • Peer responds with RST (it has crashed and rebooted). The socket's pending error is set to ECONNRESET and the socket is closed. • No response. 8 more probes are sent, after which the socket’s pending error is set to either ETIMEDOUT or EHOSTUNREACH and the socket is closed.
Generic Socket Options Contd... Receive Low Water Mark - • Amount of data that must be in the socket receive buffer for a socket to become ready for read. Send Low Water Mark - • Amount of space that must be available in the socket send buffer for a socket to become ready for write. • SO_RCVLOWAT and SO_SNDLOWAT • These options specify the receive low water mark and send low water mark for TCP and UDP sockets.
Generic Socket Options Contd... • SO_RCVTIMEO and SO_SNDTIMEO • These options place a timeout on socket receives and sends. • The timeout value is specified in a timeval structure. struct timeval { long tv_sec; long tv_usec; }; • To disable a timeout, the values in the timeval structure are set to 0.
Generic Socket Options Contd... • SO_REUSEADDR • It allows a listening server to restart and bind its well known port even if previously established connections exist. • It allows multiple instances of the same server to be started on the same port, as long as each instance binds a different local IP address. • It allows a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address. • It allows completely duplicate bindings only for UDP sockets (broadcasting and multicasting).
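The first use above, a restarting server, is the most common one. A minimal sketch follows, assuming a TCP server bound to the wildcard address; the helper name `listen_reusable` and the backlog of 16 are arbitrary choices for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP listening socket on the given port
 * (0 lets the kernel choose), with SO_REUSEADDR set before bind.
 * Returns the listening descriptor, or -1 on error.
 */
static int listen_reusable(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* The option must be set before bind to have any effect on it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Without the option, restarting the server while old connections on the port still exist typically makes bind fail with EADDRINUSE.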
UDP Satish Krishnan
UDP and TCP • When to use UDP? • The application uses broadcasting or multicasting. • The cost of connection establishment is high compared to the amount of data transferred. • When not to use UDP? • Flow control is very important. • Packet sequence has to be maintained.
Application details • With UDP, applications have to do the following for themselves: • Acknowledgement of packets • Flow control • Error detection • E.g. DNS name queries, NFS, etc.
Socket functions • ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, struct sockaddr *from, socklen_t *addrlen); • ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags, const struct sockaddr *to, socklen_t addrlen);
recvfrom • Blocking call by default. • Receives data arriving at the socket's bound address and port. • Returns the sender's address through the struct sockaddr pointer; addrlen is a value-result argument.
sendto • Used to send data when the socket is not in a connected state. • The address of the target is specified in the argument list. • Errors are indicated by a return value of -1.
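The two calls can be exercised together as in the sketch below. All three helper names (`udp_bind_loopback`, `udp_send`, `udp_receive`) are hypothetical, and binding to 127.0.0.1 on a kernel-chosen port is an assumption made to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1 on a
 * kernel-chosen port and store the resulting address in *addr.
 * Returns the descriptor, or -1 on error. */
static int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;  /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        getsockname(fd, (struct sockaddr *)addr, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send one datagram to dest on an unconnected socket. */
static ssize_t udp_send(int fd, const void *buf, size_t n,
                        const struct sockaddr_in *dest)
{
    return sendto(fd, buf, n, 0, (const struct sockaddr *)dest, sizeof *dest);
}

/* Hypothetical helper: receive one datagram and report the sender in *from. */
static ssize_t udp_receive(int fd, void *buf, size_t n,
                           struct sockaddr_in *from)
{
    socklen_t fromlen = sizeof *from;  /* value-result argument */
    return recvfrom(fd, buf, n, 0, (struct sockaddr *)from, &fromlen);
}
```

A server would normally reply by passing the address filled in by `udp_receive` straight back to `udp_send`.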
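Connect? • connect can be called on a UDP socket, specifying the destination address and port. • send can then be used instead of sendto, as the destination address is known. • It does not establish a TCP-like connection; no packets are exchanged and the kernel only records the peer address.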
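A connected UDP socket might be set up as follows. The helper name `udp_connect` is hypothetical; the caller is assumed to have filled in the destination address already.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket and connect() it to dest.
 * No packets are sent; the kernel simply records the peer address.
 * Returns the descriptor, or -1 on error.
 */
static int udp_connect(const struct sockaddr_in *dest)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    if (connect(fd, (const struct sockaddr *)dest, sizeof *dest) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this call, `send(fd, buf, n, 0)` and `recv(fd, buf, n, 0)` can be used without naming the peer each time.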
Name Conversions • struct hostent *gethostbyname(const char *hostname); • struct hostent { • char *h_name; • char **h_aliases; • int h_addrtype; • int h_length; • char **h_addr_list; • };
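To show how the hostent members fit together, the sketch below formats the first IPv4 address of an entry as text. The helper name `first_ipv4` is hypothetical; it works on any hostent, whether returned by gethostbyname or built by hand.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: write the first IPv4 address of h into dst in
 * dotted-decimal form. Returns dst on success, or NULL if h is NULL or
 * holds no IPv4 address.
 */
static const char *first_ipv4(const struct hostent *h, char *dst,
                              socklen_t size)
{
    struct in_addr addr;

    if (h == NULL || h->h_addrtype != AF_INET ||
        h->h_length != (int)sizeof addr ||
        h->h_addr_list == NULL || h->h_addr_list[0] == NULL)
        return NULL;

    /* Each h_addr_list entry points at h_length bytes of binary address
     * in network byte order; copy to avoid alignment problems. */
    memcpy(&addr, h->h_addr_list[0], sizeof addr);
    return inet_ntop(AF_INET, &addr, dst, size);
}
```

Typical use would be `first_ipv4(gethostbyname("localhost"), text, sizeof text)`, checking for a NULL result in case the lookup fails.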
Name conversions • struct hostent *gethostbyaddr(const char *addr, size_t len, int family); • int uname(struct utsname *name); utsname has the members sysname, nodename, release, version and machine.
Services • struct servent *getservbyname(const char *servname, const char *protoname); • struct servent has • char *s_name • char **s_aliases • int s_port (in network byte order) • char *s_proto
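One detail that is easy to miss is that s_port is kept in network byte order. The sketch below converts it for display; the helper name `service_port` is hypothetical.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the port of a servent in host byte order,
 * or -1 if sp is NULL (for example after a failed getservbyname).
 */
static int service_port(const struct servent *sp)
{
    if (sp == NULL)
        return -1;
    /* s_port is an int holding a 16-bit value in network byte order. */
    return ntohs((unsigned short)sp->s_port);
}
```

For instance, `service_port(getservbyname("domain", "udp"))` would yield the DNS port on a system whose services database contains that entry. When filling in a sockaddr_in, s_port can be assigned to sin_port directly, without conversion.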
References • Unix Network Programming, Volume I • W. Richard Stevens (Ch. 7). • TCP/IP Illustrated, Volume II • W. Richard Stevens