CIS5930 Internet Computing
Advanced TCP/IP Programming, Part 1
Prof. Robert van Engelen

Role of TCP/IP
• Lots of networks
  • Ethernet, FDDI, Wi-Fi, …
  • Each physical network has its own technology-dependent communication interface
• TCP/IP provides communication services that run between the programming interface of a physical network and user applications
• Routers are machines that transport data packets from one network to another
• Each host machine (router/computer) on the network is identified by an IP address
CIS 5930 Fall 2006 - TCP/IP

IP Address
• An IPv4 address is a 32-bit integer: net-id { sub-net-id } host-id
• IP addresses are often written in dotted-decimal (octet) form: 144.174.137.110

The TCP/IP Protocol Layers
• The OSI reference model has 7 layers
• For the Internet, 4 layers are relevant
  • Application layer: applications at endpoints communicate over sockets connected to ports, e.g. FTP, telnet, SMTP, HTTP, …
  • Transport layer: provides end-to-end data transfer with the TCP (reliable) and UDP (unreliable datagram) protocols
  • Internetwork layer: the internet protocol (IP) layer provides a (virtual) network image of the Internet
  • Network interface layer: interface to the actual network hardware

Domain Names
• Physical addressing: IP address of a host
• Logical addressing: a domain name of a host (or hosts)
• The domain namespace is hierarchical (organizational)

Domain Name Servers
• The mapping of names to addresses is performed by independent, cooperating systems called name servers
• A user program issues a request, such as getaddrinfo()
• The resolver formulates a query to the name server
• The name server performs the following steps:
  • If the answer is in its local authoritative database or cache, return it to the client;
  • else query other available name server(s), starting down from the root of the DNS tree or as high up the tree as possible
• The user program is finally given the corresponding IP address (or host name, depending on the query) or an error

Stub Resolver
• A stub resolver forwards queries to a name server for processing
• Responses are cached by the name server, but not usually by the resolver, although this is implementation-dependent
• On UNIX, the stub resolver is implemented by two library routines, getaddrinfo() and getnameinfo(), for converting host names to IP addresses and vice versa

Mapping Domain Name to IP
• From name to IP address
• Note: the obsolete gethostbyname() is still often used but is not thread safe
    int getaddrinfo(const char *node, const char *service,
                    const struct addrinfo *hints, struct addrinfo **res)
    void freeaddrinfo(struct addrinfo *res)
    const char *gai_strerror(int ecode)
    struct addrinfo {
      int ai_flags;             // AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST
      int ai_family;            // PF_xxx, e.g. PF_UNSPEC, PF_INET
      int ai_socktype;          // SOCK_xxx, e.g. SOCK_STREAM, SOCK_DGRAM
      int ai_protocol;          // 0 or IPPROTO_xxx for IPv4 and IPv6
      size_t ai_addrlen;        // length of ai_addr
      struct sockaddr *ai_addr; // binary address
      char *ai_canonname;       // canonical name for nodename
      struct addrinfo *ai_next; // next structure in linked list
    };

Mapping Address to Name
• From IP address to host and port (service) info
    int getnameinfo(const struct sockaddr *sa, socklen_t salen,
                    char *host, size_t hostlen,
                    char *serv, size_t servlen, int flags)
  where flags is a bitwise OR of:
    NI_NUMERICHOST   return the numeric IP form of the host
    NI_NUMERICSERV   return the numeric serv (port) value
    NI_NAMEREQD      host name must be in DNS
    NI_DGRAM         check for a UDP service

Example
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    int main(int argc, char **argv)
    {
      struct addrinfo hints, *res, *r;
      int err;
      char host[NI_MAXHOST], serv[NI_MAXSERV];
      if (argc < 3)
        exit(1);                           // argv[1] is host, argv[2] is serv
      memset(&hints, 0, sizeof(hints));    // clear the hints
      hints.ai_family = PF_UNSPEC;         // IPv4 or IPv6; use PF_INET for IPv4 only
      hints.ai_socktype = SOCK_STREAM;     // TCP; use SOCK_DGRAM for UDP
      if ((err = getaddrinfo(argv[1], argv[2], &hints, &res)))
      {
        fprintf(stderr, "Error: %s\n", gai_strerror(err));
        return 1;
      }
      for (r = res; r; r = r->ai_next)     // iterate over the results
      {
        if (!getnameinfo(r->ai_addr, r->ai_addrlen, host, sizeof(host),
                         serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV))
          printf("IP=%s port=%s\n", host, serv);
      }
      freeaddrinfo(res);                   // free list of results
      return 0;
    }

Find the Common Port Numbers
• Ports < 1024 are privileged: on UNIX only root can bind to them

TCP and UDP Communications
• TCP and UDP are built on top of IP
  • IP is point to point, packet based
  • Unreliable (network failures, buffers fill up, etc.)
  • Dynamic routing (packet order may be lost, packets can be duplicated)
• TCP/UDP addressing uses host (IP address) and port numbers
• TCP
  • Connection-oriented (note: IP is still packet based)
  • Reliable with internal flow control
  • Byte stream (packets put together to form a stream)
• UDP
  • Connectionless
  • Unreliable
  • Datagram (packet based)
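
The difference between a byte stream and datagrams can be observed without any network at all. The sketch below uses local AF_UNIX socket pairs as a stand-in for TCP (SOCK_STREAM) and UDP (SOCK_DGRAM); that substitution is an assumption made to keep the example self-contained, and first_read_size is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send "abc" and then "def" over a local socket pair of the given type and
   return how many bytes the first read on the other end delivers
   (-1 on error). */
ssize_t first_read_size(int type)
{
    int sk[2];
    char buf[16];
    ssize_t n = -1;
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    if (socketpair(AF_UNIX, type, 0, sk) != 0)
        return -1;
    if (write(sk[0], "abc", 3) == 3 && write(sk[0], "def", 3) == 3)
        n = read(sk[1], buf, sizeof(buf));
    close(sk[0]);
    close(sk[1]);
    return n;
}
```

With SOCK_DGRAM the first read returns exactly one 3-byte message, because message boundaries are preserved. With SOCK_STREAM the read may return both writes merged into one chunk, since a stream has no message boundaries.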

Establishing a TCP Connection: Initial State
• TCP is a peer-to-peer, connection-oriented protocol
• Flow control provides reliable stream-based communications between peers
• Initially all peers are the same
• Applications can choose at any time to act as server or client
  • A server does a passive open on a port and waits for a client to connect
  • A client connects by performing an active open on the server's port
• The server program receives a request, performs the service operations, and returns a reply to the client
Image provided by Alan Dix

Establishing a TCP Connection: Passive Open
• The server process does a passive open on a selected service port, and waits …
• This informs the TCP layer which process to connect to when a client requests service
• No communications yet
Image provided by Alan Dix

Establishing a TCP Connection: Active Open
• The client performs an active open
• The client process can be on the same or a different machine
• A (hidden) port number is generated at the client side
• The TCP layer sends a message from client to server to request a connection
Image provided by Alan Dix

Establishing a TCP Connection: Rendezvous
• The server side accepts the connection
• A bi-directional byte stream is established
• The server port is not consumed
• The port stays available to accept more connections
Image provided by Alan Dix
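
The passive open, active open, and rendezvous steps can be tried out inside a single process over the loopback interface. This is a minimal sketch, assuming IPv4 loopback is available; loopback_rendezvous is a hypothetical helper, and a real server and client would of course be separate programs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Passive open on 127.0.0.1 with a kernel-chosen port, active open from the
   same process, then accept. Returns the client's hidden port, or -1. */
int loopback_rendezvous(void)
{
    struct sockaddr_in srv, clt;
    socklen_t len = sizeof(srv);
    int srv_sk, clt_sk, conn_sk, port = -1;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0;                       /* let the kernel pick a port */

    srv_sk = socket(AF_INET, SOCK_STREAM, 0);
    clt_sk = socket(AF_INET, SOCK_STREAM, 0);
    if (srv_sk == -1 || clt_sk == -1)
        goto done;
    /* Passive open: bind, listen, and find out which port was assigned. */
    if (bind(srv_sk, (struct sockaddr *)&srv, sizeof(srv)) != 0 ||
        listen(srv_sk, 1) != 0 ||
        getsockname(srv_sk, (struct sockaddr *)&srv, &len) != 0)
        goto done;
    assert(srv.sin_port != 0);              /* a real port was filled in */
    /* Active open: the kernel assigns the client's port automatically. */
    if (connect(clt_sk, (struct sockaddr *)&srv, sizeof(srv)) != 0)
        goto done;
    /* Rendezvous: accept yields a new socket; srv_sk keeps listening. */
    conn_sk = accept(srv_sk, NULL, NULL);
    if (conn_sk == -1)
        goto done;
    len = sizeof(clt);
    if (getsockname(clt_sk, (struct sockaddr *)&clt, &len) == 0)
        port = ntohs(clt.sin_port);
    close(conn_sk);
done:
    if (clt_sk != -1) close(clt_sk);
    if (srv_sk != -1) close(srv_sk);
    return port;
}
```

Note that accept returns a separate connected socket, which is why the listening socket (and its port) remains available for further clients.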

Establishing a TCP Connection: Servicing
• The server provides service to the client
  • One request and one response message
  • Sequence of request/response message pairs
• Both close the connection when done
• More clients can connect to the same port
• The server forks or uses threads to serve multiple clients concurrently
Image provided by Alan Dix

Establishing a TCP Connection: What's Next?
• Looks easy, but in reality there is a lot more to worry about
• Details of the socket API
  • Errors, interrupts, and signals (must check function returns, SIGPIPE)
  • Enable I/O timeouts (no timeout: peer can hold connection indefinitely)
  • Blocking or non-blocking I/O (synchronous/asynchronous?)
  • Multi-threaded client/server (handle multiple clients concurrently)
  • Thread safety (check safety of API calls, lock all shared data)
  • Memory leaks (free data, also when errors occur)
• Which application protocols to use?
• Communication performance and quality of service
• Security
  • Message encryption, access permissions, no buffer overruns, peer authentication, non-repudiation, and principle of least privilege
• Connecting through firewalls and proxies
• Connection persistence

Server Init (IPv4/IPv6, Win/UNIX)
• For Windows, start the WSA service (use winsock2.h and ws2tcpip.h):
    WSADATA w;
    if (WSAStartup(MAKEWORD(1, 1), &w))
      … handle error …
• Get host info:
    int srv_sk, clt_sk, set = 1;
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    // Set host = NULL for current host, serv indicates service port
    if (getaddrinfo(host, serv, &hints, &res) || !res)
      … handle error …
• Create a socket and set socket options:
    srv_sk = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    setsockopt(srv_sk, SOL_SOCKET, SO_REUSEADDR, (char*)&set, sizeof(int)); // DANGER!
• Bind the socket and set listen parameters for TCP:
    if (bind(srv_sk, res->ai_addr, res->ai_addrlen))
      … handle error …
    if (res->ai_socktype == SOCK_STREAM)
      listen(srv_sk, 100);
    freeaddrinfo(res); // the address list is no longer needed

Server Loop
• Server loop:
  • Accept and wait on a client to connect:
      struct sockaddr_storage addr; // large enough for IPv4 and IPv6
      socklen_t addrlen = sizeof(addr);
      if ((clt_sk = accept(srv_sk, (struct sockaddr*)&addr, &addrlen)) == -1)
        … error …
  • If concurrent server: fork or delegate the work below to a thread
  • Get request message from client socket by repeatedly invoking read:
      ret = read(clt_sk, buf, len);
      if (ret < 0) … handle error and close socket …
      if (ret == 0) … client sent EOF, server may still be able to send …
      // got ret bytes in buf[], where ret <= len, so may have to repeat
  • Perform service
  • Send response to client socket by repeatedly invoking write:
      // got len bytes in buf[] to send
      ret = write(clt_sk, buf, len);
      if (ret < 0) … handle error, close socket …
      if (ret == 0) … could not send, client closed the socket …
      // sent ret bytes from buf[], where ret <= len, may have to repeat
  • Close the client socket:
      close(clt_sk); // use closesocket(clt_sk) on Windows
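
The per-client part of the server loop (read the request, perform the service, write the response, repeat until EOF) might be sketched as below. The "service" here is a plain echo, which is an assumption chosen only to keep the example short; serve_client is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Echo everything received on clt_sk back to the peer until it sends EOF.
   Returns the total number of bytes echoed, or -1 on error.
   The caller remains responsible for closing clt_sk. */
long serve_client(int clt_sk)
{
    char buf[512];
    long total = 0;
    assert(clt_sk >= 0);
    for (;;) {
        ssize_t got = read(clt_sk, buf, sizeof(buf));
        if (got < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            return total;               /* client sent EOF */
        ssize_t off = 0;
        while (off < got) {             /* write may send fewer bytes */
            ssize_t put = write(clt_sk, buf + off, (size_t)(got - off));
            if (put < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += put;
        }
        total += got;
    }
}
```

A concurrent server would call such a function in the child process after fork, or in a worker thread, and close the client socket afterwards.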

Client Init (IPv4/IPv6, Win/UNIX)
• For Windows, start the WSA service (use winsock2.h and ws2tcpip.h):
    WSADATA w;
    if (WSAStartup(MAKEWORD(1, 1), &w))
      … handle error …
• Get host info:
    int clt_sk = -1;
    struct addrinfo hints, *r, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    // host to connect to, serv indicates service port
    if (getaddrinfo(host, serv, &hints, &res) || !res)
      … handle error …
    for (r = res; r; r = r->ai_next) // try each address until one connects
    {
      clt_sk = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
      if (clt_sk != -1)
      {
        if (!connect(clt_sk, r->ai_addr, r->ai_addrlen))
          break;
        close(clt_sk); // use closesocket(clt_sk) on Windows
        clt_sk = -1;
      }
    }
    freeaddrinfo(res); // free list of results
    if (clt_sk == -1)
      … handle error …

Client Body
• Send request by repeatedly invoking write:
    // got len bytes in buf[] to send
    ret = write(clt_sk, buf, len);
    if (ret < 0) … handle error, close socket …
    if (ret == 0) … could not send, server closed the socket …
    // sent ret bytes from buf[], where ret <= len, may have to repeat
    if (no more sends)
      shutdown(clt_sk, 1); // send EOF
• Get server response by repeatedly invoking read:
    ret = read(clt_sk, buf, len);
    if (ret < 0) … handle error and close socket …
    if (ret == 0) … server sent EOF …
    // got ret bytes in buf[], where ret <= len, so may have to repeat
• Use close to close:
    close(clt_sk); // use closesocket(clt_sk) on Windows

Send/Recv or Write/Read?
• UDP uses (on a connected socket; otherwise sendto/recvfrom):
  • ssize_t send(int sk, const void *buf, size_t len, int flags)
      sk      socket descriptor
      buf     byte data to send
      len     length of data to send
      flags   see man pages
      returns number of bytes actually sent, 0 on EOF, or <0 on error
  • ssize_t recv(int sk, void *buf, size_t len, int flags)
      sk      socket descriptor
      buf     buffer to store byte data
      len     max length of buffer
      flags   see man pages
      returns number of bytes actually received, 0 on EOF, or <0 on error
• TCP may also use:
  • ssize_t write(int sk, const void *buf, size_t len)
  • ssize_t read(int sk, void *buf, size_t len)

Read and Write with Sockets
• Sockets are bi-directional
• How to do EOF without closing the other direction?
    shutdown(clt_sk, dir)
  where
    dir=0   stop reads
    dir=1   stop sends
    dir=2   stop both
• Reading may block
  • Wait until data received   ret > 0
  • Connection closed          ret = 0
  • Network error              ret < 0
• Writing may block
  • Send to the network may block until buffers available   ret > 0
  • Connection closed          ret = 0
  • Network error              ret < 0
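
The half-close behavior of shutdown can be checked with a local socket pair. The sketch below is an illustration only: it uses an AF_UNIX stream pair in place of a TCP connection (an assumption for self-containment), the POSIX name SHUT_WR for dir=1, and a hypothetical helper name half_close_demo.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if a half-closed connection behaves as described: the peer
   receives the data, then sees EOF, yet can still send a reply back. */
int half_close_demo(void)
{
    int sk[2], ok = -1;
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) != 0)
        return -1;
    if (write(sk[0], "req", 3) == 3 &&
        shutdown(sk[0], SHUT_WR) == 0 &&        /* dir=1: stop sends */
        read(sk[1], buf, sizeof(buf)) == 3 &&   /* peer gets the data */
        read(sk[1], buf, sizeof(buf)) == 0 &&   /* then EOF (ret = 0) */
        write(sk[1], "resp", 4) == 4 &&         /* other direction still open */
        read(sk[0], buf, sizeof(buf)) == 4)
        ok = 0;
    close(sk[0]);
    close(sk[1]);
    return ok;
}
```

This is the pattern a client uses to signal "no more requests" while still waiting for the server's response.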

IP Fragmentation
• Sending lots of small packets degrades performance
• The Internet is a collection of heterogeneous networks that may use different packet sizes determined by the MTU (max transmission unit)
• TCP divides data up; the limit is the UNIX read/write buffers
• When a fragment is lost, the whole datagram is lost
• TCP flow control uses exponential back-off to avoid rapid retransmission when congestion occurs
• When using write or send
  • Make the buffer large, > 64 KB
  • Probably not too large: you may waste time filling it before sending it off
• Use setsockopt to change the internal buffer size
    int len = …;
    setsockopt(sk, SOL_SOCKET, SO_SNDBUF, (char*)&len, sizeof(int));
    setsockopt(sk, SOL_SOCKET, SO_RCVBUF, (char*)&len, sizeof(int));
• Systems can behave differently
  • Works in testing, fails in production
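
Because systems differ in how they treat a requested buffer size, it can help to read the value back after setting it. The following is a minimal sketch, assuming a POSIX system; set_buffers and get_buffer are hypothetical helper names.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Request send and receive buffers of len bytes; returns 0 on success. */
int set_buffers(int sk, int len)
{
    if (setsockopt(sk, SOL_SOCKET, SO_SNDBUF, &len, sizeof(len)) != 0)
        return -1;
    if (setsockopt(sk, SOL_SOCKET, SO_RCVBUF, &len, sizeof(len)) != 0)
        return -1;
    return 0;
}

/* Read back the size the system actually granted for opt
   (SO_SNDBUF or SO_RCVBUF); returns -1 on error. */
int get_buffer(int sk, int opt)
{
    int len = 0;
    socklen_t optlen = sizeof(len);
    if (getsockopt(sk, SOL_SOCKET, opt, &len, &optlen) != 0)
        return -1;
    return len;
}
```

The granted size need not equal the requested one: the system may round, clamp, or (on Linux) double the value, which is one concrete way in which "systems can behave differently".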
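
TCP Read/Write Wrappers
• Blocking write to send full buffer content:
    int write_sk(int sk, const char *buf, int len)
    {
      int ret;
      do
      {
        ret = write(sk, buf, len);
        if (ret > 0)   // advance only when bytes were actually sent
        {
          buf += ret;
          len -= ret;
        }
      } while (ret > 0 && len > 0);
      return ret;      // result of the last write: <= 0 means EOF or error
    }
• Blocking read to receive data until buffer full or EOF:
    int read_sk(int sk, char *buf, int len)   // buf is written to, so not const
    {
      int ret;
      do
      {
        ret = read(sk, buf, len);
        if (ret > 0)   // advance only when bytes were actually received
        {
          buf += ret;
          len -= ret;
        }
      } while (ret > 0 && len > 0);
      return ret;      // result of the last read: 0 means EOF, < 0 error
    }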
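
A variant of these wrappers that reports the total byte count and retries after an interrupting signal might look like this. It is a sketch rather than the course's own code; write_full and read_full are hypothetical names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write up to len bytes, looping over short writes.
   Returns the number of bytes written (len unless no progress was
   possible), or -1 on error. */
ssize_t write_full(int sk, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t ret = write(sk, buf + done, len - done);
        if (ret < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            return -1;
        }
        if (ret == 0)
            break;                      /* no progress: give up */
        done += (size_t)ret;
    }
    return (ssize_t)done;
}

/* Read until the buffer is full or EOF.
   Returns the number of bytes read (less than len on EOF), or -1 on error. */
ssize_t read_full(int sk, char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t ret = read(sk, buf + done, len - done);
        if (ret < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            return -1;
        }
        if (ret == 0)
            break;                      /* EOF */
        done += (size_t)ret;
    }
    return (ssize_t)done;
}
```

Returning the total makes it possible for the caller to distinguish a complete transfer from one cut short by EOF.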

Non-Blocking Read/Write
• Setting socket to non-blocking state (UNIX):
    #include <fcntl.h>
    fcntl(sk, F_SETFL, fcntl(sk, F_GETFL) | O_NONBLOCK);
• Windows:
    #include <winsock2.h>
    u_long nonblocking = 1;
    ioctlsocket(sk, FIONBIO, &nonblocking);
• Read/write on sk don't block; when no progress is possible they return < 0 with errno set to EAGAIN, which is not a real error
    ret = read(sk, buf, len);
    if (ret > 0) … got data …
    else if (ret == 0) … EOF …
    else if (errno != EINTR && errno != EAGAIN) … error …
    else … interrupted or "would block": try again if needed …
    ret = write(sk, buf, len);
    if (ret > 0) … data written …
    else if (ret < 0 && (errno == EINTR || errno == EAGAIN)) … interrupted or "would block": try again if needed …
    else … EOF or error …
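
The non-blocking read cases can be wrapped in a small classifier, sketched below for UNIX. The names set_nonblocking and try_read and the return codes -1 and -2 are hypothetical choices for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put sk into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int sk)
{
    int flags = fcntl(sk, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(sk, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Perform one non-blocking read and classify the outcome:
   > 0  number of bytes read
     0  EOF
    -1  interrupted or "would block": try again later
    -2  real error */
ssize_t try_read(int sk, char *buf, size_t len)
{
    ssize_t ret = read(sk, buf, len);
    if (ret >= 0)
        return ret;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return -1;
    return -2;
}
```

On an idle connection try_read returns -1 rather than blocking, so the caller can do other work and poll again, typically after select reports the socket as readable.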

Select
• ret = select(fdnum, &rdfds, &wrfds, &errfds, &timeout)
  where
    fdnum     number of fds in bitmaps to check (highest fd + 1)
    rdfds     bitmap of fds to check for input
    wrfds     bitmap of fds to check for output
    errfds    bitmap of fds to check for errors
    timeout   timeout in seconds and microseconds
    ret > 0   one of input, output, or error fds is ready
    ret = 0   timeout expired
    ret < 0   signal was caught or other error
• Testing if input on sk is ready to be consumed:
    struct timeval timeout;
    fd_set rdfds;
    timeout.tv_sec = …;
    timeout.tv_usec = …;
    FD_ZERO(&rdfds);
    FD_SET(sk, &rdfds);
    ret = select(sk + 1, &rdfds, NULL, NULL, &timeout);
    if (ret > 0 && FD_ISSET(sk, &rdfds))
      // … input is ready on sk

Select (cont'd)
• Beware: fd_set is a bitmap of fixed size, so fdnum should not exceed FD_SETSIZE on UNIX/Linux (typically 1024)
• Beware: select before write:
  • If state changes between select and write, write may block
  • Use non-blocking write
• Beware: when using timeouts, always set the timeout value before select, since select updates the timeout
• Remember: you can use multiple fds with select
• You can also use select to check if accept is ready
• Use select for servers to prevent indefinite blocking when clients are non-responsive
• Use select when listening for multiple events in a single thread
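
The caveats above (check against FD_SETSIZE, set the timeout before every call, expect interruption by signals) can be folded into one helper. This is a minimal sketch for UNIX; wait_readable is a hypothetical name, and restarting with the full timeout after EINTR is a simplification.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to ms milliseconds for input (or a pending accept) on sk.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int sk, int ms)
{
    if (sk < 0 || sk >= FD_SETSIZE)
        return -1;                      /* fd does not fit in an fd_set */
    for (;;) {
        fd_set rdfds;
        struct timeval timeout;
        int ret;
        /* Re-initialize both arguments on every pass: select may modify them. */
        FD_ZERO(&rdfds);
        FD_SET(sk, &rdfds);
        timeout.tv_sec = ms / 1000;
        timeout.tv_usec = (ms % 1000) * 1000;
        ret = select(sk + 1, &rdfds, NULL, NULL, &timeout);
        if (ret > 0)
            return FD_ISSET(sk, &rdfds) ? 1 : 0;
        if (ret == 0)
            return 0;                   /* timeout expired */
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: loop and wait again. */
    }
}
```

Calling wait_readable on a listening socket before accept, or on a client socket before read, keeps a server from blocking indefinitely on a non-responsive peer.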

Signals
• UNIX signals may occur during read, write, and select
• For example, SIGPIPE is raised when the peer goes down
• Signal not caught: abort
• Define a signal handler, for example:
    #include <signal.h>
    void my_handler(int x) { … }
    signal(SIGPIPE, my_handler);
    signal(SIGINT, my_handler);
• When a signal is caught, read, write, and select return an error (< 0) and set errno to EINTR
  • WSAGetLastError() == WSAEINTR on Windows
• Some systems support the SO_NOSIGPIPE socket option:
    int set = 1;
    setsockopt(sk, SOL_SOCKET, SO_NOSIGPIPE, (char*)&set, sizeof(int));