Sockets The building blocks of Internet communications
Motivation • We need to present programs with an abstraction of the network • naming, rendezvous, service type and quality • The abstraction implies an API (Application Programming Interface) • What happens if we let everyone define their own API? • tower of Babel • Independently developed programs can’t talk to one another • Code can’t be shared • Models can’t be understood and developed upon
Overview • What is a socket • Sockets undressed • Writing a server • Writing a client • Problems with sockets • Sockets and Java
Sockets • A socket is the "thing" that a program talks to when it talks to the network. • network endpoint • looks like a "file" • Sockets can be • sent upon, received from, created, bound to addresses, listened on, and connected to. • Sockets are a programming interface, NOT a protocol. • TCP/IP is a protocol. • We can talk TCP/IP by way of the sockets interface.
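To make "looks like a file" concrete, here is a minimal sketch assuming a POSIX system. It uses a local socket pair (AF_UNIX, not an Internet socket) so it needs no network. The helper name `socket_as_file_demo` is hypothetical. The point is that a socket is an ordinary file descriptor, so the generic `write()` and `read()` calls work on it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a connected pair of local stream sockets and
 * move a message across them using the plain file calls write() and read().
 * Returns the number of bytes read into buf (NUL-terminated), or -1. */
ssize_t socket_as_file_demo(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    ssize_t n;

    if (buflen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* A socket is referred to by an ordinary file descriptor ... */
    if (write(fds[0], msg, strlen(msg)) != (ssize_t)strlen(msg)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* ... so the generic read() works on the other end, too. */
    n = read(fds[1], buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';

    close(fds[0]);
    close(fds[1]);
    return n;
}
```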
[Figure: sockets on green-host and red-host, connected across the network] Two types of sockets • A socket is a communication endpoint • "Stream" sockets provide reliable, sequenced, two-way communication. • telnet, www, ftp • "Datagram" sockets provide unreliable communication. • routing, mbone, bootp, some RPC services
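The socket type is chosen when the socket is created. As a sketch assuming a POSIX system: with `AF_INET` and protocol 0, the OS picks TCP for `SOCK_STREAM` and UDP for `SOCK_DGRAM`. The hypothetical helper `socket_kind` asks the kernel which kind a descriptor is, using the standard `SO_TYPE` socket option.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel what kind of socket a descriptor refers to.
 * Returns SOCK_STREAM, SOCK_DGRAM, ... or -1 on error. */
int socket_kind(int fd)
{
    int type;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

For example, `socket(AF_INET, SOCK_STREAM, 0)` gives a TCP endpoint and `socket(AF_INET, SOCK_DGRAM, 0)` gives a UDP one.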
Why use unreliable services? • An unreliable service can have a substantially lower implementation cost. • no connection state • US mail vs. the telephone • "best efforts" means never having to say you're sorry. • It is often more effective to devise your own reliability protocol on top of an unreliable protocol, choosing what to do when a message is lost: • do nothing • retransmit • find an alternate route
All you need to know about networking • Key idea is Data Encapsulation • every network packet contains a header and some data • the header is protocol specific • the data is what we are really interested in sending • protocols are built by layering [Figure: nested packets, outermost to innermost: Ethernet, IP, UDP, TFTP, file data; sockets work mostly at the UDP level] An Ethernet packet contains some data, which is an IP packet, which contains some data, which is a UDP packet, which contains some data, which is a TFTP packet, which contains some data, which is our file.
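A small sketch of the layering idea, assuming the common case: Ethernet II header (14 bytes, frame check sequence not counted), IPv4 header without options (20), UDP header (8), and a TFTP DATA packet (2-byte opcode 3, 2-byte block number, up to 512 bytes of file data). Only the innermost TFTP packet is built byte by byte; the outer layers are represented by their header sizes. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header sizes for the common case (no IP options, no VLAN tag,
 * Ethernet frame check sequence and minimum-frame padding ignored). */
enum {
    ETH_HDR  = 14,   /* destination MAC, source MAC, EtherType */
    IPV4_HDR = 20,   /* minimum IPv4 header */
    UDP_HDR  = 8,    /* ports, length, checksum */
    TFTP_HDR = 4     /* opcode + block number for a DATA packet */
};

/* Innermost layer: wrap file data in a TFTP DATA packet (opcode 3).
 * Multi-byte fields are written most significant byte first.
 * Caller guarantees n <= 512 and that out has room for n + 4 bytes.
 * Returns the TFTP packet length. */
size_t tftp_data_packet(uint16_t block, const uint8_t *data, size_t n,
                        uint8_t *out)
{
    out[0] = 0;                       /* opcode, high byte */
    out[1] = 3;                       /* opcode, low byte: DATA */
    out[2] = (uint8_t)(block >> 8);   /* block number, high byte */
    out[3] = (uint8_t)(block & 0xff); /* block number, low byte */
    memcpy(out + TFTP_HDR, data, n);
    return TFTP_HDR + n;
}

/* Each outer layer adds its own header around the inner packet, so the
 * frame on the wire grows by one header per layer. */
size_t ethernet_frame_size(size_t file_bytes)
{
    return ETH_HDR + IPV4_HDR + UDP_HDR + TFTP_HDR + file_bytes;
}
```

With a full 512-byte block, `ethernet_frame_size(512)` is 14 + 20 + 8 + 4 + 512 = 558 bytes, of which only 512 are our file.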
Why use sockets? • Sockets abstract the network into something easier to think about. • The OS provides a socket interface • API for passing data over a socket. • The socket implementation takes care to encapsulate our data inside the appropriate packets • The UDP or TCP layer sends data between socket endpoints. • The IP layer is responsible for routing the data to the appropriate machine
Sockets Undressed • Although sockets are easier to use than the network, they still have a lot of grunge inside them. • addresses, ports, byte ordering, synchronization, • For the most part, we'll ignore this grunge when we start looking at the network from a Java perspective. • But, to really understand how they work, we need to take a quick peek with our C hats on.
Addresses • Necessary to associate a socket with an internet address. struct in_addr { uint32_t s_addr; /* 4 bytes, network byte order */ }; struct sockaddr_in { short sin_family; /* AF_INET */ unsigned short int sin_port; /* network byte order */ struct in_addr sin_addr; unsigned char sin_zero[8]; /* padding, zero-filled */ }; When we want to tell the socket API about network addresses (endpoints), we fill in one of these structures.
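A minimal sketch of filling in a `sockaddr_in`, assuming a POSIX system (real headers declare the fields with fixed-width types such as `in_port_t` and `in_addr_t`, and some systems add a `sin_len` field). The helper `make_ipv4_endpoint` is hypothetical. Zeroing the whole structure first takes care of `sin_zero` and of any platform-specific fields.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in a sockaddr_in for an IPv4 address and port
 * given in ordinary host byte order. */
void make_ipv4_endpoint(struct sockaddr_in *sa, uint32_t host_ip,
                        uint16_t host_port)
{
    memset(sa, 0, sizeof(*sa));           /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(host_port);      /* port in network order */
    sa->sin_addr.s_addr = htonl(host_ip); /* address in network order */
}
```

For 128.95.4.109, port 21: `make_ipv4_endpoint(&sa, 0x805F046D, 21)`, since 128, 95, 4, 109 are 0x80, 0x5F, 0x04, 0x6D.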
Byte Ordering • Different computers use different representations for basic data types • big endian or little endian • the issue is which byte in a word is the "high order" byte • totally arbitrary and uninteresting • When different machines communicate, different interpretations can cause real problems. • Solution is to legislate a standard network byte ordering.
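One way to see which convention a machine uses is to store a known 32-bit value and look at the byte at the lowest address. This is a sketch; `host_is_little_endian` is a hypothetical name, and exotic mixed-endian layouts are ignored.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte of a
 * multi-byte integer first (little endian), 0 if it stores the most
 * significant byte first (big endian, which is also network order). */
int host_is_little_endian(void)
{
    uint32_t word = 0x01020304;
    unsigned char first;

    memcpy(&first, &word, 1);   /* the byte stored at the lowest address */
    return first == 0x04;
}
```

An x86 PC reports little endian; network byte order is big endian, which is why conversions are needed on such machines.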
Network byte ordering • All data "seen" by the sockets layer and passed on to the network must be in network order • Programmer's responsibility to convert • shorts (2 byte values) and longs (4 byte values) • htons, htonl, ntohs, ntohl • Extremely error prone struct sockaddr_in s; /* WRONG */ s.sin_port = 23; /* RIGHT */ s.sin_port = htons(23);
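To see why the WRONG line fails, here is a sketch (hypothetical helper `port_wire_bytes`, POSIX `htons`) that shows the two bytes of a port exactly as they sit in memory, and therefore as they go out on the wire.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Store a port in network byte order and report the two bytes exactly as
 * they sit in memory (that is, as they would go out on the wire). */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);
    memcpy(out, &net, 2);
}
```

After `htons(23)` the bytes are 0x00 0x17 on every host. Without the conversion, a little-endian machine would send 0x17 0x00, which the receiver reads as port 0x1700 = 5888.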
Specifying Addresses • My machine is bershad-pc.cs.washington.edu • IP address is 128.95.4.109 • I want to talk to the ftp server there. • FTP listens at port 21 in the TCP domain. /* Pray that we get this right! */ struct sockaddr_in s; unsigned char bytes4[4]; s.sin_family = AF_INET; s.sin_port = htons(21); bytes4[0] = 128; bytes4[1] = 95; bytes4[2] = 4; bytes4[3] = 109; /* bytes4 is already in network order (most significant byte first), so copy it as-is: no htonl() */ memcpy(&s.sin_addr.s_addr, bytes4, 4); bzero(&s.sin_zero, 8);
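Some helper functions • inet_addr("128.95.4.109") • returns the address as an in_addr_t (a 32-bit value in network order), ready for sin_addr.s_addr • returns INADDR_NONE on error, which is also the value of the valid address 255.255.255.255 • char *inet_ntoa(struct in_addr in), e.g. inet_ntoa(s.sin_addr) • returns the dotted-decimal ASCII string, in a static buffer that the next call overwrites • Nevertheless, this stuff is extremely hard to get right.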
Using Sockets • A socket is an OS resource • represents some context within the operating system kernel • looks much like a file descriptor • Before we can use one, we've got to make one. #include <sys/types.h> #include <sys/socket.h> int s; /* our socket */ int domain = AF_INET; /* in which communication "domain" */ int type = SOCK_STREAM; /* datagram or stream? */ int protocol = 0; /* specific protocol, usually 0 */ s = socket(domain, type, protocol); /* returns -1 if we fail */
Binding an address to a socket • The socket() call returns an unbound endpoint. • it has no "internetness" to it. • We may need to bind the endpoint to a particular internet (IP, port) address #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <strings.h> struct sockaddr_in my_addr; int s = socket(AF_INET, SOCK_STREAM, 0); my_addr.sin_family = AF_INET; my_addr.sin_port = htons(1234); my_addr.sin_addr.s_addr = inet_addr("128.95.4.109"); bzero(&(my_addr.sin_zero), 8); bind(s, (struct sockaddr*)&my_addr, sizeof(my_addr)); /* returns -1 if we fail */ ...
More on binding • We can let the OS decide where we really are • my_addr.sin_port = htons(0); • my_addr.sin_addr.s_addr = htonl(INADDR_ANY); • Ports below 1024 are "reserved" • must be a superuser in order to associate a local socket with a "small" port. • this is a very weak form of network security • assumption is that messages sent from a small port come from a privileged process • would you trust your bank account to this?? • Binding only necessary if you care about your address • OS can implicitly bind for you in some cases
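When the OS picks the port, the program can ask which one it got with `getsockname()`. A sketch assuming a POSIX system; `bind_anywhere` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Let the OS choose both the address (INADDR_ANY) and the port (0),
 * then ask which port it actually picked. Returns the bound socket and
 * stores the port (host order) in *port, or returns -1 on error. */
int bind_anywhere(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                 /* any free port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */

    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(s, (struct sockaddr *)&addr, &len) == -1) {
        close(s);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return s;
}
```

The port the OS hands out comes from its pool of ephemeral ports, not from the reserved range below 1024.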
Connecting to a socket • The connect(int s, const struct sockaddr *server, socklen_t addrlen) call lets us connect to a remote socket. #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <strings.h> int main(void) { int s = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in dest; dest.sin_family = AF_INET; dest.sin_port = htons(789); dest.sin_addr.s_addr = inet_addr("128.95.4.109"); bzero(&dest.sin_zero, 8); if (connect(s, (struct sockaddr*)&dest, sizeof(dest)) == -1) /* returns -1 if we fail */ return 1; return 0; }
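Listening on a socket • A connect() request initiates communication with a peer. • We can only connect to a socket that someone is listening on. • listen() marks a bound socket as passive: the OS queues incoming connection requests for it, up to the backlog. It does not block. s = socket(...); bind(s, (struct sockaddr*)&my_addr, sizeof(my_addr)); if (listen(s, 5) == -1) /* 5 = backlog of pending connections */ error(); /* from now on, someone somewhere can "connect" to the address for s. */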
Listening leads to acceptance • Once we listen(), callers can queue up. • But, we are under no obligation to accept • The accept() call says "ok, let's start talking." • communication happens over a NEW socket • the original socket is used for future connections s = socket(...); bind(s, (struct sockaddr*)&my_addr, sizeof(my_addr)); if (listen(s, 5) == -1) error(); for (;;) { struct sockaddr_in peer_addr; socklen_t peer_addrlen = sizeof(peer_addr); /* blocks until someone somewhere has "connected" to the address for s. */ int new_socket = accept(s, (struct sockaddr*)&peer_addr, &peer_addrlen); /* start communicating with peer_addr over new_socket */ }
Stepping back • Server (red): s = socket(); bind(s, red_addr); listen(s); new_socket = accept(s, &peer); /* peer = blue_addr */ /* ready to xmit/recv over new_socket */ • Client (blue): s = socket(); bind(s, blue_addr); connect(s, red_addr); /* ready to xmit/recv over s */
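The same sequence can be tried inside one process on the loopback address. This is a sketch assuming a POSIX system; `loopback_pair` is hypothetical. It works without threads because the kernel completes the TCP handshake for a queued connection, so `connect()` can return before `accept()` is called. The client skips `bind()` and lets `connect()` bind it implicitly.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* "red" listens on 127.0.0.1 at an OS-chosen port, "blue" connects to it.
 * On success returns 0 and hands back the client socket and the socket
 * returned by accept(); the listening socket is closed. Returns -1 on error. */
int loopback_pair(int *client, int *accepted)
{
    struct sockaddr_in red;
    socklen_t len = sizeof(red);
    int listener, blue, conn;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener == -1)
        return -1;

    memset(&red, 0, sizeof(red));
    red.sin_family = AF_INET;
    red.sin_port = htons(0);                       /* OS picks the port */
    red.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(listener, (struct sockaddr *)&red, sizeof(red)) == -1 ||
        listen(listener, 5) == -1 ||
        getsockname(listener, (struct sockaddr *)&red, &len) == -1) {
        close(listener);
        return -1;
    }

    blue = socket(AF_INET, SOCK_STREAM, 0);        /* no bind(): implicit */
    if (blue == -1 ||
        connect(blue, (struct sockaddr *)&red, sizeof(red)) == -1) {
        if (blue != -1)
            close(blue);
        close(listener);
        return -1;
    }

    conn = accept(listener, NULL, NULL);  /* NEW socket for this peer */
    close(listener);                      /* no more callers in this demo */
    if (conn == -1) {
        close(blue);
        return -1;
    }

    *client = blue;
    *accepted = conn;
    return 0;
}
```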
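Sending and Receiving • Once connected, we send and receive data in the same way on both sides • Client (blue): const char *blueInfo = "I'm so blue"; int len; /* connect up... */ len = send(s, blueInfo, strlen(blueInfo), 0); • Server (red): char peerInfo[32]; int len; /* bind, listen, accept */ ... len = recv(new_socket, peerInfo, 32, 0); • The last argument to send() and recv() holds flags that we rarely use. • Both calls return the number of bytes actually transferred, or -1 on error.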
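On a stream socket, `send()` may accept fewer bytes than asked and `recv()` returns whatever has arrived so far, so careful code loops. A sketch assuming a POSIX system; `send_all` and `recv_all` are hypothetical helper names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until every byte has been handed to the kernel.
 * Returns 0 on success, -1 on error. */
int send_all(int s, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(s, buf, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Keep calling recv() until len bytes are in or the peer closes.
 * Returns the number of bytes received, or -1 on error. */
ssize_t recv_all(int s, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(s, buf + got, len - got, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                  /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Because a stream has no message boundaries, the receiver must know how many bytes to expect (a length prefix, a terminator, or end-of-file).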
Datagrams are unconnected • Connection only required for stream sockets. • Key idea is that once connected, all data flows through same pair of endpoints. • With datagrams, each message sent must carry its own addressing information • simpler operating system state • more burden for the programmer • Two calls provided • ssize_t sendto(int sockfd, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen); • ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
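A sketch of one datagram exchange over the loopback interface, assuming a POSIX system; `udp_loopback_once` is hypothetical. There is no `listen()`, `connect()` or `accept()`: the receiver only binds, and the sender names the destination in each `sendto()`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram from one UDP socket to another on 127.0.0.1.
 * Returns the number of bytes received (buf is NUL-terminated), or -1. */
ssize_t udp_loopback_once(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in to, from;
    socklen_t tolen = sizeof(to), fromlen = sizeof(from);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1 || buflen == 0)
        goto done;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_port = htons(0);                       /* OS picks the port */
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&to, sizeof(to)) == -1 ||
        getsockname(rx, (struct sockaddr *)&to, &tolen) == -1)
        goto done;

    /* The destination address travels with every message. */
    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&to, tolen) == -1)
        goto done;

    /* recvfrom() also reports who sent the datagram, in 'from'. */
    n = recvfrom(rx, buf, buflen - 1, 0, (struct sockaddr *)&from, &fromlen);
    if (n >= 0)
        buf[n] = '\0';

done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return n;
}
```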
Shutting down • close(s) releases the socket. • no more sends or receives • more graceful shutdown services are provided • shutdown(s, 0), i.e. SHUT_RD • no more receives • shutdown(s, 1), i.e. SHUT_WR • no more sends; the peer sees end-of-file • shutdown(s, 2), i.e. SHUT_RDWR • no more receives or sends (like close(s), except that the descriptor stays open until it is closed)
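A sketch of a half-close, assuming a POSIX system and using a local socket pair so no network is needed; `half_close_demo` is hypothetical. After `shutdown(a, SHUT_WR)`, the peer reads end-of-file, yet data can still flow in the other direction. A client can use this to say "that was my whole request" while it waits for the reply.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if, after shutdown(sv[0], SHUT_WR), the peer sees EOF while
 * sv[0] can still receive; 0 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char c;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    shutdown(sv[0], SHUT_WR);                  /* no more sends from sv[0] */
    ok = recv(sv[1], &c, 1, 0) == 0;           /* peer sees end-of-file */

    ok = ok && send(sv[1], "x", 1, 0) == 1     /* other direction still open */
            && recv(sv[0], &c, 1, 0) == 1 && c == 'x';

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```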
Who's Out There? • getpeername(s, (struct sockaddr *)&peer, &peerlen) • returns the sockaddr_in of the other end of a connected socket. • struct hostent *he = gethostbyname("bershad-pc"); • returns the internet address(es) of a named host. • relies on the Domain Name System (DNS) server. • may involve hidden network communication
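`gethostbyname()` has since been removed from POSIX in favour of `getaddrinfo()`. A minimal IPv4-only sketch, assuming a POSIX system; `resolve_ipv4` is hypothetical. A name such as "bershad-pc" may trigger a DNS query, while a dotted-decimal string is converted locally.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a host name or dotted-decimal string to an IPv4 address.
 * Returns 0 and fills *out (network order) on success; otherwise returns
 * the getaddrinfo() error code, which gai_strerror() can describe. */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);  /* may use the network */
    if (rc != 0)
        return rc;

    /* Take the first result; a host may have several addresses. */
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```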
Writing a Server • Sockets provide the basis for client/server communication. • the server listens at an internet address (host addr, port) • the client connects to the port • the client sends a request • the server sends a response • the server shuts the connection down.
A Simple Web Server #include <stdio.h> #include <strings.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> void process_request(int client_socket); int main(void) { int sockfd, client_socket; struct sockaddr_in my_addr; struct sockaddr_in peer_addr; sockfd = socket(AF_INET, SOCK_STREAM, 0); my_addr.sin_family = AF_INET; my_addr.sin_port = htons(8080); my_addr.sin_addr.s_addr = htonl(INADDR_ANY); bzero(&my_addr.sin_zero, 8); bind(sockfd, (struct sockaddr *)&my_addr, sizeof(my_addr)); listen(sockfd, 5); for (;;) { socklen_t peer_size = sizeof(struct sockaddr_in); client_socket = accept(sockfd, (struct sockaddr*)&peer_addr, &peer_size); if (client_socket == -1) continue; process_request(client_socket); } }
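Processing the request void process_request(int client_socket) { char reqBuf[64]; char *url; char urlData[4096]; int len; ssize_t n; n = recv(client_socket, reqBuf, sizeof(reqBuf) - 1, 0); if (n <= 0) { close(client_socket); return; } reqBuf[n] = '\0'; /* make the request a C string */ if (strncmp(reqBuf, "GET ", 4) == 0) { url = reqBuf + 4; url[strcspn(url, " \r\n")] = '\0'; /* URL ends at the first space or newline */ len = readFile(url, urlData); /* readFile: helper not shown */ send(client_socket, urlData, len, 0); } else send(client_socket, "Error", 5, 0); close(client_socket); }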
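Parsing the request is where much of the real work hides. A sketch of a stricter URL extractor, assuming request lines of the form "GET /index.html HTTP/1.0"; `parse_get` is a hypothetical helper that rejects anything it cannot handle instead of reading past the buffer.

```c
#include <string.h>

/* Pull the URL out of a request line such as "GET /index.html HTTP/1.0".
 * Copies the URL into url (capacity cap) and NUL-terminates it.
 * Returns 0 on success, -1 if this is not a GET or the URL does not fit. */
int parse_get(const char *req, char *url, size_t cap)
{
    size_t n;

    if (strncmp(req, "GET ", 4) != 0)
        return -1;
    req += 4;
    n = strcspn(req, " \r\n");        /* URL ends at a space or line end */
    if (n == 0 || n >= cap)
        return -1;
    memcpy(url, req, n);
    url[n] = '\0';
    return 0;
}
```

A real server would also check the URL before handing it to something like `readFile`, for instance rejecting paths that contain "..".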
The Client Side #include <stdio.h> #include <string.h> #include <unistd.h> #include <netdb.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> /* invoked as urlget host object */ int main(int argc, char **argv) { int sockfd; struct sockaddr_in peer_addr; struct hostent *he; char msg[256]; char urlData[4096]; int len; if (argc != 3) return 1; snprintf(msg, sizeof(msg), "GET %s", argv[2]); /* strcat() cannot append to a string literal */ sockfd = socket(AF_INET, SOCK_STREAM, 0); memset(&peer_addr, 0, sizeof(peer_addr)); peer_addr.sin_family = AF_INET; peer_addr.sin_port = htons(8080); he = gethostbyname(argv[1]); if (he == NULL) return 1; memcpy(&peer_addr.sin_addr, he->h_addr_list[0], he->h_length); if (connect(sockfd, (struct sockaddr*)&peer_addr, sizeof(peer_addr)) == -1) return 1; send(sockfd, msg, strlen(msg), 0); len = recv(sockfd, urlData, sizeof(urlData) - 1, 0); if (len < 0) len = 0; urlData[len] = '\0'; printf("Received %s\n", urlData); close(sockfd); return 0; }
Concurrency • In our example, the server can't accept any requests while it is processing for the client. • suppose it takes 100 ms to handle a request. • not unreasonable if we have to go to disk • at most, server can handle 10 requests per second. • would you buy such a web server?? • The solution is to allow the server to process requests concurrently • Simplest way is to have the server create a new "copy" of itself to handle each new request.
Concurrent requests • The fork() system call duplicates the server process • what's the bet here? [Figure: the server process and its forked copy] • Parent: s = accept(); fork(); if parent, close(s) and continue. This process continues to accept new requests. • Child: same code, but fork() returned 0, so it runs processRequest(s). This process handles the specific request on s.
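A sketch of fork-per-connection, assuming a POSIX system; `loopback_listener`, `handle_client` and `accept_and_fork` are hypothetical names, and the "request processing" is just an echo. The parent closes its copy of the connected socket so descriptors do not pile up. A long-running server would also reap finished children (for example with `waitpid(-1, NULL, WNOHANG)` in its loop) so they do not linger as zombies.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Listening socket on 127.0.0.1 with an OS-chosen port; the port
 * (host order) is stored in *port. Returns the socket or -1. */
int loopback_listener(unsigned short *port)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(0);
    if (bind(s, (struct sockaddr *)&a, sizeof(a)) == -1 ||
        listen(s, 5) == -1 ||
        getsockname(s, (struct sockaddr *)&a, &len) == -1) {
        close(s);
        return -1;
    }
    *port = ntohs(a.sin_port);
    return s;
}

/* Stand-in for processRequest(): echo back one chunk of the request. */
static void handle_client(int s)
{
    char buf[256];
    ssize_t n = recv(s, buf, sizeof(buf), 0);
    if (n > 0)
        send(s, buf, (size_t)n, 0);
}

/* Accept one connection and hand it to a freshly forked child. The parent
 * returns at once, ready to accept the next request.
 * Returns the child's pid, or -1 on error. */
pid_t accept_and_fork(int listener)
{
    pid_t pid;
    int s = accept(listener, NULL, NULL);

    if (s == -1)
        return -1;

    pid = fork();
    if (pid == 0) {              /* child: serve this client, then exit */
        close(listener);         /* the child never accepts */
        handle_client(s);
        close(s);
        _exit(0);
    }
    close(s);                    /* parent (or failed fork): drop its copy */
    return pid;
}
```

A server's main loop would call `accept_and_fork(listener)` repeatedly.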
Problems with Sockets • Even simple programs are difficult to write. • byte ordering, connection management, msg construction and deconstruction • lots of reliance on weird C idioms • Difficult to distinguish server interface from implementation • connection protocol is "part" of the interface, but at a different level than what goes into the message • Network oriented • difficult to "optimize" communication for more efficient channels
Summary • Sockets are the OS's way to present the network to applications. • They are powerful but clunky. • We will see ways to create even more powerful, but less clunky distributed programming interfaces.