890 likes | 1.13k Views
Internet. L4: Internet. Last Lecture Re-cap: IP Internet. Based on the TCP/IP protocol family IP (Internet protocol) : Provides basic naming scheme (DNS) and unreliable delivery capability of packets (datagrams) from host-to-host UDP (Unreliable Datagram Protocol)
E N D
Internet L4: Internet
Last Lecture Re-cap: IP Internet • Based on the TCP/IP protocol family • IP (Internet protocol) : • Provides basic naming scheme (DNS) and unreliable delivery capabilityof packets (datagrams) from host-to-host • UDP (Unreliable Datagram Protocol) • Uses IP to provide unreliable datagram delivery from process-to-process • TCP (Transmission Control Protocol) • Uses IP to provide reliable byte streams from process-to-process over connections • Accessed via a mix of Unix file I/O and functions from the sockets interface
Hardware and Software Organization of an Internet Application Internet client host Internet server host Client Server User code Sockets interface (system calls) TCP/IP TCP/IP Kernel code Hardware interface (interrupts) Hardware and firmware Network adapter Network adapter Global IP Internet
Today's Lecture • TCP • Connection establishment, flow control, reliability, congestion control • I/O • Unix I/O, (custom) RIO, standard I/O
Establishing Connection:Three-Way handshake • Each side notifies other of starting sequence number it will use for sending • Why not simply chose 0? • Must avoid overlap with earlier incarnation • Security issues • Each side acknowledges other’s sequence number • SYN-ACK: Acknowledge sequence number + 1 • Can combine second SYN with first ACK SYN: SeqC ACK: SeqC+1 SYN: SeqS ACK: SeqS+1 Server Client
Tearing Down Connection • Either side can initiate tear down • Send FIN signal • “I’m not going to send any more data” • Other side can continue sending data • Half open connection • Must continue to acknowledge • Acknowledging FIN • Acknowledge last sequence number + 1 A B FIN, SeqA ACK, SeqA+1 Data ACK FIN, SeqB ACK, SeqB+1
Sender/Receiver State Sender Receiver Next expected Max acceptable Max ACK received Next seqnum … … … … Sender window Receiver window Sent & Acked Sent Not Acked Received & Acked Acceptable Packet OK to Send Not Usable Not Usable
Window Flow Control: Send Side Packet Received Packet Sent Source Port Dest. Port Source Port Dest. Port Sequence Number Sequence Number Acknowledgment Acknowledgment HL/Flags Window HL/Flags Window D. Checksum Urgent Pointer D. Checksum Urgent Pointer Options… Options... App write acknowledged sent to be sent outside window
TCP congestion control • Need to share network resources. • But neither the sender or the receiver knows how much b/w is available. • How much should we send?
TCP congestion control • Slow start • Increase the congestion window +1 for every ack. • Fast recovery • On detection of dropped packet (dup ack) reduce the congestion window by half. • Called multiplicative decrease • Congestion avoidance • Increase 1 every RTT • Called additive increase • Details in computer networks course
TCP Saw Tooth Behavior Congestion Window Timeouts may still occur Time Slowstart to pace packets Fast Retransmit and Recovery Initial Slowstart
Important Lessons • TCP state diagram setup/teardown • Making sure both sides end up in same state • TCP congestion control • Need to share some resources without knowing their current state • Good example of adapting to network performance • Sliding window flow control • Addresses buffering issues and keeps link utilized • Need to ensure that distributed resources that are known about aren’t overloaded
Today • TCP • I/O • Unix I/O • RIO (robust I/O) package • Standard I/O
Unix Files • A Unix file is a sequence of m bytes: • B0 , B1 , .... , Bk , .... , Bm-1 • All I/O devices are represented as files: • /dev/sda2(/usrdisk partition) • /dev/tty2(terminal) • Even the kernel is represented as a file: • /dev/kmem(kernel memory image) • /proc(kernel data structures)
Unix File Types • Regular file • File containing user/app data (binary, text, whatever) • OS does not know anything about the format • other than “sequence of bytes”, akin to main memory • Directory file • A file that contains the names and locations of other files • Character special and block special files • Terminals (character special) and disks (block special) • FIFO (named pipe) • A file type used for inter-process communication • Socket • A file type used for network communication between processes
Unix I/O • Key Features • Elegant mapping of files to devices allows kernel to export simple interface called Unix I/O • Important idea: All input and output is handled in a consistent and uniform way • Basic Unix I/O operations (system calls): • Opening and closing files • open()and close() • Reading and writing a file • read()and write() • Changing the current file position(seek) • indicates next offset into file to read or write • lseek() Bk+1 B0 B1 • • • Bk-1 Bk • • • Current file position = k
Opening Files • Opening a file informs the kernel that you are getting ready to access that file • Returns a small identifying integer file descriptor • fd == -1indicates that an error occurred • Each process created by a Unix shell begins life with three open files associated with a terminal: • 0: standard input • 1: standard output • 2: standard error intfd; /* file descriptor */ if ((fd = open("/etc/hosts", O_RDONLY)) < 0) { perror("open"); exit(1); }
Closing Files • Closing a file informs the kernel that you are finished accessing that file • Closing an already closed file is a recipe for disaster in threaded programs (more on this later) • Moral: Always check return codes, even for seemingly benign functions such as close() int fd; /* file descriptor */ int retval; /* return value */ if ((retval = close(fd)) < 0) { perror("close"); exit(1); }
Reading Files • Reading a file copies bytes from the current file position to memory, and then updates file position • Returns number of bytes read from file fd into buf • Return type ssize_t is signed integer • nbytes < 0indicates that an error occurred • Short counts(nbytes < sizeof(buf)) are possible and are not errors! char buf[512]; int fd; /* file descriptor */ int nbytes; /* number of bytes read */ /* Open file fd ... */ /* Then read up to 512 bytes from file fd */ if ((nbytes = read(fd, buf, sizeof(buf))) < 0) { perror("read"); exit(1); }
Writing Files • Writing a file copies bytes from memory to the current file position, and then updates current file position • Returns number of bytes written from buf to file fd • nbytes < 0indicates that an error occurred • As with reads, short counts are possible and are not errors! char buf[512]; int fd; /* file descriptor */ int nbytes; /* number of bytes read */ /* Open the file fd ... */ /* Then write up to 512 bytes from buf to file fd */ if ((nbytes = write(fd, buf, sizeof(buf)) < 0) { perror("write"); exit(1); }
Simple Unix I/O example • Copying standard in to standard out, one byte at a time #include "csapp.h" intmain(void) { char c; while(Read(STDIN_FILENO, &c, 1) != 0) Write(STDOUT_FILENO, &c, 1); exit(0); } cpstdin.c Note the use of error handling wrappers for read and write (Appendix A).
Dealing with Short Counts • Short counts can occur in these situations: • Encountering (end-of-file) EOF on reads • Reading text lines from a terminal • Reading and writing network sockets or Unix pipes • Short counts never occur in these situations: • Reading from disk files (except for EOF) • Writing to disk files • One way to deal with short counts in your code: • Use the RIO (Robust I/O) package from your textbook’s csapp.cfile (Appendix B)
Today • TCP • I/O • Unix I/O • RIO (robust I/O) package • Standard I/O
The RIO Package • RIO is a set of wrappers that provide efficient and robust I/O in apps, such as network programs that are subject to short counts • RIO provides two different kinds of functions • Unbuffered input and output of binary data • rio_readn and rio_writen • Buffered input of binary data and text lines • rio_readlineb and rio_readnb • Buffered RIO routines are thread-safe and can be interleaved arbitrarily on the same descriptor • Download from http://csapp.cs.cmu.edu/public/code.html src/csapp.cand include/csapp.h
Unbuffered RIO Input and Output • Same interface as Unix read and write • Especially useful for transferring data on network sockets • rio_readnreturns short count only if it encounters EOF • Only use it when you know how many bytes to read • rio_writennever returns a short count • Calls to rio_readnand rio_writencan be interleaved arbitrarily on the same descriptor #include "csapp.h" ssize_trio_readn(intfd, void *usrbuf, size_t n); ssize_trio_writen(intfd, void *usrbuf, size_t n); Return: num. bytes transferred if OK,0 on EOF (rio_readn only), -1 on error
Implementation of rio_readn /* * rio_readn - robustly read n bytes (unbuffered) */ ssize_trio_readn(intfd, void *usrbuf, size_t n) { size_tnleft = n; ssize_tnread; char *bufp = usrbuf; while (nleft > 0) { if ((nread = read(fd, bufp, nleft)) < 0) { if (errno == EINTR) /* interrupted by sig handler return */ nread = 0; /* and call read() again */ else return -1; /* errno set by read() */ } else if (nread == 0) break; /* EOF */ nleft -= nread; bufp += nread; } return (n - nleft); /* return >= 0 */ } csapp.c
Buffered I/O: Motivation • Applications often read/write one character at a time • getc, putc, ungetc • gets, fgets • Read line of text on character at a time, stopping at newline • Implementing as Unix I/O calls expensive • read and write require Unix kernel calls • > 10,000 clock cycles • Solution: Buffered read • Use Unix read to grab block of bytes • User input functions take one byte at a time from buffer • Refill buffer when empty already read unread Buffer
Buffered I/O: Implementation • For reading from file • File has associated buffer to hold bytes that have been read from file but not yet read by user code • Layered on Unix file: rio_cnt already read unread Buffer rio_buf rio_bufptr Buffered Portion not in buffer already read unread unseen Current File Position
Buffered I/O: Declaration • All information contained in struct rio_cnt already read unread Buffer rio_buf rio_bufptr typedefstruct { intrio_fd; /* descriptor for this internal buf */ intrio_cnt; /* unread bytes in internal buf */ char *rio_bufptr; /* next unread byte in internal buf */ char rio_buf[RIO_BUFSIZE]; /* internal buffer */ } rio_t;
Buffered RIO Input Functions • Efficiently read text lines and binary data from a file partially cached in an internal memory buffer • rio_readlineb reads a text line of up to maxlen bytes from file fd and stores the line in usrbuf • Especially useful for reading text lines from network sockets • Stopping conditions • maxlen bytes read • EOF encountered • Newline (‘\n’) encountered #include "csapp.h" void rio_readinitb(rio_t *rp, intfd); ssize_trio_readlineb(rio_t *rp, void *usrbuf, size_tmaxlen); Return: num. bytes read if OK, 0 on EOF, -1 on error
Buffered RIO Input Functions (cont) • rio_readnb reads up to n bytes from file fd • Stopping conditions • maxlen bytes read • EOF encountered • Calls to rio_readlineb and rio_readnb can be interleaved arbitrarily on the same descriptor • Warning: Don’t interleave with calls to rio_readn #include "csapp.h" void rio_readinitb(rio_t *rp, intfd); ssize_trio_readlineb(rio_t *rp, void *usrbuf, size_tmaxlen); ssize_trio_readnb(rio_t *rp, void *usrbuf, size_t n); Return: num. bytes read if OK, 0 on EOF, -1 on error
RIO Example • Copying the lines of a text file from standard input to standard output #include "csapp.h" int main(int argc, char **argv) { int n; rio_t rio; char buf[MAXLINE]; Rio_readinitb(&rio, STDIN_FILENO); while((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) Rio_writen(STDOUT_FILENO, buf, n); exit(0); } cpfile.c
Standard I/O Functions • The C standard library (libc.so) contains a collection of higher-level standard I/O functions • Documented in Appendix B of K&R. • Examples of standard I/O functions: • Opening and closing files (fopen and fclose) • Reading and writing bytes (fread and fwrite) • Reading and writing text lines (fgets and fputs) • Formatted reading and writing (fscanf and fprintf)
Standard I/O Streams • Standard I/O models open files as streams • Abstraction for a file descriptor and a buffer in memory. • Similar to buffered RIO • C programs begin life with three open streams (defined in stdio.h) • stdin (standard input) • stdout (standard output) • stderr (standard error) #include <stdio.h> extern FILE *stdin; /* standard input (descriptor 0) */ extern FILE *stdout; /* standard output (descriptor 1) */ extern FILE *stderr; /* standard error (descriptor 2) */ int main() { fprintf(stdout, "Hello, world\n"); }
Buffering in Standard I/O • Standard I/O functions use buffered I/O • Buffer flushed to output fd on “\n” or fflush() call printf("h"); printf("e"); printf("l"); printf("l"); printf("o"); buf printf("\n"); h e l l o \n . . fflush(stdout); write(1, buf, 6);
Standard I/O Buffering in Action • You can see this buffering in action for yourself, using the always fascinating Unix strace program: #include <stdio.h> int main() { printf("h"); printf("e"); printf("l"); printf("l"); printf("o"); printf("\n"); fflush(stdout); exit(0); } linux> strace ./hello execve("./hello", ["hello"], [/* ... */]). ... write(1, "hello\n", 6) = 6 ... exit_group(0) = ?
Today • TCP • Unix I/O • RIO (robust I/O) package • Standard I/O • Conclusions
Unix I/O vs. Standard I/O vs. RIO • Standard I/O and RIO are implemented using low-level Unix I/O • Which ones should you use in your programs? fopen fdopen fread fwrite fscanf fprintf sscanf sprintf fgets fputs fflush fseek fclose C application program rio_readn rio_writen rio_readinitb rio_readlineb rio_readnb Standard I/O functions RIO functions open read write lseek stat close Unix I/O functions (accessed via system calls)
Choosing I/O Functions • General rule: use the highest-level I/O functions you can • Many C programmers are able to do all of their work using the standard I/O functions • When to use standard I/O • When working with disk or terminal files • When to use raw Unix I/O • Inside signal handlers, because Unix I/O is async-signal-safe. • In rare cases when you need absolute highest performance. • When to use RIO • When you are reading and writing network sockets. • Avoid using standard I/O on sockets.
A Programmer’s View of the Internet • Hosts are mapped to a set of 32-bit IP addresses • 128.2.203.179 • The set of IP addresses is mapped to a set of identifiers called Internet domain names • 128.2.203.179 is mapped to www.cs.cmu.edu • A process on one Internet host can communicate with a process on another Internet host over a connection
IP Addresses • 32-bit IP addresses are stored in an IP address struct • IP addresses are always stored in memory in network byte order (big-endian byte order) • True in general for any integer transferred in a packet header from one machine to another. • E.g., the port number used to identify an Internet connection. /* Internet address structure */ structin_addr { unsigned ints_addr; /* network byte order (big-endian) */ }; • Useful network byte-order conversion functions (“l” = 32 bits, “s” = 16 bits) • htonl: convert uint32_t from host to network byte order • htons: convert uint16_t from host to network byte order • ntohl: convert uint32_t from network to host byte order • ntohs: convert uint16_t from network to host byte order
Dotted Decimal Notation • By convention, each byte in a 32-bit IP address is represented by its decimal value and separated by a period • IP address:0x8002C2F2 = 128.2.194.242 • Functions for converting between binary IP addresses and dotted decimal strings: • inet_aton:dotted decimal string → IP address in network byte order • inet_ntoa:IP address in network byte order → dotted decimal string • “n” denotes network representation • “a” denotes application representation
IP Address Structure • IP (V4) Address space divided into classes: • Network ID Written in form w.x.y.z/n • n = number of bits in host address • E.g., CMU written as 128.2.0.0/16 • Class B address • Unrouted (private) IP addresses: 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 0 1 2 3 8 16 24 31 0 Class A Net ID Host ID 1 0 Class B Net ID Host ID 1 1 0 Class C Net ID Host ID 1 1 1 0 Class D Multicast address 1 1 1 1 Class E Reserved for experiments
Internet Domain Names unnamed root .net .edu .gov .com First-level domain names Second-level domain names mit cmu berkeley amazon Third-level domain names cs ece www 2007.171.166.252 ics sp greatwhite 128.2.220.10 i386-f7 128.2.200.47
Domain Naming System (DNS) • The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS • Conceptually, programmers can view the DNS database as a collection of millions of host entry structures: • Functions for retrieving host entries from DNS: • gethostbyname: query key is a DNS domain name. • gethostbyaddr: query key is an IP address. /* DNS host entry structure */ structhostent { char *h_name; /* official domain name of host */ char **h_aliases; /* null-terminated array of domain names */ inth_addrtype; /* host address type (AF_INET) */ inth_length; /* length of an address, in bytes */ char **h_addr_list; /* null-terminated array of in_addrstructs */ };
Properties of DNS Host Entries • Each host entry is an equivalence class of domain names and IP addresses • Each host has a locally defined domain name localhost which always maps to the loopback address127.0.0.1 • Different kinds of mappings are possible: • Simple case: one-to-one mapping between domain name and IP address: • greatwhile.ics.cs.cmu.edu maps to 128.2.220.10 • Multiple domain names mapped to the same IP address: • eecs.mit.eduandcs.mit.eduboth map to18.62.1.6 • Multiple domain names mapped to multiple IP addresses: • google.commaps to multiple IP addresses • Some valid domain names don’t map to any IP address: • for example: ics.cs.cmu.edu
A Program That Queries DNS int main(intargc, char **argv) { /* argv[1] is a domain name */ char **pp; /* or dotted decimal IP addr */ structin_addraddr; structhostent *hostp; if (inet_aton(argv[1], &addr) != 0) hostp = Gethostbyaddr((const char *)&addr, sizeof(addr), AF_INET); else hostp = Gethostbyname(argv[1]); printf("official hostname: %s\n", hostp->h_name); for (pp = hostp->h_aliases; *pp != NULL; pp++) printf("alias: %s\n", *pp); for (pp = hostp->h_addr_list; *pp != NULL; pp++) { addr.s_addr = ((structin_addr *)*pp)->s_addr; printf("address: %s\n", inet_ntoa(addr)); } }
Using DNS Program linux> ./dns greatwhite.ics.cs.cmu.edu official hostname: greatwhite.ics.cs.cmu.edu address 128.2.220.10 linux> ./dns 128.2.220.11 official hostname: ANGELSHARK.ICS.CS.CMU.EDU address: 128.2.220.11 linux> ./dns www.google.com official hostname: www.l.google.com alias: www.google.com address: 72.14.204.99 address: 72.14.204.103 address: 72.14.204.104 address: 72.14.204.147 linux> dig +short -x 72.14.204.103 iad04s01-in-f103.1e100.net.
Querying DIG • Domain Information Groper (dig) provides a scriptable command line interface to DNS linux> dig +short greatwhite.ics.cs.cmu.edu 128.2.220.10 linux> dig +short -x 128.2.220.11 ANGELSHARK.ICS.CS.CMU.EDU. linux> dig +short google.com 72.14.204.104 72.14.204.147 72.14.204.99 72.14.204.103 linux> dig +short -x 72.14.204.103 iad04s01-in-f103.1e100.net.