Rensselaer Polytechnic Institute CSCI-4220 – Network Programming David Goldschmidt, Ph.D. Sockets and HTTP {week 4}
Protocols • A protocol is an agreed-upon convention that defines how communication occurs between two (or more?) endpoints • All endpoints must “understand” and correctly implement the protocol • Protocols must be formally defined, unambiguous, and well-documented • Protocols should address error conditions and unexpected scenarios
Network APIs • Network APIs provide the bridge between applications and protocol software • Services are made available (often by the OS) [Figure: Application / Network API (via the OS, but also OS-independent!) / Protocol X, Protocol Y, Protocol Z]
Generic programming interface • The Network API often provides a generic programming interface: • Support for multiple communication protocol suites/families (e.g. TCP, UDP, IP) • Endpoint address representation independence • Network data types (for portability) • With host-to-network and network-to-host byte-order conversion functions • e.g. htons(), ntohs(), htonl(), ntohl(), etc. [Figure: Application / Network API (via the OS) / Protocol X, Protocol Y, Protocol Z]
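To see what the conversion functions above actually do, here is a minimal C sketch (the helper names `port_to_wire`, `port_from_wire`, and `addr_to_wire` are hypothetical). It copies the converted value's raw bytes into a buffer, so the result is the same big-endian layout on any host, whether or not the host itself is little-endian.

```c
#include <arpa/inet.h>  /* htons(), ntohs(), htonl() */
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port in a 2-byte buffer in network byte order (big-endian). */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);      /* host -> network */
    memcpy(out, &net, sizeof net);   /* the bytes as they would go on the wire */
}

/* Read a 2-byte network-order field back into a host-order integer. */
uint16_t port_from_wire(const unsigned char in[2])
{
    uint16_t net;
    memcpy(&net, in, sizeof net);
    return ntohs(net);               /* network -> host */
}

/* Same idea for a 32-bit IPv4 address given as a host-order integer. */
void addr_to_wire(uint32_t addr, unsigned char out[4])
{
    uint32_t net = htonl(addr);
    memcpy(out, &net, sizeof net);
}
```

For example, port 80 (0x0050) always becomes the bytes 0x00 0x50, and 128.113.2.68 (0x80710244) becomes 0x80 0x71 0x02 0x44.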
Network API functions • Functions provided by the Network API include: • Specifying communication endpoints • Initiating a connection (e.g. for TCP) • Waiting for incoming connections • Sending and receiving messages • Terminating a connection • Error detection and handling [Figure: Application / Network API (via the OS) / Protocol X, Protocol Y, Protocol Z]
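A minimal sketch, assuming IPv4 and TCP, of how the client-side functions listed above map onto the standard socket calls: `socket()` specifies the endpoint, `connect()` initiates the connection, and `close()` terminates it (a server would instead use `bind()`, `listen()`, and `accept()` to wait for incoming connections). The helper name `tcp_connect` is hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>  /* socket(), connect() */
#include <unistd.h>      /* close() */

/* Open a TCP connection to ip:port, where ip is an IPv4 dotted-quad string.
   Returns a connected socket descriptor, or -1 on any error. */
int tcp_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                  /* port in network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                                /* not a valid IPv4 address */

    int fd = socket(PF_INET, SOCK_STREAM, 0);     /* specify the endpoint */
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {  /* initiate */
        close(fd);                                /* error handling: release it */
        return -1;
    }
    return fd;  /* caller sends/receives, then close()s to terminate */
}
```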
Sockets • A socket is an endpoint for communication • Communication takes place over a pair of sockets, each identified as <ip-address>:<port>, e.g.:
    listener socket 128.113.2.68:80
    server socket 128.113.2.68:9500
    client socket 66.195.8.137:21202
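Each socket in a pair is identified by an IPv4 address and a port. The sketch below (hypothetical helper `format_endpoint`) renders a `struct sockaddr_in` in the `<ip-address>:<port>` notation used above; note the `ntohs()` call, since the port is stored in network byte order.

```c
#include <arpa/inet.h>   /* inet_ntop(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <stdio.h>

/* Render an IPv4 socket address as "<ip-address>:<port>".
   Returns 0 on success, -1 if conversion fails or buf is too small. */
int format_endpoint(const struct sockaddr_in *sa, char *buf, size_t len)
{
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip) == NULL)
        return -1;
    int n = snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(sa->sin_port));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```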
Pitfalls of using sockets • Pitfalls of socket-based communication between client and server include: • Once a server binds to a port, no other server may bind to that port • If client and server do not obey the rules of the protocol, errors may occur • Client/server communication must often be synchronized
File descriptors • Each process has a file descriptor table (maintained by the operating system) • This table is inherited from the parent process • By default, descriptors 0, 1, and 2 are stdin, stdout, and stderr • When open() or socket() is called, the lowest-numbered available descriptor is assigned
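A small C sketch illustrating descriptor allocation (hypothetical helper `probe_next_descriptor`): the number returned by `socket()` comes from the same table as `open()`, and once that descriptor is closed, the next call receives the same number again.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket, note which descriptor number the OS assigned, and close it.
   POSIX hands out the lowest-numbered unused descriptor, so in a process that
   only has stdin (0), stdout (1), and stderr (2) open this is typically 3.
   Returns -1 on error. */
int probe_next_descriptor(void)
{
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    close(fd);   /* the number becomes available again */
    return fd;
}
```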
Socket descriptors • Socket descriptors are used to keep track of open socket connections, e.g.:
    Family: PF_INET
    Service: SOCK_STREAM
    Local IP: 123.113.12.34
    Local Port: 35029
    Remote IP: 66.195.43.21
    Remote Port: 44287
Socket protocol families • The domain parameter of socket() specifies the protocol family • PF_INET: IPv4 Internet protocols • PF_INET6: IPv6 Internet protocols • PF_UNIX / PF_LOCAL: Local communication • etc.
Socket service types • The type parameter of socket() specifies the communication semantics • SOCK_STREAM: Connection-oriented, sequenced, reliable, two-way communication of byte streams • SOCK_DGRAM: Connectionless, unreliable communication of datagrams (messages of a fixed maximum length)
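A sketch showing the domain and type parameters in use (hypothetical helper `created_socket_type`). Passing 0 as the third argument asks for the default protocol for that family and type, e.g. TCP for `PF_INET`/`SOCK_STREAM`; the function then reads the type back with `getsockopt(..., SO_TYPE, ...)` to confirm what was created.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket with the given protocol family (domain) and service type,
   then ask the kernel which service type it actually has.
   Returns the reported type, or -1 on error. */
int created_socket_type(int domain, int type)
{
    int fd = socket(domain, type, 0);   /* 0 = default protocol */
    if (fd < 0)
        return -1;
    int reported = -1;
    socklen_t len = sizeof reported;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &reported, &len) < 0)
        reported = -1;
    close(fd);
    return reported;
}
```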
Socket options • Use getsockopt() and setsockopt() to query and change options on an existing socket, e.g.: • SO_ACCEPTCONN • SO_BROADCAST • SO_DONTROUTE • SO_ERROR • SO_KEEPALIVE • SO_LINGER • See man 7 socket
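A minimal sketch of `setsockopt()`/`getsockopt()` on `SO_KEEPALIVE` (hypothetical helper `set_keepalive`). Boolean options are passed as an `int`; the value read back is only guaranteed to be nonzero when enabled (some BSD-derived systems report a flag bit rather than 1), so the sketch treats any positive value as "on".

```c
#include <sys/socket.h>

/* Turn SO_KEEPALIVE on or off for an existing socket, then read it back.
   Returns the value the kernel reports (0 = off, positive = on),
   or -1 on error. */
int set_keepalive(int fd, int on)
{
    int value = on ? 1 : 0;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, sizeof value) < 0)
        return -1;
    int current = 0;
    socklen_t len = sizeof current;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &current, &len) < 0)
        return -1;
    return current;
}
```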
HTTP • HTTP is the protocol for communication between browser apps and Web servers • Web servers are essentially HTTP servers • Protocols have versions • Most clients and servers support version 1.1 • But 1.0 is also in use (maybe also 0.9?!) why?
Internet messages • Each layer prepends or appends its information in a header or trailer:
    HTTP Request
    TCP Hdr | HTTP Request
    IP Hdr | TCP Hdr | HTTP Request
    Ethernet Hdr | IP Hdr | TCP Hdr | HTTP Request | Cksum
Interprocess communication [Figure]
A few relevant RFCs • RFC 1945 is the HTTP 1.0 standard • see http://www.ietf.org/rfc/rfc1945.txt • RFC 2616 is the HTTP 1.1 standard • see http://www.ietf.org/rfc/rfc2616.txt • RFC 2396 is the URI standard • see http://www.ietf.org/rfc/rfc2396.txt
What is HTTP? (i) • From the RFC: • HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems
What is HTTP? (ii) • Again from the RFC: • HTTP communication generally takes place over TCP/IP connections • The default port is TCP 80, but other ports can be used • HTTP is not dependent on a specific transport layer • (HTTPS typically uses TCP port 443)
Connection-oriented • HTTP defines a very simple structure: • A client sends a request • The server sends a response • HTTP supports multiple request/response exchanges over a single connection • Try it: use telnet to connect to a Web server on port 80 and type a request by hand
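The telnet exercise can be reproduced in C once a TCP connection is open. This sketch (hypothetical helpers `send_all` and `recv_until_close`) handles two details telnet hides: `send()` may accept only part of the buffer, and an HTTP 1.0 response ends when the server closes its socket, which shows up as `recv()` returning 0. It assumes a blocking socket; a production client would also guard against `SIGPIPE` if the server closes first.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* send() may transmit fewer bytes than requested, so loop until the whole
   buffer has been handed to the kernel. Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read until the peer closes its side (recv() returns 0) or the buffer is
   full. NUL-terminates the result; returns bytes read, or -1 on error. */
ssize_t recv_until_close(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    if (cap == 0)
        return -1;
    while (total < cap - 1) {
        ssize_t n = recv(fd, buf + total, cap - 1 - total, 0);
        if (n == 0)
            break;                   /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += (size_t)n;
    }
    buf[total] = '\0';
    return (ssize_t)total;
}
```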
HTTP 1.0/1.1 request structure (i) • HTTP requests are line-based ASCII text • Lines must always end with "\r\n" (a.k.a. CRLF) • Headers are optional • A blank line separates the Request-Line and headers from the content (what content?!)
    Request-Line
    Header(s) ...
    -- blank line --
    Content ...
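Because a blank line ends the headers, a program reading a request (or a response) can locate the content by searching for the first empty line. A sketch, assuming the message is text held in a NUL-terminated buffer; `find_content` is a hypothetical name, and a bare `\n\n` is accepted because some clients send only `\n`.

```c
#include <string.h>

/* Return a pointer to the first byte of content, i.e. just past the blank
   line that ends the Request-Line/Status-Line and headers, or NULL if the
   blank line has not been received yet. */
const char *find_content(const char *msg)
{
    const char *crlf = strstr(msg, "\r\n\r\n");
    const char *lf = strstr(msg, "\n\n");
    if (crlf != NULL && (lf == NULL || crlf < lf))
        return crlf + 4;          /* skip "\r\n\r\n" */
    if (lf != NULL)
        return lf + 2;            /* skip "\n\n" */
    return NULL;
}
```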
HTTP 1.0/1.1 request structure (ii) • The Request-Line consists of 3 tokens: Method URI HTTP-Version\r\n • Each token is separated by a space character • Though "\r\n" is required by the protocol, "\n" seems to work in practice • The HTTP-Version is either HTTP/1.0 or HTTP/1.1
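A sketch of splitting the Request-Line into its three tokens, assuming exactly one space between tokens and accepting either `\r\n` or a bare `\n` terminator, as noted above. `parse_request_line` is a hypothetical helper; callers pass each output buffer's capacity.

```c
#include <string.h>

/* Copy the len bytes at start into dst (capacity cap) as a C string.
   Empty tokens and tokens that don't fit are rejected. */
static int copy_token(char *dst, size_t cap, const char *start, size_t len)
{
    if (len == 0 || len >= cap)
        return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Split "Method URI HTTP-Version\r\n" into its tokens. Returns 0 or -1. */
int parse_request_line(const char *line,
                       char *method, size_t mcap,
                       char *uri, size_t ucap,
                       char *version, size_t vcap)
{
    const char *sp1 = strchr(line, ' ');                  /* end of Method */
    const char *sp2 = sp1 ? strchr(sp1 + 1, ' ') : NULL;  /* end of URI */
    if (sp2 == NULL)
        return -1;

    const char *ver = sp2 + 1;
    size_t vlen = strcspn(ver, "\r\n");        /* version runs to the terminator */
    const char *term = ver + vlen;
    if (strcmp(term, "\r\n") != 0 && strcmp(term, "\n") != 0)
        return -1;                             /* missing or junk terminator */

    if (copy_token(method, mcap, line, (size_t)(sp1 - line)) < 0 ||
        copy_token(uri, ucap, sp1 + 1, (size_t)(sp2 - sp1 - 1)) < 0 ||
        copy_token(version, vcap, ver, vlen) < 0)
        return -1;

    /* Only the two versions discussed here are accepted. */
    if (strcmp(version, "HTTP/1.0") != 0 && strcmp(version, "HTTP/1.1") != 0)
        return -1;
    return 0;
}
```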
HTTP request methods (i) Method URI HTTP-Version\r\n • The HTTP request’s Method can be: • GET – request information identified by the given URI (absolute or relative?) • HEAD – request metadata regarding the given URI (search engines!) • POST – send (i.e. post) information to the given URI (e.g. via a form)
HTTP request methods (ii) Method URI HTTP-Version\r\n • The HTTP request’s Method can be: • PUT – store information in the location identified by the given URI • DELETE – remove the entity identified by the given URI (really?)
HTTP request methods (iii) Method URI HTTP-Version\r\n • The HTTP request’s Method can be: • TRACE – used to trace HTTP forwarding through proxies, tunnels, etc. • OPTIONS – determines the capabilities of the Web server or the characteristics of the named resource
HTTP request methods (iv) Method URI HTTP-Version\r\n • The GET, HEAD, and POST methods are supported everywhere • Check out homework #2! • HTTP 1.1 servers might support the PUT, DELETE, TRACE, and OPTIONS methods (but not always!)
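Since only GET, HEAD, and POST can be relied on, a server might classify the Method before doing anything else. A hypothetical sketch (`method_support_level` is not a standard function); method names are case-sensitive in HTTP, so `get` is not treated as GET.

```c
#include <string.h>

/* Classify a Request-Line Method:
   2 = supported everywhere (GET, HEAD, POST),
   1 = an HTTP 1.1 server might support it (PUT, DELETE, TRACE, OPTIONS),
   0 = anything else. */
int method_support_level(const char *method)
{
    static const char *const everywhere[] = { "GET", "HEAD", "POST" };
    static const char *const maybe[] = { "PUT", "DELETE", "TRACE", "OPTIONS" };

    for (size_t i = 0; i < sizeof everywhere / sizeof everywhere[0]; i++)
        if (strcmp(method, everywhere[i]) == 0)   /* case-sensitive match */
            return 2;
    for (size_t i = 0; i < sizeof maybe / sizeof maybe[0]; i++)
        if (strcmp(method, maybe[i]) == 0)
            return 1;
    return 0;
}
```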
Uniform Resource Identifier • The URI is defined in RFC 2396 • An absolute URI consists of four parts: scheme://hostname[:port]/path • A relative URI omits the scheme and server: /path • The server is assumed (since we’re already connected) • Which one should we use in our HTTP Request-Line?
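A sketch that splits an absolute URI into the four parts above (hypothetical helper `split_uri`). It assumes the port defaults to 80, or 443 for `https`, when none is given, and treats `http://host` as having path `/`; it does not attempt to cover the full RFC 2396 grammar (user information, IPv6 literals, and so on).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of src into dst (capacity cap) as a C string. */
static int copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len == 0 || len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Split scheme://hostname[:port]/path. Returns 0 on success, -1 otherwise. */
int split_uri(const char *uri, char *scheme, size_t scap,
              char *host, size_t hcap, unsigned *port,
              char *path, size_t pcap)
{
    const char *sep = strstr(uri, "://");
    if (sep == NULL || copy_part(scheme, scap, uri, (size_t)(sep - uri)) < 0)
        return -1;                         /* no scheme: not an absolute URI */

    const char *h = sep + 3;
    size_t hlen = strcspn(h, ":/");        /* hostname ends at ':' or '/' */
    if (copy_part(host, hcap, h, hlen) < 0)
        return -1;

    const char *rest = h + hlen;
    *port = (strcmp(scheme, "https") == 0) ? 443 : 80;   /* assumed defaults */
    if (*rest == ':') {
        char *end;
        if (!isdigit((unsigned char)rest[1]))
            return -1;
        unsigned long p = strtoul(rest + 1, &end, 10);
        if (p == 0 || p > 65535)
            return -1;
        *port = (unsigned)p;
        rest = end;
    }

    if (*rest == '\0')
        rest = "/";                        /* "http://host" means path "/" */
    if (*rest != '/')
        return -1;
    return copy_part(path, pcap, rest, strlen(rest));
}
```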
URIs in practice • In general, relative URIs are used in the HTTP Request-Line • HTTP 1.1 servers are required to support absolute URIs, but not all do • When using a proxy HTTP server, an absolute URI is required • Otherwise, the proxy server won’t know where to find the resource (i.e. document)
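A sketch of choosing between the two forms when writing the Request-Line (hypothetical helper `format_request_line`): a relative URI when talking to the origin server directly, an absolute URI when going through a proxy. It assumes the `http` scheme and omits `:80` from the absolute form because 80 is the default port.

```c
#include <stdio.h>

/* Write an HTTP/1.1 Request-Line for path on host:port into buf.
   via_proxy == 0: "METHOD /path HTTP/1.1\r\n"              (relative URI)
   via_proxy != 0: "METHOD http://host[:port]/path HTTP/1.1\r\n" (absolute URI)
   Returns 0 on success, -1 if buf is too small. */
int format_request_line(char *buf, size_t len, const char *method,
                        const char *host, unsigned port, const char *path,
                        int via_proxy)
{
    int n;
    if (!via_proxy)
        n = snprintf(buf, len, "%s %s HTTP/1.1\r\n", method, path);
    else if (port == 80)
        n = snprintf(buf, len, "%s http://%s%s HTTP/1.1\r\n", method, host, path);
    else
        n = snprintf(buf, len, "%s http://%s:%u%s HTTP/1.1\r\n",
                     method, host, port, path);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```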
Request headers (i) • After the Request-Line, the request might have header lines • Header lines specify attribute name/value pairs (e.g. User-Agent:) • Note that HTTP 1.1 requires that the Host: header always be included
    Request-Line
    Header(s) ...
    -- blank line --
    Content ...
Request headers (ii) • Request headers provide information to the server about the client • Who is making the request • What kind of client is making the request • What kind of content will be accepted • In HTTP 1.0, all headers are optional • In HTTP 1.1, the Host: header must be sent
Example request headers (i) • Headers can be included in any order:
    GET /index.html HTTP/1.1
    Accept: text/html
    Host: www.rpi.edu
    From: goldschmidt@gmail.com
    User-Agent: Mozilla/4.0
    Referer: http://somewhere.else.com/rpi.html
    -- blank line --
• For GET and HEAD requests, that’s the end (though don’t forget the blank line!)
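A minimal sketch that builds a GET request like the one above, with only the headers it needs (hypothetical helper `build_get`): the mandatory Host: header for HTTP 1.1, an optional User-Agent:, and the terminating blank line. `snprintf()` reports the length it wanted, which is used to detect a buffer that is too small.

```c
#include <stdio.h>

/* Build a minimal HTTP/1.1 GET request into buf.
   Returns the request length, or -1 if buf is too small. */
int build_get(char *buf, size_t len, const char *path, const char *host,
              const char *user_agent)
{
    int n = snprintf(buf, len,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"          /* required by HTTP 1.1 */
                     "User-Agent: %s\r\n"    /* optional */
                     "\r\n",                 /* blank line: end of headers */
                     path, host, user_agent);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```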
Example request headers (ii) • If a POST request is made, the headers must include Content-Length:
    POST /~goldsd/changegrade.php HTTP/1.1
    Accept: */*
    Host: www.cs.rpi.edu
    User-Agent: SecretAgent v3.0
    Referer: http://somewhere.devious.com/x.php
    Content-Length: 36
    -- blank line --
    rin=660123456&item=midterm&grade=104
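Typing Content-Length: by hand is error-prone, so this sketch (hypothetical helper `build_post`) computes it from the body with `strlen()`. On the body above it yields the same `Content-Length: 36`. It assumes a text body without NUL bytes and, for brevity, sends only the Host: header.

```c
#include <stdio.h>
#include <string.h>

/* Build an HTTP/1.1 POST request whose Content-Length matches the body.
   Returns the request length, or -1 if buf is too small. */
int build_post(char *buf, size_t len, const char *path, const char *host,
               const char *body)
{
    size_t body_len = strlen(body);         /* bytes of content after the blank line */
    int n = snprintf(buf, len,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"                 /* blank line, then the content */
                     "%s",
                     path, host, body_len, body);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```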
HTTP response structure (i) • HTTP responses are line-based ASCII text • A Status-Line is always returned • A blank line separates the Status-Line and headers from the content • Content is a sequence of bytes (e.g. HTML, image, text, etc.)
    Status-Line
    Header(s) ...
    -- blank line --
    Content ...
HTTP response structure (ii) • The Status-Line consists of 3 parts: HTTP-Version Status-Code Message • The HTTP-Version is either HTTP/1.0 or HTTP/1.1 (and does not necessarily match the corresponding request) • Response status is represented using a 3-digit Status-Code and a human-readable Message, which may itself contain spaces (e.g. Moved Permanently)
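Because the Message can contain spaces, a parser should split off only the first two tokens and keep the rest of the line as the Message. A sketch (hypothetical helper `parse_status_line`) that accepts either `\r\n` or `\n` at the end of the line.

```c
#include <ctype.h>
#include <string.h>

/* Split "HTTP-Version Status-Code Message" into its parts.
   Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int parse_status_line(const char *line, char *version, size_t vcap,
                      int *code, char *message, size_t mcap)
{
    const char *sp = strchr(line, ' ');           /* end of HTTP-Version */
    if (sp == NULL || sp == line || (size_t)(sp - line) >= vcap)
        return -1;
    memcpy(version, line, (size_t)(sp - line));
    version[sp - line] = '\0';

    const char *c = sp + 1;                       /* exactly 3 digits expected */
    if (!isdigit((unsigned char)c[0]) || !isdigit((unsigned char)c[1]) ||
        !isdigit((unsigned char)c[2]) ||
        (c[3] != ' ' && c[3] != '\r' && c[3] != '\n' && c[3] != '\0'))
        return -1;
    *code = (c[0] - '0') * 100 + (c[1] - '0') * 10 + (c[2] - '0');

    const char *m = (c[3] == ' ') ? c + 4 : c + 3;
    size_t mlen = strcspn(m, "\r\n");             /* Message runs to end of line */
    if (mlen >= mcap)
        return -1;
    memcpy(message, m, mlen);
    message[mlen] = '\0';
    return 0;
}
```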
HTTP status codes • Status codes are grouped as follows: • 1xx – Informational • 2xx – Success • 3xx – Redirection • 4xx – Client Error • 5xx – Server Error
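Since the class is determined by the first digit, a client can branch on `code / 100` even for codes it does not specifically recognize. A hypothetical helper (`status_class`) sketching that idea:

```c
#include <stddef.h>

/* Map a 3-digit Status-Code to its class by its first digit.
   Returns NULL for codes outside 100-599. */
const char *status_class(int code)
{
    switch (code / 100) {
    case 1: return "Informational";
    case 2: return "Success";
    case 3: return "Redirection";
    case 4: return "Client Error";
    case 5: return "Server Error";
    default: return NULL;
    }
}
```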
Example status lines • Example status lines include: • HTTP/1.0 200 OK • HTTP/1.0 301 Moved Permanently • HTTP/1.0 400 Bad Request • HTTP/1.0 403 Forbidden • HTTP/1.0 500 Internal Server Error
Response headers (i) • After the Status-Line, the response typically has header lines • Header lines specify attribute name/value pairs (e.g. Date:) • As with request headers, response headers end with a blank line
    Status-Line
    Header(s) ...
    -- blank line --
    Content ...
Response headers (ii) • Response headers provide information to the client about the entity (i.e. document) • What kind of entity/document • How many bytes are in the document • How the document is encoded • When the document was last modified • The Content-Type header should be sent whenever there is content, and usually the Content-Length header as well
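A sketch of reading the Content-Length header from a response head (hypothetical helper `content_length`), which tells the client how many bytes of content to read after the blank line. It assumes the head is a NUL-terminated string, compares header names case-insensitively (as HTTP header names are), and stops looking at the blank line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Return the Content-Length value from a message head, or -1 if the header
   is absent or malformed. The first line (Status-Line) is skipped. */
long content_length(const char *head)
{
    const char *line = strchr(head, '\n');    /* end of the Status-Line */
    while (line != NULL) {
        line++;                                /* start of the next line */
        if (*line == '\r' || *line == '\n' || *line == '\0')
            break;                             /* blank line: end of headers */
        if (strncasecmp(line, "Content-Length:", 15) == 0) {
            const char *v = line + 15;
            while (*v == ' ' || *v == '\t')
                v++;                           /* skip optional whitespace */
            if (!isdigit((unsigned char)*v))
                return -1;
            return strtol(v, NULL, 10);
        }
        line = strchr(line, '\n');
    }
    return -1;
}
```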
Example response headers • Headers can be included in any order:
    HTTP/1.1 200 OK
    Date: Wed, 30 Jan 2002 12:48:17 EST
    Server: Apache/1.17
    Content-Type: text/html
    Content-Length: 1756
    Content-Encoding: gzip
    -- blank line --
    2309fjfjef0jefe0fje2f0je2f0je2f0e2jfe0fje20fj2e0fjef0jef0e2jf0efje0fje02fje20fje2f0ejf0jef2e09fj209g209fj20gag09ha0gh0agha0gjg0jg
Request/response cycle • For HTTP 1.0, default behavior is as follows: • Client sends a complete HTTP request • Server sends back a complete HTTP response • Server closes its socket • Therefore: • If the client needs another document (e.g. images, CSS, etc.), the client must open a new socket connection!
HTTP 1.0 persistent connections • Some HTTP 1.0 clients and servers support persistent connections as an extension (RFC 1945 itself does not define them) • Multiple requests can be handled over a single TCP/IP socket connection • The Keep-Alive token (sent as Connection: Keep-Alive) is used to keep the connection alive
HTTP 1.1 persistent connections • As of HTTP 1.1, support for persistent connections is available (and is the default) • Multiple requests can be handled over a single TCP/IP socket connection • The Connection: header is used to exchange information about persistence • e.g. Connection: close
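A sketch of how a client or server might decide whether the connection stays open after a response (hypothetical helper `connection_persists`). It assumes the HTTP 1.0 extension is signalled with `Connection: Keep-Alive`, and that the Connection: value holds a single token (real headers may carry a comma-separated list).

```c
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Decide whether the connection stays open, given the HTTP-Version and the
   value of the Connection: header (NULL if absent).
   HTTP/1.1: persistent by default, unless "Connection: close".
   Anything else is treated as HTTP/1.0: closed by default,
   unless "Connection: Keep-Alive". Returns 1 (keep open) or 0 (close). */
int connection_persists(const char *version, const char *connection)
{
    if (strcmp(version, "HTTP/1.1") == 0)
        return !(connection != NULL && strcasecmp(connection, "close") == 0);
    return connection != NULL && strcasecmp(connection, "keep-alive") == 0;
}
```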