HTTP • Here, we examine the Hypertext Transfer Protocol (HTTP) • originally introduced around 1990; HTTP/1.0 was documented in 1996 (RFC 1945) and HTTP/1.1 was first standardized in 1997 (RFC 2068) • the protocol permits transfer of hypertext documents • the request is usually generated by clicking on a hyperlink in a browser • the server responds to the request and sends back the requested document • different types of machines can request the same resource; Apache is just one of many web servers
[Figure: a PC running Explorer and a Mac running Navigator each send an HTTP request to a server running the Apache web server, which returns an HTTP response]
Some Definitions • Client – the machine requesting a resource • often through a web browser • Server – the machine that responds to requests and transfers documents to fulfill the requests • usually a dedicated machine running some server software • Request – message sent from client to server that contains an HTTP method (we will cover these shortly) • Response – the document/file requested, along with a message • or the message alone if the document/file does not exist or the request was ill-formed or not understood • Header – the leading portion of every request and response; headers are usually not visible to the user • a request header starts with the method (e.g., GET), the resource requested and the protocol/version • we explore headers in more detail in a few slides
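HTTP Methods • The method is the action that the client wishes the server to perform • GET – request a resource, to be displayed in the web browser (if possible, else saved to disk) • A conditional GET includes one of the following headers • If-Modified-Since – comes with a specified date; the server returns the requested item only if it has been modified since that date • If-Unmodified-Since – the reverse: the server returns the item only if it has not been modified since that date • If-Match – comes with one or more entity tags (ETags); the server returns the resource only if its current ETag matches one of them • If-None-Match – the server returns the resource only if its current ETag matches none of the listed ETags • If-Range – sent together with a Range header and carrying an ETag or date; if the resource is unchanged the server returns only the requested range, otherwise it returns the whole resource • For example: • GET /index.html HTTP/1.1 • If-Modified-Since: Mon, 11 Jan 2010 12:30:15 GMT • GET is the most common method • Conditional GETs spare the server and the network from retransmitting a resource the client already holds; when an If-Modified-Since or If-None-Match check shows the client's copy is current, the server answers 304 Not Modified with no body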
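The If-Modified-Since check described above can be sketched in a few lines of Python. This is only an illustration of the decision a server makes, not how any particular web server implements it; the function name `conditional_get_status` is hypothetical, and `last_modified` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 200 if the body should be sent, 304 if the client's copy is current.

    Hypothetical helper: last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # plain GET, always send the resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first
    if last_modified.replace(microsecond=0) > since:
        return 200
    return 304


header = "Mon, 11 Jan 2010 12:30:15 GMT"
changed = datetime(2010, 1, 12, tzinfo=timezone.utc)
unchanged = datetime(2010, 1, 10, tzinfo=timezone.utc)
print(conditional_get_status(changed, header), conditional_get_status(unchanged, header))
```

With the date from the slide, a file modified on 12 Jan 2010 is sent in full, while one last modified on 10 Jan 2010 yields 304.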
Other Methods • HEAD – return the header portion only, not the actual page • PUT – used to upload a page (or content) to the given URL, creating or replacing the resource there – must be sent with the content to be uploaded • can only be used if either the user has been authenticated or the server does not require authentication (this would be a security flaw if PUT is allowed without authentication) • POST – also sends content to the server, but instead of replacing the resource at the URL, the content is handed to that resource for processing • this can be used to submit form data or to place data into a bulletin/posting board or database • OPTIONS – queries the web server to find out what methods are available for use • DELETE – used to delete the specified resource • TRACE – used for troubleshooting; the server echoes back the request as it received it, so the client can see what any intermediaries changed • CONNECT – used in conjunction with a proxy server, asking it to open a tunnel to the destination (e.g., for HTTPS)
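To make the differences between methods concrete, here is a toy dispatcher over an in-memory dictionary. It is a sketch only: `handle` and the `store` dictionary are hypothetical names, authentication is left out, and the status codes chosen (201, 204, 404, 405) are one reasonable mapping rather than what every server returns.

```python
def handle(store, method, path, body=None):
    """Apply one request to store (a dict of path -> content); return (status, body)."""
    if method == "OPTIONS":
        # Report the methods this toy server understands
        return 200, "Allow: GET, HEAD, PUT, DELETE, OPTIONS"
    if method in ("GET", "HEAD"):
        if path not in store:
            return 404, ""
        # HEAD yields the same status as GET but never a body
        return 200, (store[path] if method == "GET" else "")
    if method == "PUT":
        # 201 when the resource is created, 204 when it is replaced
        status = 204 if path in store else 201
        store[path] = body or ""
        return status, ""
    if method == "DELETE":
        if path not in store:
            return 404, ""
        del store[path]
        return 204, ""
    return 405, ""  # method not allowed


store = {}
print(handle(store, "PUT", "/a.html", "hello"))
print(handle(store, "GET", "/a.html"))
print(handle(store, "HEAD", "/a.html"))
```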
Headers • The header is the leading portion of the message transmitted • if a request without a body (e.g., GET), the header is the entire request • if a response, it precedes the resource being returned • Request headers will include • the method, resource location, protocol and version • host name • user agent (browser) if sent by a browser, including the browser version • preferred language (e.g., English) • what form(s) of encoding are preferred • how long the connection should be kept alive • Response headers will include • protocol and version, status of request (see the Status Codes slide) • date/time • server name • last modification date/time • content-type
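A request header is plain text, so it can be assembled by hand. The following sketch builds one from the pieces listed above; `build_request` is a hypothetical helper and `www.example.com` is a placeholder host, neither taken from the slides.

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request header as a string (hypothetical helper)."""
    # Request line: method, resource location, protocol/version
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF; a blank line terminates the header block
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_request("GET", "/index.html", "www.example.com", {"Accept-Language": "en"}))
```

The resulting text could be pasted into a tool such as nc to issue the request manually.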
More on Headers • Four classes of headers • general headers can appear in both requests and responses and include • Connection – indicates whether the TCP connection should close at the end of the request or response or be persistent (the default in HTTP/1.1) • Date – date/time when the message was sent • Transfer-Encoding – what, if any, type of encoding has been applied • Warning – additional information about the status of the message • request headers are sent when a browser makes a request of a server and may contain the following • Accept – what types of media are acceptable to the client, given as MIME types, e.g., text/html, image/png, etc. • Accept-Charset – what character sets are acceptable • Accept-Encoding – what types of encoding *can* be applied • Accept-Language – what language(s) are preferred • From and Host specifiers • Conditions – If-Match, If-None-Match, If-Modified-Since, If-Unmodified-Since, If-Range, Range • User-Agent – the type of browser
Continued • Response headers – sent by the server to the requester (which may be a proxy server, a web browser, another program such as a web crawler, or a command-line tool such as nc or curl) and may contain • Accept-Ranges – whether the server supports range requests for the resource (e.g., bytes) • ETag – an identifier for the current version of the resource (Apache, for instance, can generate it from the file’s inode, size and modification time) • Server – information about the server (web server software and version, platform) • Entity headers – describe the document (entity) carried in a message, whether a response body or content sent via POST, PUT, etc. • Allow – lists the set of methods the resource supports • Content-Encoding, Content-Language, Content-Length, Content-Range, Content-Type – information about the document being sent • Last-Modified – if the item being sent already existed, when it was last modified
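As an illustration of an inode-based ETag, the sketch below joins a file's inode number, size and modification time in hexadecimal. The three-field layout is an assumption modeled on the style of tag that older Apache configurations emit; the function name `inode_etag` is hypothetical, and real servers may compute ETags quite differently.

```python
import os


def inode_etag(path):
    """Build an inode-size-mtime tag in hex (assumed layout, for illustration only)."""
    st = os.stat(path)
    # Any change to the file's size or modification time changes the tag
    return f"{st.st_ino:x}-{st.st_size:x}-{int(st.st_mtime):x}"


print(inode_etag(__file__))
```

Because the inode differs between machines, a tag built this way is not stable across mirrored servers, which is one reason other schemes exist.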
Examples • Example GET header: • GET / HTTP/1.1 • Host: www.alcpress.com • User-Agent: Mozilla/5.0... • Accept: */* • Accept-Language: en • Accept-Encoding: gzip,deflate,compress,identity • Keep-Alive: 300 • Connection: keep-alive • Example response from a GET request: • HTTP/1.1 200 OK • Date: Tue, 07 Aug 2001 23:06:18 GMT • Server: Apache/1.3.20 • Cache-Control: max-age=604800 • Expires: Tue, 14 Aug 2001 23:06:18 GMT • Last-Modified: Tue, 06 Feb 2001 20:16:28 GMT • Etag: 1033e-607-3a7fd5d0 • Accept-Ranges: bytes • Content-Length: 2357 • Keep-Alive: timeout=15, max=100 • Connection: keep-alive • Content-Type: text/html • [data]
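A response like the one above can be taken apart with a few string operations. This is a minimal sketch that assumes a well-formed message with CRLF line endings; `parse_response_head` is a hypothetical name, and production code would normally rely on an HTTP library instead.

```python
def parse_response_head(raw):
    """Split a raw response into (version, status code, reason, headers dict)."""
    # The header block ends at the first blank line; the body follows it
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values such as dates contain colons
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 07 Aug 2001 23:06:18 GMT\r\n"
    "Content-Length: 2357\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "[data]"
)
print(parse_response_head(raw))
```

Header names are lower-cased on the way in because HTTP treats them as case-insensitive.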
Status Codes • See Appendix A for the complete list • 100 codes – informational • 100 – continue, 101 – switching protocols • 200 codes – success • 200 – request succeeded, 201 – resource created, 202 – request accepted for later processing, 204 – request succeeded but no content sent back, 205 – reset content • 300 codes – redirection (URL redirected to a different resource) • 300 – multiple choices, 301 – resource permanently moved, 302 – resource temporarily moved, 305 – use proxy • 400 codes – client error codes • 400 – bad request, 401 – unauthorized, 402 – payment required, 403 – forbidden, 404 – not found, 405 – method not allowed, 406 – not acceptable, 408 – request timeout, 410 – gone • 500 codes – server error codes • 500 – internal server error, 501 – not implemented, 503 – service unavailable, 504 – gateway timeout
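Since the first digit of a status code gives its class, a client can classify any code with integer division. The sketch below pairs that with the reason phrases in Python's standard `http.HTTPStatus` enumeration; the `describe` function and the `CLASSES` table are hypothetical names for this illustration.

```python
from http import HTTPStatus

# First digit of the code -> class of response
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a one-line description such as '404 Not Found (client error)'."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "unregistered"  # code is not in the standard library's table
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 503):
    print(describe(code))
```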
URLs • The URL is the specification of the resource • [protocol:]//host[:port][path/file][?query] • protocol is typically http but could be https or ftp or other • port defaults to 80 for http (443 for https) but can be overridden, for instance if the client knows that a different port should be used to fulfill the given request • path specifies where to look in the web server’s document space; servers may have defaults if the file is omitted (e.g., index.html, index.php, index.cgi) • query passes parameters to the resource, typically as name=value pairs (e.g., to select a database record) • URI is a more generic form of identifier used in the semantic web (the book will use URL & URI interchangeably) • Apart from the delimiters shown above, URLs consist only of letters, digits, $, -, _, ., +, !, *, ', ( and ); any other character must be percent-encoded (e.g., a space becomes %20) • The path portion of a URL may be case sensitive (true for Linux/Unix servers, not necessarily true for Windows servers); the host name is not
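Python's standard `urllib.parse` module can split a URL into the parts of the template above. In this sketch the `split_url` helper, the `DEFAULT_PORTS` table and the example URL are all hypothetical; only `urlsplit`, `parse_qs` and `quote` come from the standard library.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Ports assumed when the URL does not name one
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Break a URL into protocol, host, port, path and query parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),  # name -> list of values
    }


print(split_url("http://www.example.com:8080/docs/index.html?id=42&lang=en"))
print(quote("my file.html"))  # characters outside the allowed set get percent-encoded
```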
Negotiation • In some cases, a request does not precisely match a resource, in which case negotiation may take place • Language negotiation – if a file exists in multiple languages and the client has specified a preference, the server will respond with the document that fits the most preferred language if possible • Accept-Language: de, en-us;q=0.7,en;q=0.3 • the q value (0 to 1, default 1) weights each preference: request German first, and if not available, then American English and finally any other English • Content negotiation – the client states its preferred media types as a prioritized list of MIME types • Accept: image/png,image/jpeg;q=0.8,image/gif;q=0.5 • Content coding – the client lists what type(s) of encoding can be used to help reduce the message traffic over the Internet • These may include gzip (or x-gzip), compress (or x-compress), deflate and identity (no encoding)
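Language negotiation can be sketched as parsing the q-weighted list and picking the first preference the server can satisfy. The functions `parse_accept` and `choose_language` are hypothetical, and the prefix-matching rule used here is a simplification of what real servers do.

```python
def parse_accept(header):
    """Return [(value, q), ...] sorted from most to least preferred."""
    prefs = []
    for item in header.split(","):
        value, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((value.lower(), q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header, available):
    """Pick the first available language that satisfies the client's ranking."""
    for lang, q in parse_accept(header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            # "en" in the header matches "en", "en-us", "en-gb", ...
            if candidate.lower() == lang or candidate.lower().startswith(lang + "-"):
                return candidate
    return None


header = "de, en-us;q=0.7,en;q=0.3"
print(parse_accept(header))
print(choose_language(header, ["en-gb", "fr"]))
```

With the header from the slide and a server that only has British English and French versions, the generic `en` preference selects `en-gb`.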
Other Topics • Caching – to reduce Internet traffic, caching can take place in three different locations • web browser (client) caching • server caching • proxy caching • we cover proxy caching in chapter 11 • Cookies – HTTP is a stateless form of communication – the protocol itself keeps no record of earlier requests in the same exchange • a cookie is a small piece of data that stores the state (e.g., passwords, preferred pages, contents of shopping carts); the server sets it in a response and the browser stores it and sends it back with later requests • since cookie information is meant to be transmitted to a server, cookies can represent security holes – what if a cookie is set up by server1 but server2 asks for that information? Cookies can also violate privacy
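The cookie round trip can be sketched with Python's standard `http.cookies` module: the server's Set-Cookie value is parsed, and the name=value pair is what the browser would send back in a Cookie header. The helper name `cookie_header_from` and the sample cookie `cart=3` are hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value):
    """Turn a Set-Cookie value into the Cookie header value a client would return."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value goes back to the server; attributes such as Path stay with the client
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


print(cookie_header_from("cart=3; Path=/shop"))
```

Attributes such as Path and Domain tell the browser which requests should carry the cookie, which is how a cookie set by one server is normally kept from being sent to another.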