Overview of HTTP Herng-Yow Chen
Outline • HTTP: the Internet’s Multimedia Courier • Web Clients and Servers • Resources • Transactions • Messages • Connections • Protocol Versions • Architectural Components of the Web
HTTP: The Internet’s Multimedia Courier • Billions of multimedia objects cruise through the Internet • Text files, HTML pages • Images • Videos • Programs (e.g. Java applets) • HTTP uses reliable data-transmission protocols (e.g. TCP/IP) • Good for users and application developers • They don’t have to worry about data or programs being damaged in transit. • How does HTTP transport the Web’s traffic?
Web Clients and Servers • Web content lives on web servers. • The servers speak HTTP, so they are often called HTTP servers. • The simplest model: request and response • HTTP clients send HTTP requests to servers, • and the servers return the requested data in HTTP responses.
HTTP Request and Response • When you browse to NCNU’s homepage at http://www.ncnu.edu.tw/index.html: • HTTP request (client to server www.ncnu.edu.tw): “Get me the document called /index.html.” • HTTP response (server to client): “Okay, here it is; it’s in HTML format and is 4843 bytes.”
Resources • Static file resources • on the web server’s filesystem. • They may be data. • They can be programs. • Dynamic content resources • generated by software programs in the web server, or • generated by remote programs, gateways, or agents.
Media Types • HTTP tags each object being transported with a data format label called a MIME type. • MIME (Multipurpose Internet Mail Extensions) • Originally designed to solve problems in moving multimedia messages between different email systems. • MIME worked so well for email that HTTP adopted it to describe and label its own multimedia content.
Media Types (cont.) • Web servers attach a MIME type to all HTTP object data. • When a web browser gets an object back from a server, it handles the object according to the associated MIME type: • displays image files, • parses and formats HTML files, • plays audio files, • launches external plug-in software, • or launches external helper applications.
Media Types (cont.) • A MIME type is a textual label. • It is represented as a primary object type and a specific subtype, separated by a slash. • An HTML-formatted text document: text/html. • A plain ASCII text file: text/plain. • A JPEG image: image/jpeg. • A GIF image: image/gif. • An Apple QuickTime movie: video/quicktime. • A Microsoft PowerPoint file: application/vnd.ms-powerpoint.
MIME types (cont.) • HTTP response (server www.csie.ncnu.edu.tw to client) carries the headers Content-type: image/jpeg and Content-length: 12345 • “Okay, here it is; it’s in JPEG format and is 12345 bytes.”
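The type/subtype structure of a MIME label can be sketched in a few lines of Python. This is only an illustration: the function name `split_mime_type` is an assumption made for this example, not part of any library.

```python
def split_mime_type(label: str) -> tuple[str, str]:
    """Split a MIME type such as 'text/html' into (primary type, subtype)."""
    primary, slash, subtype = label.strip().lower().partition("/")
    if not slash or not primary or not subtype:
        raise ValueError(f"not a type/subtype label: {label!r}")
    return primary, subtype


# The labels below are the ones listed on the slide.
for label in ["text/html", "image/jpeg", "application/vnd.ms-powerpoint"]:
    print(label, "->", split_mime_type(label))
```

A browser-like client could then choose a handler based on the primary type (for example, anything under `image`) and refine the choice by subtype.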
URI • The server resource name is called a uniform resource identifier, or URI. • URIs are like the postal addresses of the Internet, uniquely identifying and locating information resources around the world. • URIs come in two flavors, called • URLs and • URNs
URL • The uniform resource locator (URL) is the most common form of resource identifier. • A URL tells precisely where a resource is located and how to access it: 1. Which protocol to use 2. Where to go 3. Which resource to grab • Example: http://www.csie.ncnu.edu.tw/pics1/陳恒佑.jpg (client to server www.csie.ncnu.edu.tw)
Example URLs • Examples: http://www.ncnu.edu.tw http://english.csie.ncnu.edu.tw/login.php?name=hychen ftp://hychen:1234@ftp.ncnu.edu.tw/img.gif • Most URLs follow a standardized format of three main parts • The first part is called the scheme, and it describes the access protocol. • The second part gives the server’s Internet address. • The rest names a resource on the server. • Today, almost every URI is a URL.
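As a rough illustration of the three-part format, the sketch below splits the example URLs with Python's standard `urllib.parse.urlsplit`. The printed layout is an arbitrary choice for this example.

```python
from urllib.parse import urlsplit

EXAMPLE_URLS = [
    "http://www.ncnu.edu.tw",
    "http://english.csie.ncnu.edu.tw/login.php?name=hychen",
    "ftp://hychen:1234@ftp.ncnu.edu.tw/img.gif",
]

for url in EXAMPLE_URLS:
    parts = urlsplit(url)
    # scheme = access protocol, hostname = server address, path/query = resource
    print("scheme:", parts.scheme)
    print("  server:", parts.hostname)
    print("  resource:", parts.path or "/", parts.query)
```

For the FTP example, `urlsplit` also exposes the embedded credentials as `parts.username` and `parts.password`.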
URN • The second flavor of URI is the uniform resource name, or URN. • A URN serves as a unique name for a particular piece of content, independent of where the resource currently resides. • Advantages: • Location independent: allows a resource to move from place to place. • Access protocol independent: allows a resource to be accessed by multiple network access protocols while maintaining the same name. • For example, “RFC 2141” can be named by • urn:ietf:rfc:2141
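A URN has the general shape `urn:<namespace identifier>:<namespace-specific string>`, as defined in RFC 2141. The following minimal sketch splits the example name along those lines; `split_urn` is a hypothetical helper written for this illustration.

```python
def split_urn(urn: str) -> tuple[str, str]:
    """Return (namespace identifier, namespace-specific string) of a URN."""
    # Split on the first two colons only; the rest may itself contain colons.
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return nid, nss


print(split_urn("urn:ietf:rfc:2141"))
```

Note that nothing in the name says where the document lives or which protocol fetches it; that lookup is left to a separate resolution service.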
URN (cont.) • URNs are still experimental and not yet widely adopted. • To work effectively, URNs need a supporting infrastructure to resolve resource locations; the lack of such infrastructure has also slowed their adoption. • But URNs do hold some exciting promise for the future.
Transactions • An HTTP transaction consists of a request command (sent from client to server), and a response result (sent from server to client). • This communication happens with formatted blocks of data called HTTP messages.
HTTP Transaction (cont.) • The HTTP request message (client to server www.csie.ncnu.edu.tw) contains the command and the URI:
    GET /pics/hychen.jpg HTTP/1.0
    Host: www.csie.ncnu.edu.tw
• The HTTP response message (server to client) contains the result of the transaction:
    HTTP/1.0 200 OK
    Content-type: image/jpeg
    Content-length: 12345
Methods • HTTP supports several different request commands, called HTTP methods. • Every HTTP request message has a method, which tells the server what action to perform, such as • fetch a web page, • run a gateway program • delete a file, etc.
Some common HTTP methods (method: description) • GET: Send the named resource from the server to the client. • PUT: Store data from the client into a named server resource. • DELETE: Delete the named resource from a server. • POST: Send client data into a server gateway application. • HEAD: Send just the HTTP headers from the response for the named resource.
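The difference between GET and HEAD can be sketched with Python's standard library. To stay self-contained, the example starts a throwaway server on the loopback interface; the handler class, the path, and the body text are all assumptions made up for this demonstration.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"Hi! I'm a message!"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the same small text body for any path."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)  # GET returns headers and body

    def do_HEAD(self):
        self._send_headers()  # HEAD returns the same headers, no body

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def run_method(port, method):
    """Issue one request and return (status code, Content-Length header, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/text/hi-there.txt")
        response = conn.getresponse()
        return response.status, response.getheader("Content-Length"), response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = run_method(port, "GET")
head_result = run_method(port, "HEAD")

server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both requests report the same status and `Content-Length`, but only the GET response carries the body bytes.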
Status Codes • Every HTTP response message comes back with a status code, a three-digit numeric code that tells the client • If the request succeeded, or • If other actions are required. • HTTP also sends an explanatory textual “reason phrase” after each status code. • The reason phrase is included only for descriptive purposes; the numeric code is used for all processing.
Some common HTTP status codes (code: description) • 200: OK. Document returned correctly. • 302: Redirect. Go someplace else to get the resource. • 404: Not Found. Can’t find this resource. • The following status codes and reason phrases are treated identically by HTTP software: • 200 OK • 200 Document attached • 200 Success
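The point that software acts on the numeric code and ignores the reason phrase can be sketched as follows. `parse_status_line` is a hypothetical helper for this illustration, and the sample status lines reuse the phrases from the slide.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a response start line into (version, numeric code, reason phrase)."""
    version, code, *rest = line.split(" ", 2)
    reason = rest[0] if rest else ""
    return version, int(code), reason


STATUS_LINES = [
    "HTTP/1.0 200 OK",
    "HTTP/1.0 200 Document attached",
    "HTTP/1.0 200 Success",
]

for line in STATUS_LINES:
    version, code, reason = parse_status_line(line)
    # Only the numeric code drives the decision; the phrase is for humans.
    print(code, "succeeded" if code == 200 else "needs other handling", "|", reason)
```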
Web pages can consist of multiple objects • A web browser issues a cascade of HTTP transactions to fetch and display a graphics-rich web page. • First, the browser performs one transaction to fetch the HTML “skeleton”. • Then it issues additional HTTP transactions for each embedded image, graphic pane, Java applet, etc. • Note that these embedded resources might even reside on different servers.
Composite web pages require separate HTTP transactions • (Figure: the objects of one page are fetched across the Internet from Server 1, Server 2, and Server 3.)
Messages • Request Message vs. Response Message • HTTP messages consist of three parts: • Start line: • The first line of the message. • Indicates what to do for a request or what happened for a response. • Header fields: • Zero or more header fields follow the start line. • Each header field consists of a name and a value, separated by a colon (:) for easy parsing. • The headers end with a blank line. • Body: • An optional part containing any kind of data (e.g. textual or binary data). • Request bodies carry data to the web server; response bodies carry data back to the client.
A line-oriented text message structure
(a) Request message
    Start line: GET /text/hi-there.txt HTTP/1.0
    Headers:    Accept: text/*
                Accept-Language: en, fr
(b) Response message
    Start line: HTTP/1.0 200 OK
    Headers:    Content-type: text/plain
                Content-length: 19
    Body:       Hi! I’m a message!
Another message example
(a) Request message
    GET /tools.html HTTP/1.0
    User-agent: Mozilla/4.75[en]
    Host: www.csie.ncnu.edu.tw
    Accept: text/html, image/gif, image/jpeg
    Accept-Language: en
(b) Response message
    HTTP/1.0 200 OK
    Date: Sun, 01 Oct 2003 23:25:17 GMT
    Server: Apache/1.3.11
    Last-modified: Tue, 04 Jul 2003 09:46:21 GMT
    Content-type: text/html
    Content-length: 403

    <HTML> <HEAD> Web Technologies </HEAD> <BODY> <H1> Web technologies </H1> … </BODY> </HTML>
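The three-part message structure (start line, header fields, blank line, optional body) lends itself to a very small parser. The sketch below is illustrative only: `parse_message` is a hypothetical helper, it keeps just the last value when a header name repeats, and the sample body is given a trailing newline so that it matches a Content-length of 19.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start line, headers dict, body bytes)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)

start_line, headers, body = parse_message(RESPONSE)
print(start_line)
print(headers)
print(body)
```

The same function handles a request message; for a bodiless GET the returned body is simply empty.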
Connections • How is an HTTP message moved from place to place, across Transmission Control Protocol (TCP) connections? • HTTP is an application layer protocol, which doesn’t worry about the details of network communication. Instead, it leaves the details of networking to TCP/IP.
TCP/IP • TCP provides: • Error-free data transportation • In-order delivery • Unsegmented data stream • The HTTP protocol is layered over TCP. Namely, HTTP uses TCP to transport its message data. • Likewise, TCP is layered over IP.
HTTP network protocol stack (top to bottom) • HTTP: Application layer • TCP: Transport layer • IP: Network layer • Network-specific link interface: Data link layer • Physical network hardware: Physical layer
Connections, IP addresses, Port Numbers • Before an HTTP client can send a message to a server, it needs to establish a TCP/IP connection between the client and server using Internet Protocol (IP) addresses and port numbers. • A DNS server translates the domain name into an IP address. • The default HTTP port number is 80.
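A hedged sketch of this step in Python: extract the host and port from a URL, fall back to port 80 when none is given, and let the socket library perform the DNS lookup and TCP connect. The helper names are assumptions for this example, and `open_connection` is defined but deliberately not called, since it needs network access.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80


def host_and_port(url: str) -> tuple[str, int]:
    """Return the server name and TCP port a client would connect to."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_HTTP_PORT


def open_connection(url: str) -> socket.socket:
    """Resolve the host name and open a TCP connection (requires network)."""
    host, port = host_and_port(url)
    # create_connection looks up the IP address for `host` and connects to it.
    return socket.create_connection((host, port), timeout=10)


print(host_and_port("http://www.csie.ncnu.edu.tw:80/~hychen/index.html"))
print(host_and_port("http://www.csie.ncnu.edu.tw/~hychen/index.html"))
```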
a. Click anchor: <A href="http://www.csie.ncnu.edu.tw:80/~hychen/index.html">
b. Find and set up a connection across the Internet to www.csie.ncnu.edu.tw, which is listening on port 80.
c. Send the request:
    GET /~hychen/index.html HTTP/1.0
    User-agent: Netscape
    Accept: text/plain
    Accept: text/html
    Accept: image/*
Parse the request:
    Method: GET
    Document: /~hychen/index.html
    Protocol: HTTP, version 1.0
    User-agent: Netscape
    Accept: text/plain,text/html,image/*
a. Look for /~hychen/index.html
b. Send headers to the client across the Internet:
    HTTP/1.0 200 Document follows
    Server: Apache 1.2b7
    Date: Thu, 22 May 1997 14:00:00 GMT
    Content-type: text/html
    Content-length: 1066
    Last-modified: Sun, 18 May 1997 ....
   or, if the document cannot be found, send error headers to the client:
    HTTP/1.0 404 Not Found
    Server: Apache 1.2b7
    Date: Thu, 22 May ...
    Content-type: text/html
    Content-length: 0
c. Send file (index.html) to client
d. Break connection
a. Submit: <form action="cgi-bin/add" method="GET">
b. Find and set up a connection across the Internet to www.csie.ncnu.edu.tw, which is listening for requests.
c. Send the request:
    GET /cgi-bin/add?name=hychen&year=58&month=6&..... HTTP/1.0
    User-agent: Netscape
    Accept: text/plain
    Accept: text/html
    Accept: image/*
a. httpd parses the request:
    GET /cgi-bin/add?name=hychen&year=58.... HTTP/1.0
    User-agent: Netscape
    Accept: text/plain
    Accept: text/html
    Accept: image/*
b. httpd sets up the executable environment for cgi-bin/add:
    QUERY_STRING: name=hychen&year=58&.....
c. add returns HTML to httpd:
    Content-type: text/html

    <html> <head> <title> .... </title></head> <body> <h1> Add successfully! </h1> </body> </html>
d. httpd sends headers and result to the client across the Internet:
    Status: 200 Document follows
    Server: Apache 1.2b7
    Date: ....
    Content-type: text/html
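The gateway-program side of this exchange can be sketched in Python. This is an assumption-laden illustration, not the actual `add` program: `handle_add` is a hypothetical function, only the `name` field is echoed, and the environment is passed in as a plain dictionary so the example runs without a web server.

```python
from html import escape
from urllib.parse import parse_qs


def handle_add(environ) -> str:
    """Build the output a CGI-style program would hand back to httpd."""
    # The server passes the part of the URL after '?' in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", [""])[0]
    body = (
        "<html><head><title>Add</title></head>"
        "<body><h1>Add successfully!</h1>"
        f"<p>name={escape(name)}</p></body></html>"
    )
    # A header block, then a blank line, then the document itself.
    return "Content-type: text/html\r\n\r\n" + body


output = handle_add({"QUERY_STRING": "name=hychen&year=58&month=6"})
print(output)
```

A real CGI program would read the same variable from the process environment (`os.environ`) and write the result to standard output.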
Simulate an HTTP client using Telnet • The Telnet utility connects your keyboard to a destination TCP port and connects the TCP port’s output back to your display screen. • You can use the Telnet utility to talk directly to web servers. The web server treats you as a web client, and any data sent back on the TCP port is displayed on screen. • Try: telnet www.csie.ncnu.edu.tw 80
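What Telnet does by hand can also be scripted with a raw TCP socket. The sketch below is a minimal illustration: `raw_http_get` and `serve_once` are hypothetical helpers, and a canned stand-in server on the loopback interface replaces the real web server so the example runs without network access.

```python
import socket
import threading


def raw_http_get(host: str, port: int, path: str = "/") -> bytes:
    """Send a hand-written HTTP/1.0 GET and return everything sent back."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Stand-in server: answers one request with a fixed response, then closes.
CANNED = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 3\r\n"
    b"\r\n"
    b"hi\n"
)


def serve_once(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # read until the end of the headers
            piece = conn.recv(4096)
            if not piece:
                break
            data += piece
        conn.sendall(CANNED)


listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()
reply = raw_http_get("127.0.0.1", listener.getsockname()[1])
listener.close()

print(reply.decode("ascii"))
```

Against a real server, the call would look like `raw_http_get("www.csie.ncnu.edu.tw", 80)`, which mirrors the Telnet command above.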
Try another tool • For a more flexible tool, you can check out nc (netcat). • The nc tool lets you easily manipulate and script UDP- and TCP-based traffic, including HTTP. • See http://www.bgw.org/tutorials/utilities/nc.php for details
Protocol Versions • Several versions of the HTTP protocol are in use today. HTTP applications need to work hard to robustly handle the variations among them. • HTTP/0.9 • HTTP/1.0 • HTTP/1.0+ • HTTP/1.1 • HTTP-NG (a.k.a. HTTP/2.0)
HTTP/0.9 • The 1991 prototype version of HTTP is known as HTTP/0.9. This protocol contains many serious design flaws and should be used only to interoperate with legacy clients. • HTTP/0.9 supports only the GET method to fetch simple HTML objects; it does not support MIME types, HTTP headers, or version numbers. • It was soon replaced with HTTP/1.0.
HTTP/1.0 • HTTP/1.0 was the first version of HTTP that was widely deployed. • HTTP/1.0 added version numbers, HTTP headers, additional methods, and multimedia object handling. • HTTP/1.0 made it practical to support graphically appealing web pages and interactive forms, which helped promote the wide-scale adoption of the WWW. • HTTP/1.0 was never well specified. It represented a collection of best practices in a time of rapid commercial and academic evolution of the protocol.
HTTP/1.0+ • Many popular web clients and servers rapidly added features to HTTP in the mid-1990s to meet the demands of a rapidly expanding, commercially successful WWW. • Many of these features, including long-lasting “keep-alive” connections, virtual hosting support, and proxy connection support, were added to HTTP and became unofficial, de facto standards. • This informal, extended version of HTTP is often referred to as HTTP/1.0+.
HTTP/1.1 • HTTP/1.1 focused on correcting architectural flaws in the design of HTTP, specifying semantics, introducing significant performance optimizations, and removing mis-features. • HTTP/1.1 also included support for the more sophisticated web applications and development that were under way in the late 1990s. • HTTP/1.1 is the current version of HTTP.
HTTP-NG (a.k.a. HTTP/2.0) • HTTP-NG is a prototype proposal for an architectural successor to HTTP/1.1 that focuses on significant performance optimizations and a more powerful framework for remote execution of server logic. • The HTTP-NG research effort concluded in 1998, and so far, there are no plans to advance this proposal as a replacement for HTTP/1.1.
Architectural Components of the Web • In addition to the most popular web applications (i.e., web browsers and servers), there are many other web applications that we interact with on the Internet, including: • Proxies: HTTP intermediaries that sit between clients and servers. • Caches: HTTP storehouses that keep copies of popular web pages close to clients. • Gateways: Special web servers that connect to other applications. • Tunnels: Special proxies that blindly forward HTTP communications. • Agents: Semi-intelligent web clients that make automated HTTP requests.
Proxies • HTTP proxy servers, sitting between clients and servers, are important building blocks for web security, application integration, and performance optimization. • A proxy receives all of the client’s HTTP requests, • and relays the requests to the server (perhaps after modifying them). • These applications act as a proxy for the user, accessing the server on the user’s behalf. • Proxies are often used for security, acting as trusted intermediaries through which all web traffic flows. • They can also filter requests and responses; for example, • To detect application viruses in messages. • To filter adult content away from elementary-school students. • We will talk about proxies in detail in later lectures.
Proxy (cont.) • Proxies relay traffic between client and server. • (Figure: a proxy sits between the client and the server on the Internet.)
Caches • A web cache or caching proxy is a special type of HTTP proxy server that keeps copies of popular documents passing through the proxy. • The next client requesting the same document can be served from the cache’s own copy; consequently, the client may be able to download the document much more quickly from a nearby cache than from a distant web server. • HTTP defines many facilities to make caching more effective and to regulate the freshness and privacy of cached content. • We shall talk about caching technology in a later lecture.
Caches (cont.) • Caching proxies keep local copies of popular documents to improve performance. • (Figure: client, proxy cache, Internet, server.)
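The basic idea of serving repeat requests from a stored copy can be sketched as below. This is a deliberately naive illustration: `TinyCache` and `fake_origin` are hypothetical names, the origin server is simulated by a local function, and the freshness and privacy rules that HTTP defines for real caches are ignored entirely.

```python
from typing import Callable, Dict, List


class TinyCache:
    """Keep a copy of each document the first time it passes through."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1  # served from the local copy
            return self._store[url]
        self.misses += 1  # first request: go to the origin server
        document = self._fetch(url)
        self._store[url] = document
        return document


origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    """Stand-in for a distant web server; records every request it receives."""
    origin_requests.append(url)
    return b"<html>...</html>"


cache = TinyCache(fake_origin)
url = "http://www.csie.ncnu.edu.tw/~hychen/index.html"
first = cache.get(url)
second = cache.get(url)

print("hits:", cache.hits, "misses:", cache.misses)
print("requests that reached the origin:", len(origin_requests))
```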
Gateways • Gateways are special servers that act as intermediaries for other servers. • They are often used to convert HTTP traffic to another protocol. • A gateway always receives requests as if it were the origin server for the resource; the client may not be aware it is communicating with a gateway. • For example, in the following figure, an HTTP/FTP gateway receives requests for FTP URIs via HTTP requests but fetches the documents using the FTP protocol. The resulting document is packed into an HTTP message and sent to the client. • We shall talk about gateways in a later lecture.
HTTP/FTP gateway • (Figure: the HTTP client speaks HTTP to the HTTP/FTP gateway, which speaks FTP to the FTP server.)