Internet Engineering Course Web Servers
Introduction • A company needs to provide various web services • Hosting intranet applications • Company web site • Various internet applications • Therefore there is a need to provide an HTTP server • First we look at what the HTTP protocol is • Then we talk about Web servers and Apache as the leading web server application
The World Wide Web (WWW) • Global hypertext system • Initially developed in 1989 • By Tim Berners-Lee at the European Laboratory for Particle Physics, CERN, in Switzerland • To provide an easy way of sharing and editing research documents among geographically dispersed groups of scientists • In 1993, started to grow rapidly • Mainly due to the NCSA developing a Web browser called Mosaic (an X Window-based application) • First graphical interface to the Web • More convenient browsing • Flexible way for people to navigate through worldwide resources on the Internet and retrieve them
Web Browsers • Provide access to a Web server • Basic components • HTML interpreter • HTTP client used to retrieve HTML pages • Some also support • FTP, NNTP, POP, SMTP, …
Web Servers • Definitions • A computer responsible for accepting HTTP requests from clients and serving them Web pages. • A computer program that provides the above-mentioned functionality. • Common features • Accepting HTTP requests from the network • Providing an HTTP response to the requester • Typically consists of an HTML document • Usually capable of logging • Client requests/Server responses
Web Servers cont. • Returned content • Static • Comes from an existing file • Dynamic • Dynamically generated by some other program/script called by the Web server. • Path translation • Translate the path component of a URL into a local file system resource • Path specified by the client is relative to the server’s root dir
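The path translation step above can be sketched in Python. This is only an illustration under stated assumptions, not how any particular server does it: the function name `translate_path` and the document root `/var/www/html` are made up for the example.

```python
import posixpath
from urllib.parse import unquote


def translate_path(url_path, doc_root="/var/www/html"):
    """Map the path component of a request URL onto a file under doc_root."""
    # Drop any query string or fragment; only the path names a file.
    path = url_path.split("?", 1)[0].split("#", 1)[0]
    # Decode percent-escapes such as %20 before normalizing.
    path = unquote(path)
    # Collapse "." and ".." segments. Because the path is anchored at "/",
    # ".." can never climb above the root, so the result stays in doc_root.
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    return doc_root.rstrip("/") + normalized
```

With these assumptions, `translate_path("/index.html")` gives `/var/www/html/index.html`, and a request such as `/../../etc/passwd` is clamped to a path inside the document root.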
Basic Client/Server Architecture in WWW • Overall organization of the Web. • Basic operation is to fetch documents • Client issues requests, browser displays document • Server responsible for retrieving document from local file system • Client/server communications based on HTTP protocol
Dynamic Content Parts of documents may be specified via scripts/programs • Client-side (executed on client machine, e.g., within the browser) • Client-side script - Script embedded in HTML document • Applet - pre-compiled program passed to client • Server-side (executed on server machine) • Server-side script embedded in document • Servlet - precompiled program executed within the server’s address space • CGI scripts
Common Gateway Interface (CGI) • The principle of using server-side CGI programs. • Allows documents to be generated dynamically “on-the-fly” • Provides a standard way for a web server to execute a program using user-provided data as input • To the server, a CGI program appears as the program responsible for fetching the requested document
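A minimal sketch of a CGI program in Python, assuming the server passes the query string in the `QUERY_STRING` environment variable and reads the program's standard output as the response. The function name `handle_request` and the `name` parameter are hypothetical, chosen only for this example.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ):
    """Build CGI output (headers, blank line, body) from user-provided data."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a made-up form field; fall back to a default if it is absent.
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in HTML.
    body = "<html><body><h1>Hello, {}!</h1></body></html>".format(html.escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by the web server, the environment carries the request data.
    sys.stdout.write(handle_request(os.environ))
```

Keeping the logic in a function that takes the environment as an argument makes the script easy to try without a web server.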
Architectural Overview • Architectural details of a client and server in the Web. • Document fetch (and possibly server-side script): 2b-3b • Execute CGI Script (separate process): 2c-3c-4c • Execute servlet program (run within server): 2a-3a-4a
HTTP protocol • Defines the communication between a web server and a client • Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web • A browser is an HTTP client because it sends requests to an HTTP server (Web server) • The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
Structure of http transactions • Request/Response, text based protocol • Format of a http message: <initial line, different for request vs. response> Header1: value1 Header2: value2 Header3: value3 <optional message body goes here, like file contents or query data; it can be many lines long, or even binary data >
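The message format above (initial line, header lines, blank line, optional body) can be split apart with a few lines of Python. This is a simplified sketch: the name `parse_message` is an assumption, and it ignores details such as repeated or folded headers.

```python
def parse_message(raw):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as "host:80" contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body
```

Header names are lower-cased because they are case-insensitive; the body is left as bytes since it may be binary data.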
The Format of a Request • Request line: method [sp] URL [sp] version [cr][lf] • Header lines: header: value [cr][lf], one per line • Blank line: [cr][lf] • Entity Body
Request Example GET /index.html HTTP/1.1[CRLF] Accept: image/gif, image/jpeg[CRLF] User-Agent: Mozilla/4.0[CRLF] Host: www.ui.ac.ir:80[CRLF] Connection: Keep-Alive[CRLF] [CRLF]
Request Example • Request line (method, request URL, version): GET /index.html HTTP/1.1 • Headers: • Accept: image/gif, image/jpeg • User-Agent: Mozilla/4.0 • Host: www.ui.ac.ir:80 • Connection: Keep-Alive • [blank line here]
The Format of a Response • Status line: version [sp] status code [sp] phrase [cr][lf] • Header lines: header: value [cr][lf], one per line • Blank line: [cr][lf] • Entity Body
Response Example HTTP/1.0 200 OK Date: Fri, 31 Dec 1999 23:59:59 GMT Content-Type: text/html Content-Length: 1354 <html> <body> <h1>Hello World</h1> (more file contents) . . . </body> </html>
Response Example • Status line (version, status code, reason phrase): HTTP/1.0 200 OK • Headers: • Date: Fri, 31 Dec 1999 23:59:59 GMT • Content-Type: text/html • Content-Length: 1354 • Message body: <html> <body> <h1>Hello World</h1> (more file contents) . . . </body> </html>
Initial line • A typical initial request line: • GET /path/to/file/index.html HTTP/1.0 • Initial response line: • HTTP/1.0 200 OK • HTTP/1.0 404 Not Found • Status code: • 1xx indicates an informational message only • 2xx indicates success of some kind • 3xx redirects the client to another URL • 4xx indicates an error on the client's part • 5xx indicates an error on the server's part • Common status codes: • 200 OK • 404 Not Found • 301 Moved Permanently • 302 Moved Temporarily • 303 See Other (HTTP 1.1 only) • 500 Internal Server Error
Header lines • Typical request headers: • From: email address of requester • User-Agent: for example User-Agent: Mozilla/3.0Gold • Typical response headers: • Server: for example Server: Apache/1.2b3-dev • Last-Modified: for example Last-Modified: Sun, 19 Feb 2006 23:59:59 GMT
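As a small sketch of the status-code classes listed above, the following Python splits an initial response line and maps the first digit of the code to its class. The names `parse_status_line` and `status_class` are made up for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP/1.0 404 Not Found' into (version, code, reason phrase)."""
    # Split at most twice so a multi-word reason phrase stays intact.
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


def status_class(code):
    """Return the class of a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")
```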
Message body • In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. • In a request, this is where user-entered data or uploaded files are sent to the server. • If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular, • The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif. • The Content-Length: header gives the number of bytes in the body.
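A minimal sketch of how the two body-describing headers are filled in when a response is built. The function name `build_response` and the UTF-8 charset are assumptions for the example; the point is that Content-Length counts bytes, not characters.

```python
def build_response(body_text, content_type="text/html; charset=utf-8"):
    """Return a complete HTTP/1.0 response as bytes."""
    # Encode first: Content-Length must be the number of bytes in the body.
    body = body_text.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

For instance, the 13-character string `<h1>café</h1>` yields `Content-Length: 14`, because "é" takes two bytes in UTF-8.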
MIME Media types • Multipurpose Internet Mail Extensions • HTTP sends the media type of the file using the Content-Type: header • Some important media types are • text/plain, text/html • image/gif, image/jpeg • audio/basic, audio/wav • model/vrml • video/mpeg, video/quicktime • application/*, application-specific data that does not fall under any other MIME category, e.g. application/octet-stream
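One way a server might pick the Content-Type value is from the file extension. The sketch below uses Python's standard `mimetypes` module; the helper name `content_type_for` and the fallback to application/octet-stream for unknown extensions are assumptions of this example.

```python
import mimetypes


def content_type_for(filename):
    """Guess a media type from the file name, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions are served as opaque binary data.
    return guessed or "application/octet-stream"
```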
Sample HTTP exchange • To retrieve the file at the URL http://www.somehost.com/path/file.html • Request: GET /path/file.html HTTP/1.0 From: someuser@jmarshall.com User-Agent: HTTPTool/1.0 [blank line here] • Response: HTTP/1.0 200 OK Date: Fri, 31 Dec 1999 23:59:59 GMT Content-Type: text/html Content-Length: 1354 <html> <body> <h1>Happy New Millennium!</h1> (more file contents) . . . </body> </html>
HTTP methods • GET: request a resource by url • HEAD • is just like a GET request, except it asks the server to return the response headers only, and not the actual resource (i.e. no message body). • This is useful to check characteristics of a resource without actually downloading it, thus saving bandwidth. • POST • A POST request is used to send data to the server to be processed in some way, like by a CGI script. • There's a block of data sent with the request, in the message body. There are usually extra headers to describe this message body, like Content-Type: and Content-Length:. • The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending. • The HTTP response is normally program output, not a static file.
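To make the POST description concrete, here is a sketch that assembles a POST request with form data in the message body. The function name `build_post_request`, the script path, and the field names used in the usage note are all hypothetical.

```python
from urllib.parse import urlencode


def build_post_request(path, fields):
    """Return an HTTP/1.0 POST request (bytes) carrying URL-encoded form data."""
    # The form data travels in the body, not in the request URL.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

Calling `build_post_request("/cgi-bin/form.cgi", {"name": "Ali", "city": "Isfahan"})` produces a request whose body is `name=Ali&city=Isfahan`, described by the two extra headers.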
HTTP 1.1 • It is a superset of HTTP 1.0. Improvements include: • Faster response, by allowing multiple transactions to take place over a single persistent connection. • Faster response and great bandwidth savings, by adding cache support. • Faster response for dynamically-generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known. • Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.
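The chunked encoding mentioned above can be sketched as follows: each chunk is preceded by its size in hexadecimal, and a zero-size chunk marks the end, so the sender never needs the total length in advance. The function name `encode_chunked` is an assumption, and trailers and chunk extensions are left out.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings using chunked transfer encoding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would be read as the end marker; skip it.
            continue
        # Chunk size in hexadecimal, CRLF, the data itself, CRLF.
        out += format(len(chunk), "x").encode("ascii") + b"\r\n"
        out += chunk + b"\r\n"
    # A chunk of size zero followed by a blank line ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)
```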
Manually Experimenting with HTTP >telnet eng.ui.ac.ir 80 Trying 192.168.50.84… Connected to eng.ui.ac.ir Escape character is ‘^]’.
Sending a Request > GET /~ladani/index.htm HTTP/1.0 [blank line]
The Response HTTP/1.1 200 OK Date: Fri, 29 Feb 2008 08:23:33 GMT Server: Apache/2.0.52 (CentOS) Last-Modified: Wed, 07 Nov 2007 12:27:44 GMT ETag: "6ccb6-741c-43e55e05a5000" Accept-Ranges: bytes Content-Length: 29724 Connection: close Content-Type: text/html; charset=WINDOWS-1256 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <meta name=" GENERATOR" content="Microsoft FrontPage 5.0"> ….
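Request: GET /~ladani/index.htm HTTP/1.0 • Response: HTTP/1.1 200 OK, followed by the HTML code
Request: GET /~ladani/no-such-page.htm HTTP/1.0 • Response: HTTP/1.1 404 Not Found, followed by HTML code
Request: GET /index.html HTTP/1.1 • Response: HTTP/1.1 400 Bad Request, followed by HTML code • Why is it a Bad Request? It is an HTTP/1.1 request without a Host header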
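The Host-header rule behind that 400 response can be sketched as a simple check. This is only an illustration: the function name `missing_required_host` is made up, and a real server validates far more than this.

```python
def missing_required_host(request_text):
    """Return True for an HTTP/1.1 request that lacks a Host header."""
    # Only the head (before the first blank line) matters here.
    head = request_text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # The version is the last token of the request line.
    version = lines[0].rsplit(" ", 1)[-1]
    header_names = {
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if line.strip()
    }
    return version == "HTTP/1.1" and "host" not in header_names
```

The same request sent as HTTP/1.0, or with a `Host:` line added, passes this check.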
Session-persistent State • What does session-persistent state mean? • State information that is preserved between browsing sessions. • Information that is stored semi-permanently (i.e., on disk) for later access. • Why was calculator example not session-persistent? • Sum, current display, etc. not preserved if we went to a different website and back to calculator.
Why session-persistence? • User-based customizations. • MyYahoo, E*Trade, etc. • Long transactions. • Electronic shopping carts. • Order preparation • Server-side state maintenance. • Large amounts of state info that you don’t want to pass back and forth.
Cookie Overview • HTTP cookies are a mechanism for creating and using session-persistent state. • Cookies are simple string values that are associated with a set of URL’s. • Servers set cookies using an HTTP header. • Client transmits the cookie as part of HTTP request whenever an associated URL is visited in the future.
Anatomy of a cookie. • Cookie has 6 parts: • Name • Value • Domain • Path • Expiration • Security flag • Name and Value are required, others have default value.
Setting a cookie. • A cookie is set using the “Set-cookie” header in an HTTP response. • String value of the Set-cookie header is parsed into semi-colon separated fields that define the different parts of the cookie. • Cookie is stored by the client.
Sending cookies • Every time a client makes an HTTP request, it tests every stored cookie for a match. • A cookie matches if… • The cookie domain is a suffix of the URL's server name. • The cookie path is a prefix of the URL path. • The cookie expiration has not passed. • If the cookie security flag is on, the connection is secure. • If a match is made, the name/value pair of the cookie is sent in the “Cookie” header of the request.
Setting a Cookie • Full cookie: Set-Cookie: my_cookie=This is my cookie value; domain=.eng.ui.ac.ir; path=/~ladani; expires=Thu, 06-Mar-08 12:00:00 GMT • Can have more than one Set-Cookie header, or can combine more than one cookie in one header by separating them with a comma (,)
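The matching test a client performs can be sketched in Python. The `Cookie` class and `cookie_matches` function are assumptions made for this example; the domain check is slightly stricter than a plain suffix test, since it only accepts the domain itself or a subdomain of it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    expires: Optional[datetime] = None  # None means no expiration was given
    secure: bool = False


def cookie_matches(cookie, host, path, is_secure, now):
    """Decide whether the cookie should be sent with a request to host/path."""
    domain = cookie.domain.lstrip(".").lower()
    host = host.lower()
    # Domain must equal the host or be a suffix on a label boundary.
    domain_ok = host == domain or host.endswith("." + domain)
    # Cookie path must be a prefix of the requested path.
    path_ok = path.startswith(cookie.path)
    not_expired = cookie.expires is None or now < cookie.expires
    # A cookie with the security flag is only sent over a secure connection.
    secure_ok = is_secure or not cookie.secure
    return domain_ok and path_ok and not_expired and secure_ok
```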
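How a client might split a Set-Cookie value into its parts can be sketched as below. The function name `parse_set_cookie` is hypothetical, and the sketch handles a single cookie per header value, not the comma-combined form.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie header value into the parts of one cookie."""
    # Fields are separated by semicolons; the first one is name=value.
    fields = [f.strip() for f in header_value.split(";") if f.strip()]
    name, _, value = fields[0].partition("=")
    cookie = {"name": name.strip(), "value": value.strip(), "secure": False}
    for field in fields[1:]:
        key, _, val = field.partition("=")
        key = key.strip().lower()
        if key == "secure":
            # The security flag is a bare keyword with no value.
            cookie["secure"] = True
        elif key in ("domain", "path", "expires"):
            cookie[key] = val.strip()
    return cookie
```

Parts that are missing from the header are simply absent from the result, which leaves the client free to apply its defaults.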
Cookie Matching • Biggest misunderstanding: • Servers do not RETRIEVE cookies!!!! • Servers RECEIVE cookies previously planted. • Step 1: • Some response by server installs cookie with “Set-cookie” header. • Client saves cookie to disk.
Cookie Matching • Step 2: • Browser goes to some page which matches previously received cookie. • Cookie name and value sent in request as “Cookie” HTTP header. • Step 3: • CGI program detects presence of cookie and uses it. • Where is the cookie info? • Environment variable HTTP_COOKIE
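Step 3 can be sketched as a CGI program reading the `HTTP_COOKIE` environment variable, which holds the value of the Cookie request header. The helper name `read_cookies` is an assumption; the environment is passed in as an argument so the function can be tried without a server.

```python
import os


def read_cookies(environ):
    """Return the cookies sent by the client as a name -> value dict."""
    cookies = {}
    # The Cookie header looks like "name1=value1; name2=value2".
    for pair in environ.get("HTTP_COOKIE", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    # Inside a real CGI program the web server fills os.environ.
    print(read_cookies(os.environ))
```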
Where are cookies stored on client? • Client-specific locations. • No standard. • Latest IE stores in a folder called “Temporary Internet Files” • Each cookie stored in a separate file. • Netscape stores in “cookies.txt”
Typical Cookie Usages • Cookies as Database Index • Most common use of cookies. • State information is kept in some sort of database and the cookie acts as an index. • Cookies as State Variables • Name of cookie is like variable name. • Value of cookie is state information.
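The "cookie as database index" pattern can be sketched with an in-memory dictionary standing in for the database. The names `SESSIONS`, `create_session`, `load_session`, and the cookie name `session_id` are all assumptions for this example.

```python
import secrets

# Stand-in for a real database: session id -> state kept on the server.
SESSIONS = {}


def create_session(state):
    """Store state server-side and return the id to place in a cookie."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess index
    SESSIONS[session_id] = state
    return session_id


def load_session(cookies):
    """Look up the stored state using the cookie value as the index."""
    return SESSIONS.get(cookies.get("session_id", ""))
```

Only the short id travels between client and server; the state itself, however large, stays on the server.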
Cookie Security • Security flag restricts when browser will send a cookie back to server. • Requires “secure” connection. • For example: https in effect. • What does this mean about when the cookie was set?
First Web Server • Berners-Lee wrote two programs • A browser called WorldWideWeb • The world’s first Web server, which ran on NeXTSTEP • The machine is on exhibition at CERN’s public museum
Most Famous Web Servers • Apache HTTP Server from Apache Software Foundation • Internet Information Services (IIS) from Microsoft • Google Web Server (GWS) • Started from May 2007 • Lighttpd • Powers several popular Web 2.0 sites like YouTube, Wikipedia and meebo
Web Servers Usage – Statistics • The most popular Web servers, used for public Web sites, are tracked by Netcraft Web Server Survey • Details given by Netcraft Web Server Reports • Apache is the most popular since April 1996 • Currently (February 2008) about • 50.93% Apache • 35.56 % Microsoft (IIS, PWS, etc.) • 5.16 % Google • 0.99% Lighttpd
Web Servers Usage – Statistics cont. Total Sites Across All Domains August 1995 - February 2008
Web Servers Usage – Statistics cont. Market Share for Top Servers Across All Domains August 1995 - February 2008
Web Servers Usage – Statistics cont. Totals for Active Servers Across All Domains June 2000 - February 2008
Apache (A PAtCHy) Web Server • Origins: NCSA (Univ. of Illinois, Urbana/Champaign) • Now: Apache Software Foundation (www.apache.org), developers world-wide • Most widely used web server today [NetCraft web survey, 2/2008] • Open source software • Geographically distributed developers • Modular, extensible design, needed so that third-party developers can override or extend basic characteristics