The World Wide Web
Modified by Linda Kenney 2/4/08
CS403 Introduction
Using the Web, it’s possible for anyone to publish their own Web pages on a host running a Web server and have those pages available to any Internet user with a Web browser. CS403 The World Wide Web
Hypertext
• The Web was invented in 1990.
• But it was based on the concept of hypertext, which had been around for decades.
• The basic idea of hypertext is to take the passive cross-references that are common in printed text and make them active.
• When reading a book, a cross-reference passively informs the reader where to turn for additional info, and the reader must manually perform the actions necessary to obtain that info if it is desired.
• Examples?
Hypertext
• On a computer, it’s easy to make cross-references active. You notify the reader that additional info is available, but let the computer take the actions necessary to obtain that info if the reader desires it.
• Such an active cross-reference is called a hyperlink (or just “link”), and text that contains such links is called hypertext.
• This concept is fundamental to the Web.
Web presentations
• Most Web pages do not exist in isolation.
• The vast majority of them are grouped together into collections of pages with a common purpose or theme.
• Such a collection of Web pages is called a Web presentation or Web site.
• Typically, all the pages within a given presentation are under the editorial control of a single individual or organization.
Web presentations (cont.)
• A given Web page is likely to contain several links to other pages.
• Often, those links will lead to other resources within the same presentation. These links are called “local links” or “links to local resources”.
• Some of those links may lead to other resources which are part of a different presentation. These links are called “remote links” or “links to remote resources”.
Clients and servers on the Web
• Like most Internet services, the Web is based on the client/server model.
• A Web browser is just a specific example of a client program.
Clients and servers on the Web (cont.)
• The browser can’t accomplish much without the cooperation of a server.
• A Web server is a program that makes files available to Web browsers upon request.
• In general, the files a Web server makes available contain Web pages and the images, sounds, videos and other media that supplement them.
• All the files a Web server has access to are generally stored in the secondary storage of the host on which the server runs.
Hypertext Transfer Protocol
• Hypertext Transfer Protocol (HTTP) is the protocol that Web browsers and Web servers use to communicate with one another.
• As a protocol, it carefully defines the range of possibilities, determining precisely what a browser may say to a server and when.
• It also dictates what servers can say to browsers and when.
Hypertext Transfer Protocol
Browser to server: “I need the file page.html”
Server to browser: “Here is the file page.html”
HTTP requests and responses
• When “speaking” HTTP, a Web browser generally sends an HTTP GET request to the Web server on a specific host requesting a specific resource.
• When it receives an HTTP GET request from a browser, a Web server, in turn, sends some sort of HTTP response back to the browser.
• Note that HTTP requests and responses rely on TCP (Transmission Control Protocol) and IP (Internet Protocol) to get across the Internet (see pp. 72-74).
• In other words, HTTP is layered on top of TCP and IP.
Browser to server: HTTP GET request for /page.html
Server to browser (resource found): HTTP response, Status code: 200, Content-type: text/html, Content-length: 4370, [contents of /page.html]
Server to browser (resource not found): HTTP response, Status code: 404 Not Found, Content-type: text/html, Content-length: 1634, [contents of error status page]
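The two responses in the exchange above can be taken apart with a few lines of Python. This is only a sketch: the function name parse_response_head is made up for illustration, and the "HTTP/1.1" version string on the status line is an assumption, since the slide shows only the status code and headers.

```python
def parse_response_head(head):
    """Split the head of an HTTP/1.x response into (status code, reason, headers)."""
    lines = head.split("\r\n")
    # The first line is the status line, e.g. "HTTP/1.1 404 Not Found".
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line separates the headers from the contents
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# Heads modelled on the two responses in the slide; "HTTP/1.1" is assumed.
ok_head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 4370\r\n"
    "\r\n"
)
missing_head = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1634\r\n"
    "\r\n"
)

ok = parse_response_head(ok_head)
missing = parse_response_head(missing_head)
```

Header names are lower-cased here because HTTP treats them as case-insensitive, which is why "Content-type" and "Content-Type" mean the same thing.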
The server’s responsibilities
• When it receives an HTTP GET request, a Web server must prepare an appropriate HTTP response message.
• The request specifies the file being requested.
• The server must first locate the requested file within the file system of its host.
• If the file cannot be located, the server sends back a ‘404 Not Found’ response message.
The server’s responsibilities (cont.)
• Having found the file, however, the server must also verify that the file permissions allow it to access the file.
• If the server is not able to access the file, it will typically return a ‘403 Forbidden’ response message.
• If the requested file is located and accessible, the server generates a ‘200 OK’ response message that includes the contents of the file as well as a variety of headers that provide information about the file, such as its type, size and last modified date.
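The locate-then-check-permissions decision described above might be sketched as follows. The function name choose_status and the idea of a single document_root directory are assumptions made for illustration; real servers also guard against pathnames that try to escape the document root, which this sketch deliberately ignores.

```python
import os
import tempfile


def choose_status(document_root, pathname):
    """Return the status code a server would use for a requested pathname."""
    # Map the pathname from the request onto the host's file system.
    full_path = os.path.join(document_root, pathname.lstrip("/"))
    if not os.path.isfile(full_path):
        return 404  # the file cannot be located
    if not os.access(full_path, os.R_OK):
        return 403  # the file exists, but the server may not read it
    return 200      # located and accessible


# Small demonstration using a temporary directory as the document root.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "page.html"), "w") as f:
        f.write("<p>Hello</p>")
    found = choose_status(root, "/page.html")
    missing = choose_status(root, "/no-such-page.html")
```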
Locating files
• A typical host stores thousands of files, all of which must be uniquely identified.
• It’s impractical to give 100,000 files unique names.
• Instead, a host uses a file system consisting of a hierarchy of directories to create uniquely identified locations in which files may be stored.
Locating files (cont.)
• Each location can be uniquely identified by the sequence of steps necessary to reach it from the top of the hierarchy.
• The list of steps needed to reach a location from the top of the hierarchy is called the absolute path to that location, and every location has a unique absolute path.
Locating files (cont.)
• All items in a given location must have unique names.
• So each item in the hierarchy can be uniquely identified by combining its absolute path with its filename to form an absolute pathname.
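As a small illustration of how an absolute path and a filename combine into an absolute pathname, here is a sketch using Python's posixpath module. The function name absolute_pathname and the example directory names are made up for illustration.

```python
import posixpath


def absolute_pathname(steps, filename):
    """Combine the steps from the top of the hierarchy with a filename."""
    absolute_path = posixpath.join("/", *steps)    # e.g. "/products/catalog"
    return posixpath.join(absolute_path, filename)


# The same filename can be reused in two locations without any clash,
# because the absolute pathnames differ.
first = absolute_pathname(["products", "catalog"], "index.html")
second = absolute_pathname(["products", "support"], "index.html")
top = absolute_pathname([], "index.html")
```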
Uniform Resource Locators
• Before a browser can request a resource, it needs to know where it can find that resource and what type of server will be providing it.
• To find a specific resource, the browser must be told not only the name of the file containing that resource, but also what host it is on and where it is in the file system of that host.
• All the information needed to find a specific resource, out of the billions available on the Web, is contained in that resource’s Uniform Resource Locator (URL).
• Every resource available on the Web is identified by a unique URL that contains all the information necessary for a browser to retrieve that resource.
Uniform Resource Locators (cont.)
• The browser always does the same thing with the URL: it requests the resource and renders it on the screen.
• In computer science, we use the term render to refer to the process of producing an image by interpreting some data.
• A browser renders a Web resource by determining what to display on the screen based upon what it finds in the HTTP response that contains the contents of that resource.
The anatomy of a URL
• Consider a typical URL: http://www.sample.com/products/catalog/prod1.html
• A URL typically begins with the protocol to use when accessing the resource (here, http).
• The remainder of the URL is the identifier that tells the browser how to locate the resource.
• The identifier starts with a hostname that uniquely identifies the host on which the resource is stored (here, www.sample.com).
• The rest of the identifier is the pathname that uniquely locates the resource in that host’s file system (here, /products/catalog/prod1.html).
• The pathname consists of a path and a file name (here, /products/catalog and prod1.html).
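The same breakdown can be reproduced with Python's standard urllib.parse module. This is just a sketch applied to the sample URL from the slide; the variable names follow the slide's vocabulary rather than any official terminology.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.sample.com/products/catalog/prod1.html"
parts = urlsplit(url)

protocol = parts.scheme    # the part before "://"
hostname = parts.hostname  # identifies the host storing the resource
pathname = parts.path      # locates the resource in that host's file system

# The pathname itself splits into a path and a file name.
path, filename = posixpath.split(pathname)
```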
The Web step-by-step – step 1
• The process of displaying a Web resource begins when the browser is given the URL of that resource by the user.
• The browser examines that URL to find out what it needs to do next.
• The first part (e.g., http://) tells the browser what protocol to use, and indirectly what type of server to contact.
• The identifier tells the browser where the resource is located.
• The hostname in the identifier tells the browser which host is running the server responsible for the resource.
• The pathname in the identifier tells the browser precisely where the desired resource is stored in that host’s file system.
• Using this information, the browser composes an HTTP GET request message.
• The GET request contains the pathname of the desired resource as well as the hostname of the server’s host and various other information.
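Step 1 can be sketched as a function that turns a URL into the text of a GET request. The function name compose_get_request is hypothetical, and the choice of HTTP/1.1 with a "Connection: close" header is an assumption made to keep the example small; a real browser sends many more headers.

```python
from urllib.parse import urlsplit


def compose_get_request(url):
    """Build a minimal HTTP GET request message for the given URL."""
    parts = urlsplit(url)
    pathname = parts.path or "/"  # an empty pathname means the top of the site
    lines = [
        f"GET {pathname} HTTP/1.1",  # the pathname of the desired resource
        f"Host: {parts.hostname}",   # the hostname of the server's host
        "Connection: close",         # assumption: one request per connection
        "",
        "",                          # a blank line ends the request
    ]
    return "\r\n".join(lines).encode("ascii")


request = compose_get_request("http://www.sample.com/products/catalog/prod1.html")
```

Note how short the resulting message is: a handful of lines, regardless of how large the requested resource turns out to be.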
The Web step-by-step – step 2
• The HTTP GET request must be sent to the appropriate server.
• Since it must arrive in its entirety at a specific host, the request gets sent over the Internet using TCP and IP.
• To establish a TCP connection with the server, the browser needs to know the IP address of the host running the server.
• To get the IP address of the server’s host, the browser resolves the hostname in the URL’s identifier using DNS.
• Using the IP address of the server’s host, the browser establishes a connection with the server.
• The HTTP GET request message is sent to the server over this connection. Since the request message is small, it takes little time to send.
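Step 2 maps onto two calls in Python's standard socket module: one to resolve the hostname and one to open the TCP connection. The function names resolve and send_request are made up for illustration, port 80 is assumed as the usual HTTP port, and the sketch assumes the server closes the connection once it has sent its response.

```python
import socket


def resolve(hostname, port=80):
    """Resolve a hostname to an IP address using the system's resolver (DNS)."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    _family, _type, _proto, _canonname, sockaddr = infos[0]
    return sockaddr[0]  # the first address returned


def send_request(ip_address, port, request, timeout=5.0):
    """Open a TCP connection, send the request, and read the whole response."""
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # the server closed the connection
            chunks.append(data)
    return b"".join(chunks)


# Example use (needs a working network connection, so it is left commented out):
# ip = resolve("www.sample.com")
# response = send_request(ip, 80, b"GET / HTTP/1.1\r\nHost: www.sample.com\r\nConnection: close\r\n\r\n")
```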
The Web step-by-step – step 3
• When a Web server receives an HTTP GET request, it composes an HTTP response.
• Using the pathname specified in the request, the server attempts to locate the file containing the resource within the file system of its host.
• Once the resource’s file has been located, the server verifies that it has permission to access that file.
The Web step-by-step – step 3 (cont.)
• If the server is able to locate and access the file, the HTTP response will indicate success.
• The response will also indicate the date and time at which the file was last modified, the type of resource the file contains and how big it is.
• And the server will include the contents of the resource’s file in the response message.
• Note that this means the size of the response message is primarily determined by the size of the resource being requested.
• If the server is unable to locate or access the file, the HTTP response will indicate the nature of the problem.
• The response may also contain some content for the browser to use in lieu of the requested resource.
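A successful response of the kind described above could be assembled roughly like this. The function name compose_response is hypothetical, and the MIME type is passed in by the caller rather than worked out by the function. The header names (Last-Modified, Content-Type, Content-Length) are the standard HTTP ones.

```python
import os
import tempfile
from email.utils import formatdate


def compose_response(full_path, mime_type):
    """Build a '200 OK' response carrying the contents of one file."""
    with open(full_path, "rb") as f:
        body = f.read()
    last_modified = formatdate(os.path.getmtime(full_path), usegmt=True)
    head = "\r\n".join([
        "HTTP/1.1 200 OK",
        f"Last-Modified: {last_modified}",  # when the file was last modified
        f"Content-Type: {mime_type}",       # the type of resource
        f"Content-Length: {len(body)}",     # how big it is, in bytes
        "",
        "",
    ])
    return head.encode("ascii") + body


# Demonstration with a tiny temporary file standing in for a Web page.
with tempfile.TemporaryDirectory() as root:
    page = os.path.join(root, "page.html")
    with open(page, "wb") as f:
        f.write(b"<p>Hello</p>")
    response = compose_response(page, "text/html")
```

Because the file's contents are appended after the headers, the size of the response grows with the size of the resource, as the slide notes.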
The Web step-by-step – step 4
• The server must now send the response back to the requesting browser.
• It gets the IP address for the browser from the packet that carried the HTTP request.
• Because they typically contain the contents of the requested resource, HTTP response messages tend to be significantly larger than HTTP request messages.
• To minimize the time a user must wait to receive a requested resource, it’s up to the creator of that resource to minimize the size of the file(s) containing the resource(s).
The Web step-by-step – step 5
• Upon receiving an HTTP response message, the browser is responsible for rendering the resource it contains.
• Many resources will be Web pages, which are written in Extensible Hypertext Markup Language (XHTML).
• Rendering a Web page involves interpreting the XHTML to determine what the page should look like.
• Other resources, however, will be other forms of media such as images, sounds and video.
• Rendering multimedia resources involves interpreting the data those resources contain and producing the image, sound or video that data represents.
• Browsers therefore need to understand a range of resource types.
The Web step-by-step – step 5 (cont.)
• It’s also useful to note at this stage that even though a Web page may appear to contain images, sounds and videos, each of those resources must be stored separately in its own file.
• And each of those resources must therefore be retrieved from a server with a separate HTTP transaction.
• So, the time it takes to retrieve a Web page depends on the time it takes to retrieve all of its parts (their sum, if the parts are retrieved one at a time).
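To see how many separate HTTP transactions one page implies, a browser-like program can scan the page for references to other files. The sketch below uses Python's standard html.parser module and only looks at images and style sheets; the class name EmbeddedResourceFinder and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceFinder(HTMLParser):
    """Collect the URLs of images and style sheets referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.resources.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.page_url, attrs["href"]))


page_url = "http://www.sample.com/products/catalog/prod1.html"
markup = (
    '<html><head><link rel="stylesheet" href="/style.css" /></head>'
    '<body><img src="photo.jpg" alt="" /><img src="images/logo.gif" alt="" /></body></html>'
)

finder = EmbeddedResourceFinder(page_url)
finder.feed(markup)

# One transaction for the page itself, plus one per embedded resource.
transactions = 1 + len(finder.resources)
```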
The browser lends a hand
• Browsers can play a role in minimizing the time the user must wait for a page to load.
• A user often revisits the same resources repeatedly.
• So, what you want is for the browser to have saved the resource so that you can return to it without having to request it from the server again.
The browser cache
• As a browser receives each requested resource, it stores a copy of that resource in a special place called the browser cache.
• Along with the contents of the resource, it stores the current date and time and the URL used to retrieve the resource.
• Each time a resource is requested, the browser checks to see if that resource is already stored in its cache.
• If it’s not, then the browser goes about retrieving the resource as we’ve already described.
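The cache lookup described above can be sketched with a dictionary keyed by URL. The names BrowserCache, get_resource and fake_fetch are hypothetical. On a cache hit this sketch simply reuses the stored copy, which is an assumption: the slides do not say what the browser does with the stored date and time.

```python
import time


class BrowserCache:
    """Store a copy of each resource along with when and from where it came."""

    def __init__(self):
        self._entries = {}

    def store(self, url, content):
        # Keep the contents together with the current date and time.
        self._entries[url] = (time.time(), content)

    def lookup(self, url):
        return self._entries.get(url)  # None if the URL is not cached


def get_resource(url, cache, fetch):
    """Return the resource from the cache, fetching it only when necessary."""
    entry = cache.lookup(url)
    if entry is not None:
        _stored_at, content = entry
        return content          # assumption: reuse the stored copy as-is
    content = fetch(url)        # retrieve from the server as usual
    cache.store(url, content)
    return content


# Demonstration with a stand-in for the network that counts its calls.
calls = []

def fake_fetch(url):
    calls.append(url)
    return b"<p>Hello</p>"

cache = BrowserCache()
first = get_resource("http://www.sample.com/page.html", cache, fake_fetch)
second = get_resource("http://www.sample.com/page.html", cache, fake_fetch)
```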
When things go wrong…
• Although it often goes off without a hitch, there are places in an HTTP transaction where problems can occur.
• Knowing what might go wrong can help us make sense of otherwise cryptic or confusing error messages we may get from our browser.
• Of course, different browsers and servers are free to use different error messages as they see fit, so the wording may differ.
When things go wrong… (cont.)
If the hostname in the URL cannot be resolved to an IP address using DNS, there’s no way to establish the necessary TCP connection to the server. In this case, we’ll get an error to the effect of “Unable to locate server”.
When things go wrong… (cont.)
The hostname may resolve, but it may still be impossible to establish the TCP connection for a variety of other reasons. In this case, we’ll get an error to the effect of “No response”.
When things go wrong… (cont.)
If we’re able to get a TCP connection and send an HTTP request to the server, there’s no guarantee it will be successful.
• If the server is unable to locate the requested file, we’ll get an error to the effect of “Not found”.
• If the server locates the file but does not have permission to access it, we’ll get an error to the effect of “Forbidden” or “Access denied”.
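The failure points above line up with distinct error conditions in a program that speaks HTTP. The sketch below maps Python's standard socket exceptions and HTTP status codes onto the messages used in these slides. The function name describe_failure is made up, and the exact wording a real browser shows will differ.

```python
import socket


def describe_failure(error=None, status_code=None):
    """Translate a low-level failure into the kind of message a user sees."""
    if isinstance(error, socket.gaierror):
        # Name resolution failed: the hostname could not be turned into an IP address.
        return "Unable to locate server"
    if isinstance(error, OSError):
        # The hostname resolved, but the TCP connection was refused or timed out.
        return "No response"
    if status_code == 404:
        return "Not found"
    if status_code == 403:
        return "Forbidden"
    return "No error"


dns_problem = describe_failure(error=socket.gaierror("name not known"))
connection_problem = describe_failure(error=ConnectionRefusedError())
missing_file = describe_failure(status_code=404)
no_permission = describe_failure(status_code=403)
```

The order of the checks matters: socket.gaierror is itself a kind of OSError, so it has to be tested first.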
…And how to fix it
• Understanding the root cause of an error can often help you devise a solution to the problem.
…And how to fix it (cont.)
• If you get an “Unable to locate server” error, the problem lies with the hostname in the URL or with resolving it.
• Double-check your typing of the hostname.
• Make sure your network connection is still working.
• Ensure that your DNS server is functioning in general.
…And how to fix it (cont.)
• If you get a “No response” error, you know the hostname is okay but the server is not able to respond.
• Often, there’s nothing you can do about this yourself.
• However, since this is often a temporary problem, try again a little later.
…And how to fix it (cont.)
• If you get a “Not found” error, you know there’s a problem with the pathname in the URL.
• Again, double-check your typing, paying attention to case.
• Try eliminating steps from the pathname one at a time, moving from right to left.
…And how to fix it (cont.)
• If you get a “Forbidden” error, the problem is with the permissions on the file containing the requested resource.
• If the file belongs to you, simply adjust the permissions.
• Otherwise, there’s little you can do about this problem yourself except contact the owner of the resource.
Resource types
• As we’ve seen, the Web consists of a variety of resource types.
• In each HTTP response, the server includes an indicator of the resource’s type so the browser knows how to render it.
• Since servers and browsers must agree on the meaning of this type info, it needs to be standardized.
Resource types (cont.)
• The standard used for this purpose is called Multipurpose Internet Mail Extensions (MIME).
• As you can tell from its name, MIME was originally designed for use with e-mail.
• A MIME type consists of an indicator of the general resource type (text, image, audio, etc.) followed by a / followed by an indicator of the specific resource type (html, jpeg, mpeg, etc.).
• For example, XHTML files are assigned a MIME type of text/html.
• JPEG image files are assigned a MIME type of image/jpeg.
• MP3 sound files are assigned a MIME type of audio/mpeg.
Filename extensions
• The server needs to know the type of each resource for which it is responsible.
• Otherwise, it wouldn’t know what MIME type to list in the HTTP response message.
• Servers are set up to use the extension of the resource’s filename to determine its type.
• A filename extension is part of the actual filename, but it comes at the end and starts with a dot.
• Examples?
• The server is configured to associate certain filename extensions with specific MIME types.
Filename extensions (cont.)
• For this reason, it’s important to name all of the files containing your Web resources with appropriate filename extensions.
• We’ll generally use only a small number of resource types in this course.
• XHTML files are given .html (or .htm) extensions.
• JPEG images are given .jpg (or .jpeg) extensions.
• GIF images are given .gif extensions.
• CSS files are given .css extensions.
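A server's association between filename extensions and MIME types amounts to a lookup table. The sketch below hard-codes the extensions mentioned in these slides. The table name MIME_TYPES, the function mime_type_for and the fallback type application/octet-stream are assumptions for illustration; real servers read a much longer table from their configuration.

```python
import os

# Hypothetical server configuration: extension -> MIME type.
MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".css": "text/css",
    ".mp3": "audio/mpeg",
}


def mime_type_for(filename, default="application/octet-stream"):
    """Pick the MIME type to list in the response, based on the extension."""
    _name, extension = os.path.splitext(filename)
    return MIME_TYPES.get(extension.lower(), default)


page_type = mime_type_for("prod1.html")
photo_type = mime_type_for("Photo.JPG")
unknown_type = mime_type_for("notes")

# A MIME type splits into a general type and a specific type at the "/".
general, specific = photo_type.split("/")
```

A file with a missing or wrong extension falls through to the default, which is why naming files with appropriate extensions matters.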
What Browsers Understand
• A browser understands HTTP, the protocol for retrieving Web pages.
• Most browsers also understand protocols for other Internet services like file transfer, instant messaging, e-mail and network news.
• A browser understands XHTML and HTML and can interpret them in order to render Web pages.
• Many also understand other popular languages like CSS, JavaScript and XML.
What Browsers Understand (cont.)
• Most browsers understand common image file formats like JPEG and GIF and can render images stored in these formats.
• Some also understand image file formats like BMP and PNG.
• Many browsers understand other forms of media as well.
• Flash presentations are used for interactive animations.
• MP3 is a file format commonly used for storing sounds and music.
• MPEG and AVI are common file formats for storing video.
What Browsers Understand (cont.)
• A good browser is designed to provide the functionality most Web users are likely to need.
• Since people use the Web in many different ways, most browsers are designed to accept two different types of add-ons that extend their capabilities.
Add-Ons: Helpers and Plug-Ins (pp. 76-83)
• An application is a program you run on your computer to accomplish specific tasks.
• You can obtain applications from retail software stores or the Internet.
• A browser often uses other applications to view the Web.
• You can customize what applications your browser uses.
Helpers
• A helper application is an application a browser can launch. It can be any application on your computer.
• Examples?
• When your browser encounters a file that requires special handling, it looks for an appropriate helper application and opens the file in that application.
Plug-Ins
• A browser plug-in is an application that expands the capabilities of a Web browser.
• When you install a plug-in, you extend the capabilities of your browser to handle a file type that it wasn’t originally designed to handle.
• Any file requiring that plug-in will be displayed inside the browser window, with the plug-in working as if it were a part of your browser.
Plug-Ins (cont.)
• Plug-ins support everything from audio to animation to documents.
• Plug-ins increase your browser’s memory requirements and launch time.
• You can find Web pages to help you locate plug-ins for your browser.
Common plug-ins and helper applications: [table not reproduced in this transcript]
Key terms
Absolute path, Absolute pathname, Browser cache, Browsing, Conceptual network, File system, Filename extension, Helper app, Hostname, HTTP, HTTP GET request, HTTP HEAD request, HTTP response, Hyperlink, Hypermedia, Hypertext, Identifier, Link, Local link, MIME, MIME type, Pathname, Permissions, Plug-in, Remote link, Render, Scheme, URL, Web browser, Web presentation, Web server, Web site, World Wide Web, XHTML