Office Automation & Intranets BUSS 909 Lecture 6 Web Architecture and Standards
Notices (1) • Assignment 2 is available from the BUSS909 Intranet- includes a Marking Criteria sheet • there are files on the intranet that provide information needed for the assignment: • Organising Structures and Schemes • Media & Content Classification • Navigation, Labeling and Searching
Notices (2) • Additional files have been placed on the BUSS909 Intranet • a file on the fundamentals of ‘Information Theory and Systems Theory’ called sl909-00.ppt • an introduction to the different types of services on the internet, in a file called sl909-03.ppt
Agenda (1) • WWW Basics • Web Server Overview • Web Documents & Trees • Hypertext Transfer Protocol (HTTP) • Serving a Web Document- Example
WWW Basics • WWW and the Internet • Web Client and Web Server Software • Uniform Resource Locators (URLs) • Hypertext Transfer Protocol (HTTP) • Hypertext Markup Language (HTML)
Uniform Resource Locators (1) Definition • a Uniform Resource Locator (URL) is the address of a network resource. URLs for the WWW contain several components • the first component identifies the URL scheme, that is, the protocol used to transfer the information
Uniform Resource Locators (2) Some Popular URL Schemes
• http: Hypertext Transfer Protocol
• https: HTTP using Secure Sockets Layer (SSL)
• mailto: E-mail address
• ftp: File Transfer Protocol
• finger: Finger protocol
• gopher: Gopher protocol
• wais: Wide Area Information Server
• news: Usenet news
• nntp: Usenet news via Network News Transfer Protocol (NNTP)
• snews: Usenet news via SSL-encrypted NNTP
• file: Host-specific filenames
• irc: Internet Relay Chat session
• telnet: Telnet interactive session
Uniform Resource Locators (3) Server Name & Resource • the second component identifies the name of the server on the Internet from which a resource is being requested • the third component identifies the path (the subdirectory on the server) and the file name of the resource, most likely an HTML document
Uniform Resource Locators (4) ‘Complete URL’ to UOW Home Page • URL scheme (http) • server name (www.uow.edu.au) • server’s subdirectory and resource file name (/index.html): http://www.uow.edu.au/index.html
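As a quick check of the three components, Python's standard `urllib.parse.urlsplit` can break the complete URL apart. This is only an illustration with the standard library; the lecture does not prescribe any particular tool.

```python
from urllib.parse import urlsplit

url = "http://www.uow.edu.au/index.html"
parts = urlsplit(url)

print(parts.scheme)  # URL scheme: http
print(parts.netloc)  # server name: www.uow.edu.au
print(parts.path)    # subdirectory and file name: /index.html
```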
Uniform Resource Locators (5) Incomplete URL to UOW Home Page • however, the shorter URL http://www.uow.edu.au/ also points to the ‘home page’ of that server • web servers have a default file name, often default.html or index.html, which is served when no file name is given • note: either this URL or the previous one enables the user to view the home page of the UOW web site
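A minimal sketch of how a server might map a directory request onto its default file, assuming the document tree is modelled as a set of paths and that the server is configured to try index.html and then default.html. Both the function name `resolve_path` and this configuration are hypothetical.

```python
from typing import Optional, Set

# Hypothetical server configuration: default file names, tried in this order
DEFAULT_FILES = ("index.html", "default.html")

def resolve_path(path: str, document_tree: Set[str]) -> Optional[str]:
    """Map a requested URL path to a file in the document tree, or None if absent."""
    if path == "":
        path = "/"  # http://www.uow.edu.au (no trailing slash) has an empty path
    if path.endswith("/"):
        # A directory was requested: fall back to the default file names
        for name in DEFAULT_FILES:
            candidate = path + name
            if candidate in document_tree:
                return candidate
        return None
    return path if path in document_tree else None

tree = {"/index.html", "/commerce/buss/research.htm"}
print(resolve_path("/", tree))            # /index.html
print(resolve_path("/index.html", tree))  # /index.html
print(resolve_path("/commerce/", tree))   # None
```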
Uniform Resource Locators (6) Omitting the Scheme in Web URLs • because of the popularity of the WWW, the scheme is occasionally omitted • web browsers are able to supply this part of a web URL • for example, the URL terra.uow.edu.au is interpreted by Netscape as http://terra.uow.edu.au/
Uniform Resource Locators (7) Partial or Relative Web URLs • a partial or relative URL is one that omits some of the leading components (protocol, host, port, or path); the missing parts are taken from the URL of the document containing the reference • eg. rsch-ss.htm, when referenced from http://www.uow.edu.au/commerce/buss/research.htm, is a relative form of http://www.uow.edu.au/commerce/buss/rsch-ss.htm
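Browsers resolve a relative URL against the URL of the page that contains it. Python's `urllib.parse.urljoin` applies the standard resolution rules and reproduces the slide's example; the second call (with "..") is an extra illustration, not from the lecture.

```python
from urllib.parse import urljoin

# URL of the page that contains the relative reference
base = "http://www.uow.edu.au/commerce/buss/research.htm"

print(urljoin(base, "rsch-ss.htm"))
# http://www.uow.edu.au/commerce/buss/rsch-ss.htm

# ".." climbs one directory up from the base page's directory
print(urljoin(base, "../index.html"))
# http://www.uow.edu.au/commerce/index.html
```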
Uniform Resource Locators (8) Anchors in Web URLs • web URLs support the use of a # sign after the HTML file name to indicate an anchor (a named location) within the document • for example, http://www.uow.edu.au/residences/inter_house/#Facilities refers to the “Facilities” section of the inter_house document
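The anchor is handled by the browser rather than the server: the part after # is not sent in the HTTP request, and the browser scrolls to the named anchor once the page arrives. Python's `urllib.parse.urldefrag` shows the split.

```python
from urllib.parse import urldefrag

url = "http://www.uow.edu.au/residences/inter_house/#Facilities"
document_url, anchor = urldefrag(url)

print(document_url)  # http://www.uow.edu.au/residences/inter_house/  (what is requested)
print(anchor)        # Facilities  (where the browser scrolls to)
```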
Uniform Resource Locators (9) Preserving State Information in URLs ... • the WWW (more precisely, HTTP) is inherently stateless • once a request from a client is answered by an HTTP server, the transaction is effectively concluded • the transaction’s current status is lost; that is, it is normally not recorded for use in future transactions
Uniform Resource Locators (10) …Preserving State Information in URLs ... • state information must be available for many uses, such as: • electronic commerce across the internet (shopping carts), extranets (EDI), etc. • researching on the web with search engines, which generally involves multiple attempts at converging on a small set of useful sources
Uniform Resource Locators (11) …Preserving State Information in URLs ... • however, state can be preserved for the duration of a user’s session by placing additional information into the URL • this information is typically sent to the CGI-BIN area on the server; the CGI-BIN area (the cgi-bin directory) is where user-provided executable routines are placed, to be run by the server in response to requests during a user’s session
Uniform Resource Locators (12) …Preserving State Information in URLs ... • conventions exist for passing state information to CGI routines • search parameters can form state information; for example, the search term “intranets” can be sent as a parameter to the query routine located in the cgi-bin directory of the AltaVista search engine
Uniform Resource Locators (13) …Preserving State Information in URLs • everything after the ? is the parameter string that is passed to the query routine located on the AltaVista site: http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search
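The parameter string can be pulled apart with Python's `urllib.parse` module. Under the CGI convention, the server hands the text after ? to the program in the `QUERY_STRING` environment variable. The meaning of the individual AltaVista parameters (pg, kl, search) is not given in the lecture and is not assumed here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search"

query_string = urlsplit(url).query
print(query_string)  # pg=q&kl=XX&q=intranets&search=Search

params = parse_qs(query_string)  # each name maps to a list of values
print(params["q"])   # ['intranets']

# Going the other way: building the parameter string from name/value pairs
rebuilt = urlencode({"pg": "q", "kl": "XX", "q": "intranets", "search": "Search"})
print(rebuilt == query_string)  # True
```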
Web Server Overview • Web Server Components • Relationship to HTTP • Limits of Web Servers
Web Documents & Trees • MIME file extensions and types • Documents, Links and Anchors • Document Tree Organisation
Hypertext Transfer Protocol • the browser and server communicate using HTTP • a simple set of rules designed to be suitable for hypermedia systems distributed across networks • this protocol must be understood in order to understand the WWW • HTTP defines a simple request-response ‘conversation’
Hypertext Transfer Protocol • HTTP does define how to correctly format the request and the response • the client- often but not necessarily a browser- is the requesting program and establishes a connection to the receiving program or server • the server replies with a response including the requested information if possible
Hypertext Transfer Protocol • HTTP does not define: • how the network connection is made or managed, or • how the information is actually transmitted (this is done by lower-level protocols such as TCP/IP) • HTTP requests consist of a method, a Uniform Resource Identifier (URI), a protocol version, and other information (header fields)
Hypertext Transfer Protocol HTTP Requests: Methods ... • HTTP methods- commonly supported methods include: • GET- returns the object, that is, retrieves the information • HEAD- returns only information about the object, but not the object itself • POST- sends information to the server for processing or storage (eg. input to scripts)
Hypertext Transfer Protocol ... HTTP Requests: Methods • some HTTP methods are not supported by many servers and browsers because they may put the integrity of the server at risk: • PUT- send a new copy of an existing object • DELETE- permanently remove an object • other methods may be added to the standard in the future- HTTP is extensible and has evolved, slowly
Hypertext Transfer Protocol HTTP Requests: Information Client -> Server • User-Agent: the kind of browser making the request • If-Modified-Since: the object is returned only if it is newer than a specified date (can save the cost of a retrieval) • Accept: the MIME types and formats the browser has been configured to accept (can save the cost of downloading an unreadable document) • Authorization: user password etc. as required
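A request is plain text: a request line (method, URI, version), then header lines, then a blank line. The sketch below assembles one as bytes; the function name `build_request` is hypothetical and the header values are borrowed from this lecture's examples. Headers are kept as a list of pairs because a name such as Accept may repeat.

```python
from typing import List, Tuple

def build_request(method: str, uri: str, headers: List[Tuple[str, str]],
                  version: str = "HTTP/1.0") -> bytes:
    """Assemble an HTTP request message (request line + headers + blank line)."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # HTTP lines end in CR LF; an empty line marks the end of the headers
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/sample.htm", [
    ("User-Agent", "Mosaic for X Windows/2.4"),
    ("Accept", "text/html"),
    ("Accept", "image/*"),
])
print(request.decode("ascii"))
```

A HEAD or POST request would use the same layout with a different method name; a POST would also carry a body after the blank line.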
Serving Documents- Example 1: Server waits for a new request • the httpd program waits for a client's request to arrive from somewhere on the Internet • the server listens on a port and remains dormant until a client connects to it
Serving Documents- Example 2: Request arrives from client ... • ultimately a request is sent by a client to the server when the user either types a URL or selects an HTML anchor • the client's network software locates the server computer and sets up a two-way network connection from the client to the server
Serving Documents- Example ... 2: Request arrives from client • the client locates the server by using Internet protocols and the name service (DNS), then initiates a connection with it • once the connection is established, the client sends the HTTP request: GET /sample.htm HTTP/1.0 • the request is sent over the network as ASCII text; the server receives it and saves it
Serving Documents- Example 3: Server parses the request ... • the server decodes the request using the HTTP protocol to determine what to do • there are three important pieces of information: • the method, which instructs the server what action to take; the GET method tells it to locate and read the file and return it to the client ...
Serving Documents- Example ... 3: Server parses the request • the document (/sample.htm), which the server can fetch because it knows where it is in the document tree, and • the protocol version used by the browser (HTTP/1.0), so that the contents can be returned in a form the client understands. The contents are sent back over the same connection as the request (note that the server need not find the client on the Internet or make a new connection)
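The three pieces of information can be separated with a simple split on whitespace. A sketch assuming a well-formed request line; `RequestLine` and `parse_request_line` are hypothetical names.

```python
from typing import NamedTuple

class RequestLine(NamedTuple):
    method: str   # what to do, e.g. GET
    uri: str      # which document, e.g. /sample.htm
    version: str  # protocol spoken by the browser, e.g. HTTP/1.0

def parse_request_line(line: str) -> RequestLine:
    """Split an HTTP request line into its three parts."""
    parts = line.strip().split()
    if len(parts) != 3:
        # Very old HTTP/0.9 requests have no version part; not handled here
        raise ValueError(f"malformed request line: {line!r}")
    return RequestLine(*parts)

request = parse_request_line("GET /sample.htm HTTP/1.0\r\n")
print(request.method)   # GET
print(request.uri)      # /sample.htm
print(request.version)  # HTTP/1.0
```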
Serving Documents- Example 4: Read other information (if necessary) ... • the httpd program reads the rest of the request as needed • using HTTP/1.0, the browser is expected to send additional information about itself to the server • this meta-information describes the browser and its capabilities, which the server may need in order to reply to the request
Serving Documents- Example ... 4: Read other information (if necessary) • for example:
User-agent: Mosaic for X Windows/2.4
Accept: text/plain
Accept: text/html
Accept: image/*
• indicates the browser is Mosaic, configured to display plain text, HTML and any kind of image
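A sketch of how a server could collect these header lines and check whether the browser accepts a given MIME type, including the image/* wildcard. The function names are hypothetical, and the matching is deliberately simplified: quality values such as ;q=0.5 are stripped and ignored.

```python
from typing import Dict, List

def parse_headers(lines: List[str]) -> Dict[str, List[str]]:
    """Collect header lines into {lower-cased name: [values]} until a blank line."""
    headers: Dict[str, List[str]] = {}
    for line in lines:
        if line == "":
            break  # blank line: end of the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers

def accepts(headers: Dict[str, List[str]], mime_type: str) -> bool:
    """True if some Accept value matches mime_type exactly or by wildcard."""
    values = headers.get("accept")
    if values is None:
        return True  # no Accept header: the client takes any type
    main_type = mime_type.split("/")[0]
    for value in values:
        for item in value.split(","):
            pattern = item.split(";")[0].strip()  # drop parameters such as ;q=0.5
            if pattern in (mime_type, f"{main_type}/*", "*/*"):
                return True
    return False

headers = parse_headers([
    "User-agent: Mosaic for X Windows/2.4",
    "Accept: text/plain",
    "Accept: text/html",
    "Accept: image/*",
    "",
])
print(headers["user-agent"])               # ['Mosaic for X Windows/2.4']
print(accepts(headers, "image/gif"))        # True
print(accepts(headers, "application/pdf"))  # False
```

Header names are lower-cased because HTTP treats them case-insensitively (User-agent and User-Agent are the same header).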
Serving Documents- Example 5: Do the requested method ... • assuming no errors, the httpd program executes the request • to GET a document, the server looks up the file /sample.htm in its document tree using the standard file operations of its operating system • there are two alternative courses of action, depending on success or failure
Serving Documents- Example ... 5: Do the requested method (Success) ... • the httpd daemon sends a status code and header information that tells the client what type of information to expect • as the document is found, code 200 (everything is OK) is sent and the document follows • the information is an HTML document, so Content-type: text/html; the document is 1066 bytes long, so Content-length: 1066 • the server software and the file date are also included
Serving Documents- Example ... 5: Do the requested method (Success) • the header sent to the client might look something like this:
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Sat, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Sat, 20 Jul 1996 20:38:40 GMT
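A sketch of how such a success header could be generated, with Content-length computed from the document's size in bytes and dates in the HTTP format produced by the standard `email.utils.formatdate`. The function name `build_response` and the server name ExampleHTTPD/0.1 are hypothetical.

```python
from email.utils import formatdate
from typing import Optional

def build_response(body: bytes, content_type: str = "text/html",
                   last_modified: Optional[float] = None) -> bytes:
    """Assemble a 200 response: status line, headers, blank line, then the body."""
    lines = [
        "HTTP/1.0 200 Document follows",
        "Server: ExampleHTTPD/0.1",          # hypothetical server name
        f"Date: {formatdate(usegmt=True)}",   # current time in GMT
        f"Content-type: {content_type}",
        f"Content-length: {len(body)}",       # size of the document in bytes
    ]
    if last_modified is not None:
        # last_modified is a Unix timestamp, e.g. from os.stat(path).st_mtime
        lines.append(f"Last-modified: {formatdate(last_modified, usegmt=True)}")
    header = "\r\n".join(lines) + "\r\n\r\n"
    return header.encode("ascii") + body

response = build_response(b"x" * 1066, last_modified=0)
print(response.split(b"\r\n\r\n", 1)[0].decode("ascii"))
```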
Serving Documents- Example 5: Do the requested method (Failure) ... • if the requested file could not be found or read, the status code will not be 200 • the most common problem is that the name of the requested file is misspelt, so the server cannot find it • if the requested file was called smple.htm it would not be found, and the server would send status code 404 (Not Found)
Serving Documents- Example ... 5: Do the requested method (Failure) ... • the response might look like this:
HTTP/1.0 404 Not Found
Server: NCSA/1.4
Date: Sat, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 0
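The success and failure branches of step 5 can be sketched as a single lookup, assuming the document tree is modelled as a dictionary from URL paths to file contents. `DOCUMENT_TREE` and `do_get` are hypothetical, and like the slide the 404 response has an empty body (many real servers send a short HTML error page instead).

```python
from typing import Dict, Tuple

# Hypothetical document tree: URL path -> document contents
DOCUMENT_TREE: Dict[str, bytes] = {
    "/sample.htm": b"<html><body><p>Sample page</p></body></html>",
}

def do_get(uri: str, tree: Dict[str, bytes]) -> Tuple[int, bytes]:
    """Return (status code, full response) for a GET of uri."""
    body = tree.get(uri)
    if body is None:
        status, reason, body = 404, "Not Found", b""  # failure branch
    else:
        status, reason = 200, "Document follows"      # success branch
    header = (f"HTTP/1.0 {status} {reason}\r\n"
              "Content-type: text/html\r\n"
              f"Content-length: {len(body)}\r\n\r\n")
    return status, header.encode("ascii") + body

print(do_get("/sample.htm", DOCUMENT_TREE)[0])  # 200
print(do_get("/smple.htm", DOCUMENT_TREE)[0])   # 404
```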
Serving Documents- Example 6: Finish Up • when the file has been completely sent, or an error message has been sent, the httpd server has finished its work: it closes the file if it was open, and closes the network connection • the client receives and formats the data; the server knows nothing about how the data is displayed • the httpd server listens for another request (go back to step 1)
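Putting steps 1 to 6 together, here is a minimal one-request-at-a-time server sketch built on Python's `socket` module. The document tree, port 8080 and the function names are assumptions for illustration; it only understands GET and sends very few headers, so it is far simpler than a real httpd.

```python
import socket
from typing import Dict

# Hypothetical document tree: URL path -> document contents
DOCUMENT_TREE: Dict[str, bytes] = {"/sample.htm": b"<html><body>Sample</body></html>"}

def handle_connection(conn: socket.socket, tree: Dict[str, bytes]) -> None:
    """Steps 2-6 for one connection: read, parse, respond, close."""
    with conn:  # leaving the with-block closes the connection (step 6)
        data = b""
        while b"\r\n\r\n" not in data:  # steps 2 and 4: read request and headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
        parts = request_line.split()  # step 3: method, URI, version
        if len(parts) != 3:
            status, body = "400 Bad Request", b""
        elif parts[0] != "GET":
            status, body = "501 Not Implemented", b""
        elif parts[1] in tree:
            status, body = "200 Document follows", tree[parts[1]]  # step 5: success
        else:
            status, body = "404 Not Found", b""  # step 5: failure
        header = (f"HTTP/1.0 {status}\r\nContent-type: text/html\r\n"
                  f"Content-length: {len(body)}\r\n\r\n")
        conn.sendall(header.encode("ascii") + body)

def serve_forever(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Step 1: wait for connections and handle them one at a time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()  # dormant until a client connects
            handle_connection(conn, DOCUMENT_TREE)
```

Because `handle_connection` runs to completion before `accept` is called again, a slow request holds up every other client, which is exactly the problem the following slides address.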
Web Server Operations • a web server has a collection of information in a document tree and serves it according to the HTTP protocol • web servers are reactive programs: they wait until a request is made, attempt to satisfy it, and then repeat the cycle • the previous example is only slightly simplified
Web Server Operations Handling Multiple Requests (1) • if a server processes one request at a time but can receive many simultaneous requests, delays will occur; an image may take several seconds to serve • without a priority scheme, small jobs that could be serviced quickly wait an inordinate amount of time behind larger ones • with a large number of hits, servers can go down when the backlog becomes too great
Web Server Operations Handling Multiple Requests (2) • web servers are therefore designed to handle as many requests as possible simultaneously • several strategies are available to do this (the last two are more difficult unless special software is used): • clone a copy of the httpd program for each request (very easy under UNIX) • multithread the httpd program • spread the work amongst several helper programs
Web Server Operations Cloning Servers (1) • each request is processed by a new copy of the httpd program • the original server, called the parent, immediately returns to listening for another request • the new copy, called the child, performs the processing
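A minimal sketch of cloning with `os.fork`, which is available on UNIX-like systems only. The names `handle` and `dispatch_by_cloning` and the canned plain-text response are hypothetical; the handler stands in for steps 2 to 6 of the serving example. The parent closes its copy of the connection and returns straight to `accept`, while the child serves the request and exits.

```python
import os
import socket

def handle(conn: socket.socket) -> None:
    """Child's work: read one request and send a tiny (hypothetical) response."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    body = f"handled by child {os.getpid()}".encode("ascii")
    header = f"HTTP/1.0 200 OK\r\nContent-type: text/plain\r\nContent-length: {len(body)}\r\n\r\n"
    conn.sendall(header.encode("ascii") + body)

def dispatch_by_cloning(conn: socket.socket) -> int:
    """Clone the server for one request; returns the child's process id."""
    pid = os.fork()
    if pid == 0:
        # Child: process the request, then exit without returning to the listen loop
        try:
            handle(conn)
        finally:
            conn.close()
            os._exit(0)
    # Parent: the child owns the connection now, so close the parent's copy
    conn.close()
    return pid

def serve_forever(listener: socket.socket) -> None:
    """Parent loop: accept, clone, and go straight back to listening."""
    while True:
        conn, _addr = listener.accept()
        dispatch_by_cloning(conn)
        # Reap finished children so they do not linger as zombie processes
        try:
            while os.waitpid(-1, os.WNOHANG)[0] > 0:
                pass
        except ChildProcessError:
            pass  # no children left to reap
```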