Internet Applications

Internet Applications ISS, 2005

The World Wide Web
• By far the best-known distributed application is the World Wide Web (WWW), or the Web for short. Technically, the Web is a distributed system of HTTP servers and clients, more commonly known as web servers and web browsers.
• Before the Web emerged, the Internet's user community consisted largely of researchers and academics, who used network services such as electronic mail and file transfer to exchange data.
• The World Wide Web originated with Tim Berners-Lee in late 1990 at CERN, the European Particle Physics Laboratory in Geneva, Switzerland. A proposal for a "universal hypertext system" was submitted in November 1990 by Tim Berners-Lee and Robert Cailliau.
• In April 2004 Tim Berners-Lee was named the first recipient of the Millennium Technology Prize, worth 1 million euros, awarded by the Finnish Technology Award Foundation.

The World Wide Web
Since the original proposal, the growth of the World Wide Web has been extraordinary (see Figure 1). It has expanded far beyond the research and academic community into all sectors worldwide, including commerce and private homes. The continued development of Web technology is currently coordinated by the World Wide Web Consortium (W3C).

The World Wide Web
The genius of the World Wide Web is that it combines three important and well-established computing technologies:
• Hypertext documents: documents in which chosen words or phrases, typically highlighted, can be marked as links to other documents, so that a user is able to access the linked documents by clicking with a mouse on the highlighted text.
• Network-based information retrieval: the File Transfer Protocol (FTP) service was the most widely used service for such information retrieval.
• Standard Generalized Markup Language (SGML): an ISO standard which allows documents to be "marked up" with tags so that they can be displayed in a uniform format on any platform, independent of the presentation mechanics.

The World Wide Web
• At its most basic, the World Wide Web is a client-server application based on a protocol named the HyperText Transfer Protocol (HTTP).
• A web server is a connection-oriented server that implements HTTP. By default, an HTTP server runs at the well-known port 80.
• A user runs a World Wide Web client (usually referred to as a browser) on a local computer. The client interacts with a web server according to HTTP, specifying a document to be fetched. If the server locates the document in its directory, the document's contents are returned to the client, which presents them to the user.

The Hypertext Markup Language (HTML)
• HTML is a markup language used to create documents that can be retrieved using the World Wide Web.
• HTML is based on SGML, with semantics that are appropriate for representing information of a wide range of types.
• HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.

HTML
    <HTML>
    <HEAD>
    <TITLE>A Sample Web Page</TITLE>
    </HEAD>
    <BODY>
    <HR>
    <center>
    <H1>My Home Page</H1>
    <IMG SRC="/images/myPhoto.gif">
    <b>Welcome to Kelly's page!</b>
    <p>
    <!-- A list of hyperlinks follows. -->
    <a href="/doc/myResume.html">My resume</a>.
    <p>
    <a href="http://www.someUniversity.edu/">My university</a>
    </center>
    <HR>
    </BODY>
    </HTML>

The Extensible Markup Language (XML)
• Whereas HTML allows a document to be marked up for the presentation or display of the information it contains, XML allows a document to be marked up as structured information.
• Also based on SGML, XML uses tags to describe the information contained in a document.
    <message>
      <to>you@yourAddress.com</to>
      <from>me@myAddress.com</from>
      <subject>This is a message</subject>
      <text> Hello world! </text>
    </message>
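Because the tags describe the information rather than its appearance, a program can pull the fields out of the message directly. A minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the function name message_fields is hypothetical:

```python
import xml.etree.ElementTree as ET

# The sample message from the slide.
MESSAGE = """<message>
  <to>you@yourAddress.com</to>
  <from>me@myAddress.com</from>
  <subject>This is a message</subject>
  <text> Hello world! </text>
</message>"""


def message_fields(xml_text):
    """Return a dict mapping each child tag of the root element to its trimmed text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    for tag, text in message_fields(MESSAGE).items():
        print(tag, "=", text)
```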

Content Type – MIME Protocol

Content Type and the MIME Protocol
• One of the header lines returned in a server response is the Content-Type of the object requested.
• Specification of the content type follows the scheme established in a protocol known as MIME (Multipurpose Internet Mail Extensions).
• Originally used for email, MIME is now widely used for describing the content of a document sent over a network.
• It supports a large and evolving set of predefined content types, specified in the format type/subtype.
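The type/subtype format can be illustrated by splitting a Content-Type header value into its parts. A minimal sketch in Python, under the assumption that the value is well formed; parse_content_type is a hypothetical helper, not part of any library:

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    # Parameters such as charset follow the media type, separated by ';'.
    media, *param_parts = [part.strip() for part in header_value.split(";")]
    main_type, _, subtype = media.partition("/")
    params = {}
    for part in param_parts:
        name, _, value = part.partition("=")
        if name:
            params[name.strip().lower()] = value.strip().strip('"')
    return main_type.lower(), subtype.lower(), params


if __name__ == "__main__":
    print(parse_content_type("text/html; charset=ISO-8859-1"))
```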

The MIME Protocol
A small subset of the types and subtypes are:

Characteristics of HTTP

HTTP is a Connection-Oriented Protocol
With HTTP/1.0, a connection to a server is automatically closed as soon as the server returns a response. Thus exactly one request-response exchange is allowed per connection; if a client needs to contact the same server again in one session, it must reconnect to the server to issue another request.

HTTP is a Connection-Oriented Protocol
• The scheme is adequate for the original intent of HTTP: retrieving simple network documents.
• It is inefficient for documents that contain a large number of links to image objects to be fetched from the server, since fetching each of these objects requires a new connection to be established.
• It is also insufficient for sophisticated web applications based on HTTP (such as shopping carts).

HTTP is a Stateless Protocol
HTTP/1.0 (as well as HTTP/1.1) is also a stateless protocol: the server does not maintain any state information on a client's session. Regardless of whether the connection is kept alive, each request is handled by the server as a new request. As with the non-persistent connections of early HTTP, statelessness is adequate for the original intent of the protocol, but not for the more complex applications for which HTTP has been extended, the next topic that we will study.

HTTP is a Connection-Oriented Protocol
• HTTP/1.0 was extended to allow a request header line
    Connection: Keep-Alive
  to be issued by a client that wishes to maintain a persistent connection with the server; a cooperating server will keep the connection open after sending a response.
• In HTTP/1.1, connections are persistent by default. Such a connection allows multiple requests to be sent over the same TCP connection.
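To show where the Keep-Alive line sits in a request, the text of an HTTP/1.0 GET request can be assembled by hand. A minimal sketch in Python; build_request is a hypothetical helper and the host name is only an illustration:

```python
def build_request(path, host, keep_alive=False):
    """Return the text of an HTTP/1.0 GET request."""
    lines = ["GET " + path + " HTTP/1.0", "Host: " + host]
    if keep_alive:
        # Ask a cooperating server to leave the connection open.
        lines.append("Connection: Keep-Alive")
    # Header lines end with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_request("/index.html", "myHost.someU.edu", keep_alive=True))
```

A client would write this text to a TCP socket connected to port 80 of the server and, if the server cooperates, could send a second request on the same socket after reading the first response.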

Dynamically generated web contents

Dynamically-generated Web Contents
• In the beginning, HTTP was employed to transfer static contents, that is, contents that exist in a constant state, such as a plain text file or an image file.
• As the web evolved, applications began to use HTTP for a purpose not originally intended: an application which allows a browser user to retrieve data based on dynamic information entered during an HTTP session.

Dynamically-generated Web Contents
• A typical web application, such as a shopping cart, requires fetching remote data based on data entered by a client at runtime.
• For example, an enterprise application typically allows a user to key in data, which is then used to formulate a query to retrieve data from a database, and the outcome is displayed to the user.
• Applied to the web, it is desirable to allow a client to submit data during a web session to retrieve data from the web server host, to be displayed by the web browser.

Dynamically-generated Web Contents
• A generic HTTP server does not possess the application logic for fetching the data from the data source.
• Instead, an external process that has the application logic serves as an intermediary.
• The external process runs on the server host, accepts input data from the web server, exercises its application logic to obtain data from the data source, and returns the outcome to the web server, which transmits it to the client.

Dynamically-generated Web Contents
• The first widely adopted protocol to augment HTTP in supporting run-time generated web contents is the Common Gateway Interface (CGI) protocol.
• Although rudimentary by comparison, CGI is the predecessor of more sophisticated protocols and facilities (such as the Java Applet and Servlet) that serve similar purposes.
• Understanding CGI and some of its supplementary protocols is important in that it prepares us for the understanding of more advanced protocols and facilities.

The Common Gateway Interface (CGI) Protocol

Common Gateway Interface (CGI)
• The Common Gateway Interface (CGI) is a standard for providing an interface, or a gateway, between an information server and an external process (that is, a process external to the server).
• Using the protocol, a web client may specify a program, known as a CGI script, as the target web object in an HTTP request.
• The web server fetches the CGI script and activates it as a process, passing to the process the input data transmitted by the web client. The web script executes and transmits its output to the web server, which returns the script-generated data as the body of a response to the web client.

CGI - 2
• An HTTP request may specify a CGI program, or CGI script.
• A CGI program can be written in:
  • Programming languages such as C, Ada, C++, and Fortran; such a program needs to be compiled to generate an executable.
  • Script languages such as PHP, Perl, Tcl, cobra; such a program, referred to as a CGI script, requires the appropriate language interpreter to be present at the server host.
• CGI programs are commonly used for processing user input from HTML forms, and subsequently composing a web page sent as part of the server response.

CGI Program - 3
• When a web server receives a request whose URI specifies a web program, the web server initiates the execution of the web program.
• The web program formulates its output in HTML, which is sent to the server and forwarded to the web client as the HTTP response.
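The server's side of this exchange can be imitated in a few lines: start the program as a child process, hand it the request data through environment variables, and capture what it writes to standard output. A rough sketch in Python, not a real web server; run_cgi and the inline SCRIPT are hypothetical stand-ins for a server and a file such as hello.cgi:

```python
import os
import subprocess
import sys

# Stand-in for a CGI program; a real server would execute a file on disk.
SCRIPT = r'''
import html, os
print("Content-type: text/html")
print()
print("<H1>Query: " + html.escape(os.environ.get("QUERY_STRING", "")) + "</H1>")
'''


def run_cgi(script_source, query_string):
    """Run the script as a child process, CGI style, and return (headers, body)."""
    # The request data travels to the program through environment variables.
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", script_source],
        env=env, capture_output=True, text=True, check=True,
    )
    # The program's output is header lines, a blank line, then the body.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body


if __name__ == "__main__":
    print(run_cgi(SCRIPT, "color=red"))
```

A real server would then prepend a status line and send the headers and body back to the client as the HTTP response.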

CGI program

Action field in a web page
A web script can be specified in an action field of a web page. When the web page is submitted, an HTTP request is issued by the browser specifying the web script as the URI:
    <HTML>
    <HEAD>
    <TITLE>A Simple Web Page which illustrates CGI</TITLE>
    </HEAD>
    <BODY>
    <FORM ACTION="Hello.cgi">
    <CENTER>
    Click on the SUBMIT button to activate the CGI script Hello.cgi:<br>
    <INPUT TYPE="Submit" NAME="submit" VALUE="SUBMIT">
    </CENTER>
    </FORM>
    </BODY>
    </HTML>

Common Gateway Interface (CGI)

A sample web page (hello.html) which invokes a CGI script
    <HTML>
    <HEAD>
    <TITLE>A web page which invokes a web script</TITLE>
    </HEAD>
    <BODY>
    <H1>This web page illustrates the use of a web script</H1>
    <P>
    <BR>
    The script or program is either a script written in a
    script language such as Perl, or an executable generated
    from a source program written in a language such as C/C++.
    </P>
    <HR>
    <FORM METHOD="post" ACTION="hello.cgi">
    <HR>
    Press <input type="submit" value="here"> to submit your query.
    </FORM>
    <HR>
    </BODY>
    </HTML>

A sample web script hello.c
    /**
     * This C program is for a CGI script which generates
     * the output for a web page. When displayed by a
     * browser, the message "Hello there!" will be shown
     * in blue.
     */
    #include <stdio.h>

    int main(void)
    {
        /* Header line, then a blank line (10 is the ASCII line feed). */
        printf("Content-type: text/html%c%c", 10, 10);
        /* Body of the generated web page. */
        printf("<font color=\"blue\">");
        printf("<H1>Hello there!</H1>");
        printf("</font>");
        return 0;
    }

A sample web script hello.pl
    #!/usr/local/bin/perl
    # A simple Perl CGI script
    print "Content-type: text/html\n\n";
    print "<head>\n";
    print "<title>Hello, World</title>\n";
    print "</head>\n";
    print "<body>\n";
    print "<font color = blue>\n";
    print "<h1>Hello, World</h1>\n";
    print "</font>\n";
    print "</body>\n";

Web forms

A Web Form
• You may have noticed that the "hello" example does not make use of any user input, and that the contents of the dynamically generated web page are predetermined. This is because the example is provided as an overview of the CGI protocol.
• In practice, a CGI script is typically invoked by a special kind of web page known as a web form, described in the next section, which accepts input at run time and invokes a CGI script that makes use of such input.

A web form
• A web form is a special kind of web page which
  • provides a graphical user interface that prompts a user for input data
  • invokes the execution of an external program on the web server host when a submit button on the page is pressed by the user.

A web form
• The code that generates a web form is enclosed between the HTML tags <FORM> ... </FORM>
• Within the <FORM> tag, attributes can be specified to provide additional information related to the CGI protocol, including:
  ~ ACTION=<a character string containing the absolute or relative URL of the external program which is to be initiated by the web server when the form is submitted>
  ~ METHOD=<a reserved word, POST or GET, which specifies the manner in which the external program expects to receive from the web server the collection of data submitted by the user, called the query data>
    <FORM METHOD="post" ACTION="form.cgi">

A web form
• In the coding for the form, each of the input items (also called input elements) has a NAME attribute.
• For each of these items, the browser user enters or selects a value.
    What is thy NAME: <INPUT NAME="name"><P>
    What is thy favorite color: <SELECT NAME="color">
• The collection of the data for the input items is a character string, called a query string, of name=value pairs separated by the & character.
    name=John%20Chen&color=red
• Each name and value is encoded using URL-encoding, so that "unsafe" characters (such as spaces, quotes, %, and &) are mapped to a hexadecimal representation.
• For example, the value string "The return is >17%" is encoded as "The%20return%20is%20%3E17%25".
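The encoding that the browser performs can be reproduced with a library routine. A minimal sketch in Python, using quote from the standard urllib.parse module; encode_form is a hypothetical helper. Note that browsers commonly encode a space in form data as + rather than %20; decoders accept both, and this sketch follows the %20 form shown on the slide.

```python
from urllib.parse import quote


def encode_form(pairs):
    """Join (name, value) pairs into a query string with %XX escapes."""
    # safe="" forces every reserved character, including '/', to be escaped.
    return "&".join(
        quote(name, safe="") + "=" + quote(value, safe="")
        for name, value in pairs
    )


if __name__ == "__main__":
    print(encode_form([("name", "John Chen"), ("color", "red")]))
    print(quote("The return is >17%", safe=""))
```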

A Web Form Query String
• An example of a query string for the example form is:
    name=John%20Doe&quest=peace%20on%20earth&color=azure&swallow=continental&text=The%20return%20is%20%3E17%25
• The collection of the data into a query string, including the encoding of the values, is performed by the browser.
• When the form is submitted by the user, the query string is passed to the server in the HTTP request, in a manner depending on the FORM METHOD specified in the form. The query string is then forwarded by the server to the external program.

Web Form Query String Processing
• Based on the form input, the browser assembles the query string.
• The string is transmitted to the web server, which in turn passes it on to the external program (the CGI script named in the form).
• The manner in which the string is transmitted depends on the FORM METHOD specified in the web form.

FORM GET Method – browser to server
• Intended for requests of information only.
• If GET is specified in the METHOD attribute of the FORM tag, the query string is transmitted to the server in an HTTP request with a GET method line.
    <FORM METHOD="get" ACTION="getForm.cgi">
• Recall that an HTTP GET request specifies a URI for the web object requested by the client. To accommodate the query string, the syntax of the URI was extended to allow the query string to be attached to the end of the URI (for the CGI script), delimited by the '?' character, as, for example:
    GET /cgi/getForm.cgi?name=John%20Doe&quest=peace HTTP/1.0
• Since the length of the GET Request-URI line is limited in practice (servers commonly cap it, for example at 8K bytes), the length of the query string that can be appended in this manner is also limited. Hence this method is not suitable if the form needs to send a large amount of data, such as data in a text box.
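The attachment of the query string to the URI, and its separation again at the server, can be sketched as two small functions. This is an illustration in Python under the assumption that the query string is already URL-encoded; get_request_line and split_target are hypothetical names, while urlsplit comes from the standard urllib.parse module.

```python
from urllib.parse import urlsplit


def get_request_line(script_uri, query_string, version="HTTP/1.0"):
    """Build a GET request line with the query string appended after '?'."""
    target = script_uri + "?" + query_string if query_string else script_uri
    return "GET " + target + " " + version


def split_target(request_line):
    """Return (path, query string) taken from a request line."""
    method, target, version = request_line.split(" ")
    parts = urlsplit(target)
    return parts.path, parts.query


if __name__ == "__main__":
    line = get_request_line("/cgi/getForm.cgi", "name=John%20Doe&quest=peace")
    print(line)
    print(split_target(line))
```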

Form GET method – server to external program
• The server invokes the CGI script and passes on the query string that it received from the browser, as appended to the URI in the HTTP request.
• The CGI program, or the external program in general, receives the encoded form input in an environment variable called QUERY_STRING.
• Environment variables are variables maintained by the operating system of the server host.
• The CGI program retrieves the query string from the environment variable, decodes the character string to obtain the name-value pairs, and uses the parameters during the execution of the program to generate output phrased in HTML.
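The steps above can be sketched as a small CGI program. This is a minimal illustration in Python rather than the C or Perl of the earlier samples; the function handle_get and the field name "name" are assumptions, and parse_qs is the decoding routine from the standard urllib.parse module.

```python
import html
import os
from urllib.parse import parse_qs


def handle_get(environ):
    """Build a CGI response from QUERY_STRING in the given environment mapping."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    # parse_qs maps each name to a list of values; take the first one.
    name = fields.get("name", ["stranger"])[0]
    # Escape the value so that user input cannot inject markup into the page.
    body = "<H1>Hello, " + html.escape(name) + "!</H1>"
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by a web server, the query string arrives in the environment.
    print(handle_get(os.environ))
```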

FORM POST Method – browser to server
• Intended for actions with a side effect.
• If POST is specified in the METHOD attribute of the FORM tag, the query string is transmitted to the server in an HTTP request with a POST method line, as previously described.
    <FORM METHOD="post" ACTION="postForm.cgi">
• Recall that an HTTP POST request line is followed by a request body, which holds text contents to be sent to the server. Using the POST method, the URI of the CGI script is specified in the POST request line, followed by the request headers, a blank line, then the query string, as, for example:
    POST /cgi/postForm.cgi HTTP/1.0
    Accept: */*
    Connection: Keep-Alive
    Host: myHost.someU.edu
    User-Agent: Generic
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 52

    name=John%20Doe&quest=peace%20on%20earth&color=azure
• Since the protocol places no limit on the length of the request body, the query string can be of arbitrary length. Hence the POST method can be used to send any amount of query data to the server.

Form POST method – server to external program
• The server invokes the CGI script and passes on the query string that it received from the browser via the request body.
• The CGI program, or the external program in general, receives the encoded form input on the standard input.
• The server will NOT send an EOF at the end of the data; instead, the program should use the environment variable CONTENT_LENGTH to determine how much data to read from the standard input.
• The CGI program reads the query string from the standard input, decodes the character string to obtain the name-value pairs, and uses the parameters during the execution of the program to generate output phrased in HTML.
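The point about CONTENT_LENGTH is easy to get wrong, so a short sketch may help. It is written in Python and reads exactly the announced number of bytes instead of waiting for end-of-file; read_post_body is a hypothetical helper, and the input stream is passed in so that the function can be tried without a web server.

```python
import io
import os
import sys


def read_post_body(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes of query data from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return ""
    # Read only the announced number of bytes; do not wait for end-of-file.
    return stdin.read(length).decode("ascii", "replace")


if __name__ == "__main__":
    # When run by a web server, the request body arrives on standard input.
    print("Content-type: text/plain\n")
    print(read_post_body(os.environ, sys.stdin.buffer))
```

For a quick trial, io.BytesIO can stand in for the standard input, for example read_post_body({"CONTENT_LENGTH": "9"}, io.BytesIO(b"color=redXYZ")), which stops after the ninth byte.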

Encoding and decoding query strings
• Whether a query string is obtained from the QUERY_STRING environment variable or from the standard input, the CGI program must decode the string and extract the name-value pairs from it, so that the parameters may be used for the program's execution.
• Due to the popularity of CGI programs, there are a number of existing libraries or classes that provide routines (functions) and methods for this purpose. For example, Perl has easy-to-use procedures in a library called CGI-lib for decoding the string and extracting the name-value pairs into a data structure called an associative array; and NCSA provides a library of C routines for the same purpose.
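What such a library routine does can be sketched by hand. The following Python function is an illustration only, not the code of any of the libraries named above; decode_query is a hypothetical name, the dict plays the role of the associative array, and unquote_plus is the standard urllib.parse routine that reverses both %XX escapes and + for space.

```python
from urllib.parse import unquote_plus


def decode_query(query_string):
    """Decode 'a=1&b=2' into a dict; a repeated name keeps its last value."""
    fields = {}
    for pair in query_string.split("&"):
        if not pair:
            continue
        # Split on the first '=' only, then decode each side separately.
        name, _, value = pair.partition("=")
        fields[unquote_plus(name)] = unquote_plus(value)
    return fields


if __name__ == "__main__":
    print(decode_query("name=John%20Doe&text=The%20return%20is%20%3E17%25"))
```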

Environment Variables used with CGI
• An environment variable is a parameter of a user's working environment on a computer system, such as the default directory path for the system to locate a program invoked by the user. Environment variables are used across multiple languages and operating systems to provide information to applications that may be specific to a user.
• CGI uses environment variables that are set by the HTTP server to pass information about requests from the server to the external program (CGI script).

Environment Variables used with CGI
• Some of the key environment variables related to CGI are listed below:
  ~ REQUEST_METHOD: the method with which the request was made. For CGI, this is "GET" or "POST".
  ~ QUERY_STRING: if the GET method was specified in the form, this variable contains the character string for the form data.
  ~ CONTENT_TYPE: the content type of the data, which should be "application/x-www-form-urlencoded" for a query string.
  ~ CONTENT_LENGTH: the length, in bytes, of the query string sent in the request body (POST method).

Web Session State Data

Web Session and session state data
During a session of a web application such as a shopping cart, several HTTP requests are issued, each of which invokes an external program such as a CGI script.

Web Session and session state data
• Data that need to be shared among CGI scripts invoked successively during a web session are called session state data.
• Neither HTTP nor CGI makes any provision for such sharing, as both of these protocols are stateless and do not support the notion of a session.

Session Data Sharing Mechanisms
• Because of the popularity of Internet applications, a variety of mechanisms have emerged to allow the sharing of session data among CGI scripts (and other external programs).
• These mechanisms can be classified as follows:
  • Server-side facilities
  • Client-side facilities

Server-side facilities for session state data
• Secondary storage (a file or database) on the server host may be used as a repository of session state data.
• Software objects may be employed as a state data repository: JavaBeans, session objects, application context state data objects.
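The first option, a file on the server host, can be sketched as follows. This is one possible design in Python, assuming one JSON file per session and a randomly generated session identifier; the class FileSessionStore and its method names are hypothetical, and how the identifier travels between browser and server is left aside.

```python
import json
import os
import tempfile
import uuid


class FileSessionStore:
    """Keep each session's state in its own JSON file under one directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        return os.path.join(self.directory, session_id + ".json")

    def new_session(self):
        """Create an empty session and return its identifier."""
        session_id = uuid.uuid4().hex
        self.save(session_id, {})
        return session_id

    def load(self, session_id):
        """Return the stored state, or an empty dict for an unknown session."""
        try:
            with open(self._path(session_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def save(self, session_id, state):
        with open(self._path(session_id), "w", encoding="utf-8") as f:
            json.dump(state, f)


if __name__ == "__main__":
    store = FileSessionStore(tempfile.mkdtemp())
    sid = store.new_session()
    store.save(sid, {"cart": ["book"]})
    print(store.load(sid))
```

Each CGI script invoked during the session would load the state at the start and save it before exiting, so that the next script sees the updated data.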
