Part 6: Introduction to CGI and Servlets
CIS 5930-04 – Spring 2001
http://aspen.csit.fsu.edu/it1spring01
Instructors: Geoffrey Fox, Bryan Carpenter
Computational Science and Information Technology, Florida State University
Acknowledgements: Nancy McCracken, Syracuse University
dbc@csit.fsu.edu
Introduction
• RMI gave us one approach to client/server programming.
• The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading.
• We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web.
• RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if actually we only need to interact with users through Web pages.
• For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client.
HTML Forms and CGI
• There are long-established techniques for getting information from users through Web browsers (predating the appearance of Java on the Web).
• The FORM element of HTML can contain a variety of input fields.
• The inputted data is harvested by the browser, suitably encoded, and forwarded to the Web server.
• On the server side, the Web server is configured to execute an arbitrary program that processes the user’s form inputs.
• This program typically outputs a dynamically generated HTML document containing an appropriate response to the user’s input.
• The server-side mechanism is called CGI: Common Gateway Interface.
CGI and Servlets
• In conventional CGI, a Web site developer writes the executable programs that process form inputs in a language such as Perl or C.
• The program (or script) is executed once each time a form is submitted.
• Servlets provide a more modern, Java-centric approach.
• The server incorporates a Java Virtual Machine, which is running continuously.
• Invocation of a CGI script is replaced by invocation of a method on a servlet object.
Advantages of Servlets
• Invocation of a single Java method is typically much cheaper than starting a whole new program, so servlets are typically more efficient than CGI scripts.
• This is important if we are planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script).
• Besides this we have the usual advantages of Java:
  • Portability.
  • A fully object-oriented environment for large-scale program development.
  • Library infrastructure for decoding form data, handling cookies, etc. (although many of these things are also available in Perl).
• Servlets are the foundation for Java Server Pages.
Plan of this Lecture Set
• Review HTML forms and associated HTTP requests.
• Briefly describe traditional CGI programming.
• Detailed discussion of Java servlets:
  • Deploying Tomcat as a standalone Web server.
  • Simple servlets.
  • The servlet life cycle.
  • Servlet requests and responses. More on the HTTP protocol.
  • Approaches to session tracking. Handling cookies.
  • The servlet session-tracking API.
References
• Core Servlets and JavaServer Pages, Marty Hall, Prentice Hall, 2000.
  • Good coverage and current, with some discussion of the Tomcat server.
• Java Servlet Programming, Jason Hunter and William Crawford, O’Reilly, 1998.
  • Also good, with some good examples. Slightly out of date.
• Java Servlet Specification, v2.2, and other documents, at: http://java.sun.com/products/servlet/
HTML Forms
The HTTP GET request
• Before discussing forms, let’s look again at how the GET request normally works.
• The following server program listens for HTTP requests, and simply prints the received request to the console.
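A Dummy Web Server
    import java.io.BufferedReader ;
    import java.io.InputStreamReader ;
    import java.net.ServerSocket ;
    import java.net.Socket ;

    public class DummyServer {
        public static void main(String [] args) throws Exception {
            ServerSocket server = new ServerSocket(8080) ;
            while(true) {
                Socket sock = server.accept() ;
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream())) ;
                // The first line of the request is the method field.
                String method = in.readLine() ;
                System.out.println(method) ;
                // The remaining header fields are ended by an empty line.
                while(true) {
                    String field = in.readLine() ;
                    if(field == null) break ;  // client closed the connection
                    System.out.println(field) ;
                    if(field.length() == 0) break ;
                }
                // . . . Send a dummy response to client socket . . .
            }
        }
    }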
A GET Request
• On the host sirah I run the dummy server:
    sirah$ java DummyServer
• Now I point a browser at http://sirah.csit.fsu.edu:8080/index.html
• The dummy server program might print:
    GET /index.html HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    <blank line>
Fields of the GET request
• The HTTP GET request consists of a series of text fields on separate lines, ended by an empty line.
• The first line is the most important: it is called the method field.
• In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server.
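The structure of the method field can be illustrated with a minimal Java sketch. The class name RequestLine and its field names are hypothetical (they do not appear in the lecture), and the sketch assumes the three tokens are separated by single spaces, as in the example request shown earlier.

```java
// Hypothetical helper, not part of the lecture code: splits the method
// field (first line) of an HTTP request into its three tokens.
final class RequestLine {
    final String method;   // e.g. "GET"
    final String target;   // e.g. "/index.html", relative to the document root
    final String version;  // e.g. "HTTP/1.0"

    private RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Assumption: tokens are separated by one or more spaces.
    static RequestLine parse(String line) {
        String[] tokens = line.trim().split(" +");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Malformed method field: " + line);
        }
        return new RequestLine(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.0");
        System.out.println(r.method + " | " + r.target + " | " + r.version);
    }
}
```

In the dummy server, such a helper could be applied to the string returned by the first readLine() call.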
A Simple HTML Form
• The form element includes one or more input elements, along with any normal HTML markup:
    <html>
    <body>
    <form method=get action="http://sirah.csit.fsu.edu:8080/dummy">
      Name: <input type=text name=who size=32> <p>
      <input type=submit>
    </form>
    </body>
    </html>
Remarks
• The form tag includes important attributes method and action.
• The method attribute defines the kind of HTTP request sent when the form is submitted: its value can be get or post (see later).
• The action attribute is a URL. In normal use it will locate an executable program on the server. In this case it is a reference to my “dummy server”.
• An input tag with type attribute text represents a text input field.
• An input tag with type attribute submit represents a “submit” button.
Displaying the Form
• If I place this HTML document on a Web Server at a suitable location, and visit its URL with a browser, I see something like:
    [screenshot of the rendered form]
Submitting the Form
• If I type my name, and click on the “Submit Query” button, the dummy server running on sirah prints:
    GET /dummy?who=Bryan HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    <blank line>
Remarks
• When the form specifying the get method is submitted, the values inputted by the user are effectively appended to the end of the URL specified in the action attribute.
• In the HTTP GET request (sent when the submit button is pressed) they appear attached to the second token of the first line of the request.
• In simple cases the appended string begins with a ?
• This is followed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user.
• If the form has multiple input fields, the pairs are separated by &
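As a rough illustration of what the browser does on submission, the following Java sketch assembles the URL from an action and a set of name=value pairs. The class name GetUrlBuilder is hypothetical, and the sketch assumes the names and values contain only characters that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the lecture code: mimics how form values
// are appended to the action URL for the get method.
final class GetUrlBuilder {
    // Assumption: names and values need no escaping (plain letters and digits).
    static String build(String action, Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");   // pairs are separated by &
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(field.getKey() + "=" + field.getValue());
        }
        return action + "?" + query;                  // appended string begins with ?
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("who", "Bryan");
        System.out.println(build("http://sirah.csit.fsu.edu:8080/dummy", fields));
    }
}
```

A LinkedHashMap is used here only so that the pairs come out in the order the fields were added.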
POST requests
• This method of attaching input data to the URL is handy if the user has a relatively simple query (e.g. for a search engine).
• For more complex forms it is usually recommended to specify the post method in the form tag, e.g.:
    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
• In the HTTP protocol, a POST request differs from a GET request by having some data appended after the headers.
A Form Using the POST Method
    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
      Surname: <input type=text name=surname size=32> <p>
      Forenames: <input type=text name=forenames size=40> <p>
      <input type=submit>
    </form>
Extending the Dummy Server
• We can modify the dummy server to display POST requests, by declaring an int variable contentLength (set to 0 at the start of each request), adding the lines
    // Case-insensitive prefix test; false if the field is shorter than the prefix.
    if(field.regionMatches(true, 0, "Content-Length: ", 0, 16))
        contentLength = Integer.parseInt(field.substring(16).trim()) ;
  inside the loop that reads the headers, and adding
    // Echo the posted data, which follows the empty line.
    for(int i = 0 ; i < contentLength ; i++) {
        int b = in.read() ;
        System.out.print((char) b) ;
    }
    System.out.println() ;
  after that loop.
Submitting the Form
• When I click on the “Submit Query” button, the dummy server prints:
    POST /dummy HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    Content-type: application/x-www-form-urlencoded
    Content-Length: 39
    <blank line>
    surname=Carpenter&forenames=David+Bryan
Remarks
• The method field (the first line) now starts with the word POST instead of GET; the data is not appended to the URL.
• There are a couple more fields in the header, describing the format of the data.
• Most importantly, the form data is now on a separate line at the end of the request, after the empty line that ends the headers.
• However, the form data is still URL-encoded.
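The layout of a POST request can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name PostRequestSketch is hypothetical, the whole request is assumed to be available as one string, and the posted data is assumed to be ASCII so that characters and bytes coincide.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the lecture code: extracts the posted
// data from the text of a POST request.
final class PostRequestSketch {
    static String readBody(String rawRequest) {
        try (BufferedReader in = new BufferedReader(new StringReader(rawRequest))) {
            in.readLine();                       // skip the method field
            int contentLength = 0;
            String field;
            // Header fields run up to the first empty line.
            while ((field = in.readLine()) != null && !field.isEmpty()) {
                int colon = field.indexOf(':');
                if (colon > 0
                        && field.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                    contentLength = Integer.parseInt(field.substring(colon + 1).trim());
                }
            }
            // Read exactly Content-Length characters (equal to bytes for ASCII data).
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < contentLength; i++) {
                int c = in.read();
                if (c < 0) break;                // request ended early
                body.append((char) c);
            }
            return body.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String request = "POST /dummy HTTP/1.0\r\n"
                + "Content-type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: 39\r\n"
                + "\r\n"
                + "surname=Carpenter&forenames=David+Bryan";
        System.out.println(readBody(request));
    }
}
```

Reading by count rather than by line matters because the posted data is not guaranteed to end with a line terminator.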
URL Encoding
• URL encoding is a method of wrapping up form-data in a way that will make a legal URL for a GET request.
• We have seen that the encoded data consists of a sequence of name=value pairs, separated by &.
• In the last example we saw that spaces are replaced by +.
• Most other non-alphanumeric characters are converted to the form %XX, where XX is a two digit hexadecimal code.
• In particular, line breaks in multi-line form data (e.g. addresses) become %0D%0A, the hex ASCII codes for a carriage-return, new-line sequence.
• URL encoding is somewhat redundant for the POST method, but it is the default anyway.
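The encoding rules can be tried out with the standard classes java.net.URLEncoder and java.net.URLDecoder, which implement this form encoding. The wrapper class UrlEncodingDemo below is hypothetical, and the choice of the UTF-8 character set is an assumption (the examples here are plain ASCII, so the choice does not affect them).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrapper, not part of the lecture code, around the standard
// URLEncoder and URLDecoder classes.
final class UrlEncodingDemo {
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);   // UTF-8 is always available
        }
    }

    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("David Bryan"));   // space becomes +
        System.out.println(encode("CSIT, FSU"));     // comma becomes %2C
        System.out.println(encode("line1\r\nline2")); // line break becomes %0D%0A
        System.out.println(decode("David+Bryan"));
    }
}
```

Note that only each name and each value is encoded separately; the = and & separators between them are added afterwards and are not themselves escaped.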
More Options for the input Tag
• We can make a group of radio buttons in an HTML form by using a set of input tags with the type attribute set to radio.
• Tags belonging to the same button group should have the same name attribute, and distinct value attributes, e.g.:
    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
      Favorite primary color: <p>
      Red: <input type=radio name=color value=red>
      Blue: <input type=radio name=color value=blue>
      Green: <input type=radio name=color value=green> <p>
      <input type=submit>
    </form>
Radio Buttons
• The message sent to the server is:
    ...
    Content-type: application/x-www-form-urlencoded
    Content-Length: 10
    <blank line>
    color=blue
Checkboxes
    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
      What pets do you own? <p>
      <input type=checkbox name=pets value=dog checked> Dog <br>
      <input type=checkbox name=pets value=cat> Cat <br>
      <input type=checkbox name=pets value=bird> Bird <br>
      <input type=checkbox name=pets value=fish> Fish <p>
      <input type=submit>
    </form>
• Example from “HTML and XHTML: The Definitive Guide”, O’Reilly.
Checkboxes
• The message posted to the server is:
    ...
    pets=dog&pets=bird
• Note there is no requirement that a form map a name to a unique value.
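Because one name can carry several values, a server-side program generally needs to collect a list of values per name rather than a single string. The following Java sketch shows one way to do that; the class name FormParameters is hypothetical, and for brevity the sketch assumes the names and values contain no characters that were escaped by URL encoding.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the lecture code: groups the values of
// repeated names in form data such as "pets=dog&pets=bird".
final class FormParameters {
    // Assumption: no %XX or + escapes are present, so no decoding is done.
    static Map<String, List<String>> parse(String formData) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : formData.split("&")) {
            if (pair.isEmpty()) continue;             // tolerate empty input
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Append to the list for this name, creating it on first use.
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("pets=dog&pets=bird"));
    }
}
```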
File-Selection
• You can name a local file in an input element, and have the entire contents of the file posted by browser to server.
• This is not allowed using the default URL-encoding for form data. Instead you must specify multi-part MIME encoding in the form element, e.g.:
    <form method=post enctype="multipart/form-data"
          action="http://sirah.csit.fsu.edu:8080/dummy">
      Course: <input name=course size=20> <p>
      Students file: <input type=file name=students size=32> <p>
      <input type=submit>
    </form>
File-Selection Entry
• With multi-part encoding, the data is no longer sent on a single line.
• On submission the DummyServer prints. . .
Output of DummyServer on submit
    POST /dummy HTTP/1.0
    Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html
    ...
    Content-type: multipart/form-data; boundary=---------------------------269912718414714
    Content-Length: 455
    <blank line>
    -----------------------------269912718414714
    Content-Disposition: form-data; name="course"
    <blank line>
    CIS6930
    -----------------------------269912718414714
    Content-Disposition: form-data; name="students"; filename="students"
    <blank line>
    wcao flora Fulay gao ... zhao6930 zheng
    -----------------------------269912718414714--
Remarks
• Each form field has its own section in the posted file, separated by a delimiter specified in the Content-type field of the header.
• Within each section there are one or more header lines, followed by a blank line, followed by the form data.
• The values can contain binary data. There is no “URL-encoding”.
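The section structure described above can be sketched in Java as follows. This is a simplified illustration, not a complete multi-part parser: the class name MultipartSketch is hypothetical, the body is assumed to be text with CR LF line endings (so binary file contents are not handled), and each section is assumed to carry a Content-Disposition header in which name="..." follows a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the lecture code: splits a textual
// multipart/form-data body into field name -> content.
final class MultipartSketch {
    static Map<String, String> parse(String body, String boundary) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Each delimiter line is the boundary prefixed by two extra dashes.
        String delimiter = "--" + boundary;
        for (String part : body.split(Pattern.quote(delimiter))) {
            // Skip the empty text before the first delimiter and the
            // trailing "--" that marks the final delimiter.
            if (part.isEmpty() || part.startsWith("--")) continue;
            int blank = part.indexOf("\r\n\r\n");     // blank line ends the section headers
            if (blank < 0) continue;
            String headers = part.substring(0, blank);
            String content = part.substring(blank + 4);
            if (content.endsWith("\r\n")) {           // line break before the next delimiter
                content = content.substring(0, content.length() - 2);
            }
            int n = headers.indexOf("; name=\"");
            if (n < 0) continue;
            int start = n + "; name=\"".length();
            int end = headers.indexOf('"', start);
            if (end < 0) continue;
            fields.put(headers.substring(start, end), content);
        }
        return fields;
    }

    public static void main(String[] args) {
        String b = "XYZ";  // illustrative boundary, much shorter than a real one
        String body = "--" + b + "\r\n"
                + "Content-Disposition: form-data; name=\"course\"\r\n\r\n"
                + "CIS6930\r\n"
                + "--" + b + "--\r\n";
        System.out.println(parse(body, b));
    }
}
```

A production parser would work on bytes rather than strings, since the section contents may be binary.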
Masked and Hidden fields
• The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen.
• If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts.
• Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms: hidden fields may contain values characterizing the session.
• Use of hidden fields will be one of the topics in the lectures on servlets.
Text Areas
• Similar to text input fields, but allow multi-line input.
• Included in a form by using the textarea tag, e.g.:
    <textarea name=address cols=40 rows=3>
    . . . optional default text goes here . . .
    </textarea>
• With default (URL) encoding, lines of input are separated by carriage return/newline, coded as %0D%0A.
Text Area Input
• Data posted to server:
    address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120
Scrollable Menus (Lists)
• For long lists of options, when checkboxes become too tedious:
    <select name=pets size=3 multiple>
      <option value=dog> Dog
      <option value=cat> Cat
      <option value=bird> Bird
      <option value=fish> Fish
    </select>
• The value attribute in the option tag is optional: default value returned is the displayed string, immediately following the tag.
• Without the multiple attribute, only a single option can be selected.
List Input
• The message posted to the server is:
    ...
    pets=dog&pets=bird
Conventional CGI
Handling Form Data on the Server
• In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy.
• A common server convention is that these executables live in a subdirectory of cgi-bin/
• The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script.
• The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script.
• The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser.
Operation of a CGI Script
• At the most basic level, a CGI script must
  • Parse the input (the form data) from the server, and
  • Generate a response.
• Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers.
• In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically.
• Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message.
• Otherwise the server will not close the connection to the client, and a browser error will occur.
“Hello World” CGI Script
• In the directory /home/httpd/cgi-bin/users/dbc on sirah, I create the file hello.pl, with contents:
    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello World!</h1></body></html>" ;
• I mark this file world readable, and mark it executable:
    sirah$ chmod o+r hello.pl
    sirah$ chmod +x hello.pl
• Now I point my browser at the URL: http://sirah/cgi-bin/users/dbc/hello.pl
Output from CGI Script
• The novel feature here is that the HTML was dynamically generated: it was printed out on the fly by the Perl script.
Retrieving Form Data
• Several environment variables are set up by the server to pass information about the request to the Perl script.
• If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character.
• If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script.
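Since a CGI executable may be written in any language, the same retrieval logic can be sketched in Java. The class name CgiEcho and its method names are hypothetical. The sketch also consults the REQUEST_METHOD environment variable, which is part of the CGI convention but is not mentioned on these slides, and it assumes the posted data is ASCII. The escaping of &, < and > before echoing is an addition for safety, not something the lecture examples do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical CGI-style program, not part of the lecture code.
final class CgiEcho {
    // Returns the raw form data: QUERY_STRING for GET, standard input for POST.
    static String formData(String requestMethod, String queryString,
                           String contentLength, InputStream stdin) {
        if (!"POST".equalsIgnoreCase(requestMethod)) {
            return queryString == null ? "" : queryString;
        }
        int length = contentLength == null ? 0 : Integer.parseInt(contentLength.trim());
        byte[] data = new byte[length];
        int read = 0;
        try {
            // Read exactly CONTENT_LENGTH bytes, or stop early at end of input.
            while (read < length) {
                int n = stdin.read(data, read, length - read);
                if (n < 0) break;
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(data, 0, read, StandardCharsets.US_ASCII);
    }

    // Builds the response: the Content-type header, an empty line, then HTML.
    static String respond(String formData) {
        String safe = formData.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace(">", "&gt;");
        return "Content-type: text/html\n\n"
                + "<html><body><h1>Hello " + safe + "!</h1></body></html>\n";
    }

    public static void main(String[] args) {
        String data = formData(System.getenv("REQUEST_METHOD"),
                               System.getenv("QUERY_STRING"),
                               System.getenv("CONTENT_LENGTH"),
                               System.in);
        System.out.print(respond(data));
    }
}
```

Starting a Java Virtual Machine for every request is costly, so this is meant only to make the mechanism concrete, not as a recommended way to deploy CGI programs.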
GET example
• I change our first form to submit data to a CGI script:
    <form method=get action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl">
      Name: <input type=text name=who size=32> <p>
      <input type=submit>
    </form>
  and define getEg.pl by:
    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello $ENV{QUERY_STRING}!</h1></body></html>\n" ;
• When I point the browser at the form, enter my name, and submit the form, the page returned to the browser contains the message:
    Hello who=Bryan!
POST example
• Change the form as follows:
    <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl">
      Name: <input type=text name=who size=32> <p>
      <input type=submit>
    </form>
  and define postEg.pl by:
    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    # Read CONTENT_LENGTH characters of posted data from standard input.
    for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) {
        $in .= getc ;
    }
    print "<html><body><h1>Hello $in!</h1></body></html>\n" ;
Using the CGI module
• The previous examples illustrate the underlying mechanisms used to communicate between server and CGI program.
• One could go on to use the text processing features of Perl to parse the form data and generate meaningful responses.
• In modern Perl you can (and presumably should) use the CGI module to hide many of these details, especially extracting form parameters.
CGI module example
• Change the form as follows:
    <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl">
      Name: <input type=text name=who size=32> <p>
      <input type=submit>
    </form>
  and define CGIEg.pl by:
    #!/usr/bin/perl
    use CGI qw( :standard) ;
    $name = param("who") ;
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello $name!</h1></body></html>\n" ;
• Now the browser gets a more friendly message like:
    Hello Bryan!
Getting Started with Servlets
Server Software
• Standard Web servers typically need some additional software to allow them to run servlets. Options include:
  • Apache Tomcat: the official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server.
  • JavaServer Web Development Kit (JSWDK): a small standalone Web server mainly intended for servlet development.
  • Sun’s Java Web server: an early server supporting servlets. Now apparently obsolete.
  • Allaire JRun, New Atlanta’s ServletExec, . . .
Tomcat
• In these lectures we will use Apache Tomcat for examples.
• For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing.
• The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish that your classes are working smoothly before they are deployed in a production server.
• Hence you will be encouraged to install your own private server for developing Web applications.
• Tomcat is the flagship product of the Jakarta project, which produces server software based on Java.
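Typical Modes of Operation of Tomcat
[Diagram: three client/server configurations, each with a browser (client) sending a servlet request to a server]
1. Stand-alone: the browser sends the servlet request directly to Tomcat on port 8080.
2. In-process servlet container: the browser sends the servlet request to Apache on port 80; Tomcat runs inside the Apache server.
3. Out-of-process servlet container: the browser sends the servlet request to Apache on port 80; Apache passes it to a separate Tomcat process on port 8007.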