
CIS 5930-04 – Spring 2001

Part 6: Introduction to CGI and Servlets. CIS 5930-04 – Spring 2001. http://aspen.csit.fsu.edu/it1spring01 Instructors: Geoffrey Fox , Bryan Carpenter Computational Science and Information Technology Florida State University Acknowledgements: Nancy McCracken


Presentation Transcript


  1. Part 6: Introduction to CGI and Servlets CIS 5930-04 – Spring 2001 http://aspen.csit.fsu.edu/it1spring01 Instructors: Geoffrey Fox , Bryan Carpenter Computational Science and Information Technology Florida State University Acknowledgements: Nancy McCracken Syracuse University dbc@csit.fsu.edu

  2. Introduction • RMI gave us one approach to client/server programming. • The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading. • We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web. • RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if we actually only need to interact with users through Web pages. • For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client.

  3. HTML Forms and CGI • There are long-established techniques for getting information from users through Web browsers (predating the appearance of Java on the Web). • The FORM element of HTML can contain a variety of input fields. • The input data is harvested by the browser, suitably encoded, and forwarded to the Web server. • On the server side, the Web server is configured to execute an arbitrary program that processes the user’s form inputs. • This program typically outputs a dynamically generated HTML document containing an appropriate response to the user’s input. • The server-side mechanism is called CGI: Common Gateway Interface.

  4. CGI and Servlets • In conventional CGI, a Web site developer writes the executable programs that process form inputs in a language such as Perl or C. • The program (or script) is executed once each time a form is submitted. • Servlets provide a more modern, Java-centric approach. • The server incorporates a Java Virtual Machine, which is running continuously. • Invocation of a CGI script is replaced by invocation of a method on a servlet object.

  5. Advantages of Servlets • Invocation of a single Java method is typically much cheaper than starting a whole new program, so servlets are typically more efficient than CGI scripts. • This is important if we are planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script). • Besides this we have the usual advantages of Java: • Portability. • A fully object-oriented environment for large-scale program development. • Library infrastructure for decoding form data, handling cookies, etc. (although many of these things are also available in Perl). • Servlets are the foundation for Java Server Pages.

  6. Plan of this Lecture Set • Review HTML forms and associated HTTP requests. • Briefly describe traditional CGI programming. • Detailed discussion of Java servlets: • Deploying Tomcat as a standalone Web server. • Simple servlets. • The servlet life cycle. • Servlet requests and responses. More on the HTTP protocol. • Approaches to session tracking. Handling cookies. • The servlet session-tracking API.

  7. References • Core Servlets and JavaServer Pages, Marty Hall, Prentice Hall, 2000. • Good coverage and current, with some discussion of the Tomcat server. • Java Servlet Programming, Jason Hunter and William Crawford, O’Reilly, 1998. • Also good, with some good examples. Slightly out of date. • Java Servlet Specification, v2.2, and other documents, at: http://java.sun.com/products/servlet/

  8. HTML Forms

  9. The HTTP GET request • Before discussing forms, let’s look again at how the GET request normally works. • The following server program listens for HTTP requests, and simply prints the received request to the console.

  10. A Dummy Web Server
      import java.io.* ;
      import java.net.* ;

      public class DummyServer {
        public static void main(String [] args) throws Exception {
          ServerSocket server = new ServerSocket(8080) ;
          while(true) {
            Socket sock = server.accept() ;
            BufferedReader in = new BufferedReader(
                new InputStreamReader(sock.getInputStream())) ;
            // First line of the request: the method field.
            String method = in.readLine() ;
            System.out.println(method) ;
            // Remaining header fields, ended by an empty line.
            while(true) {
              String field = in.readLine() ;
              if(field == null) break ;   // client closed the connection early
              System.out.println(field) ;
              if(field.length() == 0) break ;
            }
            . . . Send a dummy response to client socket . . .
          }
        }
      }

  11. A GET Request
      • On the host sirah I run the dummy server:
        sirah$ java DummyServer
      • Now I point a browser at http://sirah.csit.fsu.edu:8080/index.html
      • The dummy server program might print:
        GET /index.html HTTP/1.0
        Connection: Keep-Alive
        User-Agent: Mozilla/4.51 [en] (X11; I; ...)
        Host: sirah.csit.fsu.edu:8080
        Accept: image/gif, ..., */*
        Accept-Encoding: gzip
        Accept-Language: en
        Accept-Charset: iso-8859-1,*,utf-8
        <blank line>

  12. Fields of the GET request • The HTTP GET request consists of a series of text fields on separate lines, ended by an empty line. • The first line is the most important: it is called the method field. • In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server.
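As a rough illustration of the method field described above, the following sketch splits the first line of a request into its three tokens. The class name `RequestLine` and its field names are hypothetical, chosen only for this example; it assumes the tokens are separated by whitespace, as in the requests printed by the dummy server.

```java
// Sketch only: RequestLine is a hypothetical helper, not part of any library.
final class RequestLine {
    final String method;   // e.g. GET or POST
    final String target;   // e.g. /index.html, relative to the document root
    final String version;  // e.g. HTTP/1.0

    RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Parses a line such as "GET /index.html HTTP/1.0".
    static RequestLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.0");
        System.out.println(r.method + " | " + r.target + " | " + r.version);
    }
}
```

In the dummy server, the string passed to `parse` would be the value read into the `method` variable.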

  13. A Simple HTML Form
      • The form element includes one or more input elements, along with any normal HTML terms:
        <html>
        <body>
        <form method=get action="http://sirah.csit.fsu.edu:8080/dummy">
          Name: <input type=text name=who size=32> <p>
          <input type=submit>
        </form>
        </body>
        </html>

  14. Remarks • The form tag includes important attributes method and action. • The method attribute defines the kind of HTTP request sent when the form is submitted: its value can be get or post (see later). • The action attribute is a URL. In normal use it will locate an executable program on the server. In this case it is a reference to my “dummy server”. • An input tag with type attribute text represents a text input field. • An input tag with type attribute submit represents a “submit” button.

  15. Displaying the Form • If I place this HTML document on a Web Server at a suitable location, and visit its URL with a browser, I see something like: [screenshot of the rendered form not reproduced in this transcript]

  16. Submitting the Form
      • If I type my name, and click on the “Submit Query” button, the dummy server running on sirah prints:
        GET /dummy?who=Bryan HTTP/1.0
        Connection: Keep-Alive
        User-Agent: Mozilla/4.51 [en] (X11; I; ...)
        Host: sirah.csit.fsu.edu:8080
        Accept: image/gif, ..., */*
        Accept-Encoding: gzip
        Accept-Language: en
        Accept-Charset: iso-8859-1,*,utf-8
        <blank line>

  17. Remarks • When the form specifying the get method is submitted, the values entered by the user are effectively appended to the end of the URL specified in the action attribute. • In the HTTP GET request (sent when the submit button is pressed) they appear attached to the second token of the first line of the request. • In simple cases the appended string begins with a ? • This is followed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user. • If the form has multiple input fields, the pairs are separated by &
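The rules above can be sketched in a few lines of Java. This is a minimal illustration, not a complete parser: the class name `QueryString` is hypothetical, values are left in their encoded form, and if a name is repeated only the last value is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: QueryString is a hypothetical helper for this example.
final class QueryString {
    // Returns the name=value pairs that follow the first '?' in the
    // second token of the request line, in the order they appear.
    static Map<String, String> parse(String target) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int q = target.indexOf('?');
        if (q < 0) {
            return pairs;  // no form data attached to the URL
        }
        for (String pair : target.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            pairs.put(name, value);  // a repeated name overwrites the earlier value
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("/dummy?who=Bryan"));
    }
}
```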

  18. POST requests
      • This method of attaching input data to the URL is handy if the user has a relatively simple query (e.g. for a search engine).
      • For more complex forms it is usually recommended to specify the post method in the form tag, e.g.:
        <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
      • In the HTTP protocol, a POST request differs from a GET request by having some data appended after the headers.

  19. A Form Using the POST Method
        <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
          Surname: <input type=text name=surname size=32> <p>
          Forenames: <input type=text name=forenames size=40> <p>
          <input type=submit>
        </form>

  20. Extending the Dummy Server
      • We can modify the dummy server to display POST requests, by declaring a variable contentLength, adding the lines
        // Case-insensitive prefix test; safe for header lines shorter than 16 characters.
        if(field.regionMatches(true, 0, "Content-Length: ", 0, 16))
          contentLength = Integer.parseInt(field.substring(16).trim()) ;
      inside the loop that reads the headers, and adding
        for(int i = 0 ; i < contentLength ; i++) {
          int b = in.read() ;
          System.out.print((char) b) ;
        }
        System.out.println() ;
      after that loop.

  21. Submitting the Form
      • When I click on the “Submit Query” button, the dummy server prints:
        POST /dummy HTTP/1.0
        Connection: Keep-Alive
        User-Agent: Mozilla/4.51 [en] (X11; I; ...)
        Host: sirah.csit.fsu.edu:8080
        Accept: image/gif, ..., */*
        Accept-Encoding: gzip
        Accept-Language: en
        Accept-Charset: iso-8859-1,*,utf-8
        Content-type: application/x-www-form-urlencoded
        Content-Length: 39
        <blank line>
        surname=Carpenter&forenames=David+Bryan

  22. Remarks • The method field (the first line) now starts with the word POST instead of GET; the data is not appended to the URL. • There are a couple more fields in the header, describing the format and length of the data. • Most importantly, the form data is now on a separate line at the end of the request, after the blank line that ends the headers. • However, the form data is still URL-encoded.
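To make the layout of a POST request concrete, here is a small sketch that assembles one as text and computes the Content-Length from the body. The class name `PostRequestSketch` and the `build` method are hypothetical; only a few of the headers a browser would send are included.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: builds the text of a POST request like the one the dummy server printed.
final class PostRequestSketch {
    static String build(String path, String host, String body) {
        // URL-encoded form data is plain ASCII, so one character is one byte.
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Content-type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"      // the blank line that ends the headers
             + body;       // form data follows the headers
    }

    public static void main(String[] args) {
        System.out.println(build("/dummy", "sirah.csit.fsu.edu:8080",
                "surname=Carpenter&forenames=David+Bryan"));
    }
}
```

For the form data shown on the previous slide, the computed length agrees with the `Content-Length: 39` header that the browser sent.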

  23. URL Encoding • URL encoding is a method of wrapping up form data in a way that will make a legal URL for a GET request. • We have seen that the encoded data consists of a sequence of name=value pairs, separated by &. • In the last example we saw that spaces are replaced by +. • Most other non-alphanumeric characters are converted to the form %XX, where XX is a two-digit hexadecimal code. • In particular, line breaks in multi-line form data (e.g. addresses) become %0D%0A, the hex ASCII codes for a carriage-return, new-line sequence. • URL encoding is somewhat redundant for the POST method, but it is the default anyway.
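The standard Java library can perform this encoding and decoding through `java.net.URLEncoder` and `java.net.URLDecoder`. The sketch below wraps them in a hypothetical helper class, `UrlEncodingSketch`; the choice of UTF-8 as the character set is an assumption of this example (it makes no difference for the ASCII data used here).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Sketch only: thin wrappers around the standard encoder and decoder.
final class UrlEncodingSketch {
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always supported
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A space becomes '+', a comma becomes %2C, CR LF becomes %0D%0A.
        System.out.println(encode("David Bryan"));
        System.out.println(encode("CSIT, FSU\r\nTallahassee"));
        System.out.println(decode("surname%3DCarpenter"));
    }
}
```

Note that each name and each value is encoded separately; the `=` and `&` separators are added afterwards, which is why an encoded `=` inside a value appears as `%3D`.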

  24. More Options for the input Tag
      • We can make a group of radio buttons in an HTML form by using a set of input tags with the type attribute set to radio.
      • Tags belonging to the same button group should have the same name attribute, and distinct value attributes, e.g.:
        <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
          Favorite primary color: <p>
          Red: <input type=radio name=color value=red>
          Blue: <input type=radio name=color value=blue>
          Green: <input type=radio name=color value=green> <p>
          <input type=submit>
        </form>

  25. Radio Buttons
      • The message sent to the server is:
        ...
        Content-type: application/x-www-form-urlencoded
        Content-Length: 10
        <blank line>
        color=blue

  26. Checkboxes
        <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
          What pets do you own? <p>
          <input type=checkbox name=pets value=dog checked> Dog <br>
          <input type=checkbox name=pets value=cat> Cat <br>
          <input type=checkbox name=pets value=bird> Bird <br>
          <input type=checkbox name=pets value=fish> Fish <p>
          <input type=submit>
        </form>
      • Example from “HTML and XHTML: The Definitive Guide”, O’Reilly.

  27. Checkboxes • The message posted to the server is: ... pets=dog&pets=bird • Note there is no requirement that a form map a name to a unique value.
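Because one name can carry several values, code that processes form data usually collects a list of values per name. A minimal sketch of that idea follows; the class name `MultiValueForm` is hypothetical, and the values are left URL-encoded for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: groups the values of repeated names such as pets=dog&pets=bird.
final class MultiValueForm {
    static Map<String, List<String>> parse(String formData) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        for (String pair : formData.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Append to the list for this name, creating the list on first use.
            values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("pets=dog&pets=bird"));
    }
}
```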

  28. File-Selection
      • You can name a local file in an input element, and have the entire contents of the file posted by browser to server.
      • This is not allowed using the default URL-encoding for form data. Instead you must specify multi-part MIME encoding in the form element, e.g.:
        <form method=post enctype="multipart/form-data" action="http://sirah.csit.fsu.edu:8080/dummy">
          Course: <input name=course size=20> <p>
          Students file: <input type=file name=students size=32> <p>
          <input type=submit>
        </form>

  29. File-Selection Entry • With multi-part encoding, the data is no longer sent on a single line. • On submission the DummyServer prints. . .

  30. Output of DummyServer on submit
        POST /dummy HTTP/1.0
        Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html
        ...
        Content-type: multipart/form-data; boundary=---------------------------269912718414714
        Content-Length: 455
        <blank line>
        -----------------------------269912718414714
        Content-Disposition: form-data; name="course"
        <blank line>
        CIS6930
        -----------------------------269912718414714
        Content-Disposition: form-data; name="students"; filename="students"
        <blank line>
        wcao flora Fulay gao ... zhao6930 zheng
        -----------------------------269912718414714--

  31. Remarks • Each form field has its own section in the posted file, separated by a delimiter specified in the Content-type field of the header. • Within each section there are one or more header lines, followed by a blank line, followed by the form data. • The values can contain binary data. There is no “URL-encoding”.
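The structure described above can be picked apart with simple string operations. The sketch below is a simplification under several assumptions: the class `MultipartSketch` and its methods are hypothetical, the body is handled as a `String` (so it is not suitable for binary file contents), lines are assumed to end in CR LF, and the delimiter text is assumed not to occur inside the data. In the message body each delimiter line is the boundary from the Content-type header preceded by two extra hyphens, and the final delimiter has two more hyphens appended.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits a multipart/form-data body into its sections.
final class MultipartSketch {
    // Returns each section (header lines, blank line, data) as raw text.
    static List<String> sections(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> result = new ArrayList<>();
        int start = body.indexOf(delimiter);
        while (start >= 0) {
            int contentStart = start + delimiter.length();
            if (body.startsWith("--", contentStart)) {
                break;  // closing delimiter: no more sections
            }
            int next = body.indexOf(delimiter, contentStart);
            if (next < 0) {
                break;  // truncated message
            }
            result.add(stripCrlf(body.substring(contentStart, next)));
            start = next;
        }
        return result;
    }

    // Returns the form data of one section: everything after the blank line.
    static String data(String section) {
        int blank = section.indexOf("\r\n\r\n");
        return blank < 0 ? "" : section.substring(blank + 4);
    }

    // Removes the line break after a delimiter line and the one before the next.
    private static String stripCrlf(String s) {
        if (s.startsWith("\r\n")) {
            s = s.substring(2);
        }
        if (s.endsWith("\r\n")) {
            s = s.substring(0, s.length() - 2);
        }
        return s;
    }

    public static void main(String[] args) {
        String body = "--XYZ\r\n"
                + "Content-Disposition: form-data; name=\"course\"\r\n\r\n"
                + "CIS6930\r\n"
                + "--XYZ--\r\n";
        for (String section : sections(body, "XYZ")) {
            System.out.println(data(section));
        }
    }
}
```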

  32. Masked and Hidden fields • The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen. • If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts. • Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms: hidden fields may contain values characterizing the session. • Use of hidden fields will be one of the topics in the lectures on servlets.
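As a rough sketch of how a server-side program might generate a form that carries session information in a hidden field, consider the following. Everything here is an assumption made for illustration: the class `HiddenFieldSketch`, the field name `session`, and the idea that the session is identified by a single string.

```java
// Sketch only: generates a follow-up form that carries a session value
// back to the server in a hidden field.
final class HiddenFieldSketch {
    // Escapes characters that would otherwise break out of a quoted attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    static String nextForm(String action, String sessionId) {
        return "<form method=post action=\"" + escape(action) + "\">\n"
             + "<input type=hidden name=session value=\"" + escape(sessionId) + "\">\n"
             + "Name: <input type=text name=who size=32> <p>\n"
             + "<input type=submit>\n"
             + "</form>\n";
    }

    public static void main(String[] args) {
        System.out.print(nextForm("http://sirah.csit.fsu.edu:8080/dummy", "abc123"));
    }
}
```

When the user submits this form, the posted data would include `session=abc123` alongside the visible fields, letting the server associate the submission with the earlier steps of the interaction.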

  33. Text Areas
      • Similar to text input fields, but allow multi-line input.
      • Included in a form by using the textarea tag, e.g.:
        <textarea name=address cols=40 rows=3>
        . . . optional default text goes here . . .
        </textarea>
      • With default (URL) encoding, lines of input are separated by carriage return/newline, coded as %0D%0A.

  34. Text Area Input • Data posted to server: address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120

  35. Scrollable Menus (Lists)
      • For long lists of options, when checkboxes become too tedious:
        <select name=pets size=3 multiple>
          <option value=dog> Dog
          <option value=cat> Cat
          <option value=bird> Bird
          <option value=fish> Fish
        </select>
      • The value attribute in the option tag is optional: the default value returned is the displayed string, immediately following the tag.
      • Without the multiple attribute, only a single option can be selected.

  36. List Input • The message posted to the server is: ... pets=dog&pets=bird

  37. Conventional CGI

  38. Handling Form Data on the Server • In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy. • A common server convention is that these executables live in a subdirectory of cgi-bin/ • The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script. • The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script. • The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser.

  39. Operation of a CGI Script • At the most basic level, a CGI script must • Parse the input (the form data) from the server, and • Generate a response. • Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers. • In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically. • Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message. • Otherwise the server will not close the connection to the client, and a browser error will occur.

  40. “Hello World” CGI Script
      • In the directory /home/httpd/cgi-bin/users/dbc on sirah, I create the file hello.pl, with contents:
        #!/usr/bin/perl
        print "Content-type: text/html\n\n" ;
        print "<html><body><h1>Hello World!</h1></body></html>" ;
      • I mark this file world readable, and mark it executable:
        sirah$ chmod o+r hello.pl
        sirah$ chmod +x hello.pl
      • Now I point my browser at the URL: http://sirah/cgi-bin/users/dbc/hello.pl

  41. Output from CGI Script • The novel feature here is that the HTML was dynamically generated: it was printed out on the fly by the Perl script.

  42. Retrieving Form Data • Several environment variables are set up by the server to pass information about the request to the Perl script. • If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character. • If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script.
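The lectures use Perl for CGI, but since the executable may be written in any language, the same retrieval logic can be sketched in Java. This is an illustration under assumptions: the class `CgiInputSketch` is hypothetical, and it relies on the standard CGI variable REQUEST_METHOD (not mentioned on the slide) to decide between the two cases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch only: retrieves raw form data the way a CGI program would.
final class CgiInputSketch {
    static String formData(Map<String, String> env, InputStream stdin) {
        if (!"POST".equalsIgnoreCase(env.get("REQUEST_METHOD"))) {
            // GET: the data is the text following the first '?' in the URL.
            String query = env.get("QUERY_STRING");
            return query == null ? "" : query;
        }
        // POST: read exactly CONTENT_LENGTH bytes from standard input.
        String lengthText = env.get("CONTENT_LENGTH");
        int length = lengthText == null ? 0 : Integer.parseInt(lengthText.trim());
        byte[] data = new byte[length];
        int read = 0;
        try {
            while (read < length) {
                int n = stdin.read(data, read, length - read);
                if (n < 0) {
                    break;  // input ended before CONTENT_LENGTH bytes arrived
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(data, 0, read, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String data = formData(System.getenv(), System.in);
        // The Content-type header, a blank line, then the generated page.
        // The data is echoed unescaped here, as in the Perl examples.
        System.out.print("Content-type: text/html\n\n");
        System.out.println("<html><body><h1>Hello " + data + "!</h1></body></html>");
    }
}
```

Passing the environment and input stream as parameters, rather than reading `System.getenv()` and `System.in` directly inside `formData`, keeps the logic easy to try out without a Web server.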

  43. GET example
      • I change our first form to submit data to a CGI script:
        <form method=get action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl">
          Name: <input type=text name=who size=32> <p>
          <input type=submit>
        </form>
      and define getEg.pl by:
        #!/usr/bin/perl
        print "Content-type: text/html\n\n" ;
        print "<html><body><h1>Hello $ENV{QUERY_STRING}!</h1></body></html>\n" ;
      • When I point the browser at the form, enter my name, and submit the form, the page returned to the browser contains the message: Hello who=Bryan!

  44. POST example
      • Change the form as follows:
        <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl">
          Name: <input type=text name=who size=32> <p>
          <input type=submit>
        </form>
      and define postEg.pl by:
        #!/usr/bin/perl
        print "Content-type: text/html\n\n" ;
        # Read CONTENT_LENGTH bytes of posted data from standard input.
        for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) {
          $in .= getc ;
        }
        print "<html><body><h1>Hello $in!</h1></body></html>\n" ;

  45. Using the CGI module • The previous examples illustrate the underlying mechanisms used to communicate between server and CGI program. • One could go on to use the text processing features of Perl to parse the form data and generate meaningful responses. • In modern Perl you can (and presumably should) use the CGI module to hide many of these details, especially extracting form parameters.

  46. CGI module example
      • Change the form as follows:
        <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl">
          Name: <input type=text name=who size=32> <p>
          <input type=submit>
        </form>
      and define CGIEg.pl by:
        #!/usr/bin/perl
        use CGI qw( :standard) ;
        $name = param("who") ;
        print "Content-type: text/html\n\n" ;
        print "<html><body><h1>Hello $name!</h1></body></html>\n" ;
      • Now the browser gets a more friendly message like: Hello Bryan!

  47. Getting Started with Servlets

  48. Server Software
      • Standard Web servers typically need some additional software to allow them to run servlets. Options include:
      • Apache Tomcat: the official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server.
      • JavaServer Web Development Kit (JSWDK): a small standalone Web server mainly intended for servlet development.
      • Sun’s Java Web Server: an early server supporting servlets. Now apparently obsolete.
      • Allaire JRun, New Atlanta’s ServletExec, . . .

  49. Tomcat • In these lectures we will use Apache Tomcat for examples. • For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing. • The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish that your classes are working smoothly before they are deployed in a production server. • Hence you will be encouraged to install your own private server for developing Web applications. • Tomcat is the flagship product of the Jakarta project, which produces server software based on Java.

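  50. Typical Modes of Operation of Tomcat
      [Diagram: three client/server configurations.]
      1. Stand-alone: the browser sends servlet requests directly to Tomcat on port 8080.
      2. In-process servlet container: the browser sends servlet requests to Apache on port 80, with Tomcat running inside the Apache server.
      3. Out-of-process servlet container: the browser sends servlet requests to Apache on port 80, which passes them to a separate Tomcat process on port 8007.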
