340 likes | 479 Views
Web services. Clients and servers communicate using the HyperText Transfer Protocol (HTTP) Current version is HTTP/1.1 RFC 2616, June, 1999. HTTP request. web client (browser). web server. HTTP response. Static and dynamic content.
E N D
Web services • Clients and servers communicate using the HyperText Transfer Protocol (HTTP) • Current version is HTTP/1.1 • RFC 2616, June, 1999. HTTP request web client (browser) web server HTTP response
Static and dynamic content • The content returned in HTTP responses can be either static or dynamic. • Static content: • content stored in files and retrieved in response to an HTTP request • HTML files • images • audio clips • Dynamic content: • content produced on-the-fly in response to an HTTP request • usually by CGI process executed by the server.
Backus-Naur Form (BNF) • Developed in late 50’s for FORTRAN • BNF is a way to write grammars that specify computer languages, protocols, and command line arguments to programs (and many others) • BNF describes the legal strings for that language or protocol • The HTTP spec uses the following BNF constructs (among others): • “literal” • literal text • rule1 | rule2 • rule 1 or rule 2 is allowed
BNF (cont) • HTTP BNF constructs • (rule1 rule2) • sequence of elements: rule1 followed by rule2 • (elem (foo | bar) elem) • “elem foo elem” or “elem bar elem” are allowed. • <n>*<m>(element) • at least n and at most m occurences of element. • default values are n=0 and m=infinity. • *(element) allows zero or more occurences of element. • 1(element) requires at least one element. • 1*2(element) allows one or two. • [rule] • rule is optional
BNF (cont) • Basic rules • OCTET = <any 8-bit sequence of data> • CHAR = <any ASCI I char> • ALPHA = <A...Z and a...z> • DIGIT = <0…9> • SP = <ASCII SPACE character (32)> • CRLF = <newline \n> • TEXT = <printable ASCII chars>
URIs and URLs • network resources are identified by Universal Resource Indicators (URIs) • The most familiar is the absolute URI known as the HTTP URL: • http-url = “http:” “//” host [“:” port] [abs_path] • port defaults to “80” • abs_path defaults to “/” • abs_path ending in / defaults to …/index.html • examples: • http://www.iiit.net:80/index.html • http://www.iiit.net/index.html • http://www.iiit.net
Uniform Resource Locators (URLs) • The internet name of the site containing the resource • The type of service (eg. HTTP, FTP) • The Internet portnumber of this service. • The location of the resource in the directory structure of the server.
http://www.address.edu:1234/path/subdir/file.ext | | | | | |service| | | | |____ host _____| | | | | | |port| | | file and | |_ resource details _|
HTTP/1.1 messages An HTTP message is either a Request or a Response: HTTP-message = Request | Response Requests and responses have the same basic form: generic-message = start-line *message-header CRLF [message body] start-line = Request-line | Status line message-header = field-name “:” [field value] CRLF message-body = <e.g., HTML file>
HTTP/1.1 requests Request = Method SP Request-URI SP HTTP-VERSION CRLF *(general-header | request-header | entity header) CRLF [ message-body ] • Method: tells the server what operation to perform • GET: retrieve file • OPTIONS: retrieve server and access capabilities • Request-URI: identifies the resource to manipulate • data file (HTML), executable file (CGI) • headers: parameterize the method • Accept-Language: en-us • User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98) • message-body: text characters
HTTP/1.1 responses Response = HTTP-Version SP Status-Code SP Reason-Phrase CRLF *(general-header | response-header | entity header) CRLF [ message-body ] • Status code: 3-digit number • Reason-Phrase: explanation of status code • headers: parameterize the response • Date: Thu, 22 Jul 1999 23:42:18 GMT • Server: Apache/1.2.5 BSDI3.0-PHP/FI-2.0 • Content-Type: text/html • message-body: • file
home cgi-bin html test.pl index.html foo.html How servers interpret Request-URIs • GET / HTTP/1.1 • resolves to home/html/index.html • action: retrieves index.html • GET /index.html ... • home/html/index.html • action: retrieves index.html • GET /foo.html ... • home/html/foo.html • action: retrieves foo.html • GET /cgi-bin/test.pl … • home/cgi-bin/test.pl • action: runs test.pl • GET http://www.iiit.net/index.html • home/html/index.html • action: retrieves index.html
Example HTTP/1.1 conversation gkgarg> telnet www.iiit.net 80 Trying 196.12.32.11... Connected to www.iiit.net. Escape character is '^]'. GET /index.htm HTTP/1.1 ;request line Host: www.iiit.net ;request hdr CRLF HTTP/1.1 200 OK ;status line Date: Wed, 06 Sep 2000 19:53:49 GMT ;response hdr Server: Apache/1.3.9 (Unix) (Red Hat/Linux) Last-Modified: Sat, 02 Sep 2000 06:42:15 GMT ETag: "3ab62-19ee-39b0a147" Accept-Ranges: bytes Content-Length: 6638 Content-Type: text/html <html> ;beginning of the message body 6638 bytes <head> <title>IIIT HOME</title> Request sent by client Response sent by server
MIME • WWW uses MIME (Multipurpose Internet Mail Extension) types to define the type of a particular piece of information being send from a web server to a browser which in turn determines how the data should be treated.
OPTIONS • Retrieves information about the server in general or resources on that server, without actually retrieving the resource. • Request URIs: • if request URI = “*”, then the request is about the server in general • Is the server up? • Is it HTTP/1.1 compliant? • What brand of server? • What OS is it running? • if request URI != “*”, then the request applies to the options that available when accessing that resource: • what methods can the client use to access the resource?
OPTIONS (www.iiit.net) gkgarg> telnet www.iiit.net 80 Trying 196.12.32.11... Connected to www.iiit.net. Escape character is '^]'. OPTIONS * HTTP/1.1 Host: www.iiit.net CRLF HTTP/1.1 200 OK Date: Wed, 06 Sep 2000 19:57:24 GMT Server: Apache/1.3.9 (Unix) (Red Hat/Linux) Content-Length: 0 Allow: GET, HEAD, OPTIONS, TRACE Host is a required header in HTTP/1.1 but not in HTTP/1.0 Request Response
OPTIONS (www.iitd.ernet.in) [vishal@mail vishal]$ telnet www.iitd.ernet.in 80 Trying 202.141.68.9... Connected to www.iitd.ernet.in. Escape character is '^]'. OPTIONS /cgi-bin/ HTTP/1.1 Host: www.iitd.ernet.in CRLF HTTP/1.1 200 OK Date: Wed, 06 Sep 2000 19:11:12 GMT Server: Apache/1.3.12 (Unix) (Red Hat/Linux) Content-Length: 0 Allow: GET, HEAD, POST, OPTIONS, TRACE
OPTIONS (microsoft.com) telnet microsoft.com 80 Trying 207.46.230.218... Connected to microsoft.com. Escape character is '^]'. OPTIONS * HTTP/1.1 Host: microsoft.com CRLF HTTP/1.1 200 OK Server: Microsoft-IIS/5.0 Date: Wed, 06 Sep 2000 19:11:57 GMT Content-Length: 0 Accept-Ranges: bytes DASL: <DAV:sql> DAV: 1, 2 Public: OPTIONS, TRACE, GET, HEAD, DELETE, PUT, POST, COPY, MOVE, MKCOL, PROPFIN D, PROPPATCH, LOCK, UNLOCK, SEARCH Allow: OPTIONS, TRACE, GET, HEAD, DELETE, PUT, POST, COPY, MOVE, MKCOL, PROPFIND , PROPPATCH, LOCK, UNLOCK, SEARCH Cache-Control: private
[vishal@mail vishal]$ telnet microsoft.com 80 Trying 207.46.230.219... Connected to microsoft.com. Escape character is '^]'. OPTIONS / HTTP/1.1 Host: microsoft.com CRLF HTTP/1.1 200 OK Server: Microsoft-IIS/5.0 Date: Thu, 07 Sep 2000 10:44:48 GMT MS-Author-Via: DAV Content-Length: 0 Accept-Ranges: none DASL: <DAV:sql> DAV: 1, 2 Public: OPTIONS, TRACE, GET, HEAD, DELETE, PUT, POST, COPY, MOVE, MKCOL, PROPFIN D, PROPPATCH, LOCK, UNLOCK, SEARCH Allow: OPTIONS, TRACE, GET, HEAD, COPY, PROPFIND, SEARCH, LOCK, UNLOCK Cache-Control: private
OPTIONS (amazon.com) [vishal@mail vishal]$ telnet amazon.com 80 Trying 208.216.181.15... Connected to amazon.com. Escape character is '^]'. OPTIONS / HTTP/1.0 CRLF HTTP/1.1 200 OK Date: Wed, 06 Sep 2000 19:16:41 GMT Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) Content-Length: 0 Allow: GET, HEAD, POST, OPTIONS, TRACE Connection: close
OPTIONS (netscape.com) [vishal@mail vishal]$ telnet www.netscape.com 80 Trying 64.236.8.10... Connected to www-jp.netscape.com. Escape character is '^]'. OPTIONS * HTTP/1.1 Host: www.netscape.com CRLF HTTP/1.1 200 OK Server: Netscape-Enterprise/3.6 SP1 Date: Thu, 07 Sep 2000 10:53:18 GMT Content-length: 0 Public: HEAD, GET, PUT, POST
OPTIONS (netscape.com) [vishal@mail vishal]$ telnet www.netscape.com 80 Trying 64.236.8.10... Connected to www-jp.netscape.com. Escape character is '^]'. OPTIONS /index.html HTTP/1.1 Host: www.netscape.com CRLF HTTP/1.1 200 OK Server: Netscape-Enterprise/3.6 SP1 Date: Thu, 07 Sep 2000 10:45:54 GMT Content-length: 0 Public: HEAD, GET, PUT, POST
GET • Retrieves the information identified by the request URI. • static content (HTML file) • dynamic content produced by CGI program • passes arguments to CGI program in URI • Can also act as a conditional retrieve when certain request headers are present: • If-Modified-Since • If-Unmodified-Since • If-Match • If-None-Match • If-Range • Conditional GETs useful for caching
GET (www.iiit.net) [vishal@mail vishal]$ telnet www.iiit.net 80 Trying 196.12.32.11... Connected to www.iiit.net. Escape character is '^]'. GET /index.htm HTTP/1.1 Host: www.iiit.net CRLF HTTP/1.1 200 OK Date: Thu, 07 Sep 2000 12:02:11 GMT Server: Apache/1.3.9 (Unix) (Red Hat/Linux) Last-Modified: Thu, 07 Sep 2000 12:00:36 GMT ETag: "3ab62-1a38-39b78364" Accept-Ranges: bytes Content-Length: 6712 Content-Type: text/html <html> <head> <title>IIIT HOME</title>
GET request to euro.ecom(Internet Explorer browser) GET /test.html HTTP/1.1 Accept: */* Accept-Language: en-us Accept-Encoding: gzip, deflate User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98) Host: euro.ecom.cmu.edu Connection: Keep-Alive CRLF
GET response from euro.ecom HTTP/1.1 200 OK Date: Thu, 22 Jul 1999 04:02:15 GMT Server: Apache/1.3.3 Ben-SSL/1.28 (Unix) Last-Modified: Thu, 22 Jul 1999 03:33:21 GMT ETag: "48bb2-4f-37969101" Accept-Ranges: bytes Content-Length: 79 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Content-Type: text/html CRLF <html> <head><title>Test page</title></head> <body> <h1>Test page</h1> </html>
GET request to euro.ecom (Netscape browser) GET /test.html HTTP/1.0 Connection: Keep-Alive User-Agent: Mozilla/4.06 [en] (Win98; I) Host: euro.ecom.cmu.edu Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */* Accept-Encoding: gzip Accept-Language: en Accept-Charset: iso-8859-1,*,utf-8 CRLF
GET response from euro.ecom HTTP/1.1 200 OK Date: Thu, 22 Jul 1999 06:34:42 GMT Server: Apache/1.3.3 Ben-SSL/1.28 (Unix) Last-Modified: Thu, 22 Jul 1999 03:33:21 GMT ETag: "48bb2-4f-37969101" Accept-Ranges: bytes Content-Length: 79 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Content-Type: text/html CRLF <html> <head><title>Test page</title></head> <body> <h1>Test page</h1> </html>
HEAD • Returns same response header as a GET request would have... • But doesn’t actually carry out the request. • Some servers don’t implement this properly. • example: espn.com • Useful for applications that • check for valid and broken links in Web pages. • check Web pages for modifications.
Head (netscape.com) [vishal@mail vishal]$ telnet www.netscape.com 80 Trying 64.236.8.10... Connected to www-jp.netscape.com. Escape character is '^]'. HEAD / HTTP/1.1 Host: www.netscape.com HTTP/1.1 200 OK Server: Netscape-Enterprise/3.6 Date: Thu, 07 Sep 2000 11:14:02 GMT Set-Cookie: UIDC=196.12.44.3:0968325242:189082; domain=.netscape.com;path=/;expir es=31-Dec-2010 23:59:59 GMT Content-type: text/html Connection: close
HEAD (espn.com) [vishal@mail vishal]$ telnet espn.com 80 Trying 204.202.136.31... Connected to espn.com. Escape character is '^]'. HEAD / HTTP/1.1 Host: www.espn.com CRFL HTTP/1.1 301 Document Moved Server: Microsoft-IIS/4.0 Date: Thu, 07 Sep 2000 11:11:00 GMT Location: http://espn.go.com Content-Type: text/html <html> Is now part of the http://espn.go.com service<br> </html> Modern browsers transparently connect to the new espn.go.com location
POST • Another technique for producing dynamic content. • Executes program identified in request URI (the CGI program). • Passes arguments to CGI program in the message body • unlike GET, which passes the arguments in the URI itself. • Responds with output of the CGI program.
TRACE • returns contents of request header in response message body. • HTTP’s version of an echo server. • used for debugging.
Assignments • Write a small web server and see what are the request send by different web browsers • Telnet to port 80 of 100 internet sites and make a study of different types of web servers they run. Make groups of 6 students by roll numbers. Roll number 1-6 form group A, 7-12 form group B …. Each group will work on 100 sites beginning with the alphabet of their group, eg. group A will go to sites beginning with A eg. Amazon, AOL etc.