CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel
herbertv@cs.cornell.edu
Lecture 4: Basic Web Concepts

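Hypertext Transfer Protocol (HTTP)
[Diagram: a web browser (HTTP client) and a web server (HTTP server), at IP address 1 and IP address 2, communicate over a TCP/IP network. The browser sends an HTTP request, the server returns an HTTP response, and the browser renders the response.]
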
Transmission Control Protocol/Internet Protocol (TCP/IP)
• is the protocol suite that drives the Internet
• handles network communications between network nodes (computers, printers, webcams, … connected to the Internet)
• protocol suite:
  • TCP: communication of data between applications
  • IP: communication of data between nodes
  • UDP: communication between applications
  • ICMP: error and stats

TCP/IP protocol architecture
[Diagram: the client sends an HTTP request and the server receives it; on both sides the request passes through the layers listed here.]
• Application layer
• Transport layer: TCP
• Internet layer: IP
• Network Access layer: Ethernet, …

Transmission Control Protocol (TCP)
• breaks message up into chunks
• chunks get sequence number and IP address of addressee
• opens connection with addressee (handshake)
• hands chunks over to IP layer
• guarantees error-free delivery of chunks at addressee (through connection)

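The chunking and sequence numbering listed above can be sketched in a few lines of Python. This is a toy illustration rather than real TCP: the function names `split_into_chunks` and `reassemble` and the 8-byte chunk size are assumptions made for the example, and real TCP numbers bytes rather than chunks and also handles acknowledgements and retransmission.

```python
import random

def split_into_chunks(message: bytes, size: int = 8):
    """Split a message into (sequence_number, chunk) pairs."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(chunks):
    """Put chunks back in order by sequence number and join them."""
    return b"".join(data for _, data in sorted(chunks))

message = b"GET / HTTP/1.1\r\nHost: no.good.com\r\n\r\n"
chunks = split_into_chunks(message)
random.shuffle(chunks)           # chunks may arrive in any order
restored = reassemble(chunks)    # sequence numbers restore the original order
```

Because every chunk carries its own sequence number, the receiver can rebuild the message no matter which path each chunk took.
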
Internet Protocol (IP)
• handles the routing of chunks towards addressee (through routers)
• IP Addressing:
  • each node has an IP address, e.g. 157.193.101.6
  • each node can have a readable name, e.g. erlserv.rug.ac.be
  • DNS maps between IP addresses and readable names
• IP Data Transmission:
  • sender delivers chunk to router (via lower level protocol)
  • router delivers chunk to another router or to the host
  • individual chunks can be delivered via different paths
  • routers decide on the path of least resistance
  • at the addressee, IP delivers the chunk to the TCP layer

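As a small illustration of the addressing bullets above, the sketch below shows that a dotted IPv4 address is just a readable form of a 32-bit number, and how a program can ask the system resolver (which normally consults DNS) for the address behind a name. The helper names `to_int` and `resolve` are assumptions made for this example; `resolve` is defined but not called here because it may need network access.

```python
import ipaddress
import socket

def to_int(dotted: str) -> int:
    """Return the 32-bit integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))

def resolve(name: str) -> str:
    """Ask the system resolver for the IPv4 address of a host name."""
    return socket.gethostbyname(name)

address = "157.193.101.6"                            # example address from the slide
as_number = to_int(address)                          # the same address as one integer
back_again = str(ipaddress.IPv4Address(as_number))   # and back to dotted form
```

Calling `resolve("localhost")` is a way to try the lookup without Internet access; whether the readable name from the slide still resolves is not something this sketch assumes.
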
TCP/IP protocol architecture
• Application layer: HTTP, FTP, telnet
• Transport layer: TCP, UDP
• Internet layer: IP, ICMP
• Network Access layer: Ethernet, …

HTTP request
[Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com.]
Parts of the request: method, header, entity-body

    GET / HTTP/1.1
    Date: Wednesday, 02-Feb-99 23:04:12 GMT
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: no.good.com
    Connection: Keep-Alive
    * a blank line *

HTTP request
method
• format: method URI HTTP-version
• methods: GET, POST, HEAD, PUT, …
• example: GET / HTTP/1.1
header
• general-header: optional, general information
  • Date: Wednesday, 02-Feb-99 23:04:12 GMT
  • Connection: Keep-Alive
• request-header: about client
  • Accept-Language: en-us
  • User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
• entity-header: about entity-body
entity-body
• What is sent to the server

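To make the three parts of a request concrete, here is a minimal sketch that splits a raw request into its method line, header fields, and entity-body. The function name `parse_request` is an assumption for this example, and the parser ignores many details a real server must handle (repeated headers, folded lines, malformed input).

```python
def parse_request(raw: str):
    """Split a raw HTTP request into method line parts, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")    # the blank line ends the header
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body

raw_request = (
    "GET / HTTP/1.1\r\n"
    "Date: Wednesday, 02-Feb-99 23:04:12 GMT\r\n"
    "Accept-Language: en-us\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)\r\n"
    "Host: no.good.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
method, uri, version, headers, body = parse_request(raw_request)
```

A GET request like this one has an empty entity-body, so everything of interest sits in the method line and the header.
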
HTTP response
[Diagram: the web server (HTTP server) at no.good.com sends an HTTP response to the web browser (HTTP client).]
Parts of the response: status, header, entity-body

    HTTP/1.1 200 OK
    Date: Wednesday, 02-Feb-99 23:04:25 GMT
    Server: Apache/1.3.6 (Unix)
    Last-Modified: Sun, 01 Feb 1999 13:54:26 GMT
    ETag: "2f5cd-964-38js8"
    Content-length: 327
    Connection: close
    Content-Type: text/html
    * a blank line *
    <title>Welcome to nogood</title>
    <img src="/images/nogood-logo.gif">

HTTP response
status
• format: HTTP-version Status-code Reason-phrase
• example: HTTP/1.1 200 OK
header
• general-header: optional, general information
  • Date: Wednesday, 02-Feb-99 23:04:25 GMT
• response-header: about server
  • Server: Apache/1.3.6 (Unix)
• entity-header: about entity-body
  • Content-Type: text/html
  • ETag: "2f5cd-964-38js8"
  • Content-length: 327
entity-body
• What is sent to the client

    <title>Welcome to nogood</title>
    <img src="/images/nogood-logo.gif">

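The response structure above can be taken apart in the same spirit. This sketch splits off the status line by hand and hands the header block to Python's standard `email` parser, which understands the same `Name: value` syntax and gives case-insensitive lookup. The function name `parse_response` is an assumption for this example, and the sample message is a shortened version of the one on the slide.

```python
from email import message_from_string

def parse_response(raw: str):
    """Return (version, status_code, reason, headers, body) for a raw response."""
    head, _, body = raw.partition("\r\n\r\n")          # blank line ends the header
    status_line, _, header_block = head.partition("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = message_from_string(header_block)        # case-insensitive lookup
    return version, int(code), reason, headers, body

raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/1.3.6 (Unix)\r\n"
    'ETag: "2f5cd-964-38js8"\r\n'
    "Content-Type: text/html\r\n"
    "\r\n"
    "<title>Welcome to nogood</title>\n"
    '<img src="/images/nogood-logo.gif">'
)
version, code, reason, headers, body = parse_response(raw_response)
```

The client uses the status code to decide what happened and the entity-headers (here `Content-Type`) to decide how to treat the entity-body.
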
HTTP request
[Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com.]

    GET /images/nogood-logo.gif HTTP/1.1
    Date: Wednesday, 02-Feb-99 23:04:27 GMT
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: no.good.com
    Connection: Keep-Alive
    * a blank line *

HTTP response
[Diagram: the web server (HTTP server) at no.good.com sends an HTTP response to the web browser (HTTP client).]

    HTTP/1.1 200 OK
    Date: Wednesday, 02-Feb-99 23:04:29 GMT
    Server: Apache/1.3.6 (Unix)
    Last-Modified: Sun, 01 Feb 1999 08:20:00 GMT
    ETag: "2f5cd-964-445e"
    Content-length: 220
    Connection: close
    Content-Type: image/gif
    * a blank line *
    the GIF file

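The second request/response pair exists because the HTML returned first refers to an image. A rough sketch of how a client could discover such embedded resources is shown below: it scans the HTML for `img` tags and resolves each `src` against the page URL. The class name `ImageFinder` and the variable names are assumptions for this example; a real browser does considerably more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageFinder(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

page_url = "http://no.good.com/"
html = '<title>Welcome to nogood</title> <img src="/images/nogood-logo.gif">'

finder = ImageFinder()
finder.feed(html)
# Each discovered URL becomes a separate HTTP request.
image_urls = [urljoin(page_url, src) for src in finder.sources]
```
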
Hypertext Transfer Protocol (HTTP)
[Diagram: the web browser (HTTP client) sends an HTTP request; the web server (HTTP server) returns an HTTP response consisting of a MIME type plus a file; the browser renders the response.]

Browser
[Diagram: file + MIME type → presentation software → display]
Presentation software can be:
• built into browser
• plug-in
• helper application

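The choice of presentation software by MIME type can be pictured as a lookup table. The sketch below is only an illustration: the `HANDLERS` table and the function `choose_presentation` are hypothetical, and which types a given browser handles itself, via a plug-in, or via a helper application depends on its configuration.

```python
import mimetypes

# Hypothetical handler table; a real browser's configuration is far richer.
HANDLERS = {
    "text/html": "built into browser",
    "image/gif": "built into browser",
    "application/pdf": "plug-in",
}

def choose_presentation(mime_type: str) -> str:
    """Pick how to present a response body, given its MIME type."""
    return HANDLERS.get(mime_type, "helper application")

# A server often derives the MIME type from the file name extension.
mime_type, _ = mimetypes.guess_type("nogood-logo.gif")
presentation = choose_presentation(mime_type)
```
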
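HTTP Proxies
• Reduce network traffic: caching (ETag, Last-Modified)
• IP-based authentication
[Diagram: an HTTP proxy with a cache sits between the web browser (HTTP client) and the web server (HTTP server) at no.good.com; it acts as a server towards the client and as a client towards the server.]
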
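How an ETag lets a proxy avoid re-fetching unchanged content can be sketched with a toy origin server and a toy cache. Everything here is an in-memory simulation under stated assumptions: `ORIGIN`, `origin_get`, and `CachingProxy` are hypothetical names, and the only HTTP behaviour modelled is the conditional request (`If-None-Match`) answered with `304 Not Modified` when the ETag still matches.

```python
# Toy origin server: maps a path to (etag, body). Contents are made up.
ORIGIN = {"/": ('"2f5cd-964-38js8"', "<title>Welcome to nogood</title>")}
origin_bodies_sent = 0

def origin_get(path, if_none_match=None):
    """Return (status, etag, body); a 304 answer carries no body."""
    global origin_bodies_sent
    etag, body = ORIGIN[path]
    if if_none_match == etag:
        return 304, etag, None           # Not Modified: cached copy still valid
    origin_bodies_sent += 1
    return 200, etag, body

class CachingProxy:
    """Forward requests to the origin, reusing cached bodies when allowed."""

    def __init__(self):
        self.cache = {}                  # path -> (etag, body)

    def get(self, path):
        cached = self.cache.get(path)
        known_etag = cached[0] if cached else None
        status, etag, body = origin_get(path, if_none_match=known_etag)
        if status == 304:
            return cached[1]             # serve the body from the cache
        self.cache[path] = (etag, body)
        return body

proxy = CachingProxy()
first = proxy.get("/")                   # fetched from the origin and cached
second = proxy.get("/")                  # validated by ETag, served from cache
```

The second request still reaches the origin, but only a short validation answer travels back instead of the full body, which is where the traffic saving comes from.
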
HTTP cookies
• HTTP is stateless: once a server has given a response to a client, it forgets about it. No session information.
• Fake state with cookies:
  • server sends token to client
  • client sends token back to server
  • server understands the meaning of the token
  • for instance: the server avoids requiring a username/password with every request by reading the authorization from the cookie

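The token exchange above can be sketched with Python's standard `http.cookies` module. The cookie name `session`, the in-memory `sessions` table, and the functions `server_login` and `server_identify` are assumptions made for this example; a real site would also set attributes such as an expiry time and would protect the token in transit.

```python
from http.cookies import SimpleCookie
import secrets

sessions = {}   # server-side table: token -> user name (hypothetical store)

def server_login(user: str) -> str:
    """Create a session and return the value for a Set-Cookie header."""
    token = secrets.token_hex(8)
    sessions[token] = user
    cookie = SimpleCookie()
    cookie["session"] = token
    return cookie["session"].OutputString()      # "session=<token>"

def server_identify(cookie_header: str):
    """Look up the user from the Cookie header sent back by the client."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return sessions.get(morsel.value) if morsel else None

set_cookie = server_login("herbert")     # server sends token to client
cookie_header = set_cookie               # client stores it and sends it back
user = server_identify(cookie_header)    # server understands the token
```

The token itself is meaningless to the client; only the server's table gives it meaning, which is why the server can skip asking for credentials on later requests.
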
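Dynamic content: Common Gateway Interface (CGI)
• Client interaction with non-web servers
[Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com; the server passes it via CGI to a program and returns the program's output as the HTTP response.]
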
CGI -- HTTP POST request
[Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com, which passes it via CGI to the program find.]

    POST /cgi-bin/find HTTP/1.1
    Date: Wednesday, 02-Feb-99 23:04:27 GMT
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: no.good.com
    Connection: Keep-Alive
    Content-length: 26
    Content-type: application/x-www-form-urlencoded
    * a blank line *
    search=herbert&type=author

CGI -- HTTP GET request
[Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) at no.good.com, which passes it via CGI to the program find.]

    GET /cgi-bin/find?search=herbert&type=author HTTP/1.1
    Date: Wednesday, 02-Feb-99 23:04:27 GMT
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: no.good.com
    Connection: Keep-Alive
    * a blank line *

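The two requests above carry the same form data in different places. As a minimal sketch, the code below encodes the fields once with `urllib.parse.urlencode` and then builds both variants; the variable names are assumptions for this example, and only a few of the headers from the slides are reproduced.

```python
from urllib.parse import urlencode

fields = {"search": "herbert", "type": "author"}
encoded = urlencode(fields)        # application/x-www-form-urlencoded form data

# GET: the encoded fields travel in the URI, after a "?".
get_request = (
    f"GET /cgi-bin/find?{encoded} HTTP/1.1\r\n"
    "Host: no.good.com\r\n"
    "\r\n"
)

# POST: the encoded fields travel in the entity-body, and
# Content-Length states how many bytes that body contains.
post_request = (
    "POST /cgi-bin/find HTTP/1.1\r\n"
    "Host: no.good.com\r\n"
    f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    f"{encoded}"
)
```

The computed length agrees with the `Content-length: 26` shown in the POST example, since the encoded string is exactly the 26-character body on the slide.
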
CGI - the interface
• the program find receives input from
  • STDIN
  • environment variables (about client, server, request …)
• example input:
    search=herbert&type=author
    SERVER_NAME server.good.com
    REMOTE_HOST 157.193.101.6
    …
[Diagram: the web server (HTTP server) at no.good.com passes the input to the program find via CGI.]

CGI - the interface
• find outputs to STDOUT:
    Content-type: text/html
    <title>Search results</title>
    …
• web server adds header information
• web server sends response to client
[Diagram: the program find returns its output via CGI to the web server (HTTP server) at no.good.com.]

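The interface described above can be sketched as a toy version of the program find. To keep it testable, the environment and STDIN are passed in as arguments and the output is returned as a string instead of being printed; a real CGI program would read `os.environ` and `sys.stdin` and write to `sys.stdout`. The variables `REQUEST_METHOD`, `CONTENT_LENGTH`, and `QUERY_STRING` come from the CGI convention rather than from the slides, and the result page is invented for illustration.

```python
import io
from html import escape
from urllib.parse import parse_qs

def find(environ, stdin):
    """Toy CGI program: read form input, return what would go to STDOUT."""
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH", 0))
        query = stdin.read(length)               # POST: input arrives on STDIN
    else:
        query = environ.get("QUERY_STRING", "")  # GET: input arrives in the environment
    form = parse_qs(query)
    search = escape(form.get("search", [""])[0])
    kind = escape(form.get("type", [""])[0])
    return (
        "Content-type: text/html\r\n"
        "\r\n"                                   # blank line ends the header
        "<title>Search results</title>\n"
        f"<p>Searching for {kind} = {search}</p>\n"
    )

post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "26"}
post_out = find(post_env, io.StringIO("search=herbert&type=author"))

get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "search=herbert&type=author"}
get_out = find(get_env, io.StringIO(""))
```

Both calls produce the same output, which reflects the point of the interface: the program sees the same form fields whether the browser used GET or POST.
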
Dynamic content: Mobile code - JavaScript
• Executed by the browser
• User interface, client-side validation, …
[Diagram: the web server (HTTP server) at no.good.com sends HTML containing JavaScript in the HTTP response to the web browser (HTTP client).]

Dynamic content: Mobile code – Java applets
• Executed by virtual machine
• Interaction with find not via HTTP
[Diagram: the web server (HTTP server) at no.good.com sends the Java applet in the HTTP response to the web browser (HTTP client); the applet interacts with the program find.]

Want to read a bit more?
• on Web Characterization: http://www.w3.org/1999/05/WCA-terms/01
• on CGI: http://www.ukans.edu/~acs/docs/other/forms-intro.shtml
• on Web, TCP/IP, CGI: http://www.wdvl.com/Authoring/Tools/Tutorial/index4.html
• on HTTP: http://www.ietf.org/rfc/rfc1945.txt?number=1945 ; http://www.jmarshall.com/easy/http/