INF 123 SW Arch, dist sys & interop, Lecture 5, Prof. Crista Lopes
Objectives • Web history competency • Thorough understanding of HTTP
Distributed System • “Collection of interacting components hosted on different computers that are connected through a computer network” [figure: Hosts 1, 2, and 3, each running Component 1 … Component n on top of a network OS and hardware, all connected through the network]
The Origins of the Internet • Heterogeneous computers • Decentralized control • Many interested players
OSI Model Image courtesy of The Abdus Salam International Centre for Theoretical Physics
OSI Model in Action [figure: traffic from your laptop passes through the DBH wireless router, UCI routers, the Internet, and Google routers to a Google server]
The Internet • Large-scale infrastructure consisting of hundreds of thousands of routers, cables, and wireless links, and millions of hosts. • Traffic through the network consists of small data packets. • Software in each node follows, roughly, the OSI model. • Main “contract” between nodes: the Internet Protocol (IP) • IP addresses (v4 and now v6) • Packets carry their destination address but no routing information (no path) • Each router forwards a packet according to its final destination, using the router's local context (its routing table) • Each packet is routed independently of the others
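The slide's point that a router decides per packet, from the destination address plus its own local context, can be sketched as a forwarding-table lookup. This is a minimal Python sketch assuming one common technique, longest-prefix match; the prefixes and next-hop names in `FORWARDING_TABLE` are hypothetical, and real routers fill their tables from routing protocols rather than a hard-coded list.

```python
import ipaddress

# Hypothetical forwarding table for one router: (destination prefix, next hop).
# Prefixes and next-hop names are made up for illustration.
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream-isp"),     # default route
    (ipaddress.ip_network("128.195.0.0/16"), "uci-core"),
    (ipaddress.ip_network("128.195.1.0/24"), "ics-subnet"),
]


def next_hop(destination: str, table=FORWARDING_TABLE) -> str:
    """Pick the next hop for one packet using only its destination address.

    Longest-prefix match: among all entries whose prefix contains the
    destination, the most specific (longest) prefix wins.
    """
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    _net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop
```

With this table, `next_hop("128.195.1.219")` picks `"ics-subnet"` because the /24 is more specific than the /16, while `next_hop("8.8.8.8")` falls through to the default route. Nothing is remembered between calls, mirroring the fact that each packet is routed independently.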
Context, 1985-1990 • Full decade of Internet usage • Foundation: TCP/IP [and UDP] • Enabled Client-Server architectures • Application: Telnet • Virtual terminal (login to a remote machine) • Can be used to ‘talk’ to *any* text-based TCP/IP server • Application: Email • SMTP: see example on the next slide • POP • IMAP • Application: News • NNTP (before it, Usenet and UUCP) • Application: Instant Messaging • Unix’s talk program • Popularized by AOL • Application: File sharing • FTP
Client-Server over TCP/IP • Server opens a TCP [server] socket, binds it to a port, and listens for connection requests • Client opens a TCP [client] socket and connects to the server's host/port • Server accepts the connection, establishing a dedicated full-duplex “virtual circuit” • Often spawns a thread for it • Main thread goes back to listening for other connections • Client and server send each other messages (byte streams) • The TCP implementation takes care of protocol details (ordering, retransmission, flow control)
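The client-server steps above map directly onto socket calls. Below is a minimal Python sketch using the standard `socket` and `threading` modules; the echo behavior, the helper names (`start_server`, `serve`, `handle`, `client_roundtrip`), and binding to `127.0.0.1` on an OS-chosen port are illustrative assumptions, not part of the lecture.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Serve one client on its dedicated connection: echo bytes until it closes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:               # client closed its sending side
                break
            conn.sendall(data)


def serve(listener: socket.socket) -> None:
    """Main loop: accept a connection, hand it to a thread, go back to accept()."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:                # listener was closed; stop serving
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open the server socket, bind it (port 0 lets the OS pick), and listen."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener


def client_roundtrip(host: str, port: int, message: bytes) -> bytes:
    """Client side: connect, send a byte stream, read the reply until EOF."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message)
        sock.shutdown(socket.SHUT_WR)  # signal that we are done sending
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:               # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `listener = start_server()` followed by `client_roundtrip("127.0.0.1", listener.getsockname()[1], b"hello")` returns `b"hello"`. Handing each accepted connection to its own thread is what lets the main loop return to `accept()` for the next client.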
Example: SMTP over TCP/IP
tagus: crista$ telnet smtp.ics.uci.edu 25
Trying 128.195.1.219...
Connected to smtp.ics.uci.edu.
Escape character is '^]'.
220 david-tennant-v0.ics.uci.edu ESMTP mailer ready at Mon, 5 Apr 2010 17:15:01 -0700
HELO smtp.ics.uci.edu
250 david-tennant-v0.ics.uci.edu Hello barbara-wright.ics.uci.edu [128.195.1.137], pleased to meet you
MAIL FROM:<lopes@ics.uci.edu>
250 2.1.0 <lopes@ics.uci.edu>... Sender ok
RCPT TO:<lopes@ics.uci.edu>
250 2.1.5 <lopes@ics.uci.edu>... Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
test
.
250 2.0.0 o360F1Mo029280 Message accepted for delivery
QUIT
221 2.0.0 david-tennant-v0.ics.uci.edu closing connection
Connection closed by foreign host.
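The telnet session above follows a fixed command/reply script: the client sends one command, then waits for a reply code before sending the next. Below is a minimal Python sketch of that script, assuming an ASCII message body and a server that requires no authentication; `smtp_script`, `read_reply`, and `send_mail` are hypothetical helpers. Body lines beginning with "." get an extra "." (dot-stuffing, RFC 5321), so that a lone "." still means end-of-data.

```python
import socket


def smtp_script(helo_host, sender, recipients, body):
    """Return the (text_to_send, expected_reply_code) steps for one message."""
    steps = [(f"HELO {helo_host}\r\n", 250), (f"MAIL FROM:<{sender}>\r\n", 250)]
    steps += [(f"RCPT TO:<{r}>\r\n", 250) for r in recipients]
    steps.append(("DATA\r\n", 354))
    # Dot-stuffing: a body line starting with "." is sent with an extra ".".
    stuffed = ["." + ln if ln.startswith(".") else ln for ln in body.splitlines()]
    steps.append(("".join(ln + "\r\n" for ln in stuffed) + ".\r\n", 250))
    steps.append(("QUIT\r\n", 221))
    return steps


def read_reply(f) -> int:
    """Read one (possibly multi-line) reply; return its 3-digit code."""
    while True:
        line = f.readline().decode("ascii", "replace")
        if not line:
            raise ConnectionError("server closed the connection")
        if line[3:4] != "-":           # "250-..." continues, "250 ..." ends it
            return int(line[:3])


def send_mail(host, port, steps):
    """Replay the script over TCP, checking each reply code like the telnet user did."""
    with socket.create_connection((host, port)) as sock, sock.makefile("rb") as f:
        if read_reply(f) != 220:
            raise ConnectionError("no 220 greeting")
        for text, expected in steps:
            sock.sendall(text.encode("ascii"))
            code = read_reply(f)
            if code != expected:
                raise RuntimeError(f"expected {expected}, got {code} after {text!r}")
```

`send_mail("smtp.ics.uci.edu", 25, smtp_script("smtp.ics.uci.edu", "lopes@ics.uci.edu", ["lopes@ics.uci.edu"], "test"))` would replay the slide's session against a reachable server. For real programs, Python's standard `smtplib` module implements this dialogue, including extensions this sketch ignores.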
Origins of the Web • CERN: Conseil Européen pour la Recherche Nucléaire (European Laboratory for Particle Physics; Geneva, Switzerland) • Tim Berners-Lee & Robert Cailliau • Originally a system for sharing documents among scientists • The first implementation, once made publicly available, quickly became very popular in universities & research institutions • The NCSA Mosaic browser made it popular across the board
Main Design Principles, originally • Client requests a text document from the server • Server sends back the text document • Text document may contain retrieval references (hyperlinks) to other text documents on that or other servers • HyperText Markup Language (HTML) • Client may also send text documents for the server to store • Requests/responses are sent over TCP, but • Client makes a connection, sends the request, receives the response, and the connection is closed • Connection is not maintained between interactions • Requests are self-contained and do not rely on past interactions • “Stateless” • (Notice that the story is based on “text documents”; it quickly became apparent that this needed generalization)
Generalization • Document → Resource • “Page” with markup • Actual document, of many types • Program generating a resource • Uniform (originally “Universal”) Resource Identifier (URI) • Abstract concept • Concrete realization: Uniform Resource Locator (URL) • Provides a method for finding the resource • http://, file://, ftp://, mailto:, etc.
HTTP URLs • Syntax: • http://<host>[:<port>][/<path>][?<query>] • Examples • Hosts: www.ics.uci.edu, 127.0.0.1 • Ports: a number (default 80) • Paths: /wifi/admin/users • Queries: first=John&last=Smith • Spec
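The URL syntax above can be checked with Python's standard `urllib.parse`. The sketch below assumes plain `http` URLs; `parse_http_url` is a hypothetical helper, port 8080 in the usage is made up to show an explicit port, and 80 is filled in when no port is given.

```python
from urllib.parse import parse_qs, urlsplit


def parse_http_url(url: str) -> dict:
    """Split http://<host>[:<port>][/<path>][?<query>] into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url}")
    return {
        "host": parts.hostname,
        "port": parts.port or 80,          # default HTTP port when omitted
        "path": parts.path or "/",         # empty path means the root
        "query": parse_qs(parts.query),    # "a=1&b=2" -> {"a": ["1"], "b": ["2"]}
    }
```

For instance, `parse_http_url("http://www.ics.uci.edu:8080/wifi/admin/users?first=John&last=Smith")` yields host `www.ics.uci.edu`, port `8080`, path `/wifi/admin/users`, and query `{"first": ["John"], "last": ["Smith"]}`. Query values are lists because a key may repeat.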
HyperText Transfer Protocol (HTTP) • Idempotent methods: • GET • PUT • DELETE • HEAD • OPTIONS • TRACE • Not idempotent: • POST • CONNECT • Spec
HTTP Request Syntax
<OPERATION> <ARGS> <VERSION>
[<HEADER_1_NAME>: <HEADER_1_VALUE>
…
<HEADER_N_NAME>: <HEADER_N_VALUE>]
<blank line>
[<DATA>]
HTTP Response Syntax
<VERSION> <CODE> <EXPLANATION>
[<HEADER_1_NAME>: <HEADER_1_VALUE>
…
<HEADER_N_NAME>: <HEADER_N_VALUE>]
<blank line>
[<DATA>]
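Both syntaxes are line-oriented text; on the wire each line ends with CRLF (`\r\n`), and an empty line separates the headers from the data. Below is a minimal Python sketch of a serializer for the request form and a parser for the response form; the function names are hypothetical, and the parser assumes `raw` holds exactly one complete response.

```python
def build_request(operation, args, version="HTTP/1.1", headers=None, data=b""):
    """Serialize: request line, header lines, blank line, optional data."""
    lines = [f"{operation} {args} {version}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # CRLF line ends; empty line ends headers
    return head.encode("latin-1") + data


def parse_response(raw: bytes):
    """Split a response into (version, code, explanation, headers, data)."""
    head, _, data = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("latin-1").split("\r\n")
    version, _, rest = status_line.partition(" ")
    code, _, explanation = rest.partition(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        # Simplification: a repeated name (e.g. two Set-Cookie lines)
        # overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), explanation, headers, data
```

`build_request("GET", "/index.html", headers={"Host": "ics.uci.edu"})` produces the request bytes used in the example that follows. Splitting the status line with `partition` tolerates an empty explanation.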
HTTP Example
GET /index.html HTTP/1.1
Host: ics.uci.edu
(blank line here)

HTTP/1.1 200 OK
Date: Fri, 09 Apr 2010 19:48:36 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Fri, 19 Feb 2010 22:01:21 GMT
ETag: "238003-64-47ffb39422e40"
Accept-Ranges: bytes
Content-Length: 100
Connection: close
Content-Type: text/html; charset=UTF-8

<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; URL=http://www.ics.uci.edu/">
</head>
</html>
(show live)
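The exchange above can be reproduced without telnet. This minimal Python sketch follows the original stateless design: each call opens a fresh TCP connection, sends one self-contained request, reads until the server closes, and keeps nothing afterwards. The `Connection: close` header and the helper names are assumptions of the sketch, and it does not decode `Transfer-Encoding: chunked` bodies.

```python
import socket


def build_get(host: str, path: str = "/", port: int = 80) -> bytes:
    """Bytes of a minimal HTTP/1.1 GET, like the request typed on the slide."""
    host_header = host if port == 80 else f"{host}:{port}"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"   # ask the server to close after one response
        "\r\n"                    # blank line that ends the headers
    ).encode("ascii")


def http_get(host: str, path: str = "/", port: int = 80):
    """Connect, send one request, read until the server closes, disconnect."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get(host, path, port))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:         # server closed the connection: response complete
                break
            chunks.append(chunk)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, body
```

`http_get("ics.uci.edu", "/index.html")` returns the status line and the raw body bytes; what a server returns today may differ from the 2010 capture above.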
HTTP Headers • Request headers • Response headers • Spec
HTTP Status Codes • Informational 1xx • E.g. 100 Continue • Successful 2xx • E.g. 200 OK, 201 Created • Redirection 3xx • E.g. 300 Multiple Choices, 301 Moved Permanently • Client error 4xx • E.g. 400 Bad Request, 404 Not Found • Server error 5xx • E.g. 500 Internal Server Error, 503 Service Unavailable • Complete list
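Because the first digit carries the class, a client can handle unfamiliar codes by class alone. A short Python sketch; `status_class` and `describe_status` are hypothetical helpers, while `http.HTTPStatus` is the standard-library enum of registered codes and their reason phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def describe_status(code: int) -> str:
    """Code, registered reason phrase, and class, e.g. for logging."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"
```

`describe_status(404)` gives `"404 Not Found (Client error)"`. `status_class` also works for unregistered codes such as 299, whereas `HTTPStatus(299)` would raise `ValueError`.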
Another Example
GET /index.html HTTP/1.1
Host: cnn.com
(blank line here)

HTTP/1.1 301 Moved Permanently
Date: Fri, 09 Apr 2010 20:32:14 GMT
Server: Apache
Location: http://www.cnn.com/index.html
Vary: Accept-Encoding
Content-Length: 294
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.cnn.com/index.html">here</a>.</p>
<hr>
<address>Apache Server at cnn.com Port 80</address>
</body></html>
(show live)
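A client that receives a 3xx response with a `Location` header, as in the cnn.com example, typically requests the new URL. A minimal Python sketch, assuming the caller supplies a `fetch(url)` function returning `(status, headers)` with lowercase header names; the set of codes followed and the hop limit are choices of this sketch, not rules from the lecture. `urljoin` resolves relative `Location` values against the current URL.

```python
from urllib.parse import urljoin

# Redirect codes this sketch follows automatically (an assumption; 300 is not
# followed because the client is meant to choose among several options).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_location(request_url, status, headers):
    """Return the absolute URL to request next, or None if not a redirect."""
    if status in REDIRECT_CODES and "location" in headers:
        return urljoin(request_url, headers["location"])
    return None


def follow(url, fetch, max_hops=5):
    """Follow redirects until a non-redirect response; return (final_url, status)."""
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        nxt = next_location(url, status, headers)
        if nxt is None:
            return url, status
        url = nxt
    raise RuntimeError("too many redirects")
```

With a stand-in `fetch` that answers the slide's 301 for `http://cnn.com/index.html` and 200 for `http://www.cnn.com/index.html`, `follow` ends at `("http://www.cnn.com/index.html", 200)`. The hop limit guards against redirect loops.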
Web Caches [figure: clients reach the Internet through a proxy, which caches content from the Internet; servers sit behind a reverse proxy, which caches content from those servers]
Web Caches • Reduce bandwidth • Reduce server load • Reduce lag • Cache responses to idempotent methods (mostly GET)
Web Caches: Why you need to know about them • github.com demo
Web Cache Control • “Cache-Control” header in responses • E.g. Cache-Control: no-cache • “Expires” header in responses • E.g. Expires: Fri, 09 Apr 2010 16:00:00 GMT • “Last-Modified” header in responses • A proxy can send an If-Modified-Since header in its request; the server may respond 304 Not Modified • If a subsequent POST, PUT, or DELETE goes to the same URL, the cached copy should be invalidated
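One way a proxy might combine these headers is shown in the minimal Python sketch below, a toy cache for GET responses. Assumptions, all hypothetical simplifications: header names arrive lowercased, `max-age` takes precedence over `Expires`, an unparsable value such as `Expires: -1` counts as already expired, and `Vary`, `no-store`, `private`, and heuristic freshness are ignored.

```python
import re
import time
from email.utils import parsedate_to_datetime


class ProxyCache:
    """Toy cache keyed by URL (hypothetical sketch, not a full HTTP cache)."""

    def __init__(self):
        self.entries = {}  # url -> {"body", "expires_at", "last_modified"}

    @staticmethod
    def _expires_at(headers, now):
        cc = headers.get("cache-control", "").lower()
        if "no-cache" in cc:
            return now                          # must revalidate every time
        m = re.search(r"max-age=(\d+)", cc)
        if m:
            return now + int(m.group(1))        # max-age wins over Expires
        if "expires" in headers:
            try:
                return parsedate_to_datetime(headers["expires"]).timestamp()
            except (TypeError, ValueError):     # e.g. "Expires: -1"
                return now
        return now                              # no freshness info: treat as stale

    def store(self, url, headers, body, now=None):
        now = time.time() if now is None else now
        self.entries[url] = {
            "body": body,
            "expires_at": self._expires_at(headers, now),
            "last_modified": headers.get("last-modified"),
        }

    def lookup(self, url, now=None):
        """Return ("fresh", body), ("revalidate", request_headers), or ("miss", None)."""
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None:
            return "miss", None
        if now < entry["expires_at"]:
            return "fresh", entry["body"]
        if entry["last_modified"]:
            # Ask the origin server; a 304 Not Modified means the copy is still good.
            return "revalidate", {"If-Modified-Since": entry["last_modified"]}
        return "miss", None

    def on_not_modified(self, url, headers, now=None):
        """Server answered 304: keep the cached body, refresh its freshness."""
        entry = self.entries.get(url)
        if entry is not None:
            now = time.time() if now is None else now
            entry["expires_at"] = self._expires_at(headers, now)

    def on_request(self, method, url):
        """A POST, PUT, or DELETE to a URL invalidates its cached copy."""
        if method in ("POST", "PUT", "DELETE"):
            self.entries.pop(url, None)
```

A response stored with `Cache-Control: max-age=60` is served from the cache for 60 seconds, then triggers a conditional request carrying its `Last-Modified` date; a `PUT` to the same URL drops it immediately.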
Cookies • Small pieces of text data sent from the server to the client, meant to be sent back in subsequent requests from the client to the same server • Added to the Mosaic Netscape browser and Web servers in 1994 • Uses • Session management • Personalization • Tracking
Setting and Using Cookies
Client → Server:
GET /index.html HTTP/1.1
Host: www.google.com

Server → Client:
HTTP/1.1 200 OK
Date: Sat, 10 Apr 2010 14:35:22 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn; expires=Mon, 09-Apr-2012 14:35:22 GMT; path=/; domain=.google.com
Set-Cookie: NID=33=CeVJK2EKVB5kcCiguCD1OjG3g5UKlPq78SXCibOjYQOU46P6SMaAKqAhw2hEVPqqnKfFlTzmC-w4Ol5ZwKQqnjyla1DZcS6ZYmb1lLHe2zNuEVnXJRtd4lMrr6gA4o8m; expires=Sun, 10-Oct-2010 14:35:22 GMT; path=/; domain=.google.com; HttpOnly
Server: gws
Transfer-Encoding: chunked
…
Setting and Using Cookies
Client → Server:
GET /index.html HTTP/1.1
Host: www.google.com
Cookie: PREF=ID=1bb89b81c47c05fb:TM=1270910122:LM=1270910122:S=YQ3wzhShOas9UStn
Etc.
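On the client side, the browser's job is to remember `Set-Cookie` values and echo them back in a `Cookie` header. A minimal Python sketch; the `CookieJar` class here is hypothetical (Python's standard `http.cookiejar` module is the full-featured equivalent), handles a single host, ignores expiry, `domain`, and `HttpOnly`, and uses a plain prefix check instead of the exact path-matching rules of RFC 6265.

```python
class CookieJar:
    """Minimal single-host cookie store (hypothetical, simplified sketch)."""

    def __init__(self):
        self.cookies = {}  # name -> (value, path)

    def receive(self, set_cookie: str) -> None:
        """Record one Set-Cookie header value."""
        pair, *attrs = [p.strip() for p in set_cookie.split(";")]
        name, _, value = pair.partition("=")   # value may itself contain "="
        path = "/"
        for attr in attrs:
            key, _, val = attr.partition("=")
            if key.lower() == "path" and val:
                path = val
        self.cookies[name] = (value, path)

    def header_for(self, request_path: str):
        """Build the Cookie header value for a request, or None if nothing applies."""
        pairs = [
            f"{name}={value}"
            for name, (value, path) in self.cookies.items()
            if request_path.startswith(path)   # simplified path match
        ]
        return "; ".join(pairs) or None
```

Feeding it the first `Set-Cookie` line from the Google response and then asking for `/index.html` yields exactly the `PREF=...` value sent in the `Cookie` header above.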
Uses • Session management • User logs in, server sends a cookie • Subsequent requests include that cookie • Personalization • User visits, server sends a cookie • User changes preferences; those requests carry the cookie • Future visits include the cookie, so the server “remembers” the preferences • Tracking within the same site • Cookie + path + date/time • Tracking across sites • Referer header + Cookie • (Privacy concerns)
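Session management, the first use listed, is commonly implemented on the server as a table keyed by an unguessable random cookie value. A minimal Python sketch; the cookie name `SID`, the `SessionStore` class, and the in-memory dict are hypothetical (a real server would also expire sessions and add the `Secure` attribute over HTTPS).

```python
import secrets


class SessionStore:
    """Server-side sessions keyed by a random cookie value (hypothetical sketch)."""

    COOKIE_NAME = "SID"  # hypothetical cookie name

    def __init__(self):
        self.sessions = {}  # session id -> per-user state

    def login(self, username: str) -> str:
        """Create a session and return the Set-Cookie header value to send."""
        sid = secrets.token_urlsafe(16)     # unguessable identifier
        self.sessions[sid] = {"user": username, "prefs": {}}
        return f"{self.COOKIE_NAME}={sid}; Path=/; HttpOnly"

    def session_for(self, cookie_header):
        """Find the session named by a request's Cookie header, if any."""
        for part in (cookie_header or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name == self.COOKIE_NAME:
                return self.sessions.get(value)
        return None
```

After `set_cookie = store.login("lopes")`, a later request whose `Cookie` header is `set_cookie.split(";")[0]` maps back to the same session dict. Storing preferences in that dict is how the personalization case "remembers" settings across requests.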