Learn about the origin and overview of the web, distributed system aspects, communication protocols, naming conventions, replication and fault tolerance, and web security.
The Web • Origin and overview of the web • Drill-down on distributed system aspects • Communication • Processes • Naming • Synchronization • Replication (especially caching) • Fault tolerance • Security Distributed Systems - Comp 655
Origin of the web • CERN (European particle physics lab) • Purpose: facilitate document sharing • Large user community • Geographically dispersed • Founder: Tim Berners-Lee • Use exploded in late 90’s • Graphical user interfaces (Mosaic and descendants) • Huge amounts of content • Search engines • Interactive pages
Definition of the Web • Many standards • HTML • HTTP • DNS • URL, URI, URN • XML • DOM • W3C • IETF
A word about RFCs • Standards track • Proposed standard • Draft standard (at least two independent and interoperable implementations) • Internet standard (also has STD number, for example IP is STD-005 and RFC-0791) • “Off-track” • Experimental • Informational • Historic(al) See RFC 2026 for details
Yet more words about RFCs Before using an RFC, • check the Obsolete RFC list • or find it on the Active RFC list I use the RFC index at faqs.org because I find it a bit easier to use than the IETF’s list. Remember, if there’s a conflict, IETF is the authority.
Overall structure
Client-side script What’s in a web page?
Some web pages are XML
XML document type definition
Other document types
CGI – early Web interaction
Problems with CGI • Process per request • Wide variety in server-side runtime environments • Solutions • Server-side scripting (JSP, ASP, PHP) • Servlets
Problems with browsers • Browser-based user interfaces tend to be clunky and limited • Solutions: • Client-side scripting • Applets • More recently, AJAX • An example: http://www.javarss.com/ajax/j2ee-ajax.html • See http://en.wikipedia.org/wiki/AJAX for more information
Server-side scripts and servlets
Nothing’s perfect • What Web technology has big problems with server-side page generation?
Communication on the web: HTTP • TCP-based client/server protocol • Create connection • Send request • Send response • Close connection • HTTP 1.1 reduces connection overhead with persistent connections
HTTP connections non-persistent persistent
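The saving from persistent connections can be illustrated with a rough round-trip count. This is a minimal sketch under a simplified model that is not in the slides: opening a TCP connection is assumed to cost one round trip, each request/response pair one more, and there is no pipelining and no parallel connections. The function names are hypothetical.

```python
def round_trips_non_persistent(num_resources: int) -> int:
    """Every resource pays for a new TCP connection plus one request/response."""
    return num_resources * 2


def round_trips_persistent(num_resources: int) -> int:
    """The TCP connection is set up once and reused for every request."""
    if num_resources == 0:
        return 0
    return 1 + num_resources


if __name__ == "__main__":
    # Compare the two connection styles for pages of different sizes.
    for n in (1, 5, 20):
        print(n, round_trips_non_persistent(n), round_trips_persistent(n))
```

Under these assumptions the gap grows with the number of resources on a page, which is why pages with many embedded images benefit most.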
HTTP request types
HTTP request example
Request line (type, path, protocol):
GET /xyzzy HTTP/1.1
Headers:
Connection: Keep-Alive
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
Host: laptop:1215
If-Modified-Since: Sun, 27 Jun 2004 00:58:28 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
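To make the parts of the request concrete, here is a minimal sketch that splits a request like the one above into its type, path, protocol, and headers. The function name `parse_http_request` is hypothetical, the sample omits the long Accept and User-Agent headers for brevity, and the parser ignores details a real server must handle (message bodies, repeated headers, malformed input).

```python
def parse_http_request(raw: str):
    """Split a raw HTTP request head into (type, path, protocol, headers)."""
    lines = raw.strip().splitlines()
    # The first line is the request line: type, path, protocol.
    method, path, protocol = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, protocol, headers


RAW = (
    "GET /xyzzy HTTP/1.1\r\n"
    "Connection: Keep-Alive\r\n"
    "Accept-Language: en-us\r\n"
    "Host: laptop:1215\r\n"
    "If-Modified-Since: Sun, 27 Jun 2004 00:58:28 GMT\r\n"
    "\r\n"
)

if __name__ == "__main__":
    method, path, protocol, headers = parse_http_request(RAW)
    print(method, path, protocol)
    for name, value in headers.items():
        print(f"{name}: {value}")
```

Header names are lowercased because HTTP header names are case-insensitive.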
HTTP header types
Processes • Browsers • Proxies • Apache web server framework
Browser with plug-in
Web proxy Most browsers today support ftp. However, proxies are still used for shared caching.
Apache modules www.apache.org
Server cluster – simple minded
Server cluster - clever
Web naming URI URL URN
URI examples from RFC 2396
• ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
• gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
• http://www.math.uio.no/faq/compression-faq/part1.html -- http scheme for Hypertext Transfer Protocol services
• mailto:mduerst@ifi.unizh.ch -- mailto scheme for electronic mail addresses
• news:comp.infosystems.www.servers.unix -- news scheme for USENET news groups and articles
• telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol
More examples on page 670
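The common structure of these URIs (a scheme, then a scheme-specific part) can be seen by splitting them with Python's standard `urllib.parse` module. This is a small sketch; the helper name `describe` is hypothetical.

```python
from urllib.parse import urlsplit, unquote

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]


def describe(uri: str):
    """Return (scheme, host part, decoded path) for a URI."""
    parts = urlsplit(uri)
    # unquote turns percent-escapes such as %20 back into characters.
    return parts.scheme, parts.netloc, unquote(parts.path)


if __name__ == "__main__":
    for uri in EXAMPLES:
        print(describe(uri))
```

Schemes such as mailto and news have no `//host` part, so the host component comes back empty and everything after the scheme lands in the path.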
Naming – URL – how to access
Naming – URN – true resource identifier RFC 2648 defines a URN namespace for IETF documents. RFC 2141 defines URN syntax. RFC 3406 is a BCP (Best Current Practice) for defining URN namespaces.
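As an illustration of the URN syntax, RFC 2141 gives the general form `urn:<NID>:<NSS>`, where NID is the namespace identifier and NSS the namespace-specific string. The sketch below splits a URN along those lines; the function name `parse_urn` is hypothetical, and the check on NSS characters is deliberately looser than the RFC's grammar.

```python
import re

# "urn:" prefix, then an NID of 1 to 32 letters, digits, or hyphens
# (not starting with a hyphen), then ":" and a non-empty NSS.
URN_PATTERN = re.compile(r"^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$", re.IGNORECASE)


def parse_urn(text: str):
    """Return (nid, nss) for a URN, or raise ValueError if it does not match."""
    match = URN_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a URN: {text!r}")
    nid, nss = match.groups()
    # The "urn" prefix and the NID are case-insensitive, so normalize the NID.
    return nid.lower(), nss


if __name__ == "__main__":
    # The "ietf" namespace from RFC 2648 names IETF documents such as RFCs.
    print(parse_urn("urn:ietf:rfc:2648"))
```

Unlike a URL, nothing in the parsed result says where or how to fetch the document; the name only identifies it.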
Activity – hitting a web page • Check your understanding: draw a UML sequence diagram showing the interaction of key software elements when a browser hits a web page containing graphics • Assume the web page and the images are on different servers • “Classes” in the diagram should include • Browser • DNS resolver • DNS server • Server for the page • Server for the images
Not much to synchronize … • Generally, web clients don’t exchange information with other clients, and servers don’t exchange information with other servers • Most documents have a single author, so write/write conflicts are rare • However, WebDAV provides a simple locking and versioning scheme • Locks are connection-independent • Handling abandoned locks is left to the implementation
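The two properties of the locking scheme can be sketched as follows. This is not the WebDAV API: the class `LockTable` and its methods are hypothetical, and expiring locks after a timeout is just one possible policy for abandoned locks, assumed here for illustration. Locks are identified by a token rather than by a connection, so a client can take a lock, disconnect, and write later by presenting the token.

```python
import uuid


class LockTable:
    """Token-based write locks that expire after a fixed timeout."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._locks = {}  # resource path -> (token, expiry time)

    def lock(self, path: str, now: float):
        """Return a new lock token, or None if the resource is already locked."""
        held = self._locks.get(path)
        if held is not None and held[1] > now:
            return None  # an unexpired lock exists
        token = uuid.uuid4().hex
        self._locks[path] = (token, now + self.timeout_seconds)
        return token

    def may_write(self, path: str, token: str, now: float) -> bool:
        """Allow writes if the resource is unlocked or the token matches."""
        held = self._locks.get(path)
        if held is None or held[1] <= now:
            return True  # never locked, or the lock was abandoned and expired
        return held[0] == token

    def unlock(self, path: str, token: str) -> bool:
        """Release the lock if the token matches."""
        held = self._locks.get(path)
        if held is not None and held[0] == token:
            del self._locks[path]
            return True
        return False


if __name__ == "__main__":
    table = LockTable(timeout_seconds=60.0)
    token = table.lock("/report.doc", now=0.0)
    print(table.lock("/report.doc", now=10.0))   # second locker is refused
    print(table.may_write("/report.doc", token, now=10.0))
```

Time is passed in explicitly to keep the sketch deterministic; a real server would read a clock.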
Replication – client and proxy Many organizations run proxy servers Some proxies can cooperate Virtually all browsers can cache
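How a browser or proxy cache stays consistent with the origin can be sketched with the conditional-request idea behind the If-Modified-Since header seen in the earlier request example. Everything here is a hypothetical stand-in: `ORIGIN` is an in-memory dictionary playing the role of the origin server, timestamps are plain integers, and `ValidatingCache` revalidates on every access, which real caches avoid by also using expiration times.

```python
# Origin "server": path -> (last-modified timestamp, body).
ORIGIN = {"/index.html": (100, "<html>v1</html>")}


def origin_get(path, if_modified_since=None):
    """Return (status, body, last_modified); 304 means 'not modified'."""
    last_modified, body = ORIGIN[path]
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, None, last_modified
    return 200, body, last_modified


class ValidatingCache:
    """Cache that revalidates each entry with a conditional request."""

    def __init__(self):
        self._entries = {}  # path -> (last_modified, body)
        self.bodies_transferred = 0

    def get(self, path):
        cached = self._entries.get(path)
        since = cached[0] if cached else None
        status, body, last_modified = origin_get(path, since)
        if status == 304:
            return cached[1]  # the cached copy is still current
        self.bodies_transferred += 1
        self._entries[path] = (last_modified, body)
        return body


if __name__ == "__main__":
    cache = ValidatingCache()
    cache.get("/index.html")
    cache.get("/index.html")
    print(cache.bodies_transferred)  # the body crossed the network only once
```

The second request still contacts the origin, but only a small "not modified" answer comes back instead of the full document.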
Replication – server side • Server clusters • Mirror sites • Content delivery networks (CDNs) • For example, Akamai
CDN operation In Akamai’s CDN, embedded document URLs get resolved to “closest” CDN server
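The idea of resolving to the "closest" server can be sketched as picking the replica with the smallest estimated distance to the client. This is an illustration only and not Akamai's actual algorithm: the host names, regions, and latency figures are invented, and a real CDN would measure distance continuously and typically make the choice during DNS resolution.

```python
# Hypothetical replicas with estimated latency (ms) to each client region.
REPLICAS = {
    "cdn-eu.example.net": {"europe": 10, "north-america": 90, "asia": 150},
    "cdn-us.example.net": {"europe": 95, "north-america": 12, "asia": 130},
    "cdn-asia.example.net": {"europe": 160, "north-america": 140, "asia": 15},
}


def closest_replica(client_region: str) -> str:
    """Pick the replica host with the lowest estimated latency to the client."""
    return min(REPLICAS, key=lambda host: REPLICAS[host][client_region])


def resolve_embedded_url(path: str, client_region: str) -> str:
    """Build the URL an embedded document would be fetched from."""
    return f"http://{closest_replica(client_region)}{path}"


if __name__ == "__main__":
    print(resolve_embedded_url("/images/logo.gif", "asia"))
```

Only the embedded documents (typically images) are redirected this way; the main page still comes from the origin server.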
Security on the Web • Diagram annotation: “If using client authentication” • NOTE: uses both public-key and secret-key (symmetric) encryption, for performance reasons • NOTE: the client has to use the same server for the entire session