Internet and Intranet Protocols and Applications • Lecture 5: Browsers • Feb 22, 2000 • Arthur P. Goldberg • Computer Science Department • New York University • artg@cs.nyu.edu
Outline • Issues • How do network technologies spread? • How do browsers work? • Spread in two steps • Solve an existing problem • Bundle new functionality • For example, the browser • Solve access • Multi-protocol • Uniform network interaction • Universal naming • Bundle, the Web • Hypertext
Key Pieces of Unified Multi-Protocol Access • Multi-Protocol client • Simplified, uniform, network interaction • Request - Response • Universal object naming • URL
Tim Berners-Lee WorldWideWeb: Proposal 11/90 • “The current incompatibilities of the platforms and tools make it impossible to access existing information through a common interface, leading to waste of time, …” • “A link is specified as an ASCII string from which the browser can deduce a suitable method of contacting an appropriate server. When a link is followed, the browser addresses the request for the node [document] to the server.”
Multi-Protocol Client Supported • Existing protocols • File Transfer Protocol (FTP) • Network News Transfer Protocol (NNTP) • telnet • Gopher • Mail • WAIS • finger • File access [came later] • Plus a NEW protocol • HyperText Transfer Protocol (HTTP)
Simplified, Uniform, Network Interaction • Most protocol interactions mapped to Request - Response
Universal Object Names Goals • Multi-Protocol • Network-Wide • Printable • Written in 7-bit ASCII chars • No white space • Non ASCII chars escaped by % • Extensible • Scheme/protocol name prefixed so new schemes can be added later • Complete • Any binary value can be encoded • Chars not 7-bit ASCII encoded in base 16 or 64
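The "printable" and "complete" goals can be illustrated with percent-escaping, which is base-16 encoding of each byte that is not a plain ASCII name character. This is a minimal sketch using Python's standard `urllib.parse`; the function names `encode_name` and `decode_name` are made up for this illustration.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_name(raw: bytes) -> str:
    """Return a printable, whitespace-free, 7-bit ASCII form of any byte string."""
    # safe="" asks quote() to escape "/" as well, so only letters, digits
    # and "_.-~" pass through unchanged; every other byte becomes %XX.
    return quote(raw, safe="")


def decode_name(printable: str) -> bytes:
    """Invert encode_name(): turn each %XX escape back into one byte."""
    return unquote_to_bytes(printable)


if __name__ == "__main__":
    print(encode_name(b"a b\xff"))
```

Because every possible byte value survives the round trip, any binary value can be carried inside a name, at the cost of three characters per escaped byte.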
Unified Multi-Protocol Access • "Any piece of information on the Internet in any of a dozen common and flexible information systems can be uniquely named and thereby retrieved, displayed, annotated, and remembered by Mosaic". • Marc Andreessen, NCSA Mosaic Technical Summary, 5/8/93 • ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/papers/mosaic.ps.Z • Invented by Tim Berners-Lee at CERN in 89
URL Schemes • ftp : // [ user [ : password ] @ ] host / path • news : newsgroup • telnet : ipaddress • gopher : // host [ gtype ] • mailto : userid @ hostname • wais : // hostport / database [ ? search ] • wais : // hostport / database / wtype / wpath • file : // pathname • http : // host [ : port ] [ / path ] • http://www.w3.org/Addressing/rfc1738.txt
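To show how one parser can serve several of the schemes listed above, here is a small sketch built on Python's standard `urllib.parse.urlsplit`. The helper name `describe` and the `example.org` hosts are placeholders chosen for this illustration, not part of the lecture.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Pull out the fields that the scheme templates name."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # text before the first ":"
        "user": parts.username,      # None when no "user@" is present
        "password": parts.password,  # None when no ":password" is present
        "host": parts.hostname,      # None for schemes without "//"
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,
    }


if __name__ == "__main__":
    for url in (
        "ftp://anonymous:guest@ftp.example.org/pub/readme.txt",
        "http://www.example.org:8080/index.html",
        "mailto:someone@example.org",
    ):
        print(describe(url))
```

Schemes written without "//", such as mailto and news, have no host component, so everything after the colon ends up in the path field.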
Partial (Relative) URIs • Purpose • Allow relative references among objects in the same hierarchy • Enable relocation of a set of objects in a hierarchy • Enabled by hierarchical delimiters • / .. . (slash, dot dot, dot)
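Relative references can be tried out with `urllib.parse.urljoin` from the Python standard library. In this sketch the base URLs and file names are invented for illustration.

```python
from urllib.parse import urljoin

# Hypothetical location of the current document.
BASE = "http://www.example.org/courses/net/lecture5.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative reference against the URL of the referring document."""
    return urljoin(base, reference)


if __name__ == "__main__":
    for ref in ("lecture6.html", "./figs/arch.gif", "../db/index.html", "/top.html"):
        print(ref, "->", resolve(ref))
```

The relocation property follows directly: if the whole tree moves to another host or directory, only the base changes, and the same relative references resolve inside the new location.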
To Review • The browser • Solve access • Multi-protocol • Uniform network interaction • Universal naming • Bundle, the Web • Hypertext
Web Browser Architecture • Multi-protocol • Browser GUI: Uniform network interaction • Mouse clicks map to requests • Responses are formatted objects which get rendered • Universal naming
Formatted Document Rendering • Formats: • HTML • Text • Adobe PDF • Sound • And many others • Images: • gif • jpeg
Showing Docs in Other Formats • Helper apps render documents in formats not supported internally in the browser • if (no MIME type is given in the incoming response) • then {map from filename suffix to MIME type} • if (MIME type is HTML, text, gif, or jpeg) • then render internally • else { • map from MIME type to helper-application • if (map fails) then {save to disk as binary} • else {run helper-application to view the document} • }
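The helper-application decision above can be sketched in Python as follows. The `HELPERS` table, the helper program names, and the returned action strings are assumptions made for this illustration; a real browser reads the mapping from user configuration.

```python
import mimetypes
from typing import Optional

# Types the browser is assumed to render internally (from the slide).
INTERNAL_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical MIME-type -> helper-application table.
HELPERS = {"application/pdf": "pdf-viewer", "audio/x-wav": "sound-player"}


def choose_action(url: str, declared_type: Optional[str] = None) -> str:
    """Decide how to present a response, following the slide's pseudocode."""
    mime = declared_type
    if mime is None:
        # No type in the response: fall back to the filename suffix.
        mime, _encoding = mimetypes.guess_type(url)
    if mime in INTERNAL_TYPES:
        return "render"
    helper = HELPERS.get(mime)
    if helper is None:
        return "save-to-disk"  # no mapping: keep the bytes as a binary file
    return "run:" + helper


if __name__ == "__main__":
    print(choose_action("http://www.example.org/index.html"))
    print(choose_action("http://www.example.org/paper", "application/pdf"))
```

Returning an action name instead of launching a program keeps the sketch free of side effects; the caller would map "run:..." onto an actual process launch.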
Converting Responses to HTML • Directory listing converted to a <ul> • FTP => HTML • <A HREF="URL-for-filename"> • <IMG SRC="appropriate-icon"> • filename (fixed length) • file statistics </A> • Gopher => HTML • <A HREF="URL-for-gopher-object"> • <IMG SRC="appropriate-icon"> • gopher object name </A> • <WOULD BE NICE TO DEMO THIS IN A TRACE>
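As a rough illustration of the FTP-to-HTML conversion, the sketch below turns a list of file names into a `<ul>` of links. It leaves out the icons and file statistics mentioned on the slide, and the function name `listing_to_html` and the example host are assumptions.

```python
from html import escape
from typing import List
from urllib.parse import quote


def listing_to_html(base_url: str, names: List[str]) -> str:
    """Render a directory listing as an unordered list of hyperlinks."""
    items = []
    for name in names:
        # Percent-escape the file name for the URL, HTML-escape it for display.
        href = base_url.rstrip("/") + "/" + quote(name)
        items.append(
            '<li><a href="{}">{}</a></li>'.format(escape(href, quote=True), escape(name))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(listing_to_html("ftp://ftp.example.org/pub", ["read me.txt", "a&b.ps"]))
```

The two escaping steps are separate on purpose: a file name can contain characters that are unsafe in a URL, unsafe in HTML, or both.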
Converting Responses to HTML • News => HTML • Newsgroup list • HTML form for subscription • Available articles list • unordered list • sub-lists for threads • An article • header gif buttons • text of article
Browser Implementation Issues • Performance • Run multiple threads • Parallel retrieval • Cache documents • Cache domain names
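Parallel retrieval combined with a document cache might look like the sketch below. The class name `CachingFetcher` is invented here, and the actual network fetch is passed in as a function so the example runs without a network; cache expiry and validation are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, Iterable


class CachingFetcher:
    """Fetch documents on worker threads and remember the results."""

    def __init__(self, fetch: Callable[[str], bytes], workers: int = 4) -> None:
        self._fetch = fetch          # caller-supplied retrieval function
        self._workers = workers
        self._cache: Dict[str, bytes] = {}
        self._lock = Lock()          # guards the cache dictionary

    def get(self, url: str) -> bytes:
        with self._lock:
            if url in self._cache:
                return self._cache[url]
        body = self._fetch(url)      # slow I/O happens outside the lock
        with self._lock:
            # Keep the first stored body if another thread got there first.
            return self._cache.setdefault(url, body)

    def get_many(self, urls: Iterable[str]) -> Dict[str, bytes]:
        unique = list(dict.fromkeys(urls))   # drop duplicates, keep order
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            bodies = list(pool.map(self.get, unique))
        return dict(zip(unique, bodies))


if __name__ == "__main__":
    fetcher = CachingFetcher(lambda url: ("body of " + url).encode())
    print(fetcher.get_many(["http://a.example/", "http://b.example/"]))
```

This is the pattern a browser can use for a page with several inline images: the HTML arrives first, then the images are requested concurrently, and a later visit is served from the cache.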
Browser Implementation Issues • HyperText and cache management • Canonicalize URLs • Store hypergraph of HREFs • Interaction management • Event driven • Mouse events • Keyboard • Network
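Canonicalizing URLs lets the cache and the link graph recognize that two differently written URLs name the same object. The following is one possible sketch, not a complete normalizer: the `DEFAULT_PORTS` table and function names are assumptions, and IPv6 host literals and percent-escape case are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for a few schemes.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments without climbing above the root."""
    segments = path.split("/")
    kept = []
    for seg in segments:
        if seg == "..":
            if len(kept) > 1:        # kept[0] is the empty root segment
                kept.pop()
        elif seg != ".":
            kept.append(seg)
    if segments[-1] in (".", ".."):
        kept.append("")              # result still names a directory
    return "/".join(kept)


def canonicalize(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    if parts.port not in (None, DEFAULT_PORTS.get(scheme)):
        netloc = "{}:{}".format(host, parts.port)   # keep non-default ports
    userinfo, at, _ = parts.netloc.rpartition("@")
    if at:
        netloc = userinfo + "@" + netloc            # user names stay as written
    path = remove_dot_segments(parts.path or "/")
    # The fragment is dropped: it selects a place inside the same object.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(canonicalize("HTTP://WWW.Example.ORG:80/a/b/../c/./d.html#sec2"))
```

Using the canonical form as the cache key and as the node identity in the hypergraph of HREFs avoids storing the same document twice.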
Invented by Tim Berners-Lee at CERN in 89 • "HyperText is a way to link and access information of various kinds as a web of notes in which the user can browse at will. It provides a single user-interface to large classes of information." • "A hypertext page has pieces of text which refer to other texts. Such references are highlighted and can be selected... When you select a reference, the browser presents you with the text which is referenced: you have made the browser follow a hypertext link" WorldWideWeb: Proposal for a HyperText Project, 11/12/90, http://www.w3.org/Proposal.html
Uniform Resource Name (URN) • RFC 1737, K. Sollins, L. Masinter, "Functional Requirements for Uniform Resource Names", 12/20/1994 • Global scope • Always identifies the same resource • Globally unique • Each resource uniquely identified • Permanent • Unique over time • Scalable • Name everything for hundreds of years • Backwards compatible • Incorporate URL, ISBN, UPC, phone, ZIP code, etc.
URN Location Independence • URN identifies any one of several instances • Like book reference, Notes DB identifier • Does not imply a resource location • Enables transparent replication of network resources • Requires distributed mapping scheme, resolved before DNS • All CS problems solved by another level of indirection - Lampson • IBQs: • Which schemes are already location independent? • How do they resolve location?
Proposed URN Solution: “The Handle System” • William Arms, David Ely, Corporation for National Research Initiatives, June 23, 1995 • Components • Naming authorities • Handle generators • The global handle server • Local handle servers • Caching handle servers • Client software • Proxy servers • Administrative tools
Handle Syntax • h/d • h - naming authority • d - arbitrary string • naming authority • dot separated list of hierarchical names • e.g.: goldbergs.arthur • a naming authority can grant authority to subsidiary naming authorities • arbitrary string • unique within naming authority
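The h/d syntax can be parsed with a few lines of Python. This sketch assumes that the naming authority itself never contains a slash, so the first "/" separates the two parts; the `Handle` type and the local name "lecture5" are hypothetical.

```python
from typing import List, NamedTuple


class Handle(NamedTuple):
    authority: List[str]   # hierarchical names, as written left to right
    local_name: str        # arbitrary string, unique within the authority


def parse_handle(text: str) -> Handle:
    """Split "h/d" into a naming authority and an arbitrary string."""
    authority, slash, local_name = text.partition("/")
    if not slash or not authority or not local_name:
        raise ValueError("expected naming-authority/string, got {!r}".format(text))
    labels = authority.split(".")
    if "" in labels:
        raise ValueError("empty name in naming authority {!r}".format(authority))
    return Handle(labels, local_name)


if __name__ == "__main__":
    print(parse_handle("goldbergs.arthur/lecture5"))
```

Keeping the authority as a list of names makes delegation easy to check: a subsidiary authority is one whose list extends the list of the authority that granted it.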
Tanenbaum 7.6 • Nits • TBL • MA • Proxy
Outstanding Qs • Threadsafe socket calls? • yes • Blocking vs. non-blocking vs. interrupted sockets • WS32 • Automatic • connect(), send(), gethostbyname() • protocol timeouts • Application • recv(), recvfrom(), accept() • SO_RCVTIMEO
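The application-level receive timeout can be demonstrated in Python, with one caveat: `socket.settimeout()` plays the role that `SO_RCVTIMEO` plays in the C socket API, but it is Python's own mechanism, not the same option. The function name `recv_with_timeout` is an assumption for this sketch.

```python
import socket
from typing import Optional


def recv_with_timeout(sock: socket.socket, max_bytes: int, seconds: float) -> Optional[bytes]:
    """Receive data, or return None if nothing arrives within the time limit."""
    previous = sock.gettimeout()
    sock.settimeout(seconds)
    try:
        return sock.recv(max_bytes)
    except socket.timeout:
        return None
    finally:
        sock.settimeout(previous)    # restore the caller's blocking mode


if __name__ == "__main__":
    left, right = socket.socketpair()
    print(recv_with_timeout(left, 16, 0.1))   # nothing sent yet
    right.sendall(b"hello")
    print(recv_with_timeout(left, 16, 1.0))
    left.close()
    right.close()
```

Returning None on a timeout keeps it distinct from an empty bytes result, which `recv()` uses to signal that the peer closed the connection.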
URL Syntax • scheme : path • Reserved characters • % - escape character • % followed by 2 hex digits • e.g.: ‘%’ is %25, ‘ ’ is %20 • / .. . (slash, dot dot, dot) - hierarchical delimiters • # - separates URI from fragment identifier within object • ? - separates URI from query
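The roles of the reserved characters can be made concrete with a hand-written splitter and decoder. This is an illustrative sketch that assumes an absolute URL; the function names are made up, and production code would normally rely on `urllib.parse` instead.

```python
import string
from typing import Tuple


def split_url(url: str) -> Tuple[str, str, str, str]:
    """Return (scheme, path, query, fragment) using ':', '?' and '#'."""
    rest, _, fragment = url.partition("#")    # fragment is cut off first
    rest, _, query = rest.partition("?")      # then the query
    scheme, _, path = rest.partition(":")     # first colon ends the scheme
    return scheme, path, query, fragment


def percent_decode(text: str) -> bytes:
    """Replace each '%' plus two hex digits with the byte it stands for."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            pair = text[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("bad escape at offset {}".format(i))
            out.append(int(pair, 16))
            i += 3
        else:
            out += text[i].encode("ascii")    # URLs are 7-bit ASCII
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(split_url("http://www.example.org/a%20b.html?x=1#top"))
    print(percent_decode("100%25%20sure"))
```

Splitting before decoding matters: an escaped "%23" or "%3F" inside the path must not be mistaken for a real "#" or "?" delimiter.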