World Wide Web. Basics. Original version by Carolyn Watters (Dalhousie U. Computer Science). The Web…. …is a distributed document delivery system that uses Internet protocols …links documents stored in computers communicating by the Internet Main authority is the W3 Consortium www.w3.org.
World Wide Web Basics Original version by Carolyn Watters (Dalhousie U. Computer Science)
The Web… • …is a distributed document delivery system that uses Internet protocols • …links documents stored in computers communicating by the Internet • Main authority is the W3 Consortium www.w3.org
Basic Definitions • Web server – machine that services Internet requests • Web client – machine that initiates Internet requests • Browser – software to interact with Internet data at the web client • TCP/IP – internet data protocol • FTP – internet file transfer protocol • HTTP – hypertext transfer protocol • HTML – hypertext markup language
Servers and Clients • Servers – computer systems at the end of a network that store files and provide other services • Clients – computer systems that are end points for users of the data
Client-Server Model & WWW • Cloud model • TCP/IP • HTTP and MIME types • FTP • Protocol stacks
Internet Model Layers • Application layer – communication services (FTP, telnet, e-mail) • Transport layer – transmission of messages end-to-end • Network services layer – transmission of messages across a sequence of links • Data Link layer – transmission of a packet across one link • Physical layer – where the signals move
Application Layer • FTP • HTTP • SMTP • Telnet • Etc.
TCP/IP • Suite of protocols made the standard for the Internet • facilitates communication between heterogeneous and similar networks that are connected together • TCP itself is a reliable, connection-oriented, byte-stream protocol
Transport layer: TCP & UDP • TCP (Transmission Control Protocol) • full duplex • byte stream • virtual path (connected) • error free • uses acknowledgements • 16-bit port addresses • UDP (User Datagram Protocol) • connectionless • no acknowledgements • no flow control • no resending of erroneous packets • some error detection • 16-bit port addresses
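The contrast between the two transport protocols can be sketched over the loopback interface. This is a minimal illustration, not part of the original slides: the function names `udp_roundtrip` and `tcp_roundtrip` are hypothetical, and both sides of each exchange run in one process for brevity.

```python
import socket

LOOPBACK = ("127.0.0.1", 0)  # port 0 asks the OS for any free port


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram; no connection is set up and nothing is acknowledged."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(LOOPBACK)
        receiver.settimeout(2.0)  # a lost datagram would otherwise block forever
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a connected, acknowledged byte stream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(LOOPBACK)
        listener.listen(1)
        listener.settimeout(2.0)
        client.connect(listener.getsockname())  # connection set up before any data
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(2.0)
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal end of the byte stream
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # empty read: the peer finished sending
                    break
                chunks.append(chunk)
        return b"".join(chunks)
    finally:
        client.close()
        listener.close()
```

Note that the UDP side reads one whole datagram, while the TCP side loops until the stream ends: TCP preserves byte order but not message boundaries.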
Network Layer: IP • Delivers packets of up to 64 KB, one at a time • Each packet has a header • sending host and intended host network addresses • 32-bit addresses • IP layer (like UDP) • unreliable • connectionless
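To make the 32-bit addresses concrete, the sketch below converts dotted-quad addresses to integers and packs a source/destination pair the way they sit side by side in a packet header. The helper names and the sample addresses (from the 192.0.2.0/24 documentation range) are assumptions for illustration only; a real IP header carries many more fields.

```python
import ipaddress
import struct


def ipv4_to_int(dotted: str) -> int:
    """Return the 32-bit integer behind a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def pack_addresses(source: str, destination: str) -> bytes:
    """Pack two addresses as consecutive 32-bit big-endian (network order) words."""
    return struct.pack("!II", ipv4_to_int(source), ipv4_to_int(destination))


# Each address occupies exactly 4 bytes, so the pair is 8 bytes long.
pair = pack_addresses("192.0.2.1", "192.0.2.99")
```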
TCP/IP apps TCP/IP software usually includes: • remote terminal client using TELNET protocol for remote login • electronic mail client using SMTP protocol to transfer e-mail to remote system • file transfer client using FTP protocol to transfer files between 2 machines
HTTP: HyperText Transfer Protocol • Native protocol for WWW • Sits on top of internet’s TCP/IP protocol • HTTP is a 4 step process per transaction • Uses a predefined set of document formats from MIME
MIME Multipurpose Internet Mail Extensions • defines file formats (images, video, text, etc) • e.g. Content-type: text/html • Data type/subtype • text/html • text/plain • image/gif • video/mpeg • application/msword • etc!
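The type/subtype structure can be shown with a small parser for a Content-type value. This is only a sketch: `split_content_type` is a hypothetical helper, and it discards any parameters after a semicolon (such as a charset) rather than interpreting them.

```python
from typing import Tuple


def split_content_type(value: str) -> Tuple[str, str]:
    """Split a value such as 'text/html' into (type, subtype)."""
    media_type = value.split(";", 1)[0].strip().lower()  # drop parameters
    main, slash, sub = media_type.partition("/")
    if not (main and slash and sub):
        raise ValueError(f"not a type/subtype pair: {value!r}")
    return main, sub
```

A client can then decide how to handle a response from the first element alone, for example rendering anything whose type is `text` and handing `application` data to a helper program.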
HTTP Connection • 1. Client • Makes a TCP/IP connection • Makes an HTTP request for a web page • 2. Server accepts request • Sends page as an HTTP response • 3. Client downloads page • 4. Server breaks the connection
HTTP is Stateless! • Each operation or transaction makes a new connection • each operation is unaware of any other connection • each click is a new connection • So how do they do those shopping carts?
What does it look like? • Header + object file • Header • plain text • info about the object (MIME, etc.) • methods allowed • etc. • browser sends a header to server each time you ask for information • server sends a header and possibly content
HTTP Transaction Example • GET /catalog/ip/ip.htm HTTP/1.0 • Accept: text/plain • Accept: text/html • Referer: http://www.june.com/catalog.html • User-Agent: Mozilla/2.0 • CRLF
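The request on this slide can be assembled byte for byte as follows. The function name `build_request` is an assumption made for this sketch; the header values are the ones shown on the slide.

```python
from typing import List, Tuple

CRLF = "\r\n"


def build_request(method: str, uri: str, headers: List[Tuple[str, str]],
                  version: str = "HTTP/1.0") -> bytes:
    """Return a full request: request line, header lines, then an empty line."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends in CRLF; one extra CRLF marks the end of the header.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


request = build_request("GET", "/catalog/ip/ip.htm", [
    ("Accept", "text/plain"),
    ("Accept", "text/html"),
    ("Referer", "http://www.june.com/catalog.html"),
    ("User-Agent", "Mozilla/2.0"),
])
```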
HTTP REQUEST PROTOCOL • Request = Simple | Full • Simple = GET <URI> CRLF • Full = Method URI ProtVersion CRLF [<HTRQ Header>*] [CRLF <data>] • Method = GET | POST | HEAD | …. • <HTRQ Header> = <Fieldname>:<Value>CRLF • <data> = MIME conforming message • w.w3.org/Protocols/HTTP/
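The Simple/Full distinction in the grammar above can be sketched as a request-line parser. The name `parse_request_line` and the dictionary layout are assumptions for illustration; header and data parsing are left out.

```python
from typing import Dict, Optional


def parse_request_line(line: str) -> Dict[str, Optional[str]]:
    """Classify a request line as Simple (GET <URI>) or Full (Method URI Version)."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) == 2 and parts[0] == "GET":
        # Simple request: no protocol version, so no headers can follow.
        return {"kind": "simple", "method": "GET", "uri": parts[1], "version": None}
    if len(parts) == 3:
        method, uri, version = parts
        return {"kind": "full", "method": method, "uri": uri, "version": version}
    raise ValueError(f"malformed request line: {line!r}")
```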
HTTP Header fields • General-header fields • used for both requests and responses • Request-header fields • used for requests • extra client information for use by server • optional
General-header fields • Date: Mon, 11 Jan 1999 08:14:32 GMT • MIME-version: 1.0 • Pragma: no-cache • directives
Request-header fields • acceptable MIME types for response • Accept: text/html • Accept: */* • client's reply to a 401 response • Authorization: Basic abcdef (base64-encoded username:password) • From: client-email-addr
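The Authorization field for the Basic scheme carries the string `username:password` encoded in base64. A minimal sketch follows; the helper name is hypothetical, and the sample credentials are the well-known example pair from the HTTP authentication specification.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Return a complete 'Authorization: Basic ...' header line (without CRLF)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


header = basic_authorization("Aladdin", "open sesame")
```

The encoding is reversible by anyone who sees the header, so Basic authentication hides nothing on an unencrypted connection.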
More Request-header fields • If-Modified-Since:date • conditional get • source of current requested URL • Referer:URL • robot/browser identification • User-Agent:Mozilla/2.0
Examining HTTP Header Values • In perl (CGI) • $ENV{"HTTP_FROM"} • In Netscape • www.cs.dal.ca/~jamie/cgi-bin/4173/about/env.cgi
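Under CGI, the server hands request-header fields to the script as environment variables: the field name is upper-cased, hyphens become underscores, and `HTTP_` is prefixed. The sketch below shows that mapping in Python rather than perl; both function names are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def cgi_variable_name(header_name: str) -> str:
    """Map a header field name such as 'User-Agent' to its CGI variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def header_value(header_name: str,
                 environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up a request header in a CGI environment; None if it was not sent."""
    return environ.get(cgi_variable_name(header_name))
```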
HTTP Methods • Client requests either • simple request • full request • Request-line = method Request-URI HTTP-version CRLF • e.g. GET /catalog/ip.html HTTP/1.0
Simple requests • Only for HTTP 0.9 • only uses the GET method • causes the server to locate and transfer the object specified • client responsible for handling the object • GET <uri> CRLF
Full Request • Uses HTTP version and more methods • method tells server what to do to the resource requested • Methods • GET • POST • HEAD
GET Method • Request server to retrieve object specified • conditional GET • request message includes • If-Modified-Since in header
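A conditional GET can be sketched from both ends: the client formats a date into an If-Modified-Since field, and the server compares it with the object's modification time. The function names are hypothetical, and the 304 (Not Modified) reply is the standard status for an unchanged object rather than something stated on the slide.

```python
import calendar
from email.utils import formatdate, parsedate_to_datetime


def if_modified_since(timestamp: float) -> str:
    """Format a Unix timestamp as an If-Modified-Since header line."""
    return "If-Modified-Since: " + formatdate(timestamp, usegmt=True)


def conditional_status(last_modified: float, header_date: str) -> int:
    """Return 200 if the object changed after header_date, else 304."""
    since = parsedate_to_datetime(header_date).timestamp()
    return 200 if last_modified > since else 304


# The date used on the General-header slide, as seconds since the epoch (UTC).
cached_at = calendar.timegm((1999, 1, 11, 8, 14, 32))
header_line = if_modified_since(cached_at)
```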
HEAD Method • Like GET but does not return the object • returns a header about the resource requested (meta information) • good way to test link validity
POST Method • Include an object in the request • server should use that object in processing the request • must include a Content-Length in header
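A POST request differs from a GET mainly in carrying an object after the header, with Content-Length telling the server how many bytes to read. In this sketch the function name, the URI `/cgi-bin/order`, and the form data are all made up for illustration.

```python
def build_post(uri: str, body: bytes,
               content_type: str = "application/x-www-form-urlencoded") -> bytes:
    """Return a full POST request whose Content-Length matches the body."""
    head = (
        f"POST {uri} HTTP/1.0\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # empty line separates the header from the object
    )
    return head.encode("ascii") + body


post = build_post("/cgi-bin/order", b"item=42&qty=1")
```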
HTTP Response Message • HTTP protocol version • 3 digit status code • reason phrase • CRLF • optional header fields • CRLF
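The first line of the response message can be pulled apart as below. `parse_status_line` is a hypothetical helper; it assumes a reason phrase is present and rejects a line without one.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split 'HTTP/1.0 200 OK' into (version, status code, reason phrase)."""
    # Split on the first two spaces only: the reason phrase may contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be 3 digits: {code!r}")
    return version, int(code), reason
```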
HTTP Response Header Fields • Additional information about the server • such as: • LOCATION: exact URI address • SERVER: server software (CERN/3.0) • WWW-AUTHENTICATE: • status 401 responses (unauthorized request) • server challenges client • client may use to send authorization info to server
Understanding STATUS Codes • 1xx – for information only • 2xx – action successful • 3xx – further action needed (redirect) • 4xx – client request error • 5xx – server error
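Because the class is determined by the first digit alone, a client can react sensibly even to a code it has never seen. A minimal sketch, with hypothetical names and short labels paraphrasing the list above:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a 3-digit status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]
```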
HTTP Transaction • Client and server establish a connection • Client makes a request • Server makes a response • Server terminates connection
Step 1 establish connection • TCP/IP connection set up • uses a port number as application reference • usually port 80 • ports < 1024 are privileged (≥ 1024 are open) • Step 2 client request • HTTP message sent with a request line • request-line = method URL HTTP version
Step 3 Server response • server sends HTTP message and optionally requested data • resp-message = HTTP version status code reason-phrase [optional stuff] • Step 4 connection terminated • usually the server • sometimes the client “stops” it • anything else, whoever notices terminates
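The four steps can be watched end to end with a throwaway server on the loopback interface and a raw socket as the client. This is a sketch under stated assumptions: `HelloHandler`, `one_transaction`, and the page body are invented for the example, and the server uses Python's standard `http.server` module on an OS-chosen port instead of port 80.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>hello</body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def one_transaction() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    # handle_request serves exactly one request and then returns.
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    chunks = []
    try:
        # Step 1: establish the TCP connection.
        with socket.create_connection(server.server_address, timeout=5) as sock:
            # Step 2: client sends the request line and an empty header.
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            while True:
                chunk = sock.recv(4096)  # Step 3: server response arrives
                if not chunk:            # Step 4: server closed the connection
                    break
                chunks.append(chunk)
    finally:
        worker.join(timeout=5)
        server.server_close()
    return b"".join(chunks)
```

The client never learns the length of the response in advance here; it simply reads until the server terminates the connection, which is how the end of the object is signalled in this model.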
Some Port Assignments • 21 FTP • 23 Telnet • 25 smtp (mail) • 70 gopher • 79 finger • 80 HTTP