Review. Practice session 1: Preparing for web programming. Kecskeméti Gábor, A/1 room 336, Department of Physics. Consultation hours: Thursday 9-11. Lecture: Lecture Hall XX, Thursday 14-16. Practice session: room 207, Thursday 12-14.
The Uniform Resource Locator (URL) • Each page of information on the web has a unique address, called the URL, at which it can be found • Example: http://faculty.uscupstate.edu/atzacheva/lecture1.html • 1. Protocol (http): the document can be obtained using the Hypertext Transfer Protocol (HTTP) • 2. Host name (faculty.uscupstate.edu): the name of the web server • 3. Path and file name (/atzacheva/lecture1.html): the path to the web page; the .html extension denotes that the file is written in HyperText Markup Language (HTML)
Independent exercise • Write a program that processes a URL received as a command-line parameter and splits it into its parts • At a minimum, the program must be able to read the following parts of the URL: • 1. Protocol (scheme) • 2. Domain name (host identifier) • 3. Port (a port number that differs from the protocol's default) • 4. Path • 5. If the URL also contains a query, the names of the parameters • 6. If the URL also contains a query, the values of the parameters • When you have finished the exercise, I will check and test it; let me know when I can come over
Examples • urlparser.exe http://www.hwsw.hu • Protocol (scheme): http • Domain name (host identifier): www.hwsw.hu • urlparser.exe x://y:1/a?b=c&d=e • Protocol (scheme): x • Domain name (host identifier): y • Port (not implied by the protocol): 1 • Path: a?b=c&d=e • Query parameters: b,d • Query values: c,e
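One possible way to approach the exercise, sketched in Python rather than as an .exe. It leans on the standard library's urllib.parse instead of splitting the string by hand, so it is a starting point and not the expected solution. The function name split_url and the dictionary keys are made up for this illustration. Unlike the sample output above, the sketch reports the path (/a) separately from the query string.

```python
import sys
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Split a URL into the parts listed in the exercise (hypothetical helper)."""
    parts = urlsplit(url)
    # keep_blank_values=True so that "?a=&b=1" still reports the name "a".
    query = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL gives no explicit port
        "path": parts.path,
        "query_names": [name for name, _ in query],
        "query_values": [value for _, value in query],
    }


if __name__ == "__main__":
    # Hypothetical invocation: python urlparser.py "x://y:1/a?b=c&d=e"
    for url in sys.argv[1:]:
        for key, value in split_url(url).items():
            print(f"{key}: {value}")
```

Writing the splitting logic by hand (find "://", then the first "/", then "?") is probably closer to the spirit of the exercise; the library version is mainly useful for checking your own results.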
The HTTP protocol
Internet protocol stack • application: supporting network applications (FTP, SMTP, HTTP) • transport: process-to-process data transfer (TCP, UDP) • network: routing of datagrams from source to destination (IP, routing protocols) • link: data transfer between neighboring network elements (PPP, Ethernet) • physical: bits “on the wire”
Ports • Each IP address is subdivided into ports, each port assigned to a single program • You can browse the Internet using one port while receiving e-mail using another port, with a single IP address
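A small sketch of the idea, assuming a machine with a loopback interface: two listening sockets share the IP address 127.0.0.1 but sit on different ports. The variable names are illustrative only, and the ports are chosen by the operating system rather than being the real web or mail ports.

```python
import socket

# Two independent listening sockets on the same IP address (loopback).
# Port 0 asks the operating system to pick any free port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as web_like, \
     socket.socket(socket.AF_INET, socket.SOCK_STREAM) as mail_like:
    web_like.bind(("127.0.0.1", 0))
    mail_like.bind(("127.0.0.1", 0))
    web_like.listen()
    mail_like.listen()

    web_addr = web_like.getsockname()
    mail_addr = mail_like.getsockname()
    print("first service :", web_addr)
    print("second service:", mail_addr)
```

Both addresses print with the same IP and two different port numbers, which is how the operating system tells the two programs' traffic apart.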
Protocols • Sets of rules or standards that let computers communicate over the Internet • Type or size or brand of computer doesn’t interfere with communication if the protocol is used
WWW Background • 1989-1990 – Tim Berners-Lee invents the World Wide Web at CERN • Means for transferring text and graphics simultaneously • Client/Server data transfer protocol • Communication via application level protocol • System ran on top of standard networking infrastructure • Text mark up language • Not invented by Berners-Lee • Simple and easy to use • Requires a client application to render text/graphics CS 640
About: Tim Berners-Lee • Born: 1955 • Native City: London • Education: B.A. in physics, Queen's College, Oxford, 1976 • Hobbies: Windsurfing • Misc Facts: • Knighted by Queen Elizabeth II in 2004. • While naming the “World Wide Web” he also considered the names “The Information Mine” and “Information Mesh” • In October 1999, Berners-Lee published the book Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor, which told the personal story of how he created the World Wide Web and discussed the future of the mass medium.
The Story of Berners-Lee’s Invention • Tim Berners-Lee graduated from Oxford in 1976. In 1980 he went to work as a contractor at CERN, the European particle physics laboratory. • He had always been fascinated by computers, and at CERN he struggled with a problem. • This problem was that of integrating and exchanging information held on different computers in often widely scattered places. • He found a solution to his problem and named it Enquire. • This method incorporated the use of hypertext, a system that links documents from different sources, forming an electronic path that users follow to obtain related information.
Tim Berners-Lee Tim Berners-Lee was knighted by Queen Elizabeth for his invention of the World Wide Web. He is shown here, along with the first picture posted on the Web and a screen shot from an early version of his Web browser.
WWW History contd. • 1993 – Marc Andreessen and colleagues release Mosaic at the National Center for Supercomputing Applications (NCSA) • First widely used graphical browser • Internet’s first “killer app” • Freely distributed • Its developers went on to found Netscape in 1994 • 1995 (approx.) – Web traffic becomes dominant • Exponential growth • E-commerce • Web infrastructure companies • World Wide Web Consortium • Reference: “Web Protocols and Practice”, Krishnamurthy and Rexford
No royalties • While the component ideas of the World Wide Web are simple, Berners-Lee's insight was to combine them in a way which is still exploring its full potential. Perhaps his greatest single contribution, though, was to make his idea available freely, with no patent and no royalties due. • In 1994 he founded World Wide Web Consortium (W3C) at the MIT Laboratory for Computer Science in Cambridge, Massachusetts, and in 2003, the organization decided that all standards must contain royalty-free technology, so they can be easily adopted by anyone. Taken from: http://en.wikipedia.org/wiki/Tim_Berners-Lee#No_royalties
WWW Components • Structural Components • Clients/browsers – two dominant implementations • Servers – run on sophisticated hardware • Caches – many interesting implementations • Internet – the global infrastructure which facilitates data transfer • Semantic Components • Hyper Text Transfer Protocol (HTTP) • Hyper Text Markup Language (HTML) • eXtensible Markup Language (XML) • Uniform Resource Identifiers (URIs)
World Wide Web • WWW comprises software (Web server and browser) and data (Web sites) • Data formats: HTML, XML, ... • Client side: JavaScript, VBScript, DHTML, Java applets • Server side: CGI, ASP, Java servlets
Quick Aside – Web server use • [Figure: web server use. Source: Netcraft Server Survey, 2001]
WWW Structure • Clients use browser application to send URIs via HTTP to servers requesting a Web page • Web pages constructed using HTML (or other markup language) and consist of text, graphics, sounds plus embedded files • Servers (or caches) respond with requested Web page • Or with error message • Client’s browser renders Web page returned by server • Page is written using Hyper Text Markup Language (HTML) • Displaying text, graphics and sound in browser • Writing data as well • The entire system runs over standard networking protocols (TCP/IP, DNS,…)
HTTP (HyperText Transfer Protocol) • How HTML is Displayed • [Diagram: a browser command with URL http://www.google.com retrieves HTML (text and binary data), which the browser renders as the HTML display]
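How HTML is Displayed – from remote site • [Diagram: at the client site, the user enters URL http://www.yahoo.com as a browser command; the browser sends an HTTP request to the remote Web server (HTML, CGI, ASP, PHP, …, backed by a DB) and renders the HTTP response as the HTML display]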
How HTML is Displayed – from client site • [Diagram: at the client site, the user enters URL c:\my_page.html as a browser command; the browser reads the HTML file and renders it as the HTML display]
What is HTTP? • HyperText Transfer Protocol, the underlying protocol used by the World Wide Web. HTTP defines how messages are formatted and transmitted, and what action Web servers and browsers should take in response to various commands. For example, when you enter a URL in your browser, this actually sends an HTTP command to the Web server directing it to fetch and transmit the requested Web page. www.wmo.ch/web/www/WDM/Guides/Internet-glossary.html • Basically, HTTP is a protocol used by computers to transmit data from one computer to another. It is a universal format that is understood by almost all computers nowadays and is the backbone of the World Wide Web (WWW).
HTTP Basics • Protocol for client/server communication • The heart of the Web • Very simple request/response protocol • Client sends request message, server replies with response message • Stateless • Relies on URI naming mechanism • Three versions have been used • 0.9/1.0 – very close to Berners-Lee’s original • RFC 1945 (original RFC is now expired) • 1.1 – developed to enhance performance, caching, compression • RFC 2068 • When these slides were written, 1.0 dominated but 1.1 was catching up
HTTP • HTTP defines how Web pages are requested and served on the Internet • Early servers and browsers used an ad-hoc approach • A standardized protocol, called HTTP/1.0, was derived from this • The earlier approach is now called HTTP/0.9 • Later, HTTP/1.0 was extended to HTTP/1.1 • The protocol versions are upwardly compatible • servers and browsers which can handle HTTP/1.1 can also handle HTTP/1.0 and HTTP/0.9
History: “HTTP/0.9” • HTTP/0.9 was very simple: • A browser would send a request like this to a server: GET /hobbies.html • In response, the server would send the contents of the requested file. • Only GET requests were supported • Only a file path and name could appear in a GET request • The response had to be an HTML document.
History (contd.) • Different browsers/servers soon extended this basic scheme in various ways • To achieve some standardization, the HTTP/1.0 protocol was specified, in 1996, in a document called RFC 1945 • (for historical reasons, an Internet standard spec is called a Request for Comments or RFC) • This was soon extended to HTTP/1.1, in RFC 2068, released in January 1997 • An update to RFC 2068 was produced in June 1999, as RFC 2616 • Various other protocols, based on HTTP, have been produced from time to time • we will see a “cookie” protocol, based on HTTP, which was specified in February 1997, in RFC 2109
HTTP vs HTML • HTML: hypertext markup language • Definitions of tags that are added to Web documents to control their appearance • HTTP: hypertext transfer protocol • The rules governing the conversation between a Web client and a Web server Both were invented at the same time by the same person
HTTP is an application layer protocol • The Web client and the Web server are application programs • Application layer programs do useful work like retrieving Web pages, sending and receiving email or transferring files • Lower layers take care of the communication details • The client and server send messages and data without knowing anything about the communication network
Web communication • Client: browser (Firefox) on local computer • HTTP request: GET http://www.google.com/index.html, sent over the Internet • Web server: Apache on www.google.com • HTTP reply: sent back from the Web server to the client
Overall Operation of HTTP • The HTTP protocol is a request/response protocol. • request • An HTTP message sent by a client to a server • response • An HTTP message sent by a server to a client which has made a request. • client • A program that establishes connections for the purpose of sending requests. • server • A program that accepts connections in order to service requests by sending back responses. • As we shall see, a program may act as both a client and a server.
HTTP – HyperText Transfer Protocol • together with HTML forms the base of WWW • is standardized by IETF (RFC 2616) • is a request-response protocol • it is stateless (does not maintain a state of a session) and asynchronous (an HTML document is loaded asynchronously by the browser, as soon as parts of it are available) • the version covered here is HTTP/1.1 • runs on top of TCP on the standardized port 80
An HTTP conversation • Client: I would like to open a connection • Server: OK • Client: GET <file location> • Server: Send page or error message • Client: Display response, close connection • Server: OK • HTTP is the set of rules governing the format and content of the conversation between a Web client and server
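The conversation can be tried out on a single machine. The sketch below is only an illustration under a few assumptions: the server is a throwaway one built from Python's standard library and listening on the loopback address, the handler class HelloHandler and the function fetch are hypothetical names, and the page content is made up. The client side deliberately uses a raw socket so that each step (open connection, send GET, read reply, close) stays visible.

```python
import socket
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)  # status line "... 200 OK"
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # blank line after the headers
        self.wfile.write(body)  # "send page"

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host, port, path):
    """Client side: open a connection, send GET, read the reply, close."""
    with socket.create_connection((host, port), timeout=5) as conn:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Port 0 lets the operating system choose a free port for the test server.
server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    reply = fetch("127.0.0.1", server.server_address[1], "/index.html")
finally:
    server.shutdown()
    server.server_close()

status_line = reply.split(b"\r\n", 1)[0]
print(status_line.decode("ascii"))
print(reply.decode("iso-8859-1"))
```

Printing the whole reply shows the status line, the header lines, a blank line, and then the HTML body, in that order.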
HTTP messages • HTTP is the language that web clients and web servers use to talk to each other • HTTP is largely “under the hood,” but a basic understanding can be helpful • Each message, whether a request or a response, has three parts: • The request or the response line • A header section • The body of the message
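A minimal sketch of how the three parts could be separated, assuming the whole message is already in memory as bytes. The function name split_message is invented for this illustration, and the simplification of keeping only the last value of a repeated header is an assumption, not something the protocol requires.

```python
def split_message(raw):
    """Split a raw HTTP message into (start line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw = (
    b"GET /index.html HTTP/1.0\r\n"
    b"User-Agent: Mozilla/2.02Gold (WinNT; I)\r\n"
    b"Accept: image/gif, image/jpeg, */*\r\n"
    b"\r\n"
)
start_line, headers, body = split_message(raw)
print(start_line)
print(headers)
print(body)
```

A GET request normally has an empty body, so the third part comes back as an empty byte string here.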
The HTTP request – from a browser (e.g. Firefox, IE, Chrome) • The client side
Message from a client: A client sends, over a connection, to a server • a request line in the form of • a request method, • a URI (Uniform Resource Identifier), and • a protocol version, • possibly followed by a message containing • request modifiers, • information about the client, • and (possibly) body content.
What the client does, part I • The client sends a message to the server at a particular port (80 is the default) • The first part of the message is the request line, containing: • A method (HTTP command) such as GET or POST • A document address, and • An HTTP version number • Example: • GET /index.html HTTP/1.0
HTTP Request Packets • Sent from client to server • Consists of HTTP header • header is hidden in browser environment • contains: • content type / mime type • content length • user agent - browser issuing request • content types user agent can handle • and a URL
HTTP - methods • Methods • GET • retrieve a URL from the server • simple page request • run a CGI program • run a CGI with arguments attached to the URL • POST • preferred method for forms processing • run a CGI program • parameterized data is passed to the program on standard input (stdin) • keeps the parameters out of the URL, so it is somewhat more private (but not encrypted by itself)
HTTP Request • has the form: Request-Method SP Request-URL SP HTTP-Version <CR><LF> *((general-header | request-header | entity-header) <CR><LF>) <CR><LF> [message body] • Request-Method is: • GET – request whatever information is identified by the Request-URL • POST – request that the server accept the entity enclosed in the request • OPTIONS – request for information about communication options • PUT – request that the enclosed entity be stored under the Request-URL • DELETE – request that the server delete the resource identified by Request-URL • TRACE – invoke a remote, application-layer loopback of the request message • CONNECT – used by proxies in SSL connections • HEAD – identical to GET, but server must not return a message body in response • LINK – request that header information be associated with a document on the server • UNLINK – request to undo a LINK request
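The request form above can be turned into a small message builder. This is a sketch under assumptions: build_request is a hypothetical helper, www.example.com is a placeholder host, and adding Content-Length automatically is a convenience chosen for the example.

```python
CRLF = "\r\n"


def build_request(method, request_url, headers=None, body=b"", version="HTTP/1.1"):
    """Assemble a request: request line, header lines, blank line, optional body."""
    headers = dict(headers or {})
    if body:
        # The receiver needs the body length to know where the message ends.
        headers.setdefault("Content-Length", str(len(body)))
    # Request-Method SP Request-URL SP HTTP-Version
    lines = [f"{method} {request_url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an extra CRLF (blank line) ends the headers.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


get_request = build_request("GET", "/index.html", {"Host": "www.example.com"})
post_request = build_request(
    "POST", "/form", {"Host": "www.example.com"}, body=b"b=c&d=e"
)
print(get_request)
print(post_request)
```

The GET request ends right after the blank line, while the POST request carries its parameters in the body that follows it.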
HTTP - methods • Methods (cont.) • PUT • Used to transfer a file from the client to the server • HEAD • requests only the status line and headers for a URL • used for conditional URL handling in performance enhancement schemes • retrieve the URL only if it is not in the local cache or its date is more recent than the cached copy
What the client does, part II • The second part of a request is optional header information, such as: • What the client software is • What formats it can accept • All information is in the form Name: Value • Example: • User-Agent: Mozilla/2.02Gold (WinNT; I) • Accept: image/gif, image/jpeg, */* • A blank line ends the header
HTTP Request Headers • Follow the request line and precede the message body • headers are terminated by a blank line • Header Fields: • From - Email address of user of client program • Accept - MIME types of resources accepted by browser • Accept-Encoding - encodings accepted by browser • Accept-Language - preferred languages accepted by browser (for example: English - en, French - fr, German - de) • Accept-Charset - character sets accepted by browser
From: • In internet mail format, the requesting user • Does not have to correspond to requesting host name (might be a proxy) • should be a valid e-mail address
Accept: • List of representation schemes (content types) which will be accepted by client • <field> = Accept: <entry> * [,<entry>] • <entry> = <content type> *[;<param>] • <param> = <attr> = <float> • <attr> = q / mxs / mxb • <float> = <ANSI-C floating point> • Accept: text/html • Accept: audio/basic; q=1 • if no Accept is found, text/plain is assumed • may contain wildcards (*)
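The grammar above can be read as "comma-separated entries, each a content type followed by optional ;attr=float parameters". A rough sketch of a parser along those lines follows. The function name parse_accept is hypothetical, and it assumes every parameter value is a number (as the grammar states) and that a missing q means 1.0.

```python
def parse_accept(value):
    """Parse an Accept header value into (content type, params) pairs, best q first."""
    entries = []
    for entry in value.split(","):
        content_type, *raw_params = [part.strip() for part in entry.split(";")]
        if not content_type:
            continue
        params = {}
        for raw in raw_params:
            attr, _, number = raw.partition("=")
            # Assumption from the grammar: every parameter value is a float.
            params[attr.strip()] = float(number)
        # Assumption: a missing q means the highest preference, 1.0.
        params.setdefault("q", 1.0)
        entries.append((content_type, params))
    # sorted() is stable, so entries with equal q keep their original order.
    return sorted(entries, key=lambda item: item[1]["q"], reverse=True)


accepted = parse_accept("text/html, audio/basic; q=0.5, */*; q=0.1")
for content_type, params in accepted:
    print(content_type, params)
```

A server could walk this list from the top and return the first content type it is able to produce, treating the wildcard entry as a last resort.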
Accept-Encoding • Like Accept, but lists the acceptable encoding schemes • Ex. • Accept-Encoding: x-compress;x-zip
HTTP Request Headers (cont.) • Referer - the URL of the document that linked to this URL • Authorization - user-agent wishes to authenticate itself with a server • Host - the host the Request-URL points to • Charge-To • If-Modified-Since • Pragma • User-Agent - the browser or other client program sending the request • Cookie - information persistence between requests
Referer • For the server’s benefit, the client lists the URL of the document (or document type) from which the URL in the request was obtained. • Allows server to generate back-links, logging, tracing of bad links… • Ex. • Referer: http://www.w3.com/xxx.html