Factors influencing Web browsing • Key points • Web browsing - model • Web session - anatomy • Client side • Network • Server side
Web Browsing • Model: Browser, Internet, Server • Web session anatomy • User request • DNS lookup • HTTP request • Browser sends "GET / HTTP/1.1" • Server checks request • Server sends back HTML document or result via HTTP
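The session steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names build_get_request and fetch are made up for this example, and the host you pass in is your own choice. It shows the DNS lookup, the request being sent, and the response being read back.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given host and path."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # HTTP/1.1 requires a Host header
        "Connection: close",    # ask the server to close after responding
        "",                     # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> bytes:
    """Resolve the host, send the request, and return the raw response bytes."""
    ip_address = socket.gethostbyname(host)  # step 1: DNS lookup
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))  # step 2: HTTP request
        chunks = []
        while True:  # step 3: read the response until the server closes
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("example.com") would need a working network connection; build_get_request can be inspected on its own without one.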
DNS Lookup • In a web session, the browser first performs a forward DNS lookup: it resolves a host name (e.g. summary.net) to an IP address (e.g. 192.168.11.137) so that it can connect to the server. • A reverse DNS lookup goes the other way, resolving an IP address to a host name. DNS names are registered in the global Domain Name System. • Most web servers can be configured to do reverse DNS lookups on the IP addresses of incoming requests, but it is more efficient not to have the web server do it. Either the web server or Summary can do the lookups. Someone must do the lookups if you want the Countries and Domains reports to work. ...
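A reverse lookup of the kind described on this slide might look like the sketch below. The helper names are hypothetical; the calls used are the standard library's ipaddress and socket modules. A real lookup depends on the resolver available where the code runs, so the function returns None rather than failing when no name is found.

```python
import ipaddress
import socket


def reverse_pointer_name(ip: str) -> str:
    """Return the special DNS name that is queried for a reverse lookup."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str):
    """Return the host name for an IP address, or None if it cannot be resolved."""
    try:
        host_name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host_name
    except OSError:  # covers socket.herror and socket.gaierror
        return None
```

For the slide's example address, reverse_pointer_name("192.168.11.137") gives the name 137.11.168.192.in-addr.arpa, which is what the resolver actually asks for.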
HTTP request • Whenever your web browser fetches a file (a page, a picture, etc) from a web server, it does so using HTTP - that's "Hypertext Transfer Protocol". • HTTP is a request/response protocol, which means your computer sends a request for some file (e.g. "Get me the file 'home.html'"), and the web server sends back a response ("Here's the file", followed by the file itself). • That request which your computer sends to the web server contains all sorts of (potentially) interesting information. • In the following slide, we'll examine the HTTP request your computer just sent to this web server, see what it contains, and find out what it tells me about you.
The Raw Information • The following HTTP request was received from IP address 136.148.1.142 (port 59270) by IP address 195.60.17.142 (port 80):
GET /dumprequest.html HTTP/1.0
Via: 1.1 cache3.lsbu.ac.uk:8080 (squid/2.5.STABLE4)
X-Forwarded-For: unknown
Host: djce.org.uk
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-gb,en;q=0.5
Connection: keep-alive
Keep-Alive: 300
Cache-Control: max-age=259200
-_-_-_-_-_-_-__: ------------
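To analyse a request like this one, it first has to be split into its request line and headers. The sketch below is a simplified parser (parse_request is a name chosen for this example) that ignores details such as folded headers and repeated header names. The sample text is an abridged copy of the request above.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, uri, protocol, headers)."""
    lines = raw.strip().splitlines()
    method, uri, protocol = lines[0].split()  # e.g. "GET /page HTTP/1.0"
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, protocol, headers


RAW_REQUEST = """GET /dumprequest.html HTTP/1.0
Via: 1.1 cache3.lsbu.ac.uk:8080 (squid/2.5.STABLE4)
X-Forwarded-For: unknown
Host: djce.org.uk
Accept-Language: en-gb,en;q=0.5
Connection: keep-alive
Keep-Alive: 300
Cache-Control: max-age=259200
"""

method, uri, protocol, headers = parse_request(RAW_REQUEST)
```

Header names are case-insensitive in HTTP, so a more careful parser would normalise them before storing.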
The Analysis: Source IP address, port and proxy • Source IP address: 136.148.1.142 • Source port: 59270 • Via: 1.1 cache3.lsbu.ac.uk:8080 (squid/2.5.STABLE4) • X-Forwarded-For: unknown • In order to send the appropriate response back to your computer, the web server necessarily knows your computer's IP address, and a port number to which to send the response. Your IP address seems to be 136.148.1.142, and the port number used was 59270. • On the other hand, there could be one or more proxy servers between your computer and the web server. If the HTTP request includes the header "Via" or "X-Forwarded-For", then that's a strong indication that there is at least one proxy server somewhere along the line. • If neither of those headers is present, that could mean that no proxy servers were involved, or it could mean that they just chose not to "reveal" themselves by adding those headers. • Here, the X-Forwarded-For header suggests that there is at least one proxy server involved. The proxy server "closest" to the web server is 136.148.1.142. • Normally the X-Forwarded-For header would tell us your IP address, but in this case it hasn't.
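The reasoning on this slide can be expressed as a small check. This is a sketch only: proxy_hints is a made-up helper, and anything a client or proxy puts in these headers is a claim, not proof.

```python
def proxy_hints(headers: dict) -> dict:
    """Summarise what the Via and X-Forwarded-For headers suggest about proxies."""
    lowered = {name.lower(): value for name, value in headers.items()}
    via = lowered.get("via")
    forwarded_for = lowered.get("x-forwarded-for")

    claimed_client_ip = None
    if forwarded_for and forwarded_for.strip().lower() != "unknown":
        # The first entry is conventionally the original client.
        claimed_client_ip = forwarded_for.split(",")[0].strip()

    return {
        # Absence of both headers does not prove that no proxy was involved.
        "proxy_indicated": via is not None or forwarded_for is not None,
        "claimed_client_ip": claimed_client_ip,
    }


sample = {
    "Via": "1.1 cache3.lsbu.ac.uk:8080 (squid/2.5.STABLE4)",
    "X-Forwarded-For": "unknown",
}
result = proxy_hints(sample)
```

For the sample request, a proxy is indicated but no client address is revealed, which matches the analysis on the slide.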
Your IP address • For now we'll assume your IP address is 136.148.1.142. Let's see what we know about that address. • (Note: this section has nothing to do with HTTP in particular; it is just an example of what information can be determined from an IP address.) • IP address: 136.148.1.142 • DNS name: cache3.lsbu.ac.uk • Lots more interesting information can be learned from your IP address, for example whereabouts you are on the Internet, (roughly) what city you're in, and who your ISP is.
Destination IP address, port, host and protocol • Destination IP address: 195.60.17.142 • Destination port: 80 • Host: djce.org.uk • Protocol: HTTP/1.0 • These headers tell us which web server you were trying to contact. If that seems odd, bear in mind that many web sites can be "hosted" on a single server, so when a request is received the server needs to know which web site you were attempting to access. • The protocol used will almost always be either "HTTP/1.1" or "HTTP/1.0", and is a property of your computer's web browser and any proxies through which the request might have passed.
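The point about several sites sharing one server can be illustrated as follows. This is a sketch with invented data: the SITES table and its document-root paths are hypothetical and are not taken from any real server configuration.

```python
# Hypothetical mapping from host name to the directory holding that site.
SITES = {
    "djce.org.uk": "/srv/www/djce",
    "example.org": "/srv/www/example",
}


def document_root(headers: dict, default=None):
    """Pick the site to serve based on the request's Host header."""
    host = headers.get("Host", "")
    hostname = host.split(":")[0].lower()  # drop an optional ":port" suffix
    return SITES.get(hostname, default)
```

Both sites share one IP address in this picture; only the Host header tells the server which one the browser wanted.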
Requested URI • Requested URI:/dumprequest.html • Together with the 'Host' header and the destination port number (above), this specifies the document which should be retrieved. • Given all these values we can determine that the URL of the document which is being retrieved is: http://djce.org.uk/dumprequest.html
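Putting the Host header, port and requested URI together can be sketched like this. The function name request_url is an assumption for the example, and the scheme is taken to be plain "http" because the request arrived on port 80.

```python
def request_url(host: str, port: int, uri: str, scheme: str = "http") -> str:
    """Rebuild the full URL from the Host header, destination port and URI."""
    default_ports = {"http": 80, "https": 443}
    if ":" in host or port == default_ports.get(scheme):
        authority = host  # port already in Host, or it is the default
    else:
        authority = f"{host}:{port}"
    return f"{scheme}://{authority}{uri}"


url = request_url("djce.org.uk", 80, "/dumprequest.html")
```

With the values from the sample request this gives http://djce.org.uk/dumprequest.html, the URL shown on the slide.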
Request method and content Request method: GET Data: none • The request method is usually either "GET" or "POST". Basically if you fill in and submit a form on a web page it might generate a POST request (or it might be "GET"), whereas if you just click on a link, or activate one of your browser's "bookmarks" or "favourites", then the request method will always be "GET". • Therefore, if it's "POST", we can tell that a form was definitely submitted. The contents of the form would appear here, and there would also be some "Content-" headers describing the data. • Web browsers generate two kinds of "POST" data: either "multipart/form-data", which is used when uploading files to a web server, or the more common "application/x-www-form-urlencoded".
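As an illustration of the more common encoding, the sketch below builds the body of an "application/x-www-form-urlencoded" POST using Python's standard urllib.parse module. The form fields and the /submit path are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, as a browser might collect them from a page.
form_fields = {"name": "Fred Bloggs", "city": "London"}

# Spaces become "+" and fields are joined with "&".
body = urlencode(form_fields)

post_request = "\r\n".join([
    "POST /submit HTTP/1.1",  # "/submit" is a made-up path
    "Host: www.example.com",
    "Content-Type: application/x-www-form-urlencoded",
    f"Content-Length: {len(body.encode('ascii'))}",
    "",
    body,
])

# The server reverses the encoding to recover the fields.
decoded = parse_qs(body)
```

The "Content-" headers mentioned on the slide are the Content-Type and Content-Length lines here.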
User agent • User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7 • Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5 • Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 • Accept-Language: en-gb,en;q=0.5 • The User-Agent header describes your web browser. Typically it contains the browser name and version (e.g. Firefox 1.0.7), your operating system and version (e.g. Windows XP), and possibly additional information (such as which "service packs" you have installed). • The "Accept" headers describe what sort of things the web browser can handle, and what it would prefer to be given if there's a choice. • The "Accept" header itself describes which document types the web browser can handle, so for example we can tell whether your browser is capable of handling "image/png" graphics. • The "Accept-Charset" header describes what character sets are acceptable, so we can make some guesses as to what part of the world you might be in, and what language you might speak. For example, western European or north American users quite possibly only understand the "iso-8859-1", "us-ascii" and "utf-8" character sets, whereas "big5" would suggest that you might be Chinese. • "Accept-Encoding" describes the ability of your web browser to handle compressed transfer of documents. Nothing too interesting there, but it's another snippet of information about the browser you're using. • "Accept-Language" is more interesting though; it tells us what language(s) you prefer to receive your documents in - again, if the web server offers a choice. For example, if the header tells us that your preference is for "en-gb" followed by "en", that means you're probably an English-speaking Briton. "pt-br" on the other hand would suggest a Portuguese-speaking Brazilian.
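The "q=" values in the Accept headers express preference: an item without a q value counts as 1.0, and lower values are less preferred. A minimal sketch of reading them follows; parse_preferences is a name chosen for this example, and it skips the finer points of the HTTP grammar.

```python
def parse_preferences(header_value: str):
    """Return (item, quality) pairs from an Accept-style header, best first."""
    preferences = []
    for item in header_value.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not wanted"
        preferences.append((parts[0], quality))
    # sorted() is stable, so equal-quality items keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


languages = parse_preferences("en-gb,en;q=0.5")
types = parse_preferences(
    "text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
    "text/plain;q=0.8, image/png, */*;q=0.5"
)
```

For the sample request, "en-gb" comes out ahead of "en", and "image/png" is among the most preferred document types.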
Referring page Referer:not present • The "referer" header tells us which document referred you to us - in essence, if you followed a link to get to this page, it is the URL of the page you came from to get here. • If on the other hand you didn't follow a link - maybe you clicked on a browser "bookmark", or maybe you just typed the address of this page directly into your browser - then the "referer" will be missing. And yes, that isn't how it should be spelt. :-(
Cookies • Cookie: not present • Every time a web server provides you with a response (a page, a graphic, etc), it has the opportunity to send your browser a "cookie". These cookies are small pieces of information which your browser stores, and then sends back to that same web server whenever you subsequently request a document. • So there are two important points here: (1) each cookie is only sent back to the same web site as it came from in the first place, and (2) the "contents" of the cookie (the data it contains) can only be made up of whatever information the web server already knew anyway. For example, a web server can't just say "send me a cookie containing your e-mail address" unless that same web server had already sent you that information in the first place.
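The round trip described here can be sketched with Python's standard http.cookies module. The cookie name and value are illustrative only, and a real browser also applies domain, path and expiry rules that are left out.

```python
from http.cookies import SimpleCookie

# Server side: attach a cookie to a response.
server_cookie = SimpleCookie()
server_cookie["LastLoginName"] = "fred"
set_cookie_header = server_cookie.output(header="Set-Cookie:")

# Browser side: store what the server sent...
browser_jar = SimpleCookie()
browser_jar.load("LastLoginName=fred")

# ...and send it back, unchanged, on later requests to the same site.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)
```

Note that the browser returns only what the server gave it, which is point (2) on the slide.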
Connection control Connection: keep-alive Keep-Alive:300 • These headers are used to fine-tune the network traffic between you and the web server. They don't tell us much, except a little about the capabilities of your web browser.
Cache control • Pragma: not present • Cache-Control: max-age=259200 • If-Modified-Since: not present • These headers control caching of the document. By examining them we can detect if you used your browser's "refresh" button to force the page to reload. • For example, Mozilla (Netscape 6) sets "Cache-Control" to "max-age=0" when you use the "reload" button. MSIE 5.5 sets it to "no-cache" if you do a "hard" reload (while holding down the "control" key).
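The two examples on this slide suggest a simple heuristic, sketched below. The function name and its return labels are invented for illustration, and the rule is based only on the browser behaviour described above; other browsers may behave differently.

```python
def reload_hint(headers: dict) -> str:
    """Guess from the cache headers whether the user forced a reload."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "")
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]

    if "no-cache" in directives or "no-cache" in lowered.get("pragma", ""):
        return "hard reload"
    if "max-age=0" in directives:
        return "reload"
    return "no reload indicated"
```

For the sample request, whose Cache-Control is "max-age=259200", this reports that no reload is indicated.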
Authorisation Username: none • If you have "logged in" to a web site, your username appears here. • Note that this only applies to web sites which use proper HTTP authentication - typically, a "login" window pops up and you get three chances to enter your username and password, otherwise you see a page which says "Authentication Required" or similar. It doesn't apply to web sites where the "login" is a separate page. • It's also possible to supply the username and password in the URL you tell your browser to visit - for example, http://user:password@www.example.com/. In that case, the username would appear here too.
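As a sketch of where the username comes from, the code below assumes the "Basic" scheme of HTTP authentication, in which the browser sends "username:password" base64-encoded in an Authorization header. The slide does not say which scheme is meant, so this is one possibility; the helper names are made up for the example.

```python
import base64
from urllib.parse import urlsplit


def make_basic_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser would send (Basic scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def basic_auth_username(authorization_value: str):
    """Extract the username from a Basic Authorization value, or return None."""
    scheme, _, token = authorization_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token.strip()).decode("utf-8")
    except ValueError:  # bad base64 or bad UTF-8
        return None
    username, _, _password = decoded.partition(":")
    return username


# Credentials embedded in a URL can be read with the standard URL parser.
parts = urlsplit("http://user:password@www.example.com/")
```

Base64 is an encoding, not encryption, so anyone who sees the header can recover the password as easily as the username.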
HTTP Lookup Summary • The most interesting pieces of information contained in the request are: your IP address and/or that of your HTTP proxy; which document you requested; which version of which browser you're using; which page you came from to get here (if you followed a link); your preferred language(s); cookies. • The "odd one out" in that list is "cookies". That's because cookies only send to the web server information which it had previously sent to you (and your browser accepted). However, the problem is in knowing what a cookie means. The meaning of the cookie is only actually known to the web server. • If you can get your browser to show you your cookies, you might be able to make a good guess as to what one means - for example a cookie called "LastLoginName" with a value of "fred" probably means that when you last logged in on that site, you used the username "fred". However, a cookie called "TGIDX" with a value of "wl4o6ulhw48lw845yh68hylohw45" is meaningless to everybody except the web server, so you really have no idea what information that cookie actually holds.
Client Side • Client's hardware (CPU, memory, bus, hard disk, monitor, sound card, video card, …) • Client's software (OS, browser, plug-ins, …) • Client profiling • OS, processor, browser and version, frames, proprietary elements, programming support (JavaScript, Java, ActiveX, plug-ins), connection speed, Cascading Style Sheets, screen resolution and colour depth, XML, Flash, Shockwave, etc. • Browsers are a major concern!
Network • Bandwidth • Modems (28.8 kbit/s, 56 kbit/s), ISDN (128 kbit/s), ADSL (516kb,1M,2M,8M,22M?), leased line T1 (1.544 Mbit/s), T3 (45 Mbit/s) • Latency (virtual distance), hops • Network usage, simultaneous connections • TCP/IP, packets
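To get a feel for what these bandwidth figures mean for browsing, a rough lower bound on download time is latency plus size divided by bandwidth. The sketch below uses the link speeds listed above; the 100,000-byte page size is an arbitrary example, and the estimate ignores protocol overhead, congestion and TCP behaviour.

```python
# Nominal link speeds in bits per second, taken from the slide.
LINK_SPEEDS = {
    "28.8k modem": 28_800,
    "56k modem": 56_000,
    "ISDN": 128_000,
    "T1": 1_544_000,
    "T3": 45_000_000,
}


def transfer_time_seconds(size_bytes: int, bandwidth_bps: float,
                          latency_seconds: float = 0.0) -> float:
    """Estimate the time to move size_bytes over a link, ignoring overhead."""
    return latency_seconds + (size_bytes * 8) / bandwidth_bps


page_size = 100_000  # bytes; an assumed page size for illustration
estimates = {
    name: transfer_time_seconds(page_size, speed)
    for name, speed in LINK_SPEEDS.items()
}
```

On these assumptions the page takes roughly 14 seconds over a 56k modem and about half a second over a T1 line, before latency is added.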
Server side • CPU, memory, hard disk (SCSI, RAID), bus, OS, database
References • http://djce.org.uk/dumprequest.html • http://www.trl.ibm.com/projects/webui/profil/profil_e.htm • http://computer.howstuffworks.com/question372.htm • http://www.learnthenet.com/english/glossary/t_1line.htm • http://www.learnthenet.com/english/glossary/t_3line.htm