OAW – LECTURE 3
PRACTICALITIES • Newsgroup • New photos ?
SUMMARY, LECTURE 2 • Users, Visits, Pageviews • Reach, Acquisition rate, Conversion Rate, Retention Rate, Loyalty • Abandonment, Attrition, Churn • Recency, Frequency, Monetary value, Duration, Yield • Acquisition cost, Conversion cost, Net Yield, Connect rate
WEB SERVERS • A Web server is a program that, using the client/server model and the World Wide Web's Hypertext Transfer Protocol (HTTP), serves the files that form Web pages to Web users (whose computers contain HTTP clients that forward their requests). Every computer on the Internet that contains a Web site must have a Web server program. Two leading Web servers are Apache, the most widely-installed Web server, and Microsoft's Internet Information Server (IIS). Other Web servers include Novell's Web Server for users of its NetWare operating system and IBM's family of Lotus Domino servers, primarily for IBM's OS/390 and AS/400 customers. whatis.com, Feb. 2002
WEB SERVERS Netcraft.com, Feb. 2002
THE WEB SERVER LOG • An access log is a list of all the requests for individual files that people have requested from a Web site. These files will include the HTML files and their imbedded graphic images and any other associated files that get transmitted. The access log (sometimes referred to as the "raw data") can be analyzed and summarized by another program. In general, an access log can be analyzed to tell you: • The number of visitors (unique first-time requests) to a home page • The origin of the visitors in terms of their associated server's domain name (for example, visitors from .edu, .com, and .gov sites and from the online services) • How many requests for each page at the site, which can be presented with the pages with most requests listed first • Usage patterns in terms of time of day, day of week, and seasonally whatis.com, Feb. 2002
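The "requests for each page, most requested first" summary can be sketched in a few lines of Python. This is only an illustration, assuming each log line carries the request in double quotes as in the Common Log Format; the function name `top_pages` and the choice of sample lines are not from the lecture.

```python
import re
from collections import Counter

# Matches the quoted request field, e.g. "GET /research/bed/ HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def top_pages(lines, n=3):
    """Return the n most requested paths as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)  # the request is the first quoted field
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Sample lines in the style of an Apache access log (host field omitted).
sample = [
    '- - [22/Oct/2001:03:22:03 +0200] "GET / HTTP/1.1" 200 77 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    '- - [22/Oct/2001:03:22:07 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    '- - [22/Oct/2001:03:42:51 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"',
]
print(top_pages(sample))  # [('/Internet', 2), ('/', 1)]
```

A real report would usually filter out images and robot traffic before counting, since every embedded file produces its own log line.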
THE WEB SERVER LOG • Boundaries for any type of log analysis • Common Log Format – Extended CLF.
AN EXAMPLE, IT-C.DK (Oct 2001)
- - [22/Oct/2001:02:22:24 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [22/Oct/2001:02:22:30 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [22/Oct/2001:02:27:57 +0200] "GET /research/bed/ HTTP/1.1" 200 9079 "http://google.yahoo.com/bin/query?p=Boolean+expression&hc=0&hs=0" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
- - [22/Oct/2001:02:27:58 +0200] "GET /research/bed/icons/Book.gif HTTP/1.1" 200 227 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
- - [22/Oct/2001:02:27:58 +0200] "GET /research/bed/icons/Tools.gif HTTP/1.1" 200 251 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
- - [22/Oct/2001:02:27:58 +0200] "GET /people/hra/hoved_logo4.gif HTTP/1.1" 200 3643 "http://www.itu.dk/research/bed/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
- - [22/Oct/2001:02:43:58 +0200] "HEAD /people/kfl/fltk-1.0.4-linux-intel.rpm HTTP/1.0" 200 0 "-" "Slurp.so/1.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"
- - [22/Oct/2001:03:03:08 +0200] "HEAD /courses/W2/F2001/ HTTP/1.0" 200 0 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
- - [22/Oct/2001:03:03:10 +0200] "GET /courses/W2/F2001/ HTTP/1.0" 200 39357 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
- - [22/Oct/2001:03:04:57 +0200] "HEAD /people/birkedal/papers/index.html HTTP/1.0" 200 0 "-" "-"
- - [22/Oct/2001:03:11:40 +0200] "HEAD /people/birkedal/realizability/index.html HTTP/1.0" 200 0 "-" "Mozilla/3.0 (compatible)"
- - [22/Oct/2001:03:22:03 +0200] "GET / HTTP/1.1" 200 77 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [22/Oct/2001:03:22:07 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
- - [22/Oct/2001:03:31:12 +0200] "GET /people/jm/ HTTP/1.0" 200 1539 "-" "ArchitextSpider"
- - [22/Oct/2001:03:39:14 +0200] "GET /research/ddd/ HTTP/1.0" 200 2342 "-" "ArchitextSpider"
- - [22/Oct/2001:03:42:35 +0200] "GET /connection HTTP/1.1" 404 272 "http://www1.umn.edu/twincities/directory/indexi.html" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
- - [22/Oct/2001:03:42:50 +0200] "GET / HTTP/1.1" 200 77 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
- - [22/Oct/2001:03:42:51 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
- - [22/Oct/2001:03:43:10 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
- - [22/Oct/2001:03:43:13 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; AT&T CSM6.0)"
- - [22/Oct/2001:03:45:24 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
- - [22/Oct/2001:03:45:26 +0200] "GET /Internet HTTP/1.1" 301 300 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
- - [22/Oct/2001:03:46:11 +0200] "POST /main/cgi-bin/people.cgi HTTP/1.1" 200 2206 "http://www.it-c.dk/English/find_person/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
- - [22/Oct/2001:03:47:34 +0200] "GET /courses HTTP/1.1" 301 299 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
- - [22/Oct/2001:03:47:57 +0200] "GET /courses/GP/F2000/index.html HTTP/1.0" 200 4393 "-" "Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)"
- - [22/Oct/2001:04:00:47 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
- - [22/Oct/2001:04:01:05 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
- - [22/Oct/2001:04:02:00 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
- - [22/Oct/2001:04:02:00 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
- - [22/Oct/2001:04:02:49 +0200] "GET /sysadm/software/lprng/printcap HTTP/1.0" 200 2012 "-" "Wget/1.6"
- - [22/Oct/2001:04:08:21 +0200] "GET /courses/GP/F2000/Eksempler/JavaSoftwareSolutions/chap07/Doodle.html HTTP/1.0" 200 255 "-" "Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)"
- - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte HTTP/1.1" 301 298 "http://www.it-c.dk/Internet/itu/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 "http://www.it-c.dk/people/tofte/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [22/Oct/2001:04:12:29 +0200] "GET /people/tofte/pics/spacer22.GIF HTTP/1.1" 404 286 "http://www.itu.dk/people/tofte/leftorange.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [22/Oct/2001:04:12:30 +0200] "GET /people/tofte/Tofte2.jpg HTTP/1.1" 200 10618 "http://www.it-c.dk/people/tofte/madscontents.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
- - [22/Oct/2001:04:15:28 +0200] "GET /courses/GP/F2000/Eksempler/JavaSoftwareSolutions/chap11/MirroredPictures.html HTTP/1.0" 200 228 "-" "Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)"
- - [22/Oct/2001:04:23:33 +0200] "GET /courses/GP/F2000/Eksempler/Tekstfiler/places.txt HTTP/1.0" 200 90 "-" "Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)"
- - [22/Oct/2001:04:24:14 +0200] "GET /people/hra/notes-index.html HTTP/1.0" 200 1670 "-" "ArchitextSpider"
- - [22/Oct/2001:04:29:43 +0200] "GET /courses/GP/F2000/hold.html HTTP/1.0" 200 5212 "-" "Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html)"
- - [22/Oct/2001:04:33:22 +0200] "GET /courses/W2/ssh.html HTTP/1.1" 200 2602 "-" "-"
- - [22/Oct/2001:04:36:34 +0200] "GET /~haas/GC/c-tut.html HTTP/1.0" 200 77 "http://www.student.dtu.dk/~c971714/GC/c-tut.html" "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
- - [22/Oct/2001:04:36:37 +0200] "GET /~haas/GC/c-tut.php HTTP/1.0" 200 24819 "http://www.itu.dk/~haas/GC/c-tut.html" "Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)"
- - [22/Oct/2001:04:43:30 +0200] "GET /people/slauesen/ HTTP/1.0" 200 11173 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
- - [22/Oct/2001:04:47:21 +0200] "GET /main/projektboers.html HTTP/1.0" 200 75604 "-" "Mozilla/2.0 (compatible; Ask Jeeves)"
- - [22/Oct/2001:04:48:13 +0200] "POST /main/cgi-bin/people.cgi HTTP/1.0" 200 928 "http://www.it-c.dk/English/find_person/" "Mozilla/4.0 (compatible; MSIE 5.01; AOL 6.0; Windows 98)"
HOST • Fully qualified domain name of the client, or its IP address if the name is unavailable • The address to which the server’s response will be sent • Reverse address lookup on the fly is possible, but in most cases it is performed while post-processing the log instead • Important issues: dial-up connections, proxies
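Reverse lookup during post-processing might look roughly like the sketch below. It uses Python's standard `socket.gethostbyaddr`; the function name `resolve_hosts`, the stub `fake_lookup`, and the addresses and host name are assumptions made up for the example so that it runs without network access.

```python
import socket

def resolve_hosts(addresses, lookup=socket.gethostbyaddr):
    """Map each IP address to a host name, resolving every address only once."""
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # already resolved earlier in the log
        try:
            cache[address] = lookup(address)[0]  # first item is the primary host name
        except OSError:
            cache[address] = address  # keep the raw IP when no name is available
    return cache

def fake_lookup(address):
    # Stand-in for DNS so the example runs offline; names and addresses are invented.
    table = {"192.0.2.10": "client.example.edu"}
    if address not in table:
        raise OSError("no reverse entry")
    return (table[address], [], [address])

print(resolve_hosts(["192.0.2.10", "192.0.2.99", "192.0.2.10"], lookup=fake_lookup))
```

Caching matters here because the same address typically appears on many log lines, and each real lookup costs a network round trip. Note that with dial-up pools and proxies the resolved name identifies a machine, not a person.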
IDENT • Identifier supplied by client applications that support identd (identification daemon) • Used with mail, FTP, IRC, etc.; rarely with HTTP • Also referred to as RFC 931
AUTHUSER • The authenticated user name (if user authentication is required for that file)
TIME • Usually the time when the web server completed responding to the HTTP request • Format: DD/Mon/YYYY:HH:MM:SS +HHMM (three-letter month abbreviation, followed by the offset from UTC)
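As a small illustration, the timestamp field can be turned into a timezone-aware value with Python's standard `datetime.strptime`. The helper name `parse_log_time` is an assumption for this sketch; the timestamp is one from the example log.

```python
from datetime import datetime, timezone

def parse_log_time(field):
    """Parse a log timestamp such as '[22/Oct/2001:04:12:28 +0200]'."""
    # %b expects English month abbreviations, which holds under the default C locale.
    return datetime.strptime(field.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

stamp = parse_log_time("[22/Oct/2001:04:12:28 +0200]")
print(stamp.isoformat())                           # 2001-10-22T04:12:28+02:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2001-10-22T02:12:28+00:00
```

Converting to UTC before comparing timestamps avoids confusion when logs from servers in different time zones are merged.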
REQUEST • The actual request from the user client. Typically it looks like the following: "GET /people/tofte/leftorange.htm HTTP/1.1" • Different types of requests: GET, POST, HEAD • Protocol version included (HTTP/1.1)
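The three parts of the request field (method, requested path, protocol version) can be separated as in this minimal sketch. The function name `split_request` is hypothetical, and returning `None` for malformed requests is just one possible design choice.

```python
def split_request(request):
    """Split a logged request into (method, path, protocol), or None if malformed."""
    parts = request.strip('"').split()
    if len(parts) != 3:
        return None  # broken or truncated requests do occur in real logs
    return tuple(parts)

print(split_request('"GET /people/tofte/leftorange.htm HTTP/1.1"'))
# ('GET', '/people/tofte/leftorange.htm', 'HTTP/1.1')
```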
STATUS • A three-digit status code, which the server returns to the browser • Five classes of codes: Informational (100 series), Success (200 series), Redirection (300 series), Client Error (400 series), Server Error (500 series) • Examples: 200 OK, 302 Moved Temporarily (redirect), 401 Unauthorized, 403 Forbidden, 404 Not Found
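Since the class of a status code is given by its first digit, grouping log entries by class takes only integer division. The sketch below is illustrative; the names `STATUS_CLASSES` and `status_class` are not from the lecture, and the list of codes is a small hand-picked sample.

```python
from collections import Counter

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    return STATUS_CLASSES.get(int(code) // 100, "Unknown")

# A few status codes of the kind seen in an access log.
codes = [304, 301, 200, 200, 404, 200]
summary = Counter(status_class(code) for code in codes)
print(summary["Success"], summary["Redirection"], summary["Client Error"])  # 3 2 1
```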
BYTES • Number of bytes returned by the server to the client, not counting the response headers • 0 (or "-") when no content is returned, e.g. for HEAD requests and 304 responses
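Summing the bytes column gives a rough measure of a site's traffic. The sketch below assumes the byte counts have already been extracted as strings and that "-" marks a response without content; the function name `total_bytes` is made up for the example.

```python
def total_bytes(byte_fields):
    """Sum the bytes column of a log, skipping '-' placeholders."""
    return sum(int(field) for field in byte_fields if field != "-")

# Byte counts of the kind found in an access log, plus one '-' placeholder.
fields = ["9079", "227", "251", "3643", "0", "-"]
print(total_bytes(fields))  # 13200
```

Because response headers are typically not included in this column, the total somewhat underestimates the data actually sent over the wire.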
REFERRER • Indicates the page where the visitor was located when making the request • Important for path-analysis • Can be used for referring schemes and for measuring banner effects etc. • RFC2068 (HTTP/1.1): • Note: Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent.
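For referrer analysis, the referring host and any search terms can be pulled out of the field with Python's standard `urllib.parse`. This is a sketch only: the function name `referrer_info` is hypothetical, and the query parameter name `p` is an assumption that fits the search-engine referrer in the example log; other search engines use other parameter names.

```python
from urllib.parse import urlsplit, parse_qs

def referrer_info(referrer, query_param="p"):
    """Return (referring host, search terms or None), or None if no referrer was sent."""
    if referrer in ("", "-"):
        return None  # the client sent no Referer header
    parts = urlsplit(referrer)
    terms = parse_qs(parts.query).get(query_param, [None])[0]
    return parts.netloc, terms

print(referrer_info("http://google.yahoo.com/bin/query?p=Boolean+expression&hc=0&hs=0"))
# ('google.yahoo.com', 'Boolean expression')
print(referrer_info("http://www.itu.dk/research/bed/"))
# ('www.itu.dk', None)
```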
USERAGENT • Browser name/version (operating system) • "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" • "Mozilla/3.0 (Macintosh; I; PPC)" • Mozilla: Mozilla was Netscape Communications' nickname for Navigator, its Web browser, and, more recently, the name of an open source public collaboration aimed at making improvements to Navigator.
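A very crude classification of user-agent strings could look like the sketch below. The robot hints are simply taken from agents that appear in the example log and are far from exhaustive; the names `describe_agent`, `ROBOT_HINTS` and `MSIE_RE` are assumptions for illustration.

```python
import re

MSIE_RE = re.compile(r"MSIE (\d+(?:\.\d+)?)")
# Substrings of automated clients seen in the example log; not a complete list.
ROBOT_HINTS = ("spider", "slurp", "openbot", "ask jeeves", "wget")

def describe_agent(agent):
    """Label a user-agent string as 'automated', 'MSIE <version>' or 'other'."""
    lowered = agent.lower()
    if any(hint in lowered for hint in ROBOT_HINTS):
        return "automated"
    match = MSIE_RE.search(agent)
    if match:
        return "MSIE " + match.group(1)
    return "other"

print(describe_agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))  # MSIE 5.5
print(describe_agent("Mozilla/2.0 (compatible; Ask Jeeves)"))                # automated
print(describe_agent("Mozilla/3.0 (Macintosh; I; PPC)"))                     # other
```

The robot check runs first because many robots also announce themselves as "Mozilla ... compatible", so the browser pattern alone would misclassify them.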
USERAGENT - STATISTICS • Link1 • Link2 • Opasia.dk
MORE OPTIONS • Filename • Time-to-serve • IP address • Server port • URL-requested • Cookie
THE QUIZ • The referrer indicates where in the world the user is located. • Apache is an open source web server • Web server failures return a 30x status code • Apache can be installed on a Windows platform • It is possible to calculate a website’s traffic (e.g. GB per month) from the web server log • One IP number is by definition one user • A line in a web server log file is at most 80 characters long • Microsoft has increased its market share during the last 8 months from approx. 20% to 30% of all web servers in the world • User agent information is part of the Common Logfile Format
AN EXAMPLE - [22/Oct/2001:04:12:28 +0200] "GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 "http://www.it-c.dk/people/tofte/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
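A whole line in the extended (combined) format can be split into the nine fields discussed above with one regular expression. The sketch below is a simplification that assumes no escaped double quotes inside the quoted fields. The host `192.0.2.10` is a placeholder from the documentation address range, since the slides do not show the host field; the names `COMBINED_RE` and `parse_line` are likewise illustrative.

```python
import re

COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<useragent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the nine log fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

# The host below is a placeholder; the remaining fields are from the example line.
line = ('192.0.2.10 - - [22/Oct/2001:04:12:28 +0200] '
        '"GET /people/tofte/leftorange.htm HTTP/1.1" 200 1279 '
        '"http://www.it-c.dk/people/tofte/" '
        '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"')

record = parse_line(line)
for name, value in record.items():
    print(f"{name:10} {value}")
```

All values come back as strings; status and bytes would normally be converted to integers before any counting or summing.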
MORE INFORMATION • Apache HTTP Server Documentation, Log Files • http://httpd.apache.org/docs/logs.html • Microsoft IIS Log Format • http://www.microsoft.com/windows2000/en/server/iis/htm/core/iiabtlg.htm#MicrosoftIISLogFormat • HTTP/1.1 Documentation • http://www.w3.org/Protocols/rfc2068/rfc2068
FURTHER ISSUES • Proxies • Firewalls