
2: Web Server Log



Presentation Transcript


  1. 2: Web Server Log
     An extract from the KDnuggets web log:
     152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
     252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET / HTTP/1.1" 200 12453 "http://www.yisou.com/search?p=data+mining&source=toolbar_yassist_button&pid=400740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
     252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
     252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /images/KDnuggets_logo.gif HTTP/1.1" 200 784 "http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"

  2. Web Server Log – An Example
     [Diagram: a request for http://www.kdnuggets.com/jobs/ goes to the KDnuggets.com server, which returns the page contents and records the request in the web server log]
     Web server log:
     152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET … HTTP/1.1" 200
     152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /gps.html HTTP/1.1" 200
     152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200
     …

  3. Web (Server) Log – In Depth
     A sample web log line:
     152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
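As a minimal sketch, the sample line above can be split into its fields with a regular expression. The pattern and the group names (`ip`, `name`, `login`, `timestamp`, `request`, `status`, `size`, `referrer`, `user_agent`) are illustrative assumptions rather than anything defined in the slides, and the pattern does not attempt to handle escaped quotes inside a quoted field.

```python
import re

# Hypothetical pattern for the log layout shown in the sample line.
# The last two quoted fields are optional so that older logs without
# referrer and user agent would also match.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<name>\S+) (?P<login>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of field name -> string, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


SAMPLE = (
    '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 '
    '"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"'
)

fields = parse_log_line(SAMPLE)
```

All values come back as strings; converting `status` and `size` to numbers is left to the caller because `size` may be "-".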

  4. Web log field: IP
     152.152.98.11
     IP address; can be converted to a host name, such as xyz.example.com

  5. Web log fields: Name, Login
     - -
     • Name: the name of the remote user (usually omitted and replaced by a dash “-”)
     • Login: the login of the remote user (also usually omitted and replaced by a dash “-”)

  6. Web log field: Date/Time/TZ
     [16/Nov/2005:16:32:50 -0500]
     • Date: DD/Mon/YYYY
     • Time: HH:MM:SS
     • Time zone: (+|-)HHMM offset from GMT, usually a whole number of hours, i.e. (+|-)HH00; -0500 is US EST
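The date/time field maps directly onto a `strptime` format string. The sketch below is one way to parse it with the Python standard library; the function name `parse_log_time` is a hypothetical helper, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, timezone

# DD/Mon/YYYY:HH:MM:SS followed by a (+|-)HHMM offset from GMT.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_log_time(field):
    """Parse a field such as '[16/Nov/2005:16:32:50 -0500]' into an aware datetime."""
    return datetime.strptime(field.strip("[]"), LOG_TIME_FORMAT)


local_time = parse_log_time("[16/Nov/2005:16:32:50 -0500]")
# Convert to GMT/UTC so that logs from servers in different zones can be compared.
utc_time = local_time.astimezone(timezone.utc)
```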

  7. Web log field: Request
     "GET /jobs/ HTTP/1.1"
     • Method: GET, HEAD, POST, OPTIONS, …
     • URL: relative to the domain
     • HTTP protocol: e.g. HTTP/1.0 or HTTP/1.1
     Note: the request is recorded as sent, so it may contain errors, hacks, and any strange thing you can imagine
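Because the request is recorded as sent, code that splits it should expect malformed input. A minimal sketch, assuming a well-formed request has exactly three space-separated parts (the helper name `split_request` is hypothetical):

```python
def split_request(request):
    """Split 'METHOD URL PROTOCOL' into a 3-tuple, or return None if malformed."""
    parts = request.split()
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        return tuple(parts)
    # Anything else (empty request, missing protocol, spaces in the URL, ...)
    # is reported as malformed rather than guessed at.
    return None


method, url, protocol = split_request("GET /jobs/ HTTP/1.1")
```

Returning `None` for odd requests keeps them countable as a separate category instead of silently dropping them.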

  8. Web log field: Status code
     200
     Status (response) code. The most important ones are:
     • 200 – OK (most frequent, hopefully)
     • 206 – partial content
     • 301 – permanently redirected (e.g. access to /courses is redirected to /courses/)
     • 302 – temporarily redirected
     • 304 – not modified
     • 404 – not found
     • …
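For breakdowns it is often enough to group codes by their first digit, which is how HTTP organises them. A small sketch (the function name `status_class` and the label strings are illustrative choices):

```python
# First digit of an HTTP status code -> class of response.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code (int or numeric string) to its class."""
    return STATUS_CLASSES.get(int(code) // 100, "other")
```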

  9. Web log field: Object size
     15140
     Size of the object returned to the client, in bytes. Can also be “-” if the status code is 304 (not modified).

  10. Web log field: Referrer
      http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N
      URL the visitor came from (here it was a Google query for “salary for data mining”, 2nd page of results, starting from result 10).
      The referrer can also be a static page, internal (same domain) or external (different domain), or “-” in case of a direct request (e.g. type-in, bookmark).
      Referrer analysis is very valuable.
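A minimal sketch of referrer analysis with the standard library: recover the search phrase from a search-engine referrer and classify the referrer as direct, internal, or external. The parameter names in `QUERY_PARAMS` are assumptions taken from the two search referrers in the log extract (`q` for Google, `p` for Yisou), and the helper names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: query-string parameters that carry the search phrase.
QUERY_PARAMS = ("q", "p")


def search_terms(referrer):
    """Return the decoded search phrase from a referrer URL, or None."""
    if referrer in ("", "-"):
        return None
    params = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return None


def referrer_kind(referrer, site_host="www.kdnuggets.com"):
    """Classify a referrer as 'direct', 'internal', or 'external'."""
    if referrer in ("", "-"):
        return "direct"
    host = urlsplit(referrer).hostname
    return "internal" if host == site_host else "external"


GOOGLE = "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"
terms = search_terms(GOOGLE)
```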

  11. Web log field: User agent
      "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
      User agent (browser): http://en.wikipedia.org/wiki/User_agent
      Almost all browser user agents start with Mozilla, for historic reasons.
      In many cases there is additional information:
      • Browser type and version: MSIE 6.0 is Internet Explorer 6.0
      • OS: Windows NT 5.1 (XP SP2) with .NET Framework 1.1 installed
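The additional information sits in the parenthesised, semicolon-separated part of the string. A rough sketch that pulls those tokens out (the helper name `ua_tokens` is hypothetical; real user-agent strings vary widely, so this is a starting point, not a full parser):

```python
import re


def ua_tokens(user_agent):
    """Return the semicolon-separated tokens from the first parenthesised group."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if not match:
        return []
    return [token.strip() for token in match.group(1).split(";")]


tokens = ua_tokens(
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
)
```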

  12. Web Usage Mining
      • Basic
        • Totals
      • Simple
        • Request level breakdowns
      • Advanced
        • Visit level analysis
        • Target pages; Conversion analysis

  13. Web Log Analysis Programs
      • Free
        • Analog, awstats, webalizer
        • Google Analytics
      • Commercial
        • WebTrends, WebSideStory, …
      www.kdnuggets.com/software/web-mining.html

  14. Web Usage Mining - Basic
      • Totals for each component
        • Hits – total number of requests
        • Files – number of GETs
        • Pages – number of HTML pages
        • Sites – unique IP addresses
        • Response codes
        • Kbytes – total Kbytes transferred
        • User Agents
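Several of these totals are simple counts over parsed records. The sketch below computes hits, sites, Kbytes, and a response-code breakdown for the four sample lines from the log extract; the record layout, the function name `basic_totals`, and the choice of 1 Kbyte = 1024 bytes are assumptions made for illustration.

```python
from collections import Counter

# Fields copied by hand from the four sample lines in the log extract.
records = [
    {"ip": "152.152.98.11", "status": "200", "size": "15140"},
    {"ip": "252.113.176.247", "status": "200", "size": "12453"},
    {"ip": "252.113.176.247", "status": "200", "size": "145"},
    {"ip": "252.113.176.247", "status": "200", "size": "784"},
]


def basic_totals(records):
    """Compute basic totals; a size of '-' counts as zero bytes."""
    total_bytes = sum(int(r["size"]) for r in records if r["size"] != "-")
    return {
        "hits": len(records),                      # total number of requests
        "sites": len({r["ip"] for r in records}),  # unique IP addresses
        "kbytes": total_bytes / 1024,              # assumption: 1 Kbyte = 1024 bytes
        "response_codes": Counter(r["status"] for r in records),
    }


totals = basic_totals(records)
```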

  15. Example: KDnuggets.com Nov 2005 totals
      [Table: Monthly Statistics (from webalizer)]
      Q: What is the meaning of the difference between Hits and Files?

  16. Example: KDnuggets.com Nov 2005 totals, 2
      [Table: Monthly stats for Files by Status Code]
      Answer: the difference between Hits and Files is the number of requests with a status code other than 200.

  17. Difference between Files and Pages
      • Q: What is the meaning of the difference between Files and Pages?

  18. Difference between Files and Pages
      • A: the difference between Files and Pages is the number of non-HTML files (e.g. images, JavaScript, etc.)
      • In the November 2005 KDnuggets log, HTML files were about 1/3 of all requests
      • However, this data does not separate out bot requests (which are heavily weighted towards HTML pages)
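The two differences can be made concrete with a small counter. This sketch assumes that a File is a request with status 200 (following the answer about Hits and Files) and that a Page is a File whose path ends in "/", ".html", or ".htm"; that suffix list is an assumption, since real analyzers make it configurable. The first four paths come from the log extract, `/courses` with 301 from the status-code slide, and the 304 entry is a made-up example.

```python
from collections import Counter

# Assumption: which URL paths count as HTML pages.
PAGE_SUFFIXES = ("/", ".html", ".htm")


def count_hits_files_pages(requests):
    """requests: iterable of (url_path, status_code) pairs."""
    counts = Counter()
    for path, status in requests:
        counts["hits"] += 1
        if status == 200:
            counts["files"] += 1
            # Ignore any query string when checking the suffix.
            if path.split("?", 1)[0].endswith(PAGE_SUFFIXES):
                counts["pages"] += 1
    return counts


sample = [
    ("/", 200),
    ("/kdr.css", 200),
    ("/images/KDnuggets_logo.gif", 200),
    ("/jobs/", 200),
    ("/courses", 301),    # redirected, so a hit but not a file
    ("/gps.html", 304),   # hypothetical not-modified response
]
counts = count_hits_files_pages(sample)
```

Here Hits minus Files is the two non-200 requests, and Files minus Pages is the stylesheet and the image.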

  19. Notes: web log formats
      • We used a web log in the standard Apache format (the "combined" log format)
      • Some old logs use a different format without the last 2 fields (referrer and user agent), known as the "common" log format, but these are now rare.
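One quick way to tell the two formats apart is to check whether a line ends with two quoted fields. This is a heuristic sketch only; the pattern and the helper name `has_referrer_and_agent` are assumptions, and lines with escaped quotes may be misclassified.

```python
import re

# Two double-quoted fields at the very end of the line: referrer and user agent.
TRAILING_TWO_QUOTED = re.compile(r'"[^"]*" "[^"]*"\s*$')


def has_referrer_and_agent(line):
    """True if the line appears to carry the referrer and user-agent fields."""
    return bool(TRAILING_TWO_QUOTED.search(line))


with_both = (
    '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145 '
    '"http://www.kdnuggets.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"'
)
# The same request as it would look without the last 2 fields.
without = '252.113.176.247 - - [16/Feb/2006:00:06:00 -0500] "GET /kdr.css HTTP/1.1" 200 145'
```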
