Lies, Damn lies and Web Statistics IWMW 2005: Whose web is it anyway? Dr. Mike Lowndes, Interactive Media Manager, Natural History Museum, London. The Museum houses 350 permanent scientific staff, plus postgraduate students, and is one of the largest UK research institutes in the natural sciences.
Contents • Why bother? • Issues with web logs • Issues with analytic tools • Browser tracking • Comparison between approaches • Known issues with browser tracking • Nedstat input and findings from Newcastle University
Why bother? • Web log analysis is currently the main method used to quantify web site usage for reporting. • Results are used by the government as performance indicators for institutional websites. • Not accurate or meaningful most of the time • No good for absolute measurement of usage. Can be used for: • Trend analysis • Content preferences • ROI estimation • Checking and fixing your site • Understanding users' behaviour • Testing assumed pathways
Issues with server logs • Dynamic IP • Many users sharing the same IP number over time. • The same user assigned many IP numbers over time. • Proxies • Several or many users behind one IP number. • Caches (can be ‘in’ proxies) • Commonly requested files are cached closer to the users. • Caches can form the top 20-50 hosts accessing sites. • Robots and spiders • Few visits but lots of hits. • Analytic packages cannot keep up to date with all of them for exclusion. • Syndication • RSS feeds generate huge logs, but are not ‘read’ by humans initially. • Click-through configuration. • Reporting by analysis tools • Often weekly or monthly reports: real-time reporting is very labour/server intensive. • Reports are often complex and techy.
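To make the robot problem concrete, here is a minimal sketch of the kind of user-agent exclusion an analytics package performs. The log lines, the regular expression, and the signature list are illustrative assumptions; real exclusion lists run to thousands of entries and are still never complete, which is why robot hits leak into log-based figures.

```python
import re

# Illustrative robot/spider signatures; real packages maintain far longer
# (and still incomplete) exclusion lists.
ROBOT_SIGNATURES = ("bot", "crawler", "spider", "slurp")

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line; captures the user-agent.
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

def is_robot(line: str) -> bool:
    """Return True if the line's user-agent matches a known robot signature."""
    m = LOG_LINE.search(line)
    if not m:
        return False
    agent = m.group("agent").lower()
    return any(sig in agent for sig in ROBOT_SIGNATURES)

hits = [
    '1.2.3.4 - - [01/Jun/2005:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '5.6.7.8 - - [01/Jun/2005:10:00:01 +0100] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
human_hits = [h for h in hits if not is_robot(h)]
print(len(human_hits))  # 1
```

Any crawler whose user-agent is not on the list slips through, inflating hit and visit counts; this is the slide's point that packages "cannot keep up to date with all of them".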
Issues with log analysis tools • WebTrends vs Summary.net • 1. Natural History Museum • Summary SP (summary.net) Version 4.2.1, unregistered demo, default configuration • 2. UKOLN (Bath) • WebTrends (www.webtrends.com) Version 5, default configuration • Both tools were applied to the same log file • Default configurations – not removing robots • Note: the WebTrends documentation is not clear on this point
Measurement discrepancies (WebTrends relative to Summary SP) • Connections (hits): +0.67% • Page views (page hits): +5.00% • Visits (user sessions): +0.07% • Failed hits: +0.30% • Average visit duration: -30.0% (+250%) • Browsers: IE 75% (Summary SP) vs 86% (WebTrends); Netscape-compatible 2% vs 4% • Referrers, top-level domains: Summary SP – US, UK, AUS, NETHER, CAN, JAP; WebTrends – US, UK, CAN, NETHER, AUS, JAP
Comparison between tools • Not a single measurement was identical. • Most measurements were within 5%. • Visit duration measurements differed widely, and can depend on configuration; possibly a bug in WebTrends version 5. • Page view measurements were quite different. Results are broadly similar, but direct comparisons, especially of page views, are not really justified.
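One reason visit figures can never agree exactly: a "visit" is usually defined as hits from one host with no gap longer than a timeout (commonly 30 minutes), and that timeout is a configurable assumption rather than a standard. A minimal sketch of the idea, with invented timestamps:

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout_minutes=30):
    """Count sessions in a chronologically sorted list of hit times
    for a single host: a new visit starts whenever the gap since the
    previous hit exceeds the timeout."""
    timeout = timedelta(minutes=timeout_minutes)
    visits = 0
    last = None
    for t in timestamps:
        if last is None or t - last > timeout:
            visits += 1
        last = t
    return visits

hits = [datetime(2005, 6, 1, 10, 0),
        datetime(2005, 6, 1, 10, 20),
        datetime(2005, 6, 1, 11, 0)]

print(count_visits(hits, 30))  # 2 – the 40-minute gap splits the session
print(count_visits(hits, 60))  # 1 – a more generous timeout merges them
```

The same log therefore yields different visit counts under different tool configurations, which is consistent with the discrepancies measured above.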
Browser tracking • Do they have fewer inaccuracies and distortions? • Is it easier on the web team? • Is it affordable? • Does it give us more information / better information?
Browser tracking • Requires code to be added to pages • Uses an image, sourced from the tracking website; also uses JavaScript and cookies for gathering extended and repeat-visit information • Usually hosted services • Provide near real-time tracking • Few of the issues distorting logs affect these measurements (according to the blurb) • Main players: Nedstat, Nielsen/NetRatings, WebSideStory
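The mechanism behind the "image sourced from the tracking website" can be sketched as follows. The page embeds something like `<img src="https://tracker.example/hit.gif?page=/visiting">` (a hypothetical URL), and each image request is logged by the host as a page view; the handler below is an illustrative stand-in for that server side, not any vendor's actual implementation. Real services additionally use JavaScript to capture screen size and referrer, and cookies to recognise repeat visitors.

```python
import datetime

# A 1x1 transparent GIF: the classic tracking-pixel payload.
PIXEL_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00"
             b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01"
             b"\x00\x00\x02\x02D\x01\x00;")

page_views = []  # in-memory stand-in for the tracking service's datastore

def handle_pixel_request(client_ip, page, user_agent):
    """Record one page view and return the image bytes to serve."""
    page_views.append({
        "time": datetime.datetime.now().isoformat(),
        "ip": client_ip,
        "page": page,
        "agent": user_agent,
    })
    return PIXEL_GIF

body = handle_pixel_request("1.2.3.4", "/visiting", "Mozilla/5.0")
print(len(page_views), body[:6])  # 1 b'GIF89a'
```

Because the image request is made by a real browser rendering the page, robots that do not fetch images, proxies and most cache layers never register: this is why the vendors claim fewer of the log-distorting issues apply.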
Comparison between tools • Summary SP vs Nielsen/NetRatings • Run on one section of a site over a month. • The ‘Visiting’ section of the Natural History Museum site – small but popular and easily tagged.
Results 3 – country • Depends on the quality of the geographical IP database, not the mode of tracking?
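A geographical lookup is just a search over a range-to-country database, so both log analysers and hosted trackers stand or fall on the quality of that database rather than on how the hit was collected. A toy sketch, with invented ranges (real databases such as those behind commercial geo-IP products hold millions of entries):

```python
import bisect
import ipaddress

# Invented, illustrative range starts; each entry maps the start of an
# IP range to a country. Real geo-IP databases are vastly larger.
RANGES = [
    (int(ipaddress.ip_address("10.0.0.0")), "UK"),
    (int(ipaddress.ip_address("10.1.0.0")), "US"),
    (int(ipaddress.ip_address("10.2.0.0")), "NL"),
]

STARTS = [start for start, _ in RANGES]

def country_for(ip: str) -> str:
    """Binary-search for the last range starting at or below the address."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1
    return RANGES[i][1] if i >= 0 else "unknown"

print(country_for("10.1.2.3"))  # US
```

If the database misassigns a range, every tool using it reports the same wrong country, which is why the country figures agreed across approaches while other measurements did not.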
Conclusions regarding traditional Log analysis Assuming browser tracking is more accurate… • We have fewer visit sessions than we thought, but more visitors • Fewer visits (sessions), possibly due to robot exclusion • More visitors (unique users), possibly due to the masking effect of proxies/caches and browser caches • Visit duration is much shorter than thought • possibly due to robots/spiders and cache updating. • Country information is roughly accurate so long as a geographical lookup is used. • Activity of popular pages, which are often cached, will be underestimated
Browser tracking advantages • Almost real-time analysis, incremental data. • Better repeat user tracking and individual pathway analysis. • Configurable, graphical reports for non-techies • Techie still needs to configure those reports however, as an understanding of web analytics is required • Cut our monthly staff time down from 1.5 days to 1 hour • Appear to be more accurate in describing the activity of real people, but we would like to see some independent research.
Issues with browser tracking • Setup is not trivial: you need to add code to every page. • Multiple server / ownership issues. • Does not always work (or gather full user details) if JavaScript is turned off or cookies are disallowed. • Does not work with text-only browsers. • Unknown compatibility with PDAs, mobiles etc. Questions: • Would we get different results with different hosted services? • ABCE: industry standards for measurement • Are cookies often deleted unless the user is confident in the source? • This would affect the measurement of repeat visitors and behaviour. Political issues: • Issues with external hosting of institutional data • Security of personal data with external hosting • E.g. measurements of student and staff use of a VLE.
Next steps • Many private sector and public sector sites have already moved to browser tracking. • About 6 national museums are currently discussing hosted browser tracking. • 5 universities are currently involved in a trial of Nedstat.