Measuring readership Dr Jim Briggs WEBP readers
Why measure readership?
• Interest / ego
• Assess popularity / audience reach
• For marketing reasons
• For content planning reasons
• Community building
• Plan technical resources
  • More readers => faster server/network
  • Fewer readers => update less frequently?
3 main methods
• Guestbooks
• Counters
• Access logs
Guestbooks
• Web forms that readers can complete with comments about a web page or web site
• A number of free guestbook services are available
• Not normally a reliable means of assessing readership of a web site
  • No compulsion on a reader to complete an entry
  • No evidence to show that the readers who do record comments are representative of the majority
Counters
• Usually an image of a number
• Image created dynamically each time it is accessed
• Number shown is incremented each time
• Counter image often created on a separate server on which the count is stored and incremented
• Counters are moderately reliable
• However, they do not count all readers – specifically:
  • browsers set not to load images automatically
  • pages whose loading is aborted before the image is requested
• Only tells you how many readers, nothing else
Access logs
• Virtually all web server software can be configured to retain a log of accesses
• Stores basic data about each request received
• Can be analysed
  • using standard tools e.g. Sawmill, webalizer, WebTrends, NetTracker
  • report most frequently accessed pages, Internet addresses generating most requests, times of peaks and troughs in demand
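The kind of report listed above (most requested pages, busiest hosts, peak hours) can be sketched in a few lines. This is only an illustration, not how any of the named tools work: the `requests` records and the `summarise` function are hypothetical, and the sketch assumes the log has already been parsed into (host, hour, path) tuples.

```python
from collections import Counter

# Hypothetical, already-parsed requests: (remote host, hour of day, requested path).
requests = [
    ("host-a.example.org", 9, "/projects/index.htm"),
    ("host-a.example.org", 9, "/projects/images/new.gif"),
    ("host-b.example.org", 13, "/"),
    ("host-c.example.org", 13, "/projects/index.htm"),
    ("host-a.example.org", 14, "/projects/index.htm"),
]

def summarise(requests, top=3):
    """Tally requests per page, per host, and per hour of day."""
    pages = Counter(path for _, _, path in requests)
    hosts = Counter(host for host, _, _ in requests)
    hours = Counter(hour for _, hour, _ in requests)
    return {
        "top_pages": pages.most_common(top),
        "top_hosts": hosts.most_common(top),
        "busiest_hours": hours.most_common(top),
    }

report = summarise(requests)
for name, rows in report.items():
    print(name, rows)
```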
Common log format (CLF)
• remotehost rfc931 authuser [date] "request" status bytes
  • remotehost – remote hostname or IP address
  • rfc931 – remote logname of the user
  • authuser – authenticated username (if available)
  • date – date and time of the request
  • request – request line exactly as it came from the client
  • status – HTTP status code returned to the client
  • bytes – content-length of the document transferred
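As a rough illustration of the field layout above, a CLF line can be split with a regular expression. This is a minimal sketch, not a complete parser: the names `CLF_PATTERN` and `parse_clf` are made up for this example, and the date parsing assumes English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# Field order follows the slide:
# remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Assumes English month names, e.g. "03/Jan/2005:09:30:22 +0000".
    record["date"] = datetime.strptime(record["date"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no document body was transferred.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

sample = ('resnet-36498.remote.port.ac.uk - - [03/Jan/2005:09:30:22 +0000] '
          '"GET /projects/index.htm HTTP/1.1" 200 9019')
record = parse_clf(sample)
print(record["remotehost"], record["status"], record["bytes"])
```

The pattern is deliberately not anchored at the end of the line, so it also accepts lines that carry extra fields after the byte count.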
Extended Log File Format
• Allows additional fields to be stored in access log
  • Referrer URL
  • Browser identification
• Allows aggregation of data
Log examples
resnet-36498.remote.port.ac.uk - - [03/Jan/2005:09:30:22 +0000] "GET /projects/index.htm HTTP/1.1" 200 9019 "http://www.pums.cam.port.ac.uk/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
resnet-36498.remote.port.ac.uk - - [03/Jan/2005:09:30:22 +0000] "GET /projects/images/new.gif HTTP/1.1" 200 117 "http://www.pums.cam.port.ac.uk/projects/index.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
crawl-66-249-64-55.googlebot.com - - [03/Jan/2005:04:59:32 +0000] "GET /hcc/sihi/sihi2004/proceed/shields.ppt HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
s1.cache.iso.port.ac.uk - - [06/Jan/2005:13:56:49 +0000] "GET / HTTP/1.0" 304 - "http://www.tech.port.ac.uk/staffweb/briggsj/links.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
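The example entries carry two quoted fields after the byte count, the referrer URL and the browser identification. One possible way to pull them out is sketched below. The names `COMBINED_PATTERN` and `parse_combined` are illustrative only, and the sketch assumes neither quoted field contains an embedded double quote.

```python
import re

# CLF fields followed by "referrer" and "user agent", as in the examples above.
COMBINED_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_combined(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = COMBINED_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # The request line has the form "METHOD path PROTOCOL"; split defensively.
    parts = record["request"].split()
    record["path"] = parts[1] if len(parts) >= 2 else None
    return record

line = ('crawl-66-249-64-55.googlebot.com - - [03/Jan/2005:04:59:32 +0000] '
        '"GET /hcc/sihi/sihi2004/proceed/shields.ppt HTTP/1.0" 304 - "-" '
        '"Googlebot/2.1 (+http://www.google.com/bot.html)"')
record = parse_combined(line)
print(record["path"], record["referrer"], record["user_agent"])
```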
Reliability of access logs
• Access logs are more reliable than counters
  • they count the requests for a page itself rather than for some element within it
  • the browser cannot bypass the counting of any request that reaches the server, so the results are more accurate
• Downside:
  • need to store the logs on the server machine
  • for a website that attracts large numbers of accesses, the log can grow very quickly
  • long-term analysis of readership may therefore require a large amount of disk space
Access logs only record requests that reach the server
• Do not count
  • requests serviced from the browser's cache (e.g. when the user presses the back button)
  • requests serviced from content cached by third parties such as ISPs, search engines, or proxy servers
• Possible to prevent the caching of a site's content by time-expiring elements of it
  • Obviously this means extra load on your site and will affect the delivery time of pages
  • Better (more efficient) to do this with small page components (e.g. images) rather than the whole page itself
• However…
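Time-expiring an element is normally done through HTTP response headers. The sketch below shows one way to build such headers for a small component like an image. `expiring_headers` is a hypothetical helper, and how the headers are attached to a response depends on the server software in use.

```python
import time
from email.utils import formatdate

def expiring_headers(max_age_seconds, now=None):
    """Build response headers that make a resource expire after max_age_seconds.

    With max_age_seconds=0 the resource is stale immediately, so a cache has
    to go back to the origin server, and the request appears in the access log.
    """
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"max-age={max_age_seconds}, must-revalidate",
        # HTTP dates are expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }

# Fixed timestamp (the Unix epoch) so the printed output is reproducible.
headers = expiring_headers(0, now=0)
for name, value in headers.items():
    print(f"{name}: {value}")
```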
Web analytics
• What should we count in the log?
• IFABC standard is the page impression
  • file, or a combination of files, sent as a result of a request being received by the server
  • could be all pages, except
    • images
    • JavaScript files, style sheets, etc.
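A simple way to approximate this rule is to discard requests whose path ends in an extension used for page components. The extension list and the name `is_page_impression` below are assumptions made for this sketch, not part of the IFABC definition.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions for page components that are not pages themselves.
NON_PAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".ico", ".js", ".css"}

def is_page_impression(request_path):
    """Return True unless the path looks like an image, script, or style sheet."""
    path = urlsplit(request_path).path  # drop any query string
    extension = posixpath.splitext(path)[1].lower()
    return extension not in NON_PAGE_EXTENSIONS

paths = ["/projects/index.htm", "/projects/images/new.gif", "/"]
impressions = [p for p in paths if is_page_impression(p)]
print(impressions)
```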
Automated accesses
• Many page impressions are automated ones
  • i.e. not a real user, but created by a web robot that visits pages systematically to construct search engine indexes, etc.
  • general view is that these should not be counted as page impressions
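One common heuristic for spotting robots is to look for tell-tale substrings in the browser identification field. The sketch below is illustrative only: the marker list and `is_robot` are assumptions, and a robot that does not identify itself will slip through.

```python
# Assumed substrings that often appear in robot user-agent strings.
ROBOT_MARKERS = ("bot", "crawler", "spider")

def is_robot(user_agent):
    """Guess whether a request came from a web robot, based on its user agent."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)

agents = [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
]
human_agents = [a for a in agents if not is_robot(a)]
print(human_agents)
```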
References
• Measuring the readership of a health-related website, Briggs JS
  • http://www.disco.port.ac.uk/hcc/pubs/edmonton1999.htm
• International Federation of Audit Bureaux of Circulations (IFABC)
  • http://www.ifabc.org/standards.htm
• Common log file format
  • http://www.w3.org/Daemon/User/Config/Logging.html
• Extended Log File Format
  • http://www.w3.org/TR/WD-logfile