Slide 2:Why Evaluate Usage of Digital Resources?
Data-driven decisions
Justification to patron groups
Budget justification to external funding sources
Collection development decisions
Outputs for performance assessment
Assessment of service quality
Outcomes assessment
Strategic planning
Slide 3:Cost
Association of Research Libraries (ARL) members spent 215% more per serial unit in 2003 than they did in 1986.
Average expenditures for serial subscriptions (all serials, not just scholarly journals) in ARL academic libraries in 2003 were $5.46 million.
From 1984 to 2002, business and economics journals increased in price by 423.7%, chemistry and physics journals by 664%, and journals in medicine by 628.7%.
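To put the cumulative increases above on a comparable footing, they can be converted to an implied compound annual rate. This is a minimal sketch: the function name is illustrative, and the year spans are simply read off the dates quoted on the slide.

```python
def annualized_rate(total_increase_pct: float, years: int) -> float:
    """Compound annual growth rate implied by a total percentage increase."""
    # "215% more" means the final value is 3.15 times the starting value.
    growth_factor = 1 + total_increase_pct / 100
    return growth_factor ** (1 / years) - 1


# Figures as quoted on the slide: (total increase in percent, number of years).
FIGURES = {
    "ARL serial unit cost, 1986-2003": (215.0, 2003 - 1986),
    "Business and economics journals, 1984-2002": (423.7, 2002 - 1984),
    "Chemistry and physics journals, 1984-2002": (664.0, 2002 - 1984),
    "Medical journals, 1984-2002": (628.7, 2002 - 1984),
}

for label, (pct, years) in FIGURES.items():
    print(f"{label}: {annualized_rate(pct, years):.1%} per year")
```

The annual rate makes series with different start years easier to compare than the raw cumulative percentages.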
Slide 4:Cost
Slide 5:Vendor Supplied Data
Problems
  Vendor reports do not provide sufficiently detailed information.
  Vendor reports are inconsistent in their application of the definitions of variables.
  Vendor reports are not commensurable with each other.
  Some vendors do not report anything.
Practical solutions
  Number of logins (sessions) to networked electronic resources
  Number of queries (searches) in networked electronic resources
  Number of items requested in networked electronic resources
  Turnaways (requests rejected because the simultaneous-use limit was exceeded)
  Monthly reporting
  Level of effort, both by the vendor and by the library
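One way to work toward commensurable data is to map each vendor's report columns onto the four measures listed above. The sketch below is illustrative only: the vendor names and column labels in FIELD_MAPS are hypothetical, not taken from any real vendor report.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MonthlyUsage:
    resource: str
    month: str                      # "YYYY-MM"
    sessions: Optional[int]         # logins
    searches: Optional[int]         # queries
    items_requested: Optional[int]
    turnaways: Optional[int]


# Hypothetical mapping from each vendor's column names to the common measures.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "vendor_a": {"sessions": "Logins", "searches": "Queries",
                 "items_requested": "FullTextViews", "turnaways": "Rejected"},
    "vendor_b": {"sessions": "Sessions", "searches": "Searches",
                 "items_requested": "Requests", "turnaways": "Turnaways"},
}


def normalize(vendor: str, resource: str, month: str,
              row: Dict[str, str]) -> MonthlyUsage:
    """Translate one vendor report row into the common record."""
    values = {}
    for measure, column in FIELD_MAPS[vendor].items():
        raw = row.get(column)
        # None marks "not reported", which is different from a reported zero.
        values[measure] = int(raw) if raw is not None else None
    return MonthlyUsage(resource=resource, month=month, **values)


record = normalize("vendor_a", "Database 1", "2004-01",
                   {"Logins": "10", "Queries": "25", "FullTextViews": "7"})
print(record)
```

Keeping unreported measures as None rather than 0 preserves the distinction between "no use" and "vendor did not report".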
Slide 6:Vendor Supplied Data
Project COUNTER (Counting Online Usage of Networked Electronic Resources): http://www.projectcounter.org/
ICOLC (International Coalition of Library Consortia): http://www.library.yale.edu/consortia/
ISO (International Organization for Standardization), ISO 11620 Library Performance Indicators: http://www.iso.org/
NISO (National Information Standards Organization), NISO Z39.7 Library Statistics: http://www.niso.org/
Slide 7:ARL E-Metrics
ARL E-Metrics, as summarized by Blixrud and Kyrillidou (2003), asks ARL libraries for the following data for measuring use of networked electronic resources. Most libraries can provide these data only by collecting and analyzing vendor-supplied transaction data:
  Number of logins (sessions) to networked electronic resources
  Number of queries (searches) in networked electronic resources
  Number of items requested in networked electronic resources
Slide 8:Web Statistics
Web server log files
  Transaction: client/server
  Technical representation of tasks performed by the server
Log files (common)
  IP address of requesting computer
  Remote host: name of computer accessing the web server
  Name of remote user (usually blank)
  Login of remote user (usually blank)
  Date
Each and every response from the server, whether it indicates success, an error, or even a timeout (i.e. no response), gets logged in the server's logfile. Since the server was hit by a request, such a response is called a hit. In other words, the total number of hits must equal the total number of lines in the logfile minus the number of corrupt and empty lines.
A typical logfile entry in the Common Logfile Format (CLF) looks like:
  hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839
The hostname field contains the fully qualified domain name (FQDN) of the site accessing your server. The next two fields usually contain a minus ('-') to indicate that those fields are empty. The date is surrounded by square brackets ('[' and ']'). The next field contains the request: the request method (GET, for example), the name of the requested document (URI), and the protocol specification (HTTP/1.0). The following field contains the server's response code (200 stands for "OK", while 404 would mean "Document not found", for example). The last field contains the size of the document. Some servers log the number of bytes actually transferred, while other servers log the size of the document, which makes a difference if the user interrupts the transfer before the document could be transmitted completely.
There are two other logfile formats, the Combined and the Extended Logfile Format. These formats add the user-agent (browser type) and the referrer URL (the page that contains a link to the requested document, if the request was generated by following a link) to the logfile entry. They append these two fields to the CLF entry in one of two usual ways:
  CLF Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html
  CLF "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"
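The field layout of a Common Logfile Format entry, and the rule that hits equal valid lines, can be sketched as follows. The regular expression and function names are illustrative assumptions, not part of any particular log analyzer; real logs may contain variants this pattern does not cover.

```python
import re
from typing import Iterable, Optional

# host, ident, user, [date], "request", status, size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

SAMPLE_LINE = ('hostname - - [01/Feb/1998:10:10:00 +0100] '
               '"GET /index.html HTTP/1.0" 200 4839')


def parse_clf(line: str) -> Optional[dict]:
    """Return the fields of one CLF entry, or None for a corrupt or empty line."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Some servers write "-" when no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request field holds method, URI and protocol separated by spaces.
    parts = entry["request"].split()
    entry["method"], entry["uri"], entry["protocol"] = (parts + [None] * 3)[:3]
    return entry


def count_hits(lines: Iterable[str]) -> int:
    """Hits = total lines minus corrupt and empty lines."""
    return sum(1 for line in lines if parse_clf(line) is not None)


print(parse_clf(SAMPLE_LINE))
print(count_hits([SAMPLE_LINE, "", "not a log line"]))
```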
Slide 9:Log Files
Referrer log file
  Referring page (the URL the request came from)
Agent log file
  Browser
  Operating system
  Name of spiders or robots used to probe your web site
  IP address of requesting computer
Example
  127.0.0.1 - frank [10/Oct/2004:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
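The example entry above carries the referrer and the user agent as two extra quoted fields. A minimal sketch of pulling them out follows; the pattern, the CombinedEntry type, and the ROBOT_HINTS substrings are assumptions made for illustration, and the robot check is only a crude heuristic.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

SAMPLE = ('127.0.0.1 - frank [10/Oct/2004:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

# Hypothetical substrings that often appear in crawler user-agent strings.
ROBOT_HINTS = ("bot", "spider", "crawler")


class CombinedEntry(NamedTuple):
    host: str
    user: str
    timestamp: datetime
    request: str
    status: int
    referrer: str
    agent: str


def parse_combined(line: str) -> Optional[CombinedEntry]:
    """Parse one combined-format entry; return None if it does not match."""
    m = COMBINED_PATTERN.match(line.strip())
    if m is None:
        return None
    return CombinedEntry(
        host=m["host"],
        user=m["user"],
        # Day/month-name/year:time plus a numeric UTC offset.
        timestamp=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        request=m["request"],
        status=int(m["status"]),
        referrer=m["referrer"],
        agent=m["agent"],
    )


def looks_like_robot(agent: str) -> bool:
    """Crude check based on substrings in the user-agent field."""
    lowered = agent.lower()
    return any(hint in lowered for hint in ROBOT_HINTS)


entry = parse_combined(SAMPLE)
print(entry.referrer, "|", entry.agent, "|", looks_like_robot(entry.agent))
```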
Slide 10:Log files generated by library proxy servers
Proxy servers or pass-through (click-through) servers
Firewalls are based to some degree on an examination of headers.
A proxy server can examine all requests that pass through it, so it is starting to make sense to put a proxy server in front of all library databases and e-journals.
Increasingly used as a data collection point for commensurable (comparable) data.
Slide 11:What do log files tell us?
Nothing if they are not analyzed.
What pages are requested on your site
IP addresses of computers making requests
Date and time of requests
Success of file transfer
Last page a requester visited before coming to your site
Search terms which led someone to your site
The statistics report contains, among others, the following information:
  the number of hits, 304's, files, pageviews, sessions, data sent (in KB)
  the amount of data requested, transferred, and saved by cache (in KB)
  the number of unique URLs, sites, and sessions per month
  the number of all response codes other than 200 (OK)
  the average hits per weekday and for last week
  the maximum/average hits per day and per hour
  the number of hits, files, 304's, sites, data sent by day
  the top 5 days, 24 hours, 5 minutes and 5 seconds of the summary period
  the top 30 most commonly accessed URLs (hits, 304's, data sent)
  the 10 least frequently accessed URLs (hits, 304's, data sent)
  the top 30 client domains accessing your server most often
  the top 30 browser types
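A few of the report figures listed above can be computed directly from parsed log entries. The sketch below assumes each entry is a dict with hypothetical keys "uri", "status", "size" (bytes) and "hour"; it is not how any specific analyzer produces its report.

```python
from collections import Counter
from typing import Dict, List


def summarize(entries: List[Dict], top_n: int = 30) -> Dict:
    """Compute a handful of summary statistics from parsed log entries."""
    status_counts = Counter(e["status"] for e in entries)
    url_counts = Counter(e["uri"] for e in entries)
    hour_counts = Counter(e["hour"] for e in entries)
    return {
        "hits": len(entries),
        "not_modified_304": status_counts[304],
        # All response codes other than 200 (OK).
        "non_200": {code: n for code, n in status_counts.items() if code != 200},
        "unique_urls": len(url_counts),
        "data_sent_kb": sum(e["size"] for e in entries) / 1024,
        "top_urls": url_counts.most_common(top_n),
        "max_hits_per_hour": max(hour_counts.values(), default=0),
    }


sample_entries = [
    {"uri": "/a", "status": 200, "size": 2048, "hour": 10},
    {"uri": "/a", "status": 304, "size": 0, "hour": 10},
    {"uri": "/b", "status": 404, "size": 0, "hour": 11},
    {"uri": "/a", "status": 200, "size": 2048, "hour": 11},
]
print(summarize(sample_entries, top_n=5))
```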
Slide 12:More log files
Logs and reports from locally implemented journal article services
Logs and reports from locally implemented digital library projects
ILS log files and reports
Becoming more interesting with metasearch engines
OPAC
Slide 13:ILS Log Files
OPAC search statistics
  Number of searches attempted
  By fields
  Search terms
  Null results
Print statistics such as items checked out, holds placed, etc.
Difficult to track usage of 856 links (MARC field 856, Electronic Location and Access).
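The OPAC search measures above (searches attempted, by field, search terms, null results) can be tallied from a search log. This is a sketch under the assumption that each logged search is available as a dict with hypothetical keys "field", "terms" and "results"; actual ILS logs differ by system.

```python
from collections import Counter
from typing import Dict, List


def search_stats(searches: List[Dict], top_n: int = 10) -> Dict:
    """Tally searches by field, null-result rates, and most common terms."""
    by_field = Counter(s["field"] for s in searches)
    null_by_field = Counter(s["field"] for s in searches if s["results"] == 0)
    return {
        "searches_attempted": len(searches),
        "by_field": dict(by_field),
        # Share of searches in each field that returned nothing.
        "null_rate": {f: null_by_field[f] / n for f, n in by_field.items()},
        # Lower-casing treats "Cell Biology" and "cell biology" as one term.
        "top_terms": Counter(s["terms"].lower() for s in searches).most_common(top_n),
    }


sample_searches = [
    {"field": "title", "terms": "Cell Biology", "results": 12},
    {"field": "title", "terms": "cell biology", "results": 0},
    {"field": "author", "terms": "smith", "results": 0},
]
print(search_stats(sample_searches))
```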
Slide 14:Log Analysis Software
Analog: http://www.analog.cx/
  Example: http://www.statslab.cam.ac.uk/webstats/stats.html
http-Analyze: http://www.netstore.de/Supply/http-analyze/
WebTrends: http://www.netiq.com/webtrends/default.asp
Slide 15:Log Analysis Software
Slide 16:Issues with web surveys
Non-probability
  Entertainment surveys
  Self-selected surveys
  Volunteer panels
Probability
  Intercept (every nth)
  Surveys that obtain respondents from an e-mail request
  Mixed-mode surveys where one of the options is a Web survey
  Pre-recruited panels of a particular population as a probability sample
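The "intercept (every nth)" approach listed above amounts to systematic sampling. A minimal sketch, assuming visitors arrive as an ordered sequence of identifiers; the function name and the random starting offset are illustrative choices rather than a prescribed procedure.

```python
import random
from typing import List, Optional, Sequence


def intercept_every_nth(visitors: Sequence, n: int,
                        seed: Optional[int] = None) -> List:
    """Select every nth visitor, starting from a random offset in [0, n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(n)
    return [v for i, v in enumerate(visitors) if i % n == start]


visitors = list(range(100))          # stand-in for 100 visits in arrival order
selected = intercept_every_nth(visitors, n=10, seed=1)
print(len(selected), selected[:3])
```

The random start gives every visitor the same chance of selection, which is what separates this from a self-selected survey.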
Slide 17:Issues with web surveys
Research design
Coverage error
  Unequal access to the Internet
  Internet users differ from non-users
Response rate
Response representativeness
Random sampling and inference
Non-respondents
Slide 18:Issues with web surveys
Mistrust of web surveys
Vendor data is a census; a web survey is a sample.
Web surveys are typically associated with user data, not usage data.
Even when they address usage, web surveys often collect predicted, intended, or remembered usage, not actual usage.
Web survey forms may appear differently in different browsers.
Slide 19:Networked electronic resources and services - assessment environment -
Resources are accessible from many different web pages and web servers
Bookmarks
The survey data must be collected and commensurable for all networked electronic resources.
Different authentication methods have to be accommodated, whether the institution uses IP, password, referring URL, or an authentication and access gateway.
Remote usage has to be measured, regardless of the channel of communication, whether a locally implemented proxy server, modem pool, or other institutional service.
Slide 20:MINES strategy
A representative sampling plan, including sample size, is determined at the outset. Typically, there are 48 hours of surveying over 12 months at a medical library and 24 hours a year at a main library.
Random moment, web-based surveys are employed at each site.
Participation is usually mandatory, negating non-respondent bias, and is based on actual use in real time.
Libraries with database-to-web gateways or proxy re-writers offer the most comprehensive networking solution for surveying all networked services users during survey periods.
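A sampling plan of this kind can be drawn up in advance by picking random survey periods across the year. The sketch below assumes two-hour sessions spread evenly over the months and hypothetical opening hours of 08:00 to 22:00; neither detail is stated on the slide.

```python
import calendar
import random
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def plan_survey_sessions(year: int, total_hours: int, session_hours: int = 2,
                         open_hour: int = 8, close_hour: int = 22,
                         seed: Optional[int] = None
                         ) -> List[Tuple[datetime, datetime]]:
    """Pick random survey sessions, the same number in every month."""
    rng = random.Random(seed)
    per_month, remainder = divmod(total_hours // session_hours, 12)
    if remainder or total_hours % session_hours:
        raise ValueError("total_hours must split evenly into monthly sessions")
    plan = []
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        # Distinct days, so sessions within a month never overlap.
        for day in sorted(rng.sample(range(1, days_in_month + 1), per_month)):
            # Latest start still lets the session end by closing time.
            start_hour = rng.randrange(open_hour, close_hour - session_hours + 1)
            start = datetime(year, month, day, start_hour)
            plan.append((start, start + timedelta(hours=session_hours)))
    return plan


medical = plan_survey_sessions(2005, total_hours=48, seed=0)  # 48 hours a year
main = plan_survey_sessions(2005, total_hours=24, seed=0)     # 24 hours a year
print(len(medical), len(main))
for start, end in medical[:2]:
    print(start, "to", end)
```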
Slide 21:Web Survey Design Guidelines
Web survey design guidelines that MINES followed:
Presentation
  Simple text for different browsers, no graphics
  Different browsers render web pages differently
Few questions per screen, or simply few questions
Easy to navigate
Short and plain
No scrolling
Clear and encouraging error or warning messages
Every question answered in a similar way (consistent)
  Radio buttons, drop-downs
Introduction page or paragraph
Easy to read
Must see definitions of sponsored research.
Can present questions in response to answers; for example, if sponsored research was chosen, could present another survey.
Slide 22:How to implement web surveys on library web sites
Because of the point-of-use requirement, libraries that had a virtual gateway in their web architecture succeeded best.
Rewriting proxy server
Database-to-web solutions
Serials Solutions
Interestingly, OpenURL solutions are a gateway.
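The gateway idea can be sketched as a single decision made at the point of use: during a survey period the user is sent to the survey first, with the requested resource carried along; otherwise the request passes straight through. The survey URL and the "target" parameter name below are hypothetical, and a real rewriting proxy or database-to-web gateway would implement this inside its own request handling.

```python
from datetime import datetime
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

SURVEY_URL = "https://library.example.edu/mines-survey"  # hypothetical address


def gateway_redirect(target_url: str, now: datetime,
                     survey_sessions: Iterable[Tuple[datetime, datetime]]) -> str:
    """Return the URL the gateway should send the user to."""
    in_session = any(start <= now < end for start, end in survey_sessions)
    if not in_session:
        return target_url
    # Carry the requested resource along so the survey can forward the user.
    return f"{SURVEY_URL}?{urlencode({'target': target_url})}"


sessions = [(datetime(2005, 3, 14, 10), datetime(2005, 3, 14, 12))]
resource = "https://journals.example.org/article?id=42"

during = gateway_redirect(resource, datetime(2005, 3, 14, 11, 30), sessions)
outside = gateway_redirect(resource, datetime(2005, 3, 14, 13, 0), sessions)
print(during)
print(outside)
```

Because every request passes through the same function, the survey reaches users of all resources regardless of which web page or bookmark they started from.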
Slide 23:Library web architecture
Slide 24:Digital Libraries
Slide 25:Digital Libraries
Slide 26:Pre-print and post-print servers
Slide 27:Pre-print and post-print servers
Slide 28:Open Access Journals
Slide 29:Library web architecture
Slide 30:What is the future of assessment of networked electronic services
The library is responsible for many heterogeneous resources, not just subscriptions.
A library gateway could position the library to constantly assess usage of its resources.
This tool will be just one of many, along with LibQUAL+™ and other initiatives.