Understanding library users you don't see. Techniques for tracking and analyzing library Web resources. Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University http://staffweb.library.vanderbilt.edu/breeding Marshall.breeding@vanderbilt.edu.
Understanding library users you don't see: Techniques for tracking and analyzing library Web resources • Marshall Breeding, Director for Innovative Technologies and Research, Vanderbilt University • http://staffweb.library.vanderbilt.edu/breeding • Marshall.breeding@vanderbilt.edu • Saturday June 24
Theme • For many libraries, the number of visitors to their Web site and electronic resources exceeds the number who visit their physical premises. It's vital for libraries to understand how these remote visitors approach the Web site, not only to measure use but to improve the resources themselves. Marshall Breeding will present a number of practical techniques that libraries can use to better understand the use of their Web-based resources. • Topics will include the basics of analyzing the server logs of the library's Web site, transaction logs from the OPAC, the complexities of measuring use of subscription-based electronic resources, and techniques for enhancing applications to better record how they are used.
Understanding remote users • Vital to providing relevant library services • More patrons may use library resources remotely through the Web than from physical library facilities • Must work harder to ensure that Web-based services meet patron needs • Move beyond hit counters and raw statistics to more sophisticated analysis and assessment
Analysis goals • Improve usability • Web site diagnostics • Understand user needs • Content selection decisions • Improve quality of service • Marketing • Budget justification • Strategy to increase interest and activity
Data sources for tracking remote use • Web server logs • Application logs • Remote tracking data (Google Analytics) • Vendor provided use statistics (e-resources)
Enterprise approach to analytics • Multiple resources to track • Web servers • OPACs • E-resources • Databases • Repositories • Important to track the flow of use among all the library’s Web-based resources • Beyond the library: study flow to and from higher-level Web sites and portals (University -> Courseware -> Library)
Web server logs • Web servers are routinely configured to record detailed information about each request. Common elements include: • File requested • Date / time stamp • Status code • Request method (GET, POST, HEAD) • Referrer (where the user came from) • User agent (browser and platform data)
Example Web log • Raw data for analysis process: 2006-06-20 05:01:43 129.59.150.105 GET /index.pl - 80 - c-69-250-131-199.hsd1.md.comcast.net Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) http://www.google.com/search?hl=en&lr=&safe=off&q=september+11+television+archive 200 0 0 11752
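A log line like the one in this slide can be split into named fields before any analysis. The following is a minimal sketch, not the presenter's tooling: the field names are assumptions inferred from the order of values in the sample (the log's own field header is not shown), and the sample is written with a space between the user agent and the referrer.

```python
# Field names are ASSUMED from the order of values in the sample line;
# the server's actual field header is not shown in the slide.
FIELDS = [
    "date", "time", "server_ip", "method", "uri_stem", "uri_query",
    "port", "username", "client_host", "user_agent", "referrer",
    "status", "substatus", "win32_status", "last_value",
]

SAMPLE = (
    "2006-06-20 05:01:43 129.59.150.105 GET /index.pl - 80 - "
    "c-69-250-131-199.hsd1.md.comcast.net "
    "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) "
    "http://www.google.com/search?hl=en&lr=&safe=off&q=september+11+television+archive "
    "200 0 0 11752"
)


def parse_log_line(line):
    """Split one space-delimited log line into a dict keyed by FIELDS."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # In this sample, spaces inside the user agent are written as '+'.
    record["user_agent"] = record["user_agent"].replace("+", " ")
    record["status"] = int(record["status"])
    return record


record = parse_log_line(SAMPLE)
print(record["method"], record["uri_stem"], record["status"])
print(record["referrer"])
```

Keeping the field list in one place makes it easy to adjust when a server is configured to log a different set or order of fields.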
Exploiting referral data • The query string component of the referrer can be parsed to reveal search terms and other interesting information • http://www.google.com/search?hl=en&lr=&safe=off&q=september+11+television+archive • User typed “september 11 television archive” in Google to find our site • Important to study how users get to your site • (Example: TV News Public Web queries vs OpenWeb)
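Parsing the referrer's query string can be sketched with Python's standard library. The function name `search_terms` is hypothetical, and the assumption that the search phrase lives in a parameter named `q` holds for the Google URL in this slide but varies by search engine.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms(referrer, param="q"):
    """Return the search phrase carried in a referrer URL, or None.

    `param` is the query-string key that holds the phrase; "q" matches
    the Google example in the slide and is an assumption elsewhere.
    """
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param)  # parse_qs decodes '+' as a space
    return values[0] if values else None


referrer = (
    "http://www.google.com/search"
    "?hl=en&lr=&safe=off&q=september+11+television+archive"
)
print(search_terms(referrer))
```

A referrer recorded as `-` (no referrer) simply yields `None`, so direct visits can be counted separately from search-driven ones.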
Analysis methodology • Go beyond simply counting pages • Identify Sessions • Categorize users • Determine use patterns • Measure interest • Time spent on Web site • Bounce rate • Page overlay analysis
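Session identification, time on site, and bounce rate can be illustrated with a short sketch. It assumes that requests have already been reduced to (visitor, timestamp, page) tuples and that 30 minutes of inactivity ends a session; both the tuple layout and the timeout are assumptions for illustration, not values from the presentation.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # ASSUMED inactivity timeout


def build_sessions(requests):
    """Group (visitor, timestamp, page) tuples into per-visitor sessions."""
    sessions = []
    open_session = {}  # visitor -> most recent session for that visitor
    for visitor, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        current = open_session.get(visitor)
        if current is None or ts - current["end"] > SESSION_GAP:
            current = {"visitor": visitor, "start": ts, "end": ts, "pages": []}
            sessions.append(current)
            open_session[visitor] = current
        current["end"] = ts
        current["pages"].append(page)
    return sessions


def bounce_rate(sessions):
    """Fraction of sessions that consist of a single page request."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if len(s["pages"]) == 1)
    return bounces / len(sessions)


# Hypothetical sample requests.
requests = [
    ("visitor-a", datetime(2006, 6, 20, 10, 0), "/index.pl"),
    ("visitor-a", datetime(2006, 6, 20, 10, 5), "/search.pl"),
    ("visitor-a", datetime(2006, 6, 20, 11, 0), "/index.pl"),
    ("visitor-b", datetime(2006, 6, 20, 10, 10), "/index.pl"),
]
sessions = build_sessions(requests)
for s in sessions:
    print(s["visitor"], len(s["pages"]), s["end"] - s["start"])
print(round(bounce_rate(sessions), 2))
```

Note that time on site measured this way runs from the first to the last request in a session, so single-page sessions always show a duration of zero.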
Move from measurement to impact • Establish site goals • Benchmark current use • Implement goal oriented improvements • Measure impact • Repeat as needed • (Example: enhancement of TV News OpenWeb)
Appropriate data filtering • Requests from indexing bots (crawlers) can skew statistics • Count user requests and bot requests separately • Performance monitors • Link checkers • Monitoring crawler activity is an important component of SEO and Web site discoverability strategies.
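Counting user and bot requests separately can be sketched with a simple user-agent check. The marker list below is an assumption chosen for illustration and is far from complete; real crawlers, performance monitors, and link checkers identify themselves in many ways, and some do not identify themselves at all.

```python
from collections import Counter

# ASSUMED substrings that suggest an automated client; extend as needed.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "linkcheck", "monitor")


def classify_agent(user_agent):
    """Label a user-agent string as 'bot' or 'user' by substring match."""
    ua = user_agent.lower()
    return "bot" if any(marker in ua for marker in BOT_MARKERS) else "user"


def count_by_class(user_agents):
    """Tally requests separately for bots and users."""
    return Counter(classify_agent(ua) for ua in user_agents)


# Sample agents; "ExampleLinkChecker/1.0" is a hypothetical name.
agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "ExampleLinkChecker/1.0",
]
print(count_by_class(agents))
```

Keeping the bot tally rather than discarding it supports the last point in the slide, since crawler activity is itself worth monitoring.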
Resource Discovery • How do users get to your site? • Track performance of the Web site relative to major search engines • SEO – Search engine optimization • Few users begin with library Web sites
Troubling statistic • Question: “Where do you typically begin your search for information on a particular topic?” • College student responses: • 89% Search engines (Google 62%) • 2% Library Web Site (total respondents -> 1%) • 2% Online Database • 1% E-mail • 1% Online News • 1% Online bookstores • 0% Instant Messaging / Online Chat • Source: OCLC. Perceptions of Libraries and Information Resources (2005) p. 1-17.
Library Discovery Model • [Diagram: Web; Library Web Site / Catalog; Library as search destination]
TV News OpenWeb project • Dramatic increase in Web site activity and loan requests through systematic and controlled exposure of metadata to Google and other search engines • SEO (Search Engine Optimization) strategy • Helped the Archive become financially self-sufficient.
Selected utilities • Analog – free, open source • NetTracker – enterprise level Web analysis application • Google utilities • Sitemap – process for submitting Web pages for optimized indexing by Google with some assessment capabilities • Analytics – Sophisticated approach for measuring Web site performance
Analog • Free Open Source application • Basic Web statistics application • Includes fairly full set of static metrics • Command line utility – generates Web report • Windows, Unix, Linux, etc.
NetTracker • Unica Corporation • Enterprise level Web analytics • http://www.sane.com/
Google SiteMaps • XML specification for systematically submitting URLs that represent a Web site • Makes indexing more efficient but does not affect PageRank • SiteMap interface provides utilities for monitoring how the site has been indexed with some analytical information on terms used to find your Web site.
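A sitemap file is a short XML document listing the URLs to be indexed. The sketch below generates one with Python's standard library. It assumes the namespace of the sitemaps.org 0.9 protocol, which may differ from the schema Google accepted at the time of this talk, and the URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace of the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML for an iterable of (loc, lastmod) pairs.

    lastmod may be None, in which case the element is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration only.
pages = [
    ("http://library.example.org/", "2006-06-20"),
    ("http://library.example.org/catalog", None),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

For a dynamically driven site, the list of pages would typically come from the same database that drives the pages themselves.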
Google Analytics • Available at no cost from Google • Must receive invitation code • Slanted toward e-commerce • “Conversion University” – training on how to optimize Web site for high conversion rates. • Allows Webmasters to establish site goals and measure performance
Application-level reporting and analysis • Content management systems and other dynamically driven Web environments can provide additional usage information. • Can offer additional information beyond raw Web logs • More capabilities for identifying use based on user categories • Reporting can be built into the business logic of the application
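Building reporting into the application can be as simple as recording a structured event wherever the business logic handles a user action. The following sketch is an illustration under stated assumptions, not a description of any particular system: the class names, the event fields, and the category and institution values are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One user action recorded by the application (hypothetical fields)."""
    user_category: str
    institution: str
    action: str


class UsageLog:
    """In-memory event store; a real application would persist events."""

    def __init__(self):
        self.events = []

    def record(self, user_category, institution, action):
        self.events.append(UsageEvent(user_category, institution, action))

    def report(self, field):
        """Count events grouped by one UsageEvent field name."""
        return Counter(getattr(event, field) for event in self.events)


log = UsageLog()
log.record("faculty", "Example University", "search")
log.record("student", "Example University", "search")
log.record("student", "Example College", "loan_request")

print(log.report("user_category"))
print(log.report("action"))
```

Because the application already knows who is signed in, each event can carry a user category that a raw Web server log cannot supply.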
Examples from the TV News Web Site • Reports of use by user category and institution • Statistics on resource use • Data on search types, query terms, etc. • Ability to track all aspects of business activity
Other sources of Use data • ILS OPAC Logs • Proxy Server logs and reports • Link resolver logs and reports
Limitations • Can’t know the intent of the user • User success can only be estimated • Difficult to obtain trends by user type • More aggressive reporting might intrude on privacy • Few libraries require the level of user authentication needed to determine use by type of patron
Additional Information • Breeding, Marshall. Strategies for Measuring and Implementing E-use. ALA TechSource. May-June 2002. 79 pages. • Breeding, Marshall. “Analyzing Web server logs to improve a site’s usage.” Computers in Libraries. Information Today. Medford, CT. October 2005.
Handout • Presentation will be available after the conference at: http://staffweb.library.vanderbilt.edu/breeding/presentations/ala2006.ppt