
Data, Data Everywhere: Making Sense of the Sea of User Data

The MaxData Project aims to maximize library investments in digital collections through better data gathering and analysis. The study compares different data collection methods and develops models to help libraries make informed decisions. The project teams conduct surveys, analyze library data reports, and perform deep log analysis of raw usage data. Surveys cover demographics, recalled reading, and critical incidents of reading, yielding insight into reading behavior that logs cannot capture. Local log data on database use supports subscription management and service decisions, while vendor-supplied usage reports and other sources supplement it for a more complete picture. Challenges and solutions related to vendor reports, link resolvers, metasearch engines, and proxy servers are explored. A deep log analysis of OhioLINK journal usage illustrates the kinds of usage patterns and behaviors such data can uncover. Overall, the project aims to give libraries data-driven strategies for improving user services and allocating resources.


Presentation Transcript


  1. Data, Data Everywhere: Making Sense of the Sea of User Data

  2. MaxData Project Carol Tenopir and Donald W. King Gayle Baker, UT Libraries Eleanor Read, UT Libraries Maribeth Manoff, UT Libraries David Nicholas, Ciber, University College London http://web.utk.edu/~tenopir/maxdata/index.htm

  3. MaxData “Maximizing Library Investments in Digital Collections Through Better Data Gathering and Analysis” Funded by Institute of Museum and Library Services (IMLS) 2005-2007

  4. Study Objectives • To compare different methods of data collection • To develop a model that compares costs and benefits to the library of collecting and analyzing data from various methods • To help libraries make the best use of data

  5. Study Teams • Surveys (UT and Ohio Libraries) • Library Data Reports (Vendor-provided and library collected) (UT Libraries) • Deep Log Analysis of raw journal usage data (Ciber and OhioLINK)

  6. A bit more about the surveys…

  7. Surveys

  8. Three Types of Questions • Demographic • Recollection • Critical (last) incident of reading

  9. Critical Incident Added to General Survey Questions • Specific (last incident of reading) • Includes all reading: e & print, library & personal • Detailed questions about last article read, e.g., purpose, value, time spent, format, how located, source • Last reading = random sample of readings • Allows detailed analysis

  10. What Surveys Answer that Logs Do Not • Non-library readings • Print as well as electronic readings • Purpose and value of readings • Outcomes of readings

  11. Surveys provide much useful data, but… • Surveys rely on memory and truthfulness • Response rates are falling • Surveys cost your users’ time • Surveys can only be done occasionally • Log reports and raw logs show usage

  12. Local Sources of Use Data • Local log data for databases • Vendor-supplied usage reports • Other sources of data

  13. Local Log Data: Database Use • Environment: Mixture of web-based and locally-loaded resources • Problem: Use data from vendors not available or not uniform • Solution: Log requests for databases from library’s database menu (1999- )

  14. Local Log Data: Process • MySQL and Perl CGI scripts • Log files compiled monthly • Process data with Excel and SAS • Extract, reformat, summarize, graph
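
  The slide names the project's actual tools (Perl CGI scripts and MySQL for logging, Excel and SAS for processing). Purely as an illustrative sketch of the extract-and-summarize step, not the UT Libraries scripts, a Python equivalent over a monthly CSV export might look like this (the file name and 'database' column are assumptions):

```python
import csv
from collections import Counter

def summarize_monthly_log(path):
    """Count database-menu requests per database for one month's log export.

    Assumes a CSV export with a 'database' column naming the resource
    requested; the field name is illustrative, not the project's schema.
    """
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["database"]] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical file name for one month of logged menu requests
    for db, n in summarize_monthly_log("2005-10_requests.csv").most_common(10):
        print(f"{db}\t{n}")
```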

  15. Uses of Local Log Data • Subscription management • Number of simultaneous users • Pattern of use of a database over time • Continuation decisions • Cost per request • Services management • Use patterns by day, week or semester • Location of users (campus, off-campus, wireless)
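
  One of these uses, the cost-per-request figure that feeds continuation decisions, is simply the annual subscription cost divided by the logged requests for the year. A minimal sketch with hypothetical figures:

```python
def cost_per_request(annual_cost, monthly_requests):
    """Annual subscription cost divided by total logged requests for the year."""
    total = sum(monthly_requests)
    return annual_cost / total if total else float("inf")

# Hypothetical figures: a $12,000 subscription and twelve months of logged requests
monthly = [350, 420, 510, 380, 290, 180, 150, 200, 460, 520, 480, 410]
print(f"${cost_per_request(12000, monthly):.2f} per request")
```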

  16. Local Log Data: Issues • Logs requests for access, not sessions • No detail on activity once in database • Undercounts: • Aggregators and full-text collections • Bookmarked access • Metasearch • Other sources of usage data supplement log data

  17. Vendor-Supplied Usage Reports • Little post-processing of vendor data until 2002 • Made available upon request • Special attention to “big ticket” items • Full-text • Integrate subscription info with vendor data

  18. Vendor-Supplied Usage Reports: Additional Processing • ARL Supplemental Statistics • Use data for electronic resources requested: • Number of logins (sessions) • Number of queries (searches) • Number of items requested • Fiscal year: July ‘04 – June ‘05
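
  A sketch of the fiscal-year roll-up of the three requested measures (logins, queries, items requested). The record layout and numbers here are invented for illustration and are not the ARL reporting format:

```python
from datetime import date

# One record per vendor per month; field names and figures are illustrative only.
reports = [
    {"vendor": "Vendor A", "month": date(2004, 7, 1), "sessions": 1200, "searches": 3400, "items": 900},
    {"vendor": "Vendor A", "month": date(2005, 6, 1), "sessions": 1100, "searches": 3100, "items": 870},
    {"vendor": "Vendor B", "month": date(2005, 1, 1), "sessions": 640, "searches": 1500, "items": 410},
]

FY_START, FY_END = date(2004, 7, 1), date(2005, 6, 30)   # fiscal year July '04 - June '05

totals = {}
for r in reports:
    if FY_START <= r["month"] <= FY_END:
        t = totals.setdefault(r["vendor"], {"sessions": 0, "searches": 0, "items": 0})
        for key in t:
            t[key] += r[key]

for vendor, t in sorted(totals.items()):
    print(vendor, t)
```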

  19. Vendor Reports to Review • University of Tennessee • Reports from 28 of 45 vendors listed as compliant with Release 1 of the COUNTER Code of Practice • Reports from 26 other vendors

  20. The Challenge of Vendor-Supplied Use Reports • Request mode • Delivery • Format • Time period • Subscribed / titles used / all titles

  21. Other Sources –Link Resolvers (e.g. SFX) • Past the database level to access of individual journals • Use is measured the same way across packages • Where vendor reports are unavailable or incomplete (Open Access, backfiles) • The more places SFX links are used (catalog, e-j list), the more complete the data

  22. Other Sources –MetaSearch Engines (e.g. MetaLib) • “Number of searches” data that may not be counted in vendor reports (Z39.50) • Most useful and interesting to see how patrons are using federated searching

  23. Other Sources –Proxy Servers (e.g. EZProxy) • Standard web log format captures data for every request to the server – this generates large logs that have to be analyzed • Some libraries send all users (not only remote users) through the proxy server for more complete log data
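
  Because the proxy writes the standard (NCSA-style) web log format, each request line can be pulled apart with a small script before the large logs are summarized. A minimal sketch, with a placeholder log file name:

```python
import re
from collections import Counter

# NCSA common/combined log format -- the "standard web log format" the slide mentions.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_proxy_log(path):
    """Yield one dict per request line; lines that don't match are skipped."""
    with open(path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if m:
                yield m.groupdict()

# Example: rough per-IP request counts ("ezproxy.log" is a placeholder filename)
by_ip = Counter(rec["ip"] for rec in parse_proxy_log("ezproxy.log"))
print(by_ip.most_common(5))
```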

  24. OhioLINK deep log analysis (DLA) showcase • Choice of OhioLINK: oldest big deal, common publisher platform and source of interesting data • Two purposes: 1) to show what kinds of data DLA can generate; 2) to raise the questions that need to be asked • Raw server logs of off-campus use June to December ’04 (long enough to pick up returnees) and on-campus use for October. Logs uniquely contained search and navigational behaviour, too

  25. Metrics • Four ‘use’ metrics employed – number of items/pages viewed, number of sessions conducted, number of items viewed in a session (site penetration) and amount of time spent online. • An ‘item’ might be: a list of journals – (subject or alphabetic), a list of journal issues, a contents page, an abstract or full-text article. • Search or navigational approach used (search engine, subject list of journals etc) • Users: returnees; by subject of journal and sub-net; name and type of institution.
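
  A sketch of how such metrics can be derived once raw log lines are grouped into sessions. The 30-minute idle timeout, the record layout and the demo data are assumptions for illustration, not necessarily the CIBER method:

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)   # assumed cut-off, not necessarily the project's rule

def sessionize(events):
    """Group (ip, timestamp, item) records, sorted by timestamp, into sessions per IP.

    A gap longer than IDLE_TIMEOUT between two requests from the same IP
    starts a new session for that IP.
    """
    sessions, last_seen = {}, {}
    for ip, ts, item in events:
        if ip not in sessions or ts - last_seen[ip] > IDLE_TIMEOUT:
            sessions.setdefault(ip, []).append([])
        sessions[ip][-1].append((ts, item))
        last_seen[ip] = ts
    return sessions

def use_metrics(sessions):
    """Items viewed, sessions, items per session (site penetration) and time online."""
    flat = [s for per_ip in sessions.values() for s in per_ip]
    items = sum(len(s) for s in flat)
    online = sum((s[-1][0] - s[0][0] for s in flat), timedelta())
    return {"items_viewed": items,
            "sessions": len(flat),
            "items_per_session": items / len(flat),
            "time_online": online}

if __name__ == "__main__":
    t0 = datetime(2004, 10, 1, 9, 0)
    demo = [("10.0.0.1", t0, "contents page"),
            ("10.0.0.1", t0 + timedelta(minutes=2), "abstract"),
            ("10.0.0.1", t0 + timedelta(hours=3), "full-text article")]
    print(use_metrics(sessionize(demo)))
```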

  26. Is the resource being used? • Items viewed. 1,215,000 items viewed on-campus (1 month) and 1,894,000 items viewed off campus (7 months). • Titles used. • Journals available October 2004 = 5,872 • 5,868 journals used if content lists, abstracts & articles included; 5,193 if only articles included. • 5% of journals accounted for 38% of usage; 10% for 53%, and 50% for 93%.
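
  The concentration figures above (5% of journals accounting for 38% of use, and so on) come from ranking journals by use and taking cumulative shares. A minimal sketch of that calculation with hypothetical counts:

```python
def usage_concentration(views_per_journal, fractions=(0.05, 0.10, 0.50)):
    """Share of all item views accounted for by the top X% of journals.

    views_per_journal: mapping of journal title -> item views (hypothetical data).
    """
    counts = sorted(views_per_journal.values(), reverse=True)
    total = sum(counts)
    shares = {}
    for frac in fractions:
        top_n = max(1, int(len(counts) * frac))
        shares[frac] = sum(counts[:top_n]) / total
    return shares

# Hypothetical titles and counts, just to show the shape of the result
print(usage_concentration({"Journal A": 900, "Journal B": 60, "Journal C": 25,
                           "Journal D": 10, "Journal E": 5}))
```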

  27. Is the resource being used? • Number of journals viewed in a session. • Very pertinent: OhioLINK is all about massive choice • A third of sessions saw no views of any item associated with a particular journal • Of the two-thirds of sessions recording a journal item view, half viewed item(s) from 1 journal, 30% from 2 to 3 journals, 14% from 4 to 9 journals and 7% from 10+ • 49% of sessions saw a full-text article viewed, and the average number of articles viewed in a session was just over 2.

  28. Is the resource being used? • Site penetration • 23% viewed 1 item in a session, 40% viewed 2 to 4 items, 21% viewed 5 to 10 items, 9% viewed 11 to 20 and 7% viewed 21+. • Figures quite impressive when compared to other digital libraries: in the case of EmeraldInsight, 42% of users viewed just one item. Perhaps due to the greater level of download freedom offered by OhioLINK?

  29. Is the resource being used? • Returnees (off-campus) • 73% accessed OhioLINK journals once during the seven months (might have also used OhioLINK on campus). 22% came back between 2 to 5 times, 3% between 6 to 15 times and 2% more than 15 times. • Data compromised by floating IP addresses and multi-user machines

  30. What can we learn about the methods used to find articles? • Search engine popularity. • 41% of sessions used the search engine only, and a further 23% of sessions used the engine together with either the alphabetic or subject lists. • Users of engines were more likely to look at a wider range of: • Journals. 66% of those using the search engine viewed 2 or more journals, compared to 43% of those using either the alphabetic or subject lists. People using all three methods were most likely to view 10 or more different journals; nearly 1 in 5 did so.

  31. What can we learn about the methods used to find articles? • Users of engines more likely to look at wider range of: • Subjects. Those utilising the engine were more likely to have viewed two or more subjects - 54% had done so compared to 41% of those whose sessions saw use of an alpha or subject list. • Older material. Search engine users viewed older material, while those accessing the service via the alphabetical or subject lists were more likely to view very current material.

  32. Issues • This is only pilot data • Caching means not all transactions are recorded in the logs • Studying usage patterns of a given IP address, not a given user, with the consequent problems that arise from multi-user machines, proxy servers and floating IP addresses • There are problems with calculating session time • However: 1) we use a number of metrics; 2) findings will be corroborated by survey techniques; 3) we have three years to perfect our techniques!

  33. References • Nicholas, D., Huntington, P., Russell, B., Watkinson, A., Jamali, H. R., and Tenopir, C. The big deal: ten years on. Learned Information, 18(4), October 2005, pp?? • Nicholas, D., Huntington, P., Jamali, H. R., and Tenopir, C. Journal of Documentation, 62(2), 2006, pp?? • Nicholas, D., Huntington, P., Jamali, H. R., and Tenopir, C. Finding information in (very large) digital libraries: a deep log approach to determining differences in use according to method of access. Journal of Academic Librarianship, March 2006, pp??
