Usage Data Analysis Using Python Lei Jin Electronic Resources Librarian Josephine Choi Library Technician – Acquisitions, Electronic Resources and Serials
Project Goals • Implement a reporting system using EZproxy data • Cover all electronic subscriptions • Integrate demographic details into usage data • Explore patterns for evidence-based decision making
Project Milestones • Project planning started, Spring 2013 • First prototype in-house analyzer created, Spring 2014 • First reports generated, Jan 2015 • Began testing open-source proxy analyzer • EZproxy logs and registration data processed, Aug 2016 • Second prototype in-house analyzer launched, Oct 2016 • On-campus EZproxy implemented, Jan 2017 • EZproxy logs and registration data processed, Aug 2018 • Reports generated using third prototype in-house analyzer, Jan 2019
Scope of the project • Who: 1st-year, 2nd-year, 3rd-year, 4th-year, and graduate students • When: September 2017 - August 2018 • What: EZproxy usage data, both on and off campus, plus student profiles from the Registrar's Office • Why: dissect usage by student profile, by database, by program, by faculty
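The "What" and "Why" items above amount to joining usage records to student profiles and then counting by a profile attribute. A minimal sketch of that idea follows; the field names (`user`, `database`, `program`, `year`) and all sample values are made up for illustration and are not taken from the project's data.

```python
from collections import Counter

# Hypothetical, made-up records: field names and values are assumptions.
usage = [
    {"user": "u1", "database": "db_a"},
    {"user": "u2", "database": "db_a"},
    {"user": "u1", "database": "db_b"},
    {"user": "u3", "database": "db_b"},
]
profiles = {
    "u1": {"year": "1st year", "program": "Program X"},
    "u2": {"year": "graduate", "program": "Program Y"},
}


def usage_by(usage_rows, profile_lookup, field):
    """Count usage rows per value of one profile field (e.g. program)."""
    counts = Counter()
    for row in usage_rows:
        profile = profile_lookup.get(row["user"])
        # Rows without a matching profile are kept, not silently dropped.
        key = profile[field] if profile else "unmatched"
        counts[key] += 1
    return counts


by_program = usage_by(usage, profiles, "program")
by_year = usage_by(usage, profiles, "year")
```

Keeping an explicit "unmatched" bucket makes it visible how much usage could not be linked to a profile.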
Process (diagram) • Simplify • Anonymize • EDA (activities per second) • Combine • EDA (session) • Extract • Future development: faculty profile and other by-products
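The slides do not say how the "Anonymize" step was carried out. One common approach, shown here only as an assumption, is to replace each username with a keyed hash so the same user always maps to the same token without the name being stored. The function name `anonymize` and the key value are hypothetical.

```python
import hashlib
import hmac


def anonymize(username, secret_key):
    """Return a stable pseudonym for a username using HMAC-SHA256.

    This is an illustrative technique, not the project's documented method.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]


# Hypothetical key: in practice it would be random and kept private.
SECRET_KEY = b"replace-with-a-private-random-key"

token = anonymize("jochoi", SECRET_KEY)
```

Because the same key must be used on both the logs and the registrar data for the tokens to match, the key has to be applied before the two sources are combined.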
Raw Data Here is an example of what the log looks like:
192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:00 -0500] "GET http://ezproxy.lib.ryerson.ca:80/connect?session=sQsmG5smxpT1iPiU&url=http://code.paperless.com HTTP/1.1" 302 0
192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:01 -0500] "GET http://code.paperless.com:80/ HTTP/1.1" 200 2336
192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:02 -0500] "GET http://code.paperless.com:80/default.taf?_function=main HTTP/1.1" 200 1189
Size: • Off-campus: 3.67 GB zipped • On-campus: 1.07 GB zipped • Equal to 125 GB from 2016-18 before being converted to .csv
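A log line in the layout shown above can be split into named fields with a regular expression. The sketch below infers the field order (IP, session ID, a dash, username, timestamp, request, status, bytes) from the sample lines only; a site with a different EZproxy log format would need a different pattern. The names `LOG_PATTERN` and `parse_line` are illustrative, not from the project's code.

```python
import re
from datetime import datetime

# Field layout inferred from the sample lines; an assumption, not a spec.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<session>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = (
    '192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:01 -0500] '
    '"GET http://code.paperless.com:80/ HTTP/1.1" 200 2336'
)
record = parse_line(sample)
```

Returning `None` for non-matching lines lets a caller count and inspect malformed entries instead of failing partway through a multi-gigabyte file.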
Tool • We use Python to handle most of the data processing • Code is stored in Jupyter notebooks and can be shared with those who are interested • Tableau is used as the visualization tool
Acknowledgement • We use code by Petrina Collingwood for the first step of the process ('Simplify') (https://github.com/prcollingwood/ezproxy) • GitHub is a good starting point for those who are interested in developing their own data project • Some of the code for this project can be found here: https://github.com/josiechoi/ezproxy-student
Findings Access trend based on user group (students vs. staff/faculty) • The plot (right) shows the annual trend (per second) from Sept 2017 to Aug 2018, without a moving average applied. The data is subset by user group (student vs. staff/faculty) • The usage shown here may be affected by cases of massive downloads • Students and staff/faculty may follow different patterns • Student usage dominates the overall trend of our usage stats
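One possible reading of the "per second" measure is the number of log entries that share the same timestamp second, counted separately for each user group. The sketch below illustrates that reading with made-up events; the function name and the sample data are assumptions.

```python
from collections import Counter
from datetime import datetime

# Made-up events for illustration: (timestamp, user_group).
events = [
    (datetime(2017, 9, 5, 10, 0, 0), "student"),
    (datetime(2017, 9, 5, 10, 0, 0), "student"),
    (datetime(2017, 9, 5, 10, 0, 0), "staff/faculty"),
    (datetime(2017, 9, 5, 10, 0, 1), "student"),
]


def activity_per_second(rows):
    """Count events per (user_group, second)."""
    counts = Counter()
    for timestamp, group in rows:
        # Drop sub-second precision so events in the same second share a key.
        counts[(group, timestamp.replace(microsecond=0))] += 1
    return counts


counts = activity_per_second(events)
```

A single user downloading many files in a burst would inflate these counts, which is the massive-download effect noted in the slide.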
Access trend for on-campus and off-campus • The plot (right) shows the annual trend (per second) of on-campus and off-campus usage. The data has been smoothed to mitigate seasonality • The trend resembles that of full-time vs. part-time students • Both on-campus and off-campus usage show two peaks (a smaller one, followed by a bigger one)
Access trend for Full-Time and Part-Time students • This plot (right) shows the annual trend (per second) from Sept 2017 to Aug 2018. The data is subset by full-time/part-time status • We have smoothed the data (i.e. applied a moving average) to mitigate the impact of seasonality • The trend for full-time students clearly shows two peaks in each semester (a small peak, followed by a big one). The same pattern was not found in the trend for part-time students
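The moving-average smoothing mentioned above can be illustrated with a simple trailing window. The slides do not state the window size or the exact method used, so the 3-point window and the sample counts below are assumptions chosen only to keep the example small.

```python
def moving_average(values, window):
    """Return the trailing moving average of values over a fixed window.

    The output has len(values) - window + 1 items: one per full window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_total = 0.0
    for index, value in enumerate(values):
        running_total += value
        if index >= window:
            # Drop the value that just left the window.
            running_total -= values[index - window]
        if index >= window - 1:
            smoothed.append(running_total / window)
    return smoothed


# Made-up daily counts for illustration.
daily_counts = [10, 20, 30, 40, 50]
smoothed = moving_average(daily_counts, 3)
```

A wider window suppresses more short-term variation, such as weekday/weekend swings, at the cost of flattening short peaks.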
Access trend for graduate and undergraduate students • The plot (right) shows the annual trend (by session) for graduate and undergraduate students • The two-peak pattern can be found among undergraduate students; however, it is not as prominent among graduate students
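Counting by session rather than by log entry means each session ID is counted once, however many requests it produced. A minimal sketch, assuming each record carries a session ID and a student level (the sample values are made up):

```python
from collections import defaultdict

# Made-up records for illustration: (session_id, student_level).
rows = [
    ("s1", "undergraduate"),
    ("s1", "undergraduate"),
    ("s2", "undergraduate"),
    ("s3", "graduate"),
]


def sessions_per_group(records):
    """Count distinct session IDs for each group."""
    seen = defaultdict(set)
    for session_id, group in records:
        seen[group].add(session_id)
    return {group: len(ids) for group, ids in seen.items()}


session_counts = sessions_per_group(rows)
```

Session counts are less sensitive to bulk downloading than per-second counts, since many requests within one session still count once.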
Live Demo Ryerson Intranet Tableau
Next steps • Map EZproxy data to our ERM records • Combine EZproxy data with ERM and acquisition data • Include faculty research profiles • Share reports with collection liaison leads • Incorporate reports into the ER evaluation workflow