Statistics in practice: Measuring and managing
Sampling in-library use
Sebastian Mundt, sebastian.mundt@unibw-hamburg.de
Framework • Selection • Examples • Conclusions
Sampling in library statistics
In libraries, sampling has traditionally been used ...
- for catalogue evaluation
- in user surveys
- in performance measurement (e.g. correct shelving).
“Official” library statistics so far only allowed the full count:
“Data referring to a period should cover the specified period in question, not the interval between two successive surveys.” (ISO 2789:1991)
Sampling in library statistics
Consequence: Important activities of use have previously not been reported in most countries.

Category                # datasets
Libraries               35
Collections             41
Library use (lending)   19
Library use (other)      4
Expenditure              7
Library staff            4
(ISO 2789:1991)

The full count of some measures would be ...
- too time consuming (costly)
- practically impossible
- too monotonous.
The revised International Standard ISO 2789:2001 now recommends sampling methods for ...
- information requests
- in-house use
- visits (gate count).
Sampling
Sampling is selecting a subset of the population in question. A sample can be drawn randomly or not. The “accuracy” of random samples can be measured in terms of error and confidence level. It depends on the sample size and the variance of the sample.
Sampling can be “selective” as regards ...
- time (reporting period)
- location (branch, service point)
- objects (media)
- persons (satisfaction, user behaviour)
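To illustrate how error and confidence level depend on sample size and variance, here is a minimal sketch. It assumes a simple random sample drawn without replacement and a normal approximation; the function name, the finite population correction, and the hourly counts are illustrative assumptions, not taken from the presentation.

```python
import math
from statistics import NormalDist, mean, stdev

def relative_error(sample, population_size, confidence=0.90):
    """Relative half-width of the confidence interval for the sample mean."""
    n = len(sample)
    # Two-sided z value for the chosen confidence level (normal approximation)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Finite population correction for sampling without replacement
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    std_error = stdev(sample) / math.sqrt(n) * fpc
    return z * std_error / mean(sample)

# Hypothetical counts of information requests in 8 sampled hours
counts = [12, 9, 15, 11, 8, 14, 10, 13]
print(f"{relative_error(counts, population_size=4103):.1%}")
```

A larger sample or a smaller variance narrows the interval; a higher confidence level widens it.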
Selection procedure
Purposive (judgement) sampling
- highly dependent on staff experience
- requires minimal statistical knowledge
ISO/FDIS 2789:2001: “The annual total is to be established from a sample count. The sample should be taken in one or more normal weeks and grossed up.”
NISO Z39.7-2002 Draft Standard for Trial Use, Data Dictionary Version 2002a: “A ‘typical week’ is a time that is neither unusually busy nor unusually slow. Avoid holidays, vacation periods, days when unusual events are taking place in the community or in the library. Choose a week in which the library is open its regular hours.”
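The “grossing up” described in the standard can be sketched as a simple scaling of the mean weekly count. The function name, the counts, and the number of weeks the library is open are hypothetical; neither standard prescribes this exact formula.

```python
def gross_up(sample_week_counts, weeks_open_per_year):
    """Estimate an annual total from counts taken in one or more sample weeks."""
    weekly_mean = sum(sample_week_counts) / len(sample_week_counts)
    return weekly_mean * weeks_open_per_year

# Hypothetical: two sample weeks, library open 50 weeks a year
print(gross_up([1200, 1300], weeks_open_per_year=50))  # 62500.0
```

The estimate is only as good as the choice of weeks, which is the weakness of purposive sampling.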
“Typical week”
Visits per weekday (Münster UL): Mon 100%, Tue 99.5%, Wed 96.8%, Thu 91.8%, Fri 85.1%, Sat 24.8%
Administration: a week-wise count is easier to organize.
Cluster: weeks comprise days of different activity levels.
“Typical week”
[Figure: % deviation of visits from annual mean (Münster UL), with periods of average activity as estimated by reference staff]
“Typical” weeks can hardly be anticipated even from data collected over several years.
“Typical week”
[Figure: Minimum/maximum values (Münster UL) for 1998, 1999 and 2000; series: max (all), max (staff), min (staff), min (all). Plotted maxima range from +12.4% to +22.9%, minima from -11.6% to -23.2%.]
Data collected by purposive (judgement) sampling are a weak foundation for comparisons.
Selection method: case 1
Louisiana State University Libraries (reference statistics)
- Randomly and individually selected hours of the year (simple random sample).
- A sample size of 52 hours (of 4,103 hours of service a year) was calculated given a confidence level of 90% and an error of +/- 11.23%.
- Total estimated by linear extrapolation.
- Hour-wise count is difficult to administer.
Maxstadt, J.M. (1988): A new approach to reference statistics, C&RL (Feb. 1988), p. 85-88
Similar (day-wise): Bauer, K. (2000): Gathering ARL reference data, http://info.med.yale.edu/assessment/methods.html
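A simple random sample of service hours with linear extrapolation might look like the following sketch. Only the figures 52 and 4,103 come from the slide; the function names, the seed, and the tallies are hypothetical, and this is not the procedure as published by Maxstadt.

```python
import random

HOURS_OF_SERVICE = 4103   # hours of service per year (from the slide)
SAMPLE_SIZE = 52          # sampled hours (from the slide)

def draw_sample_hours(seed=None):
    """Pick 52 distinct service hours at random (simple random sample)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(HOURS_OF_SERVICE), SAMPLE_SIZE))

def extrapolate_total(counts_in_sampled_hours):
    """Linear extrapolation: mean count per sampled hour times all hours."""
    mean_per_hour = sum(counts_in_sampled_hours) / len(counts_in_sampled_hours)
    return mean_per_hour * HOURS_OF_SERVICE

hours = draw_sample_hours(seed=1)
# Hypothetical tallies: pretend 10 requests were counted in every sampled hour
print(extrapolate_total([10] * len(hours)))  # 41030.0
```

Each index returned by `draw_sample_hours` would still have to be mapped to a calendar date and hour, which is the administrative burden the slide mentions.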
Selection method: case 2
New York University / Bobst Library (reference statistics)
- Based on reference data of the previous year, weeks were “classified” as high, medium or low usage (stratified random sample).
- A sample size of 15 weeks was calculated given a confidence level of 95% and an error of +/- 400 [10%].
- Linear extrapolation of weighted class means.
- Additional information (past data) is used to improve the sample.
- Separation of high and medium weeks is difficult.
Kesselman, M.; Watstein, S.B.: The measurement of reference and information services, JAL (1987, 1), p. 24-30
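The extrapolation of weighted class means can be sketched as follows: each class mean is multiplied by the number of weeks in that class, and the products are summed. The split of weeks across classes and all counts are hypothetical; only the total of 15 sampled weeks follows the slide.

```python
def stratified_total(strata):
    """Sum over strata of (weeks in stratum) x (mean count of sampled weeks)."""
    total = 0.0
    for weeks_in_stratum, sampled_counts in strata.values():
        class_mean = sum(sampled_counts) / len(sampled_counts)
        total += weeks_in_stratum * class_mean
    return total

# Hypothetical classification of 52 weeks, with 15 sampled weeks in total
strata = {
    "high":   (12, [900, 950, 1000, 950]),
    "medium": (25, [600, 650, 700, 650, 600, 700]),
    "low":    (15, [300, 350, 300, 350, 300]),
}
print(stratified_total(strata))  # 32450.0
```

Because weeks within a class resemble each other, the estimate is usually more precise than a simple random sample of the same size.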
Selection method: case 3
University of South Carolina / Thomas Cooper Library (reference statistics)
- Found extremely high correlation (.957) between reference activity and gate count.
- Extrapolation relative to boundary distribution (gate count).
- Deals with “missing” days.
- Allows a small random sample of a few weeks once high correlation is confirmed.
Lochstet, G.; Lehman, D.H.: A correlation method for collecting reference statistics, C&RL (Jan. 1999), p. 45-53
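One way to read “extrapolation relative to the gate count” is a ratio estimate: first check the correlation on the sampled days, then scale the sampled reference count by the share of annual gate traffic the sample covers. This is a sketch under that assumption, not the method as published by Lochstet and Lehman; all figures and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def estimate_reference_total(sample_reference, sample_gate, annual_gate):
    """Scale the sampled reference count by the sampled share of gate traffic."""
    return sum(sample_reference) / sum(sample_gate) * annual_gate

# Hypothetical daily figures for one sample week (Mon to Sat)
gate = [2000, 1900, 1800, 1700, 1500, 500]
reference = [100, 95, 90, 85, 75, 25]
print(round(pearson_r(gate, reference), 3))
print(round(estimate_reference_total(reference, gate, annual_gate=400_000)))
```

Since the gate counter runs all year, days without a manual reference tally are still covered by the annual gate total.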
Selection method: case 4
Münster University Library
Which data from the library system can be used as boundary distribution?

                 visits   reference  reserv.   reserv.   account   renewals  short
                                     inside    remote    info                loans
visits           1.000
reference        .876**   1.000
reserv. inside   .802**   .751**     1.000
reserv. remote   .437**   .347       .269**    1.000
account info     .800**   .765**     .796**    .220**    1.000
renewals         .523**   .512*      .568**    .156**    .759**    1.000
short loans      .473**   .383       .558**    .117      .312**    .140*     1.000
normal loans     .506**   .057       .656**    -.019     .508**    .283**    .483**

In branch libraries the same datasets are collected. These can be used to extrapolate the sample count for visits and information requests.
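To see which library-system datasets are the strongest candidates, the correlations with visits and with reference activity can be ranked. The coefficients below are copied from the Münster matrix; the ranking function and the choice to rank by absolute value are illustrative assumptions.

```python
# Correlations with the two sampled measures, copied from the Münster matrix
corr_with_visits = {
    "reserv. inside": 0.802, "reserv. remote": 0.437, "account info": 0.800,
    "renewals": 0.523, "short loans": 0.473, "normal loans": 0.506,
}
corr_with_reference = {
    "reserv. inside": 0.751, "reserv. remote": 0.347, "account info": 0.765,
    "renewals": 0.512, "short loans": 0.383, "normal loans": 0.057,
}

def rank_candidates(correlations):
    """Sort library-system datasets by strength of correlation, strongest first."""
    return sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

print(rank_candidates(corr_with_visits)[:2])     # ['reserv. inside', 'account info']
print(rank_candidates(corr_with_reference)[:2])  # ['account info', 'reserv. inside']
```

In-library reservations and account information lead both rankings, while remote reservations and normal loans correlate only weakly with reference activity.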
Sampling locations
Does reference activity in different branches correlate significantly?

Münster University Library
           main     reading  Branch A
main       1.000
reading    .437*    1.000
Branch A   .593*    .559*    1.000

University of the FAF / University Library, Hamburg: 4 branch libraries (3 interconnected) with separate service points and entrances in one building. Over the first half of 2002 no relationship between branches was found:
           Branch 1  Branch 2  Branch 3  Branch 4
Branch 1   1.000
Branch 2   -.064     1.000
Branch 3   .022      -.041     1.000
Branch 4   .122      .058      .031      1.000
Conclusions
- From the point of view of data collection management, it seems useful to choose a week as the sampling unit.
- “Normal” weeks can hardly be anticipated even from data collected over several years.
- It is, however, likely that certain usage data show significant correlation and provide useful information for estimating totals.
- If data from automated systems are used for correlation, the workload of sampling can be reduced.
- In-library use activities correlate with in-library use of automated systems. Significant remote use should be correlated separately (e.g. frequent e-mail reference).
- Sampling locations might reduce the workload of data collection further. Results, however, are mixed.