Interoperable Repository Statistics Les Carr & Tim Brody University of Southampton http://irs.eprints.org/
Introduction • Background to & history of our interest in statistics • The purpose of the IRS project • Progress so far • Request for participation
Topics of Discussion • Statistics - how to get them • Statistics - what services we can build with them • Statistics - what current users want from them now • Statistics - new measures of impact • and significance? and quality? • Statistics - validation for the academic community
Some History • 1999 NSF / JISC International Digital Library Project “OpCit” • provided an external citation linking service for LANL (as it then was) • ECS, Cornell, Los Alamos • Supporting study on the use of citations • Analysed download patterns from the UK mirror with respect to citation patterns • Were downloads influenced by citations? • Could citation impact be predicted by downloads?
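The second question can be made concrete as a rank correlation between per-paper downloads and citations. A minimal sketch in Python, assuming the counts have already been extracted; the identifiers and numbers below are illustrative placeholders, not OpCit data:

```python
# Sketch: rank correlation between per-paper downloads and citations.
# All identifiers and counts are illustrative, not real OpCit data.
from scipy.stats import spearmanr

papers = {
    # paper id: (downloads, citations)
    "hep-th/0001001": (120, 14),
    "hep-th/0001002": (45, 3),
    "hep-th/0001003": (300, 41),
    "hep-th/0001004": (12, 0),
}

downloads = [d for d, _ in papers.values()]
citations = [c for _, c in papers.values()]

rho, p_value = spearmanr(downloads, citations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A positive rho on real data would suggest downloads carry some predictive signal for later citation impact, which is the hypothesis the supporting study examined.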
OpCit Outcome • Citebase ‘OAI’ service • Used OAI to obtain article metadata • Used the local UK mirror’s file system to obtain • article text (to extract citations) • article download stats from web logs • There was no standard way to obtain the necessary data. • By now most repositories accept web spiders. • Still no agreement with arxiv.org to harvest central download data
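The metadata half of that pipeline is standard: OAI-PMH defines the verbs and response format. A minimal harvesting sketch, assuming a conventional OAI-PMH endpoint (the arXiv export URL is shown only as an example; resumptionToken paging and error handling are omitted):

```python
# Sketch: harvest one page of Dublin Core metadata over OAI-PMH.
# The endpoint is an example; any OAI-PMH base URL works the same way.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

base_url = "http://export.arxiv.org/oai2"
resp = requests.get(base_url, params={"verb": "ListRecords",
                                      "metadataPrefix": "oai_dc"})
root = ET.fromstring(resp.content)

for record in root.iter(OAI + "record"):
    header = record.find(OAI + "header")
    identifier = header.findtext(OAI + "identifier")
    title = record.findtext(f".//{DC}title")
    print(identifier, "-", title)
```

The download half had no such protocol, which is exactly the gap the IRS project set out to fill.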
Stakeholders • Authors • Encouragement to research • Encouragement to archive • Researchers • New discovery & navigation methods • New filtering mechanisms
Stakeholders (II) • Repository Administrators • Management & maintenance decisions • Marketing, feedback to research managers • Fund holders • Assess impact, inform future funding decisions
Internal Deposit Statistics • Who is depositing what, and how frequently • Managers can collect information to give feedback on effectiveness • archives.eprints.org provides simple growth data • OpenDOAR could collect information on best practice • Evaluate policy outcomes • Are mandates necessary?
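A deposit-frequency report of this kind is a simple tally once the repository can export depositor and date fields. A sketch, assuming a hypothetical export format; the field layout below is not taken from any particular repository:

```python
# Sketch: deposit activity per depositor per month.
# The record layout is hypothetical; a real repository would export
# equivalent fields (depositor id, deposit date) from its database.
from collections import Counter
from datetime import date

deposits = [
    ("lac", date(2006, 1, 12)),   # (depositor id, deposit date)
    ("tdb", date(2006, 1, 30)),
    ("lac", date(2006, 2, 3)),
]

per_month = Counter((who, when.strftime("%Y-%m")) for who, when in deposits)
for (who, month), n in sorted(per_month.items()):
    print(f"{month}  {who}: {n} deposit(s)")
```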
Management Stats (the Easy Way) • Google Analytics • Repository overview • Individual document downloads over time • JavaScript invokes an external service to gather the stats
Project Aims • investigate the requirements of UK and international stakeholders • design an API for gathering download data • build software • distribution and collection software for repositories • generic analysis and reporting tools
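The API itself was still to be designed at this stage. Purely as a hypothetical illustration, a download event reported through such an API might need to carry at least the following fields; none of these names are defined by the IRS project:

```python
# Purely hypothetical: one download event as a repository might report it.
# These field names are NOT an IRS specification; they only illustrate
# the kind of data a gathering API would have to carry.
download_event = {
    "document": "oai:eprints.soton.ac.uk:12345",  # shared OAI identifier
    "timestamp": "2006-03-01T14:22:05Z",
    "client_ip_hash": "a9f0e61a",      # anonymised, for de-duplication
    "item_type": "fulltext",           # vs. "abstract"
    "filtered": {"robots": True, "self_downloads": False},
}
```

The two scenarios below show why the shared identifier and the filtering flags matter.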
Scenario 1 • Forty physicists collaborate on a paper which is deposited into each of their institutional repositories plus arxiv.org • Each repository reports to its author that it has received n downloads. • How can they be aggregated?
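Given a shared identifier, aggregation is trivial; without one it cannot be done safely. A sketch, assuming each repository can report a common key such as a DOI (all values below are illustrative):

```python
# Sketch: aggregating per-repository counts for the same paper.
# Assumes every repository reports a shared identifier (here a DOI);
# without such a key the counts cannot be safely combined.
from collections import defaultdict

reports = [
    # (repository, shared identifier, downloads)
    ("eprints.soton.ac.uk", "10.1000/example.1", 17),
    ("arxiv.org",           "10.1000/example.1", 240),
    ("eprints.utas.edu.au", "10.1000/example.1", 9),
]

totals = defaultdict(int)
for repo, doi, n in reports:
    totals[doi] += n

print(dict(totals))  # {'10.1000/example.1': 266}
```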
Scenario 2 • Two repositories report that a paper has received 50 downloads. • Have they both filtered out spiders in the same way? Self-downloads? Repeated downloads from the same IP? • Are abstract and PDF downloads treated equivalently? • How can they be compared?
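A sketch of the normalisation decisions this scenario raises. The robot list, the double-click window, and the abstract/full-text split are all illustrative choices, not an agreed standard; two repositories that pick different values here will report incomparable counts:

```python
# Sketch: filter robots, collapse repeat hits from one IP within a
# time window, and count abstract vs. full-text views separately.
# The thresholds and robot list are illustrative, not a standard.
from datetime import datetime, timedelta

ROBOT_SUBSTRINGS = ("googlebot", "slurp", "msnbot")   # illustrative
WINDOW = timedelta(seconds=30)                        # double-click window

def count_views(log_entries):
    """log_entries: (timestamp, ip, user_agent, kind) tuples,
    kind in {"abstract", "fulltext"}, sorted by timestamp."""
    counts = {"abstract": 0, "fulltext": 0}
    last_seen = {}                    # (ip, kind) -> last timestamp
    for ts, ip, agent, kind in log_entries:
        if any(bot in agent.lower() for bot in ROBOT_SUBSTRINGS):
            continue                  # robot traffic
        prev = last_seen.get((ip, kind))
        last_seen[(ip, kind)] = ts
        if prev is not None and ts - prev < WINDOW:
            continue                  # repeat request within window
        counts[kind] += 1
    return counts

log = [
    (datetime(2006, 3, 1, 10, 0, 0), "152.78.0.1", "Mozilla/5.0", "abstract"),
    (datetime(2006, 3, 1, 10, 0, 5), "152.78.0.1", "Mozilla/5.0", "abstract"),
    (datetime(2006, 3, 1, 10, 1, 0), "152.78.0.1", "Mozilla/5.0", "fulltext"),
    (datetime(2006, 3, 1, 10, 2, 0), "66.249.0.1", "Googlebot/2.1", "fulltext"),
]
print(count_views(log))   # {'abstract': 1, 'fulltext': 1}
```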
Participants • Project Partners • ECS (Leslie Carr, Stevan Harnad, Tim Brody), U Tasmania (Arthur Sale), COUNTER (David Goodman, Long Island U), Key Perspectives (Alma Swan) • International Panel • Rob Tansley (DSpace), Herbert Van de Sompel (OAI), Alberto Pepe (CERN), Laurent Romary (CNRS), Bill Hubbard (SHERPA), Leo Waaijers (DARE), Sune Karlsson (LogEc), Andrew Bennett (APSR)
Current Progress • Report on stakeholder requirements • 35 people from 18 different institutions were interviewed • Their priorities: • Origin of access (country, domain, institution) • Timing of access (date, time-series, cumulative) • Comments requested • http://irs.eprints.org/report/
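Both priorities are straightforward to compute once download events carry a date and an origin. A sketch, assuming an illustrative IP-to-country lookup; a real service would use a GeoIP database or the requesting domain:

```python
# Sketch of the two reported priorities: origin of access and a
# monthly time series with a cumulative total. The IP-to-country
# table is a stand-in for a real GeoIP lookup.
from collections import Counter
from datetime import date

COUNTRY = {"152.78.0.1": "GB", "128.84.0.1": "US"}  # illustrative lookup

events = [
    (date(2006, 1, 5),  "152.78.0.1"),
    (date(2006, 1, 19), "128.84.0.1"),
    (date(2006, 2, 2),  "152.78.0.1"),
]

by_country = Counter(COUNTRY.get(ip, "??") for _, ip in events)
by_month = Counter(d.strftime("%Y-%m") for d, _ in events)

cumulative = 0
for month, n in sorted(by_month.items()):
    cumulative += n
    print(month, n, "cumulative:", cumulative)
print("origin:", dict(by_country))
```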
Comments Sought • Are the preoccupations of new repository managers correct? • How much effort should be devoted to new bibliometrics? • We need international feedback and support to go there! • Please join the expert panel!
Impact for (e.g.) the UK • Research Assessment Exercise • Performed in 2001, and next in 2008 • The IRRA project defines roles for the repository (collecting research evidence, providing it to panels) • But to simplify: • RAE results correlate with citation impact • citation impact is correlated with downloads • But the academic community is still very wary of ‘web logs’ • yet citation used to be the only auditable use of an article
Message to UK Fundholders • Let many flowers bloom (Harnad) • i.e. freely provide download and citation statistics • So each community can define its own statistical measures of quality, impact and success • Physicists can use journals • Computer Scientists can use conferences • All disciplines can see the impact of their own departments, individuals, projects etc.
Next Steps • Looking for feedback • Looking for agreement • Looking for collaboration • How can we join in?