Journal Usage Factor: a usage-based alternative to Impact Factor. Richard Gedye, UKSG Annual Conference, April 2011, Harrogate
UKSG Usage Factor Project • Project rationale • Issues addressed before data collection and analysis • Collecting and analysing the data • What data we collected • Methodology • Issues and challenges • Results and Recommendations • Next steps
The challenge: ISI's Impact Factor compensates for the fact that larger journals will tend to be cited more than smaller ones • Can we do something similar for usage? • In other words, should we seek to develop a “Usage Factor” as an additional measure of journal quality/value?
For example: $$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$
Usage factor advantages • Especially helpful for journals and fields not covered by ISI • Especially helpful for journals with high undergraduate or practitioner use • Especially helpful for journals publishing relatively few articles • Data available potentially sooner than with Impact Factors
“Authors select journals that will give their articles prestige and reach. Impact Factor is a widely used surrogate for the former, while perceived circulation and readership reflect the latter. But usage is becoming more important as a measure of reach.” (Carol Tenopir)
Key data issues we have addressed • Consistency – numerator/denominator • Defining article usage year • Defining article publication date • Different usage patterns by subject
With the key data issues addressed, we developed a specification for a report through which participating publishers would deliver real usage data for analysis
Real journal usage data analysed by John Cox Associates and Frontline GMS • Participating publishers: American Chemical Society, Emerald, IOP, Nature Publishing, OUP, Sage, Springer
The data • 326 journals: 38 Engineering; 32 Physical Sciences; 119 Social Sciences (including 29 Business and Management); 35 Humanities; 102 Medicine and Life Sciences (including 57 Clinical Medicine) • c.250,000 articles • 3,350 spreadsheets • 1 GB of data
The calculation: $$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$ • ‘x’ is the usage period • ‘y’ is the publication period
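The calculation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's report specification: the record layout (an article id mapped to a publication date, and usage events as `(article id, date)` pairs), the function name `usage_factor`, and the sample data are all hypothetical.

```python
from datetime import date


def usage_factor(articles, usage_events, pub_period, usage_period):
    """Usage in period 'x' of articles published in period 'y',
    divided by the number of articles published in period 'y'."""
    pub_start, pub_end = pub_period
    use_start, use_end = usage_period

    # Denominator: articles published during the publication period 'y'.
    eligible = {
        article_id
        for article_id, published in articles.items()
        if pub_start <= published <= pub_end
    }
    if not eligible:
        raise ValueError("no articles published in the publication period")

    # Numerator: usage events in the usage period 'x' for those same articles,
    # so that numerator and denominator cover a consistent set of items.
    total_usage = sum(
        1
        for article_id, used_on in usage_events
        if article_id in eligible and use_start <= used_on <= use_end
    )
    return total_usage / len(eligible)


# Hypothetical data: three articles and five download events.
articles = {
    "a1": date(2008, 3, 1),
    "a2": date(2008, 9, 15),
    "a3": date(2007, 5, 2),   # published outside period 'y'
}
usage_events = [
    ("a1", date(2008, 4, 1)),
    ("a1", date(2009, 1, 10)),
    ("a2", date(2008, 10, 1)),
    ("a3", date(2008, 6, 1)),  # ignored: article not in period 'y'
    ("a2", date(2010, 2, 1)),  # ignored: outside period 'x'
]

juf = usage_factor(
    articles,
    usage_events,
    pub_period=(date(2008, 1, 1), date(2008, 12, 31)),
    usage_period=(date(2008, 1, 1), date(2009, 12, 31)),
)
print(juf)
```

Filtering the usage events through the same set of eligible articles is one way to keep the numerator and denominator consistent, which is the first of the key data issues listed earlier.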
JUF variables to be tested: subject comparisons • Broad subjects: Physical Sciences, Medicine and Life Sciences, Social Sciences, Humanities, Engineering • Narrow subjects: Business and Management, Clinical Medicine
Issues and challenges • Article metadata resides in multiple databases • Key article metadata needed for JUF not included in usage log records • Need to map and merge different records • Lack of standards for key schemas and practices • Different publisher policies on article version labelling and availability • Multiple schemas for “article type” – typically journal rather than publisher specific • Some difficulties retrieving detailed historical usage records by article, especially when data straddled transfer between systems
Results • Content Type • In social sciences JUFs were higher for non-article content • In medicine and life sciences JUFs were higher for article content • In humanities, physical sciences, and business & management, JUF differences between article and non-article content were not significant
Results • Article Version • In physical sciences the JUF was significantly (sometimes dramatically) lower when calculations were confined to the Version of Record • In all other subjects the JUF was significantly higher when calculations were confined to the Version of Record
Results • JUF and Impact Factor • Little correlation apart from the Nature branded titles • Some titles with no or very low impact factors have very high JUFs
How stable are journal rankings based on impact factor over time?
ISI impact factors: medical and life science journals (n=36)
How stable are journal rankings based on the classic ISI impact factor? The previously mentioned paper by Amin and Mabe shows that impact factors may fluctuate year on year by as much as plus or minus 40 per cent and that this is in part a function of journal size. As a point of comparison for the properties of the journal usage factor, we start this section by looking at changes in rankings among 36 medical titles over three years.
Key findings: Journal rankings based on ISI impact factor are pretty consistent over the period 2006-2008.
Rank order correlations • 2008 vs 2007: Spearman’s rho = 0.973, p < 0.01 • 2007 vs 2006: Spearman’s rho = 0.971, p < 0.01 • 2008 vs 2006: Spearman’s rho = 0.959, p < 0.0
How to read this graphic: We start by putting the journals into ranked order by ISI impact factor for each of the three years 2006-2008. These charts show how these ranked orderings compare across different years. For example, the middle chart in the right hand column compares 2007 and 2008. If there were no changes in journal ranking, all the journals would lie on the diagonal. Journals below the diagonal have fallen down the order.
How stable are journal rankings based on usage factor over time?
UKSG usage factors: medical and life science journals (n=48)
The journal usage factors reported here are based on a single publication year with use being measured in months 1-24.
Key findings: For rankings based on a journal usage factor (one publication year and use in months 1-24), we find reasonable stability, with high and statistically significant correlations.
Rank order correlations • 2008 vs 2007: Spearman’s rho = 0.886, p < 0.01 • 2007 vs 2006: Spearman’s rho = 0.862, p < 0.01 • 2008 vs 2006: Spearman’s rho = 0.755, p < 0.01
The correlations are smaller than for the impact factors in the previous slide, but they are still high. This analysis shows that usage factors are more volatile than impact factors and any journal rankings based on them will show greater churn year on year. But, broadly speaking, they do a similar job.
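For readers unfamiliar with the statistic, the rank order correlations quoted above can be illustrated with a short, dependency-free Python sketch. It computes Spearman's rho as the Pearson correlation of the ranks (tied values share their average rank). The function names and the five usage factor values are hypothetical, and the sketch does not compute the p-values reported on the slides.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation applied to the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical usage factors for the same five journals in two years.
juf_2007 = [12.0, 30.5, 8.2, 45.1, 20.0]
juf_2008 = [14.1, 28.0, 9.0, 50.3, 31.2]

rho = spearman_rho(juf_2007, juf_2008)
print(round(rho, 3))
```

In this made-up example two journals swap adjacent places between the years, which is the kind of churn that pulls rho below 1.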
Recommendations – the metric • The most promising JUF metric for further testing will be based on: • All content types except standing matter (non-article matter is published for a purpose and its usage forms part of the usage of the journal as a whole; item type control is difficult to manage) • All versions published (for simplicity and completeness) • Publication period: 2 years (for a greater “smoothing” effect on occasional unexplained peaks and troughs in usage; to reduce the effect of pre-“Version of Record” publication) • Usage period: 2 years, contemporaneous with the publication period (to capture peak post-publication usage; to keep the metric as current as possible)
Recommendations - infrastructure • Development of systems to automate the extraction and collation of data needed for JUF calculation is essential if calculation of this metric is to become routine • Development of an agreed standard for content item types, to which journal specific item types would be mapped, is desirable as it would allow for greater sophistication in JUF calculation • Development or adoption of a simple subject taxonomy to which journal titles would be assigned by their publishers
Recommendations - infrastructure • Publishers should adopt standard “article version” definitions based on NISO recommendations • But no specific recommendations for the labelling or making available of these versions
Next steps • Progress Report summarising Phase 1 and 2 will be published in Q1 2011 • Meanwhile further analysis of the usage data collected in Phase 2 is being undertaken by CIBER at UCL
Patterns of use across time: monthly use of all items published in 2006
Engineering journals (n=21)
About this analysis: In order to have an informed discussion about the optimal length of the time window to use to record downloads for the usage factor, we need to understand how items are used over time. In this and the following analyses, we take all items published in 2006 and look at their monthly pattern of use over the subsequent three years. Ideally, we need a longer time series, but this is all we have. The trend line, which admittedly does not give an excellent fit to the data, suggests that aggregate usage of 2006 engineering items will trickle to near zero (i.e. become ‘asymptotic’) at around 45 months after publication. The life span of original research articles and review papers is likely to be longer, as the ‘all items’ approach used here will contain much relatively ephemeral material such as editorial material and rapid communications.
How to read this graphic: This chart shows the number of downloads each month with a trend line.
Patterns of use across time: monthly use of all items published in 2006
Humanities journals (n=24)
About this slide: Humanities items follow a generally similar pattern to engineering but with a shorter and more delayed peak. The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 humanities items will trickle to near zero (i.e. become ‘asymptotic’) at around 48 months after publication.
Patterns of use across time: monthly use of all items published in 2006
Physical sciences journals (n=3)
About this slide: The monthly pattern of use for physical sciences items is very different from the other broad subjects in this study. There is a very sharp initial peak followed by continuing and steady interest in items in months 14-36 [caution: we only have three journals]. There is not enough data to justify calculating an end point for physical sciences articles, but the well-fitting trend line suggests it may have been reached at or just after 36 months.
Patterns of use across time: monthly use of all items published in 2006
Social sciences journals (n=115)
About this slide: The pattern in the social sciences is broadly similar to that for humanities items. The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 social sciences items will trickle to near zero (i.e. become ‘asymptotic’) at around 47 months after publication.
Patterns of use across time: monthly use of all items published in 2006
Medical and life sciences journals (n=47)
About this slide: Monthly usage in the medical and life sciences shows an interesting double peak: a very immediate one in the first few months and another from month 12, which may well be due to a delayed open access and/or citation effect. The trend line, which offers a good fit to the data, suggests that aggregate usage of 2006 medical and life science items will trickle to near zero (i.e. become ‘asymptotic’) at around 40 months after publication.
Which time window is ‘best’? Cumulative use of all items published in 2006 by usage time window, compared by broad subject area [chart; axis: % of lifetime use]
Patterns of use across time: early conclusions
We ideally need a longer time series to be sure, but it appears that a good working estimate for the useful lifetime of all items in all versions is about four years. The longevity of original research articles and review papers is likely to be longer than this, and possibly more highly differentiated between subjects, but if all items are used, then this seems a reasonable position to take.
All the subjects show a peak roughly between months 6 and 12 and a broadly similar (and steady) pattern of cumulative item use in years 1-3.
Tentatively, a time window based on months 1-24 would seem to be the most appropriate, capturing information both about the peak and the subsequent steady state growth.
If the estimations of lifetime use are accurate (roughly four years for all items), then it would appear that a 1-24 month window will capture a substantial proportion of lifetime use (all items, all versions), probably of the order of 60 per cent as a global figure. A 1-12 month window will capture around 30 per cent of lifetime use.
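The share of lifetime use captured by a given window is a simple cumulative ratio, sketched below in Python. The function name and the 48-month download series are hypothetical: the numbers are deliberately flat within each year and were chosen only so that the resulting shares match the rough 30 and 60 per cent estimates quoted above. They are not the project's data and do not reproduce the month 6-12 peak.

```python
def share_of_lifetime_use(monthly_downloads, window_months):
    """Fraction of all recorded downloads that fall in months 1..window_months."""
    total = sum(monthly_downloads)
    if total == 0:
        raise ValueError("no usage recorded")
    return sum(monthly_downloads[:window_months]) / total


# Hypothetical series covering an assumed four-year (48-month) lifetime:
# downloads per month in year 1, year 2, year 3 and year 4.
monthly = [30] * 12 + [30] * 12 + [25] * 12 + [15] * 12

for window in (12, 24, 36, 48):
    share = share_of_lifetime_use(monthly, window)
    print(f"months 1-{window}: {share:.0%} of lifetime use")
```

Because the denominator is total lifetime use, the result is only as good as the lifetime estimate: if items continue to be used beyond four years, every window's share shrinks accordingly.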
Patterns of use across items: a problem with averages (the Bill Gates problem)
Medical and life sciences journals
About this slide: The histogram shows the frequency with which individual items are downloaded. The publication year is 2008 and this chart shows use in month 2. The pattern is lognormal: many items are used rarely, a few items are used many times. A problem with using the arithmetic mean to summarise this data is that the few heavily used items exert a major effect: the mean will be a lot higher than other averages such as the mode or median. The International Mathematical Union has criticised ISI’s citation impact factor for this reason.
[Histogram annotations: mean = 335.5; many articles used a few times; a few articles used many times]
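The effect of a few heavily used items on the arithmetic mean is easy to demonstrate with Python's standard `statistics` module. The ten download counts below are hypothetical, and the geometric mean (equivalent to averaging on a log scale) is shown only as one possible alternative summary, not necessarily the one used in the project's analysis.

```python
import statistics

# Hypothetical download counts for ten items: nine modestly used, one heavily used.
downloads = [1, 2, 2, 3, 3, 4, 5, 8, 12, 960]

mean = statistics.mean(downloads)                      # pulled up by the single outlier
median = statistics.median(downloads)                  # the "typical" item
geometric = statistics.geometric_mean(downloads)       # average taken on a log scale

print(f"arithmetic mean: {mean}")
print(f"median:          {median}")
print(f"geometric mean:  {geometric:.1f}")

# Share of all downloads accounted for by the single most-used item.
top_share = max(downloads) / sum(downloads)
print(f"top item share:  {top_share:.0%}")
```

Here one item accounts for almost all of the usage, so the arithmetic mean describes none of the ten items well, while the median and geometric mean stay close to the bulk of the distribution.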
Provisional conclusions: exploratory data analysis of UKSG usage factors
Medical and life science journals (n=48)
• Publication year 2008, usage in months 1-24, unadjusted Cox data, mean and 95% confidence intervals [chart values: 354, 539, 466, 653]: articles, final versions attract significantly higher use than all items, all versions (ANOVA F=6.1, p < 0.01)
• Publication year 2008, usage in months 1-24, log transformed CIBER data, mean and 95% confidence intervals [chart values: 303, 407, 438, 554]: articles, final versions attract significantly higher use than all items, all versions (ANOVA F=6.8, p < 0.01)