Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management Constance Malpas Program Officer RLG Programs
This presentation • Summarizes recent data-mining efforts by OCLC Programs and Research • System-wide sample (Summer 2007 – Spring 2008) • ARL unique print books (Autumn 2007) • Suggests implications for collection managers • Outlines next steps for RLG Programs • An opportunity to discuss what additional evidence and analysis are needed
What we mean by ‘last copy’ • Monographic title uniquely-held by a single WorldCat contributor • Cf. ‘single copy’ repositories, where ‘last copy’ is relative to local/group holdings • May represent a last manifestation, expression or work • Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis • Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings • Historical data may help document increased copy or work-level availability, but weren’t included in the studies presented here
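The definition above (a title held by a single contributor) can be sketched in a few lines of Python. This is an illustration only, not the method used in the studies: the `(title_id, symbol)` pairs and the function name `uniquely_held` are hypothetical stand-ins for bibliographic record identifiers and holding-library symbols.

```python
from collections import defaultdict


def uniquely_held(holdings):
    """Return {title_id: symbol} for titles held by exactly one institution.

    `holdings` is an iterable of (title_id, symbol) pairs. Both names are
    hypothetical; repeated pairs for the same institution count once.
    """
    holders = defaultdict(set)
    for title_id, symbol in holdings:
        holders[title_id].add(symbol)
    # Keep only titles whose set of holders has a single member.
    return {t: next(iter(s)) for t, s in holders.items() if len(s) == 1}


# Made-up holdings: t1 is held twice, t2 and t3 by one institution each.
sample = [("t1", "AAA"), ("t1", "BBB"), ("t2", "AAA"), ("t3", "CCC"), ("t3", "CCC")]
unique = uniquely_held(sample)
```

Because records describe manifestations, a title flagged this way is only a starting point; whether it is also a unique expression or work still needs the kind of review described in the studies.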
Distribution of wealth: ARL unique books • 20% of the population holds >75% of unique titles • A classic Pareto distribution: institutional excellence, or a “network effect”? • Median institutional holdings = 19K titles • N = 6.95M titles
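A concentration figure like "20% of the population holds >75% of unique titles" can be computed from per-institution counts as sketched below. The counts are invented for illustration and are not the ARL data; `top_share` is a hypothetical helper name.

```python
from statistics import median


def top_share(counts, top_fraction=0.2):
    """Share of all unique titles held by the largest `top_fraction` of institutions."""
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one institution with holdings")
    ranked = sorted(counts, reverse=True)
    # Always include at least one institution in the top group.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Illustrative unique-title counts for ten institutions (not real data).
counts = [600, 150, 40, 30, 25, 20, 15, 10, 5, 5]
share = top_share(counts)          # top two institutions hold 750 of 900
typical = median(counts)           # median is far below the largest holder
```

In a skewed distribution the median sits well below the mean, which is why the slide reports the median rather than an average.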
Why focus on uniquely-held titles? • “Scarcity is common” • limited redundancy in holdings = limited preservation guarantee, limited opportunity to create economies of scale by aggregating supply • Research institutions bear the brunt of responsibility for long-term preservation and access of unique titles • Academic and independent research libraries hold up to 70% of aggregate unique print book collection • Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining • Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection • Increased attention to stewardship of special collections • ARL SCWG, CLIR, LC Task Force on Bibliographic Control – new attention to what constitutes ‘special’ collections, appropriate standards of care, modes and metrics of use
Challenges • Identification requires group / network view of holdings: WorldCat provides a reasonable proxy for system-wide collection • Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records: hybrid approach combines computational and manual analysis of bibliographic data • Sparse bibliographic records impede efficient work/title matching, may introduce spurious measure of uniqueness: external sources (including Google) sometimes helpful in filling gaps • Non-English titles (especially transliterated non-roman scripts) are especially difficult to match: we resisted the temptation to exclude these
Study I: System-wide Sampling • 250 randomly selected, uniquely-held titles • Limited to printed books (including theses) published before 2005 • English-language cataloging only • Iterative re-sampling required to fill gaps • Independently reviewed by three project staff • Level of uniqueness • Material type • Results periodically collated for group analysis • Compare results of individual analysis for consistency • Seek consensus on difficult cases – relatively few of these • Re-sample as necessary to fill gaps • White paper anticipated March 2008
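The sampling and review workflow above can be sketched roughly as follows. This assumes a simple model that is not documented in the source: candidates are drawn in random order, ineligible ones are skipped until the quota is filled (standing in for iterative re-sampling), and a title's classification is accepted when a strict majority of reviewers agree. All names are hypothetical.

```python
import random
from collections import Counter


def draw_sample(candidates, size, is_eligible, seed=0):
    """Draw `size` eligible items at random, skipping ineligible ones."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    sample = []
    for item in pool:
        if is_eligible(item):
            sample.append(item)
        if len(sample) == size:
            break
    return sample


def consensus(labels):
    """Return the label chosen by a strict majority of reviewers, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


# Hypothetical record numbers; every third one is treated as out of scope.
sample = draw_sample(range(1000), 250, lambda n: n % 3 != 0)
agreed = consensus(["work", "work", "manifestation"])
disputed = consensus(["work", "manifestation", "expression"])
```

Titles for which `consensus` returns `None` correspond to the difficult cases that would go to group discussion.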
Study II: ARL uniquely-held books • Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement • How might the existing evidence base be used to focus regional preservation investments? • Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300+ OCLC symbols; 123 institutions • Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings • Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC; CIC; ReCAP; Harvard; ASU; NYU) • Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis
Limitations • Current studies limited to printed books – excludes serials, special collections; only a partial measure of uniqueness in system-wide collection • Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative • Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected • Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog
Our findings – distribution of unique titles • Research and academic libraries hold >70% of aggregate unique print book collection • while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate • limited data on aggregate use, sources of demand • Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets • ARL unique print book holdings range from 400 – 600K titles per institution; median holdings = 19K titles • generally, institutions with large collections hold more unique materials – but absolute size of collection is not an indicator of relative uniqueness
Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)
Unique Print Books in ARL Institutions • National libraries and institutions with deep collections and an aggressive approach to collecting and cataloging new monographs (LC, Harvard, Library and Archives Canada) have an exceptional range of unique holdings • CRL’s focus on theses and dissertations is evident: most uniqueness is attributable to these holdings • Institutions with younger collections that are actively seeking to increase scope of coverage (NCSU, Temple) are building uniqueness in new titles
Content-type Distributions: CRL and ARL • Intrinsically unique content, “only copies” • May include “first copies” in cataloging queue; uniqueness subject to rapid erosion
Our findings – levels of uniqueness • ~60% of titles represent unique works • Ex: Report and recommendation … on a proposed loan … equivalent to US$70 million to the … Islamic Republic of Pakistan for a power plant efficiency improvement project (1987) – World Bank report held by George Washington University • ~15% of titles represent unique manifestations • Ex: Gallipolis … an account of the French five hundred and of the town they established … compiled by Workers of the Writers' program of the Work projects administration (1940) – microform pamphlet held by Yale University; related manifestations at 40 libraries • ~5% of titles represent unique expressions • Ex: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) – book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by same author, pub’d in 1900, held at LC • ~20% of titles not unambiguously unique: duplicate or near-duplicate records can be found in WorldCat • Ex: K. Kimura. Edo no akebono (1956) – book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale
Our findings – content characterization • Material types: • ~35% are books (>50pp) • most appear to be non-fiction titles, less likely to have additional manifestations • ~20% theses and dissertations • many at Master’s level – unlikely to be held beyond issuing institution • ~15% government documents • mostly federal and state, may be duplicated in depositories • ~10% pamphlets • unique content, but rarely useful in isolation • ~10% analytics: single articles or issues bound as a separate volume • non-unique content • <5% early imprints • lost treasures? • Small numbers of by-laws, scripts, legal briefs, minutes, etc.
Implications • Institutions with significant unique holdings may benefit from ‘splitting the difference’ between unique works and manifestations: unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted • A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings: pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections • Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests: local collections gain in significance when presented in context with related holdings
Recommendations • Adopt a nuanced understanding of ‘relative uniqueness’ when assessing local holdings • Unique manifestations may not represent unique intellectual content, but may have other value • As artifacts: special collections • As a networked resource: increased availability • Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection • Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives • Pamphlets and technical reports may be virtually aggregated for specific communities of use • Maximize disclosure of unique holdings to increase their impact and value • Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution
What’s Next . . . • Holdings validation study will examine a sample of scarcely-held (<5 copies) US imprints in North-American research libraries • Compare current WorldCat holdings to historical holdings – looking for signs of collection erosion; elimination of local backlogs (diminishing uniqueness) • Compare local holdings to current WorldCat holdings – location changes/storage transfers, withdrawals • Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of ‘full disclosure’ • Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
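The comparison of historical and current holdings described above could look something like the sketch below. It assumes each snapshot is a mapping from a title identifier to the set of holding symbols; that data shape and the name `compare_snapshots` are assumptions for illustration, not a description of the planned study.

```python
def compare_snapshots(earlier, later):
    """Report holdings added and dropped per title between two snapshots.

    `earlier` and `later` map a (hypothetical) title id to a set of
    holding-library symbols.
    """
    report = {}
    for title in sorted(earlier.keys() | later.keys()):
        before = earlier.get(title, set())
        after = later.get(title, set())
        report[title] = {
            "added": after - before,      # new holders, e.g. a cleared backlog
            "dropped": before - after,    # lost holders, e.g. withdrawals
            "net": len(after) - len(before),
        }
    return report


# Made-up snapshots: t1 gains a holder, t2 loses one.
earlier = {"t1": {"AAA"}, "t2": {"AAA", "BBB"}}
later = {"t1": {"AAA", "CCC"}, "t2": {"BBB"}}
changes = compare_snapshots(earlier, later)
```

A positive `net` for a formerly unique title suggests diminishing uniqueness; a negative `net` on a scarcely-held title is a possible sign of collection erosion.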
Some closing observations Opportunities • Large research libraries hold a wealth of unique materials – long tail resources with broad potential audience • Aggregated bibliographic data supports programmatic analysis and enrichment – work-level clustering, identification of duplicates • Largest institutions, with enduring commitments to retention and access, hold majority of potential ‘at risk’ titles Challenges • Libraries ill-equipped to measure potential demand for unique holdings • Technical and social infrastructure for aggregating supply is lacking • University presses are potential distribution partners, but alliances are weak
Questions, Comments? • ‘Managing the Collective Collection’ work agenda • Data-mining for management intelligence • Shared print collections http://www.oclc.org/programs/ourwork/collectivecoll • Midwinter RLG Update Session 1:30-3:30 Marriott 302-304 • Contact: Constance Malpas Program Officer malpasc@oclc.org
Median institutional holdings = 96K unique titles • N = 5.9M titles