
Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management

This presentation summarizes recent data-mining efforts to measure uniqueness in system-wide book holdings. It discusses the distribution of uniquely-held titles and suggests implications for collection managers. The presentation also outlines the next steps for RLG Programs and invites discussion on additional evidence and analysis needed.

Presentation Transcript


  1. Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management. Constance Malpas, Program Officer, RLG Programs

  2. This presentation • Summarizes recent data-mining efforts by OCLC Programs and Research • System-wide sample (Summer 2007 – Spring 2008) • ARL unique print books (Autumn 2007) • Suggests implications for collection managers • Outlines next steps for RLG Programs • An opportunity to discuss what additional evidence and analysis is needed

  3. What we mean by ‘last copy’ • Monographic title uniquely held by a single WorldCat contributor • Cf. ‘single copy’ repositories, where ‘last copy’ is relative to local/group holdings • May represent a last manifestation, expression or work • Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis • Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings • Historical data may help document increased copy- or work-level availability, but were not included in the studies presented here

  4. Distribution of wealth: ARL unique books • 20% of the population holds >75% of unique titles • A classic Pareto distribution: institutional excellence? Or a “network effect”? • Median institutional holdings = 19K titles; N = 6.95M titles
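The "20% holds >75%" reading of a Pareto distribution is simple arithmetic over ranked per-institution counts. A minimal sketch, assuming one count of uniquely-held titles per institution; the `counts` list below is hypothetical and illustrative only, not study data:

```python
from statistics import median


def top_share(counts, top_fraction=0.2):
    """Share of all unique titles held by the top `top_fraction` of institutions."""
    ranked = sorted(counts, reverse=True)
    # At least one institution is always counted in the "top" group
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical per-institution counts of uniquely-held titles (illustrative only)
counts = [600_000, 250_000, 120_000, 40_000, 25_000, 19_000, 12_000, 8_000, 3_000, 400]

print(f"top 20% share: {top_share(counts):.1%}")
print(f"median holdings: {median(counts):,.0f}")
```

Reporting the median alongside the top-20% share matters with skewed data like this: a mean would be pulled far upward by the few largest holders.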

  5. Why focus on uniquely-held titles? • “Scarcity is common” • limited redundancy in holdings = limited preservation guarantee, limited opportunity to create economies of scale by aggregating supply • Research institutions bear the brunt of responsibility for long-term preservation of, and access to, unique titles • Academic and independent research libraries hold up to 70% of aggregate unique print book collection • Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining • Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection • Increased attention to stewardship of special collections • ARL SCWG, CLIR, LC Task Force on Bibliographic Control – new attention to what constitutes ‘special’ collections, appropriate standards of care, modes and metrics of use

  6. Challenges • Identification requires a group / network view of holdings → WorldCat provides a reasonable proxy for the system-wide collection • Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records → a hybrid approach combines computational and manual analysis of bibliographic data • Sparse bibliographic records impede efficient work/title matching and may introduce spurious measures of uniqueness → external sources (including Google) are sometimes helpful in filling gaps • Non-English titles (especially transliterated non-roman scripts) are especially difficult to match → we resisted the temptation to exclude these
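Part of the matching difficulty is that near-duplicate records differ only in punctuation, diacritics or articles. The sketch below is a crude normalized matching key for surfacing candidate duplicates. It is an illustration under stated assumptions, not WorldCat's work-clustering method, and the `(record_id, title, author)` tuple layout is hypothetical. Note that it cannot match a romanized record against one cataloged in original script, which is exactly the hard case described above.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title, author=""):
    """Crude matching key: strip diacritics, case-fold, drop punctuation and the articles a/an/the."""
    text = f"{author} {title}"
    # Decompose accented characters, then drop the combining marks ("ó" -> "o")
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    # Replace punctuation with spaces so "Kimura, K." and "Kimura, K" agree
    text = re.sub(r"[^\w\s]", " ", text)
    words = [w for w in text.split() if w not in {"a", "an", "the"}]
    return " ".join(words)


def candidate_duplicates(records):
    """Group (record_id, title, author) tuples by key; return only keys shared by 2+ records."""
    groups = defaultdict(list)
    for rec_id, title, author in records:
        groups[match_key(title, author)].append(rec_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records for illustration
records = [
    ("rec-1", "Edo no akebono", "Kimura, K."),
    ("rec-2", "Edo no akebono /", "Kimura, K"),
    ("rec-3", "Crónica local", ""),
    ("rec-4", "Cronica local.", ""),
    ("rec-5", "Gallipolis", ""),
]
print(candidate_duplicates(records))
```

Candidates flagged this way would still go to manual review, in keeping with the hybrid approach above.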

  7. Study I: System-wide Sampling • 250 randomly selected, uniquely-held titles • Limited to printed books (including theses) published before 2005 • English-language cataloging only • Iterative re-sampling required to fill gaps • Independently reviewed by three project staff • Level of uniqueness • Material type • Results periodically collated for group analysis • Compare results of individual analysis for consistency • Seek consensus on difficult cases – relatively few of these • Re-sample as necessary to fill gaps • White paper anticipated March 2008
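The sampling frame and the three-reviewer check can be expressed compactly. A minimal sketch, assuming each record is a dict with hypothetical fields `holding_count`, `format`, `pub_year` and `cataloging_lang`. The "consensus" here is a simple majority vote, which is one possible reading of the slide; cases without a majority go to group discussion.

```python
import random
from collections import Counter


def sample_unique_titles(records, n=250, seed=None):
    """Simple random sample of uniquely-held print books meeting the study filters.

    Field names are hypothetical stand-ins for whatever the source data provides.
    """
    eligible = [
        r for r in records
        if r["holding_count"] == 1           # held by a single WorldCat contributor
        and r["format"] == "print_book"      # printed books, theses included
        and r["pub_year"] < 2005             # published before 2005
        and r["cataloging_lang"] == "eng"    # English-language cataloging only
    ]
    rng = random.Random(seed)
    # If gaps leave fewer eligible records than n, return all of them
    return rng.sample(eligible, min(n, len(eligible)))


def consensus(labels):
    """Majority label from independent reviewers, or None if there is no majority."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


# Example: three reviewers' judgements on one title
print(consensus(["unique work", "unique work", "unique manifestation"]))
```

Passing a fixed `seed` makes a draw reproducible, which helps when re-sampling to fill gaps.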

  8. Study II: ARL uniquely-held books • Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement • How might the existing evidence base be used to focus regional preservation investments? • Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300+ OCLC symbols; 123 institutions • Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings • Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC; CIC; ReCAP; Harvard; ASU; NYU) • Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis
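Because 300+ OCLC symbols roll up to 123 institutions, uniqueness has to be judged at the institution level: a title held under two symbols of the same institution is still uniquely held. A minimal sketch of that roll-up, assuming holdings arrive as `(title_id, oclc_symbol)` pairs; the symbols and mapping below are hypothetical.

```python
from collections import Counter


def unique_titles_by_institution(holdings, symbol_to_institution):
    """Count uniquely-held titles per institution after rolling OCLC symbols up to parents.

    holdings: iterable of (title_id, oclc_symbol) pairs, one per holding.
    Symbols missing from the mapping are treated as their own institution.
    """
    institutions_per_title = {}
    for title_id, symbol in holdings:
        inst = symbol_to_institution.get(symbol, symbol)
        institutions_per_title.setdefault(title_id, set()).add(inst)

    counts = Counter()
    for insts in institutions_per_title.values():
        if len(insts) == 1:  # every holding symbol belongs to one institution
            counts[next(iter(insts))] += 1
    return counts


# Hypothetical symbols and mapping (not real OCLC symbols)
holdings = [("t1", "A1"), ("t1", "A2"), ("t2", "A1"), ("t2", "B1"), ("t3", "B1"), ("t4", "C1")]
mapping = {"A1": "Inst A", "A2": "Inst A", "B1": "Inst B"}
print(unique_titles_by_institution(holdings, mapping))
```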

  9. Limitations • Current studies limited to printed books – excludes serials, special collections; only a partial measure of uniqueness in system-wide collection • Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative • Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected • Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog

  10. Our findings – distribution of unique titles • Research and academic libraries hold >70% of aggregate unique print book collection • while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate • limited data on aggregate use, sources of demand • Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets • ARL unique print book holdings range from 400 to 600K titles per institution; median holdings = 19K titles • generally, institutions with large collections hold more unique materials, but absolute size of collection is not an indicator of relative uniqueness

  11. Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)

  12. Unique Print Books in ARL Institutions • National libraries and institutions with deep collections and an aggressive approach to collecting and cataloging new monographs (LC, Harvard, Library and Archives Canada) have an exceptional range of unique holdings • CRL’s focus on theses and dissertations is evident: most of its uniqueness is attributable to these holdings • Institutions with younger collections that are actively seeking to increase their scope of coverage (NCSU, Temple) are building uniqueness in new titles

  13. Content-type Distributions: CRL and ARL • Intrinsically unique content, “only copies” • May include “first copies” in cataloging queue; uniqueness subject to rapid erosion

  14. Our findings – levels of uniqueness • ~60% of titles represent unique works • Ex.: Report and recommendation … on a proposed loan … equivalent to US$70 million to the … Islamic Republic of Pakistan for a power plant efficiency improvement project (1987) – World Bank report held by George Washington University • ~15% of titles represent unique manifestations • Ex.: Gallipolis … an account of the French five hundred and of the town they established … compiled by Workers of the Writers' program of the Work projects administration (1940) – microform pamphlet held by Yale University; related manifestations at 40 libraries • ~5% of titles represent unique expressions • Ex.: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) – book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by the same author, published in 1900, held at LC • ~20% of titles not unambiguously unique: duplicate or near-duplicate records can be found in WorldCat • Ex.: K. Kimura. Edo no akebono (1956) – book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale
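The four levels above form a hierarchy: a title is a unique work if no other holder has that work, a unique expression if the work is held elsewhere but not this expression, and so on. A minimal sketch of that decision order, assuming records have already been clustered at work and expression level. The `work_id`, `expression_id` and `manifestation_key` fields are hypothetical cluster identifiers, not fields the studies are known to have used.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    holder: str             # holding institution
    work_id: str            # hypothetical work-cluster identifier
    expression_id: str      # hypothetical expression-cluster identifier
    manifestation_key: str  # hypothetical key shared by duplicate/near-duplicate records


def uniqueness_level(target, catalog):
    """Classify a singly-held record by the broadest level at which no other holder exists."""
    others = [r for r in catalog if r.holder != target.holder]
    if any(r.manifestation_key == target.manifestation_key for r in others):
        return "not unambiguously unique"  # duplicate record held elsewhere
    if not any(r.work_id == target.work_id for r in others):
        return "unique work"
    if not any(r.expression_id == target.expression_id for r in others):
        return "unique expression"          # work held elsewhere, this expression not
    return "unique manifestation"           # expression held elsewhere, this manifestation not


# Hypothetical example: same work and expression held elsewhere, different manifestation
target = Record("r4", "Lib A", "w3", "e4", "m4")
other = Record("r5", "Lib B", "w3", "e4", "m5")
print(uniqueness_level(target, [target, other]))
```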

  15. Our findings – content characterization Material types • ~35% are books (>50pp) • most appear to be non-fiction titles, less likely to have additional manifestations • ~20% theses and dissertations • many at Master’s level – unlikely to be held beyond issuing institution • ~15% government documents • mostly federal and state, may be duplicated in depositories • ~10% pamphlets • unique content, but rarely useful in isolation • ~10% analytics; single articles or issues bound as a separate volume • non-unique content • <5% early imprints • lost treasures? • Small numbers of by-laws, scripts, legal briefs, minutes, etc.
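A first computational pass at material type could lean on a few record attributes, with everything ambiguous left for manual review. A minimal sketch, assuming hypothetical boolean flags (`is_thesis`, `is_gov_doc`, `is_analytic`) and a `pages` count. The >50-page book threshold comes from the slide; treating everything at or under 50 pages as a pamphlet is an assumption. Early imprints, by-laws, scripts and the like are deliberately left to manual review.

```python
from collections import Counter


def material_type(rec):
    """Rough first-pass material type from hypothetical record attributes."""
    if rec.get("is_thesis"):
        return "thesis/dissertation"
    if rec.get("is_gov_doc"):
        return "government document"
    if rec.get("is_analytic"):
        return "analytic"
    pages = rec.get("pages")
    if pages is None:
        return "needs manual review"
    # Slide threshold: books are >50pp; shorter items assumed to be pamphlets
    return "book" if pages > 50 else "pamphlet"


def type_shares(records):
    """Proportion of records falling into each material type."""
    counts = Counter(material_type(r) for r in records)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


# Hypothetical records
sample = [{"pages": 120}, {"pages": 24}, {"is_thesis": True, "pages": 80}, {}]
print(type_shares(sample))
```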

  16. Implications • Institutions with significant unique holdings may benefit from ‘splitting the difference’ between unique works and manifestations: unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted • A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings: pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections • Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests: local collections gain in significance when presented in context with related holdings
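The preliminary sort described above amounts to a lookup from material type and uniqueness level to a candidate action. A minimal sketch, assuming string labels for type and level. The function name and labels are hypothetical, and the output is a short-list for human judgement, not a disposition decision.

```python
CANDIDATE_ACTIONS = {
    "pamphlet": "cataloging enhancement; storage transfer",
    "technical report": "cataloging enhancement; storage transfer",
    "book": "short-list for digitization and/or transfer to special collections",
}


def candidate_action(mtype, uniqueness_level, locally_distinctive=False):
    """Suggest a candidate action for a uniquely-held item (labels are hypothetical)."""
    if uniqueness_level == "unique manifestation" or mtype == "analytic":
        # Judge with an eye to provenance; act only if the item adds local distinctiveness
        return "review provenance" if locally_distinctive else "no immediate action"
    return CANDIDATE_ACTIONS.get(mtype, "manual review")


print(candidate_action("pamphlet", "unique work"))
```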

  17. Recommendations • Adopt a nuanced understanding of ‘relative uniqueness’ when assessing local holdings • Unique manifestations may not represent unique intellectual content, but may have other value • As artifacts → special collections • As a networked resource → increased availability • Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection • Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives • Pamphlets and technical reports may be virtually aggregated for specific communities of use • Maximize disclosure of unique holdings to increase their impact and value • Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution

  18. What’s Next . . . • Holdings validation study will examine a sample of scarcely held (<5 copies) US imprints in North American research libraries • Compare current WorldCat holdings to historical holdings, looking for signs of collection erosion and for elimination of local backlogs (diminishing uniqueness) • Compare local holdings to current WorldCat holdings: location changes/storage transfers, withdrawals • Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of ‘full disclosure’ • Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
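Comparing two holdings snapshots is essentially a per-title set difference: holders that disappeared suggest erosion (or withdrawal), while new holders suggest backlog elimination. A minimal sketch, assuming each snapshot maps a title identifier to the set of holding institutions. The function names and data layout are hypothetical.

```python
def scarcely_held(holdings_by_title, threshold=5):
    """Titles held by fewer than `threshold` institutions."""
    return {t: h for t, h in holdings_by_title.items() if len(h) < threshold}


def compare_holdings(historical, current):
    """Per-title change in holding institutions between two snapshots."""
    report = {}
    for title in historical.keys() | current.keys():
        before = historical.get(title, set())
        after = current.get(title, set())
        report[title] = {
            "dropped": sorted(before - after),  # possible erosion or withdrawal
            "added": sorted(after - before),    # e.g. backlog cataloged since
            "eroding": len(after) < len(before),
        }
    return report


# Hypothetical snapshots
historical = {"t1": {"A", "B", "C"}, "t2": {"A"}}
current = {"t1": {"A"}, "t2": {"A", "D"}}
print(compare_holdings(scarcely_held(historical), current))
```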

  19. Some closing observations Opportunities • Large research libraries hold a wealth of unique materials – long-tail resources with broad potential audience • Aggregated bibliographic data supports programmatic analysis and enrichment – work-level clustering, identification of duplicates • Largest institutions, with enduring commitments to retention and access, hold the majority of potential ‘at risk’ titles Challenges • Libraries ill-equipped to measure potential demand for unique holdings • Technical and social infrastructure for aggregating supply is lacking • University presses are potential distribution partners, but alliances are weak

  20. Questions, Comments? • ‘Managing the Collective Collection’ work agenda • Data-mining for management intelligence • Shared print collections http://www.oclc.org/programs/ourwork/collectivecoll • Midwinter RLG Update Session 1:30-3:30 Marriott 302-304 • Contact: Constance Malpas Program Officer malpasc@oclc.org

  21. Median institutional holdings = 96K unique titles; N = 5.9M titles
