Data Mining Library Collection Silos: Print Books and E-books in Library Collections • Lynn Silipigni Connaway • Ed O’Neill • Chandra Prabha • Brian Lavoie
Collection Assessment • Why assess collections? • Provide data for member libraries for decision-making • Description of the collection • Identify specific subject areas • Determine collection age • Rate of growth • Strengths and weaknesses • Overlap/gap analysis • Identify last copy • Useful information • Outside funding • Library collection comparisons • Remote storage decisions • Collection development and management • Identify role of non-ARL libraries
WorldCat as a Collection • World’s largest bibliographic database • July 1, 2003 = 50 million+ records • 1 billion holdings • Ideal source for data-mining • Characteristics of WorldCat • Age • Subject, using NATC • Holdings by type of library • ARL • Academic, non-ARL • Public • School • Special
WorldCat as a Collection • Use of MARC data elements in WorldCat • Types of materials • Library holdings to determine audience levels • Collection assessment and collection use • Unique titles • Analyze and compare aggregate holdings for libraries • Identify print books (p-books) and electronic books (e-books)
Study Objective • Digital materials constitute increasing proportion of library collections • Effective strategies for integrating print and digital materials within a library collection • Eliminate redundancies • Meet user expectations • Data-mining increasingly important to support collection management decisions • WorldCat • World’s largest bibliographic database • Ideal as source for data-mining • Data-mine WorldCat in order to examine characteristics of p-books and e-books
Rationale • Collection management • Development • Cooperation • Deselection • Preservation • Space allocation and management • Meet user expectations • Services for off-site users • Migration from print to digital • Convenient access • 24/7 access • Desk-top delivery
Scope • WorldCat • July 1, 2003 = 50 million+ records • 1 billion holdings • Digital Items • Books • Print (p-book) • Digital (e-book)
Strategy • Identify digital items • Identify digital items with at least one other manifestation in WorldCat • FRBRize database • Work • Distinct intellectual or artistic expression • Cluster works in WorldCat • Manifestation • Physical embodiment of a work • Identify digital items with p-book equivalents • Assumption • If digital items have p-book equivalents, then digital items are e-books • Identify publishers and publication dates
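The strategy above can be sketched in a few lines: cluster records into works, then keep the digital items whose cluster also contains a p-book. This is only an illustration under stated assumptions. The record layout, the `digital` and `pbook` flags, and the `work_key` helper are hypothetical, and the author/title key is far cruder than a real FRBR work-clustering step.

```python
from collections import defaultdict

# Hypothetical, simplified records; real bibliographic records are far richer.
records = [
    {"id": "r1", "author": "Austen, Jane", "title": "Emma", "digital": False, "pbook": True},
    {"id": "r2", "author": "austen, jane", "title": "Emma ", "digital": True, "pbook": False},
    {"id": "r3", "author": "Doe, John", "title": "Online Guide", "digital": True, "pbook": False},
]

def work_key(record):
    """Build a crude work key from the normalized author and title (assumption)."""
    return (record["author"].strip().lower(), record["title"].strip().lower())

def digital_items_with_pbook_equivalents(records):
    """Return ids of digital records whose work cluster also contains a p-book."""
    clusters = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)

    matched = []
    for members in clusters.values():
        # A cluster qualifies only if at least one manifestation is a p-book.
        if any(m["pbook"] for m in members):
            matched.extend(m["id"] for m in members if m["digital"])
    return matched

print(digital_items_with_pbook_equivalents(records))
```

Under the slide's assumption, every id returned here would be treated as an e-book, while a digital item with no p-book in its cluster (like `r3`) would not.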
Need to Determine • Comparison of p-books and e-books • What is a book? • What is a p-book? • What is an e-book? • What is a digital item? • How do we extend p-book criteria to digital world?
What is a Digital Item? • Working definition of digital item • Computer file • OR Electronic resource • OR Appropriate 856 field • Indicates electronic location or access
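The three-way OR in this working definition maps naturally onto a small predicate. The sketch below assumes a record already parsed into a plain dictionary. The field names and the specific codes (`"m"` for computer file, `"s"` for electronic form of item) are assumptions for illustration, not taken from the slides, and treating any non-empty 856 value as "appropriate" is a simplification.

```python
def is_digital_item(record):
    """Return True if the record meets any of the three working-definition tests."""
    # Assumed encoding: type of record "m" marks a computer file.
    is_computer_file = record.get("record_type") == "m"
    # Assumed encoding: form of item "s" marks an electronic resource.
    is_electronic_resource = record.get("form_of_item") == "s"
    # Simplification: any non-empty 856 (electronic location/access) counts.
    has_856 = bool(record.get("field_856"))
    return is_computer_file or is_electronic_resource or has_856

print(is_digital_item({"record_type": "a", "field_856": ["http://example.org/item"]}))
print(is_digital_item({"record_type": "a", "form_of_item": " "}))
```

A production version would need a rule for which 856 fields qualify, since the definition says "appropriate" rather than "any".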
What is a P-book? • No consensus for definition of a book • Text (type = a) and monograph (bib level = m) • Broadsides? • Pamphlets? • Government documents? • Children’s books? • Microforms? • Authoritative Definitions • UNESCO • Nonperiodical literary publication consisting of > 49 pages, covers excluded • ANSI • Publications consisting of > 49 pages • Hard covers • US Postal Service (publication) • Publications > 24 pages
A P-book IS: • Based on UNESCO definition • Working definition of a p-book • Printed on paper (excludes microform) • Language material • Monograph • Physical description • Form of item = regular or large print • Title does not include a GMD • Substantial length (> 49 pages; > 25 to include juvenile titles) • Excludes manuscripts (dissertations and theses)
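The working definition reads as a conjunction of tests, so it can be sketched as a single predicate. Everything about the record layout here is hypothetical: the dictionary keys, the codes (`"a"` for language material, `"m"` for monograph, blank or `"d"` for regular or large print), and the reading of the page rule as "more than 49 pages, or more than 25 for juvenile titles" are assumptions made for illustration.

```python
def is_p_book(record):
    """Apply the working p-book definition to a simplified record dictionary."""
    is_language_material = record.get("record_type") == "a"    # assumed code
    is_monograph = record.get("bib_level") == "m"              # assumed code
    # Assumed codes: blank = regular print, "d" = large print (excludes microform).
    is_print_form = record.get("form_of_item", " ") in (" ", "d")
    has_no_gmd = not record.get("has_gmd", False)
    is_not_thesis = not record.get("is_thesis", False)

    # Assumed reading of the length rule: lower threshold for juvenile titles.
    min_pages = 25 if record.get("juvenile", False) else 49
    is_substantial = record.get("pages", 0) > min_pages

    return all([
        is_language_material,
        is_monograph,
        is_print_form,
        has_no_gmd,
        is_not_thesis,
        is_substantial,
    ])

novel = {"record_type": "a", "bib_level": "m", "form_of_item": " ", "pages": 320}
picture_book = {"record_type": "a", "bib_level": "m", "pages": 32, "juvenile": True}
pamphlet = {"record_type": "a", "bib_level": "m", "pages": 16}
print(is_p_book(novel), is_p_book(picture_book), is_p_book(pamphlet))
```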
What is an E-book? • Difficult to define e-book • Digital version of p-book (straightforward) • New conceptual views of a book in digital environment • Assumption • P-book is well-defined • If digital item has manifestation as a p-book, then digital item must also be a book • Consider only cases where a p-book has a digital equivalent or vice versa; ignore e-books that have no print equivalent
An E-book IS: • E-Book = Electronic (Digital) + Book • Definition of e-Book: • Digital equivalents of p-books • New conceptual definitions of books in digital environment
WorldCat Record Analysis • P-book records = 24,048,235 (48% of WC) • Digital item records = 795,630 (1.6% of WC) • Web sites • Collections of interlinked, Web-accessible materials residing at a single location on the Internet • Documents • Various forms of electronic documents • E-books with no p-book equivalents and no minimum page requirements • Book chapters • Broadsides • Brochures • Pamphlets • Reprints • E-books with p-book equivalents = 76,375 (0.15% of WC)
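Because the record counts are stated, each share of WorldCat can be checked with simple division. The sketch below assumes a total of exactly 50 million records; the slides say only "50 million+", so the computed shares are approximate.

```python
# Assumption: the slides give the database size only as "50 million+".
TOTAL_RECORDS = 50_000_000

counts = {
    "p-book records": 24_048_235,
    "digital item records": 795_630,
    "e-books with p-book equivalents": 76_375,
}

def share_of_total(count, total=TOTAL_RECORDS):
    """Return count as a percentage of the assumed total record count."""
    return 100 * count / total

for label, count in counts.items():
    print(f"{label}: {share_of_total(count):.2f}% of WorldCat")
```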
WorldCat Record Analysis • Digital item records (continued) • Interactive learning objects • Computer programs offering self-contained, interactive tutorial or educational experience • Software • Computer programs for creating and manipulating information • Serials • Journals • Proceedings • Images • Theses • Other (2 records) • Computer game • Raw data file
Publication Dates of Digital Items With P-Book Equivalents in WorldCat
Publishers of Digital Items With P-Book Equivalents in WorldCat • Approximately 15,000 unique publishers • Approximately 150 publishers with > 25 records • Top 10 publishers • Institute of Electrical and Electronics Engineers (IEEE) • National Bureau of Economic Research • US Government Printing Office • Springer • Inter-University Consortium for Political and Social Research • PowerKids Press • University of Virginia Library • MIT Press • Microsoft • Broderbund Software and Books
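Figures like these (unique publishers, publishers above a record threshold, and a top-N list) can be derived from a flat list of publisher names. The sketch below is illustrative only: the `publisher_summary` helper and the sample data are hypothetical, and real publisher strings would need much heavier normalization than stripping whitespace.

```python
from collections import Counter

def publisher_summary(publisher_names, threshold=25, top_n=10):
    """Return (unique publisher count, count above threshold, top-N list)."""
    counts = Counter(name.strip() for name in publisher_names)
    frequent = [name for name, n in counts.items() if n > threshold]
    return len(counts), len(frequent), counts.most_common(top_n)

# Hypothetical sample: one publisher name per matched record.
sample = ["MIT Press"] * 30 + ["Springer"] * 26 + ["Small Press"] * 3
unique, above_threshold, top = publisher_summary(sample)
print(unique, above_threshold, top)
```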
Discussion of Analysis • Small number of • E-books with p-book equivalents • Publishers with > 25 records for e-books with p-book equivalents • Recent publication dates for e-books with p-book equivalents • More Web sites than documents or reprints • Difficult to identify and categorize digital items • Inconsistent cataloging policies and practices for digital items • Inconsistent definitions for types of digital items
Future Research • Establish accepted criteria for defining an e-book independent of p-books • Identify and compare type of library holdings and NATC subjects for p-books and e-books • Identify electronic collection silos • Continue to collect these data to compare for trends • Identify types of content/materials that are better suited for either print or digital environment
Questions and Discussion • connawal@oclc.org • oneill@oclc.org