Bibliographic Metadata and HathiTrust. ALCTS CaMMS Catalog Management Interest Group Meeting American Library Association MidWinter Convention Philadelphia, Pennsylvania, January 25, 2014 Jon Rothman, Head, Library Systems Office, University of Michigan jrothman@umich.edu.
HathiTrust Mission
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
HathiTrust Background
• Launched in 2008 by the libraries of the Committee on Institutional Cooperation (CIC) and the University of California System.
• Initial focus on digitized book and journal content.
• 10,922,113 total volumes.
• 3,563,589 public domain (~33%).
• Currently 91 partner institutions and continuing to grow.
Partnership
Allegheny College; Arizona State University; Baylor University; Boston College; Boston University; Brandeis University; Brown University; California Digital Library; Carnegie Mellon University; Colby College; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Iowa State University; Johns Hopkins University; Kansas State University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Syracuse University; Temple University; Texas A&M University; Tufts University; Universidad Complutense de Madrid; University of Alabama; University of Alberta; University of Arizona; University of British Columbia; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Houston; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Kansas; University of Maryland; University of Massachusetts, Amherst; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Oklahoma; University of Pennsylvania; University of Pittsburgh; University of Queensland; University of Tennessee, Knoxville; University of Utah; University of Vermont; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Vanderbilt University; Virginia Tech; Wake Forest University; Washington University; Yale University Library
Where does HathiTrust’s bibliographic metadata come from?
• Bibliographic metadata is provided by depositors of digital content.
• Metadata must be supplied to HathiTrust before ingest of digital content can occur.
• The metadata is used in several ways, including:
  – To act as a manifest of the materials being deposited.
  – To identify and track records to their contributor.
  – To help in making an initial rights determination about each volume.
Minimal metadata specifications for deposited records
• Valid MARC binary or MARCXML structure
• Valid leader and 008
• 245 $$a (or $$k where appropriate)
• A 955 field describing a single item
  – Item identifier (usually barcode)
  – Item description (enumeration/chronology) for multi-volume works
• OCLC Number (strongly preferred)
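The checklist above can be sketched as a small pre-submission check. This is only an illustration, not HathiTrust's validator: the record is a simplified, hypothetical dictionary (not real MARC parsing), the keys `item_id`, `description`, and `ocn` are invented for the example, and "valid" leader/008 is reduced to the fixed lengths MARC 21 defines for bibliographic records (24 and 40 characters).

```python
def check_minimal_metadata(record, multi_volume=False):
    """Check a simplified record dict against the checklist.

    Returns {"errors": [...], "warnings": [...]}. The dict layout is a
    hypothetical stand-in for a parsed MARC record.
    """
    errors, warnings = [], []

    # MARC 21 leaders are 24 characters; bibliographic 008 fields are 40.
    # Length is used here only as a rough proxy for "valid".
    if len(record.get("leader", "")) != 24:
        errors.append("leader must be 24 characters")
    if len(record.get("008", "")) != 40:
        errors.append("008 must be 40 characters")

    # 245 needs a title in $a, or $k where appropriate.
    title = record.get("245", {})
    if not (title.get("a") or title.get("k")):
        errors.append("245 needs $a or $k")

    # Exactly one 955 field, describing a single item.
    items = record.get("955", [])
    if len(items) != 1:
        errors.append("expected exactly one 955 item field")
    else:
        item = items[0]
        if not item.get("item_id"):
            errors.append("955 needs an item identifier")
        if multi_volume and not item.get("description"):
            errors.append("955 needs enumeration/chronology for multi-volume works")

    # The OCLC number is strongly preferred, so its absence is a warning.
    if not record.get("ocn"):
        warnings.append("no OCLC number")

    return {"errors": errors, "warnings": warnings}
```

Treating the missing OCLC number as a warning rather than an error mirrors the "strongly preferred" wording; a stricter workflow could promote it to an error.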
Duplicate detection
• Simple identifier match at bibliographic level, using OCLC numbers (OCNs).
• OCNs are the most ubiquitous and unique identifiers in the records, but there are issues…
  – Records without OCNs
    – Some partners didn’t have OCNs in any of their records.
    – Some have had them in many, but not all, of their records.
  – Differences in OCN location, prefixes, etc. in records
  – Different OCNs for the same item
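One of the issues listed above, differing prefixes, can be illustrated with a small normalization step applied before matching. This is a sketch under assumptions, not the matching code actually used: it handles only the common `(OCoLC)`, `ocm`, `ocn`, and `on` prefixes and leading zeros, and the function names are hypothetical.

```python
import re

# Optional "(OCoLC)" marker, optional ocm/ocn/on prefix, optional zero padding.
_OCN_PATTERN = re.compile(
    r"^\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*0*(\d+)\s*$",
    re.IGNORECASE,
)


def normalize_ocn(raw):
    """Return the bare OCLC number as a string, or None if unrecognized."""
    match = _OCN_PATTERN.match(raw)
    return match.group(1) if match else None


def share_an_ocn(ocns_a, ocns_b):
    """True if two records have at least one normalized OCN in common."""
    set_a = {normalize_ocn(o) for o in ocns_a} - {None}
    set_b = {normalize_ocn(o) for o in ocns_b} - {None}
    return bool(set_a & set_b)
```

Normalization addresses formatting differences only. Records with no OCN, or with different OCNs for the same item, still will not match under a simple identifier comparison.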
HathiTrust metadata management
• Where
  – HathiTrust bibliographic metadata was managed in the University of Michigan’s Aleph LMS from 2008 until…
  – Zephir, a dedicated HathiTrust metadata management system developed by California Digital Library, launched in production in early December 2013.
• Underlying principle
  – Records supplied to HathiTrust are not considered definitive.
  – The definitive record lives in the source institution’s own system and/or WorldCat.
Zephir Functionality
• Keeps all versions of records received from depositors.
• OCLC number is still used for duplicate detection.
• Records are clustered rather than merged.
• A weighting algorithm determines the best bibliographic record in each cluster.
  – The selected record, with item-level data for all ingested items attached to that cluster, is selected for output.
• Provides a daily output of new/changed records. Records where none of the associated digital items have been ingested yet are not included.
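The cluster-and-select behavior described above can be sketched as follows. This is not Zephir's implementation: the slides do not describe the weighting algorithm, so `score` below is a purely hypothetical placeholder (a field count), and the record layout, the `field_count` key, and the choice to put records without an OCN in their own single-record cluster are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_ocn(records):
    """Group records by OCLC number without merging their content."""
    clusters = defaultdict(list)
    for rec in records:
        # Assumption: a record with no OCN forms its own cluster.
        key = rec["ocn"] if rec.get("ocn") else ("no-ocn", rec["id"])
        clusters[key].append(rec)
    return dict(clusters)


def score(rec):
    """Hypothetical weighting; the real criteria are not given in the source."""
    return rec.get("field_count", 0)


def export_cluster(cluster):
    """Return the output entry for one cluster, or None if nothing is ingested."""
    ingested = [
        item["item_id"]
        for rec in cluster
        for item in rec["items"]
        if item["ingested"]
    ]
    if not ingested:
        return None  # clusters with no ingested items are left out of the output
    best = max(cluster, key=score)
    # The best record carries item data for every ingested item in the cluster.
    return {"record_id": best["id"], "items": ingested}
```

Because records are clustered rather than merged, each contributor's record stays intact, and a different record can become the selected one later if the weighting or the cluster membership changes.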
Record correction and update
• General policy is not to correct or update the content of contributors’ records.
• In most cases, contributors are asked to correct and re-submit records with observed metadata errors or issues.
• When it’s necessary for a correction to happen quickly:
  – A corrected “shadow record” is created in Zephir and temporarily takes the place of the contributor record in outputs.
  – The contributor is asked to submit a corrected record. When the corrected record is received, the shadow record is removed.
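The shadow-record workflow above amounts to a temporary override at output time. The sketch below is illustrative only: the class and method names are hypothetical, it treats any resubmission from the contributor as the corrected record, and it does not model the version history that Zephir keeps.

```python
class RecordStore:
    """Toy store where a shadow record temporarily overrides a contributor record."""

    def __init__(self):
        self.contributor = {}  # record id -> latest contributor record
        self.shadow = {}       # record id -> temporary corrected record

    def submit(self, record_id, record):
        """Contributor (re)submission; assumed here to clear any shadow record."""
        self.contributor[record_id] = record
        self.shadow.pop(record_id, None)

    def add_shadow(self, record_id, corrected):
        """Create a shadow record when a correction must happen quickly."""
        self.shadow[record_id] = corrected

    def for_output(self, record_id):
        """The shadow record, if present, takes the contributor record's place."""
        return self.shadow.get(record_id, self.contributor[record_id])
```

The contributor's record is never edited in place, which is consistent with the general policy of not correcting contributors' content.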
[Diagram: HathiTrust metadata flow. Labels: Contributor Bibliographic Records; HathiTrust Metadata Management (Zephir); Identifiers of ingested objects; Zephir daily export; Metadata about newly-loaded records; HathiTrust Access Processing; HathiTrust Ingest Framework (Feed); Rights DB; OAI; Hathifiles; HathiTrust Catalog; Digital Object Repository; Individual library catalogs, etc.; Bib API; Catalog + Full Text; WorldCat]