A Library Science Perspective on Digitization • Bryan Heidorn, University of Arizona
Library-Museum Parallels • Intellectual Property Rights • Physical/Digital Objects Sharing • Descriptive Metadata Formats • Preservation Metadata • Transport Metadata Formats • Communication Protocols (not so much) • Similar Digitization Workflow • OCR Challenges
Intellectual Property Rights • Expanded to 75 years in US from 25 • Academic Publishing anomalies • Attribution required (data not so much) • Decoupling of Data from Text
Online Computer Library Center (OCLC) • Collaborative Automation of libraries including copy cataloging • Started 1967 • Catalog 271 million items/year • 72,000 libraries in 170 countries and territories use OCLC services to locate, acquire, catalog, lend and preserve library materials.
Descriptive Metadata Formats • MARC(XML) 21 Standard • METS • Dublin Core (Interchange Format only)
Biodiversity Heritage Library Workflow • Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries
MARC 21 Standard • Formats: Bibliographic, Authority, Holdings, Classification, Community Information • Bibliographic Material Types: • Books (BK) • Continuing resources (CR) • Computer files (CF) • Maps (MP) • Music (MU) • Visual materials (VM) • Mixed materials (MX) • http://www.loc.gov/marc/
MARC Fields • 00X: Control Fields • 01X-09X: Numbers and Code Fields • Heading Fields - General Information • 1XX: Main Entry Fields • 20X-24X: Title and Title-Related Fields • 25X-28X: Edition, Imprint, Etc. Fields • 3XX: Physical Description, Etc. Fields • 4XX: Series Statement Fields • 5XX: Note Fields • 6XX: Subject Access Fields • 70X-75X: Added Entry Fields • 76X-78X: Linking Entry Fields • 80X-83X: Series Added Entry Fields • 841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
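The tag ranges above amount to a lookup table. As a minimal sketch, assuming tags arrive as three-digit strings, the hypothetical helper `marc_field_group` below maps a tag to the group names from this slide. It is not part of any MARC library, and tags outside the listed ranges are simply reported as unassigned.

```python
# Field groups exactly as listed on the slide: (first tag, last tag, group name).
MARC_FIELD_GROUPS = [
    (1, 9, "Control Fields"),
    (10, 99, "Numbers and Code Fields"),
    (100, 199, "Main Entry Fields"),
    (200, 249, "Title and Title-Related Fields"),
    (250, 289, "Edition, Imprint, Etc. Fields"),
    (300, 399, "Physical Description, Etc. Fields"),
    (400, 499, "Series Statement Fields"),
    (500, 599, "Note Fields"),
    (600, 699, "Subject Access Fields"),
    (700, 759, "Added Entry Fields"),
    (760, 789, "Linking Entry Fields"),
    (800, 839, "Series Added Entry Fields"),
    (841, 889, "Holdings, Location, Alternate Graphics, Etc. Fields"),
]


def marc_field_group(tag: str) -> str:
    """Return the slide's field-group name for a three-digit MARC tag."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a three-digit MARC tag: {tag!r}")
    number = int(tag)
    for first, last, name in MARC_FIELD_GROUPS:
        if first <= number <= last:
            return name
    return "unassigned"


if __name__ == "__main__":
    for tag in ("008", "245", "650", "840"):
        print(tag, "->", marc_field_group(tag))
```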
MARC Book Example
Leader/00-23 *****nam##22*****#a#4500
001 <control number>
003 <control number identifier>
005 19920331092212.7
007/00-01 ta
008/00-39 820305s1991####nyu###########001#0#eng##
020 ##$a0845348116 :$c$29.95 (£19.50 U.K.)
020 ##$a0845348205 (pbk.)
040 ##$a[organization code]$c[organization code]
050 14$aPN1992.8.S4$bT47 1991
082 04$a791.45/75/0973$219
100 1#$aTerrace, Vincent,$d1948-
245 10$aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
246 1#$a50 years of television
260 ##$aNew York :$bCornwall Books,$cc1991.
300 ##$a864 p. ;$c24 cm.
500 ##$aIncludes index.
650 #0$aTelevision pilot programs$zUnited States$vCatalogs.
650 #0$aTelevision serials$zUnited States$vCatalogs.
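Each variable field in the record above follows the same display pattern: a three-digit tag, a space, two indicator characters (`#` meaning blank), and then subfields introduced by `$` plus a one-character code. The sketch below splits one such line into its parts. `parse_marc_line` is a hypothetical helper for this display notation only, not a reader for binary MARC or MARCXML. It splits naively on `$`, so a value containing a literal dollar sign, such as the price in the first 020 field, would be misparsed.

```python
def parse_marc_line(line: str):
    """Split a display-format MARC field into (tag, indicators, subfields).

    Assumes the layout 'TTT II$a...$b...' used in the example record.
    Control fields (00X) have no indicators or subfields and are not handled.
    """
    tag, rest = line[:3], line[4:]
    indicators = (rest[0], rest[1])
    subfields = []
    # The text before the first '$' is empty, so skip it.
    for chunk in rest[2:].split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields


if __name__ == "__main__":
    title_field = (
        "245 10$aFifty years of television :"
        "$ba guide to series and pilots, 1937-1988 /$cVincent Terrace."
    )
    tag, indicators, subfields = parse_marc_line(title_field)
    print(tag, indicators)
    for code, value in subfields:
        print(f"  ${code} {value}")
```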
Difference between Museum and Library • Full Darwin Core has parallels in MARC • Many more commercial and custom products • Larger installed base • Library Entries somewhat more detailed • There is a MARC(XML) and MARC Lite • MARC differentiates among material types
Digital Content Transport • METS – Metadata Encoding and Transmission Standard • The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language.
METS Components • METS Header • Descriptive Metadata • Administrative Metadata • File Section - The file section lists all files containing content which comprise the electronic versions of the digital object. <file> elements may be grouped within <fileGrp> elements, to provide for subdividing the files by object version. • Structural Map • Structural Links • Behavior
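To make the file section concrete, here is a minimal sketch that builds a `<fileSec>` with two `<fileGrp>` elements using Python's standard `xml.etree.ElementTree`. The group names, file IDs, and URLs are invented for illustration. The result is only a fragment, since a complete METS document also needs at least a structural map.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_file_section(groups):
    """Build a <mets> element holding only a <fileSec>.

    `groups` maps a fileGrp USE label to a list of
    (file ID, MIME type, URL) tuples; all values here are hypothetical.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    for use, files in groups.items():
        group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": use})
        for file_id, mime_type, url in files:
            file_el = ET.SubElement(
                group, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mime_type}
            )
            ET.SubElement(
                file_el,
                f"{{{METS_NS}}}FLocat",
                {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
            )
    return mets


if __name__ == "__main__":
    mets = build_file_section({
        "master": [("FILE001", "image/tiff", "http://example.org/page1.tif")],
        "ocr": [("FILE002", "text/plain", "http://example.org/page1.txt")],
    })
    print(ET.tostring(mets, encoding="unicode"))
```

Grouping page images and OCR text into separate `<fileGrp>` elements is one way to subdivide files by object version, as the slide describes.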
I/O • Submission Information Package (SIP), which is sent from the information producer to the archive; • the Archive Information Package (AIP), which is the information package actually stored by the archive; and • the Dissemination Information Package (DIP), which is the information package transferred from the archive in response to a request by a consumer.
Open Archives Initiative Protocol for Metadata Harvesting • The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
OAI Verbs • GetRecord • Identify • ListIdentifiers • ListMetadataFormats • ListRecords • ListSets
GetRecord • http://arXiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
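Because every OAI-PMH request is a base URL plus a `verb` and a few arguments, a request can be assembled with the standard library alone. The following is a minimal sketch. `oai_request_url` is a hypothetical helper, and it only checks the verb name, not which arguments each verb requires.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return a GET URL for an OAI-PMH request with percent-encoded arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(
        oai_request_url(
            "http://arXiv.org/oai2",
            "GetRecord",
            identifier="oai:arXiv.org:cs/0112017",
            metadataPrefix="oai_dc",
        )
    )
```

The helper percent-encodes the `:` and `/` characters in the identifier, so its output differs in form from the unencoded URL on the slide while naming the same record.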
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
          <dc:description>With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users.</dc:description>
          <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
          <dc:date>2001-12-14</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
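A harvester would typically pull a few fields out of a response like the one above. The sketch below does this with `xml.etree.ElementTree` and the three namespaces used in the response. `SAMPLE_RESPONSE` is an abbreviated copy of that response, and `summarize_record` is a hypothetical helper that assumes a successful GetRecord reply and does no handling of OAI-PMH `<error>` elements.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the GetRecord response shown above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def summarize_record(xml_text: str) -> dict:
    """Extract identifier, sets, title, and creators from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    header = record.find("oai:header", NS)
    dublin_core = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "sets": [el.text for el in header.findall("oai:setSpec", NS)],
        "title": dublin_core.findtext("dc:title", namespaces=NS),
        "creators": [el.text for el in dublin_core.findall("dc:creator", NS)],
    }


if __name__ == "__main__":
    for key, value in summarize_record(SAMPLE_RESPONSE).items():
        print(f"{key}: {value}")
```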
Physical/Digital Objects Sharing • Books both part of an Edition and Unique • 20th century books have standard front matter • LMS contained Metadata Only • Journals indexed by article • Most digital content is commercially owned and born digital • 2011 author-publishing exceeded commercial • Born analog digitization (Google Books and BHL)
Governance • Libraries pay for OCLC • OCLC is Participatory • Close Collaboration with Library of Congress on Standards • School System exists to train librarians • Libraries are being cut in academic, public and school sectors