Using Metadata Standards in Digital Libraries: Implementing METS, MODS, PREMIS and MIX: Introduction. Rebecca Guenther, Library of Congress. LITA Standards IG Program, ALA Annual 2007
Program overview • Introduction to METS, MODS, PREMIS and MIX (Guenther) • Using METS and MODS for presentations of LC content (Cundiff, Trail) • Using METS in special collections at CDL (Tingle) • Creating rich shareable metadata: the DLF Aquifer MODS implementation guidelines (Shreeves) • METS, MODS and PREMIS, Oh My!: Integrating digital library standards for interoperability and preservation (Habing) • MODS as metadata hub (Olson)
Metadata standards in digital libraries • XML is the de facto standard for metadata descriptions on the Internet • Interoperability and object exchange require the use of established standards • Many digital objects are complex and consist of multiple files • Complex digital objects require many more forms of metadata than analog objects for their management and use: • Descriptive • Technical • Digital provenance/events • Structural • Rights/Terms and conditions
Descriptive metadata: MARCXML • Millions of rich descriptive records in MARC systems: can be reused in an XML environment using MARCXML • MARCXML uses the MARC data element set in an XML syntax • Allows interoperability with other XML schemes by taking advantage of free XML tools • Allows for collaborative use of metadata for access (e.g. OAI) • Provides continuity with current data and flexible transition options
MARCXML • A MARCXML record is the exact XML equivalent of a MARC (ISO 2709) record • Lossless round-trip conversion to and from a MARC 21 record • Simple, flexible XML schema that does not need to change when MARC 21 changes • Presentations using XML stylesheets • LC provides open-source converters • http://www.loc.gov/standards/marcxml • [Example slide: music record in MARCXML]
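To make the MARCXML structure concrete, here is a minimal Python sketch that builds a record using the leader, controlfield, datafield and subfield elements of the MARC21 "slim" namespace. The helper name build_marcxml and all field values (title, composer, control number) are invented for illustration and are not taken from the music record shown in the presentation.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def q(tag):
    """Qualify a tag name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{tag}"


def build_marcxml(leader, control_fields, data_fields):
    """Hypothetical helper: assemble one MARCXML <record> element.

    control_fields: list of (tag, value)
    data_fields:    list of (tag, ind1, ind2, [(subfield_code, value), ...])
    """
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, value in control_fields:
        ET.SubElement(record, q("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(
            record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return record


# Invented sample values; the 24-character leader marks notated music ("c").
record = build_marcxml(
    leader="00000ncm a2200000 a 4500",
    control_fields=[("001", "sample0001")],
    data_fields=[
        ("100", "1", " ", [("a", "Composer, Example")]),
        ("245", "1", "0", [("a", "Sample sonata")]),
    ],
)
marcxml_text = ET.tostring(record, encoding="unicode")
```

Because every MARC tag, indicator and subfield code is carried as an attribute value, the same few XML elements can hold any MARC 21 field, which is why the schema does not have to change when MARC 21 does.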
What is MODS? • Metadata Object Description Schema • An XML descriptive metadata standard • A derivative of MARC • Uses language based tags • Contains a subset of MARC data elements • Repackages elements to eliminate redundancies • MODS does not assume the use of any specific rules for description • Element set is particularly applicable to digital resources
Uses of MODS • Extension schema to METS • Rich description works well with hierarchical METS objects • To represent metadata for harvesting (OAI) • Language based tags are more user friendly • As a specified XML format for SRU • As a core element set for convergence between MARC and non-MARC XML descriptions • For original resource description in XML syntax that is simpler than full MARC
MODS high-level elements • Title Info • Name • Type of resource • Genre • Origin Info • Language • Physical description • Abstract • Table of contents • Target audience • Note • Subject • Classification • Related item • Identifier • Location • Access conditions • Part • Extension • Record Info • [Example slide: music record in MODS]
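As an illustration of the language-based tags, the following Python sketch builds a small MODS record from four of the high-level elements listed above (titleInfo, name, typeOfResource, originInfo). The helper name build_mods and the sample values are assumptions made for this example; a real record would normally carry many more elements.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Qualify a tag name with the MODS version 3 namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, personal_name, resource_type, date_issued):
    """Hypothetical helper: a minimal MODS record with four top-level elements."""
    mods = ET.Element(m("mods"))

    title_info = ET.SubElement(mods, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = title

    name = ET.SubElement(mods, m("name"), {"type": "personal"})
    ET.SubElement(name, m("namePart")).text = personal_name

    ET.SubElement(mods, m("typeOfResource")).text = resource_type

    origin_info = ET.SubElement(mods, m("originInfo"))
    ET.SubElement(origin_info, m("dateIssued")).text = date_issued
    return mods


# Invented sample values for illustration only.
mods_record = build_mods(
    title="Sample sonata",
    personal_name="Composer, Example",
    resource_type="notated music",
    date_issued="1900",
)
mods_text = ET.tostring(mods_record, encoding="unicode")
```

Compared with MARC tags such as 245 or 100, element names like titleInfo and namePart can be read without a tag table, which is the point of the "language based tags" bullet.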
MODS Development • Developed in 2002 through open listserv discussion among potential implementers (coordinated by LC) • Version 1 in late 2002; now in version 3.2 with 3.3 almost complete • Companion schema for authority metadata (MADS) in version 1.0 (2005) • Endorsed as METS extension schema for descriptive metadata section • Registered with NISO • Widely used in digital library projects • MODS Implementation registry: http://www.loc.gov/mods/registry.php
What is METS? • METS records the (possibly hierarchical) structure of digital objects, the names and locations of the files that make up those objects, and the associated metadata • A container for metadata and file pointers • A METS document may be a unit of storage or a transmission format • METS is extensible and modular, using “wrappers” or “sockets” where elements from other schemas can be plugged in • METS uses the XML Schema facility for combining vocabularies from different namespaces
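The "container for file pointers" idea can be sketched in Python as follows: a METS fileSec lists the files, and a structMap arranges them into a hierarchy of div elements that point back to the files through fptr. The helper name build_mets, the example.org URLs, the ID pattern and the TYPE values are assumptions for illustration; metadata sections are left out here to keep the focus on structure.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def t(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label, page_urls):
    """Hypothetical helper: one object, one image file per page."""
    mets = ET.Element(t("mets"), {"LABEL": label})

    # fileSec: names and locations of the files that make up the object.
    file_grp = ET.SubElement(ET.SubElement(mets, t("fileSec")), t("fileGrp"))

    # structMap: the (possibly hierarchical) structure of the object.
    struct_map = ET.SubElement(mets, t("structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(struct_map, t("div"), {"TYPE": "score", "LABEL": label})

    for order, url in enumerate(page_urls, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(
            file_grp, t("file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el, t("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url}
        )
        page_div = ET.SubElement(
            root_div, t("div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page_div, t("fptr"), {"FILEID": file_id})
    return mets


# Placeholder URLs; no such files are implied to exist.
mets_doc = build_mets(
    "Sample sonata",
    ["http://example.org/sonata/0001.tif", "http://example.org/sonata/0002.tif"],
)
mets_text = ET.tostring(mets_doc, encoding="unicode")
```

Each fptr refers to a file by ID rather than by location, so files can be moved or replaced by editing the fileSec alone while the structural map stays unchanged.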
What is PREMIS? • A data dictionary for metadata to support the long-term preservation of digital objects • A piece of the necessary infrastructure for implementing reliable, sustainable preservation programs • A supporting set of XML schemas for implementation in a variety of contexts • A maintenance activity hosted at LC including an Implementers’ Group and Editorial Committee
What is preservation metadata? • Preservation metadata content: • Provenance: Who has had custody/ownership of the digital object? • Authenticity: Is the digital object what it purports to be? • Preservation Activity: What has been done to preserve the digital object? • Technical Environment: What is needed to render and use the digital object? • Rights Management: What IPR must be observed? • Makes digital objects self-documenting across time: 10 years on, 50 years on, forever!
Guiding principles and assumptions • “Implementable, core, preservation metadata”: • “Preservation metadata”: maintain viability, renderability, understandability, authenticity, identity in a preservation context • “Core”: What most preservation repositories need to know to preserve digital materials over the long term • “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows • Implementation neutral: • No assumptions about specific implementation • Promote flexibility/interoperability • Focus on semantic units: what you need to know (implementation-neutral) vs. metadata elements: how you record it (implementation-specific) • Information that needs to be “recoverable” from the digital archiving system, independent of local implementation
Scope • What PREMIS is: • Common data model for organizing/thinking about preservation metadata • Guidance for local implementations • Standard for exchanging information packages between repositories • What PREMIS is not: • Out-of-the-box solution: need to instantiate as metadata elements in repository system • All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata • Lifecycle management of objects outside repository • Rights management: limited to permissions regarding actions taken within repository
PREMIS data model • Intellectual Entities • Rights • Agents • Objects • Events
Semantic units pertaining to Objects: technical metadata • objectIdentifier • preservationLevel • objectCategory • objectCharacteristics • creatingApplication • originalName • storage • environment • signatureInformation • relationship • linkingEventIdentifier • linkingIntellectualEntityIdentifier • linkingPermissionStatementIdentifier
Semantic units pertaining to Events: provenance and preservation activity • eventIdentifier • eventType • eventDateTime • eventDetail • eventOutcome • eventOutcomeDetail • linkingAgentIdentifier • linkingObjectIdentifier
Semantic units pertaining to Rights: terms and conditions • permissionStatement • permissionStatementIdentifier • relatedObject • grantingAgent • grantingAgreement • permissionGranted • act • restriction • termOfGrant • permissionNote
Semantic units pertaining to Agents • agentIdentifier • agentName • agentType
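The way the linking identifiers tie Objects, Events and Agents together can be sketched with plain Python dataclasses. This is a deliberately simplified model, not the PREMIS schema: field names follow the semantic units listed above, but identifiers are flattened to single strings, only a few units are included per entity, and the class PreservationObject plus the helpers record_event and events_for are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    agentIdentifier: str
    agentName: str
    agentType: str


@dataclass
class Event:
    eventIdentifier: str
    eventType: str
    eventDateTime: str
    eventOutcome: str
    linkingAgentIdentifier: List[str] = field(default_factory=list)
    linkingObjectIdentifier: List[str] = field(default_factory=list)


@dataclass
class PreservationObject:
    objectIdentifier: str
    objectCategory: str
    originalName: str
    linkingEventIdentifier: List[str] = field(default_factory=list)


def record_event(obj: PreservationObject, event: Event, agent: Agent) -> None:
    """Cross-link an event with the object it acted on and the agent responsible."""
    event.linkingObjectIdentifier.append(obj.objectIdentifier)
    event.linkingAgentIdentifier.append(agent.agentIdentifier)
    obj.linkingEventIdentifier.append(event.eventIdentifier)


def events_for(obj: PreservationObject, events: List[Event]) -> List[Event]:
    """Return the digital provenance trail recorded for one object."""
    return [e for e in events if e.eventIdentifier in obj.linkingEventIdentifier]


# Invented sample data.
scanner = Agent("agent-1", "Example ingest service", "software")
page = PreservationObject("obj-1", "file", "0001.tif")
check = Event("event-1", "fixity check", "2007-06-23T10:00:00", "success")
record_event(page, check, scanner)
trail = events_for(page, [check])
```

Keeping events as separate records that point at objects and agents, rather than as fields inside the object, lets the provenance trail grow over time without rewriting the object description.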
PREMIS maintenance activities • First revision of Data Dictionary (PREMIS 2.0) • Documenting errata and proposed revisions to Data Dictionary (feedback through PIG list) • http://www.loc.gov/standards/premis/changes.html • PREMIS Implementers’ Registry • http://www.loc.gov/standards/premis/premis-registry.html • Consultancies (funded by Library of Congress): • Rights issues for digital preservation (Karen Coyle) • PREMIS implementation guidelines and recommendations (Deborah Woodyard-Robinson) • PREMIS Tutorials: • Glasgow, Boston, Stockholm, Albuquerque, Washington
What is MIX? • Metadata for Images in XML • An XML Schema designed for expressing technical metadata for digital still images • Based on the NISO Z39.87 Data Dictionary – Technical Metadata for Digital Still Images • Used to express attributes of digital images such as file format, file size, dimensions, resolution, compression, etc. • Version 1.0 (recently released) includes support for GIS images and JPEG 2000 images; data element names harmonized with PREMIS • Can be used standalone or as an extension schema with METS
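To show the kind of attributes listed above, this Python sketch reads file format, file size, dimensions and bit depth from the header of a PNG byte stream. PNG is chosen only because its header is easy to parse with the standard library; the function name and the dictionary keys are hypothetical labels and are not MIX or Z39.87 element names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_technical_metadata(data: bytes) -> dict:
    """Hypothetical helper: extract basic technical attributes from PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")

    # The first chunk must be IHDR with a 13-byte payload.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")

    width, height, bit_depth, color_type, compression, _filter, _interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return {
        "format": "image/png",
        "fileSize": len(data),
        "width": width,
        "height": height,
        "bitDepth": bit_depth,
        "colorType": color_type,
        "compressionMethod": compression,
    }


# A synthetic 29-byte header (signature plus IHDR fields), not a complete image.
sample = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 640, 480, 8, 2, 0, 0, 0)
technical = png_technical_metadata(sample)
```

In a production workflow these values would be written into a MIX document by a characterization tool rather than hand-parsed; the sketch only shows that they can be derived automatically from the file itself.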
How do these standards work together for digital libraries? • A container format such as METS allows different forms of metadata to be packaged together with objects or pointers to objects • There are about five years of experience experimenting with METS in combination with other standards for managing and using digital objects in digital libraries • These standards are all freely available • METS profiles detail how METS is used for particular object types or applications • Best practices are needed (and are being developed) for the use of PREMIS with METS and MIX • Using METS, MODS, PREMIS and MIX: http://www.loc.gov/premis/louis.xml
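One way to picture the packaging is the Python sketch below, which places each standard in a METS metadata section: MODS in a dmdSec, MIX in a techMD (MDTYPE "NISOIMG") and PREMIS in a digiprovMD. The helper name wrap_metadata and the section IDs are assumptions, the MIX and PREMIS payloads are left empty, and the required structMap is omitted for brevity, so this is an outline of the layout rather than a valid METS document and is not a description of the linked example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def t(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def wrap_metadata(parent, section, section_id, mdtype, payload=None):
    """Hypothetical helper: add section/mdWrap/xmlData and plug in a payload."""
    sec = ET.SubElement(parent, t(section), {"ID": section_id})
    md_wrap = ET.SubElement(sec, t("mdWrap"), {"MDTYPE": mdtype})
    xml_data = ET.SubElement(md_wrap, t("xmlData"))
    if payload is not None:
        xml_data.append(payload)
    return sec


# Minimal descriptive payload with an invented title.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "Sample sonata"

package = ET.Element(t("mets"))
wrap_metadata(package, "dmdSec", "DMD1", "MODS", mods)    # descriptive
amd_sec = ET.SubElement(package, t("amdSec"))
wrap_metadata(amd_sec, "techMD", "TECH1", "NISOIMG")      # MIX goes here
wrap_metadata(amd_sec, "digiprovMD", "PROV1", "PREMIS")   # PREMIS events go here

package_text = ET.tostring(package, encoding="unicode")
```

Which PREMIS entities belong in which METS section is exactly the kind of question the best-practice work mentioned above is meant to settle, so the placement shown here is one plausible arrangement only.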