Implementation of PREMIS in METS. Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress rgue@loc.gov PREMIS Implementation Fair San Francisco, CA October 7, 2009.
• METS records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata
• A METS document may be a unit of storage (e.g. OAIS AIP) or a transmission format (e.g. OAIS SIP or DIP)
• METS is extensible and modular
• METS uses the XML Schema facility for combining vocabularies from different namespaces
• The METS Editorial Board has endorsed PREMIS as an extension schema
• Many institutions are trying to use PREMIS within the METS context
[Figure: OAIS, METS and PREMIS. Diagram relating the OAIS Archival Information Package (Content, Descriptive, Packaging and Preservation Description Information) to METS sections (<dmdSec> with MODS, MARCXML or DC; <amdSec> with <techMD>, <rightsMD>, <digiprovMD> and <sourceMD>; <fileGrp>/<file>; <structMap>) and to extension schemas (premis:object in <techMD>, premis:event in <digiprovMD>, premis:rights or metsRights in <rightsMD>, textMD and MIX for file formats). Legend: OAIS in black, METS primary schema in red, extension schemas in blue italics.]
METS extension schemas
• “Wrappers” or “sockets” where elements from other schemas can be plugged in
• Provide extensibility
• Use the XML Schema facility for combining vocabularies from different namespaces
• Endorsed extension schemas:
  • Descriptive: MODS, DC, MARCXML
  • Technical metadata: MIX (image); textMD (text)
  • Preservation related: PREMIS
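To make the “socket” idea concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree to plug a PREMIS <object> into a METS <techMD> through <mdWrap>/<xmlData>, so that two namespaces coexist in one document. It assumes the PREMIS 2.x namespace URI (info:lc/xmlns/premis-v2); the helper names are hypothetical, and the object is deliberately partial (only <size> is filled in).

```python
import xml.etree.ElementTree as ET

# METS namespace; the PREMIS URI assumes PREMIS 2.x and may differ in other versions.
METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def mets(tag: str) -> str:
    """Qualified name in the METS namespace (ElementTree '{uri}local' form)."""
    return f"{{{METS_NS}}}{tag}"


def premis(tag: str) -> str:
    """Qualified name in the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def wrap_premis_object(md_id: str, size: int) -> ET.Element:
    """Plug a (partial) PREMIS <object> into a METS <techMD> via <mdWrap>/<xmlData>."""
    tech_md = ET.Element(mets("techMD"), ID=md_id)
    md_wrap = ET.SubElement(tech_md, mets("mdWrap"), MDTYPE="PREMIS")
    xml_data = ET.SubElement(md_wrap, mets("xmlData"))
    obj = ET.SubElement(xml_data, premis("object"))
    chars = ET.SubElement(obj, premis("objectCharacteristics"))
    ET.SubElement(chars, premis("size")).text = str(size)
    return tech_md


print(ET.tostring(wrap_premis_object("TMD1PREMIS", 184302), encoding="unicode"))
```

The METS elements and the PREMIS elements keep their own prefixes in the serialized output, which is what allows a METS validator and a PREMIS validator to each check their part.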
Why do we need guidelines for using PREMIS with METS?
• The contents of each information package may vary depending on its function within a repository
• Need to determine how to include representation metadata and associate it with package components
• PREMIS data entities (objects, events, rights, agents) do not map perfectly to the METS categories for administrative metadata (techMD, digiprovMD, rightsMD, sourceMD)
• There are redundant elements between the two standards
• Both have extensibility mechanisms
• The flexibility of both standards requires implementation choices
Development of Guidelines for Using PREMIS with METS for Exchange
• PREMIS in METS Guidelines Working Group
  • Consists of PREMIS and METS experts
  • Focuses on the METS document as a mechanism for exchanging digital objects and their metadata (SIP or DIP)
  • Facilitates communication when internal requirements and technical environments vary
  • Tension between flexibility and being prescriptive enough to facilitate interoperability
• Consider usage scenarios
  • A SIP may be unwrapped and stored in different structures
  • A DIP is converted from internal structures to PREMIS
  • A more liberal approach is possible for a SIP than for a DIP
• Establishing guidelines, a METS profile, and examples: http://www.loc.gov/standards/premis/guidelines-premismets.pdf
Implementation issues in using PREMIS with METS
• Location of PREMIS metadata within METS documents
• Whether to record elements redundantly if they occur in both PREMIS and METS
• Relationship of the different structural metadata mechanisms in PREMIS and METS
• How to record PREMIS Agent entities in METS documents
• Use of identifiers to link elements in PREMIS and METS
• How to record elements that are also part of a format-specific technical metadata schema (e.g. MIX)
Some recommendations from the Guidelines
• METS sections
  • Use Object in techMD or digiprovMD
  • Use Event in digiprovMD
  • Use Rights in rightsMD
  • Use Agent in digiprovMD or rightsMD
• PREMIS container: use only if keeping all PREMIS metadata together; do not use it if separating PREMIS metadata into different amdSec subelements
• PREMIS and METS redundancies: choosing which options to use is an implementation decision and should be documented in the profile, e.g. METS <file> element attributes versus subelements of <objectCharacteristics> in PREMIS
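The section placements recommended above can be expressed as a small lookup table. The following Python sketch (the table layout, the “premis” container label and the audit function are hypothetical, not part of the Guidelines) checks a list of (PREMIS entity, METS section) placements against those recommendations, including the rule that the PREMIS container should not be combined with a split layout.

```python
# Recommended METS <amdSec> subsections for each PREMIS entity, per the Guidelines.
RECOMMENDED: dict[str, set[str]] = {
    "object": {"techMD", "digiprovMD"},
    "event": {"digiprovMD"},
    "rights": {"rightsMD"},
    "agent": {"digiprovMD", "rightsMD"},
}

# Hypothetical label for the <premis:premis> container in a placement list.
CONTAINER = "premis"


def audit(placements: list[tuple[str, str]]) -> list[str]:
    """Check (PREMIS entity, METS section) placements against the recommendations."""
    problems = []
    for entity, section in placements:
        if entity == CONTAINER:
            continue  # the container is handled by the split-layout check below
        allowed = RECOMMENDED.get(entity)
        if allowed is None:
            problems.append(f"unknown PREMIS entity: {entity}")
        elif section not in allowed:
            problems.append(f"{entity} in {section}; expected one of {sorted(allowed)}")
    # The container is for keeping all PREMIS metadata together, not for split layouts.
    sections_used = {section for _, section in placements}
    if any(entity == CONTAINER for entity, _ in placements) and len(sections_used) > 1:
        problems.append("premis container used while PREMIS metadata is split across sections")
    return problems


print(audit([("object", "techMD"), ("event", "digiprovMD"), ("agent", "digiprovMD")]))  # []
print(audit([("event", "techMD")]))  # ["event in techMD; expected one of ['digiprovMD']"]
```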
Recommendations (cont.)
• Structural relationship elements: use the METS structMap to record structural relationships; use PREMIS relationship elements to record preservation and derivation relationships, and structural relationships if desired
• ID/IDREF and PREMIS identifier elements: use the METS ID/IDREF mechanisms; best practices for using these mechanisms apply
• Use the PREMIS extensibility mechanism for format-specific technical metadata
• Document decisions in METS profiles
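As an illustration of the METS ID/IDREF mechanism, the following Python sketch resolves the whitespace-separated IDREFS in a <file> element's ADMID attribute to the administrative metadata sections they point to. The skeletal METS fragment and the resolve_admid helper are hypothetical; a dangling IDREF raises KeyError instead of being silently ignored.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, skeletal METS fragment: one file whose ADMID points at two amdSec subsections.
SAMPLE = f"""<mets:mets xmlns:mets="{METS_NS}">
  <mets:amdSec>
    <mets:techMD ID="TMD1PREMIS"/>
    <mets:digiprovMD ID="DP1EVENT"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FID1" ADMID="TMD1PREMIS DP1EVENT"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def resolve_admid(root: ET.Element, file_id: str) -> dict[str, str]:
    """Map each IDREF in a <file>'s ADMID attribute to the local name of its target element."""
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}
    file_el = by_id[file_id]
    resolved = {}
    for ref in file_el.get("ADMID", "").split():  # IDREFS: whitespace-separated list
        target = by_id.get(ref)
        if target is None:
            raise KeyError(f"dangling IDREF {ref!r} on {file_id}")
        resolved[ref] = target.tag.split("}", 1)[-1]  # drop the "{namespace}" part
    return resolved


print(resolve_admid(ET.fromstring(SAMPLE), "FID1"))
# {'TMD1PREMIS': 'techMD', 'DP1EVENT': 'digiprovMD'}
```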
    <fileSec><fileGrp>
      <file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT"
            CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1">
        <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
      </file>
    </fileGrp></fileSec>

    <techMD ID="TMD1PREMIS">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:object>
            <premis:objectCharacteristics>
              <premis:fixity>
                <premis:messageDigestAlgorithm>SHA-1</premis:messageDigestAlgorithm>
                <premis:messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</premis:messageDigest>
                <premis:messageDigestOriginator>EchoDep</premis:messageDigestOriginator>
              </premis:fixity>
              <premis:size>184302</premis:size>
            </premis:objectCharacteristics>
          </premis:object>
        </xmlData>
      </mdWrap>
    </techMD>

Elements defined in both METS and PREMIS:
• METS: CHECKSUM and CHECKSUMTYPE attributes of <file>; not repeatable
• PREMIS: <fixity> also includes messageDigestOriginator and is repeatable
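When an implementation records size and fixity redundantly in METS and PREMIS, the two copies can drift apart. The following Python sketch (helper names hypothetical) compares the METS <file> SIZE, CHECKSUM and CHECKSUMTYPE attributes with PREMIS <size> and <fixity> values that have already been extracted into plain Python values, and includes a hashlib helper for recomputing a SHA-1 digest from the file bytes. It assumes both standards spell the algorithm name the same way (here “SHA-1”).

```python
import hashlib
from typing import Optional


def sha1_hex(data: bytes) -> str:
    """Recompute a SHA-1 digest, e.g. from the bytes of BXF22.JPG."""
    return hashlib.sha1(data).hexdigest()


def redundancy_conflicts(mets_attrs: dict[str, str],
                         premis_fixity: list[tuple[str, str]],
                         premis_size: Optional[int]) -> list[str]:
    """Compare METS <file> attributes with values taken from PREMIS.

    mets_attrs:    SIZE, CHECKSUM, CHECKSUMTYPE attributes of the METS <file>.
    premis_fixity: (messageDigestAlgorithm, messageDigest) pairs; PREMIS allows several.
    premis_size:   content of PREMIS <size>, or None if absent.
    """
    conflicts = []
    if "SIZE" in mets_attrs and premis_size is not None:
        if int(mets_attrs["SIZE"]) != premis_size:
            conflicts.append("METS SIZE differs from PREMIS <size>")
    algorithm = mets_attrs.get("CHECKSUMTYPE")
    checksum = mets_attrs.get("CHECKSUM", "").lower()
    if algorithm and checksum:
        # METS holds one checksum; compare it only with PREMIS digests of the same algorithm.
        digests = [d.strip().lower() for a, d in premis_fixity if a.strip() == algorithm]
        if digests and checksum not in digests:
            conflicts.append(f"METS {algorithm} CHECKSUM differs from PREMIS <messageDigest>")
    return conflicts


attrs = {"SIZE": "184302", "CHECKSUMTYPE": "SHA-1",
         "CHECKSUM": "4638bc65c5b9715557d09ad373eefd147382ecbf"}
fixity = [("SHA-1", "4638bc65c5b9715557d09ad373eefd147382ecbf")]
print(redundancy_conflicts(attrs, fixity, 184302))  # []
```

Because PREMIS <fixity> is repeatable and the METS attributes are not, the check only compares digests produced by the same algorithm; extra PREMIS digests for other algorithms are not treated as conflicts.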
    <fileSec><fileGrp>
      <file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
      </file>
    </fileGrp></fileSec>

    <techMD ID="TMD1PREMIS">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:object>
            <premis:objectCharacteristics>
              <premis:format>
                <premis:formatDesignation>
                  <premis:formatName>image/jpeg</premis:formatName>
                  <premis:formatVersion>1.02</premis:formatVersion>
                </premis:formatDesignation>
              </premis:format>
            </premis:objectCharacteristics>
          </premis:object>
        </xmlData>
      </mdWrap>
    </techMD>

Elements defined in both METS and PREMIS:
• METS: MIMETYPE attribute of <file>; optional
• PREMIS: <format> is more granular and includes name and version (although the name may be the MIME type); mandatory
    <fileSec><fileGrp>
      <file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT">
        <!-- ... -->
      </file>
    </fileGrp></fileSec>

    <techMD ID="TMD1PREMIS">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:object>
            <!-- ... -->
            <premis:linkingEventIdentifier>
              <premis:linkingEventIdentifierType>ECHODEP Hub Event</premis:linkingEventIdentifierType>
              <premis:linkingEventIdentifierValue>echo12345</premis:linkingEventIdentifierValue>
            </premis:linkingEventIdentifier>
          </premis:object>
        </xmlData>
      </mdWrap>
    </techMD>

    <digiprovMD ID="DP1EVENT">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:event>
            <premis:eventIdentifier>
              <premis:eventIdentifierType>ECHODEP Hub Event</premis:eventIdentifierType>
              <premis:eventIdentifierValue>echo12345</premis:eventIdentifierValue>
            </premis:eventIdentifier>
            <premis:eventType>ingestion</premis:eventType>
            <premis:eventDateTime>2006-05-02T15:12:53</premis:eventDateTime>
          </premis:event>
        </xmlData>
      </mdWrap>
    </digiprovMD>

Elements defined in both METS and PREMIS:
• METS ID/IDREF: used to associate metadata in different sections and for different files
• PREMIS identifiers: explicit linking between entity types
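PREMIS identifiers link entities explicitly, independently of the METS IDREFs. The following Python sketch indexes PREMIS <event> elements by (eventIdentifierType, eventIdentifierValue) and resolves an object's <linkingEventIdentifier> entries against that index. It assumes the PREMIS 2.x namespace URI and, for brevity, wraps the object and the event in a hypothetical <root> element instead of separate METS <techMD> and <digiprovMD> sections; surrounding whitespace in identifier values is stripped before matching.

```python
import xml.etree.ElementTree as ET

P = "{info:lc/xmlns/premis-v2}"  # assumed PREMIS 2.x namespace, ElementTree '{uri}' form

# Hypothetical <root> wrapper; in METS these would sit in separate techMD/digiprovMD sections.
SAMPLE = """<root xmlns:premis="info:lc/xmlns/premis-v2">
  <premis:object>
    <premis:linkingEventIdentifier>
      <premis:linkingEventIdentifierType>ECHODEP Hub Event</premis:linkingEventIdentifierType>
      <premis:linkingEventIdentifierValue>echo12345</premis:linkingEventIdentifierValue>
    </premis:linkingEventIdentifier>
  </premis:object>
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierType>ECHODEP Hub Event</premis:eventIdentifierType>
      <premis:eventIdentifierValue>echo12345</premis:eventIdentifierValue>
    </premis:eventIdentifier>
    <premis:eventType>ingestion</premis:eventType>
    <premis:eventDateTime>2006-05-02T15:12:53</premis:eventDateTime>
  </premis:event>
</root>"""


def _key(el: ET.Element, type_tag: str, value_tag: str) -> tuple[str, str]:
    """(type, value) identifier pair, with surrounding whitespace stripped."""
    return (el.findtext(P + type_tag, "").strip(), el.findtext(P + value_tag, "").strip())


def event_index(root: ET.Element) -> dict[tuple[str, str], ET.Element]:
    """Index every PREMIS <event> by its eventIdentifier (type, value)."""
    index = {}
    for event in root.iter(P + "event"):
        ident = event.find(P + "eventIdentifier")
        if ident is not None:
            index[_key(ident, "eventIdentifierType", "eventIdentifierValue")] = event
    return index


def linked_events(obj: ET.Element, index: dict) -> list:
    """Resolve an object's linkingEventIdentifiers; None marks an unresolved link."""
    return [index.get(_key(link, "linkingEventIdentifierType", "linkingEventIdentifierValue"))
            for link in obj.iter(P + "linkingEventIdentifier")]


root = ET.fromstring(SAMPLE)
events = linked_events(root.find(P + "object"), event_index(root))
print([e.findtext(P + "eventType") for e in events])  # ['ingestion']
```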
    <structMap TYPE="physical">
      <div ORDER="1" TYPE="text">
        <fptr FILEID="FID9"/>
        <div ORDER="1" TYPE="page" LABEL="Page [1]">
          <fptr FILEID="FID1"/>
        </div>
        <div ORDER="2" TYPE="page" LABEL="Page [2]">
          <fptr FILEID="FID2"/>
        </div>
      </div>
    </structMap>

    <premis:relationship>
      <premis:relationshipType>structural</premis:relationshipType>
      <premis:relationshipSubType>is sibling of</premis:relationshipSubType>
      <premis:relatedObjectIdentification>
        <premis:relatedObjectIdentifierType>UCB</premis:relatedObjectIdentifierType>
        <premis:relatedObjectIdentifierValue>FID2</premis:relatedObjectIdentifierValue>
        <premis:relatedObjectSequence>1</premis:relatedObjectSequence>
      </premis:relatedObjectIdentification>
    </premis:relationship>

Elements defined in both METS and PREMIS:
• METS: structMap details structural relationships and is the heart of the METS document
  • It is hierarchical, so it may be more expressive than the PREMIS semantic units
  • It links the elements of the structure to content files and metadata
• PREMIS: <relationship> details all kinds of relationships, including structural ones
  • The data dictionary says that implementations may record structural relationships by other means
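Because the structMap is hierarchical, some PREMIS structural relationships can be derived from it rather than recorded by hand. The following Python sketch, assuming namespace-free element names for brevity (real METS documents use the METS namespace), pairs files held in sibling <div>s and builds a PREMIS-style “is sibling of” <relationship> for one of them. The function names are hypothetical, the identifier type “UCB” is carried over from the example, and relatedObjectSequence is not derived.

```python
import xml.etree.ElementTree as ET

# The structMap from the example, without namespaces for brevity.
STRUCT_MAP = """<structMap TYPE="physical">
  <div ORDER="1" TYPE="text">
    <fptr FILEID="FID9"/>
    <div ORDER="1" TYPE="page" LABEL="Page [1]"><fptr FILEID="FID1"/></div>
    <div ORDER="2" TYPE="page" LABEL="Page [2]"><fptr FILEID="FID2"/></div>
  </div>
</structMap>"""


def sibling_pairs(struct_map: ET.Element) -> list[tuple[str, str]]:
    """(file, sibling file) pairs for files held in sibling <div>s of the same parent."""
    pairs = []
    for parent in struct_map.iter("div"):
        children = parent.findall("div")
        for child in children:
            for other in children:
                if other is child:
                    continue
                for own in child.findall("fptr"):
                    for theirs in other.findall("fptr"):
                        pairs.append((own.get("FILEID"), theirs.get("FILEID")))
    return pairs


def sibling_relationship(related_id: str, id_type: str = "UCB") -> ET.Element:
    """Build a PREMIS-style structural <relationship> (namespace omitted for brevity)."""
    rel = ET.Element("relationship")
    ET.SubElement(rel, "relationshipType").text = "structural"
    ET.SubElement(rel, "relationshipSubType").text = "is sibling of"
    ident = ET.SubElement(rel, "relatedObjectIdentification")
    ET.SubElement(ident, "relatedObjectIdentifierType").text = id_type
    ET.SubElement(ident, "relatedObjectIdentifierValue").text = related_id
    return rel


pairs = sibling_pairs(ET.fromstring(STRUCT_MAP))
print(pairs)  # [('FID1', 'FID2'), ('FID2', 'FID1')]
print(ET.tostring(sibling_relationship(pairs[0][1]), encoding="unicode"))
```

FID9 hangs directly off the “text” <div> rather than from a child <div>, so this particular derivation gives it no sibling pairs; a profile would need to say how such files should be related.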
Some METS profiles with PREMIS
• UCSD simple and complex object
• UC Berkeley
• ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability
• LC Profile for Recorded Events
• Australian METS Profile
• TIPR
• … many others
Additional changes to Guidelines
• Make the extensibility mechanism consistent with METS
  • significantPropertiesExtension
  • objectCharacteristicsExtension
  • creatingApplicationExtension
  • environmentExtension
  • signatureInformationExtension
  • eventOutcomeDetailExtension
  • rightsExtension
Additional changes to Guidelines (cont.)
• Add the same elements and attributes as in METS to the PREMIS extension elements in the schema and data dictionary
  • mdRef, mdWrap
  • binData, xmlData
  • Attributes: ID, LABEL, MDTYPE, MIMETYPE, SIZE, CREATED, CHECKSUM, CHECKSUMTYPE
• Allow a URI or string for MDTYPE
• Add use cases/examples to illustrate choices made
• Clarify structural relationships
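The PREMIS extension elements act as containers for metadata from another namespace, which is how format-specific technical metadata can travel inside PREMIS. The following Python sketch places a format-specific record inside <objectCharacteristicsExtension>. The namespace http://example.org/hypothetical-image-md and its elements and values are hypothetical stand-ins for a real schema such as MIX, the PREMIS namespace assumes version 2.x, the <objectCharacteristics> is partial, and the METS-style mdWrap/xmlData subelements proposed above are not modelled.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed PREMIS 2.x namespace
# Hypothetical namespace standing in for a format-specific schema such as MIX.
TECH_NS = "http://example.org/hypothetical-image-md"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("img", TECH_NS)


def q(ns: str, tag: str) -> str:
    """ElementTree '{uri}local' qualified name."""
    return f"{{{ns}}}{tag}"


def add_characteristics_extension(chars: ET.Element, foreign: ET.Element) -> ET.Element:
    """Append foreign-namespace metadata inside <objectCharacteristicsExtension>."""
    ext = ET.SubElement(chars, q(PREMIS_NS, "objectCharacteristicsExtension"))
    ext.append(foreign)
    return ext


chars = ET.Element(q(PREMIS_NS, "objectCharacteristics"))
ET.SubElement(chars, q(PREMIS_NS, "size")).text = "184302"
image_md = ET.Element(q(TECH_NS, "imageMetadata"))              # hypothetical element
ET.SubElement(image_md, q(TECH_NS, "colorSpace")).text = "RGB"  # illustrative value
add_characteristics_extension(chars, image_md)
print(ET.tostring(chars, encoding="unicode"))
```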
Implementing an Exchange Standard
• PREMIS Implementation Tool
• Some tools are documented on the PREMIS website: http://www.loc.gov/standards/premis/tools_for_premis.php
• PiM tool developed by the Florida Center for Library Automation
• Further work to generate metadata from digital files in PREMIS elements