PREMIS at the British Library
Markus Enders, The British Library
PREMIS Implementation Fair, San Francisco, CA, 07 October 2009
General
• Archival Information Package (AIP)
• AIP is just a conceptual entity
• Conceptual (generic) data model
• Content files stored on write-once media
• Content files may be containerized (stored in ZIP or WARC files): one or more containers per AIP; files in containers may belong to various AIPs
• AIP Descriptor: METS file describes the content of the AIP (structure, files, descriptive metadata, preservation metadata)
• Different METS profiles for different content streams: eJournals, newspapers (born digital and digitized), web archiving
• Common underlying document model for all AIPs
METS Descriptor
• What is stored in the METS Descriptor?
• Structure of the document (logical and physical in different structMaps); not all content streams have two structMaps (born-digital streams have only one)
• Descriptive metadata
• File Section: defines container files as well as content files (nested <file> elements)
• Preservation metadata: preservation metadata for files and representations
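The nested <file> arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the British Library's actual descriptor: the IDs, the MIME type and the storage URL are hypothetical, and only the METS and XLink namespaces are taken from the standards.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Return a METS-qualified tag name for ElementTree."""
    return f"{{{METS_NS}}}{tag}"


def build_file_sec(container_id, content_ids):
    """Build a fileSec with one container file and nested content files."""
    file_sec = ET.Element(m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"))
    # The container (e.g. a ZIP) is itself a <file> with a location.
    container = ET.SubElement(
        group, m("file"), ID=container_id, MIMETYPE="application/zip"
    )
    location = ET.SubElement(container, m("FLocat"), LOCTYPE="URL")
    # Hypothetical storage URL, for illustration only.
    location.set(f"{{{XLINK_NS}}}href", f"file:///store/{container_id}.zip")
    # Content files are nested <file> elements inside the container.
    for content_id in content_ids:
        ET.SubElement(container, m("file"), ID=content_id)
    return file_sec


file_sec = build_file_sec("container-1", ["file-1", "file-2"])
print(ET.tostring(file_sec, encoding="unicode"))
```

FLocat is written before the nested <file> elements because the METS schema expects that order inside a <file>.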
METS Descriptor
• What is stored in the METS Descriptor?
• Preservation metadata: preservation metadata for files and representations
• Focusses on:
  • Audit trail: events and agents
  • Technical metadata: basic technical metadata in METS and PREMIS
• Assumption: future migrations of files necessary; no emulation considered; no environment information stored
(Diagram labels: <mets:file> elements, <mets:div> elements)
Preservation Metadata (PREMIS) in METS
• Content streams:
  • eJournals use PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
  • Newspapers use PREMIS 2.0; MODS 3.3; METS 1.8
  • Web Archiving uses PREMIS 2.0; MODS 3.3; DC; METS 1.8
Preservation Metadata (PREMIS): eJournal content stream
• Content streams: eJournals use PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
• AIP model: one AIP per article, issue, journal, digital manifestation
• Any changes will lead to a new AIP; the old version of the AIP is referenced
• Journal, Issue, Article: the AIP consists of just a METS descriptor (mainly embedded descriptive metadata (MODS) and preservation metadata)
  • PREMIS: regarded as representations of intellectual entities
  • Relationships between representations are recorded in the MODS record
• Digital Manifestation: the AIP consists of content files and a METS descriptor. The METS descriptor contains PREMIS records for the files and one for the Digital Manifestation itself
  • Relationship to the article is recorded in the PREMIS record (manifestationOf)
  • Relationship to the submission is recorded in PREMIS (containedInSubmission)
• Submission: received content files in ZIP (one AIP)
Preservation Metadata (PREMIS) and METS: eJournal content stream
• Content streams: eJournals use PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
• amdSec:
  • One amdSec per PREMIS record; referenced from <mets:file> and <mets:div> elements
  • Use of <premis:object>, <premis:agent>, <premis:event> elements
• techMD:
  • Extracted data from JHOVE (files)
  • PREMIS record of a file
• digiprovMD:
  • PREMIS record of representations (journal, issue, article)
  • PREMIS record of a file
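To make the referencing concrete, here is a minimal Python sketch that follows the ADMID attribute of each <mets:file> to its amdSec and lists the sections found there. The sample descriptor, its IDs and the function name are assumptions made for illustration; the PREMIS and JHOVE payloads inside the sections are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}

# Hypothetical, heavily abbreviated descriptor: one amdSec for one file.
SAMPLE = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="amd-file-1">
    <mets:techMD ID="tech-file-1"/>
    <mets:digiprovMD ID="prov-file-1"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="file-1" ADMID="amd-file-1"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def admin_sections_by_file(root):
    """Map each file ID to the section names of the amdSecs it references."""
    amd_by_id = {amd.get("ID"): amd for amd in root.findall("mets:amdSec", NS)}
    result = {}
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        sections = []
        # ADMID is a whitespace-separated list of ID references.
        for ref in file_el.get("ADMID", "").split():
            amd = amd_by_id.get(ref)
            if amd is not None:
                sections.extend(child.tag.split("}")[1] for child in amd)
        result[file_el.get("ID")] = sections
    return result


print(admin_sections_by_file(ET.fromstring(SAMPLE)))
```

The same lookup would apply to <mets:div> elements, which carry ADMID in the same way.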
Preservation Metadata (PREMIS) and METS: eJournal content stream
• Content streams: eJournals use PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
• PREMIS elements used:
  • objectIdentifier
  • objectCategory
  • preservationLevel
  • size
  • fixity (MD5, SHA-512)
  • format (PRONOM)
  • Relationships, events and agents where necessary
• A brace on the slide marks part of this list as stored redundantly in the METS <file> element (the extent of the brace is not recoverable here)
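The size and fixity values listed above are straightforward to compute. The following is a minimal sketch using Python's standard hashlib, assuming the content is read from a binary stream; the function name and the shape of the returned dictionary are illustrative choices, not the British Library's implementation.

```python
import hashlib
import io


def fixity_for_stream(stream, chunk_size=65536):
    """Compute size plus MD5 and SHA-512 digests in a single pass."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    size = 0
    # Read in chunks so large content files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha512.update(chunk)
        size += len(chunk)
    return {
        "size": size,
        "fixity": {"MD5": md5.hexdigest(), "SHA-512": sha512.hexdigest()},
    }


info = fixity_for_stream(io.BytesIO(b"example content"))
print(info["size"], info["fixity"]["MD5"])
```

Computing both digests in one pass avoids reading each content file twice.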
Preservation Metadata (PREMIS): relationships
• PREMIS relationships:
  • manifestationOf (between Manifestation and Article)
  • containedInSubmission (between Manifestation and Submission)
• PREMIS relationships (between files: m-n relationships):
  • migration
  • uncompression
  • modification
• Relationships are always stored in <digiprovMD>
• PREMIS records for files will have techMD and digiprovMD
Preservation Metadata (PREMIS): events
• PREMIS events (on file level):
  • integrityCheck
  • formatIdentification
  • validation
  • wellformness
  • propertyExtraction
• PREMIS events (on representation level):
  • metadataUpdate
• Relationships are always stored in <digiprovMD>
• PREMIS records for files will have techMD and digiprovMD
Preservation Metadata (PREMIS): events
• PREMIS events always have an agent
• Events and agents are stored in each PREMIS record:
  • If an event affects more than one object, it must be repeated in each object's PREMIS record
  • Using the same identifier indicates that it is the same event
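Because the same event is repeated in several PREMIS records, a consumer has to join the copies back together by identifier. A minimal sketch of that join follows; it assumes the records have already been parsed into plain dictionaries, and the dictionary layout and function name are hypothetical.

```python
from collections import defaultdict


def objects_by_event(premis_records):
    """Group object IDs by the event identifiers found in their records.

    premis_records maps an object ID to the list of events stored in that
    object's PREMIS record; each event is a dict with an 'eventIdentifier'.
    """
    affected = defaultdict(set)
    for object_id, events in premis_records.items():
        for event in events:
            # Equal identifiers mean one event, recorded once per object.
            affected[event["eventIdentifier"]].add(object_id)
    return {event_id: sorted(ids) for event_id, ids in affected.items()}


records = {
    "file-1": [{"eventIdentifier": "ev-1", "eventType": "integrityCheck"}],
    "file-2": [
        {"eventIdentifier": "ev-1", "eventType": "integrityCheck"},
        {"eventIdentifier": "ev-2", "eventType": "validation"},
    ],
}
print(objects_by_event(records))
```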
Preservation Metadata (PREMIS) in METS
• Content streams:
  • eJournals use PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE DTD
  • Newspapers use PREMIS 2.0; MODS 3.3; METS 1.8
  • Web Archiving uses PREMIS 2.0; MODS 3.3; DC; METS 1.8
• Move to PREMIS 2.0
• Changes to AIP model
AIPs and PREMIS 2.0
• Change of AIP:
  • Newspapers need a second structMap (and structLink)
  • Hierarchy of AIPs no longer possible
  • Instead: one AIP per issue
• Manifestations are modelled as a <fileGrp> (various manifestations per AIP possible)
• Support of container files (ZIP, WARC): modelled as nested <file> elements; no PREMIS record for container files
• No file-format-specific technical metadata is captured
METS and PREMIS 2.0
• METS and PREMIS 2.0:
  • Use of new METS schema versions: <mets:mdWrap MDTYPE="PREMIS:OBJECT">
  • <premis:object xsi:type="premis:file"> instead of objectCategory
  • Just use <digiprovMD>
  • Agent, object, event in separate <digiprovMD> elements within the same <amdSec>
  • PREMIS record should be self-contained
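The layout described above (object, event and agent each in their own <digiprovMD> inside one <amdSec>) can be sketched as follows. The ID scheme and function name are assumptions, and the PREMIS records are left empty, so the output illustrates the wrapping only and is not schema-valid PREMIS.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def m(tag):
    return f"{{{METS_NS}}}{tag}"


def p(tag):
    return f"{{{PREMIS_NS}}}{tag}"


def build_amd_sec(object_id):
    """Wrap object, event and agent in separate digiprovMD elements."""
    amd = ET.Element(m("amdSec"), ID=f"amd-{object_id}")
    for kind in ("object", "event", "agent"):
        prov = ET.SubElement(amd, m("digiprovMD"), ID=f"{kind}-{object_id}")
        # MDTYPE names the PREMIS entity carried by this wrapper.
        wrap = ET.SubElement(prov, m("mdWrap"), MDTYPE=f"PREMIS:{kind.upper()}")
        xml_data = ET.SubElement(wrap, m("xmlData"))
        record = ET.SubElement(xml_data, p(kind))
        if kind == "object":
            # PREMIS 2.0 expresses the object category through xsi:type.
            record.set(f"{{{XSI_NS}}}type", "premis:file")
    return amd


print(ET.tostring(build_amd_sec("file-1"), encoding="unicode"))
```

Registering the "premis" prefix matters here, because the xsi:type value refers to that prefix by name.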
METS and PREMIS 2.0
• Extended list of event types:
  • deselection: files which are defined in the AIP descriptor but never ingested (no FLocat element)
  • metadataExtraction vs. propertyExtraction
• Extended list of relationship types (relationshipSubType):
  • modification vs. manipulation
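The deselection case (a file defined in the descriptor but carrying no FLocat) can be detected mechanically. The sketch below makes one assumption that the slides do not confirm: a content file nested inside a container counts as located when the container itself has an FLocat. The sample fileSec and its IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
FILE = f"{{{METS_NS}}}file"
FILE_GRP = f"{{{METS_NS}}}fileGrp"
FLOCAT = f"{{{METS_NS}}}FLocat"

SAMPLE = """
<mets:fileSec xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileGrp>
    <mets:file ID="container-1">
      <mets:FLocat LOCTYPE="URL"/>
      <mets:file ID="file-1"/>
    </mets:file>
    <mets:file ID="file-2"><mets:FLocat LOCTYPE="URL"/></mets:file>
    <mets:file ID="file-3"/>
  </mets:fileGrp>
</mets:fileSec>
"""


def files_without_location(file_sec):
    """Return IDs of files with no FLocat on themselves or an enclosing file."""
    missing = []

    def walk(parent, located):
        for file_el in parent.findall(FILE):
            # Assumption: nested files inherit the container's location.
            has_location = located or file_el.find(FLOCAT) is not None
            if not has_location:
                missing.append(file_el.get("ID"))
            walk(file_el, has_location)

    for group in file_sec.iter(FILE_GRP):
        walk(group, False)
    return missing


print(files_without_location(ET.fromstring(SAMPLE)))
```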
METS and PREMIS 2.0
• Problems:
  • Validation
  • Using controlled vocabularies
  • Considering dependencies between METS and PREMIS
  • Standardized workflow for creating METS and PREMIS for all content streams
  • Currently specific implementations for each content stream
  • Extending the AIP Model
  • Preservation metadata for metadata records
Thanks
Markus Enders, The British Library
Markus.Enders@bl.uk