
PREMIS at the British Library

PREMIS at the British Library. Markus Enders, The British Library. PREMIS Implementation Fair, San Francisco, CA, 07 October 2009.


Presentation Transcript


  1. PREMIS at the British Library • Markus Enders, The British Library • PREMIS Implementation Fair, San Francisco, CA, 07 October 2009

  2. General • Archival Information Package (AIP) • AIP is just a conceptual entity • Conceptual (generic) data model • Content files stored on write-once media • Content files may be containerized (stored in ZIP or WARC files): one or more containers per AIP; files in a container may belong to various AIPs • AIP Descriptor: a METS file describes the content of the AIP (structure, files, descriptive metadata, preservation metadata) • Different METS profiles for different content streams: eJournals, newspapers (born digital and digitized), web archiving • Common underlying document model for all AIPs

  3. METS Descriptor • What is stored in the METS Descriptor? • Structure of the document (logical and physical in different structMaps); not all content streams have two structMaps (born-digital streams have only one) • Descriptive metadata • File Section: defines container files as well as content files (nested <file> elements)

  4. METS Descriptor • What is stored in the METS Descriptor? • Structure of the document (logical and physical in different structMaps); not all content streams have two structMaps (born-digital streams have only one) • Descriptive metadata • File Section: defines container files as well as content files (nested <file> elements) • Preservation metadata: preservation metadata for files and representations

  5. METS Descriptor • What is stored in the METS Descriptor? • Preservation metadata: preservation metadata for files and representations • Focusses on: • Audit trail – events and agents • Technical metadata – basic technical metadata in METS and PREMIS • Assumption: future migrations of files will be necessary; no emulation considered; no environment information stored • [Diagram labels: <mets:file> elements; <mets:div> elements]

  6. Preservation Metadata (PREMIS) in METS • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • Newspapers uses PREMIS 2.0; MODS 3.3; METS 1.8 • Web Archiving uses PREMIS 2.0; MODS 3.3; DC; METS 1.8

  7. Preservation Metadata (PREMIS): eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • AIP model: one AIP per article, issue, journal, digital manifestation • Any change will lead to a new AIP; the old version of the AIP is referenced

  8. Preservation Metadata (PREMIS): eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • AIP model: one AIP per article, issue, journal, digital manifestation • Journal, Issue, Article: the AIP consists just of a METS descriptor (mainly embedded descriptive metadata (MODS) and preservation metadata) • PREMIS: regarded as representations of intellectual entities • Relationships between representations are recorded in the MODS record

  9. Preservation Metadata (PREMIS): eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; jhove dtd • AIP model: one AIP per article, issue, journal, manifestation • Digital Manifestation: the AIP consists of content files and a METS descriptor. The METS descriptor contains PREMIS records for the files and one for the Digital Manifestation itself • Relationship to the article is recorded in the PREMIS record (manifestationOf) • Relationship to the submission is recorded in PREMIS (containedInSubmission) • Submission: received content files in a ZIP (one AIP)

  10. Preservation Metadata (PREMIS) and METS: eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • amdSec: • one amdSec per PREMIS record; referenced from <mets:file> and <mets:div> elements • Use of <premis:object>; <premis:agent>; <premis:event> elements • techMD: • Extracted data from JHOVE (files) • PREMIS record of a file • digiprovMD: • PREMIS record of representations (journal, issue, article) • PREMIS record of a file
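The "one amdSec per PREMIS record, referenced from <mets:file> and <mets:div>" layout can be sketched as a small lookup. This is only an illustration: it assumes the references use the standard METS ADMID attribute (the slide does not name the mechanism), and the sample descriptor, its IDs, and the helper name admin_sections are invented.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}

# Hypothetical, heavily trimmed descriptor; all IDs are invented for illustration.
SAMPLE = f"""
<mets:mets xmlns:mets="{METS_NS}">
  <mets:amdSec ID="amd-file-1">
    <mets:techMD ID="tech-1"/>
    <mets:digiprovMD ID="prov-1"/>
  </mets:amdSec>
  <mets:amdSec ID="amd-article-1">
    <mets:digiprovMD ID="prov-2"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="file-1" ADMID="amd-file-1"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="article" ADMID="amd-article-1"/>
  </mets:structMap>
</mets:mets>
""".strip()


def admin_sections(root):
    """Map each <mets:file>/<mets:div> to the section types inside the amdSec(s) it references."""
    by_id = {sec.get("ID"): sec for sec in root.iterfind("mets:amdSec", NS)}
    result = {}
    for el in root.iter():
        local_name = el.tag.split("}")[-1]
        if local_name in ("file", "div") and el.get("ADMID"):
            key = el.get("ID") or el.get("TYPE")
            sections = []
            # ADMID is a whitespace-separated list of IDs.
            for ref in el.get("ADMID").split():
                sections.extend(child.tag.split("}")[-1] for child in by_id[ref])
            result[key] = sections
    return result


root = ET.fromstring(SAMPLE)
resolved = admin_sections(root)
print(resolved)
```

In this sketch the file resolves to both a techMD and a digiprovMD, while the article-level div resolves to a digiprovMD only, mirroring the split described on the slide.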

  11. Preservation Metadata (PREMIS) and METS: eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • PREMIS elements used: • objectIdentifier • objectCategory • preservationLevel • size • fixity (MD5, SHA-512) • format (PRONOM) • Relationships, events and agents where necessary

  12. Preservation Metadata (PREMIS) and METS: eJournal content stream • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output • PREMIS elements used: • objectIdentifier • objectCategory • preservationLevel • size • fixity (MD5, SHA-512) • format (PRONOM) • Relationships, events and agents where necessary • Slide annotation (brace alongside the list): stored redundantly in the METS <file> element
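As a rough illustration of the size and fixity values listed above, the following sketch computes both digests with Python's standard hashlib. The function name describe_file and the dictionary layout are assumptions made for this example; only the key names messageDigestAlgorithm and messageDigest are borrowed from the PREMIS data dictionary, and nothing here reflects the British Library's actual tooling.

```python
import hashlib


def describe_file(data: bytes) -> dict:
    """Return size and fixity values for one content file (hypothetical layout)."""
    return {
        "size": len(data),  # size in bytes
        "fixity": [
            {
                "messageDigestAlgorithm": "MD5",
                "messageDigest": hashlib.md5(data).hexdigest(),
            },
            {
                "messageDigestAlgorithm": "SHA-512",
                "messageDigest": hashlib.sha512(data).hexdigest(),
            },
        ],
    }


record = describe_file(b"")
print(record["size"], record["fixity"][0]["messageDigest"])
```

Recording two independent digests means a later integrity check can still be cross-checked if one algorithm is no longer trusted.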

  13. Preservation Metadata (PREMIS): relationships • PREMIS relationships: • manifestationOf (between Manifestation and Article) • containedInSubmission (between Manifestation and Submission) • PREMIS relationships (between files: m-n relationships): • migration • uncompression • modification • Relationships are always stored in <digiprovMD> • PREMIS records for files will have techMD and digiprovMD

  14. Preservation Metadata (PREMIS): events • PREMIS events (on file level): • integrityCheck • formatIdentification • validation • wellformness • propertyExtraction • PREMIS events (on representation level): • metadataUpdate • Relationships are always stored in <digiprovMD> • PREMIS records for files will have techMD and digiprovMD

  15. Preservation Metadata (PREMIS): events • PREMIS events always have an agent • Events and agents are stored in each PREMIS record: • If an event affects more than one object, it must be repeated in each object's PREMIS record • Using the same identifier indicates that it is the same event
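The rule above (repeat the event in every affected object's record, and recognise it as one event by its identifier) can be sketched with plain data classes. All class names, field names, and identifiers below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    identifier: str   # shared identifier marks repeated copies as the same event
    event_type: str
    agent: str        # every event carries an agent


@dataclass
class ObjectRecord:
    object_id: str
    events: list = field(default_factory=list)


def record_event(event, records):
    """Repeat the event in the record of every object it affects."""
    for rec in records:
        rec.events.append(event)


def distinct_events(records):
    """Collapse repeated copies back to one event per identifier."""
    seen = {}
    for rec in records:
        for ev in rec.events:
            seen.setdefault(ev.identifier, ev)
    return list(seen.values())


source = ObjectRecord("file-old")
target = ObjectRecord("file-new")
migration = Event("event-0001", "migration", "agent-migration-tool")
record_event(migration, [source, target])
print(len(distinct_events([source, target])))
```

The cost of this design is duplication in the descriptors; the benefit is that each PREMIS record can be read without consulting any other record.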

  16. Preservation Metadata (PREMIS) in METS • Content streams: • eJournals uses PREMIS 1.1; MODS 3.2; METS 1.4; jhove dtd • Newspapers uses PREMIS 2.0; MODS 3.3; METS 1.8 • Web Archiving uses PREMIS 2.0; MODS 3.3; DC; METS 1.8 • Move to PREMIS 2.0 • Changes to AIP model

  17. AIPs and PREMIS 2.0 • Change of AIP: • Newspapers need second structMap (and structLink) • Hierarchy of AIPs no longer possible • Instead: one AIP per issue • Manifestations are modelled as a <fileGrp> • (various manifestations per AIP possible) • Support of container files (ZIP, WARC) • Modelled as nested <file> elements; no PREMIS record for container files • No file format specific technical metadata is captured
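The container modelling described above, with content files nested inside the container's <file> element, can be sketched with Python's xml.etree.ElementTree. The helper names and IDs are invented, and a real descriptor would carry further attributes and FLocat elements that are omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(name):
    """Qualify a local name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_file_grp(container_id, member_ids):
    """One <fileGrp> holding a container <file> with nested content <file> elements."""
    grp = ET.Element(q("fileGrp"))
    container = ET.SubElement(grp, q("file"), ID=container_id)
    for member_id in member_ids:
        ET.SubElement(container, q("file"), ID=member_id)
    return grp


def content_files(grp):
    """IDs of leaf <file> elements: the content files, as opposed to containers."""
    return [f.get("ID") for f in grp.iter(q("file")) if f.find(q("file")) is None]


grp = build_file_grp("zip-1", ["page-1", "page-2"])
print(ET.tostring(grp, encoding="unicode"))
print(content_files(grp))
```

Filtering for leaf elements reflects the slide's point that container files get no PREMIS record of their own; only the nested content files would be described.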

  18. METS and PREMIS 2.0 • METS and PREMIS 2.0: • Use of new METS schema versions: • <mets:mdWrap MDTYPE="PREMIS:OBJECT"> • <premis:object xsi:type="premis:file"> instead of objectCategory • Just use <digiprovMD> • Agent, object, event in separate <digiprovMD> elements within the same <amdSec> • PREMIS record should be self-contained
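A minimal sketch of the wrapping described above, built with xml.etree.ElementTree. The namespace URIs are the published METS and PREMIS 2.x ones; the function name wrap_premis_object and the identifier values are assumptions, and the resulting object is deliberately incomplete (a schema-valid PREMIS file object needs more, such as objectCharacteristics).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("mets", METS_NS), ("premis", PREMIS_NS), ("xsi", XSI_NS)):
    ET.register_namespace(prefix, uri)


def wrap_premis_object(md_id, id_type, id_value):
    """Build <mets:digiprovMD> wrapping a PREMIS file object (illustrative only)."""
    prov = ET.Element(f"{{{METS_NS}}}digiprovMD", ID=md_id)
    wrap = ET.SubElement(prov, f"{{{METS_NS}}}mdWrap", MDTYPE="PREMIS:OBJECT")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    obj = ET.SubElement(xml_data, f"{{{PREMIS_NS}}}object")
    # xsi:type replaces the PREMIS 1.1 objectCategory element.
    obj.set(f"{{{XSI_NS}}}type", "premis:file")
    ident = ET.SubElement(obj, f"{{{PREMIS_NS}}}objectIdentifier")
    ET.SubElement(ident, f"{{{PREMIS_NS}}}objectIdentifierType").text = id_type
    ET.SubElement(ident, f"{{{PREMIS_NS}}}objectIdentifierValue").text = id_value
    return prov


prov = wrap_premis_object("prov-1", "local", "file-0001")
serialized = ET.tostring(prov, encoding="unicode")
print(serialized)
```

Agents and events would follow the same pattern in their own <digiprovMD> elements, each with its own mdWrap, inside the same <amdSec>.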

  19. METS and PREMIS 2.0 • Extended list of event types: • deselection: files which are defined in the AIP descriptor but never ingested (no FLocat element) • metadataExtraction vs. propertyExtraction • Extended list of relationship types (relationshipSubType): • modification vs. manipulation


  21. METS and PREMIS 2.0 • Problems: • Validation • Using controlled vocabularies • Considering dependencies between METS and PREMIS • Standardized workflow for creating METS and PREMIS for all content streams • Currently specific implementations for each content stream • Extending the AIP Model • Preservation metadata for metadata records

  22. Thanks • Markus Enders, The British Library • Markus.Enders@bl.uk
