

ITHAKA Preservation Metadata 2.0: Revising the Event Model. A last-minute presentation on work currently in progress. Evan Owens, VP, Content Management, ITHAKA (JSTOR / Portico), evan.owens@ithaka.org


Presentation Transcript


  1. ITHAKA Preservation Metadata 2.0: Revising the Event Model. A last-minute presentation on work currently in progress. Evan Owens, VP, Content Management, ITHAKA (JSTOR / Portico), evan.owens@ithaka.org

  2. Background
     • Portico Preservation Metadata designed & implemented in 2002-2003
     • Inspired by PREMIS working group participation
     • Operational before PREMIS was completed!
     • Portico Archive as of October 2009
     • >14 Million E-Journal Articles plus other content
     • ~150 Million Files
     • ~1 Billion Events
     • Only 1K manual events; 99.999% system generated
     • Over 1 TB of Preservation Metadata
     • Portico / JSTOR / Ithaka merger in 2009
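The archive figures above can be turned into a few rough ratios. This is a back-of-envelope sketch only: the inputs are the slide's rounded numbers, "over 1 TB" is taken as exactly one decimal terabyte (an assumption), and the variable names are made up for illustration.

```python
# Rounded figures from the slide (October 2009); all are approximations.
FILES = 150_000_000
EVENTS = 1_000_000_000
MANUAL_EVENTS = 1_000
METADATA_BYTES = 10**12  # "over 1 TB", assumed here to be one decimal terabyte

events_per_file = EVENTS / FILES              # average events recorded per file
system_share = 1 - MANUAL_EVENTS / EVENTS     # fraction of events generated by the system
metadata_bytes_per_file = METADATA_BYTES / FILES

print(f"events per file:        {events_per_file:.1f}")
print(f"system-generated share: {system_share:.4%}")
print(f"metadata per file:      {metadata_bytes_per_file:,.0f} bytes")
```

With these rounded inputs the system-generated share comes out at or above the 99.999% quoted on the slide, and each file carries several events and a few kilobytes of metadata on average.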

  3. 2.0 PMD Revision Project
     • Begun in 2008; Implementation now underway
     • Design Goals for Revision to Events:
     • Consistent editorial/coding practices (capitalization, verb tenses, etc.)
     • Clarify what event goes with which object and why
     • Eliminate redundant information where possible
     • Make explicit all data constraints not currently expressed in our schemas
     • Synchronize event metadata with the high-level preservation metadata so that the events properly document changes in the core metadata
     • Establish a clean baseline for future expansion of events metadata

  4. PMD 2.0 Design Choices
     • Use our own data model / information architecture
     • Optimized for Java, Oracle, and XML instantiations
     • XML designed to reduce future versioning:
     • XSD schema for frame (syntax) only
     • All business rules (semantics) expressed in Schematron
     • Not METS, not DIDL, not PREMIS XML
     • PREMIS compliant
     • Optimized for size and speed
     • Fully relationally normalized
     • Inheritable attributes / metadata
     • Events attached to objects
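The split between a stable frame (syntax) and separately maintained business rules (semantics) can be illustrated with a small analogy. This is not XSD or Schematron and not Portico's schema: the element names, the `outcome` values, and both checker functions are hypothetical, and plain Python predicates stand in for Schematron assertions.

```python
import xml.etree.ElementTree as ET

# Hypothetical event record; element names are illustrative, not Portico's schema.
SAMPLE = """<event type="Check">
  <target>file-001</target>
  <outcome>pass</outcome>
</event>"""

# "Frame" (syntax): which child elements must be present. Expected to change rarely.
REQUIRED_CHILDREN = ("target", "outcome")

# "Business rules" (semantics): named predicates kept as data, so rules can be
# added or tightened without touching the frame. Verbs are taken from the
# Event Types slide; the outcome vocabulary is an assumption.
ALLOWED_TYPES = {"Check", "Characterize", "Generate", "Edit", "Set",
                 "Ingest", "Add", "Create", "Remove"}
RULES = [
    ("event type is a known verb", lambda e: e.get("type") in ALLOWED_TYPES),
    ("outcome is pass or fail", lambda e: e.findtext("outcome") in {"pass", "fail"}),
]

def check_frame(event):
    """Return the names of required child elements that are missing."""
    return [name for name in REQUIRED_CHILDREN if event.find(name) is None]

def check_rules(event):
    """Return the descriptions of business rules that the event violates."""
    return [desc for desc, rule in RULES if not rule(event)]

event = ET.fromstring(SAMPLE)
print(check_frame(event))  # structural problems, if any
print(check_rules(event))  # rule violations, if any
```

The point of the separation is that a new or stricter rule is one more entry in `RULES`, while the frame check stays unchanged.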

  5. Processing Record: a “master” record for each processing pass. It brings together the information common to all the events from a given processing pass (e.g., initial ingest, future migration, etc.).
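One way to picture the processing record is as a normalized parent that each event points to, so shared details are stored once per pass instead of once per event. The sketch below assumes that reading; the class names, fields, and identifiers are hypothetical and do not come from the PMD 2.0 model itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessingRecord:
    """Information shared by every event from one processing pass."""
    record_id: str
    purpose: str       # e.g. "initial ingest" or "migration"
    started: str       # ISO 8601 timestamp (hypothetical field)
    tool_version: str  # hypothetical field

@dataclass(frozen=True)
class Event:
    """One event; shared details live on the processing record, not here."""
    event_type: str
    object_id: str
    processing_record_id: str

records = {
    "pr-1": ProcessingRecord("pr-1", "initial ingest", "2009-10-01T00:00:00Z", "1.0"),
}
events = [
    Event("Check Virus", "file-001", "pr-1"),
    Event("Check Fixity", "file-001", "pr-1"),
    Event("Characterize File", "file-001", "pr-1"),
]

def describe(event):
    """Join an event to its processing record to recover the full picture."""
    record = records[event.processing_record_id]
    return f"{event.event_type} on {event.object_id} during {record.purpose}"

for e in events:
    print(describe(e))
```

At a scale of billions of events, storing the pass-level fields once rather than on every event is the kind of saving the "eliminate redundant information" goal points at.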

  6. Not a real event! Example XML serialization showing all possible child elements to illustrate the information model

  7. Event Types
     • Check: Virus, Fixity, …
     • Characterize: File, …
     • Generate: Desc. MD, Tech. MD, Fixity, …
     • Edit: Desc. MD, …
     • Set: Status, Format, Preservation Level, …
     • Ingest: into Archive
     • Add, Create, Remove File
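The event types above read naturally as a verb plus a target. The lookup table below is a sketch of that reading: it contains only the combinations named on the slide (the slide's ellipses mean the real lists are longer), "Add, Create, Remove File" is interpreted as three verbs sharing the target "File", and the function names are hypothetical.

```python
# Verb -> targets named on the slide. The slide's lists end in "…", so this
# table is deliberately incomplete.
EVENT_TYPES = {
    "Check": ["Virus", "Fixity"],
    "Characterize": ["File"],
    "Generate": ["Desc. MD", "Tech. MD", "Fixity"],
    "Edit": ["Desc. MD"],
    "Set": ["Status", "Format", "Preservation Level"],
    "Ingest": ["into Archive"],
    "Add": ["File"],     # assumed reading of "Add, Create, Remove File"
    "Create": ["File"],
    "Remove": ["File"],
}

def split_event_type(label):
    """Split a label such as 'Generate Tech. MD' into (verb, target)."""
    verb, _, target = label.partition(" ")
    return verb, target

def is_listed(label):
    """True if the verb/target pair appears in the table above."""
    verb, target = split_event_type(label)
    return target in EVENT_TYPES.get(verb, [])

print(split_event_type("Generate Tech. MD"))
print(is_listed("Check Fixity"))
print(is_listed("Delete File"))
```

Keeping the vocabulary in one table is also a simple way to enforce the consistent capitalization and verb forms listed among the revision's design goals.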

  8. Mapping PMD 2.0 to PREMIS

  9. Observations
     • Large-scale automated events feel very different from human events
     • ITHAKA archive will quadruple in 2010
     • Likely 3-5 billion events ...
     • Every bit of metadata has to be justified by need
     • Events have proved their value
     • An entire talk on that subject alone
     • Nothing is easy in quantities of billions
     • We still have to work on full lifecycle events
     • THIS IS STILL A WORK IN PROGRESS!
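A quick calculation shows why every piece of event metadata needs justifying at this scale. The sketch assumes the event count grows in step with the archive (the slide only says 3-5 billion events are likely), and the 100 bytes per event is a purely hypothetical figure chosen for illustration.

```python
EVENTS_2009 = 1_000_000_000  # "~1 Billion Events" from the Background slide
GROWTH_FACTOR = 4            # "archive will quadruple"; applying it to events is an assumption

projected_events = EVENTS_2009 * GROWTH_FACTOR

def added_storage_gb(extra_bytes_per_event, events=projected_events):
    """Storage added, in decimal gigabytes, by extra metadata on every event."""
    return extra_bytes_per_event * events / 10**9

print(f"projected events: {projected_events:,}")
print(f"100 extra bytes per event adds about {added_storage_gb(100):,.0f} GB")
```

Under these assumptions a single small field repeated on every event adds hundreds of gigabytes, before indexing or replication is counted.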
