ITHAKA Preservation Metadata 2.0: Revising the Event Model
A last-minute presentation on work currently in progress
Evan Owens
VP, Content Management
ITHAKA (JSTOR / Portico)
evan.owens@ithaka.org
Background
• Portico Preservation Metadata designed & implemented in 2002-2003
• Inspired by PREMIS working group participation
• Operational before PREMIS was completed!
• Portico Archive as of October 2009
  • >14 Million E-Journal Articles plus other content
  • ~150 Million Files
  • ~1 Billion Events
  • Only 1K manual events; 99.999% system generated
  • Over 1 TB of Preservation Metadata
• Portico / JSTOR / Ithaka merger in 2009
2.0 PMD Revision Project
• Begun in 2008; implementation now underway
• Design Goals for Revision to Events:
  • Consistent editorial/coding practices (capitalization, verb tenses, etc.)
  • Clarify what event goes with which object and why
  • Eliminate redundant information where possible
  • Make explicit all data constraints not currently expressed in our schemas
  • Synchronize event metadata with the high-level preservation metadata so that the events properly document changes in the core metadata
  • Establish a clean baseline for future expansion of events metadata
PMD 2.0 Design Choices
• Use our own data model / information architecture
• Optimized for Java, Oracle, and XML instantiations
• XML designed to reduce future versioning:
  • XSD schema for frame (syntax) only
  • All business rules (semantics) expressed in Schematron
  • Not METS, not DIDL, not PREMIS XML
• PREMIS compliant
• Optimized for size and speed
• Fully relationally normalized
• Inheritable attributes / metadata
• Events attached to objects
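The split between a structural frame and separately expressed business rules can be sketched in a few lines. This is only an illustration of the idea, written in Python with the standard library rather than actual XSD or Schematron; the element names (`event`, `objectId`, `outcome`), the attributes, and the rule itself are assumptions and are not taken from the Portico schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical event record; element and attribute names are illustrative only.
SAMPLE = """
<event type="Check" target="Fixity">
  <objectId>file-001</objectId>
  <outcome>Success</outcome>
</event>
"""

# Frame (syntax): which elements must be present. This stands in for the
# role the slide assigns to the XSD schema.
REQUIRED_CHILDREN = ("objectId", "outcome")

# Business rule (semantics): which event verbs are allowed. This stands in
# for the role the slide assigns to Schematron. The verbs come from the
# Event Types slide; treating them as a closed list is an assumption.
KNOWN_VERBS = {"Check", "Characterize", "Generate", "Edit", "Set",
               "Ingest", "Add", "Create", "Remove"}


def check_frame(event):
    """Return structural errors only; no vocabulary is checked here."""
    errors = []
    if event.tag != "event":
        errors.append("root element must be <event>")
    for name in REQUIRED_CHILDREN:
        if event.find(name) is None:
            errors.append(f"missing <{name}>")
    return errors


def check_rules(event):
    """Return business-rule errors; can change without touching the frame."""
    errors = []
    verb = event.get("type")
    if verb not in KNOWN_VERBS:
        errors.append(f"unknown event type: {verb!r}")
    return errors


def validate(xml_text):
    """Run the frame check first, then the rule check, and merge the errors."""
    event = ET.fromstring(xml_text.strip())
    return check_frame(event) + check_rules(event)
```

Adding a new event verb in this sketch means editing only `KNOWN_VERBS`; the frame check stays as it is, which is the kind of reduced versioning the slide aims for.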
Processing Record
“master” for each processing pass
• Bring together information common to all the events from a given processing pass; e.g., initial ingest, future migration, etc.
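A minimal sketch of why a per-pass "master" record removes redundancy: each event stores only a reference to its processing record, and the shared information is looked up when needed. The class and field names below (`ProcessingRecord`, `purpose`, `tool_version`, and so on) are hypothetical and chosen for illustration; they are not the actual PMD 2.0 fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingRecord:
    """Information shared by every event in one processing pass (hypothetical fields)."""
    record_id: str
    purpose: str        # e.g. "initial ingest" or "future migration"
    tool_version: str   # assumed example of pass-wide information


@dataclass(frozen=True)
class Event:
    """An event attached to one object; pass-wide details are not repeated here."""
    event_type: str
    object_id: str
    processing_record_id: str


def describe(event, records):
    """Join an event with its processing record to recover the full picture."""
    rec = records[event.processing_record_id]
    return (f"{event.event_type} on {event.object_id} "
            f"during {rec.purpose} ({rec.tool_version})")


# One processing record serves any number of events from the same pass.
records = {"pr-1": ProcessingRecord("pr-1", "initial ingest", "v1")}
events = [
    Event("Check Virus", "file-001", "pr-1"),
    Event("Check Fixity", "file-001", "pr-1"),
]
summaries = [describe(e, records) for e in events]
```

At a billion events, storing the pass-wide fields once per pass instead of once per event is where the size savings would come from.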
Not a real event!
Example XML serialization showing all possible child elements to illustrate the information model
Event Types
• Check: Virus, Fixity, …
• Characterize: File, …
• Generate: Desc. MD, Tech. MD, Fixity, …
• Edit: Desc. MD, …
• Set: Status, Format, Preservation Level, …
• Ingest: into Archive
• Add, Create, Remove File
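The verb-plus-target pattern above lends itself to a small lookup table. The sketch below records only the pairs named on the slide and flags the lists that end in "…" as open-ended instead of guessing the missing entries. Reading "Add, Create, Remove File" as three verbs sharing the target "File" is an interpretation, and the function names are hypothetical.

```python
# verb -> (targets named on the slide, True if the slide's list ends in "…")
EVENT_TYPES = {
    "Check": (("Virus", "Fixity"), True),
    "Characterize": (("File",), True),
    "Generate": (("Desc. MD", "Tech. MD", "Fixity"), True),
    "Edit": (("Desc. MD",), True),
    "Set": (("Status", "Format", "Preservation Level"), True),
    "Ingest": (("into Archive",), False),
    # Interpretation: three verbs that share the single target "File".
    "Add": (("File",), False),
    "Create": (("File",), False),
    "Remove": (("File",), False),
}


def is_listed(verb, target):
    """True only if the slide explicitly names this verb/target pair."""
    named, _open_ended = EVENT_TYPES.get(verb, ((), False))
    return target in named


def may_be_valid(verb, target):
    """True if the pair is listed, or the verb's list is open-ended on the slide."""
    named, open_ended = EVENT_TYPES.get(verb, ((), False))
    return target in named or open_ended
```

For example, `is_listed("Check", "Fixity")` holds, while a target the slide does not name under "Check" is only reported as possibly valid, since that list is elided.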
Observations
• Large-scale automated events feel very different from human events
• ITHAKA archive will quadruple in 2010
  • Likely 3-5 billion events . . .
• Every bit of metadata has to be justified by need
• Events have proved their value
  • An entire talk on that subject alone
• Nothing is easy in quantities of billions
• We still have to work on full lifecycle events
• THIS IS STILL A WORK IN PROGRESS!