NGDA Architecture Update Greg Janée

Three motivations
• Archival has to be cheap & easy
  • little incentive
  • no funding
• Need to archive data semantics
  • key differentiator from text, audio, video
• Focus on long-term preservation
  • need to migrate whole systems
Greg Janée • May 16, 2005

Typical repository architecture
[Diagram; labels: system, handle resolver (two), storage, database (four), "fragile"]

NGDA architecture
[Diagram; labels: access, ingest, Web, ADL, OAI, bulk loader, archival system, storage subsystem, "standard, public data model", "databases, caches, etc."]

Post-NGDA architecture
[Diagram; labels: Web, storage subsystem, "standard, public data model"]

Storage system requirements
• Requirements:
  • associate UUIDs/RIDs with bitstreams
  • retrieve global/local bitstream by UUID/RID
  • determine (parent) UUID of any bitstream
  • list all UUIDs
• Satisfied by:
  • any filesystem
  • tag URIs for UUIDs
    • tag:library.ucsb.edu,2005:identifier
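
To make the four requirements concrete, here is a minimal sketch of a filesystem-backed store. It is an illustration only, not the NGDA implementation: the class name `FilesystemStore`, the helper `make_tag_uri`, and the layout (one directory per UUID, one file per RID) are all assumptions, as is the reading that a UUID names an archival object globally and a RID names a bitstream relative to it.

```python
from pathlib import Path
from urllib.parse import quote, unquote


def make_tag_uri(specific, authority="library.ucsb.edu", year=2005):
    """Build a tag URI of the form shown on the slide (helper name is hypothetical)."""
    return f"tag:{authority},{year}:{specific}"


class FilesystemStore:
    """Assumed layout: <root>/<encoded UUID>/<encoded RID>."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, uuid, rid):
        # Percent-encode identifiers so ':' , ',' and '/' are safe in file names.
        return self.root / quote(uuid, safe="") / quote(rid, safe="")

    def put(self, uuid, rid, data):
        """Associate a UUID/RID pair with a bitstream."""
        path = self._path(uuid, rid)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path

    def get(self, uuid, rid):
        """Retrieve a bitstream by UUID and RID."""
        return self._path(uuid, rid).read_bytes()

    def parent_uuid(self, path):
        """Determine the (parent) UUID of a stored bitstream from its location."""
        return unquote(Path(path).parent.name)

    def list_uuids(self):
        """List all UUIDs known to the store."""
        return sorted(unquote(p.name) for p in self.root.iterdir() if p.is_dir())
```

Because the identifier-to-path mapping is reversible, the store needs no database of its own: the directory tree alone answers all four queries, which is the sense in which "any filesystem" can satisfy the requirements.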

Archival objects
[Diagram; labels: directory, UUID, RID, component, UUID]

Archival objects
• Directory info per component
  • named relationship/position
  • format & semantics
    • by UUID references to definitions
  • fixity: checksum
  • provenance: isDerivative
  • policy: mutability
  • rights
• Components may be provided by archive itself
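
The per-component directory information listed above could be modeled roughly as follows. This is only a sketch: the class `ComponentEntry`, its field names, and the choice of SHA-256 for the checksum are assumptions, since the slide says only that fixity is recorded as a checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentEntry:
    """One directory entry per component (all field names are hypothetical)."""
    rid: str                              # identifier relative to the object
    relationship: str                     # named relationship/position, e.g. "data"
    format_uuid: str                      # UUID reference to a format definition
    checksum: str                         # fixity information
    semantics_uuid: Optional[str] = None  # UUID reference to a semantic definition
    is_derivative: bool = False           # provenance
    mutable: bool = False                 # policy
    rights: Optional[str] = None


def sha256_hex(data):
    """Checksum helper; SHA-256 is an assumed algorithm."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(entry, data):
    """Return True if the bitstream still matches its recorded checksum."""
    return entry.checksum == sha256_hex(data)
```

Storing format and semantics as UUID references rather than inline text means the definitions are themselves retrievable objects, so a future reader can follow the references to learn how to interpret the component.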

Example
[Diagram; labels: USGS, Object x, DOQQ, derived, metadata, data, x.fgdc, x.tiff, x.gif, subtypeOf, FGDC, GeoTIFF, TIFF]

Archives
• Archive = set of archival objects
  • no structure
  • no free-floating bitstreams
• In anticipation of federation:
  • associations may cross archive boundaries
  • archival objects may not
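
One way to read these rules is sketched below, under stated assumptions: an archive is a flat mapping from UUID to object, every bitstream lives inside some object, and an association is just a named reference to another UUID that may or may not resolve locally. The names `Archive`, `ArchivalObject`, and `Association` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Association:
    name: str
    source_uuid: str
    target_uuid: str  # may name an object held by a different archive


@dataclass
class ArchivalObject:
    uuid: str
    components: Dict[str, bytes]  # RID -> bitstream; kept whole in one archive
    associations: List[Association] = field(default_factory=list)


class Archive:
    """A flat set of archival objects with no further structure."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def add(self, obj):
        # Bitstreams enter the archive only as components of an object.
        if not obj.components:
            raise ValueError("an archival object needs at least one component")
        self.objects[obj.uuid] = obj

    def external_associations(self):
        """Associations whose target is not held locally (crosses the boundary)."""
        return [a for o in self.objects.values()
                for a in o.associations
                if a.target_uuid not in self.objects]
```

Keeping each object whole within one archive means an archive can be copied or migrated without consulting its peers, while cross-archive associations remain simple identifier references that a federation layer could resolve later.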

Object types
• Content
• Format definition
• Semantic definition
• Provider
• Organizational structures
  • collection
  • series
  • ingest session

Archive-provider agreement
• Defines
  • common structure of objects to be ingested
  • necessary validations
  • associations to other objects
  • policies, rights, etc.
• Represents choke point
  • requires human evaluation
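
The machine-checkable part of such an agreement might look like the sketch below. It assumes the agreement can be reduced to a list of required component names plus a list of validation functions; the `Agreement` class and the `non_empty` validator are hypothetical, and the human evaluation the slide mentions is deliberately out of scope.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A validator inspects an object's components and returns a list of problems.
Validator = Callable[[Dict[str, bytes]], List[str]]


@dataclass
class Agreement:
    provider: str
    required_components: List[str]  # common structure of objects to be ingested
    validators: List[Validator] = field(default_factory=list)

    def check(self, components):
        """Return every problem found; an empty list means the object conforms."""
        problems = [f"missing component: {name}"
                    for name in self.required_components
                    if name not in components]
        for validate in self.validators:
            problems.extend(validate(components))
        return problems


def non_empty(components):
    """Example validation: no component may be an empty bitstream."""
    return [f"empty component: {name}"
            for name, data in components.items() if not data]
```

Returning a list of problems rather than raising on the first one suits ingest, where a provider would want a complete report for each rejected object.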

Deferred functionality
• Incremental ingest
• Object revisions
• Rights
• 3rd-party access
• Federation

Status
• Starting development now
• Approach: iterative refinement