240 likes | 408 Views
Preservation DataStores (PDS): A demonstration of Preservation Aware Storage. Contacts: Simona Cohen, Michael Factor, Dalit Naor. http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml. Preservation Descriptive Information. Content Information. Content Data Object. Representation
E N D
Preservation DataStores (PDS):A demonstration of Preservation Aware Storage Contacts: Simona Cohen, Michael Factor, Dalit Naor http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Preservation Descriptive Information Content Information Content Data Object Representation Information Reference Provenance Context Fixity PDS Overview • OAIS–based preservation-aware storage, media-agnostic, and generic storage to support logical preservation • Manage preservation specific metadata • Compute fixity • Update technical provenance • Manage PDI RepInfo • Ensure referential integrity • Storlet container • Module container that can execute restricted modules with predefined interfaces for data intensive functions, e.g., transformations, fixity calculations • Optimal scheduling • Update PDS modules, e.g., fixity algorithm, packaging format • Managing availability/ data loss • Physically co-locate data and metadata • Cluster related AIPs on the same media unit, based upon their relative importance • AIP identifier generation – globally unique identifiers http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
PDS Architecture ingestAIP, accessAIP, transformAIP, getPreservationPolicy • Layered approach is based on open standards: OAIS, XAM, OSD • In CASPAR, all layers are utilized. In other embodiments, only some layers can be used • Utilizes XAM to provide logical abstraction of containers (XSets) • Offloads preservation functionality to storage http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
AIP Representation in the Preservation Engine • The basic building-block of an AIP is the Information Object • A single instance of data is a byte stream • Zero or more instances of RepInfo to interpret the data • RepInfo is also an AIP • Shared resource with unique ID • RepInfo recursive network • Keep the design simple http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Preservation of an AIP • Ingest AIP • Storing an AIP in PDS • Bit and logical preservation of an AIP • Bit migrations • Format migrations - data transformation • Transformation modules are packed as AIPs and preserved • Transformation result is a new version for the original AIP • Migrations are documented as Provenance records • During migrations PDS performs operations on AIP • Update PDI (e.g., fixity calculation, additional provenance events) • Execute previously loaded storlets • Access AIP • Retrieval of an AIP • By retrieval time, the original AIP may have several versions and copies http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
MST DataNatural Environment Research Council (NERC) Mesosphere-Stratosphere-Troposphere (MST) Radar at Aberystwyth • Provides measurements of vertical and horizontal components of the wind • Highly complex data • Preserved for the atmospheric scientific community • Needs long-term preservation for research purposes • Community is highly dependent on interoperability http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
British Atmospheric Data Centre and Self-Describing Data Formats • BADC strongly advocate the use of self-describing files to capture provenance and semantic information • Two self-describing file formats compete in the Atmospheric Science Community • NASA AMES (ASCII) • NetCDF (binary) • The current user community finds NetCDF more suitable http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Demo Flow View AIP Ingest AIP 10 years later Load and apply transformation 20 years later Access and view transformed data http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_view Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Ingest AIP flow in PDS • store (XUID, AIP ID) mapping persistently PDS web service AIP ID ingest (AIP) Preservation Engine • store (OID, XUID) mapping persistently XUID create XSet write fields (AIP data) commit XAM • Unpack AIP • Generate AIP ID • Compute fixity • Add “ingest” provenance event XAM to OSD (VIM) create OSD objects write (XSet data) OSD http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Demo Flow View AIP Ingest AIP 10 years later Load and apply transformation 20 years later Access and view transformed data http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_ingest Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
10 Years Later... http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Format Change • Converting NetCDF to NASA-AMES • The MST NetCDF file version is no longer supported by the NetCDF foundations libraries, and files are not backwards compatible with this NetCDF version • The skills base of the atmospheric scientist in the future is heavily biased towards the use of ASCII-based file formats • The NASA-AMES format has become the preferred data format of the atmospheric scientists’ community • Converting NetCDF to PNG • Provide additional quick image view of the data for users with limited access permissions http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Load transformation flow in PDS PDS web service AIP ID loadTransformation (AIP) Preservation Engine • Register transformation • Ingest transformation AIP http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Transform AIP flow in PDS PDS web service AIP ID transformAip (targetAIPID, transformationAIPID) Preservation Engine • Access transformation AIP • Access target AIP • Execute transformation • Ingest result as new AIP, a version of the target AIP http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Demo Flow View AIP Ingest AIP 10 years later Load and apply transformation 20 years later Access and view transformed data http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_xform Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
20 Years Later... http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Access AIP flow in PDS • Validate AIP ID • Validate fixity • Add “access” provenance event • Package AIP PDS web service AIP access (AIP ID) Preservation Engine AIP data open XSet (XUID) read fields XAM • Lookup XUID by AIP ID XAM to OSD (VIM) XSet data read OSD objects • Lookup OSD OID by XUID OSD http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Demo Flow View AIP Ingest AIP 10 years later Load and apply transformation 20 years later Access and view transformed data http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
pds_access Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml
Thank you! Haifa preservation team: Simona Cohen, Michael Factor, Aner Hamama, Ealan Henis, Dalit Naor, Petra Reshef, Shahar Ronen