A Data Model and Architecture for Long-term Preservation

Greg Janée, Justin Mathena, James Frew (University of California at Santa Barbara). Outline: project overview; character of geospatial data; observations on preservation requirements; architecture; ongoing work.

Presentation Transcript


  1. A Data Model and Architecture for Long-term Preservation • Greg Janée, Justin Mathena, James Frew • University of California at Santa Barbara

  2. Outline • Project overview • Character of geospatial data • Observations on preservation requirements • Architecture • Ongoing work Greg Janée • JCDL 2008

  3. Project overview • National Geospatial Digital Archive (NGDA) • UCSB (Map & Imagery Laboratory) • Stanford (Branner Earth Sciences Library) • Funded by the Library of Congress’s NDIIPP program • Question: How to achieve long-term preservation of geospatial data on a national scale?

  4. Geospatial data characteristics • Voluminous • Sensor platforms are long-lived • Highly structured (support not ubiquitous) • Requires specialized interpretation • Tied to Earth models

  5. Starting point (timeline diagram: now → now + 100 years) • content • take action now

  6. Preservation: relay across time (timeline diagram: now → now + 100 years) • institution • repository system • storage system

  7. Preservation: relay across time (timeline diagram: now → now + 100 years) • institution • repository system • storage system • Requirement: Each archive facilitates handoff to the next

  8. Mid-century perspective (timeline diagram: now - 50 → now → now + 50) • content • old content • ancient content • take action now

  9. Mid-century perspective (timeline diagram: now - 50 → now → now + 50) • content • old content • ancient content • take action now • Requirement: Each archive facilitates handoff to the next ... on unfamiliar content ... such that the next archive can make the same claim

  10. Preservation: mitigation of risk • Preservation is an outcome • Risk: insufficient resources and/or desire • Risk: handoff • e.g., from failing institution • e.g., from unsupported repository system

  11. Preservation: mitigation of risk • Preservation is an outcome • Risk: insufficient resources and/or desire • Risk: handoff • e.g., from failing institution • e.g., from unsupported repository system • Requirement: Each archive supports a low-cost, robust “fallback” preservation mode

  12. Preservation: context (diagram: 2008 → 2108) • object (data + metadata): migrate object • context: capture context • context includes: computing platform, semantics, terminology, provenance, provider, quality, appropriate usage, community

  13. Geospatial data context • Complex (sensor, platform characteristics) • In practice, not handled as metadata • Deep understanding of provenance required to support reprocessing

  14. Ozone reprocessing requirements • xDRs • Delivered IPs • Engineering data (incl. C3S data if not in RDRs) • Upload files • Databases • Software (source code) • Calibration artifacts • data • analysis tools • tables • logs • notebooks • instrument design • All project documentation • All scientific papers • All reports (Taken from: Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop)

  15. Ozone reprocessing requirements • xDRs • Delivered IPs • Engineering data (incl. C3S data if not in RDRs) • Upload files • Databases • Software (source code) • Calibration artifacts • data • analysis tools • tables • logs • notebooks • instrument design • All project documentation • All scientific papers • All reports (Taken from: Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop) • Requirement: Context must be preserved ... and context must accommodate complex networks of objects

  16. Architecture • archive (domain-specific): management, policies, services, access • logical data model (best practices/interop standard): standard packaging of data, semantics • physical data model (best practices/interop standard): survivable, vendor-neutral representation of above • storage virtualization layer (interop standard): seamless movement, reliability, redundancy

  17. Architecture • Logical data model captures all information required to resurrect, reuse objects • Includes archival of format specs, metadata, contextual information, transitive closure thereof • NGDA: archival objects (layer diagram as on slide 16)
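
The "transitive closure" on slide 17 can be illustrated with a short sketch. It assumes, purely for illustration, that dependencies between archival objects are available as a dictionary; the object names and the `depends_on` mapping are hypothetical and are not taken from NGDA.

```python
from collections import deque


def transitive_closure(start, depends_on):
    """Return every object reachable from `start` by following dependency links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency graph: a dataset needs its format spec and metadata,
# and each of those may need further specs in turn.
depends_on = {
    "dataset": ["shapefile-format-spec", "dataset-metadata"],
    "dataset-metadata": ["metadata-schema"],
    "shapefile-format-spec": ["dbf-format-spec"],
    "metadata-schema": [],
    "dbf-format-spec": [],
}

closure = transitive_closure("dataset", depends_on)
print(sorted(closure))
```

Under these assumptions, everything in `closure` would have to be archived alongside the dataset for it to remain interpretable without outside help.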

  18. Architecture • Physical data model fully and simply represents logical data model • No vendor lock-in • NGDA: files, filesystems, XML manifests (layer diagram as on slide 16)

  19. Architecture • Storage virtualization layer supports intra- and inter-archive handoffs • NGDA: “logistical networking” (layer diagram as on slide 16)

  20. Architecture: fallback • Combination of complete resurrection information with a simple physical representation provides fallback mechanism (layer diagram as on slide 16)

  21. Architecture: handoffs (diagram: two archives, each with a logical data model and a physical data model, above a storage virtualization layer; one archive exports, the other ingests)

  22. Logical data model

  23. Example archival object

  24. Physical data model • directory listing: ...identifier/ • manifest.xml • cnty24k97.xml • data/ • source/ • cnty24k97.shp • cnty24k97.dbf • ... • cnty24k97.png • annotations: object structure; fixity metadata; inter- and intra-object relationships
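
As a rough illustration of how an XML manifest can carry fixity metadata for a directory of files, the sketch below records a SHA-256 digest per file and later re-checks it. The element and attribute names (`manifest`, `file`, `path`, `sha256`), the choice of SHA-256, and the placement of files under `data/` are all assumptions made for this example; the slides do not specify the NGDA manifest schema.

```python
import hashlib
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes (assumed fixity algorithm)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(object_dir):
    """Build a hypothetical manifest listing every file with its digest."""
    root = ET.Element("manifest")
    files = sorted(p for p in object_dir.rglob("*") if p.is_file())
    for path in files:
        if path.name == "manifest.xml":
            continue  # the manifest does not describe itself
        ET.SubElement(root, "file", {
            "path": path.relative_to(object_dir).as_posix(),
            "sha256": sha256_of(path),
        })
    return root


def verify_manifest(object_dir, root):
    """True if every listed file still matches its recorded digest."""
    return all(
        sha256_of(object_dir / entry.get("path")) == entry.get("sha256")
        for entry in root.findall("file")
    )


with tempfile.TemporaryDirectory() as tmp:
    obj = Path(tmp) / "example-object"
    (obj / "data").mkdir(parents=True)
    (obj / "data" / "cnty24k97.shp").write_bytes(b"placeholder shapefile bytes")
    (obj / "cnty24k97.xml").write_text("<metadata/>", encoding="utf-8")

    manifest = build_manifest(obj)
    ET.ElementTree(manifest).write(
        str(obj / "manifest.xml"), encoding="utf-8", xml_declaration=True
    )
    intact = verify_manifest(obj, manifest)

    # Simulate bit rot and check again.
    (obj / "data" / "cnty24k97.shp").write_bytes(b"corrupted")
    damaged = verify_manifest(obj, manifest)

print(intact, damaged)
```

Because the check needs only files, a hash function, and an XML parser, it fits the "fallback" idea of a representation that survives without the original repository software.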

  25. Storage abstraction • Bitstreams: create, (delete), read, write; no modify • Directories: create, (delete), list members • Above identified by hierarchical pathnames • Satisfied by filesystems, WebDAV, ...
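
The operations on slide 25 are small enough to sketch as a class backed by an ordinary filesystem. The class name `FilesystemStore` and its method names are hypothetical, and "no modify" is interpreted here as write-once: writing to an existing pathname fails. The optional delete operations are left out.

```python
import tempfile
from pathlib import Path, PurePosixPath


class FilesystemStore:
    """Hypothetical filesystem-backed store: write-once bitstreams plus directories."""

    def __init__(self, root):
        self.root = Path(root)

    def _resolve(self, pathname):
        # Pathnames are hierarchical and relative to the store root.
        pure = PurePosixPath(pathname)
        if pure.is_absolute() or ".." in pure.parts:
            raise ValueError(f"invalid pathname: {pathname!r}")
        return self.root.joinpath(*pure.parts)

    def create_directory(self, pathname):
        self._resolve(pathname).mkdir(parents=True, exist_ok=False)

    def list_members(self, pathname):
        return sorted(p.name for p in self._resolve(pathname).iterdir())

    def write_bitstream(self, pathname, data):
        # Mode "xb" creates the file and fails if it already exists (no modify).
        with open(self._resolve(pathname), "xb") as stream:
            stream.write(data)

    def read_bitstream(self, pathname):
        return self._resolve(pathname).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = FilesystemStore(tmp)
    store.create_directory("object/data")
    store.write_bitstream("object/data/a.bin", b"\x00\x01")
    content = store.read_bitstream("object/data/a.bin")
    members = store.list_members("object/data")
    try:
        store.write_bitstream("object/data/a.bin", b"changed")
        overwrite_rejected = False
    except FileExistsError:
        overwrite_rejected = True
```

An interface this narrow could in principle be implemented over WebDAV or another backend with the same four methods, which is the sense in which the slide says the abstraction is "satisfied by" several storage systems.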

  26. Archive dependencies • Filesystem • XML • Character set(s) • Identifier resolution mechanism(s)

  27. Summary • Architecture to facilitate handoffs, reduce risk, provide fallback • best practices • interoperability potential • Ongoing work • “logistical networking” for storage virtualization • preservation profiles for other data models • format registries and other archive dependencies • whole-archive descriptor (dependencies, policies)

  28. Questions?
