ESDORA: A DIGITAL REPOSITORY SYSTEM FACILITATING DATA PRESERVATION, PROVENANCE, DISCOVERY, AND ACCESS*
Jerry Pan, Christopher Lenhardt, Biva Shrestha, Yaxing Wei, Robert Cook, Giri Palanisamy, Bruce Wilson
Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6407
NASA ESDSWG Annual Meeting, Newport News, VA, November 1-3, 2011

Earth Science Data in Digital Object Repository Architecture (ESDORA)

Poster panels: Status, Challenges, Summary; Semantic Assertions; Open Source Frameworks; RDF Store and Query Support; Discovery & Access

• Project Status
  • Two-year ACCESS project, started in 2010
  • Major milestones reached
  • System complete except for the ingestion workflow
  • Ingestion workflow scheduled for delivery this quarter
• Deployment Plan
  • We plan to deploy the system at the ORNL DAAC as part of the operational system at the end of the project (January 2012). It will serve datasets from the Modeling and Synthesis Thematic Data Center (MAST-DC) project and from the ORNL DAAC.
• Main Challenges
  • By default, the Fedora Repository hides data file associations and their MIME types at the file system level, which creates a problem for external software that needs direct file access. For example, FTP and OPeNDAP servers cannot work with Fedora out of the box. Solution: for Fedora 3.4 and older versions, we modified Fedora to expose the data file arrangement and MIME types. For the upcoming Fedora 3.6 and newer versions, we plan to contribute a permanent fix to the Fedora project via a public storage API.
  • Integrating multiple independent open source technologies that differ in maturity and in underlying programming platform. Solution: extensive trial and error, and communication, to make all components work together.
  • Integrating the ingestion workflow also caused problems. Solution: we re-implemented it with a simpler design.
• Summary
  • ESDORA is a viable technology for combining data and metadata and providing an integrated system for the preservation, discovery, and access of science data content. The digital object data model accommodates different content types and metadata standards in a uniform fashion, which is particularly applicable to Earth science, where data formats and metadata standards are diverse and numerous. Provenance and descriptive metadata, structured or unstructured, are all accounted for and managed by the system.
  • A modern Web content management system such as Drupal is also a key component: a large talent pool continuously develops plugins and functions for science applications on its well-designed modular architecture.

Open source frameworks: Fedora, Drupal, Islandora, Solr, MySQL
Functions: index & search, OAI-PMH publishing, and embedded viewers

Figure labels (digital object abstraction): Dublin Core; Persistent ID; RELS-EXT (RDF); RELS-INT (RDF); AUDIT; POLICY; Datastream1, Datastream2, Datastream*
All content => digital objects: ATOM, FOXML
Stewardship & provenance: DOI, checksum, derivation history
OAIS Reference Model: SIP, AIP, DIP; data storage & management; comparable to OAIS
Flexible metadata management: FGDC, DC, …, free text; in-line editors
Metadata publishing via OAI-PMH
Data history and audit trail
Data discovery & access; metadata-driven views

Figure: data object derivation history (Original_SYNMAP, Analyzed_SYNMAP, Potential_SYNMAP, AVHRR_CFTC, MODIS_GLC, GLCC, GLC2000)
An example data derivation history: a synthetic land cover dataset (Original SYNMAP) is derived from four land cover datasets from various sources, and Analyzed SYNMAP and Potential SYNMAP are in turn derived from Original SYNMAP. This provenance information is recorded as RDF statements in ESDORA's semantic store and can be queried using SPARQL and iTQL.

OPeNDAP access from ESDORA (ORNL DAAC THREDDS service)
OAI-PMH publishing of FGDC metadata records
• Audit trails of all components of an object.
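The SYNMAP derivation history described above can be sketched as a small set of subject-predicate-object statements. The following is a minimal illustration in plain Python, not ESDORA's actual RDF store or query interface. The predicate name `derivedFrom` and the helper `ancestors` are hypothetical, and the four source dataset names are taken from the figure labels on the poster. The traversal mimics what a transitive provenance query (for example, one written in SPARQL) would return.

```python
from collections import deque

# Hypothetical predicate name: the poster does not say which RDF
# predicate ESDORA uses to record derivation.
DERIVED_FROM = "derivedFrom"

# (subject, predicate, object) statements mirroring the SYNMAP example.
TRIPLES = [
    ("Original_SYNMAP", DERIVED_FROM, "AVHRR_CFTC"),
    ("Original_SYNMAP", DERIVED_FROM, "MODIS_GLC"),
    ("Original_SYNMAP", DERIVED_FROM, "GLCC"),
    ("Original_SYNMAP", DERIVED_FROM, "GLC2000"),
    ("Analyzed_SYNMAP", DERIVED_FROM, "Original_SYNMAP"),
    ("Potential_SYNMAP", DERIVED_FROM, "Original_SYNMAP"),
]


def ancestors(dataset, triples=TRIPLES):
    """Return every dataset that `dataset` is directly or indirectly derived from."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for subj, pred, obj in triples:
            # Follow derivation edges breadth-first, visiting each source once.
            if subj == current and pred == DERIVED_FROM and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


if __name__ == "__main__":
    # Lists Original_SYNMAP plus its four source land cover datasets.
    print(sorted(ancestors("Analyzed_SYNMAP")))
```

Storing only the direct derivation statements and computing the full lineage at query time keeps the recorded provenance small, at the cost of a graph traversal on each lookup.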
Various search options, including geospatial search
Inline metadata editor (FGDC)
Metadata-driven dataset displays

PROJECT WEB SITE -- http://esdora2.ornl.gov/ (Contact: Jerry Pan, pany@ornl.gov)
ORNL is managed by the University of Tennessee-Battelle LLC under contract DE-AC05-00OR22725 with the U.S. Department of Energy.
*Funded by the NASA Advancing Collaborative Connections for Earth System Science (ACCESS) Program, Grant #09-ACCESS09-8