DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research
Bob Cook
Environmental Sciences Division, Oak Ridge National Laboratory
February 6, 2013
NACP All-Investigator Meeting
The DataONE Vision and Approach: Providing universal access to data about life on earth and the environment that sustains it, as well as the tools needed by researchers.
1. Building community
2. Developing sustainable data discovery and interoperability solutions
3. Supporting researcher tools and services
The long tail of orphan data
[Figure: data volume plotted against rank frequency of datatype (B. Heidorn)]
Specialized repositories (50%), characteristics:
• Big Science
• Large Volume
• Automated sensors
• Well described
• Well curated
• Easily Discovered
Orphan data (50%), characteristics:
• Small Science
• Small Volume
• Poorly described
• Rarely Indexed
• Invisible to scientists
• Rarely Used
• Dark Data
• High spatial resolution
• Process based
• Theory Development
• Model Development
• Benchmarking
• Check for best practices
• Create metadata
• Connect to ONEShare
Data & Metadata (EML)
https://dataone.org
http://dataup.cdlib.org/
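The first two steps above (check for best practices, then create metadata) can be illustrated with a minimal sketch. The rules checked here, the function names, and the metadata fields are assumptions made for illustration; they are not DataUp's actual checks, and the XML produced is only EML-like, not validated against the EML schema.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: header names use only letters, digits, and underscores.
HEADER_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_best_practices(csv_text):
    """Return a list of human-readable problems found in a CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]
    problems = []
    header = rows[0]
    for name in header:
        if not HEADER_PATTERN.match(name):
            problems.append(f"header {name!r} has spaces or special characters")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} cells, expected {len(header)}"
            )
        for name, cell in zip(header, row):
            if cell.strip() == "":
                problems.append(f"row {line_no}: empty cell in column {name!r}")
    return problems


def create_metadata(title, creator, column_names):
    """Build a simplified, EML-like XML record (not schema-valid EML)."""
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title
    ET.SubElement(dataset, "creator").text = creator
    table = ET.SubElement(dataset, "dataTable")
    for name in column_names:
        ET.SubElement(table, "attributeName").text = name
    return ET.tostring(dataset, encoding="unicode")


# Made-up sample table with one bad header and one empty cell.
sample = "site_id,soil temp,date\nA1,12.5,2013-02-06\nA2,,2013-02-06\n"
issues = check_best_practices(sample)
record = create_metadata(
    "Example soil data", "A. Researcher", ["site_id", "soil_temp", "date"]
)
```

Here `issues` lists the problems to fix before the table is documented, and `record` holds the metadata string that would accompany the data into a repository.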
Model-Data Fusion: Harnessing Observations
• Sponsor requirements for data management
• Credit for data through citation, DOI, and Data Citation Index
• Training in data management
• Improved tools for data preparation: DataUp
• Developing a metadata editor
Model-Data Fusion: Data System Characteristics (1)
• Dedicated financial support for data management is essential
• Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products
• Based on a data management plan and a data policy
• Integrated system that delivers a suite of diverse products
• Establish standards (file, workflow, network) and promote interoperability
• Processes to assure and document data quality to allow proper interpretation and use
Model-Data Fusion: Data System Characteristics (2)
• Facilitate rapid exchange of data, products, and information, including large-volume data
• Promote the use of best practices to prepare and document data to share and archive
• Make efficient use of existing data management infrastructure and resources
• Ensure that finalized data and associated documentation are transferred to an appropriate archive
• Make numerical models (source code) and descriptions of the models available, along with model parameters and example input and output data (Thornton et al. 2005)
Interoperability
[Diagram: Coordinating Nodes (metadata extraction, internal metadata index) connected to Member Nodes: KNB, LTER, ORNL DAAC, CDL, USGS CSAS, and, in future, DRYAD. Metadata formats shown: EML, ISO, FGDC, METS.]
• Virtual Portals
• Numerous search capabilities
• Metadata has link to data, which reside at Member Nodes
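The arrangement described above, in which a Coordinating Node extracts metadata into one internal index while the data stay at the Member Nodes, can be sketched as a toy model. Every class, function, and field name below is hypothetical, the metadata are plain dictionaries rather than real EML or FGDC documents, and the URLs are placeholders; this is not the DataONE API.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    identifier: str
    title: str
    member_node: str
    data_url: str  # the data stay at the member node; the index keeps only a link


# Hypothetical title field per metadata format (real documents are XML).
TITLE_KEYS = {"EML": "title", "FGDC": "citation_title", "ISO": "md_title"}


def extract_record(member_node, fmt, raw):
    """Map format-specific metadata to the common internal record."""
    return MetadataRecord(raw["id"], raw[TITLE_KEYS[fmt]], member_node, raw["url"])


@dataclass
class CoordinatingNode:
    index: list = field(default_factory=list)

    def harvest(self, member_node, fmt, raw_records):
        """Extract metadata from one member node into the internal index."""
        for raw in raw_records:
            self.index.append(extract_record(member_node, fmt, raw))

    def search(self, keyword):
        """Return indexed records whose title contains the keyword."""
        keyword = keyword.lower()
        return [r for r in self.index if keyword in r.title.lower()]


cn = CoordinatingNode()
cn.harvest("KNB", "EML", [
    {"id": "knb.1", "title": "Stream chemistry", "url": "https://example.org/knb/1"},
])
cn.harvest("ORNL DAAC", "FGDC", [
    {"id": "daac.7", "citation_title": "Soil carbon flux",
     "url": "https://example.org/daac/7"},
])
hits = cn.search("carbon")
```

Each hit carries `data_url`, so a search at the Coordinating Node leads the user back to the Member Node that holds the data.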
The long tail of orphan data
“Most of the bytes are at the high end, but most of the datasets are at the low end” (Jim Gray)
[Figure: data volume plotted against rank frequency of datatype (B. Heidorn); specialized repositories (e.g. Remote Sensing, NEON) at the high-volume end, orphan data in the long tail]
“Data intensive science” and the “80:20 rule”
[Figure, adapted from CENR-OSTP. Tiers: intensive science sites and experiments; extensive science sites; volunteer & education networks; remote sensing. Axis labels: Decreasing Spatial Coverage, Increasing Process Knowledge.]