Data Resources US Perspective. Kerstin Lehnert Suzanne Carbotte. Lamont-Doherty Earth Observatory of Columbia University. Scientific Data in the Digital Age.
Data Resources: US Perspective. Kerstin Lehnert, Suzanne Carbotte. Lamont-Doherty Earth Observatory of Columbia University
Scientific Data in the Digital Age “It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change.” US National Science Board, Report to the US National Science Foundation, 2005
Access to Data “Effective access to research data, in a responsible and efficient manner, is required to take advantage of the new opportunities and benefits offered by new information and communication technologies.” Organization for Economic Co-operation & Development: “Principles and Guidelines for Access to Research Data from Public Funding” May 2007
Open Access to Data: Benefits • Democratize access to research resources • Ensure broad dissemination of results • Facilitate new cross-disciplinary approaches - access for non-specialist users • Enable verification of research results • Provide new research opportunities • Provide access to data from a variety of sources and enable integration across fields • Provide foundation for use of automated tools • Facilitate more efficient use of resources • Data are often expensive to collect (especially marine!) • Data are usually unique; repeat collection or analysis is rare
Data Synthesis ‘the Old Way’ Months to Years
Data Synthesis Today 2 Minutes
Data Visualization: 2 Minutes GeoMapApp software: www.geomapapp.org
Sharing Research Data: USA “GAO recommends the agencies explore opportunities in the grants process to better ensure the availability of data to other researchers and determine if additional archiving strategies are warranted.” GAO Report #07-1172 September 28, 2007
Existing US Data Resources relevant for MARGINS Science • Marine Geoscience Data System: hosts the MARGINS Data Portal • Geoinformatics for Geochemistry: hosts PetDB, SedDB, SESAR, EarthChem (links to GEOROC & NAVDAT) • NGDC: Marine geoscience data - mostly legacy programs • IRIS: Seismic network data and earthquake catalogs • UNAVCO: GPS data • GEON: Lidar data • SIO-GDC: hosts marine geoscience data from Scripps expeditions • WHOI: hosts data from vehicles of the NDSF
www.marine-geo.org www.geoinfogeochem.org
GfG & MGDS Collaborations & Partnerships [diagram; labels only] Groups: Data & IT (• GEON • UNAVCO • USGS • IODP • ICDP • Pangaea • CoreWall • PaleoStrat • MetPetDB • LEPR) and Science. Other diagram labels: • WHOI • Scripps • Ridge2K • PetDB • MARGINS • Boston Univ • Oregon State • Boise State • SedDB • Texas A&M • UTIG • Seismic Reflection Field Data Center • EarthChem • University of Kansas • Antarctic Multibeam • SESAR Sample Registry • Legacy • NGDC • University of NH
Services • Data Access • Education & Outreach
Scope of the MGDS • Metadata catalog: Central cruise catalog and data repository for all MARGINS programs; an important goal is to preserve the full data collection context for each expedition • Sensor Database: data documentation and access for multibeam and geophysical data from Palmer & Gould and MCS reflection data from Ewing & Langseth • Global DEM: Synthesis of multibeam bathymetry into the Global Multi Resolution Topography (GMRT) • MG&G Legacy data and derived data • Tools for data access: lower the barrier to data access with tools tailored to science needs October 23-24, 2007
MARGINS Database • Provides access to expedition information & data for all MARGINS-funded marine and some terrestrial programs • Diverse data collected during these programs are hosted within the MARGINS database: • swath bathymetry • gravity and magnetics • MCS reflection • water column data (BLISP, CTD) • side-scan sonar mapping data • rock and fluid sampling information • Database includes links to WHOI (near-bottom camera), UTIG (processed MCS), IRIS (seismometer), UNAVCO (GPS)
MGDS Access Interfaces • Data Link (server side) • GeoMapApp (client side) • Web services • Access data hosted at distributed data repositories
Access to data at distributed data repositories: Alvin and Jason2 near-bottom photos
With bathymetry tiles exposed through a programmatic interface, the data can be used in Google Earth
GfG Program: Scope • PetDB, SedDB, EarthChem data sets: Build and provide access to integrated compilations of large volumes of geochemical data; desktop access to the entire published geochemical literature within minutes • EarthChem Portal: Central access point to the broadest range of geochemical data in federated databases • SESAR Sample Registry: Provide globally unique identifiers for samples; build global sample catalog
Database Features • Archive & serve integrated data sets of geochemical data (each individual value searchable) • Include complete metadata of samples and analytical procedures for searching and data evaluation • Offer interactive, dynamic user interfaces that allow extraction of any customized subset of the data • Support data analysis • Tools for data quality assessment & control. • Tools for visualization (map interfaces, plotting tools). • Integration with broader Geoscience data via interoperability & partnerships.
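The "extraction of any customized subset" feature can be pictured with a minimal sketch. It assumes each analytical value is stored as its own record alongside sample and method metadata; the field names, the `select` helper, and all values are hypothetical and are not taken from PetDB, SedDB, or EarthChem.

```python
# Made-up records for illustration only; not values from any real database.
RECORDS = [
    {"sample": "S1", "analyte": "SiO2", "value": 50.1, "unit": "wt%", "method": "XRF"},
    {"sample": "S1", "analyte": "MgO", "value": 7.8, "unit": "wt%", "method": "XRF"},
    {"sample": "S2", "analyte": "SiO2", "value": 49.2, "unit": "wt%", "method": "EMP"},
    {"sample": "S3", "analyte": "SiO2", "value": 62.4, "unit": "wt%", "method": "XRF"},
]


def select(records, analyte=None, method=None, min_value=None, max_value=None):
    """Return the records that satisfy every criterion that was supplied.

    A criterion left as None is ignored, so callers can combine filters freely.
    """
    subset = []
    for record in records:
        if analyte is not None and record["analyte"] != analyte:
            continue
        if method is not None and record["method"] != method:
            continue
        if min_value is not None and record["value"] < min_value:
            continue
        if max_value is not None and record["value"] > max_value:
            continue
        subset.append(record)
    return subset


# Example query: SiO2 measured by XRF, at most 55 wt%.
subset = select(RECORDS, analyte="SiO2", method="XRF", max_value=55.0)
```

Because every value is an individually searchable record carrying its analytical method, a query can filter on data quality criteria as well as on composition.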
Ambiguous Sample Naming Sample names are duplicated. Sample names are modified or changed. Examples from the PetDB Database
International Geo Sample Number (IGSN) • SESAR serves as registry that provides & manages unique identifiers for samples • Identifier is obtained upon submission of sample metadata (registration) • Implementation in sample collection & curation ongoing (IODP, core repositories) • About 4 million samples registered • System still under development
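A toy registry shows why identifiers minted at registration resolve the ambiguous sample names described above. This is a sketch under stated assumptions, not the SESAR implementation: the `SampleRegistry` class, the `DEMO` prefix, the identifier layout, and the sample names are all hypothetical and do not reflect the actual IGSN syntax.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SampleRegistry:
    """Toy in-memory registry; hypothetical, not the SESAR system."""

    prefix: str = "DEMO"  # hypothetical namespace, not a real IGSN prefix
    _records: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def register(self, name, metadata):
        """Store the submitted metadata and return a newly minted identifier."""
        identifier = f"{self.prefix}{next(self._counter):06d}"
        self._records[identifier] = {"name": name, **metadata}
        return identifier

    def lookup(self, identifier):
        """Return the metadata stored for one identifier."""
        return self._records[identifier]

    def find_by_name(self, name):
        """Return every identifier whose sample carries the given name."""
        return [i for i, r in self._records.items() if r["name"] == name]


registry = SampleRegistry()
# Two different samples that happen to share the same field name.
id_a = registry.register("D1-1", {"cruise": "Cruise A", "material": "basalt"})
id_b = registry.register("D1-1", {"cruise": "Cruise B", "material": "sediment"})
```

The shared name `D1-1` maps to two identifiers, so a publication that cites the identifier rather than the name points at exactly one sample.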
Improving Global Data Access “Building a Global Data Network for Studies of Earth Processes at the World’s Plate Boundaries” International Workshop, Kiel (Germany), May 2007. Attended by 71 people from 14 countries. Sponsored by the MARGINS, Ridge2000, InterMARGINS, InterRIDGE programs. Agreed on statements of principle and recommendations to address technical, procedural, and organizational issues of open global data sharing.
Workshop Recommendations • Science User Needs • Access to all data needed to reproduce scientific results • Access to multidisciplinary & integrated marine & terrestrial data • Data Documentation & Publication • Uniform best practices & standards for data acquisition, data submission to data centers & data publication • Easy procedures for metadata creation & data submission • Data & Metadata Interoperability • Minimize proliferation of metadata standards • Development of a data discovery service across distributed data resources • Opportunities & Obstacles for International Data Sharing • Leverage international bodies & programs (e.g. GEOSS, eGY, ICSU, IPY) • Establish dedicated task group & special interest groups to advance implementation of a global data network
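The recommended data discovery service across distributed resources could take roughly the following shape. This is a minimal sketch, assuming each repository exposes a keyword search over its own catalog; the repository names, catalog entries, and the `discover` function are hypothetical placeholders rather than any existing service.

```python
def make_catalog(entries):
    """Build a stand-in for one repository's search function."""

    def search(keyword):
        needle = keyword.lower()
        return [e for e in entries if needle in e["title"].lower()]

    return search


# Hypothetical repositories with made-up catalog entries.
CATALOGS = {
    "repository_a": make_catalog(
        [{"title": "Multibeam bathymetry grid"}, {"title": "Gravity profile"}]
    ),
    "repository_b": make_catalog(
        [{"title": "Bathymetry compilation"}, {"title": "GPS time series"}]
    ),
}


def discover(keyword, catalogs):
    """Query every catalog and tag each hit with the repository it came from."""
    hits = []
    for name, search in catalogs.items():
        for entry in search(keyword):
            hits.append({"source": name, **entry})
    return hits


hits = discover("bathymetry", CATALOGS)
```

The data stay at their home repositories; only the search is centralized. That arrangement is why limiting the proliferation of metadata standards matters: a shared query works only if the catalogs describe data in compatible terms.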
Cyberinfrastructure Goal: A genuine infrastructure of highly reliable, widely accessible capabilities and services to support the entire range of scientific work. Geoinformatics = Cyberinfrastructure for the Geosciences
Infrastructure Components • Technological Infrastructure • Institutional & Management Models • Legal & Policy Framework • Financial Support • Cultural & Behavioral Changes
MGDS [diagram; labels only] • Seismic Reflection DMS • UTIG (Lead) • LEGACY • NGDC/UNH • Antarctic MBS • MARGINS • TAMU* • Ridge2000 • WHOI*