Has Data Management Gone Mainstream? Presented at the BEER Workshop Coconut Grove (Miami), Florida November 9, 2008 Robert C. Groman BEER Workshop November 9, 2008
Talk Overview
• Has data management gone mainstream?
• “Data” is a plural noun = facts, statistics, or items of information.
• Metadata = motherhood and apple pie
• Accessing data: Is a picture worth a thousand bytes?
• Data Interoperability
Purpose
• Raise level of awareness (and appreciation) for data management
• “Lighter and informative”
• Want to use some formulas
• Difference between an engineer and a mathematician
Venn Diagram: Data and Metadata
D = all data and information necessary to use the data
d = data (facts, statistics, or items of information)
m = metadata
Set theory: D ≠ m + d
Probability of having all the data and information necessary to reuse someone else's data
• Second order effects:
  • Length of cruise
  • Success of cruise
  • Participants
  • Immediate activity following the cruise
Theorems†
• Theorem 1: The probability that all the necessary data and information are collected and preserved to allow another researcher to properly use your data is inversely proportional to the time since the data were collected.
• Corollary: Unless data and information are collected and preserved during the experiment (cruise), subsequent researchers will have a difficult time using your data.
• Theorem 2: The longer the time since the data were collected, the less likely the data will ever be considered “final”.
†Proofs are left to the reader as an exercise.
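In the spirit of "want to use some formulas", Theorem 1 can be written down as a one-line sketch. The symbols are assumptions added here, not part of the talk: P(t) is the probability that everything needed for reuse was preserved, t is the time since collection, and k is a hypothetical constant. The restriction on t only keeps P(t) at or below 1.

```latex
P(t) = \frac{k}{t}, \qquad t \ge k > 0
```

Read this way, doubling the time since collection halves the chance that a later researcher has what they need, which is the point of the corollary: record the information during the cruise.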
Seeing Versus Using Someone’s Data
• Maybe you don’t want others to use your data. Hard to believe, but this does happen. For example:
  • I’m not done publishing my papers based on my data
  • My graduate student is almost done analyzing the data
  • It’s not final yet – no, but they still may be useful
  • My dog ate it (no, I haven’t heard this one yet.)
• Old policies and practices about data archiving
• New policies about data sharing, data publishing and data archiving
• Web accessible
• NSF mandate (It is for real this time.)
• The sum is greater than its parts
The more people use your data the better they get.
• Heisenberg Uncertainty Principle (HUP) does NOT seem to apply
• If Δx and Δp are the uncertainties in the measurements of the position and momentum, then the product ΔxΔp is at least on the order of Planck's constant.
• When measuring conjugate quantities, the product of their standard deviations must be at least h / 4π
• Not to be confused with the term observer effect (OE), which refers to changes that the act of observing will make on the phenomenon being observed.
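For reference, the bound quoted above can be written compactly. The notation is an assumption added here: σ_x and σ_p are the standard deviations of position and momentum, h is Planck's constant, and ħ is the reduced Planck constant.

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2} = \frac{h}{4\pi}, \qquad \hbar = \frac{h}{2\pi}
```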
Biological and Chemical Oceanography Data Management Office (BCO-DMO)
• NSF-funded three-year project to provide short- and medium-term data management, including web-based access, to all NSF-funded projects from their biological and chemical oceanographic programs
• Large NSF projects are expected to have their own data management offices
• Web site: http://www.bco-dmo.org/
Data Stewardship
• “a concern for creation and preservation of data and all intermediate phases - focuses …on the management of data over the long term” [Baker and Chandler, 2008];
• Data quality control;
• Treatment of all information as data fosters data re-use;
• Data that lack sufficient metadata have limited value beyond the research program for which they were collected; and
• Metadata should include sufficient information to support discovery, value assessment, and accurate re-use of the data.
MapServer interface and interoperability enhancements
• Provides access to geo-referenced scientific data and metadata
• Presents distributed data sets in a unified way
• Uses MapServer as the visualization application
• Visualize data with graphics generated on the fly
• Request custom subsets of measurements in a variety of file formats
• Compare data from different sources
Interoperability
• Ability to get someone else's data and use it on your system. (How easy is this really?)
• True interoperability: get someone else's data and use it directly in your application. Do the units match? Do the data acquisition and processing steps match yours, or are the differences (including instrumentation differences) accounted for?
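The "do the units match" question can be made concrete with a minimal sketch. Everything named here is hypothetical and for illustration only: the `CONVERSIONS` table, the `harmonize` helper, and the sample record are assumptions, not part of any system described in the talk. The idea is to convert only when the metadata states the units and a conversion is known, and to refuse to guess otherwise.

```python
# Hypothetical conversion table: (from_unit, to_unit) -> conversion function.
CONVERSIONS = {
    ("degC", "degC"): lambda v: v,
    ("degF", "degC"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("m", "m"): lambda v: v,
    ("ft", "m"): lambda v: v * 0.3048,
}


def harmonize(values, from_unit, to_unit):
    """Convert values to to_unit; raise instead of guessing when no conversion is known."""
    try:
        convert = CONVERSIONS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no known conversion from {from_unit!r} to {to_unit!r}")
    return [convert(v) for v in values]


# Hypothetical record from another group: the units come from its metadata.
theirs = {"variable": "temperature", "units": "degF", "values": [32.0, 50.0, 68.0]}
in_celsius = harmonize(theirs["values"], theirs["units"], "degC")
```

Units are only the easy part. Differences in acquisition, processing, and instrumentation still have to be documented in the metadata before two data sets can be compared directly.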
JGOFS/GLOBEC Data Management System
http://globec.whoi.edu/map
Cruise Tracks
Select 5 Cruises
Click on “Show Data” Button
Select CD data in EN307
Shows stations and optional grid lines
EN307 graph it options
Depth versus salinity and versus temperature
Select another cruise: AL9906
Select MOC1 data set
Map it options for abundances
Interoperability features (for free)
MapServer Supports Interoperability Features
• Open Geospatial Consortium standards
  • Web Map Service (WMS): show me the data
  • Web Feature Service (WFS): get me the data
• Retains the functionality of the JGOFS/GLOBEC Data Management System
• Download data as ASCII, CSV, Matlab, NetCDF
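The "show me" versus "get me" distinction shows up directly in the request URLs. The sketch below builds a WMS 1.1.1 GetMap request (returns a picture) and a WFS 1.0.0 GetFeature request (returns the features themselves) using the standard OGC key-value parameters. The endpoint `BASE_URL` and the layer name `LAYER` are hypothetical placeholders, not the real server from the talk. A real MapServer CGI endpoint may also need a `map` parameter pointing at its mapfile.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, for illustration only.
BASE_URL = "http://example.org/cgi-bin/mapserv"
LAYER = "cruise_tracks"


def wms_getmap_url(base_url, layer, bbox, width=600, height=400):
    """Build a WMS 1.1.1 GetMap URL: "show me the data" as a rendered image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


def wfs_getfeature_url(base_url, feature_type):
    """Build a WFS 1.0.0 GetFeature URL: "get me the data" as features."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": feature_type,
    }
    return base_url + "?" + urlencode(params)


# Bounding box in degrees (longitude/latitude); the values are arbitrary.
map_url = wms_getmap_url(BASE_URL, LAYER, (-72.0, 39.0, -64.0, 45.0))
data_url = wfs_getfeature_url(BASE_URL, LAYER)
```

Because both requests are plain URLs, any client that speaks these standards can use the same server without knowing anything about the data system behind it.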
Related Activities
• MMI – Marine Metadata Interoperability
  • “Promoting the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility.”
• UNOLS Subcommittee to Report on Best Practices for the Collection of Data and Metadata at Sea to Promote Public Dissemination
  • Too new to even have its own web site
• The Working Group on Zooplankton Ecology (WGZE), with guidance from the Working Group on Marine Data Management (WGMDM), is providing these general metadata guidelines for plankton data collected and submitted to ICES. (2003)
• Sensor Interoperability Metadata Workshop (2006)
• ICES ASC 2006 and 2008 theme sessions on data management, data sharing and related topics
• NOAA Coastal Services Center Data Transport Laboratory (DTL)
• Integrated Ocean Observing System (IOOS)
• Ocean.US data management and communications (DMAC) strategy
• Gulf of Maine Ocean Data Partnership
• Many, many more …
Metadata Schema
The print size is small to protect the innocent and guilty.
What is the difference between an engineer and a mathematician?
References
• Karen S. Baker and Cynthia L. Chandler, Enabling long-term oceanographic research: Changing data practices, information management strategies and informatics, Deep-Sea Research II, 55 (2008), 2132-2142.