Key Issues in Data Interoperability
R. Duerr, Lead, Tech Infusion Data Stewardship Focus Group

What is Data Interoperability?
“Data interoperability is the ability for a data user to work with (view, analyze, process, etc.) a data provider's science data or model output ‘transparently,’ without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software.”
• From “Practical Data Interoperability for Earth Scientists,” available at http://www.esdswg.com/techinfusion/infusion_news/081027-datainterop
Key Issues in Data Interoperability, presented by R. Duerr, October 20, 2009, at the ESDSWG Meeting in Wilmington, Delaware

Why Bother?
• Inter-disciplinary science
• Foster collaborations
• Enable model/data integration and model coupling
• Make data analysis easier
• Producing interoperable data is easier than it looks!
• Because it maximizes NASA’s return on investment!

Role of Data Centers & Science Teams in Interoperability
• Data Center Role:
  • To Provide the Means to Access the Data both Now and in the Future
• Science Team Role:
  • To Make Data Usable both Now and in the Future

Making Data Usable
• Documentation (i.e., metadata)
• Formats

Good Metadata is Key
• For making data useful now and in the long term
• So users can discover data of interest
• So users can understand valid uses of the data and make intelligent choices about which data to use
• So users can use the data
• For allowing others to reproduce results

What is Good Metadata?
Information that allows its associated data to be independently understandable by the user community.
- OAIS reference model

What Kinds of Information are Needed?
• Information that allows users to understand the data (syntax and semantics)
• Information about the pedigree and history of the data
• Persistent, unambiguous identification information
• Information that allows users to be certain that the data has not been changed in undocumented ways
• Information that allows users to understand the context within which the data were
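
One common way to give users the assurance described above, that data has not been changed in undocumented ways, is to record a checksum in the metadata and recompute it on receipt. The following is a minimal Python sketch, not something taken from the slides: the choice of SHA-256 and the function names sha256_of_file and is_unchanged are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large data files need not fit in memory.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare the file's current digest with the one recorded in its metadata."""
    return sha256_of_file(path) == recorded_digest.lower()
```

A data provider would publish the digest alongside the file; a user calls is_unchanged(path, published_digest) after download. A mismatch only says that the bytes differ, not why, so it complements rather than replaces the processing-history documentation.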

Information About the Data
• “Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics)
• Instrument/sensor calibration data and method
• Processing algorithms and their scientific basis, including complete description of any sample or mapping algorithm used in the creation of the product (e.g., contained in peer reviewed papers, in some cases supplemented by thematic information introducing the data set or product to scientists unfamiliar with it)
• Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product”

Information About the Data (cont.)
• “Processing history including version of processing source code corresponding to versions of the data set or derived product
• Quality assessment information
• Validation record, including identification of validation data sets
• Data structure and format, with definition of all parameters and fields
• In the case of earth-based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors which could influence the long-term record”
Global Change Science Requirements

Information About the Data (cont.)
• “A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set
• Information received back from users of the data set or product”
Global Change Science Requirements

Formats
• Data
  • Pick a standard (e.g., netCDF/CF-1, HDF and HDF-EOS)
  • Think about access patterns

Formats (continued)
• Metadata
  • Standards
    • ISO 19115 for science discovery, assessment, and use information
    • PREMIS for non-science preservation components
    • Open Provenance Model to record modifications that occur on the way to the user
  • XML for import/export
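
To illustrate the idea of XML for metadata import/export, here is a minimal Python sketch that round-trips a flat metadata record through XML using the standard library. It is only an illustration: the element names (metadata, title, format, processing_version) and the sample values are hypothetical, and a real ISO 19115 record uses a much richer, namespaced schema that this sketch does not attempt to follow.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Export a flat metadata dict as an XML string (hypothetical element names)."""
    root = ET.Element("metadata")
    for name, value in record.items():
        # Each key becomes a child element; keys must be valid XML names.
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Import the XML produced by record_to_xml back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


# Made-up sample values, for illustration only.
record = {
    "title": "Example gridded data set",
    "format": "netCDF/CF-1",
    "processing_version": "1.0",
}
xml_text = record_to_xml(record)
restored = xml_to_record(xml_text)
```

Because the export and import functions are inverses for string values, restored equals record, which is the property that makes XML usable as an exchange format between systems that store metadata differently.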