Data Lifecycle Workshop Overview - R. Duerr
Agenda
• Intro to topics
• Data Stewardship - Al Fleig
• HDF Maps - C. Lynnes
• HDF archive format - R. Duerr
• Object identifiers paper - R. Duerr
• Discussion and work plan development
7th Joint ESDSWG meeting, October 22, Philadelphia, PA Data Lifecycle Workshop sponsored by the Technology Infusion Working Group
HDF Archive Format
Project Goals
• Prototype development of Archive Information Packages for HDF data:
  • For entire data sets
  • For individual “granules”
• Test usability of digital library standards with geospatial data
Original NOAA SDS Program Plan
[Diagram; box labels: CDM/NetCDF4, NOAA:NMMR, FGDC, NOAA:CLASS, ECS to FGDC, HDF5-AIP, NSIDC/ECS Metadata, NetCDF4/HDF5 Data, METS, ECS to METS, NSIDC/ECS HDF4-data, H4toH5, NetCDF4/HDF5-data]
Current Program Plan
[Diagram; box labels: ISO-19115, CDM/NetCDF4, ECS to METS (Data Set), HDF5-AIP, NSIDC/ECS Metadata, ECS to METS (Granule), NetCDF4/HDF5 Data, METS, NSIDC/ECS HDF4-data, H4toH5, NetCDF4/HDF5-data]
HDF5 File Level Archive Information Packages
HDF5 AIP Components
• Data file: HDF5
• Metadata file: METS

Primary Schema               Extension Schema
<mets>
|---<dmdSec>                 ISO 19115
|---<amdSec>
|     |--<techMD>
|     |--<rightsMD>          PREMIS
|     |--<sourceMD>
|---<fileGrp>
|---<structMap>

http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
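The METS layout on this slide can be sketched in a few lines of Python. This is only an illustration of the section nesting, not the project's actual tooling: the function names, the ID values, and the OTHERMDTYPE label "ISO19115" are assumptions made for the example, and the output has not been validated against the METS schema. One detail that differs from the slide: in the METS schema, <fileGrp> sits inside a <fileSec> wrapper, so the sketch adds it.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Return the namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def md_section(parent, tag, sec_id, mdtype, other=None):
    """Add a metadata section with an empty mdWrap/xmlData placeholder."""
    sec = ET.SubElement(parent, q(tag), {"ID": sec_id})
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other  # illustrative label, an assumption
    wrap = ET.SubElement(sec, q("mdWrap"), attrs)
    ET.SubElement(wrap, q("xmlData"))  # extension-schema content would go here
    return sec


def build_aip_skeleton(data_href):
    """Build an empty METS skeleton pointing at one HDF5 data file."""
    mets = ET.Element(q("mets"))
    # Descriptive metadata: ISO 19115 is the extension schema on the slide.
    md_section(mets, "dmdSec", "DMD1", "OTHER", "ISO19115")
    # Administrative metadata: PREMIS is the extension schema on the slide.
    amd = ET.SubElement(mets, q("amdSec"))
    for tag, sec_id in (("techMD", "TECH1"),
                        ("rightsMD", "RIGHTS1"),
                        ("sourceMD", "SOURCE1")):
        md_section(amd, tag, sec_id, "PREMIS")
    # File inventory: fileGrp lives inside fileSec in the METS schema.
    file_sec = ET.SubElement(mets, q("fileSec"))
    grp = ET.SubElement(file_sec, q("fileGrp"))
    f = ET.SubElement(grp, q("file"), {"ID": "FILE1"})
    ET.SubElement(f, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_href})
    # Structural map: a single div that points at the data file.
    smap = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(smap, q("div"))
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})
    return mets


if __name__ == "__main__":
    root = build_aip_skeleton("granule.h5")  # hypothetical file name
    print(ET.tostring(root, encoding="unicode"))
```

Keeping the extension-schema content inside mdWrap/xmlData placeholders keeps the METS wrapper independent of whether the ISO 19115 or PREMIS records are embedded or referenced.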
Data Set Level Archive Information Package
• Metadata file: METS

Primary Schema               Extension Schema
<mets>
|---<dmdSec>                 ISO 19115
|---<amdSec>
|     |--<techMD>
|     |--<rightsMD>          PREMIS
|     |--<sourceMD>
|---<fileGrp>
|---<structMap>

[Diagram labels, repeated five times: HDF-AIP, Contextual Information]
Contextual Information:
• Instrument/sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics, etc.)
• Instrument/sensor calibration data and method
• Processing algorithms and their scientific basis, including complete description of any sampling or mapping algorithm used in creation of the product (e.g., contained in peer-reviewed papers, in some cases supplemented by thematic information introducing the data set or derived product)
• Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product
Presented by R. Duerr at the Summer Institute on Data Curation, June 2-5, 2008, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Contextual Information (continued):
• Processing history including versions of processing source code corresponding to versions of the data set or derived product held in the archive
• Quality assessment information
• Validation record, including identification of validation data sets
• Data structure and format, with definition of all parameters and fields
• In the case of earth based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors which could influence the long-term record
• A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set
• Information received back from users of the data set or product
Backup Materials - PREMIS & METS
Metadata Standards - PREMIS
• Provide a core preservation metadata set with broad applicability across the digital preservation community
• Developed by an OCLC and RLG sponsored international working group
• Representatives from libraries, museums, archives, government, and the private sector
• Based on the OAIS reference model
Metadata Standards - PREMIS (continued)
• Maintained by the Library of Congress
• Editorial board with international membership
• User community consulted on changes through the PREMIS Implementers Group
• Version 1 was released in June 2005
• Version 2 was just released
PREMIS - Entity-Relationship Diagram
• Intellectual Entities - “a coherent set of content that is reasonably described as a unit”. For example, a web site, data set or collection of data sets
• Objects - “a discrete unit of information in digital form”. For example, a data file
• Events - “an action that involves at least one object or agent known to the preservation repository”, e.g., created, archived, migrated
• Agents - “a person, organization, or software program associated with preservation events in the life of an object”, e.g., Dr. Spock donated it
• Rights - “assertions of one or more rights or permissions pertaining to an object or an agent”, e.g., copyright notice, legal statute, deposit agreement
PREMIS - Types of Objects
• Representation - “the set of files needed for a complete and reasonable rendition of an Intellectual Entity”
• File
• Bitstream - “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes”
Metadata Standards - METS
• Metadata Encoding and Transmission Standard
• An initiative of the Digital Library Federation
• Based on the Making of America II project
METS - What’s Its Purpose?
• Provides the means to convey the metadata necessary for
  • management of digital objects within a repository
  • exchange of objects between repositories (or between repositories and their users)
• Designed to facilitate
  • shared development of information management tools/services
  • interoperable exchange of digital materials
METS - What’s Its Status?
• Version 1.6 was released in Sept. 2007
• Maintained by the Library of Congress
• International Editorial Board
• NISO registration as of 2006
Backup Materials - MODIS Contextual Info
Instrument/sensor characteristics
Instrument/sensor calibration data and method
Processing Algorithms & Scientific Basis
Ancillary Data
Processing History including Source Code
Quality Assessment Information
Validation Information
Other Factors that can Influence the Record
Bibliography
Information from users
• Data errors found
• Quality updates
• Things that need further explanation
• Metadata updates/additions?
• Community contributed metadata?
Backup Materials - HDF AIP Challenges
Challenges in doing the conversion
• Retrieve geo-location information from HDF-EOS2 data
• Conform to NetCDF4 data model in the existing H4toH5 conversion tool
• ……
10/16/2008 HDF and HDF-EOS Workshop XII
Challenges: Handle EOS - Grid
• Grid lacks geolocation fields
• Use predefined projections
  • Geographic
  • Sinusoidal
  • Polar stereographic
  • …
• New converter creates geolocation fields
  • HDF-EOS2 API GDij2ll()
[Figure: Geographic grid, Data[4][12] with Lon[12]; Sinusoidal grid, Data[4][8] with Lon[4][8]]
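The figure's point, that a geographic grid needs only a one-dimensional longitude array while a sinusoidal grid needs a two-dimensional one, can be illustrated with plain projection math. The sketch below does not call GDij2ll() or any HDF-EOS2 routine; the function names are hypothetical, and it assumes pixel-center registration, a spherical Earth of a given radius, and a central meridian of 0 for the sinusoidal case.

```python
import math


def geographic_lon(ul_lon, lr_lon, ncols):
    """1-D longitudes (degrees) at pixel centers of a geographic grid."""
    step = (lr_lon - ul_lon) / ncols
    return [ul_lon + (j + 0.5) * step for j in range(ncols)]


def sinusoidal_lonlat(ul_x, ul_y, lr_x, lr_y, nrows, ncols, radius):
    """2-D lon/lat (degrees) at pixel centers of a sinusoidal grid.

    Inverse sinusoidal projection on a sphere, central meridian 0:
        lat = y / R,  lon = x / (R * cos(lat))
    Cells outside the valid projected area are not masked here, and a
    pixel center exactly at a pole would divide by zero.
    """
    dx = (lr_x - ul_x) / ncols
    dy = (lr_y - ul_y) / nrows
    lon, lat = [], []
    for i in range(nrows):
        y = ul_y + (i + 0.5) * dy
        phi = y / radius  # latitude in radians, constant along a row
        lon_row = []
        for j in range(ncols):
            x = ul_x + (j + 0.5) * dx
            lon_row.append(math.degrees(x / (radius * math.cos(phi))))
        lon.append(lon_row)
        lat.append([math.degrees(phi)] * ncols)
    return lon, lat


if __name__ == "__main__":
    # Geographic: longitude depends on the column only, so Lon[12] suffices.
    print(geographic_lon(-180.0, 180.0, 12))
    # Sinusoidal: longitude varies with the row too, so Lon[4][8] is needed.
    lon2d, lat2d = sinusoidal_lonlat(-1.0, 1.0, 1.0, -1.0, 4, 8, 1.0)
    print(lon2d[0][0], lon2d[1][0])
```

In the sinusoidal case the same column maps to a different longitude on every row, which is why the geolocation field has to match the full shape of the data field.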
Challenges: Handle EOS - Swath
• The size of geolocation fields can be different from data fields
• New converter has to handle geolocation fields correctly
Challenges in conforming to NetCDF4
• Follow CF conventions
• Create two variables: NewLongitude and NewLatitude
• Add to the data field an attribute coordinates=“NewLongitude NewLatitude”
• Keep the original Latitude and Longitude
[Figure: Longitude field has two columns; data field has three columns; new longitude has three columns]
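One way to picture the step from a two-column longitude field to a three-column one is linear interpolation along each row. The slides do not say which method the converter uses, so the sketch below is an assumption for illustration only; the function names and the offset/increment parameters (data index = offset + increment × geolocation index) are hypothetical, and longitude wrap-around at the date line is not handled.

```python
def expand_geo_row(geo_row, offset, increment, n_data):
    """Linearly interpolate one geolocation row to n_data columns.

    Geolocation sample g is assumed to sit at data column
    offset + increment * g. Columns outside the sampled range are
    clamped to the nearest geolocation sample.
    """
    if len(geo_row) < 2:
        raise ValueError("need at least two geolocation samples")
    last = len(geo_row) - 1
    out = []
    for d in range(n_data):
        g = (d - offset) / increment          # fractional geolocation index
        g = min(max(g, 0.0), float(last))     # clamp at the edges
        lo = min(int(g), last - 1)            # left neighbour
        frac = g - lo
        out.append(geo_row[lo] + frac * (geo_row[lo + 1] - geo_row[lo]))
    return out


def expand_geo_field(geo_field, offset, increment, n_data):
    """Apply expand_geo_row to every row of a 2-D geolocation field."""
    return [expand_geo_row(row, offset, increment, n_data) for row in geo_field]


if __name__ == "__main__":
    longitude = [[10.0, 20.0], [12.0, 22.0]]       # two columns
    new_longitude = expand_geo_field(longitude, 0, 2, 3)
    print(new_longitude)                            # three columns per row
    # CF-style attribute on the data field, names as given on the slide.
    data_attrs = {"coordinates": "NewLongitude NewLatitude"}
    print(data_attrs)
```

Writing the interpolated values to new variables and leaving the original Latitude and Longitude untouched keeps the conversion reversible, which matches the "keep the original" bullet above.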
Object Identifiers Paper