Improving long-term preservation of EOS data by independently mapping HDF4 data objects
The HDF Group
Mapping project team members
• The HDF Group
  • Ruth Aydt
  • Mike Folk
  • Joe Lee
  • Elena Pourmal
  • Binh-Minh Ribler
  • Muqun (Kent) Yang
• NASA
  • Ruth Duerr & Luis Lopez (NSIDC)
  • Chris Lynnes (GES DISC)
• Raytheon
  • Evelyn Nakamura
  • many others
Annual HDF Briefing to NASA
Recap
• Problem
  • The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.
• Solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
  • Implement tools to create layout maps for EOS data products.
  • Deploy tools at select EOS data centers.
HDF4 mapping workflow
[Diagram: HDF4 file → h4mapwriter (linked with HDF4 library) → HDF4 map file (XML document). The map file describes groups, data objects, structural and application metadata, and the locations of object data. A reader program uses the map file to retrieve object data from the HDF4 file.]
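The reader side of this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the project's reader: the element and attribute names (`Dataset`, `byteStream`, `offset`, `nBytes`, `type`, `byteOrder`) are hypothetical and are not taken from the actual mapping schema, and the binary file is a stand-in rather than a real HDF4 file. The point is only that, once a map gives byte locations, plain file I/O is enough and the HDF4 library is not needed.

```python
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical map fragment. Element and attribute names are illustrative
# assumptions, not the real HDF4 mapping schema.
EXAMPLE_MAP = """
<Map>
  <Dataset name="temperature" type="float32" byteOrder="bigEndian">
    <byteStream offset="16" nBytes="12"/>
  </Dataset>
</Map>
"""

# Assumed mapping from map type names to struct format codes.
STRUCT_CODES = {"float32": "f", "int32": "i", "int16": "h"}


def read_dataset(map_xml, data_path, name):
    """Read one dataset's values using only the map and plain file I/O."""
    root = ET.fromstring(map_xml)
    for dataset in root.iter("Dataset"):
        if dataset.get("name") != name:
            continue
        raw = b""
        with open(data_path, "rb") as f:
            # Concatenate every byte range the map lists for this dataset.
            for stream in dataset.iter("byteStream"):
                f.seek(int(stream.get("offset")))
                raw += f.read(int(stream.get("nBytes")))
        order = ">" if dataset.get("byteOrder") == "bigEndian" else "<"
        code = STRUCT_CODES[dataset.get("type")]
        count = len(raw) // struct.calcsize(order + code)
        return list(struct.unpack(f"{order}{count}{code}", raw))
    raise KeyError(name)


def demo():
    """Build a stand-in binary file (not real HDF4) and read it via the map."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        # 16 bytes of padding, then three big-endian 32-bit floats.
        tmp.write(b"\x00" * 16 + struct.pack(">3f", 1.0, 2.5, -4.0))
    try:
        return read_dataset(EXAMPLE_MAP, tmp.name, "temperature")
    finally:
        os.remove(tmp.name)
```

Calling `demo()` returns the three values written into the stand-in file. A real map would also need to describe dimensions, chunking, and compression, which this sketch leaves out.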
Phase 1: Build a prototype (completed in 2009)
Phase 2: Productize HDF4 mapping schema and tools for deployment
Phase 2 tasks
• Investigate integration of mapping schema with existing standards
• Determine HDF-EOS2 requirements
• Redesign and expand the XML schema
• Implement production-quality map writer
• Develop demo map reader
• Deploy tools at select NASA data centers
Task A: Investigate integration of mapping schema with existing standards
Investigate existing standards
• Investigated:
  • METS, PREMIS, ESML, NcML, and CSML
• Concluded:
  • Existing standards have different purposes than mapping schema
  • None meet all needs of mapping project
  • Develop new schema tailored to project goals
    • Harmonize with PREMIS
    • Leverage terminology and approaches from all
• Status:
  • Need to write report
  • Need to include some PREMIS-like data such as HDF4 file size and possibly MD checksum
Task B: Determine HDF-EOS2 requirements
Background
An HDF-EOS2 file is an HDF4 file, so one can create an HDF4 mapping file to archive the HDF-EOS2 file. However, for some HDF-EOS2 files, it may be extremely difficult to retrieve correct geo-location information from the mapping files. For those files, special HDF-EOS2 mapping files may be needed.
Categorize HDF-EOS2 data products
• Created a data pool from NASA data centers
  • GES DISC, NSIDC, LAADS, LP DAAC
  • LaRC, PO.DAAC, GHRC, OBPG
• Analyzed data and reported options for adding HDF-EOS2 contents to the mapping file
• Conclusion: No special mapping for HDF-EOS2 needs to be done
  • However, the study uncovered some important shortcomings in certain HDF-EOS products
Status and Plans
• Status: Complete
  • Detailed descriptions of sample data:
    • http://hdfeos.org/zoo/Data_Collection/index.php
  • Documents and reports at wiki:
    • http://wiki.hdfgroup.org/MappingPhase2_TaskB
• Plans
  • We plan to recommend a future task in which these issues are made known to the project
Task C: Redesign schema
Design priorities and assumptions
• Mapping files
  • Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  • Have enough information to stand on their own
  • Be as simple as possible
• Mapping schema
  • Describe the mapping files
  • Used for validation and documentation
  • May not be available to target user
Status and Plans
• Status
  • All HDF4 objects found in EOS products are now handled by the mapping schema.
• Plans
  • Complete schema elements for HDF4 file description information
    • File size, MD checksum (?), HDF4 library version stamp (?)
  • Finalize schema documentation
  • Address any additional HDF4 objects found during remainder of project, either by updating schema and map writer, or with a follow-on proposal if a substantial amount of effort is required.
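The file description items in the plans above could be gathered along the following lines. This is a sketch under stated assumptions: the slides say only "MD checksum" with a question mark, so the choice of MD5 here is an assumption, and the dictionary keys `size` and `md5` are hypothetical names, not schema elements.

```python
import hashlib
import os
import tempfile


def describe_file(path, chunk_size=1 << 20):
    """Return the file size in bytes and an MD5 digest (MD5 is assumed)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "md5": digest.hexdigest()}


def demo(payload=b"stand-in for HDF4 file contents"):
    """Describe a temporary stand-in file (not a real HDF4 file)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        return describe_file(tmp.name)
    finally:
        os.remove(tmp.name)
```

Recording both values in the map would let a later reader confirm that the map and the binary file still belong together before trusting any byte offsets.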
Task D: Implement map writer
Map Writer Requirements
• Retrieve information needed from HDF4 file
• Write out corresponding XML file
• Quality requirements
  • Completeness
    • Don’t miss any objects in file
    • Report on objects or features not handled by the writer
  • Accuracy – don’t give wrong information
  • Readability – provide adequate instructions in the file
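The completeness requirement above can be illustrated with a small Python sketch. It is not h4mapwriter, which is linked with the HDF4 library; the object dictionaries, the set of handled kinds, and the XML element names (`Map`, `byteStream`, `Unhandled`) are all hypothetical. The idea shown is that every object is either mapped or explicitly reported, never silently skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of object kinds this sketch knows how to map.
HANDLED_KINDS = {"SDS", "Vdata", "Group"}


def write_map(objects):
    """Return (xml_text, unhandled) for a list of object descriptions.

    Each object is a dict with assumed keys: kind, name, offset, n_bytes.
    """
    root = ET.Element("Map")
    unhandled = []
    for obj in objects:
        if obj["kind"] not in HANDLED_KINDS:
            # Completeness: record what could not be mapped, in the map
            # itself and in the returned report.
            unhandled.append(obj)
            ET.SubElement(root, "Unhandled", kind=obj["kind"], name=obj["name"])
            continue
        elem = ET.SubElement(root, obj["kind"], name=obj["name"])
        ET.SubElement(
            elem,
            "byteStream",
            offset=str(obj["offset"]),
            nBytes=str(obj["n_bytes"]),
        )
    return ET.tostring(root, encoding="unicode"), unhandled
```

Returning the unhandled list alongside the XML lets a caller fail loudly or log a warning, depending on how strict the archive wants to be.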
Activities
• Implement functions to facilitate map creation
  • Develop writer requirements based on new XML schema and additional deployment needs
  • Implement new functions as needed
  • Include functions in library as appropriate
• Implement writer: h4mapwriter
  • Interpret map requirements according to schema
  • Implement writer
  • Package for deployment
  • Support deployment
Status and Plans
• Status
  • Implement functions to facilitate map creation
    • All functions implemented
  • Implement writer
    • Handles all objects
    • Available as alpha-2 release
    • Being tested by GES DISC, NSIDC, Raytheon
• Plans
  • Functions to facilitate map creation
    • Include in future HDF4 releases
  • Writer
    • Finish HDF4 file description elements
    • Complete testing and documentation
    • Support deployment, fix bugs and add features as needed
Task E: Implement demo reader
Demo Reader Requirements
• Multiplatform command-line tool
• Easy to use, with clear arguments and output
• Must validate that objects in the mapping file are actually in the HDF4 file
• Developed in a well-supported high-level language (Python)
• Well documented
• Available as open source
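One necessary part of the validation requirement above is checking that every byte range named in a map actually lies inside the binary file. The Python sketch below shows only that check, with a hypothetical function name and a plain list of (offset, length) pairs as input; it is not the demo reader's actual validation logic, and passing it does not prove the bytes hold the right object.

```python
def check_byte_ranges(ranges, file_size):
    """Return the (offset, n_bytes) pairs that fall outside the file."""
    bad = []
    for offset, n_bytes in ranges:
        # A range is valid only if it starts at or after byte 0 and ends
        # at or before the end of the file.
        if offset < 0 or n_bytes < 0 or offset + n_bytes > file_size:
            bad.append((offset, n_bytes))
    return bad
```

An empty result means every range fits; any returned pair points at a map entry that cannot be read from this file, for example because the map belongs to a different or truncated file.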
Demo reader activities
• Develop requirements, based on new schema and identification of additional deployment needs.
• Design reader, based on requirements, and from a review of the prototype design.
• Implement and document reader.
• Test reader on EOS file “zoo”.
• Deposit reader, documentation, and tests in open source repository, probably SourceForge.
Demo Reader Status
• Status
  • Support provided so far for Vdata, SDS, Group, and Attribute
  • Current source code available at http://sourceforge.net/projects/hdf4mapreader/
  • Documentation at http://hdf4mapreader.sourceforge.net/
• Plans
  • Add raster image (RIS) and palette support
Task G: Deploy
Task G: Deploy
• Begin in April 2011, complete in June
• The HDF Group
  • Provide h4mapwriter map generation tool
  • Maintain tool during deployment and validation
  • Assist GES DISC, NSIDC, and Raytheon with deployment and validation
• Raytheon
  • Validate HDF4 map software in anticipation of future deployment
• GES DISC and NSIDC: see next slide
What about GES DISC and NSIDC?
• Activities (formerly):
  • GES DISC
    • Incorporate into the existing archive ingest system
    • Manage the retrofit into existing metadata files
  • NSIDC
    • Support implementation in NSIDC’s ECS system
  • Other ESDCs
    • Encouraged to join in
    • But deployment to other centers is expected subsequent to the project.
• Ruth Duerr’s observation:
  • The task for NSIDC is to assist in the ECS implementation at NSIDC, which won't take place until 2012
  • Task G only includes the work up to the handoff to ECS
  • Thus, NSIDC's work needs to extend beyond this award's period of performance
  • How do we resolve that issue?
Beyond July 15
Future work
• NSIDC
  • Assist in the ECS deployment at NSIDC
• GES DISC:
  • ?
• The HDF Group
  • Monitor deployment activities by Raytheon and others to identify
    • Unsupported objects and tags occurring in products
    • Software defects
    • Feature requests
  • As needed, fix defects, add features, and add support for new objects and tags
  • Address performance issues
  • Add h4mapwriter tool and supporting API to regular HDF4 testing regime
  • Perform other services in support of the software as needed
• All
  • Perform post mortem and identify lessons learned
  • Write paper summarizing the project
  • Investigate HDF5 mapping
Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.