Matthew Cechini Raytheon - EED ID: IN31C-07 ISO 19115 Experiences in NASA’s Earth Observing System (EOS) ClearingHOuse (ECHO)
Agenda • ECHO Metadata Overview • Introduction • Problem Space • Solutions • ISO 19115 Lessons Learned • Perceived Issues • Gotchas • Kudos • Conclusion
Introduction • Earth Observing System (EOS) ClearingHOuse (ECHO) • An integral component of metadata management within NASA’s Earth Observing System Data and Information System (EOSDIS), acting as the core metadata repository and providing a centralized mechanism for discovering and retrieving metadata and data. • How metadata is used by ECHO • Discovery • Presentation/Documentation • Interoperability • Validation • Metadata Format Landscape • The existing catalog uses the ECHO format (based on the ECS data model). • Future science missions are projected to provide ISO 19115 metadata.
Problem Space Data discovery and retrieval tenets: • There exists a set of users who will require the entire metadata record for advanced analysis. • There exists a set of ‘core’ metadata fields recommended for data discovery. • There exists a set of users who will require a ‘core’ set of metadata fields for discovery only. • New formats will keep appearing, and old formats will never all be retired. • Users should be presented with metadata in a consistent format of their choosing.
Solutions • ECHO’s metadata processing solution: • Identify a cross-format set of ‘core’ metadata fields for discovery. • Implement format-specific indexers that extract the ‘core’ metadata fields into an optimized query capability. • Archive the original metadata in its entirety for presentation to users requiring the full record. • Provide on-demand translation of ‘core’ metadata to any supported result format or standard. • ECHO’s use of ISO 19115/19139 • Archive original metadata for documentation and advanced usage. • Extract ‘core’ metadata fields for data discovery. • Provide translations between ISO and the other supported formats, in both directions.
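The indexer-plus-archive approach on the Solutions slide can be sketched in Python. This is a minimal illustration under stated assumptions, not ECHO's implementation: the `CoreRecord` fields, the flat "simple" format with its `Record`/`Title`/`Keyword` elements, and all function names are hypothetical. Only the ISO 19139 element paths and the `gmd`/`gco` namespaces come from the standard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Standard ISO 19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


@dataclass
class CoreRecord:
    """Hypothetical cross-format 'core' discovery fields."""
    title: str
    keywords: List[str] = field(default_factory=list)
    original: str = ""  # the full original record, archived unchanged


def index_iso19139(xml_text: str) -> CoreRecord:
    """Extract core fields from an ISO 19139 gmd:MD_Metadata record."""
    root = ET.fromstring(xml_text)
    ident = "gmd:identificationInfo/gmd:MD_DataIdentification"
    title = root.findtext(
        f"{ident}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
        default="",
        namespaces=NS,
    )
    keywords = [
        el.text.strip()
        for el in root.findall(
            f"{ident}/gmd:descriptiveKeywords/gmd:MD_Keywords/"
            "gmd:keyword/gco:CharacterString",
            NS,
        )
        if el.text
    ]
    return CoreRecord(title=title.strip(), keywords=keywords, original=xml_text)


def index_simple(xml_text: str) -> CoreRecord:
    """Extract core fields from the hypothetical flat <Record> format."""
    root = ET.fromstring(xml_text)
    keywords = [el.text.strip() for el in root.findall("Keyword") if el.text]
    return CoreRecord(
        title=(root.findtext("Title") or "").strip(),
        keywords=keywords,
        original=xml_text,
    )


# One indexer per supported input format.
INDEXERS: Dict[str, Callable[[str], CoreRecord]] = {
    "iso19139": index_iso19139,
    "simple": index_simple,
}


def ingest(fmt: str, xml_text: str) -> CoreRecord:
    """Dispatch to the format-specific indexer."""
    return INDEXERS[fmt](xml_text)


def render_simple(record: CoreRecord) -> str:
    """Translate core fields into the hypothetical flat format on demand."""
    root = ET.Element("Record")
    ET.SubElement(root, "Title").text = record.title
    for kw in record.keywords:
        ET.SubElement(root, "Keyword").text = kw
    return ET.tostring(root, encoding="unicode")
```

In this sketch, supporting a new input format means registering one more indexer, while users who need the full record receive the archived `original` string rather than a translation.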
Online Resources ISO 19115 - Perceived Issue • MimeType • The existing MIME type standard could be incorporated by reference, much as GML is incorporated while being maintained separately. • MimeType values facilitate automated access where different file types result in different workflows (e.g. displaying native JPEG images or extracting data from HDF files). File extensions are not always indicative of file type. • Type • Code list values promote interoperability but can limit customization within a community. • A type attribute allows more detailed identification for automated access (e.g. a specific service protocol such as http://xml.opendap.org/ns/DAP/3.3#).
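To make the automated-access argument concrete, here is a minimal Python sketch of a client choosing a workflow for an online resource. The `OnlineResource` fields, the workflow names, and the use of `application/x-hdf` as the HDF media type are assumptions for illustration; the point is that an explicit protocol or MIME type is consulted before falling back to guessing from the file extension with the standard-library `mimetypes` module.

```python
import mimetypes
from dataclasses import dataclass
from typing import Optional

# Hypothetical workflow names keyed by media type.
WORKFLOWS = {
    "image/jpeg": "display",
    "application/x-hdf": "extract",  # assumed media type for HDF files
}

# Hypothetical workflow keyed by a service protocol identifier.
PROTOCOL_WORKFLOWS = {
    "http://xml.opendap.org/ns/DAP/3.3#": "opendap-subset",
}


@dataclass
class OnlineResource:
    """Hypothetical online resource carrying an explicit MIME type and protocol."""
    linkage: str
    mime_type: Optional[str] = None
    protocol: Optional[str] = None


def choose_workflow(res: OnlineResource) -> str:
    # A declared service protocol is the most specific signal.
    if res.protocol in PROTOCOL_WORKFLOWS:
        return PROTOCOL_WORKFLOWS[res.protocol]
    # Otherwise prefer an explicit MIME type; fall back to guessing from the extension.
    mime = res.mime_type or mimetypes.guess_type(res.linkage)[0]
    # Unknown or unguessable types need a human to decide.
    return WORKFLOWS.get(mime, "manual")
```

For example, `choose_workflow(OnlineResource("https://example.com/granule.dat", mime_type="application/x-hdf"))` returns `"extract"`, a case where the `.dat` extension alone would not reveal that the file is HDF.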
Service Resources ISO 19115 - Perceived Issue • Data Discovery • How are links to discovery services (e.g. data casting feeds or search endpoints) made available? • Endpoints may support multiple response formats; how would those be described? • Data Processing • Links to data processing services do not appear to be supported. • Both series and dataset level metadata may have URLs to services that expose subsetting, projection, and other operations. • Some service-specific information may be required and would need to be included in the metadata.
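One way to picture the missing information is a service link that records its supported response formats and service-specific parameters alongside the URL. The Python sketch below is hypothetical throughout: `ServiceLink`, its fields, the `format` query parameter, and the example URLs are not part of ISO 19115 or of any real ECHO interface.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlencode


@dataclass
class ServiceLink:
    """Hypothetical service description attached to series or dataset metadata."""
    url: str
    kind: str  # e.g. "search" or "subset" (hypothetical values)
    response_formats: List[str] = field(default_factory=list)
    parameters: List[str] = field(default_factory=list)  # service-specific inputs

    def request(self, response_format: str, **params: str) -> str:
        """Build a request URL, rejecting formats or parameters the link does not declare."""
        if response_format not in self.response_formats:
            raise ValueError(f"unsupported format: {response_format}")
        unknown = set(params) - set(self.parameters)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        query = urlencode({"format": response_format, **params})
        return f"{self.url}?{query}"
```

With this metadata a client can build `ServiceLink("https://example.com/search", "search", ["atom", "json"], ["keyword"]).request("atom", keyword="SST")`, yielding `https://example.com/search?format=atom&keyword=SST`, without any out-of-band knowledge of the endpoint.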
Hierarchical Keyword Structure ISO 19115 - Perceived Issue • Representation • Non-Standard Delimiters • The same hierarchy may be written with different delimiters (e.g. '>' or '|', as in the two keywords below). • A self-defining hierarchy could be introduced within the keyword structure, allowing customized keyword lists. • Automated Usage • Optional Fields • A flat representation of keyword structures that have optional levels may cause issues for automated keyword parsing. • Translation into a metadata format where hierarchy is expected may not be possible.
<gmd:keyword>
  <gco:CharacterString>Earth Science > Oceans > Ocean Temperature > Sub-skin Sea Surface Temperature</gco:CharacterString>
</gmd:keyword>
<gmd:keyword>
  <gco:CharacterString>Earth Science | Oceans | Ocean Temperature | Sub-skin Sea Surface Temperature</gco:CharacterString>
</gmd:keyword>
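The parsing problem can be shown with a small Python sketch that turns a flat keyword string into named hierarchy levels. The level names are assumed GCMD-style science keyword levels, and the delimiter list and function names are hypothetical; neither delimiter is mandated by ISO 19115, which is the issue raised above.

```python
from typing import Dict, List, Optional

# Assumed GCMD-style science keyword levels, in order.
LEVELS = [
    "Category", "Topic", "Term",
    "Variable_Level_1", "Variable_Level_2", "Variable_Level_3",
    "Detailed_Variable",
]
# Delimiters seen in practice; a parser has to guess which one is in use.
DELIMITERS = (">", "|")


def split_keyword(text: str) -> List[Optional[str]]:
    """Split a flat keyword string on the first known delimiter it contains."""
    parts = [text]
    for delim in DELIMITERS:
        if delim in text:
            parts = text.split(delim)
            break
    # An empty segment marks an optional level left blank: keep its position as None.
    return [p.strip() or None for p in parts]


def to_levels(text: str) -> Dict[str, str]:
    """Map segments to level names by position, skipping blank optional levels."""
    parts = split_keyword(text)
    if len(parts) > len(LEVELS):
        raise ValueError(f"too many levels in {text!r}")
    return {level: value for level, value in zip(LEVELS, parts) if value is not None}
```

Both keyword strings on the slide map to the same four levels. The mapping is purely positional, though: if a producer drops a blank optional level entirely instead of leaving an empty segment, later values silently shift into the wrong level, and nothing in the flat string lets a parser detect it.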
Spatial Representations ISO 19115 - Perceived Issue • Coordinate Systems • Cartesian vs. Geodetic • EX_GeographicBoundingBox does not specify a coordinate system, so it is unclear whether its edges are straight lines in longitude/latitude or great-circle arcs. • 2-D Coordinate Systems • Unable to find where two-dimensional coordinate reference systems such as WRS-2 and MODIS H/V tiling are (a) defined and (b) used. • Orbit Metadata • Series Level • Unable to find where series level orbit metadata is represented (e.g. swath width, period, inclination angle, etc.). • This information may be required for data discovery. • Dataset Level • The same concern applies to the placement of dataset level orbit metadata, which is also used for discovery (e.g. orbit number, crossing longitude, etc.).
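Because EX_GeographicBoundingBox carries only westBoundLongitude, eastBoundLongitude, southBoundLatitude, and northBoundLatitude, software must assume how the box is interpreted. This Python sketch makes two such assumptions explicit (the function names are hypothetical): a box with west > east is treated as crossing the 180° meridian, and a box edge may be read geodetically as a great-circle arc, whose poleward bulge follows from tan(φ_max) = tan(φ) / cos(Δλ/2).

```python
import math


def box_contains(west: float, east: float, south: float, north: float,
                 lon: float, lat: float) -> bool:
    """Point-in-box test for a lon/lat bounding box in degrees.

    Assumes a box with west > east crosses the 180-degree meridian.
    """
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east


def great_circle_edge_extreme_lat(lat: float, west: float, east: float) -> float:
    """Most poleward latitude reached by the great-circle arc joining
    (lat, west) and (lat, east), i.e. a box edge read geodetically.

    Uses tan(lat_max) = tan(lat) / cos(dlon / 2), valid for dlon < 180 degrees.
    """
    dlon = (east - west) % 360
    if dlon >= 180:
        raise ValueError("edge must span less than 180 degrees of longitude")
    half = math.radians(dlon / 2)
    return math.degrees(math.atan(math.tan(math.radians(lat)) / math.cos(half)))
```

For a 90°-wide northern edge at 60°N, the great-circle reading reaches about 67.8°N, while the Cartesian reading stays at 60°N. The same box can therefore return different search results near its edges depending on which interpretation a catalog assumes.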
Gotchas • Terminology • Natural difficulties reconciling terminology between communities. • Dataset & Granules vs. Series & Dataset • Archive Center vs. Custodian • Code lists are a double-edged sword: they provide consistency but remove specificity and community vernacular. • Citation Overload • Contact information can be represented in numerous locations. • Stale contact information may therefore be difficult to track down. • Combined Series & Dataset Metadata • Good Idea… Combining series and dataset metadata during presentation. • Bad Idea… Combining series and dataset metadata during archival.
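The "combine during presentation, not during archival" advice can be sketched in Python with a read-only layered view: dataset values override series values at read time, and neither archived record is copied or modified. The field names in the usage below are hypothetical, and representing records as dictionaries is a simplification of real ISO documents.

```python
from collections import ChainMap
from types import MappingProxyType
from typing import Any, Mapping


def presentation_view(series: Mapping[str, Any],
                      dataset: Mapping[str, Any]) -> Mapping[str, Any]:
    """Read-only combined view in which dataset values override series values.

    Lookups fall through from the dataset record to the series record; nothing
    is copied or written back, so the archived records stay separate.
    """
    return MappingProxyType(ChainMap(dataset, series))


# Hypothetical records.
series = {"title": "Series title", "platform": "Aqua", "contact": "Archive center"}
dataset = {"title": "Granule title", "start_time": "2010-01-01T00:00:00Z"}
view = presentation_view(series, dataset)
```

Here `view["title"]` is the dataset's title while `view["platform"]` falls through to the series record. Because the view is computed when it is read, a correction to the series record (for example, updated contact information) shows up in every dataset's presentation without rewriting archived dataset records.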
Kudos • Citations • Thorough support for providing citations within the metadata. • Metadata Lineage • ISO lineage provides an excellent means to capture repeatable processing history information. • Distribution Information • Thorough support for online and offline access options including support for ordering.
Conclusion • ISO 19115 is on its way to becoming a viable metadata standard for documentation. • ISO 19115 is a bit verbose for the pragmatic requirements of data discovery (specifically at the dataset level). • ISO 19115 lacks support for the growing number of data processing services. • All metadata standards are expected to have issues and will improve over time.
http://xkcd.com/927/ Matthew.F.Cechini@nasa.gov Moscone South: IN41B-1406 - Dec. 8, 8:00am-12:20pm