
Strategies for Adding EML Support to the GCE Data Toolbox for Matlab




  1. Strategies for Adding EML Support to the GCE Data Toolbox for Matlab Wade Sheldon Georgia Coastal Ecosystems LTER (WWW: gce-lter.marsci.uga.edu/lter)

  2. Background
     • Needed universal solution for processing tabular data sets (majority of IM work)
     • Goals:
       • Import from various data sources
       • Standardize units, date formats, attribute names
       • Assign metadata descriptors
       • Validate/QAQC
       • Generate statistical summaries, plots, maps
       • Export to various data/metadata formats
       • Support sub-setting & queries, super-setting (unions/joins)
       • Support automation of all steps
       • Automatically capture metadata throughout interactive processing

  3. Background
     • Developed Matlab data structure specification for storing data table tightly coupled with metadata
     • Developed ‘Toolbox’ (function library) for working with data structures
     • Many roles in GCE IS:
       • Primary tool for acquisition, QAQC of data from monitoring network, PI submissions
       • Data/metadata packaging (linked to RDMS)
       • Data distribution (flexible formats)
       • New Role: Automated harvesting/processing/QC/web posting of remote data stores (USGS, NOAA) and post-processing of CSI arrays downloaded via modem
     • Began public distribution of toolbox in 2002 (primarily for end-user analysis of GCE data)

  4. Toolbox Metadata Standard
     • Full implementation of FLED (+ user-extensible content)
     • Attribute-level metadata managed with data
     • General documentation descriptors stored in simple array format (Category, Field, Value); designed for pre-formatted metadata, but parseable and updateable
     • Simple user-editable style definition tables used to produce formatted ASCII metadata
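
The (Category, Field, Value) layout described on this slide can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the Toolbox's own code; the function names, the sample categories and fields, and the output layout are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of a (Category, Field, Value) metadata array.
# None of these names come from the GCE Data Toolbox itself.

def get_value(meta, category, field):
    """Return the value stored for (category, field), or None if absent."""
    for cat, fld, val in meta:
        if cat == category and fld == field:
            return val
    return None


def set_value(meta, category, field, value):
    """Return a new list with the field updated, or appended if missing."""
    updated = []
    found = False
    for cat, fld, val in meta:
        if cat == category and fld == field:
            updated.append((cat, fld, value))
            found = True
        else:
            updated.append((cat, fld, val))
    if not found:
        updated.append((category, field, value))
    return updated


def format_ascii(meta):
    """Render the rows as plain text, one heading per run of equal categories."""
    lines = []
    last_category = None
    for cat, fld, val in meta:
        if cat != last_category:
            lines.append(cat)
            last_category = cat
        lines.append("  " + fld + ": " + val)
    return "\n".join(lines)


# Sample rows (invented content, for illustration only).
meta = [
    ("Dataset", "Title", "Example salinity survey"),
    ("Dataset", "Abstract", "Monthly surface salinity samples"),
    ("Site", "Location", "Example estuary"),
]
meta = set_value(meta, "Dataset", "Title", "Example salinity survey, revised")
print(format_ascii(meta))
```

Keeping every descriptor as one flat row is what makes the array easy to parse and update in place; the trade-off is that any hierarchy has to be encoded in the category and field names.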

  5. EML Differences
     • Higher granularity
     • Hierarchical structure (vs flatter 3-tier)
     • Different delineation of semantic/numerical attribute descriptors (much overlap, but different philosophy)
     • New unit dictionary requirements for validation contrary to units/unit conversion conventions (at odds with non-IM end-user focus of toolbox)
     • XML-based (requires extra steps for presentation)

  6. Strategy
     • Short term: develop XSLT to convert EML (primarily dataset, entity, attribute) to ASCII headers for importing metadata along with data
     • Medium term: switch to an EML-oriented metadata schema (e.g. use similar arrays, but support direct EML schema mapping by using XPath syntax for category/field info)
     • Long term: add support for direct caching of EML docs and include native XML routines for syncing metadata during processing (requires that more users adopt the latest Matlab version, R13)
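
The short-term step above, turning EML dataset, entity, and attribute content into an ASCII header, can be sketched roughly as follows. This sketch swaps the proposed XSLT for Python's standard `xml.etree.ElementTree` module, which accepts simple XPath-like paths of the kind the medium-term step mentions. The EML fragment is simplified and hypothetical (real EML documents carry namespaces and many more elements), and the header layout is an assumption, not the Toolbox's actual import format.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical EML-like fragment used only for illustration.
EML_SAMPLE = """
<eml>
  <dataset>
    <title>Example salinity survey</title>
    <dataTable>
      <entityName>salinity_2002</entityName>
      <attributeList>
        <attribute>
          <attributeName>Date</attributeName>
          <attributeDefinition>Sampling date</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>Salinity</attributeName>
          <attributeDefinition>Surface salinity</attributeDefinition>
          <measurementScale>
            <ratio><unit><standardUnit>dimensionless</standardUnit></unit></ratio>
          </measurementScale>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>
"""


def eml_to_ascii_header(eml_text):
    """Build a tab-delimited ASCII header (assumed layout) from EML text."""
    root = ET.fromstring(eml_text)
    table = root.find("dataset/dataTable")
    attributes = table.findall("attributeList/attribute")

    # One entry per attribute; attributes without a unit get "none".
    names = [a.findtext("attributeName", default="") for a in attributes]
    units = [a.findtext(".//standardUnit", default="none") for a in attributes]
    descriptions = [a.findtext("attributeDefinition", default="") for a in attributes]

    lines = [
        "title: " + root.findtext("dataset/title", default=""),
        "entity: " + table.findtext("entityName", default=""),
        "name: " + "\t".join(names),
        "units: " + "\t".join(units),
        "description: " + "\t".join(descriptions),
    ]
    return "\n".join(lines)


print(eml_to_ascii_header(EML_SAMPLE))
```

The path strings used here (`dataset/title`, `attributeList/attribute`) show how XPath-style keys could stand in for category and field names in a flat metadata array, which is the mapping the medium-term step proposes.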

  7. Significance
     • Allow the IM community to take full advantage of these tools/capabilities for their own site’s data with minimal re-mastering (EML + ASCII/Matlab table)
     • Allow the LTER IM community to showcase research-oriented, metadata-driven tools to bolster support for EML efforts immediately
     • If full EML support is achieved, the toolbox could become a useful mechanism for automatically producing EML-documented/validated data sets (datalogging -> harvest -> process -> QC -> dataset+EML -> validation)
