INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES Mark Williams, University of Colorado
The water information value ladder: Data >>> Information >>> Insight (>>> Increasing value >>>). Rungs and annotations, in slide order: Forecasting; Reporting; Analysis; [Done poorly]; Integration; Distribution; [Done poorly to moderately]; Aggregation; Quality assurance; [Sometimes done well, by many groups, but could be vastly improved]; Collation; Monitoring. Slide courtesy CSIRO, BOM, WMO, Ilya, Dozier
Provenance and transparency
CZOs as platforms for research Integrating satellite & ground measurements with modeling CZO measurements provide the basis for advances in multiple Earth sciences CZOs are DATA-RICH places to develop & test Earth system models
Challenges to CZO Data Management Atmosphere Biosphere Hydrosphere Lithosphere Minutes Decades Millennia Eons Hillslope Catchment Watershed • Many Object & Data Types! • Diverse media • Sensor-based • Stationary • Mobile • Spectra/photos • Sample-based • Sub-samples • Preparations/Fractions • Numeric & Categorical
Sample Fractions for Soil Geochemistry: Adapting SESAR IGSN for CZO Ziplock (~500g) Bulk soil horizon or depth increment glass vial: <2mm fines dry sieved EA-IRMS FTIR SA DRY SIEVE 2 mm <2mm SA WET SIEVE, or DENSITY, or SETTLING (with or without sonication) glass vial: sand + small detritus XRD CEC The choice here is important. Do we want aggregates or not? EA-IRMS FTIR SPEX mill ICP-MS after Li-borate fusion >2mm: glass vial: plant detritus milled (1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?) SA glass vial: silt + clay EA-IRMS FTIR XRD CEC EA-IRMS FTIR SPEX mill glass vial: pebbles hard ground EA-IRMS FTIR (2) Remaining pebbles & rocks, hard grind ICP-MS after Li-borate fusion ICP-MS after Li-borate fusion XRD? Al Can (~70 g) For Gamma Counting 137Cs Extractions Dithionite-Citrate extraction Na pyrophosphate extraction Ammonium oxalate extraction Christina River CZO example
Overall Approach • Do not reinvent the wheel! Build on • CUAHSI HIS, EarthChemDB, LTER, etc • Consistent data presentation on web • Metadata • Data values • Central data system for data discovery • Harvested by SDSC (pull system)
CZO data principles and policies • Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site. • Different philosophy than CUAHSI ODM • Each CZO is master of its own data • We don’t care what goes on under the hood • Each site uses its own protocols, databases, etc. • Allows CZO to honor site legacy data
CZO data principles and policies • Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted • Metadata follows a prescribed format • Data managers just need rules to follow • Easy to harvest by central portal • Makes it simple at the site level so scientists comply • Addresses the chokepoint that is getting data/metadata from the scientists to data managers
Data Management Team • David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS) • Kerstin Lehnert, Columbia. PI on EarthChemDB • Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS. • Mark Williams, CU-Boulder. PI Niwot Ridge LTER • Anthony Aufdenkampe, co-I Christina River Basin CZO
Integrated CZO data system Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data to emphasize the “critical zone” nature of our shared data sets
CZO Data Publication System Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) External cross-project registries CZO Data Products Standard CZO Services DataNet, NEON CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
Data Publication Process (for hydrologic time series) CZO Desktop ODM WaterML Service CZO Display File CZO Central Catalog Catalog Search Service Raw Display file metadata Is registered with the CZO data portal, to assure original data is discoverable and downloadable. WFS Service Is registered with the CZO data portal OGC WFS Service Broader internet community accessing data using standard protocols. OGC CSW Service CZO Portal utilizes the OGC CSW (catalog services for the web)
CZO data interoperability: what does it mean System components Levels of interoperability Data discovery portal • Find and download CZO resources: files and file collections, services, documents – organized by CZO thematic category and by type • Data available in compatible semantics: ontologies, controlled vocabularies • Data available via the same service interfaces (e.g. WFS, SOS) but different information models • Compatibility at the level of domain information models and databases Different types of data collected by CZOs Shared vocabularies and ontology management Wider variety of data Deeper integration Service administration (CZO Central) Well-understood data with formal information models available via standard services CZO Desktop, others
Data Catalogue • Biogeochemistry: Including: anything on C (Carbon), N (Nitrogen), P (Phosphorus) nutrients, microbes • Climatology/Meteorology: Including: Met tower, temps, snow • Ecology/Biology: Including: microbial, land use • Geology/Chronology: Including: geologic, descriptions of rocks-mineralogy, CRN ages/rates • Geomorphology: Including: topography, chronological data, sediment flux, fracture space • Geophysics: Including: seismic refraction etc. • Geospatial: Including: GIS/RS, imagery, geologic map, Gordon Gulch and GLV cameras
Water Chemistry • Header group (/doc): - Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments • Header group, column information • COL1. label=ValueAttribute, value=site • COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm" • COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=, ,methods=method1, etc • Header group, column (series) defaults that apply to all columns (e.g. site below) • Data (/data) • GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,, • GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,, • Automatically harvested using WaterML and EML • ASCII format, metadata and comma-delimited data
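A minimal sketch of how a harvester might read a display file of this kind. The slide does not define the exact header syntax, so this assumes one `key=value` entry per line under a `/doc` marker, comma-separated attributes on each `COLn.` line, and comma-delimited rows under a `/data` marker. The names `SAMPLE` and `parse_display_file` are hypothetical, and the sample is truncated to three columns.

```python
import csv
import io

# Hypothetical display file; the exact header syntax is an assumption.
SAMPLE = """\
/doc
Title=Green Lake 4 water chemistry
COL1. label=ValueAttribute, value=site
COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7
COL3. label=ValueAttribute, value=pH, units=pH units
/data
GREENLAKE4,820311,6.4
GREENLAKE4,820422,5.7
"""

def parse_display_file(text):
    """Split a display file into document metadata, column specs, and records."""
    doc, columns, rows = {}, [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line in ("/doc", "/data"):
            section = line
        elif section == "/doc" and line.startswith("COL"):
            # "COL2. label=..., value=..." becomes a dict of column attributes.
            _, _, spec = line.partition(". ")
            attrs = dict(part.strip().split("=", 1) for part in spec.split(","))
            columns.append(attrs)
        elif section == "/doc":
            key, _, value = line.partition("=")
            doc[key.strip()] = value.strip()
        elif section == "/data":
            rows.append(next(csv.reader(io.StringIO(line))))
    names = [c["value"] for c in columns]
    # Empty fields are treated as missing values.
    records = [
        {n: (v if v != "" else None) for n, v in zip(names, row)} for row in rows
    ]
    return doc, columns, records

doc, columns, records = parse_display_file(SAMPLE)
```

Keeping the column attributes next to the values is what lets a central harvester interpret each series without contacting the site.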
CZO Data Management Web Administration Interface CZO data managers use this web-based system to register display files, edit service metadata, initiate data retrieval, validate the data against shared vocabularies, and update hydrologic time series services The administration system will be extended to geochemical samples and other data http://central.criticalzone.org
Services edited and validated by CZO data managers Editable service definitions and management interface for each CZO data service Data managers control how their data is annotated. Ingesting of Display files is triggered on the server by the Data manager. Display file ingestion log
CZO Central Catalog Statistics, March 24, 2011 (time series services only)
New Development: Central CZO Data Discovery Portal Registered data are organized by CZO thematic categories
Display files from CZO web sites are registered to the data discovery portal automatically. In addition, display files of known types are expressed as data services, which are also registered in the portal. The portal is CSW-compliant (CSW = Catalog Services for the Web) and can be federated with other catalogs, including data.gov. It supports search by location, resource type, thematic category, and keywords, plus full-text abstract search. Federation with the CUAHSI HydroCatalog allows search of hydrologic data from ~70 networks.
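Because the portal is described as CSW-compliant, a client could in principle query it with a standard OGC CSW 2.0.2 GetRecords request. The sketch below only builds such a request URL using key-value-pair encoding. The endpoint is a placeholder, since the slides do not give the portal's CSW address, and `build_getrecords_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; the real CSW address is not given in the slides.
CSW_ENDPOINT = "https://example.org/csw"

def build_getrecords_url(keyword, max_records=10, endpoint=CSW_ENDPOINT):
    """Build an OGC CSW 2.0.2 GetRecords request (key-value-pair encoding)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text match across the record's text fields.
        "constraint": f"AnyText like '%{keyword}%'",
    }
    return f"{endpoint}?{urlencode(params)}"

url = build_getrecords_url("snow")
# Decode the query string again to confirm the parameters round-trip.
query = parse_qs(urlsplit(url).query)
```

The same request shape would work against any federated CSW catalog, which is the practical benefit of registering CZO resources through a standard interface.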
Shared Vocabulary Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) External cross-project registries CZO Data Products Shared Vocabulary DataNet CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
CZO Shared Vocabulary System Purpose: To promote the consistent use of terminology. http://sv.criticalzone.org Builds on CUAHSI HIS
Data Managers and SV CSV Data File CSV Data File Local CZO Website Observation Database SV Database ❷ Unknown Term Email Data Managers ❶ Request Term Web Page ❸ XML SV List
Preferred vocabularies. Moderators to be designated by CZO with expertise in each category • Variable names (extended from CUAHSI HIS) • Units (extended from CUAHSI HIS) (e.g. m, g/L) • Value type (from CUAHSI HIS) (e.g. Field observation, derived value, model output) • Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil) • Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic) • Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled) • Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11) • KEY: CZO expands ODM controlled vocabularies to a larger audience using “preferred vocabularies”
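The "unknown term" step in the data manager workflow can be illustrated with a small validation pass. This is a sketch only: the vocabulary excerpts below are taken from the examples on the slide, the real lists live in the shared vocabulary system, and `PREFERRED` and `find_unknown_terms` are hypothetical names.

```python
# Hypothetical excerpts of two preferred vocabularies; the real lists are
# maintained in the shared vocabulary system and are much longer.
PREFERRED = {
    "units": {"m", "g/L", "pH units"},
    "sample_type": {"stream water", "ground water", "rock", "soil"},
}

def find_unknown_terms(column_metadata, vocabularies=PREFERRED):
    """Return (column number, field, term) for each term not in its vocabulary."""
    unknown = []
    for index, attrs in enumerate(column_metadata, start=1):
        for field, allowed in vocabularies.items():
            term = attrs.get(field)
            if term is not None and term not in allowed:
                unknown.append((index, field, term))
    return unknown

columns = [
    {"value": "pH", "units": "pH units", "sample_type": "stream water"},
    {"value": "depth", "units": "meters", "sample_type": "soil"},
]
unknown = find_unknown_terms(columns)
```

Each entry in `unknown` is a candidate for either correction by the data manager or a new-term request to the vocabulary moderators.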
Methods Major problem for metadata Solution: lookup table that is part of the controlled vocabulary Three parts: sample collection, sample preparation, analytical procedure Up and running, needs moderators
CZO Spatial Data Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) Standard CZO Services Spatial Data CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…
Metadata and Spatial View • Metadata • Multi File control • Spatial Extent • Ex: LiDAR flights, transects, etc. • Point data (collected at particular location). • Uses Google Maps API • KML functionality Spatial View Guo lab, UC Merced
Local CZO DB Local CZO DB Local CZO DB Geochemical Samples (based on CZEN) Depth-resolved geochemistry EarthChem Data Engine & Portal Geochemical web services, EarthChemDB CZO Desktop Applications Harvester IGSN management Archive Shared vocabularies Metadata CZO Web-based Geochemical DB CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Geochemical samples
Sample 1 Preparat./Treatment 2 Chemical Phys. Minr Others Personcontributor Meta-Data Publication Var-Lookup/Unit Country/State Landuse/Veg. Project Sources Methods Loc_info/Climate Geo-Info SMPLTime Series Precision Lab-Info Sub-sample Main Data Sub-smpl 2 ... Lab Analysis Sub-smpl n Location(Watershed) Sampling Site(Soil / Water) Sample(Layer/Depth) Sub-Sample Preparation/Treatment Analysis Data CZO Chemistry Database Conceptual Model – (CZOCHEMDB) Penn State lead
Progress • Database is accessible at www.czo.psu.edu • PSU CZO students and post-docs have used the template for data entry • Susan Melzar (Colorado State) has used the template, and the data have been entered into the database • Published data from Muhs et al. (2001), Harden (1987), White et al. (2008) • Current version contains 1391 records, representing 17,604 data values • Ran webinar August 24th to show database capabilities and usage of data entry template • 15 participated, with representation from all 6 CZOs • User guide is in progress
Integration with EarthChemDB EarthChem Portal EarthChem XML DB Topical Data Collections Geochemical Resource Library GEOROC External Databases datasets (original data & derived products) GCDM DB Metadata catalog NAVDAT USGS GfG Data Entry User Submission Kerstin Lehnert
EarthChem Portal GEOROC NAVDAT USGS Others PetDB XML XML XML XML XML EarthChem Data Engine Database Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas. Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database. EarthChem Data Engine Search & Visualization Similar to our ODM hydrology portal
INTERNATIONAL GEOSAMPLE NUMBER • Purpose: Unique identification for samples and related sampling features in the Earth Sciences • To allow unambiguous referencing of data to samples in publications and data systems • To allow tracking samples through repositories & labs • To allow integration of distributed data for samples
Sample 1 Sample 1 Sample 2 Sample 2 Sample 3 Sample 3 Parent Child Parent Child Child Parent Core Section 1 Fossil separate Sample 1 Microprobe mount IGSN:XXX0065B3 Sample 2 Core Section 2 Rock powder Core IGSN:ABC0L653X Mineral conc. IGSN:ABC078HGB IGSN:XXX000120 IGSN:ABC0L53NW IGSN:XXX07ST4K Leachate IGSN:ABC0L98SW Core Section 3 IGSN:XXX9K23G6 IGSN:XYZ0G693M Geoinformatics for Geochemistry
IGSN International Organization Managing Agent: SESAR ExoPlanet (invented example) Near Space Observatory (invented example) Registrar IEDA USGS Geoscience Australia CZO ICDP Registration Agents: Registrants: Analytical Lab Repository Investigator
ADAPTING IGSN for CZO Register any type of sample: pedons, hand specimens, mineral concentrates, etc. … Register any type of material: soil, rock, sediment, fluid, gas, bio …. Register ‘sample-related features’: sites, wells, cores, dredges … Register relations (parent – children): e.g. site → pedon → mineral
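The parent and child relations described above amount to a tree of registered identifiers. The sketch below shows one way to hold such relations in memory. It is not the SESAR registration API: the class names are hypothetical and the identifiers are placeholders, not real IGSNs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SampleRecord:
    igsn: str
    kind: str                      # e.g. "site", "pedon", "mineral concentrate"
    parent: Optional[str] = None   # IGSN of the parent sample or feature

@dataclass
class SampleRegistry:
    records: Dict[str, SampleRecord] = field(default_factory=dict)

    def register(self, igsn, kind, parent=None):
        """Register a sample or feature; its parent must already be registered."""
        if igsn in self.records:
            raise ValueError(f"{igsn} is already registered")
        if parent is not None and parent not in self.records:
            raise KeyError(f"parent {parent} is not registered")
        self.records[igsn] = SampleRecord(igsn, kind, parent)

    def lineage(self, igsn):
        """Return the chain of identifiers from the root feature down to igsn."""
        chain = []
        current = igsn
        while current is not None:
            chain.append(current)
            current = self.records[current].parent
        return chain[::-1]

    def children(self, igsn):
        return [r.igsn for r in self.records.values() if r.parent == igsn]

# Placeholder identifiers for illustration only.
registry = SampleRegistry()
registry.register("CZO000001", "site")
registry.register("CZO000002", "pedon", parent="CZO000001")
registry.register("CZO000003", "mineral concentrate", parent="CZO000002")
chain = registry.lineage("CZO000003")
```

Requiring the parent to exist before a child is registered keeps every sub-sample traceable to the site it came from.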
Exploring A More General Data Model: ODM 2.0 To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML Better support for samples and unique identifiers (IGSN/SESAR) Extensibility to table attributes Better annotation and provenance Enable integrated web service based publication of a broader class of CZO data
ODM 2.0 – Field Sensor Extension to support field sensor deployments and in situ observations Sensor deployment details Attributes of sensor Data series from sensor
ODM 2.0 – Provenance and Annotations Extensions Better support for storing provenance of observational data
General Extensibility Provides capability to record information (add fields) in tables that was not anticipated a priori
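One common way to add unanticipated fields without changing a table's schema is an extension (entity-attribute-value) table. The slides do not show the ODM 2.0 schema, so the sketch below is a generic illustration of that pattern using SQLite, with hypothetical table and function names. It should not be read as the ODM 2.0 design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
-- Extension table: one row per extra attribute, so no schema change is needed.
CREATE TABLE sample_extensions (
    sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
    name TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (sample_id, name)
);
""")

def set_extension(conn, sample_id, name, value):
    """Attach or overwrite an unanticipated attribute on a sample."""
    conn.execute(
        "INSERT OR REPLACE INTO sample_extensions (sample_id, name, value) "
        "VALUES (?, ?, ?)",
        (sample_id, name, value),
    )

def get_extensions(conn, sample_id):
    """Return all extra attributes of a sample as a name -> value dict."""
    rows = conn.execute(
        "SELECT name, value FROM sample_extensions "
        "WHERE sample_id = ? ORDER BY name",
        (sample_id,),
    )
    return dict(rows.fetchall())

conn.execute("INSERT INTO samples (sample_id, label) VALUES (1, 'soil pit A')")
set_extension(conn, 1, "sieve_mesh_mm", "2")
set_extension(conn, 1, "sonicated", "no")
set_extension(conn, 1, "sonicated", "yes")  # overwrites the earlier value
extensions = get_extensions(conn, 1)
```

The trade-off is that extension values are untyped text, so they benefit from the same controlled vocabularies used elsewhere in the system.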
Web-based User Access CZO Web Discovery CZO Desktop Other client systems GeoChemDB Search EarthChem Portal EarthChemXML CZO-Services Geochem Services (IEDA) USGS NAVDAT GEOROC EarthChemXML CZO-Central GeoChemDB [ODM 2.0] Geochemical database IEDA Data Publication Service (DataCite) GfG Data Validation & Ingest CZO Data Display Format IEDA Long-Term Archiving Service CZchemDB Sample Registration SESAR
Where we are today • Each site has a data manager • Data sets are posted to the web • Consistent metadata and ASCII format in progress • We’ve prototyped harvesting data and posting to a central data portal • Shared vocabulary system in place • Developed protocol for unique sample ID • Partnering with EarthChemDB • Expanding ODM to become more general • Way beyond what I thought possible
Work plan for next two years • Extending the CZO data publication model to geochemical and GIS data; then to other types of data • towards deeper interoperability • Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services) • Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation • Upgrade to WaterML 2 once approved as international standard (~Q3, 2011) • Registering more hydrologic time series data via CZO Central • Regularly harvesting registered files and updating CZO services; keeping provenance information • Enhancing parameter-based search across CZOs, with a shared parameter ontology • Making CZO central data system more robust • Currently a single server with 24/7 monitoring; need redundant setup • Enhancing role of Data Managers
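The work plan item on regularly harvesting registered files while keeping provenance information could be approached by hashing each retrieved file and logging only real changes. The sketch below assumes the file content has already been fetched and is passed in as bytes. The function name, the log fields, and the URL are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

def harvest_once(url, content, seen, provenance, now=None):
    """Ingest content for url only if it changed since the last harvest.

    seen maps URL to the last SHA-256 digest; provenance is an append-only log.
    Returns True when the file was new or changed.
    """
    digest = hashlib.sha256(content).hexdigest()
    if seen.get(url) == digest:
        return False
    when = now or datetime.now(timezone.utc)
    provenance.append({
        "url": url,
        "sha256": digest,
        "previous_sha256": seen.get(url),
        "harvested_at": when.isoformat(),
    })
    seen[url] = digest
    return True

seen, provenance = {}, []
url = "https://example.org/czo/display/water_chemistry.csv"  # placeholder URL
first = harvest_once(url, b"GREENLAKE4,820311,6.4\n", seen, provenance)
repeat = harvest_once(url, b"GREENLAKE4,820311,6.4\n", seen, provenance)
changed = harvest_once(url, b"GREENLAKE4,820311,6.5\n", seen, provenance)
```

Linking each log entry to the previous digest gives a simple version chain for every display file.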