
DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS

Kerstin Lehnert


Presentation Transcript


  1. DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS Kerstin Lehnert

  2. Data from Samples
     • Distributed data acquisition
        • Different labs/researchers analyze the same sample or subsamples of it.
     • Distributed data publication
        • Different data for the same sample are published in different papers.
     • Distributed data archiving
        • Data for the same sample are kept in different data systems.
     • Integrated data access required to maximize utility.

  3. Geochemical Data
     • diverse
        • hundreds of parameters
        • thousands of materials
        • values vary with space and time over a range of more than ten orders of magnitude
     • complex
        • mostly sample-based, with complex relations among samples & subsamples
        • distributed data acquisition (one sample analyzed in different labs by different researchers at different times)
        • idiosyncratic data acquisition methods

  4. Geoinformatics for Geochemistry
     • DATABASES
        • thematic geochemical databases (PetDB, SedDB, VentDB)
     • DATA REPOSITORY
        • Geochemical Resource Library
     • REGISTRIES
        • System for Earth Sample Registration (SESAR)
        • IEDA Data Publication Agent of the STD-DOI system (DataCite®)
        • GeoPass: single sign-on authentication system
     • DATA ACCESS & ANALYSIS TOOLS
        • GfG user interfaces
        • EarthChem Data Engine (Portal)

  5. GfG Architecture [diagram; labeled components: EarthChem Portal, EarthChem XML DB, Topical Data Collections, Geochemical Resource Library, GEOROC, NAVDAT, USGS, External Databases, datasets (original data & derived products), GCDM DB, Metadata catalog, GfG Data Entry, User Submission]

  6. GeoChemical Data Model [diagram; labeled entities: publication, data source, sample, analysis, feature of interest, observed value, collection, geospatial, method/DQ, material, preparation, obs. point]

  7. Metadata
     • Geospatial: geographical coordinates, geographical names
     • Collection: sampling technique, field program
     • Description & Age: classification, texture, alteration, age
     • Data Quality: technique, instrument, laboratory, precision, reference material measurements, correction procedures

  8. Standards for Data Access & Integration
     • WMS, WFS
        • For visualization tools
     • OAI-PMH
        • For joint data inventories
     • EarthChemML
        • For integration across geochemical data systems
        • For interoperability with other systems
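OAI-PMH, listed above for joint data inventories, is a harvesting protocol that runs over plain HTTP requests. As a minimal sketch, the snippet below only builds a `ListRecords` request URL. The endpoint is a hypothetical placeholder, not an address from the slides; `oai_dc` is the Dublin Core format that OAI-PMH repositories are required to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real OAI-PMH base URL of a repository.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec    # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2011-01-01"))
```

A harvester for a joint inventory would issue such a request against each partner system and merge the returned records.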

  9. IEDA System-wide Inventory [diagram; labels: Inventory; Expedition Metadata, Reference Metadata, Dataset Metadata, Geospatial Metadata; RSS feed; DOI Registration; SESAR, MGDS, EarthChem, GRL, Geochem DBs; Chemical Data, Cruise Info, Object Registration, Object Metadata]

  10. EarthChem Portal [diagram: PetDB, GEOROC, NAVDAT, USGS, and other partner databases each send XML to the EarthChem Data Engine database, which serves EarthChem Data Engine search & visualization]
     • Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas.
     • Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal database.
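The flow on this slide (partners send XML, the portal loads it into one database, queries run against that database) can be sketched in a few lines. This is an illustration under stated assumptions, not the EarthChem implementation: the `<samples>`/`<sample>` elements and their attributes are invented for the example and are not EarthChemML, and the sample records are made up.

```python
import xml.etree.ElementTree as ET

# Invented partner payloads; the element and attribute names are hypothetical.
PARTNER_FEEDS = {
    "PetDB": "<samples><sample id='P-1' material='basalt'/>"
             "<sample id='P-2' material='peridotite'/></samples>",
    "GEOROC": "<samples><sample id='G-1' material='basalt'/></samples>",
}


def load_portal_db(feeds):
    """Parse each partner's XML and collect all records in one central list."""
    db = []
    for source, xml_text in feeds.items():
        for el in ET.fromstring(xml_text).iter("sample"):
            db.append({"source": source,
                       "id": el.get("id"),
                       "material": el.get("material")})
    return db


def search(db, material):
    """Query the central copy only; partner systems are not contacted."""
    return [(rec["source"], rec["id"]) for rec in db if rec["material"] == material]


portal_db = load_portal_db(PARTNER_FEEDS)
print(search(portal_db, "basalt"))
```

The point of the design is that a single query covers every partner, at the cost of the central copy being only as current as the last XML delivery.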

  11. Access Levels

  12. EarthChemML

  13. EarthChem Repository: user submission
     • need tools that are easy to use and support the data flow from lab to publication
     • ideally, represent ‘pipelines’ for data capture early in the data acquisition process
     • tools need to include data validation and DQC procedures
     • offer citable data publication
     • need data policies

  14. IEDA data publication service

  15. STD-DOIs
     • The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements.
     • The metadata are transmitted to the National Library via web service (HTTP/SOAP) and incorporated into the library catalogue.
     • The metadata may contain references to other objects (DOI, IGSN, ...):
        • Element <relatedIdentifier>
        • isCited, isParent, isChild, isDuplicate, …

  16. STD-DOIs
     • The element <relatedIdentifier> can be used to point to other electronic objects:
        • Point to the literature where the data set is interpreted.
        • Point to samples from which the data were derived.
        • Point to other datasets that belong to the same collection of datasets.
     • These links can be used by machines (e.g. data portals) to make search suggestions and thus aid discovery of data, literature and samples, or other added value services.
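To make the linking idea concrete, here is a minimal sketch of a metadata record that carries `<relatedIdentifier>` elements, plus the grouping a portal might do to offer suggestions. Only the element name `relatedIdentifier` and the relation names come from the slides; the `resource`, `identifier`, and `title` elements, the attribute names, and both DOIs are assumptions for illustration and are not the STD-DOI schema. Which relation the real schema uses for each kind of link is also assumed.

```python
import xml.etree.ElementTree as ET


def build_record(doi, title, related):
    """Build a small metadata record.

    `related` is a list of (identifier, identifier_type, relation_type) tuples.
    """
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    ET.SubElement(resource, "title").text = title
    for ident, ident_type, relation in related:
        el = ET.SubElement(resource, "relatedIdentifier",
                           relatedIdentifierType=ident_type,
                           relationType=relation)
        el.text = ident
    return resource


def links_by_type(resource):
    """Group related identifiers by identifier type (paper DOIs, sample IGSNs, ...)."""
    links = {}
    for el in resource.findall("relatedIdentifier"):
        links.setdefault(el.get("relatedIdentifierType"), []).append(
            (el.get("relationType"), el.text))
    return links


record = build_record(
    "10.1234/example-dataset",                     # hypothetical dataset DOI
    "Example geochemical dataset",
    [("10.1234/example-paper", "DOI", "isCited"),  # hypothetical paper DOI
     ("SIO8JH3M4", "IGSN", "isChild")],            # relation choice is a guess
)
print(ET.tostring(record, encoding="unicode"))
print(links_by_type(record))
```

A portal that reads such records can follow the DOI entries to the literature and the IGSN entries to the sample registry without any free-text matching.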

  17. STD-DOI System Architecture

  18. Data DOIs

  19. Information Discovery
     • Link to publication
     • Citation of data
     • IGSN points to sample

  20. The International GeoSample Number

  21. Ambiguous Sample Naming: Examples from the PetDB Database
     • Sample names are duplicated.
     • Sample names are modified or changed.

  22. Provides & manages unique identifiers for samples
     • IGSN - International Geo Sample Number
     • Assigned upon registration of sample metadata
     • Catalogs & archives sample metadata
     • Access to sample metadata via web site & web services
     • Long-term preservation of metadata
     • Link to sample archives
     • Facilitates links to data
     • IGSN will be incorporated into persistent resolvable GUIDs

  23. International GeoSample Number: A Global Unique Identifier for Earth Samples
     • Strict syntax (9 characters, alphanumeric)
        • First three characters are a unique user code (registered with SESAR)
        • Last 6 characters are random numbers + letters
        • Allows 2,176,782,336 (36^6) sample identifiers per registrant
     • Does not replace personal or institutional names.
     • Applied to samples & sub-samples
        • system tracks relations
     IGSN:SIO8JH3M4  Name space www.geosamples.org
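The syntax rules on this slide are simple enough to check mechanically. The sketch below splits an IGSN into user code and sample part and recomputes the per-registrant capacity. It assumes uppercase letters and digits in all nine positions and an optional `IGSN:` prefix; the slide does not state the exact character set, so treat the pattern as an assumption. `parse_igsn` is a hypothetical helper, not a SESAR function.

```python
import re

# Assumed character set: uppercase letters and digits in all nine positions.
IGSN_PATTERN = re.compile(r"[A-Z0-9]{9}")


def parse_igsn(value):
    """Return (user_code, sample_part) for a well-formed IGSN, else raise ValueError."""
    text = value.strip().upper()
    if text.startswith("IGSN:"):
        text = text[len("IGSN:"):]
    if not IGSN_PATTERN.fullmatch(text):
        raise ValueError(f"not a 9-character alphanumeric IGSN: {value!r}")
    return text[:3], text[3:]


# Six free positions, each drawn from 26 letters + 10 digits.
CAPACITY_PER_REGISTRANT = 36 ** 6

print(parse_igsn("IGSN:SIO8JH3M4"))
print(CAPACITY_PER_REGISTRANT)
```

The capacity figure matches the 2,176,782,336 quoted on the slide, which confirms that the six trailing characters draw on a 36-symbol alphabet.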

  24. [Diagram: parent/child relations among samples and sub-samples, each carrying its own IGSN. Labeled objects: Core; Core Sections 1-3; Samples 1-3; fossil separate; microprobe mount; rock powder; mineral concentrate; leachate. Example identifiers shown: IGSN:ABC0L653X, IGSN:ABC078HGB, IGSN:ABC0L53NW, IGSN:ABC0L98SW, IGSN:XXX0065B3, IGSN:XXX000120, IGSN:XXX07ST4K, IGSN:XXX9K23G6, IGSN:XYZ0G693M.] Geoinformatics for Geochemistry
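The parent/child tracking shown on this slide can be modeled as a registry in which every sub-sample records the IGSN of the object it was taken from. The following is a minimal sketch, not SESAR's data model: the `Registry` class is hypothetical, and the pairing of identifiers with object names is assumed, since the transcript does not preserve which IGSN belongs to which box in the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    igsn: str
    name: str
    parent: Optional[str] = None          # IGSN of the parent, if any
    children: List[str] = field(default_factory=list)


class Registry:
    """Hypothetical in-memory registry that tracks sample lineage."""

    def __init__(self):
        self.samples: Dict[str, Sample] = {}

    def register(self, igsn, name, parent=None):
        if igsn in self.samples:
            raise ValueError(f"IGSN already registered: {igsn}")
        if parent is not None and parent not in self.samples:
            raise KeyError(f"unknown parent IGSN: {parent}")
        self.samples[igsn] = Sample(igsn, name, parent)
        if parent is not None:
            self.samples[parent].children.append(igsn)

    def lineage(self, igsn):
        """Walk from a sub-sample up to the original sampling event."""
        chain = []
        current = igsn
        while current is not None:
            chain.append(current)
            current = self.samples[current].parent
        return chain


registry = Registry()
# Identifier-to-name pairing below is assumed for illustration.
registry.register("ABC0L653X", "Core")
registry.register("ABC078HGB", "Core Section 2", parent="ABC0L653X")
registry.register("ABC0L53NW", "Rock powder", parent="ABC078HGB")
registry.register("ABC0L98SW", "Leachate", parent="ABC0L53NW")

print(registry.lineage("ABC0L98SW"))
```

Because each derived object has its own identifier, an analysis reported for the leachate can be traced back to the core without relying on sample names.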

  25. Sample Types
     • “Sampling events” such as holes, cores, dredges, stratigraphic sections
     • “Individual samples”: specimens (rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.)
     • “Sub-samples” of any of the above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.

  26. Sample Registration
     • Spreadsheet forms for batch loading
     • SESAR Web Site
     • Interoperability (web services)

  27. Implementation Challenges
     • Diversity of users
        • Large sampling campaigns (IODP, ICDP, ECS)
        • Repositories
        • Data systems
        • Individual investigators
     • Diversity of sample types
     • Integration into existing policies, procedures, data systems
     • International scope
     • Connectivity in the field

  28. Solutions
     • Schema improvements
     • Web-service based registration from client data systems
     • Distributed system of registration nodes (Trusted Agents)
     • Handle service for IGSNs (persistent, resolvable)
        • http://dx.doi.org/18.2539/IGSN.SIO001234
     • Tools to facilitate registration
        • iSESAR (registration via iPhone)
        • eCollections (personal sample management)
        • webCollections (hosting services for repositories)
     • IGSN International Consortium
