
Mark Schildhauer Director of Computing, NCEAS

Opportunities for earth science data interoperability through coordinated semantic development, using a shared model for observations and measurements. Mark Schildhauer Director of Computing, NCEAS Logan Utah: CUAHSI Conference on Hydrologic Data and Information Systems June 2011. SONet.


Presentation Transcript


  1. Opportunities for earth science data interoperability through coordinated semantic development, using a shared model for observations and measurements Mark Schildhauer Director of Computing, NCEAS Logan Utah: CUAHSI Conference on Hydrologic Data and Information Systems June 2011 SONet

  2. Integrative Environmental Research Analyses require a wide range of data • Broad scales: geospatial, temporal, biological (micro-macro) • Diverse topics: abiotic and biotic phenomena • Predicting impact of invasive insect species on crop production • Documenting effects of climate change on forest composition • Large amounts of relevant data… • E.g., over 25,000 data sets are available in the Knowledge Network for Biocomplexity repository (KNB: http://knb.ecoinformatics.org) • But researchers struggle to … • Discover relevant datasets for a study • And combine these into an integrated product to analyze

  3. for Discovery, Access, Interpretation, Re-use: SHARED KNOWLEDGE MODELS • Need consistency and rigor in terminology • Standardized protocols, methods when possible • Interoperability (syntax) • Comparability (semantics) • Minimally, need a “shared community vocabulary” • For hydrologists: WATERML? • For broader, integrative environmental science: ?

  4. SHARED KNOWLEDGE MODELS • metadata and keywords are a good start, but not enough: ambiguous, idiosyncratic, hard to parse • controlled vocabularies: an improvement, but can do more with today’s technology

  5. SHARED KNOWLEDGE MODELS • Ontologies provide a “shared vocabulary” • Common “external” definitions (namespaces) • for explicating relationships among terms • describing data schemas (observations) • for machine-assisted discovery, reasoning, integration • Standard technologies for creating and operating on ontologies: • Syntaxes: RDF, SKOS, OWL • FOSS applications and frameworks: Jena, Protégé • Standard Reasoners: Pellet, FaCT++, Racer

  6. Another Opportunity: Observational data Environmental and earth science data often consists of “observations” • Data sets are often stored in tables (e.g., flat files, spreadsheets) • Represent collections of associated measurements • Highly heterogeneous (format, content, semantics) • Cell values represent measurements

  7. Examples of “raw” observational data

  8. Several prospective observation models…

  9. Observational Data Models • High degree of similarity across independently derived models • Opportunity to enable enhanced data interoperability and uniform access • Domain-neutral “foundational” template • Abstracts away underlying format issues • Domain ontologies “extend” core concepts, to formalize semantics of terms used to describe measurements

  10. Observational Data Model • Implemented as an OWL-DL ontology • Provides basic concepts for describing observations • Specific “extension points” for domain-specific terms [Figure: UML class diagram of the observational data model. Classes: Observation, Entity (ObservedEntity), Measurement (precision: decimal, method: anyType), Value, Characteristic, Protocol, Standard, Context]

  11. Observational Data Model Observations are of entities (e.g., River, Water, Sample, …) • An observation can have multiple measurements • Each measurement is taken of the observed entity [Figure: UML class diagram of the observational data model, as on slide 10]

  12. Observational Data Model A measurement consists of • The characteristic measured (e.g., Ammonium concentration) • The standard used (e.g., unit, coding scheme) • The measurement protocol • The measurement value [Figure: UML class diagram of the observational data model, as on slide 10]

  13. Observational Data Model Observations can have context • E.g. geographic, temporal, or biotic/abiotic environment in which some measurement was taken • Context is an observation too (entity + characteristic) • Context is transitive [Figure: UML class diagram of the observational data model, as on slide 10]
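
The model described on slides 10 to 13 can be sketched in plain Python. This is only an illustrative sketch, not the OWL-DL ontology itself: the class and field names follow the slide text, while the `all_context` helper and every sample value (river name, temperature, concentration, units) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str              # e.g. "AmmoniumConcentration"
    value: object                    # the measured value
    standard: Optional[str] = None   # unit or coding scheme (optional)
    protocol: Optional[str] = None   # measurement protocol (optional)


@dataclass
class Observation:
    entity: str                                            # the observed entity
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)

    def all_context(self) -> List["Observation"]:
        """Collect direct and inherited context, treating context as transitive."""
        seen: List["Observation"] = []
        stack = list(self.context)
        while stack:
            obs = stack.pop()
            if not any(obs is s for s in seen):
                seen.append(obs)
                stack.extend(obs.context)
        return seen


# Hypothetical example: a water reading, in the context of a sample,
# which is itself in the context of a river.
river = Observation("River", [Measurement("Name", "Logan River")])
sample = Observation(
    "WaterSample",
    [Measurement("Temperature", 12.5, standard="Celsius")],
    context=[river],
)
reading = Observation(
    "Water",
    [Measurement("AmmoniumConcentration", 0.04, standard="MilligramPerLiter")],
    context=[sample],
)

context_entities = [o.entity for o in reading.all_context()]
print(context_entities)
```

Because context is itself an observation, the river is reached through the sample without being attached to the water reading directly.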

  14. Similarities among Observational Data Models: OGC’s Observations and Measurements (O&M) [Figure: O&M class diagram. Classes: OM_Observation, FeatureOfInterest, ObservedProperty, OM_Process, Result, ObservationContext. Properties: ofFeature, forProperty, usesProcedure, hasResult, carrierOfCharacteristic, relatedContextObservation]

  15. Similarities among Observational Data Models: SEEK/Semtools Extensible Observation Ontology (OBOE) [Figure: (a) dataset; (b) semantic annotation to dataset (a). Classes: Observation, Entity, Measurement, Characteristic, Standard, Protocol, Precision, Context (other Observation). Properties: ofEntity, hasMeasurement, hasContext, ofCharacteristic, hasCharacteristic, hasValue, usesStandard, usesProtocol, hasPrecision]

  16. Similarities among Observational Data Models: SERONTO basic classes

  17. SHARED KNOWLEDGE MODELS NSF INTEROP program: foster communication among domains to enable greater interoperability • Scientific Observations Network, SONet • Many earth and life science domains participating • Advanced conceptual modeling • Unifying abstraction of ‘observation’ • Semantic web & ontologies • Domain scientists & knowledge engineers SONet

  18. Developing a core model (SONet project) Identify the key observational models in the earth and environmental sciences Are these various observational models easily reconciled and/or harmonized? Are there special capabilities and features enabled by some observational approaches? What services should be developed around these observational models?

  19. Similarities among Observational Data Models: OBOE and O&M [Figure: side-by-side correspondence of terms. O&M FeatureOfInterest / OBOE Entity • O&M ObservedProperty / OBOE Characteristic • O&M OM_Observation / OBOE Measurement • O&M OM_Process / OBOE Protocol • O&M Result / OBOE Standard, Value, Precision • O&M ObservationContext / OBOE Context]

  20. SONet/Semtools Semantic Approach • Data → metadata → annotations → ontologies • Annotations link EML metadata elements to concepts in an ontology through the Observation Ontology • EML metadata describe data and its structures

  21. Linking data values to concepts through observations • Link data (or metadata) through observational data model to terms from domain-specific ontologies • Context can inter-relate values in a tuple • Can provide clarification of semantics of data set as a whole, not just “independent” measurements
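
As a rough illustration of the annotation idea on slides 20 and 21, the sketch below maps the columns of a data table to ontology terms and groups the values of one row by observed entity. The column names, term names, and the `annotate_row` helper are all hypothetical; real annotations in this approach attach to EML metadata rather than to a Python dictionary.

```python
# Hypothetical annotations: column name -> (entity, characteristic, standard).
ANNOTATIONS = {
    "site": ("River", "Name", None),
    "temp": ("Water", "Temperature", "Celsius"),
    "nh4": ("Water", "AmmoniumConcentration", "MilligramPerLiter"),
}


def annotate_row(row):
    """Turn one data tuple into measurement records grouped by observed entity."""
    by_entity = {}
    for column, value in row.items():
        if column not in ANNOTATIONS:
            continue  # unannotated columns are skipped in this sketch
        entity, characteristic, standard = ANNOTATIONS[column]
        by_entity.setdefault(entity, []).append(
            {"characteristic": characteristic, "value": value, "standard": standard}
        )
    return by_entity


row = {"site": "S1", "temp": 12.5, "nh4": 0.04}  # hypothetical data tuple
observations = annotate_row(row)
for entity, measurements in observations.items():
    print(entity, measurements)
```

Grouping by entity is what makes the tuple interpretable as a whole: both water measurements are tied to the same observed entity instead of standing as independent columns.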

  22. Semantic annotation Attribute mappings

  23. How to use observational data models…

  24. linking observational data models to data…

  25. Special Mojo of OWL Ontologies • Class hierarchies • Parent, sibling, child class relationships • Object properties (e.g., “Contained in”) • to relate instances between classes • reflexive, symmetric, transitive • Specify domain and range • Datatype properties • to relate instances to values • Cardinality • Polyhierarchies

  26. Special Mojo of OWL Ontologies • Reasoning offers axioms such as: • Disjointness: e.g. can’t be both X and Y • inSediment AND inWaterColumn • Equivalence (classes) or Same_as (instances) • Synonymy across namespaces • Properties for mereology • Composite of • Contained in • Connected to • Reasoner can infer relationships, determine inconsistencies in assertions
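
The two kinds of inference named on slide 26 can be imitated in a few lines of plain Python. This is a toy sketch, not how Pellet, FaCT++, or Racer work internally: it assumes a “contained in” property that has been declared transitive and a single disjointness axiom, and all instance names are hypothetical.

```python
def transitive_closure(pairs):
    """Infer every (a, c) implied by (a, b) and (b, c) for a transitive property."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted "contained in" facts (hypothetical instances).
contained_in = {("Sample1", "SedimentCore"), ("SedimentCore", "Lake")}
inferred = transitive_closure(contained_in)

# Disjointness axiom from the slide: nothing is both inSediment and inWaterColumn.
DISJOINT = [{"inSediment", "inWaterColumn"}]


def inconsistencies(assertions):
    """Return (instance, classes) pairs that violate a disjointness axiom."""
    problems = []
    for instance, classes in sorted(assertions.items()):
        for pair in DISJOINT:
            if pair <= classes:
                problems.append((instance, tuple(sorted(pair))))
    return problems


assertions = {
    "obs1": {"inSediment"},
    "obs2": {"inSediment", "inWaterColumn"},
}
problems = inconsistencies(assertions)
print(sorted(inferred))
print(problems)
```

The closure adds the unstated fact that the sample is contained in the lake, and the consistency check flags the one instance asserted to belong to two disjoint classes.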

  27. Special Mojo of Observations • Enables faceted discovery along entity & characteristic hierarchies • Economical use of concepts: don’t need to have a “red-colored eye” or “red-colored wing”; instead re-use concept of “red” with variety of entities • Express whether observations taken from the same instance or not (tuple explication) • E.g. Multiple chemical concentrations measured from single water sample • Use of equivalence class (measurement types) to apply to realized measurements in data
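
Faceted discovery along separate entity and characteristic hierarchies might look like the sketch below. The hierarchy, the dataset names, and the `discover` helper are hypothetical; the point is only that “Red” is defined once and combined with any entity, rather than minting a “red-colored eye” term.

```python
# Hypothetical term hierarchy: child -> parent, covering entities and characteristics.
PARENT = {
    "Eye": "BodyPart",
    "Wing": "BodyPart",
    "BodyPart": "Entity",
    "Water": "Entity",
    "Red": "Color",
    "Color": "Characteristic",
    "Temperature": "Characteristic",
}


def is_a(term, ancestor):
    """True if term equals ancestor or lies below it in the hierarchy."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


# Hypothetical annotated measurements: (dataset, entity, characteristic).
MEASUREMENTS = [
    ("dataset_A", "Eye", "Red"),
    ("dataset_B", "Wing", "Red"),
    ("dataset_C", "Water", "Temperature"),
]


def discover(entity_facet, characteristic_facet):
    """Find datasets whose entity and characteristic fall under the chosen facets."""
    return [
        name
        for name, entity, characteristic in MEASUREMENTS
        if is_a(entity, entity_facet) and is_a(characteristic, characteristic_facet)
    ]


hits = discover("BodyPart", "Color")
print(hits)
```

A query at a broad facet such as BodyPart plus Color retrieves both the eye and wing datasets, because each facet is resolved through its own hierarchy.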

  28. Ontology Design Pattern

  29. Ontology Design Pattern ThesauForm: LaPorte, Huguenot & Garnier TraitNet: Bunker, Ahrestani, Naeem

  30. Acknowledgements • This material is based upon work supported by the National Science Foundation under Grant Numbers 0743429, 0753144. • Mark Schildhauer*, Matthew B. Jones, Ben Leinfelder: NCEAS, Santa Barbara CA, USA • Luis Bermudez: Open Geospatial Consortium Inc., Wayland MA, USA • Shawn Bowers: Gonzaga University, Spokane WA, USA • Phillip C. Dibner: OGCii, Berkeley CA, USA • Corinna Gries: University of Wisconsin, Madison WI, USA • Deborah L. McGuinness: Rensselaer Polytechnic Institute, Troy NY, USA • Margaret O’Brien: UCSB, Santa Barbara CA, USA • Huiping Cao: New Mexico State University, Las Cruces NM, USA • Simon J.D. Cox: Earth Science & Resource Engrg, CSIRO, Bentley WA, AUS • Steve Kelling, Carl Lagoze: Cornell University, Ithaca NY, USA • Hilmar Lapp: NESCent, Durham NC, USA • Joshua Madin: Macquarie University, Sydney NSW, AUS • * presenter

  31. FIN “How many fingers, Winston?” Orwell, 1984
