Ideas to Improve Semantics for CUAHSI Controlled Vocabularies. Gary Berg-Cross SOCoP Executive Secretary CO-PI Spatial Ontology Community of Practice INTEROP Grant Co-Chair RDA WG on Data Foundations and Terminology Presented at 2013 CUAHSI Conference on Hydroinformatics and Modeling
Ideas to Improve Semantics for CUAHSI Controlled Vocabularies
Gary Berg-Cross
SOCoP Executive Secretary
CO-PI, Spatial Ontology Community of Practice INTEROP Grant
Co-Chair, RDA WG on Data Foundations and Terminology
Presented at the 2013 CUAHSI Conference on Hydroinformatics and Modeling, July 17 – 19, 2013, Utah State University, Logan, Utah
Outline of Talk
• “Vocabulary”/“Ontology” overview
• Leverage Controlled Vocabulary & work from the BioMed area
• Three ideas for improvement
• Internal & external audits, with examples
• Judging vocabulary qualities using an audit of existing work
• Incrementally adding basic semantics – better semantic relations
• Adding semantics via design pattern schema
• Spatial semantics & Semantic Trajectory example
• Final thoughts
Controlled Vocabulary (CV)
• A consensus, standardized set of terms used to refer to concepts
• Term equivalence is important
• Name-equivalent semantics: different terms refer to the same concept
• “Temp” = “Temperature” = “Water Temperature”
• “mg/l” has the synonym “milligrams per liter”
• Myocardial infarction is synonymous with heart attack
Size of the vocabulary available at http://www.ihtsdo.org/: in 2004, 269,864 classes named by 407,510 names; as of Jan. 31, 2013, ~300,000 active concepts and ~1.1M active terms (“descriptions”)
“Has” carries different meanings:
• Water has salinity
• County has city
• River has tributary
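The term-equivalence idea on this slide can be sketched as a lookup from each surface term to one preferred label. This is a minimal Python sketch, not how any CUAHSI tool works: the `SYNONYMS` table and the functions `normalize_term` and `same_concept` are hypothetical, the entries reuse only the slide's examples, and the choice of preferred label per concept is an assumption.

```python
# Hypothetical synonym table built only from the slide's examples.
# Which label counts as "preferred" for each concept is an assumption.
SYNONYMS = {
    "temp": "Water Temperature",
    "temperature": "Water Temperature",
    "water temperature": "Water Temperature",
    "mg/l": "milligrams per liter",
    "milligrams per liter": "milligrams per liter",
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
}


def normalize_term(term: str) -> str:
    """Return the preferred label for a term; raise KeyError if it is unknown."""
    key = " ".join(term.lower().split())  # case-fold and collapse whitespace
    return SYNONYMS[key]


def same_concept(a: str, b: str) -> bool:
    """Two terms are name-equivalent when they map to the same preferred label."""
    return normalize_term(a) == normalize_term(b)
```

A lookup like this handles synonymy only; it says nothing about the different meanings of “has”, which need typed relations rather than term lists.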
Useful Comparison to BioMed Work to Standardize Vocabularies
Clinical quality measures & Medicare & Medicaid incentive payments for adopting certified EHR technology and using it to achieve specified objectives
Olivier Bodenreider, olivier@nlm.nih.gov, “Quality assurance of biomedical ontologies”, talk at Ontology Summit, 2013
Quick Look at Multiple Views - Variables, Tags & Ontology Concepts
• ODM uses an RDB structure to integrate files & handle heterogeneity: good MD attributes, limited semantics
• Classifier-type structures (HydroTagger) for navigation, tagging & keyword search
• Classifier-type structures connect to variable terms
[Figure labels: HasConstituent, HasConcentration, Diverse Classes?, Navigation, 1220Deca_Chloro_PCB5]
1. The Flavor of Quality Audits of Vocabularies
Types of intrinsic/internal-to-vocabulary analysis:
• Lexical – consider separating modifiers like dissolved, suspended, and total from core “chemical” concepts (GALEN, a BioMed ontology, does this)
  • “Dissolved” and “suspended” are features of some mass of chemical/element, while “total” is a qualifier of them
  • A modifier like “total” is applied to dissolved, suspended, or number of organisms in a sample
• Structural – in the classification/hierarchy, not every level is used in the same way in the current CUAHSI CV/Ontology
  • E.g. the STORET WQX domain has an enormous list of Characteristic(s), which includes very different things like 1-Naphthalenamine & Oxygen, that the ontology helps organize (more on next slide)
• Redundancy
  • In the spreadsheet there are two levels for “acidity”
  • 122 acidity 2 (use at this level is different than at the lower level)
  • 2277 Acidity 9999 (this is the real one for data)
• Missing concepts
• Semantic analysis – missing or inaccurate relations
• Compliance with ontological principles, etc.
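The lexical audit above (separating modifiers such as dissolved, suspended, and total from the core concept) can be sketched as a simple split of a variable name. This is a minimal Python sketch under stated assumptions: `MODIFIERS` treats the three words named on the slide as a closed set, `split_modifiers` is a hypothetical helper, and the input strings in the usage note are illustrative rather than taken from the CUAHSI vocabulary.

```python
import re

# Modifier words named on the slide; treating them as a closed set is an assumption.
MODIFIERS = {"dissolved", "suspended", "total"}


def split_modifiers(variable_name: str) -> tuple[str, list[str]]:
    """Split a variable name into (core concept, modifiers in order of appearance)."""
    words = re.findall(r"[A-Za-z0-9\-]+", variable_name)
    modifiers = [w.lower() for w in words if w.lower() in MODIFIERS]
    core = " ".join(w for w in words if w.lower() not in MODIFIERS)
    return core, modifiers
```

For an illustrative name such as "Nitrogen, total dissolved", the helper returns the core "Nitrogen" plus the modifier list, so the core concept can be matched against a chemical ontology while the modifiers are handled as features or qualifiers.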
Organization of Features
• Similar physical items sit at different levels, like volume @ level 2 & severity @ level 4
  • Physical > Volume
  • Physical > Water > Water, descriptive > Severity
• Both are characteristic/feature properties, like temperature or biomass
Is another organization useful to help handle heterogeneity? More relation types here.
Better Conceptualization of Properties
Organize properties like size as physical qualities that inhere in a physical object. Include measured properties like stream flow, level, pollutants, evapotranspiration, etc.
• Currently we have them at many levels
  • E.g. 2291 Major, bulk properties 4
[Diagram. Nodes: Water Body, Water Density, Water Density Unit, Grams/cm3, Chesapeake Bay, Area, Area Quantity, Real Number, Sq Miles. Relation labels: hasLayer, hasDensity, hasConstituent, hasFeature, hasUnit, hasValue, hasQuantity, IsA]
• External audit
  • For connecting to Chem/BioChem ontologies there might be sub-categories of physical features for elements: optical, hardness, color
  • See the Dumontier Lab ontologies to represent bio-scientific concepts and relations: http://dumontierlab.com/?page=ontologies
2. Adding Better Semantic Relations/Properties (External Audit)
Data models & SKOS offer some relations, but they are limited, with some relations embedded in Variable attributes or variable names. SKOS is more useful for terms than for concepts.
Consider irreflexive, anti-symmetric & transitive constructs that capture common understanding.
Observation – streams flow into rivers, etc.
• The property “flows-into” is irreflexive
  • Any one river or stream cannot flow into itself as a loop
• “flows-into” is also anti-symmetric
  • If one river flows into a second, the second can’t flow into the first
• The subRegionOf property between Regions is transitive
<owl:TransitiveProperty rdf:ID="subRegionOf">
  <rdfs:domain rdf:resource="#Region"/>
  <rdfs:range rdf:resource="#Region"/>
</owl:TransitiveProperty>
If Logan, Cache County and Utah are regions, Logan is a subRegion of Cache County, and Cache County is a subRegion of Utah, then Logan is also a subRegion of Utah.
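The three property characteristics on this slide can be checked mechanically once a relation is written as a set of (subject, object) pairs. The following is a minimal Python sketch, not an OWL reasoner: the function names are hypothetical, and a real system would leave these inferences to an ontology reasoner working from the OWL axioms.

```python
from itertools import product

Pairs = set[tuple[str, str]]


def is_irreflexive(pairs: Pairs) -> bool:
    """True when nothing is related to itself (no stream flows into itself)."""
    return all(a != b for a, b in pairs)


def is_antisymmetric(pairs: Pairs) -> bool:
    """True when no two distinct things are related in both directions."""
    return all(a == b or (b, a) not in pairs for a, b in pairs)


def transitive_closure(pairs: Pairs) -> Pairs:
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new
```

Applied to the slide's example, the closure of {("Logan", "Cache County"), ("Cache County", "Utah")} also contains ("Logan", "Utah"), which is the inference the transitive subRegionOf property licenses.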
Organizing Relations - Three Kinds of “Structure” Relations in GeoSPARQL
What does “X has Y” mean?
• X has material constituent Y only if Y is tangible and pervasive in X
  • Great Salt Lake has-constituent salty water
[Diagram examples: Gulf of Mexico has-part gulf fishing zone, which has-volume y and is-inside Gulf pollution zone; fishing zone has-depth with average value x; Zone A has area Z … is-inside Gulf … has-constituent nitrogen]
3. Adding Relations Incrementally: Richer Schemata & Reusable Patterns
A simple Feature-State model (from GRAIL) becomes a richer schema.
[Diagram labels: River, sub-surface water …; salty, acidic …; height, salinity, acidity …]
Every River is a Water Body described by a path, made of a mass of water, & has parts source and mouth …
Ontology Design Patterns (ODPs) of Semantic Trajectory: Hydro Observations as Annotations
• ODPs (aka microtheories) are small, modular & coherent schemas, like Temperature
• Relatively autonomous but conceivably composable with other schemas
• E.g. compose a Semantic Trajectory Pattern
  • Trajectories/spatial paths/segments
  • Point of Interest (POI) – observation area, etc.
• Environmental observations fit into this schema
  • Fixes may be hydrometric feature observations at some POI (and offset Fix) for some point or period of time, denoting important activities and/or decision points that researchers may be interested in labeling and classifying
  • Observations, including time-series sets, might be applied to something like streamflow or temperature plots or a pollution plume
• You may query the schema:
  • “Show locations within Gulf of Mexico fishing area with colored dissolved organic matter”
[Diagram labels: Hydro Obs/Device; Hydro Var & attr/data or value type of interest; Paths & POIs have geometries, including polygon areas; Hydro object or moving device]
Yingjie Hu et al., A Geo-Ontology Design Pattern for Semantic Trajectories, COSIT 2013.
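The example query on this slide can be sketched over fixes that carry observations as annotations. This is a minimal Python sketch, not the published pattern: `Observation`, `Fix`, `BoundingBox`, and `locations_with` are hypothetical names, the axis-aligned bounding box is a simplifying stand-in for a polygon area, and any coordinates passed in are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """An annotation on a fix: which variable was observed and its value."""
    variable: str
    value: float


@dataclass
class Fix:
    """One located point on a trajectory, annotated with observations."""
    lon: float
    lat: float
    observations: list[Observation] = field(default_factory=list)


@dataclass
class BoundingBox:
    """Axis-aligned stand-in for a polygon area (a simplifying assumption)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, fix: Fix) -> bool:
        return (self.min_lon <= fix.lon <= self.max_lon
                and self.min_lat <= fix.lat <= self.max_lat)


def locations_with(fixes: list[Fix], area: BoundingBox, variable: str) -> list[tuple[float, float]]:
    """Return (lon, lat) of fixes inside the area that observed the given variable."""
    return [
        (f.lon, f.lat)
        for f in fixes
        if area.contains(f) and any(o.variable == variable for o in f.observations)
    ]
```

With a box standing in for the fishing area and "colored dissolved organic matter" as the variable, `locations_with` returns only the fixes that are both inside the area and annotated with that variable.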
Wrapping Up
• The three things discussed here (audits, standard relations, schema) are possible paths to improved semantics for Hydro and related vocabularies
• Work can leverage existing efforts
  • Lots of work in BioMed on structures, processes & audits
  • Methods to build ODPs for general and specific use
  • DOLCE ROCKS Ontology – integrates DOLCE + GeoSciML or SSWO (Boyan Brodaric & Torsten Hahmann)
• Work might be focused by a set of requirements and use cases
Work supported by the National Science Foundation under Grant No. 0955816
Thank You Questions? For information on SOCoP free workshops on ontology building see http://ontolog.cim3.net/cgi-bin/wiki.pl?SocopWorkshops/Socop2012Workshop & VoCamps at http://vocamp.org/wiki/Main_Page#Previous_VoCamps
Useful Schema - Content Ontology Design Patterns (ODPs) – Semantic Trajectory Pattern Example
• ODPs (aka microtheories) are small, modular & coherent schemas, like Temperature
• Relatively autonomous but conceivably composable with other schemas
• E.g. trajectories/spatial paths, Point of Interest (POI) – observation area
• Semantic Trajectory example
  • Indexed by Space-Time-Variable dimensions
  • When we annotate path points of interest (aka Fix) & object motion, it is called a semantic trajectory
[Figure: a data value v_i(s,t) located on three axes: “Where” (space S, location s), “When” (time T, instant t), and “What” (variables V, variable V_i)]
ODPs developed at GeoVoCampSB2012 & DaytonGeoVocamp2012
Zhixian Yan. Towards Semantic Trajectory Data Analysis: A Conceptual and Computational Approach. VLDB 2009.
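The Space-Time-Variable indexing on this slide, where a data value v_i(s,t) is located by “what”, “where”, and “when”, can be sketched as a store keyed by those three dimensions. This is a minimal Python sketch, not the ODM schema: the `ValueCube` class and its method names are hypothetical, and sites are reduced to plain string identifiers.

```python
from datetime import datetime


class ValueCube:
    """Stores data values v_i(s, t) keyed by variable i, location s, and time t."""

    def __init__(self) -> None:
        self._values: dict[tuple[str, str, datetime], float] = {}

    def put(self, variable: str, site: str, time: datetime, value: float) -> None:
        """Record one value at a (what, where, when) coordinate."""
        self._values[(variable, site, time)] = value

    def get(self, variable: str, site: str, time: datetime) -> float:
        """Look up one value; raises KeyError if nothing was recorded there."""
        return self._values[(variable, site, time)]

    def series(self, variable: str, site: str) -> list[tuple[datetime, float]]:
        """Time series for one variable at one site, sorted by time."""
        return sorted(
            (t, v)
            for (var, s, t), v in self._values.items()
            if var == variable and s == site
        )
```

Fixing the variable and site and letting time vary yields a time series; a semantic trajectory adds annotations on top of coordinates like these.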
Audit of Coverage - Anything Missing?
• Omissions
  • In HydroTagger, is water acidity missing variables?
    • Less coverage than alkalinity in the http://hiscentral.cuahsi.org tool
  • Sub-surface water is missing variables compared to surface water, etc.
• Missing axioms to clarify things, like what causes or influences what
• Missing primitives to connect things, etc.
In ODM MCV too. ODM MCV entries:
• Acid neutralizing capacity (biochem?)
• Acidity, CO2 acidity
• Acidity, hot
• Acidity, mineral acidity
• Acidity, total acidity
How about metal acidity?
External Audits - Simple Example: Body of Water Ontology
[Diagram labels: From RPI work; From SWEET]
New sub-class uses intersectionOf for its definition, with restricted measures.
<owl:Restriction>
  <owl:onProperty rdf:resource="&pol;hasMeasurement"/>
  <owl:someValuesFrom rdf:resource="#WaterMeasurement"/>
</owl:Restriction>
escience.rpi.edu/ontology/
What does “X has Y” Mean?
Distinguish ideas of regions, material, possesses, part, component.
What do we mean when we say X has-region Y?
• Great Salt Lake has-region Antelope Island; Gulf of Mexico has-region hypoxic zone
X has-region Y if
• Y is a region of space defined in relation to X
• We associate regions (e.g. Antelope Island) with measures such as length, area, or volume
X has material constituent Y only if Y is tangible and pervasive in X
• Great Salt Lake has-constituent salty water
X possesses/characterized-by Y (for example, lake possesses temperature gradient)
X has-part Y
X has element/component Y – Chem
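The distinctions on this slide can be sketched by replacing a generic “has” with typed relations, so that each statement says which kind of “has” it means. This is a minimal Python sketch: the `HasKind` enum, the `TRIPLES` list, and `objects_of` are hypothetical names, and the triples reuse only the slide's examples.

```python
from enum import Enum


class HasKind(Enum):
    """The kinds of 'has' distinguished on the slide."""
    REGION = "has-region"
    CONSTITUENT = "has-constituent"
    POSSESSES = "possesses"
    PART = "has-part"
    COMPONENT = "has-component"


# (subject, relation kind, object) triples taken from the slide's examples.
TRIPLES = [
    ("Great Salt Lake", HasKind.REGION, "Antelope Island"),
    ("Gulf of Mexico", HasKind.REGION, "hypoxic zone"),
    ("Great Salt Lake", HasKind.CONSTITUENT, "salty water"),
    ("lake", HasKind.POSSESSES, "temperature gradient"),
]


def objects_of(subject: str, kind: HasKind, triples=TRIPLES) -> list[str]:
    """Return everything the subject 'has' under one specific kind of relation."""
    return [o for s, k, o in triples if s == subject and k is kind]
```

Asking for the regions of Great Salt Lake then returns Antelope Island but not salty water, which a single untyped “has” relation could not separate.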
Even Areas like RX with no Hierarchy have defined Conceptual Relations