Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data
Mark Schildhauer, NCEAS/UCSB
TDWG meeting, Woods Hole: Observations Activity Group, Sep. 29, 2010
Nature of scientific data sets
• Scientific data are often stored in tables
• Tables consist of rows (records) and columns (attributes)
• The grouping of specific columns into a tuple in a scientific data set is often a non-normalized (materialized) view with a special meaning or use for the researcher
• Individual cells contain values that are measurements of a characteristic of some thing
SONet/Semtools Semantic Approach
• Data -> metadata -> annotations -> ontologies
• Ontology: formal knowledge representation in OWL-DL
• Hierarchical structure of concepts
• Relationships can link concepts
• Annotations link EML metadata elements to concepts in an ontology through the Observation Ontology
• EML metadata describe the data and its structure
Linking data values to concepts
• Extensible Observation Ontology (OBOE)
• OBOE provides a high-level abstraction of scientific observations and measurements
• Enables data (or metadata) structures to be linked to domain-specific ontology concepts
• Can inter-relate the values in a tuple
• Clarifies the semantics of the data set as a whole, not just of “independent” values
Concepts of Semantic Search
• Annotations give metadata attributes semantic meaning w.r.t. an ontology
• Enable structured search against annotations, increasing precision
• Enable ontological term expansion, increasing recall
• Use OBOE to define precisely a measured characteristic and the standard used to measure it
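A minimal sketch of how term expansion raises recall, in Python. The concept hierarchy, the dataset ids, and the helpers `subclasses` and `search` are hypothetical; in the system described here, reasoning is done through the domain ontology (with Pellet in the KNB catalog) rather than with hand-written code like this.

```python
from collections import deque

# Hypothetical toy ontology: child concept -> parent concepts (subclass-of edges).
SUBCLASS_OF = {
    "Tree": ["Plant"],
    "Shrub": ["Plant"],
    "Plant": ["Organism"],
    "DBH": ["Diameter"],
    "Diameter": ["Length"],
}

# Hypothetical annotations: dataset id -> concepts its annotation refers to.
ANNOTATIONS = {
    "ds1": {"Tree", "DBH"},
    "ds2": {"Shrub", "Diameter"},
    "ds3": {"Plot"},
}


def subclasses(concept):
    """Return the concept together with all of its direct and indirect subclasses."""
    children = {}
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {concept}, deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def search(concept, expand=True):
    """Return ids of datasets whose annotations use the concept (or, if expanded, a subclass)."""
    terms = subclasses(concept) if expand else {concept}
    return sorted(ds for ds, used in ANNOTATIONS.items() if used & terms)


print(search("Plant", expand=False))  # []
print(search("Plant"))                # ['ds1', 'ds2']
```

Matching against annotation concepts (rather than free-text metadata) is what keeps precision high; expanding "Plant" to its subclasses is what recovers the datasets annotated only with "Tree" or "Shrub".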
Annotations
• An XML schema defines annotation properties
• Namespaces identify the sources of terms
• Search is performed against the annotations, not the metadata itself
• Search returns the metadata documents linked to the matching annotations
• Reasoning (term expansion, consistency checking, etc.) is done through the domain ontology
KNB metadata catalog
• Stores EML (XML) and raw data objects
• Extended to store ontologies, both domain ontologies and OBOE (OWL-DL serialized as XML)
• Extended to store annotations (XML)
• Jena is used to query ontologies
• Pellet is used for reasoning (ontology consistency; class subsumption)
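OBOE Conceptual Model (OWL-DL)
[Figure: OBOE class diagram; properties and cardinalities as recovered from the diagram:]
• Observation ofEntity Entity: each Observation is of exactly one Entity (1..1); an Entity can have 0..* Observations
• Observation hasMeasurement Measurement: an Observation has 0..* Measurements; each Measurement belongs to exactly one Observation (1..1)
• Measurement ofCharacteristic Characteristic: exactly one (1..1); a Characteristic can be measured 0..* times
• Measurement usesStandard Standard: exactly one (1..1); a Standard can be used by 0..* Measurements
• Measurement hasValue Value: exactly one (1..1); a Value can be shared by 0..* Measurements
• Observation hasContext Context: 0..*; each Context belongs to exactly one Observation (1..1)
• Context hasContextObservation Observation: exactly one (1..1); an Observation can serve as context 0..* times
• Context hasContextRelationship Relationship: exactly one (1..1); a Relationship can be used by 0..* Contexts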
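One way to read these cardinalities is as plain Python dataclasses. This is only an illustrative sketch that assumes a single-valued attribute for each 1..1 property and a list for each 0..* property; OBOE itself is an OWL-DL ontology, not a class library, and the class and field names here are our own.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Entity:
    concept: str  # e.g. "Tree" or "TemporalRange"


@dataclass
class Measurement:
    characteristic: str  # ofCharacteristic: exactly one
    standard: str        # usesStandard: exactly one
    value: Any           # hasValue: exactly one


@dataclass
class Observation:
    entity: Entity                                                 # ofEntity: exactly one
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement: 0..*
    contexts: List["Context"] = field(default_factory=list)        # hasContext: 0..*


@dataclass
class Context:
    observation: Observation  # hasContextObservation: exactly one
    relationship: str         # hasContextRelationship: exactly one, e.g. "Within"


# Usage: a DBH observation of a tree, made within a year (TemporalRange) observation.
year_obs = Observation(Entity("TemporalRange"), [Measurement("Year", "DateTime", 2007)])
tree_obs = Observation(
    Entity("Tree"),
    [Measurement("DBH", "Centimeter", 35.8)],
    [Context(year_obs, "Within")],
)
```

Because each Measurement lives inside exactly one Observation's list, the "belongs to exactly one Observation" side of hasMeasurement is enforced by containment rather than by an explicit back-reference.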
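Annotation Examples (12/18/2009)
[Figure: relationships among Dataset, Annotation, OBOE Concepts, and OBOE Model]
• The Annotation defines a view over the Dataset (view definition) and uses terms from the OBOE Concepts
• Materializing the annotation over the dataset yields an OBOE Model (individuals/triples), which instantiates the OBOE Concepts and is an observation-based representation of the Dataset
• Query* (over the OBOE Model)
* Conceptually, we want to query datasets via annotations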
Annotation Examples
Annotation syntax (XML):
```xml
<observation label="o1">
  <entity id="TemporalRange"/>
  <measurement label="m1">
    <characteristic id="Year"/>
    <standard id="DateTime"/>
  </measurement>
</observation>
<observation label="o2">
  <entity id="Tree"/>
  <measurement label="m2" precision="0.1">
    <characteristic id="DBH"/>
    <standard id="Centimeter"/>
  </measurement>
  <measurement label="m3">
    <characteristic id="TaxonomicTypeName"/>
    <standard id="ITIS"/>
  </measurement>
  <measurement label="m4">
    <characteristic id="EntityName"/>
    <standard id="LocalTreeNames"/>
  </measurement>
  <context observation="o1">
    <relationship id="Within"/>
  </context>
</observation>
<map attribute="yr" measurement="m1"/>
<map attribute="diam" measurement="m2" if="diam ge 0"/>
<map attribute="spec" measurement="m4"/>
<map attribute="spp" measurement="m3" value="Picea rubens" if="spp eq 'piru'"/>
<map attribute="spp" measurement="m3" value="Abies balsamea" if="spp eq 'abba'"/>
```
Equivalent compact notation:
```
observation "o1"
  entity "TemporalRange"
  measurement "m1"
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4"
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "diam" to "m2" if diam >= 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value = "Picea rubens"
map "spp" to "m3" if spp == "abba" value = "Abies balsamea"
```
* Code exists to read/write annotations using this XML format
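The read/write code mentioned on this slide is not shown. As a hypothetical stand-in, the Python sketch below uses the standard library's `xml.etree.ElementTree` to apply the `map` rules to a single data row. It assumes the fragment is wrapped in an `<annotation>` root element (our assumption, not part of the slide) and that every `if` condition has the simple form `attribute op literal` with the operators `eq`, `ne`, `gt`, `ge`, `lt`, `le`.

```python
import operator
import xml.etree.ElementTree as ET

# The annotation from the slide, wrapped in an assumed <annotation> root element.
ANNOTATION_XML = """<annotation>
  <observation label="o1">
    <entity id="TemporalRange"/>
    <measurement label="m1"><characteristic id="Year"/><standard id="DateTime"/></measurement>
  </observation>
  <observation label="o2">
    <entity id="Tree"/>
    <measurement label="m2" precision="0.1"><characteristic id="DBH"/><standard id="Centimeter"/></measurement>
    <measurement label="m3"><characteristic id="TaxonomicTypeName"/><standard id="ITIS"/></measurement>
    <measurement label="m4"><characteristic id="EntityName"/><standard id="LocalTreeNames"/></measurement>
    <context observation="o1"><relationship id="Within"/></context>
  </observation>
  <map attribute="yr" measurement="m1"/>
  <map attribute="diam" measurement="m2" if="diam ge 0"/>
  <map attribute="spec" measurement="m4"/>
  <map attribute="spp" measurement="m3" value="Picea rubens" if="spp eq 'piru'"/>
  <map attribute="spp" measurement="m3" value="Abies balsamea" if="spp eq 'abba'"/>
</annotation>"""

# Operator keywords assumed for the condition mini-language.
OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
       "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def parse_literal(text):
    """Single-quoted literals are strings; anything else is parsed as a number."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        return text[1:-1]
    return float(text)


def condition_holds(cond, row):
    """Evaluate an optional condition of the form '<attribute> <op> <literal>'."""
    if cond is None:
        return True
    attr, op, literal = cond.split(maxsplit=2)
    return OPS[op](row[attr], parse_literal(literal))


def apply_maps(root, row):
    """Map one data row to {measurement label: (characteristic, value)}."""
    characteristic = {m.get("label"): m.find("characteristic").get("id")
                      for m in root.iter("measurement")}
    result = {}
    for rule in root.findall("map"):
        attr, label = rule.get("attribute"), rule.get("measurement")
        if attr in row and condition_holds(rule.get("if"), row):
            # A 'value' attribute on the map replaces the raw cell value.
            result[label] = (characteristic[label], rule.get("value", row[attr]))
    return result


root = ET.fromstring(ANNOTATION_XML)
print(apply_maps(root, {"yr": 2007, "diam": 35.8, "spec": 1, "spp": "piru"}))
```

Only the first `spp` rule whose condition holds contributes a value for `m3`, which is how the two-letter species codes in the data are rewritten into ITIS names.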
Annotation Examples
Annotation:
```
observation "o1"
  entity "TemporalRange"
  measurement "m1"
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4"
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value = "Picea rubens"
map "spp" to "m3" if spp == "abba" value = "Abies balsamea"
```
[Figure: OBOE individuals materialized from three dataset rows. For each row, a Tree observation, linked by hasContext to a TemporalRange observation, has DBH (Centimeter), TaxonomicTypeName (ITIS), and EntityName (LocalTreeNames) measurements; the TemporalRange observation has a Year (DateTime) measurement:]
• Row 1: Year 2007; DBH 35.8; Picea rubens; EntityName 1
• Row 2: Year 2007; DBH 36.2; Picea rubens; EntityName 1
• Row 3: Year 2008; DBH 33.2; Abies balsamea; EntityName 2
• Basic idea: go row by row through the dataset, generating individuals/triples
• “External” terms should have a namespace prefix URI
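The row-by-row idea can be sketched in Python as follows. The dictionary encoding of the annotation, the `ex:` prefix (standing in for a real namespace URI), the blank-node-style ids, and the `materialize` helper are all hypothetical, and the `if dbh > 0` condition is omitted for brevity. The three rows use the values shown in the figure. This naive version creates fresh individuals for every row, so repeated trees and years produce duplicate individuals.

```python
from itertools import count

PREFIX = "ex:"  # hypothetical namespace prefix; a real system would use a full URI

# Hypothetical in-memory encoding of the annotation on this slide:
# observation label -> entity concept, (characteristic, standard, column) triples, context.
ANNOTATION = {
    "o1": {"entity": "TemporalRange",
           "measurements": [("Year", "DateTime", "yr")]},
    "o2": {"entity": "Tree",
           "measurements": [("DBH", "Centimeter", "dbh"),
                            ("TaxonomicTypeName", "ITIS", "spp"),
                            ("EntityName", "LocalTreeNames", "spec")],
           "context": ("o1", "Within")},
}
SPECIES = {"piru": "Picea rubens", "abba": "Abies balsamea"}  # value rewrites for 'spp'

ROWS = [  # values as shown in the slide's figure
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]


def materialize(rows):
    """Naively emit (subject, predicate, object) triples with fresh individuals for every row."""
    ids = count(1)
    triples = []
    for row in rows:
        obs_ids = {}
        for label, obs in ANNOTATION.items():
            obs_id, ent_id = f"_:obs{next(ids)}", f"_:ent{next(ids)}"
            obs_ids[label] = obs_id
            triples += [(obs_id, "rdf:type", PREFIX + "Observation"),
                        (obs_id, PREFIX + "ofEntity", ent_id),
                        (ent_id, "rdf:type", PREFIX + obs["entity"])]
            for char, std, column in obs["measurements"]:
                meas_id = f"_:meas{next(ids)}"
                value = SPECIES.get(row[column], row[column]) if column == "spp" else row[column]
                triples += [(obs_id, PREFIX + "hasMeasurement", meas_id),
                            (meas_id, PREFIX + "ofCharacteristic", PREFIX + char),
                            (meas_id, PREFIX + "usesStandard", PREFIX + std),
                            (meas_id, PREFIX + "hasValue", value)]
        # Context links are added once every observation for this row exists.
        for label, obs in ANNOTATION.items():
            if "context" in obs:
                target, relationship = obs["context"]
                ctx_id = f"_:ctx{next(ids)}"
                triples += [(obs_ids[label], PREFIX + "hasContext", ctx_id),
                            (ctx_id, PREFIX + "hasContextObservation", obs_ids[target]),
                            (ctx_id, PREFIX + "hasContextRelationship", PREFIX + relationship)]
    return triples


triples = materialize(ROWS)
trees = [s for s, p, o in triples if p == "rdf:type" and o == PREFIX + "Tree"]
print(len(trees))  # 3: one Tree individual per row, although rows 1 and 2 are the same tree
```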
Annotation Examples
(Same annotation, dataset, and materialized individuals as on the previous slide.)
• Same trees!! Rows 1 and 2 both have EntityName = 1 but are materialized as two separate Tree instances
• Same year and year observation!! Rows 1 and 2 both have Year = 2007 but are materialized as two separate TemporalRange observations
Annotation Examples
Annotation (now with distinct yes on o1 and key yes on m1 and m4):
```
observation "o1" distinct yes
  entity "TemporalRange"
  measurement "m1" key yes
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4" key yes
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value = "Picea rubens"
map "spp" to "m3" if spp == "abba" value = "Abies balsamea"
```
[Figure: the materialized individuals now contain only two Tree instances (EntityName 1 and 2) and two TemporalRange instances (2007 and 2008); rows with the same key values share them]
• Every observation has an implicit “distinct” attribute (set to “no” by default)
• … and every measurement has an implicit “key” attribute (set to “no” by default)
Annotation Examples
• Observation measurement keys
  • Like a primary key constraint
  • State that observation instances with the same key measurement values are of the same entity instance
  • Do not imply the same observation instance, unless the observation is declared distinct
  • All key measurements of an observation together form the primary key
• Distinct observations
  • Only apply if at least one key measurement is defined
  • State that observation instances of the same entity instance are the same observation instance
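To make the key and distinct rules concrete, here is a small Python sketch for a single annotated observation. The function name and the input encoding (rows as dicts, key measurements identified by the columns mapped to them) are hypothetical; the defaults mirror the implicit “key no” and “distinct no” settings.

```python
from itertools import count


def assign_instances(rows, key_columns=(), distinct=False):
    """Return one (entity_id, observation_id) pair per row for a single annotated observation.

    key_columns: columns mapped to measurements declared 'key yes'; together they form
                 the primary key (an empty tuple means no keys, the implicit default).
    distinct:    whether the observation is declared 'distinct yes' (default: no).
    """
    ids = count(1)
    entity_by_key, obs_by_entity = {}, {}
    pairs = []
    for row in rows:
        if key_columns:
            key = tuple(row[c] for c in key_columns)
            if key not in entity_by_key:
                entity_by_key[key] = f"e{next(ids)}"
            entity = entity_by_key[key]
        else:
            entity = f"e{next(ids)}"  # no key: every row yields a new entity instance
        if key_columns and distinct:  # distinct only applies when a key is defined
            if entity not in obs_by_entity:
                obs_by_entity[entity] = f"o{next(ids)}"
            obs = obs_by_entity[entity]
        else:
            obs = f"o{next(ids)}"  # not distinct: every row yields a new observation
        pairs.append((entity, obs))
    return pairs


rows = [{"yr": 2007, "spec": 1}, {"yr": 2007, "spec": 1}, {"yr": 2008, "spec": 2}]
years = assign_instances(rows, key_columns=("yr",), distinct=True)  # like o1 with key m1
trees = assign_instances(rows, key_columns=("spec",))               # like o2 with key m4
```

Rows 1 and 2 share one year entity and one year observation, while they share the same Tree entity but keep separate Tree observations, matching the two rules above.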
Annotation Examples
Annotation:
```
observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes
    characteristic "EntityName"
    standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3" key yes
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value = "Picea rubens"
map "spp" to "m3" if spp == "abba" value = "Abies balsamea"
```
[Figure: materialized individuals for three rows: Plot A with DBH 35.8 (Picea rubens), DBH 36.2 (Picea rubens), and Plot B with DBH 33.2 (Picea rubens); all three rows are linked to a single Tree instance]
• Here we don’t have unique ids for trees
• But assume each species name within a plot uniquely identifies a tree
• … i.e., at most one tree of a particular species was measured (possibly multiple times) in each plot
Annotation Examples
(Same annotation, dataset, and materialized individuals as on the previous slide.)
• The Tree entity instance should depend on the plot it is in!!! (context): because spp is the only key, the Picea rubens measured in Plot B is treated as the same tree as the one in Plot A
Annotation Examples
Annotation (the context is now declared identifying):
```
observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes
    characteristic "EntityName"
    standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3" key yes
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  context identifying yes observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value = "Picea rubens"
map "spp" to "m3" if spp == "abba" value = "Abies balsamea"
```
[Figure: with the identifying context, Plot B now gets its own Tree instance, separate from the Tree in Plot A]
• Every context relationship has an implicit “identifying” qualifier (set to “no” by default)
• Identifying context: key uniqueness holds only within the context observation
• Similar to a weak-entity constraint (ER)
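The effect of the identifying context can be sketched by scoping the Tree key by the Plot instance, much as a weak entity's key includes its owner's key. The helpers `instance_id` and `tree_instances` and the rows are hypothetical; the values follow the figures, assuming the 36.2 cm measurement was taken in Plot A.

```python
def instance_id(table, key, prefix):
    """Return the id already assigned to key, assigning the next one if it is new."""
    if key not in table:
        table[key] = f"{prefix}{len(table) + 1}"
    return table[key]


def tree_instances(rows, identifying_context):
    """Assign (plot, tree) instance ids per row; Plot is keyed on 'plt', Tree on 'spp'."""
    plots, trees = {}, {}
    pairs = []
    for row in rows:
        plot = instance_id(plots, row["plt"], "plot")
        # An identifying context adds the context (Plot) instance to the Tree key.
        key = (plot, row["spp"]) if identifying_context else (row["spp"],)
        pairs.append((plot, instance_id(trees, key, "tree")))
    return pairs


ROWS = [
    {"plt": "A", "spp": "piru", "dbh": 35.8},
    {"plt": "A", "spp": "piru", "dbh": 36.2},  # assumed to be in Plot A
    {"plt": "B", "spp": "piru", "dbh": 33.2},
]

print(tree_instances(ROWS, identifying_context=False))
# [('plot1', 'tree1'), ('plot1', 'tree1'), ('plot2', 'tree1')]
print(tree_instances(ROWS, identifying_context=True))
# [('plot1', 'tree1'), ('plot1', 'tree1'), ('plot2', 'tree2')]
```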
Annotation Examples
Representing instances …
• Annotation(AnnotId, Resource)
• Observation(ObsId, AnnotId, EntId)
• Measurement(MeasId, ObsId, MeasType, Value)
• Context(ObsId1, ObsId2, Rel)
• Relationship(RelId, RelType)
• Entity(EntId, EntType)
• This could be queried directly and/or mapped to triples
• Note that ObsIds are unique across annotations
• Both ObsIds in a Context row must belong to the same annotation
* Simple relational schema for OBOE models (individuals/triples)
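The schema can be tried out directly with Python's built-in `sqlite3` module. The column types, the reading of `Context.Rel` as a reference to `Relationship.RelId`, the sample resource name `dataset-1`, and both queries are our assumptions. The second query checks the rule that both observations in a Context row come from the same annotation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Annotation   (AnnotId INTEGER PRIMARY KEY, Resource TEXT);
CREATE TABLE Entity       (EntId TEXT PRIMARY KEY, EntType TEXT);
CREATE TABLE Observation  (ObsId TEXT PRIMARY KEY,  -- unique across annotations
                           AnnotId INTEGER REFERENCES Annotation,
                           EntId TEXT REFERENCES Entity);
CREATE TABLE Measurement  (MeasId TEXT PRIMARY KEY, ObsId TEXT REFERENCES Observation,
                           MeasType TEXT, Value NUMERIC);
CREATE TABLE Relationship (RelId TEXT PRIMARY KEY, RelType TEXT);
CREATE TABLE Context      (ObsId1 TEXT REFERENCES Observation,
                           ObsId2 TEXT REFERENCES Observation,
                           Rel TEXT REFERENCES Relationship);
"""

# DBH values paired with the Year of the observation they were made "Within".
QUERY = """
SELECT d.Value, y.Value
FROM Measurement d
JOIN Context c ON c.ObsId1 = d.ObsId
JOIN Relationship r ON r.RelId = c.Rel AND r.RelType = 'Within'
JOIN Measurement y ON y.ObsId = c.ObsId2 AND y.MeasType = 'Year'
WHERE d.MeasType = 'DBH'
"""

# Number of Context rows whose two observations come from different annotations.
CROSS_ANNOTATION = """
SELECT COUNT(*) FROM Context c
JOIN Observation a ON a.ObsId = c.ObsId1
JOIN Observation b ON b.ObsId = c.ObsId2
WHERE a.AnnotId <> b.AnnotId
"""


def build():
    """Create an in-memory database holding one materialized row of the tree example."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO Annotation VALUES (1, 'dataset-1')")
    db.executemany("INSERT INTO Entity VALUES (?, ?)", [("e1", "TemporalRange"), ("e2", "Tree")])
    db.executemany("INSERT INTO Observation VALUES (?, ?, ?)", [("o1", 1, "e1"), ("o2", 1, "e2")])
    db.executemany("INSERT INTO Measurement VALUES (?, ?, ?, ?)",
                   [("m1", "o1", "Year", 2007), ("m2", "o2", "DBH", 35.8),
                    ("m3", "o2", "TaxonomicTypeName", "Picea rubens")])
    db.execute("INSERT INTO Relationship VALUES ('r1', 'Within')")
    db.execute("INSERT INTO Context VALUES ('o2', 'o1', 'r1')")
    return db


db = build()
print(db.execute(QUERY).fetchall())
print(db.execute(CROSS_ANNOTATION).fetchone()[0])
```

Declaring `Value` with NUMERIC affinity lets numeric measurements come back as numbers while text values such as taxon names stay text.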
Ongoing Activities
• Developing compatible domain ontologies (design patterns for use with the observation ontology)
• Scalability of the materialization algorithm from annotations (data result sets)
• Testing and developing capabilities motivated by use cases (coastal ecosystems and plant traits)
• SONet and JWG-ODMS continue to meet and discuss
Acknowledgements: Shawn Bowers, Huiping Cao, the SEEK KR/SMS working group, and all members of the SONet and Semtools projects. Thanks also to Chad Berkeley and Ben Leinfelder, project software engineers.
Work supported by National Science Foundation awards 0225674, 0225676, 0743429, 0733849, 0753144, 0630033.