SemantEco Annotator for Linked Data Generation and Generalized Semantic Mapping
Session: Technologies, Reasoning, and Annotation Methods of the Semantics for Biodiversity Symposium at the 2013 TDWG conference
Patrice Seyed, Katherine Chastain, Brendan Ashby, Evan Patton, Tim Lebo, and Deborah McGuinness (presented by Cynthia Parr)
Introduction
• Challenges of enabling search and discovery of scientific data.
• Semantic web technologies and Linked Data (LD) are a medium for meeting these challenges.
• 2 major obstacles:
  1. Translating tabular data and domain knowledge sources into a linked data format remains difficult with existing tools.
  2. Building an IT infrastructure that relies heavily on linked data can be perceived as a risky proposition because current LD management tools are immature.
SemantEco Annotator
• Mitigates both obstacles
• To address #1, plays the role of a translator
  • Converts tabular data into RDF
  • Leverages OWL ontologies and vocabularies
  • The resulting enriched RDF data can be used immediately within RDF stores or hosted as LD.
• To address #2, plays the role of a semantic mapper
  • Column headers -> OWL properties
  • Column value typing -> OWL classes or datatypes
  • Mappings are serialized as RDF and can be used for:
    • Translating RDF/XML to XML Schema via XSLT for use in non-linked data environments (e.g., SBC-LTER)
    • Clarifying or extending the schema of the data
    • Enabling optimized semantic search
    • RDF-based annotation (e.g., Open Annotation Model)
• Serves both LD and non-LD IT environments
  • Gives the architects of non-LD environments the ability to “future proof” and migrate to LD at their own pace.
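The translator role described above can be illustrated with a minimal Python sketch. It is not the Annotator's implementation: the function name `csv_to_ntriples`, the `example.org` IRIs, and the `thing_N` subject naming are assumptions made for illustration, and every mapped cell is emitted as a plain string literal in N-Triples.

```python
import csv
import io

# Hypothetical mapping from column header to property IRI (illustrative only).
COLUMN_TO_PROPERTY = {
    "nh4_mg_l": "http://example.org/vocab/nh4_mg_l",
    "no3_mg_l": "http://example.org/vocab/no3_mg_l",
}

def csv_to_ntriples(csv_text, base_iri, column_to_property):
    """Return one N-Triples line per mapped, non-empty cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is row 1, so the first data row is numbered 2.
    for row_number, row in enumerate(reader, start=2):
        subject = f"<{base_iri}thing_{row_number}>"
        for header, value in row.items():
            prop = column_to_property.get(header)
            if prop is None or not value:
                continue  # unmapped column or empty cell: nothing to emit
            # Escape backslashes and double quotes for an N-Triples literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{prop}> "{literal}" .')
    return triples

SAMPLE = "Lake Name,nh4_mg_l,no3_mg_l\nBig Moose,0.03,1.3\n"
for line in csv_to_ntriples(SAMPLE, "http://example.org/data/", COLUMN_TO_PROPERTY):
    print(line)
```

Columns without a mapping (here "Lake Name") are simply skipped, which mirrors the idea that only annotated columns contribute to the enriched RDF.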
SemantEco Annotator
• A web application: the user visits it in a web browser and loads a comma-delimited (CSV) file.
• The ontology selector menu lets the user choose from hard-coded ontologies (e.g., OBO-E, SWEET, ENVO) or enter a URL that resolves to an RDF graph for vocabulary selection.
• Provides advanced manipulation features such as column-based translation and aggregating columns along implicit entity representations.
• Recently used to convert eBird data and the eBird taxonomy into RDF, now available in our SemantEco Discovery and Search Portal alongside water quality data, so that a researcher can identify potential trends between water quality and organism counts.
Mappings Example (in RDF/Turtle)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ov: <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix geonames: <http://www.mindswap.org/2003/owl/geo/geoFeatures20040307.owl#> .

_:c1 rdf:type conversion:EnhancementConversionProcess ;
    conversion:enhance _:c2, _:c3 .

_:c2 ov:csvCol <http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col3> ;
    ov:csvHeader "Lake Name"@en ;
    conversion:label "Lake Name"@en ;
    conversion:range rdfs:Resource ;
    conversion:equivalent_property prov:atLocation ;
    conversion:range_name "Lake"@en .

_:c3 conversion:class_name "Lake"@en ;
    conversion:subclass_of geonames:GeographicFeature .
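As a rough illustration of how one column mapping of this shape might be produced programmatically, the Python sketch below renders the same pattern as Turtle text. The helper names `column_iri` and `mapping_to_turtle` are hypothetical, prefix declarations are left out, and the header and class name are assumed to contain no characters that need escaping.

```python
def column_iri(input_file_iri, column_number):
    """Build a column IRI in the '<input file>#colN' style shown in the slide."""
    return f"{input_file_iri}#col{column_number}"

def mapping_to_turtle(input_file_iri, column_number, header,
                      property_curie, class_name, superclass_curie):
    """Render one column mapping as Turtle text (prefix declarations omitted)."""
    lines = [
        "_:c1 rdf:type conversion:EnhancementConversionProcess ;",
        "    conversion:enhance _:c2, _:c3 .",
        f"_:c2 ov:csvCol <{column_iri(input_file_iri, column_number)}> ;",
        f'    ov:csvHeader "{header}"@en ;',
        f"    conversion:equivalent_property {property_curie} ;",
        f'    conversion:range_name "{class_name}"@en .',
        f'_:c3 conversion:class_name "{class_name}"@en ;',
        f"    conversion:subclass_of {superclass_curie} .",
    ]
    return "\n".join(lines)

print(mapping_to_turtle(
    "http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv",
    3, "Lake Name", "prov:atLocation", "Lake", "geonames:GeographicFeature"))
```

The column header and the class name are linked only through the shared "Lake" string (`range_name` on the column, `class_name` on the class), which is the join the mapping relies on.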
Translated Data (in RDF/Turtle)
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix ov: <http://open.vocab.org/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# The dataset-specific prefixes b: and e1: are not declared on the slide.

b:thing_2 void:inDataset <http://purl.org/twc/semantgeo/source/a/dataset/b/version/2> ;
    prov:atLocation <http://purl.org/twc/semantgeo/source/a/dataset/b/typed/Big_Moose> ;
    e1:accession_code_sample "9446846" ;
    e1:date "30-Jun-94" ;
    e1:z_max_m "22.8" ;
    e1:sample_z_m "6" ;
    e1:nh4_mg_l "0.03" ;
    e1:no3_mg_l "1.3" ;
    ov:csvRow "2"^^xsd:integer .

<http://purl.org/twc/semantgeo/source/a/dataset/b/typed/Big_Moose> a prov:Entity .
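The "Lake Name" column in this output becomes a resource (`.../typed/Big_Moose`) rather than a string literal. The Python sketch below shows one way such an IRI could be minted. It assumes the original cell value was "Big Moose" and that whitespace is replaced by underscores; neither is stated on the slide, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def typed_resource_iri(dataset_iri, cell_value):
    """Mint '<dataset>/typed/<value>' with whitespace runs turned into underscores."""
    local_name = re.sub(r"\s+", "_", cell_value.strip())
    # Percent-encode anything that is not safe inside an IRI path segment.
    return f"{dataset_iri}/typed/{quote(local_name, safe='')}"

def location_triples(row_iri, dataset_iri, cell_value):
    """Link a row to its typed resource and declare that resource a prov:Entity."""
    resource = typed_resource_iri(dataset_iri, cell_value)
    return [
        f"<{row_iri}> <{PROV}atLocation> <{resource}> .",
        f"<{resource}> <{RDF_TYPE}> <{PROV}Entity> .",
    ]

DATASET = "http://purl.org/twc/semantgeo/source/a/dataset/b"
for triple in location_triples("http://example.org/thing_2", DATASET, "Big Moose"):
    print(triple)
```

Because every row with the same cell value yields the same IRI, rows sharing a lake end up pointing at one shared resource.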
SPARQL CONSTRUCT (to refactor a Mapping as an Annotation)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ov: <http://open.vocab.org/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX oa: <http://www.w3.org/ns/oa#>

CONSTRUCT {
  # Open Annotation style (http://www.openannotation.org/spec/core/)
  _:x a oa:Annotation ;
      oa:hasTarget ?colNum ;
      oa:hasBody ?property ;
      oa:hasBody ?typing ;
      oa:motivatedBy oa:tagging .
  ?property a owl:ObjectProperty .
  ?typing a owl:Class .
}
WHERE {
  ?cp rdf:type conversion:EnhancementConversionProcess ;
      conversion:enhance ?en1, ?en2 .
  ?en1 ov:csvCol ?colNum ;
       ov:csvHeader ?colHeader ;
       conversion:range_name ?className ;
       conversion:equivalent_property ?property .
  ?en2 conversion:class_name ?className ;
       conversion:subclass_of ?typing .
}
(?colNum follows the convention http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col1)
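The join performed by this query can be mirrored in plain Python to make the logic easier to follow. This is only a sketch over in-memory dictionaries, not a SPARQL engine: the function `mappings_to_annotations` and the dictionary keys are hypothetical names chosen to echo the mapping properties, and it assumes each class name has a single superclass.

```python
def mappings_to_annotations(enhancements, classes):
    """Join column mappings to class mappings on the shared class name.

    enhancements: dicts with 'csvCol', 'range_name', 'equivalent_property'
    classes:      dicts with 'class_name', 'subclass_of'
    """
    superclass_by_name = {c["class_name"]: c["subclass_of"] for c in classes}
    annotations = []
    for enhancement in enhancements:
        typing = superclass_by_name.get(enhancement["range_name"])
        if typing is None:
            continue  # no matching class: the join produces no annotation
        annotations.append({
            "type": "oa:Annotation",
            "target": enhancement["csvCol"],
            "bodies": [enhancement["equivalent_property"], typing],
            "motivatedBy": "oa:tagging",
        })
    return annotations

ENHANCEMENTS = [{
    "csvCol": "http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col3",
    "range_name": "Lake",
    "equivalent_property": "prov:atLocation",
}]
CLASSES = [{"class_name": "Lake", "subclass_of": "geonames:GeographicFeature"}]

for annotation in mappings_to_annotations(ENHANCEMENTS, CLASSES):
    print(annotation)
```

Each annotation targets the column IRI and carries two bodies, the property and the class, which is what lets the same mapping be reused outside a linked data environment.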
“Okay, so I get annotations out, and I can do whatever I want with that, but what could I possibly want?”
• We've already done it for translating tabular data to RDF linked data.
• The current RDF output of the Annotator can be mapped to other forms (e.g., the Open Annotation Model).
• Annotate “legacy” formats (e.g., XML) to facilitate semantic mappings among them.
• Can be extended to annotating images, etc., as well.
Future Work
• Automatic mappings directed to a particular graph closed under a predicate/object pair; use of OWL domain and range restriction axioms to guide the user in vocabulary selection decisions
• Use of OWL class definitions to enable a top-down approach for users modeling their data
• Ontology extraction to complement, and enable reasoning alongside, the generated RDF
• Architecting a platform for better management of linked data, within which the Annotator plays a vital role
SemantEco Annotator Quick Look (YouTube Video) http://bit.ly/17VEfSp 4:40 minute duration