SEEK Semantic Mediation. Shawn Bowers, Bertram Ludäscher. e-Science Centre, May 11-14, 2004.
Outline • The Sparrow Toolkit • Semantic Registration • Ontology-Driven Structural Transformation
Semantic Mediation in SEEK: Our focus
• Resource Discovery: ontology-driven tools to help search for datasets and services using semantic descriptions …
• Data Transformation: determine and execute mappings to compose services and bind data to services
• Data Integration: provide reconciled, uniform access to multiple datasets
• "Semantic" Workflow Analysis: verify semantic correctness, accumulate semantic information, and provide workflow planning/suggestion services … the future
The Sparrow Toolkit: Vision
Lightweight languages and command-line-style services to support mediation:
• Syntax and language conversion: DL, FOL, OWL, RDF, …
• Reasoning: subsumption, classification, consistency, satisfiability, datatypes, instance classification, …
• Display utilities: hierarchies, OO/ER-style models, OWL DLs?
• Query: query answering, semantic query rewriting, semantic registration, integration, …
Logic-based implementation (Prolog)
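The Reasoning bullet names subsumption and classification. The toolkit itself is described as a Prolog implementation, so the following is only a Python sketch of the simplest form of those two services. It works over a hypothetical hierarchy of named concepts that has only asserted (told) subclass links. Full OWL/DL subsumption over complex class expressions is far more involved and is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical told (asserted) subclass links: (subconcept, superconcept).
# Illustrative only; NOT taken from the SEEK ontology.
TOLD = [
    ("AbundanceCount", "EcologicalProperty"),
    ("LifeStageProperty", "EcologicalProperty"),
    ("EcologicalProperty", "MeasProperty"),
]


def build_parents(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index the told links as concept -> set of direct superconcepts."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for sub, sup in edges:
        parents[sub].add(sup)
    return parents


def ancestors(parents: Dict[str, Set[str]], concept: str) -> Set[str]:
    """Classification: all named superconcepts of `concept` (transitive)."""
    found: Set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        c = stack.pop()
        if c not in found:
            found.add(c)
            stack.extend(parents.get(c, ()))
    return found


def subsumes(parents: Dict[str, Set[str]], general: str, specific: str) -> bool:
    """True if specific ⊑ general follows from the told links (reflexive)."""
    return general == specific or general in ancestors(parents, specific)


PARENTS = build_parents(TOLD)
print(subsumes(PARENTS, "MeasProperty", "AbundanceCount"))  # True
print(subsumes(PARENTS, "AbundanceCount", "MeasProperty"))  # False
print(sorted(ancestors(PARENTS, "AbundanceCount")))  # ['EcologicalProperty', 'MeasProperty']
```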
Outline • The Sparrow Toolkit • Semantic Registration • Ontology-Driven Structural Transformation
Adding semantics to EML: Observations
The finer-grained the annotation, the more opportunity for discovery, integration, and transformation …
The coarser-grained the annotation, the harder it is to do useful operations, unless your ontology is very deep.
[Figure: ontology depth (shallow to deep) plotted against annotation granularity (coarse to fine), with a region labeled "maximal ontology/annotation leverage".]
Semantic Registration (SSDBM’04)
By annotation granularity, we mean:
• Resource-level "metadata"
• Attribute level (the attribute itself)
• Attribute level (as a collection value)
• Attribute level (as independent values)
• Attribute groups (as a collection value or independent values)
• Filtered values (e.g., an SQL WHERE clause)
• Specific value annotations (as a mapping function or stated by hand)
Often, integration and transformation require very detailed annotations.
Some Examples (arguments against concepts-as-labels)
r(…, lt, ln, …)
sem(lt) == latitude
sem(ln) == longitude
Question: What do these annotations mean?
• Does the name "lt" itself refer to latitude?
• Do the values in the column, taken as a whole, make up a latitude (like coverage)?
• Does each individual value in the column denote a separate latitude? (Is it a latitude, though, or just a coded representation?)
In many cases, we want to avoid such ambiguous annotations.
Some Examples (still not enough)
r(…, lt, ln, …)
sem(lt) == values represent latitude
sem(ln) == values represent longitude
More problems: how do I know that lt and ln go together to form, for example, a location? …
[Diagram: Location --lat--> Latitude, Location --lon--> Longitude]
Some Examples (still not enough)
r(…, lt, ln, lt-end, ln-end, …)
sem(lt) == values represent latitude
sem(ln) == values represent longitude
sem(lt-end) == values represent latitude
sem(ln-end) == values represent longitude
Which lat goes with which lon?
[Diagram: Location --lat--> Latitude, Location --lon--> Longitude]
Some Examples (still not enough)
r(…, lt, ln, lt-end, ln-end, …)
sem(lt, ln) == values represent location, and lat leads to semval(lt) and lon leads to semval(ln)
* sem(lt, ln) == values represent location
sem(lt) == values represent latitude
sem(ln) == values represent longitude
sem(lt-end, ln-end) == values represent location and …
sem(lt-end) == values represent latitude
sem(ln-end) == values represent longitude
What if we want to integrate with another dataset with two lat/lons? What do we do?
[Diagram: Location --lat--> Latitude, Location --lon--> Longitude]
* We could infer the lat and lon roles here; in general, I don't think we can infer roles as such…
Some Examples (still not enough)
r(…, lt, ln, lt-end, ln-end, …)
sem(lt, ln, lt-end, ln-end) == values represent transect, and start leads to semval(lt, ln) and end leads to semval(lt-end, ln-end)
sem(lt, ln) == values represent location and …
sem(lt) == values represent latitude
sem(ln) == values represent longitude
sem(lt-end, ln-end) == values represent location and …
sem(lt-end) == values represent latitude
sem(ln-end) == values represent longitude
So, even in very simple cases, annotations can become complex…
[Diagram: Transect --start--> Location, Transect --end--> Location; Location --lat--> Latitude, Location --lon--> Longitude]
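The nested transect annotation can be sketched as a small data structure in which group annotations carry explicit roles (start/end, lat/lon). The pairing of columns is then stated rather than guessed. This is a minimal Python sketch. The class names ValueAnn and GroupAnn, the helper functions, and the row values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class ValueAnn:
    """Each individual value in `column` denotes one instance of `concept`."""
    concept: str
    column: str


@dataclass
class GroupAnn:
    """Several columns jointly denote one `concept`; each part fills a named role."""
    concept: str
    roles: Dict[str, Union[ValueAnn, "GroupAnn"]] = field(default_factory=dict)


def instantiate(ann, row):
    """Turn one row (column name -> value) into a nested instance following `ann`."""
    if isinstance(ann, ValueAnn):
        return {"type": ann.concept, "value": row[ann.column]}
    nested = {role: instantiate(sub, row) for role, sub in ann.roles.items()}
    return {"type": ann.concept, **nested}


def location(lat_col, lon_col):
    """sem(lat_col, lon_col) == Location, with explicit lat/lon roles."""
    return GroupAnn("Location", {"lat": ValueAnn("Latitude", lat_col),
                                 "lon": ValueAnn("Longitude", lon_col)})


# sem(lt, ln, lt-end, ln-end) == Transect; start -> (lt, ln), end -> (lt-end, ln-end)
TRANSECT = GroupAnn("Transect", {"start": location("lt", "ln"),
                                 "end": location("lt-end", "ln-end")})

row = {"lt": 33.18, "ln": -114.87, "lt-end": 33.20, "ln-end": -114.85}  # made-up values
transect = instantiate(TRANSECT, row)
print(transect["end"]["lat"])  # {'type': 'Latitude', 'value': 33.2}
```

The roles are explicit. A second dataset with two lat/lon pairs can therefore be registered with the same building blocks, without inferring which lat belongs with which lon.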
Executable, Fine-Grain Semantic Registration
genus           species       count  lat      lon
'Acanthomyops'  'latipes'     1      41.6     -119.383
'Acromyrmex'    'versicolor'  1      33.1839  -114.866
'Anergates'     'atratulus'   1      37.9833  -84.5167
'Anergates'     'atratulus'   4      38.8833  -77.1167
Each row represents a RatioMeasurement.
[Diagram: a RatioMeasurement instance]
Executable, Fine-Grain Semantic Registration (cont.)
(Same sample rows: genus, species, count, lat, lon.)
For a row, count is the value of the measurement.
[Diagram: RatioMeasurement --value--> LocalInteger --dataValue--> 1]
Executable, Fine-Grain Semantic Registration (cont.)
(Same sample rows: genus, species, count, lat, lon.)
For a row, lat/lon are the location values of the measurement.
[Diagram: RatioMeasurement --value--> LocalInteger --dataValue--> 1; RatioMeasurement --context--> LocationContext --location--> GeogCoordPoint (latitude 41.6, longitude -119.383)]
Executable, Fine-Grain Semantic Registration (cont.)
(Same sample rows: genus, species, count, lat, lon.)
For a row, genus/species are mapped to standard values associated with the RatioMeasurement …
[Diagram: RatioMeasurement --itemMeasured--> Count; related nodes TaxonomicGroup (propertyEntity), SimpleTaxonomicId (taxonomicID), Genus (rankName genus, taxon:1883/5), and Species (rankName species, taxon:1883/3), with subCat/superCat links between the taxa]
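Together, the registration slides describe a mapping that turns each row into a partial object instantiation of ontology classes. The following Python sketch builds such instances as nested dictionaries for the four sample rows. Several parts are assumptions:

* The nesting is one reading of the slide diagrams.
* The function register_row and the TAXON_IDS table are hypothetical. TAXON_IDS stands in for a taxon service.
* Only the two taxon IDs shown in the diagram are filled in.

```python
from typing import Dict, List, Optional, Tuple

# The four sample rows from the slide: (genus, species, count, lat, lon).
ROWS = [
    ("Acanthomyops", "latipes", 1, 41.6, -119.383),
    ("Acromyrmex", "versicolor", 1, 33.1839, -114.866),
    ("Anergates", "atratulus", 1, 37.9833, -84.5167),
    ("Anergates", "atratulus", 4, 38.8833, -77.1167),
]

# Hypothetical stand-in for a taxon service. Only the two IDs shown in the
# slide's diagram are filled in; unknown taxa map to (None, None).
TAXON_IDS: Dict[Tuple[str, str], Tuple[Optional[str], Optional[str]]] = {
    ("Acanthomyops", "latipes"): ("taxon:1883/5", "taxon:1883/3"),
}


def register_row(genus: str, species: str, count: int, lat: float, lon: float) -> dict:
    """Build a partial RatioMeasurement instance for one row.

    Class and property names follow the slide diagrams; the nesting is one
    reading of those diagrams, not a definitive SEEK ontology structure.
    """
    genus_id, species_id = TAXON_IDS.get((genus, species), (None, None))
    taxon = {"type": "SimpleTaxonomicId",
             "genus": {"type": "Genus", "name": genus, "id": genus_id},
             "species": {"type": "Species", "name": species, "id": species_id}}
    return {
        "type": "RatioMeasurement",
        "value": {"type": "LocalInteger", "dataValue": count},
        "context": {"type": "LocationContext",
                    "location": {"type": "GeogCoordPoint",
                                 "latitude": lat, "longitude": lon}},
        "itemMeasured": {"type": "Count",
                         "propertyEntity": {"type": "TaxonomicGroup",
                                            "taxonomicID": taxon}},
    }


INSTANCES: List[dict] = [register_row(*row) for row in ROWS]
print(INSTANCES[0]["value"])  # {'type': 'LocalInteger', 'dataValue': 1}
```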
Querying based on Semantic Registrations
[Diagram: the registered instance from the preceding slides: RatioMeasurement with value LocalInteger (dataValue 1), context LocationContext with GeogCoordPoint location (latitude 41.6, longitude -119.383), and itemMeasured Count linked through TaxonomicGroup and SimpleTaxonomicId to Genus (taxon:1883/5) and Species (taxon:1883/3)]
Find all datasets that measure species of 'Acanthomyops' in South Africa … and return a set of all lat/lon "points" (demo …)
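Once rows are registered as object structures, a query can be phrased against ontology paths instead of column names. This Python sketch selects a genus and a latitude/longitude bounding box and returns the set of matching points. The names get_path, find_points, and make are hypothetical. The box in the usage example is an arbitrary placeholder, not the extent of South Africa. The instances are trimmed to the paths the query needs.

```python
from typing import Any, Iterable, Optional, Sequence, Set, Tuple

# Ontology paths (hypothetical, matching one reading of the registration diagrams).
GENUS_PATH = ("itemMeasured", "propertyEntity", "taxonomicID", "genus", "name")
LAT_PATH = ("context", "location", "latitude")
LON_PATH = ("context", "location", "longitude")


def get_path(obj: Any, path: Sequence[str]) -> Optional[Any]:
    """Follow `path` through nested dicts; None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def find_points(instances: Iterable[dict], genus: str,
                lat_range: Tuple[float, float],
                lon_range: Tuple[float, float]) -> Set[Tuple[float, float]]:
    """Return the (lat, lon) points of measurements of `genus` inside the box."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = lat_range, lon_range
    points: Set[Tuple[float, float]] = set()
    for inst in instances:
        lat, lon = get_path(inst, LAT_PATH), get_path(inst, LON_PATH)
        if get_path(inst, GENUS_PATH) != genus or lat is None or lon is None:
            continue
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
            points.add((lat, lon))
    return points


def make(genus: str, lat: float, lon: float) -> dict:
    """Minimal registered instance carrying only the paths used above."""
    return {"context": {"location": {"latitude": lat, "longitude": lon}},
            "itemMeasured": {"propertyEntity": {"taxonomicID": {"genus": {"name": genus}}}}}


SAMPLE = [make("Acanthomyops", 41.6, -119.383), make("Acromyrmex", 33.1839, -114.866)]
print(find_points(SAMPLE, "Acanthomyops", (30.0, 45.0), (-125.0, -110.0)))  # {(41.6, -119.383)}
```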
Semantic Annotations Architecture
[Diagram components: Taxon Services (synonyms, concept IDs, …); Ontology repository; Dataset repository (heterogeneous); Mappings (Lat/Lon, Species); Queries; SMS Operations (discover_resources, query_resources, integrate_resources); Results]
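Of the three SMS operations named in the architecture, discover_resources is the easiest to illustrate. The sketch assumes a registry that records which ontology concepts each resource is registered to. It returns the resources registered to a concept or to any of its subconcepts. Several parts are hypothetical:

* The resource IDs and registry contents.
* The concept hierarchy, including TaxonomicRank and SpatialLocation as parent concepts.
* The Python signature.

query_resources and integrate_resources are not sketched.

```python
from typing import Dict, List, Set

# Hypothetical registry: resource id -> concepts its columns are registered to.
REGISTRY: Dict[str, Set[str]] = {
    "antweb": {"RatioMeasurement", "GeogCoordPoint", "Genus", "Species"},
    "parasite_host": {"Genus", "Species"},
}

# Hypothetical ontology fragment: concept -> direct superconcepts.
SUPER: Dict[str, Set[str]] = {
    "Genus": {"TaxonomicRank"},
    "Species": {"TaxonomicRank"},
    "GeogCoordPoint": {"SpatialLocation"},
}


def ancestors(concept: str) -> Set[str]:
    """`concept` plus all its superconcepts (reflexive, transitive closure)."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in SUPER.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def discover_resources(concept: str) -> List[str]:
    """IDs of resources registered to `concept` or to one of its subconcepts."""
    return sorted(rid for rid, concepts in REGISTRY.items()
                  if any(concept in ancestors(c) for c in concepts))


print(discover_resources("SpatialLocation"))  # ['antweb']
print(discover_resources("TaxonomicRank"))    # ['antweb', 'parasite_host']
```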
Finding user interfaces that are easy to use but provide detailed annotations
[Mock-up panels: <<ontology view>>, <<sample instance view>>, <<annotation, schema, and data>>, <<registration information/properties>>]
resource id: antweb:040412
Annotation:  Value   Value    Value   TaxaConceptID
             lat     lon      count   genus      species
             41.6    -119.4   5       'Manica'   'bradleyi'
             34.9    -120.7   2       'Formica'  'fusca'
A Sparrow Executable Semantic Annotation Registration
• A partial object instantiation (of ontology classes)
• The resource can be queried directly using the object structure (i.e., using the ontology)
Outline • The Sparrow Toolkit • Semantic Registration • Ontology-Driven Structural Transformation
Example Structural Types (XML)
structType(P2):
  root population = (sample)*
  elem sample = (meas, lsp)
  elem meas = (cnt, acc)
  elem cnt = xsd:integer
  elem acc = xsd:double
  elem lsp = xsd:string
Sample P2 instance:
  <population>
    <sample>
      <meas>
        <cnt>44000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
    …
  </population>
structType(P3):
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase = xsd:string
  elem obs = xsd:integer
Sample P3 instance:
  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44000</obs>
    </measurement>
    …
  </cohortTable>
[Diagram: workflow with ports P1 to P5 and services S1 (life stage property) and S2 (mortality rate for period)]
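Connecting P2 to P3 means rewriting one structure into the other. Below is a hand-written Python sketch of that transformation using the standard library's xml.etree.ElementTree. The function name p2_to_p3 is hypothetical. It follows the correspondence visible in the sample instances:

* sample becomes measurement
* lsp becomes phase
* meas/cnt becomes obs

It drops acc, because structType(P3) has no place for it.

```python
import xml.etree.ElementTree as ET

P2_SAMPLE = """<population>
  <sample>
    <meas><cnt>44000</cnt><acc>0.95</acc></meas>
    <lsp>Eggs</lsp>
  </sample>
</population>"""


def p2_to_p3(p2_xml: str) -> str:
    """Rewrite a structType(P2) document as a structType(P3) document.

    sample -> measurement, lsp -> phase, meas/cnt -> obs; acc is dropped.
    """
    source = ET.fromstring(p2_xml)
    target = ET.Element("cohortTable")
    for sample in source.findall("sample"):
        measurement = ET.SubElement(target, "measurement")
        ET.SubElement(measurement, "phase").text = sample.findtext("lsp")
        ET.SubElement(measurement, "obs").text = sample.findtext("meas/cnt")
    return ET.tostring(target, encoding="unicode")


print(p2_to_p3(P2_SAMPLE))
# <cohortTable><measurement><phase>Eggs</phase><obs>44000</obs></measurement></cohortTable>
```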
Example Semantic Types
Portion of the SEEK measurement ontology
[Diagram: concepts Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, SpatialLocation, AbundanceCount, LifeStageProperty, and NumericValue, connected by the relations hasContext, appliesTo, hasProperty, itemMeasured, hasLocation, hasValue, and hasCount, with cardinalities 0:*, 1:*, and 1:1]
Example Semantic Types
Semantic types for P2 and P3
[Diagram: semType(P2) and semType(P3) built from Observation, MeasContext, LifeStageProperty, AbundanceCount, NumberValue, and AccuracyQualifier via hasContext, appliesTo, itemMeasured, hasCount, hasValue, and hasProperty (cardinalities 1:1); the two types are related by subsumption (⊑)]
[Diagram: workflow with ports P1 to P5 and services S1 (life stage property) and S2 (mortality rate for period)]
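Compatibility (⊑) between semantic types like these can be tested in a simplified way. Represent each type as a root concept plus a set of (role path, filler) constraints. Then A ⊑ B holds when both of these hold:

* A's root is subsumed by B's root.
* Every constraint of B is matched by one of A's constraints on the same path, with a subsumed filler.

This is a structural-subsumption sketch in Python, not full OWL reasoning. The flattened encodings of semType(P2) and semType(P3), the role paths, and the told hierarchy are hypothetical readings of the diagram.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Constraint = Tuple[str, str]  # (role path, filler concept)

# Hypothetical told subclass links (concept -> direct superconcepts).
TOLD: Dict[str, Set[str]] = {"AbundanceCount": {"EcologicalProperty"}}


def is_sub(c: str, d: str) -> bool:
    """Told subsumption c ⊑ d between named concepts (reflexive, transitive)."""
    stack, seen = [c], set()
    while stack:
        x = stack.pop()
        if x == d:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(TOLD.get(x, ()))
    return False


def type_subsumed(a_root: str, a_cons: Iterable[Constraint],
                  b_root: str, b_cons: Iterable[Constraint]) -> bool:
    """A ⊑ B if A's root ⊑ B's root and each constraint of B is matched by a
    constraint of A on the same role path whose filler is subsumed by B's."""
    a_cons = list(a_cons)
    return is_sub(a_root, b_root) and all(
        any(ra == rb and is_sub(fa, fb) for ra, fa in a_cons) for rb, fb in b_cons)


# Hypothetical flattened readings of the two semantic types in the diagram.
SEM_P3: Tuple[str, FrozenSet[Constraint]] = ("Observation", frozenset({
    ("hasContext.appliesTo", "LifeStageProperty"),
    ("itemMeasured", "AbundanceCount"),
    ("itemMeasured.hasCount", "NumberValue"),
}))
SEM_P2 = ("Observation", SEM_P3[1] | {("hasProperty", "AccuracyQualifier")})

print(type_subsumed(*SEM_P2, *SEM_P3))  # True
print(type_subsumed(*SEM_P3, *SEM_P2))  # False
```

Under this encoding, the extra AccuracyQualifier constraint makes P2's type the more specific of the two. Data leaving P2 can then be accepted where P3's type is expected, but not the other way around.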
The Ontology-Driven Framework
[Diagram: a Source Service with port Ps and a Target Service with port Pt, joined by a Desired Connection. Each port has a StructuralType, linked by a Registration Mapping (Input / Output) to a SemanticType in the Ontologies (OWL); SemanticType Ps and SemanticType Pt are Compatible (⊑).]
The Ontology-Driven Framework
[Diagram: as before, Source Service (Ps) and Target Service (Pt) with their StructuralTypes, Registration Mappings (Input / Output), and Compatible (⊑) SemanticTypes in the Ontologies (OWL); a Correspondence now links StructuralType Ps and StructuralType Pt, toward the Desired Connection.]
The Ontology-Driven Framework
[Diagram: as before, Source Service (Ps) and Target Service (Pt) with their StructuralTypes, Registration Mappings (Input / Output), Compatible (⊑) SemanticTypes in the Ontologies (OWL), and a Correspondence between the StructuralTypes; from the Correspondence, a Transformation is generated that realizes the Desired Connection from Ps to Pt.]
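The final build of the framework generates a transformation from a correspondence. Assume the correspondence has already been derived, here written by hand, as a map from each target element to a path inside a repeated source element. A Python sketch of the generation step could then look like this. generate_transformation and its parameters are hypothetical. They handle only this flat, one-record-per-item case.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def generate_transformation(src_item: str, tgt_root: str, tgt_item: str,
                            correspondence: Dict[str, str]) -> Callable[[str], str]:
    """Build a source-to-target XML transformer from a correspondence that maps
    each target child element to a path inside one repeated source item."""
    def transform(src_xml: str) -> str:
        source = ET.fromstring(src_xml)
        target = ET.Element(tgt_root)
        for item in source.findall(src_item):
            out = ET.SubElement(target, tgt_item)
            for tgt_child, src_path in correspondence.items():
                ET.SubElement(out, tgt_child).text = item.findtext(src_path)
        return ET.tostring(target, encoding="unicode")
    return transform


# Hand-written correspondence for P2 -> P3 (acc has no counterpart, so it is dropped).
p2_to_p3 = generate_transformation("sample", "cohortTable", "measurement",
                                   {"phase": "lsp", "obs": "meas/cnt"})

P2_DOC = ("<population><sample><meas><cnt>44000</cnt><acc>0.95</acc></meas>"
          "<lsp>Eggs</lsp></sample></population>")
print(p2_to_p3(P2_DOC))
# <cohortTable><measurement><phase>Eggs</phase><obs>44000</obs></measurement></cohortTable>
```

The framework derives the correspondence from the registrations and ontologies rather than stating it by hand. Keeping the generator separate from the correspondence means the same code could serve other compatible port pairs.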
Datasets used in the Prototype
• Antweb: genus, species, count, lat, lon (e.g., 'Acromyrmex' 'versicolor' 1 33.1839 -114.866 …)
• South Africa Museum: genus, species, cnt, lt, ln (e.g., 'Camponotus' 'festinatus' 3 30.55 -103.833 …)
• "faked": mbcnt, cfcnt, lat, lon (e.g., 1 2 -25.35 -77.1167 …)
• Dulosis Parasite/Host: genus1, species1, genus2, species2 (e.g., Manica parasitica Manica bradleyi …)
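Even the first two prototype datasets name the same attributes differently (count vs. cnt, lat vs. lt, lon vs. ln). Below is a Python sketch of integration driven by per-dataset column mappings, such as registrations would provide. The mapping tables, the integrate function, and the shared attribute names are hypothetical. The data is limited to the first row shown for each dataset.

```python
from typing import Dict, List

# Hypothetical per-dataset mapping: source column -> shared attribute.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "Antweb": {"genus": "genus", "species": "species", "count": "count",
               "lat": "lat", "lon": "lon"},
    "South Africa Museum": {"genus": "genus", "species": "species", "cnt": "count",
                            "lt": "lat", "ln": "lon"},
}

# First row shown for each dataset on the slide.
DATA: Dict[str, List[dict]] = {
    "Antweb": [{"genus": "Acromyrmex", "species": "versicolor", "count": 1,
                "lat": 33.1839, "lon": -114.866}],
    "South Africa Museum": [{"genus": "Camponotus", "species": "festinatus", "cnt": 3,
                             "lt": 30.55, "ln": -103.833}],
}


def integrate(data: Dict[str, List[dict]], mappings: Dict[str, Dict[str, str]]) -> List[dict]:
    """Rename each dataset's columns to the shared attributes and tag rows with their source.

    Columns without a mapping are dropped.
    """
    merged = []
    for source, rows in data.items():
        mapping = mappings[source]
        for row in rows:
            renamed = {mapping[col]: val for col, val in row.items() if col in mapping}
            merged.append({"source": source, **renamed})
    return merged


for record in integrate(DATA, MAPPINGS):
    print(record["source"], record["genus"], record["count"], record["lat"], record["lon"])
```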