Ontologies in Data and Application Integration – an Update. Kai Lin, Bertram Ludäscher. Knowledge-Based Information Systems Lab, Data and Knowledge Systems (DAKS), San Diego Supercomputer Center, University of California San Diego. http://www.geongrid.org
Outline • Motivation • Ontology Cheat Sheet • Ontology-enabled Prototypes and Tools • Data & Service Registration (Structural + Semantic) • Scientific Workflows
Ontology Cheat Sheet (1/2) • What is an ontology? An ontology usually … • specifies a theory (a set of models) by … • defining and relating … • concepts representing features of a domain of interest • Also an overloaded (sometimes sloppy) term for: • Controlled vocabularies • Database schemas (relational, XML, …) • Conceptual schemas (ER, UML, …) • Thesauri (synonyms, broader term/narrower term) • Taxonomies • Informal/semi-formal representations • “Concept spaces”, “concept maps” • Labeled graphs / semantic networks (RDF) • Formal ontologies, e.g., in [Description] Logic (OWL) • A “formalization of a specification” constrains the possible interpretations of terms
A Multi-Hierarchical Rock Classification “Ontology” (GSC) [Figure: separate classification hierarchies for Genesis, Fabric, Composition, and Texture]
Ontology Cheat Sheet (2/2) • What are ontologies used for? • Conceptual models of a domain or application (communication means, system design, …) • Classification of … • concepts (taxonomy) and • data/object instances through classes • Analysis of ontologies, e.g., • Graph queries (reachability, path queries, …) • Reasoning (concept subsumption, consistency checking, …) • Targets for semantic data registration • Conceptual indexes and views for • searching, • browsing, • querying, and • integration of registered data
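The graph queries and subsumption reasoning listed above can be illustrated on a toy taxonomy. The Python sketch below assumes that a multi-hierarchy is stored as a map from each concept to its parent concepts. The concept names, the map-unit instances, and the functions `ancestors`, `subsumes`, and `instances_of` are hypothetical. They are not taken from the GSC classification or from any GEON tool. A reachability query over parent edges answers two questions: "is A below B?" and "which classified instances fall under B?"

```python
from __future__ import annotations

from collections import deque

# Hypothetical multi-hierarchy (not the actual GSC classification): a concept may
# have several parents, e.g. one in a genesis hierarchy and one in a composition one.
PARENTS = {
    "Rock": set(),
    "Igneous": {"Rock"},
    "Plutonic": {"Igneous"},
    "CompositionClass": set(),
    "Felsic": {"CompositionClass"},
    "Granite": {"Plutonic", "Felsic"},
}

# Hypothetical data instances, each classified under its most specific concept.
INSTANCES = {"unit_1": "Granite", "unit_2": "Plutonic", "unit_3": "Rock"}


def ancestors(concept: str) -> set[str]:
    """Reachability query: all concepts reachable by following parent edges."""
    seen: set[str] = set()
    queue = deque(PARENTS.get(concept, ()))
    while queue:
        c = queue.popleft()
        if c not in seen:
            seen.add(c)
            queue.extend(PARENTS.get(c, ()))
    return seen


def subsumes(general: str, specific: str) -> bool:
    """True if specific ⊑ general (reflexive and transitive)."""
    return general == specific or general in ancestors(specific)


def instances_of(concept: str) -> list[str]:
    """Instances classified under `concept` or any of its subconcepts."""
    return sorted(i for i, c in INSTANCES.items() if subsumes(concept, c))


print(instances_of("Igneous"))  # ['unit_1', 'unit_2']
print(instances_of("Felsic"))   # ['unit_1']
```

Because Granite has two parents, it is found both through the genesis branch (Igneous) and through the composition branch (Felsic). This is the point of a multi-hierarchical classification.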
Application Example: Geologic Map Integration (Nevada) [Figure: the GEON “Metamorphism Equation”: Geoscientists + Computer Scientists, +/- energy, +/- a few hundred million years, yield Geoinformaticists; figure labels: Igneous, Metamorphism, domain knowledge, knowledge representation, Ontologies!?]
Geologic Map Integration in the Portal • After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way
Concept-Based Queries and Analysis • After registering a source with one or more ontologies, concept-based queries and analysis can be launched • Here: light-weight client-side processing (SVG)
Ontologies and Data Management • Where do ontologies fit within data management architectures? • There are several answers; specifically: • An ontology is similar to a schema or conceptual model (if one exists), but is • Developed independently of a particular application • Probably given in a different language • Inherently more general • Usually not a very good schema (weak structure)
Ontologies and Data Management (watch out for Semantic Data Registration later) [Figure: layers of design artifacts: Ontology, Conceptual Models, Schemas, Metadata, Data; each layer uses concepts from the layer above it, explicitly or implicitly]
Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool) • Lock up scientists for 2+ days • Add CS/KRDB types • Create concept maps • Refine • Iterate from napkin drawings, to concept maps, to ontologies
Graph (RDF) Queries on Ontologies [Figure: ontology visualization, an RQL query (“Show all products”), and its query results]
Community-Based Ontology Development • Draft of a geochemistry ontology developed by scientists • Current concept maps and emerging ontologies: • Igneous Rocks/Plutons • Seismology • Geochemistry
Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation
What is Data/Ontology/… Registration? • A mechanism by which data sources, ontologies, services, … • … are published in a repository/registry • for the purpose of “smart” discovery, querying, and integration
Things to Register • Data files (individual files) • Shapefile as a blob (+ file type) • Collections (of files; possibly nested; e.g., satellite data) • Databases (have a schema and can be queried) • Shapefile with its schema registered • Ontologies • Services (web + grid services) • Other/external applications
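A registry in this sense can be sketched as a table of entries. Each entry has an optional structural description (a schema) and an optional semantic registration (a set of concepts). The Python sketch below is a hypothetical illustration, not GEON's registry API. `Entry`, `Registry`, the `kind` labels, the concept links, and all example names are assumptions. Discovery is "smart" only in a limited sense here: a query for a concept also finds items registered to narrower concepts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One registered item (hypothetical fields, not an actual registry schema)."""
    name: str
    kind: str  # e.g. "file", "collection", "database", "ontology", "service"
    schema: dict[str, str] | None = None  # structural registration, if any
    concepts: set[str] = field(default_factory=set)  # semantic registration


class Registry:
    def __init__(self, parents: dict[str, set[str]] | None = None) -> None:
        self._entries: dict[str, Entry] = {}
        self._parents = parents or {}  # concept -> broader concepts

    def register(self, entry: Entry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"already registered: {entry.name}")
        self._entries[entry.name] = entry

    def _broader(self, concept: str) -> set[str]:
        """The concept itself plus every concept reachable via broader-than links."""
        seen: set[str] = set()
        stack = [concept]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self._parents.get(c, ()))
        return seen

    def find(self, kind: str | None = None, concept: str | None = None) -> list[str]:
        """Names of entries of `kind` registered to `concept` or a narrower one."""
        hits = []
        for e in self._entries.values():
            if kind is not None and e.kind != kind:
                continue
            if concept is not None and not any(concept in self._broader(c) for c in e.concepts):
                continue
            hits.append(e.name)
        return sorted(hits)


reg = Registry(parents={"Granite": {"Igneous"}, "Igneous": {"Rock"}})
reg.register(Entry("nevada_map.shp", "file"))  # shapefile registered as a blob
reg.register(Entry("nevada_units", "database",
                   schema={"unit_id": "int", "rock_name": "str"}, concepts={"Granite"}))
reg.register(Entry("rock_classes", "ontology"))
reg.register(Entry("map_integration", "service", concepts={"Rock"}))
print(reg.find(concept="Igneous"))  # ['nevada_units']
```

Keeping the schema and the concepts in separate fields mirrors the distinction in the outline between structural registration and semantic registration. An item such as the blob shapefile can be registered without either.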
Connecting Datasets to Ontologies • How can we “register” the dataset to concepts in the ontology?

Ontology (snippet):
  DataCollectionEvent       ⊑ contains.Measurement
  Measurement               ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
  MeasurementContext        ⊑ hasTime.DateTime ⊓ hasLocation.Location
  MeasurableItem            ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
  SpeciesCount              ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit …
  SpeciesAbundance          ⊑ Measurement ⊓ measureOf.SpeciesCount
  AbundanceCollectionEvent  ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
  Location                  ⊑ position.Coordinate
  LTERSite                  ⊑ Location
  SBLTERSite                ⊑ LTERSite ⊓ position.SBLTERCoordinate
  {naples, …}               ⊑ SBLTERSite

Dataset:
  Date        Site  Transect  SP_Code  Count
  2000-09-08  CARP  1         CRGI     0
  2000-09-08  CARP  4         LOCH     0
  2000-09-08  CARP  7         MUCA     1
  2000-09-22  NAPL  7         LOCH     1
  2000-09-18  NAPL  1         PAPA     5
  2000-09-28  BULL  1         CYOS     57
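One way to read the registration question is as a mapping from dataset columns to ontology concepts. Such a mapping can then be queried through subsumption. The Python sketch below assumes this column-to-concept mapping:

- Date → DateTime
- Site → SBLTERSite
- SP_Code → Species
- Count → SpeciesCount

It also assumes a value mapping from the site code NAPL to the individual naples. Both mappings are illustrative guesses, not registrations from the original system. Only the named-concept subsumptions of the snippet are encoded; role restrictions are ignored.

```python
from __future__ import annotations

import csv
import io

# The six rows of the dataset above.
DATA = """Date,Site,Transect,SP_Code,Count
2000-09-08,CARP,1,CRGI,0
2000-09-08,CARP,4,LOCH,0
2000-09-08,CARP,7,MUCA,1
2000-09-22,NAPL,7,LOCH,1
2000-09-18,NAPL,1,PAPA,5
2000-09-28,BULL,1,CYOS,57
"""

# Named-concept subsumptions from the ontology snippet (role restrictions omitted).
SUBCLASS = {
    "AbundanceCollectionEvent": "DataCollectionEvent",
    "SpeciesAbundance": "Measurement",
    "SpeciesCount": "MeasurableItem",
    "LTERSite": "Location",
    "SBLTERSite": "LTERSite",
}

# Hypothetical registration mapping: column -> concept (Transect left unregistered).
REGISTRATION = {"Date": "DateTime", "Site": "SBLTERSite",
                "SP_Code": "Species", "Count": "SpeciesCount"}

# Hypothetical value mapping from site codes to ontology individuals.
SITE_INDIVIDUALS = {"NAPL": "naples"}


def is_a(concept: str, target: str) -> bool:
    """Walk the single-parent chain upward: does concept ⊑ target hold?"""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBCLASS.get(concept)
    return False


def columns_for(target: str) -> list[str]:
    """Columns whose registered concept is subsumed by `target`."""
    return [col for col, c in REGISTRATION.items() if is_a(c, target)]


def total_count_at(individual: str) -> int:
    """Sum the MeasurableItem column over rows whose Location maps to `individual`."""
    count_col = columns_for("MeasurableItem")[0]
    site_col = columns_for("Location")[0]
    rows = csv.DictReader(io.StringIO(DATA))
    return sum(int(r[count_col]) for r in rows
               if SITE_INDIVIDUALS.get(r[site_col]) == individual)


print(columns_for("Location"))   # ['Site']
print(total_count_at("naples"))  # 6
```

The query never names the columns "Site" or "Count" directly. It asks for whatever is registered as a Location or a MeasurableItem, which is what makes concept-based querying possible across differently structured datasets.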
Step 1: Selecting Relevant Concepts • Concepts from an ontology: • DataCollectionEvent • AbundanceCollectionEvent • Measurement • Abundance • SpeciesAbundance • MeasurementContext • … • Location • LTERSite • SBLTERSite • naples • Species • … • MeasurableItem • SpeciesCount

Dataset:
  Date        Site  Transect  SP_Code  Count
  2000-09-08  CARP  1         CRGI     0
  2000-09-08  CARP  4         LOCH     0
  2000-09-08  CARP  7         MUCA     1
  2000-09-22  NAPL  7         LOCH     1
  2000-09-18  NAPL  1         PAPA     5
  2000-09-28  BULL  1         CYOS     57
Step 2: Generate Object Model • Concepts from an ontology: • DataCollectionEvent • AbundanceCollectionEvent • Measurement • Abundance • SpeciesAbundance • MeasurementContext • … • Location • LTERSite • SBLTERSite • naples • Species • … • MeasurableItem • SpeciesCount [Figure: generated object model: AbundanceCollectionEvent contains SpeciesAbundance; SpeciesAbundance measureOf SpeciesCount; SpeciesCount hasSpecies Species, hasUnit RatioUnit, hasValue RatioValue; further edges hasTime → DateTime and hasLoc → SBLTERSite]
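Step 2 can be approximated by walking the role restrictions of the ontology snippet from a root concept. Restrictions are inherited from named superconcepts, and a concept's own, more specific filler overrides an inherited one. The Python sketch below carries several assumptions:

- Quantifiers are ignored.
- The override rule presumes that the new filler is narrower (e.g., RatioUnit ⊑ Unit, which the snippet does not state).
- The `refine` parameter is a hypothetical stand-in for the concepts selected in Step 1.

The output keeps MeasurementContext as an explicit node, so it does not reproduce the slide's model exactly.

```python
from __future__ import annotations

# Role restrictions C ⊑ role.Filler from the ontology snippet (quantifiers ignored).
ROLES = {
    "DataCollectionEvent": {"contains": "Measurement"},
    "AbundanceCollectionEvent": {"contains": "SpeciesAbundance"},
    "Measurement": {"measureOf": "MeasurableItem", "hasContext": "MeasurementContext"},
    "SpeciesAbundance": {"measureOf": "SpeciesCount"},
    "MeasurementContext": {"hasTime": "DateTime", "hasLocation": "Location"},
    "MeasurableItem": {"hasUnit": "Unit", "hasValue": "UnitValue"},
    "SpeciesCount": {"hasSpecies": "Species", "hasUnit": "RatioUnit"},
}

# Named superconcepts from the same snippet.
PARENT = {
    "AbundanceCollectionEvent": "DataCollectionEvent",
    "SpeciesAbundance": "Measurement",
    "SpeciesCount": "MeasurableItem",
}


def effective_roles(concept: str) -> dict[str, str]:
    """Inherited restrictions, overridden by the concept's own (assumed narrower) fillers."""
    parent = PARENT.get(concept)
    inherited = effective_roles(parent) if parent else {}
    return {**inherited, **ROLES.get(concept, {})}


def object_model(root: str, refine: dict[str, str] | None = None) -> set[tuple[str, str, str]]:
    """Edges (concept, role, filler) reachable from `root`.

    `refine` optionally swaps a filler for a more specific concept chosen in Step 1.
    """
    refine = refine or {}
    edges: set[tuple[str, str, str]] = set()
    todo, seen = [root], set()
    while todo:
        c = todo.pop()
        if c in seen:
            continue
        seen.add(c)
        for role, filler in effective_roles(c).items():
            filler = refine.get(filler, filler)
            edges.add((c, role, filler))
            todo.append(filler)
    return edges


model = object_model("AbundanceCollectionEvent", refine={"Location": "SBLTERSite"})
for edge in sorted(model):
    print(*edge)
```

The result contains eight edges. For example, SpeciesAbundance measureOf SpeciesCount overrides the inherited measureOf MeasurableItem, and SpeciesCount hasUnit RatioUnit overrides hasUnit Unit.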
Applications of Semantic Registration • Mentioned before: • Smart data discovery, integration, etc. • New application: • Semi-automatically generating data transformations for chaining together computational services
Problem: Service Reusability • Unless “designed to fit,” independent services are structurally incompatible • Generally, the source output type will not be a subtype of the target input type [Figure: desired connection from the Source Service’s output Ps to the Target Service’s input Pt; StructuralType Ps ⋠ StructuralType Pt (incompatible)]
Service Reusability • A data transformation mapping is required to connect the services … artificially creating subtype compatibility • If such a mapping exists, the services are “structurally feasible” [Figure: desired connection from Source Service (Ps) to Target Service (Pt); StructuralType Ps ⋠ StructuralType Pt, but the mapping applied to Ps yields a type that is a subtype (≺) of StructuralType Pt]
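Structural subtyping of this kind can be sketched with record types. A source type is a subtype of a target type if it has every field the target requires, each with a compatible type. The Python sketch below is illustrative only:

- The nested-dict type encoding is an assumption.
- The field names are loosely borrowed from the XQuery example in this deck, with repeated elements ignored.
- The `to_target` mapping is written by hand; in the framework it would be generated.

```python
# Structural types as nested dicts; leaves are primitive type names (assumed encoding).
PS = {"sample": {"cnt": "int", "lsp": "str"}}          # source service output type
PT = {"measurement": {"obs": "int", "phase": "str"}}   # target service input type


def is_subtype(s, t) -> bool:
    """Record subtyping: s ≺ t if s has every field of t, each a subtype of t's."""
    if isinstance(s, str) and isinstance(t, str):
        return s == t
    if isinstance(s, dict) and isinstance(t, dict):
        return all(f in s and is_subtype(s[f], t[f]) for f in t)
    return False


def type_of(value):
    """Structural type of a nested-dict value."""
    if isinstance(value, dict):
        return {k: type_of(v) for k, v in value.items()}
    return type(value).__name__


def to_target(v):
    """Hypothetical data transformation mapping from PS-shaped to PT-shaped values."""
    return {"measurement": {"obs": v["sample"]["cnt"], "phase": v["sample"]["lsp"]}}


value = {"sample": {"cnt": 57, "lsp": "stage-1"}}
print(is_subtype(PS, PT))                         # False: Ps ⋠ Pt
print(is_subtype(type_of(to_target(value)), PT))  # True: mapped output ≺ Pt
```

The services themselves are unchanged. Feasibility depends only on whether some mapping exists whose output type fits the target.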
Service Reusability • Idea: • annotate services with semantic types (concept expressions), primarily for discovery of services [Figure: SemanticType Ps ⊑ SemanticType Pt with respect to ontologies (OWL), i.e., compatible; desired connection from Source Service (Ps) to Target Service (Pt)]
Service Reusability • Services can be semantically compatible but structurally incompatible [Figure: SemanticType Ps ⊑ SemanticType Pt with respect to ontologies (OWL), i.e., compatible, yet StructuralType Ps ⋠ StructuralType Pt (incompatible); a mapping of Ps whose result is a subtype (≺) of StructuralType Pt is needed for the desired connection from Source Service to Target Service]
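The combination of a semantic check (⊑ over the ontology) and a structural check can be sketched as a single connection test. In the Python sketch below, the semantic annotations SpeciesAbundance and Measurement come from the ontology snippet shown earlier (SpeciesAbundance ⊑ Measurement). The structural types, the port descriptions, and the three-way verdict are hypothetical.

```python
# Semantic side: one named subsumption from the ontology snippet.
PARENT = {"SpeciesAbundance": "Measurement"}


def subsumed(c, d) -> bool:
    """Semantic compatibility: does c ⊑ d hold along the named-parent chain?"""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


def is_subtype(s, t) -> bool:
    """Structural compatibility over nested-dict record types (assumed encoding)."""
    if isinstance(s, str) and isinstance(t, str):
        return s == t
    if isinstance(s, dict) and isinstance(t, dict):
        return all(f in s and is_subtype(s[f], t[f]) for f in t)
    return False


def check_connection(src: dict, tgt: dict) -> str:
    """Classify a desired connection from a source output to a target input."""
    semantic = subsumed(src["semantic"], tgt["semantic"])
    structural = is_subtype(src["structural"], tgt["structural"])
    if semantic and structural:
        return "connect directly"
    if semantic:
        return "needs a data transformation"
    return "incompatible"


SOURCE_OUT = {"semantic": "SpeciesAbundance", "structural": {"sample": {"cnt": "int"}}}
TARGET_IN = {"semantic": "Measurement", "structural": {"measurement": {"obs": "int"}}}
print(check_connection(SOURCE_OUT, TARGET_IN))  # needs a data transformation
```

The middle verdict is the case the slide describes. The ontology says the connection makes sense, but the data must still be reshaped before the target can consume it.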
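The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK) [Figure: SemanticType Ps ⊑ SemanticType Pt with respect to ontologies (OWL), i.e., compatible; a registration mapping (output) links SemanticType Ps to StructuralType Ps, and a registration mapping (input) links SemanticType Pt to StructuralType Pt; from these, a correspondence between StructuralType Ps and StructuralType Pt is derived and used to generate the transformation of Ps for the desired connection from Source Service to Target Service]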
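The framework's key step can be sketched in a few lines. Structural correspondences are derived by joining the two registration mappings on compatible concepts, and a transformation is then generated from those correspondences. In this Python sketch, registration mappings map paths in a nested-dict structure to concepts. The concept LifeStage, the reuse of the single subsumption SpeciesCount ⊑ MeasurableItem for matching, and all function names are illustrative assumptions, not the SEEK implementation. A real generator would emit a query (such as XQuery) and handle repeated elements. This one returns a Python function and assumes single values.

```python
# Hypothetical registration mappings: structural path -> concept.
SRC_REG = {("sample", "cnt"): "SpeciesCount", ("sample", "lsp"): "LifeStage"}
TGT_REG = {("measurement", "obs"): "MeasurableItem", ("measurement", "phase"): "LifeStage"}

# One subsumption from the ontology snippet, used to match concepts.
PARENT = {"SpeciesCount": "MeasurableItem"}


def subsumed(c, d) -> bool:
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


def correspondences(src_reg, tgt_reg):
    """Pair source and target paths whose registered concepts satisfy src ⊑ tgt."""
    return [(sp, tp) for sp, sc in src_reg.items()
            for tp, tc in tgt_reg.items() if subsumed(sc, tc)]


def get_path(d, path):
    for key in path:
        d = d[key]
    return d


def set_path(d, path, value):
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def generate_transformation(corrs):
    """Build a function that copies each source path to its corresponding target path."""
    def transform(src):
        out = {}
        for sp, tp in corrs:
            set_path(out, tp, get_path(src, sp))
        return out
    return transform


transform = generate_transformation(correspondences(SRC_REG, TGT_REG))
print(transform({"sample": {"cnt": 5, "lsp": "stage-1"}}))
# {'measurement': {'obs': 5, 'phase': 'stage-1'}}
```

The source and target never need to agree on element names. They only need to be registered to concepts that the ontology relates.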
Example Generated Data Transformation (in XQuery) • Based on the structural correspondences and certain assumptions, we derive the transformation query:

    (: one target <measurement> per source <sample> :)
    <cohortTable> {
      for $s in /population/sample
      return
        <measurement> {
          (: each count becomes an <obs> element :)
          for $c in $s/meas/cnt
          return <obs>{ $c/text() }</obs>
        } {
          (: each lsp value becomes a <phase> element :)
          for $l in $s/lsp
          return <phase>{ $l/text() }</phase>
        } </measurement>
    } </cohortTable>
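The Python sketch below performs the same restructuring as the query, using the standard-library `xml.etree.ElementTree` module, to make the query's semantics concrete. The input document is hypothetical; only its element names come from the query. The sketch mirrors the query's behavior of producing an empty `cohortTable` when the root element is not `population`.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element names come from the XQuery above.
SOURCE = """<population>
  <sample><meas><cnt>0</cnt><cnt>4</cnt></meas><lsp>stage-1</lsp></sample>
  <sample><meas><cnt>57</cnt></meas><lsp>stage-2</lsp></sample>
</population>"""


def transform(source_xml: str) -> ET.Element:
    root = ET.fromstring(source_xml)
    cohort = ET.Element("cohortTable")
    if root.tag != "population":  # /population/sample would select nothing
        return cohort
    for s in root.findall("sample"):                 # for $s in /population/sample
        measurement = ET.SubElement(cohort, "measurement")
        for c in s.findall("meas/cnt"):              # for $c in $s/meas/cnt
            ET.SubElement(measurement, "obs").text = c.text
        for lsp in s.findall("lsp"):                 # for $l in $s/lsp
            ET.SubElement(measurement, "phase").text = lsp.text
    return cohort


print(ET.tostring(transform(SOURCE), encoding="unicode"))
```

For the sample input, this prints a `cohortTable` with two `measurement` elements. The first holds `<obs>0</obs><obs>4</obs><phase>stage-1</phase>`, and the second holds `<obs>57</obs><phase>stage-2</phase>`.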
Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger)
A Scientific Workflow in Kepler [Figure: workflow annotated with “Extract mineral composition for row Id.”, “Igneous Rock Diagrams information.”, and “Rock Name.”]
KEPLER and YOU • Kepler … • is a community-based, cross-project, open-source collaboration for “minute made” application integration, using web (grid) services as basic building blocks • has a joint CVS repository, mailing lists, a web site, … • is gaining momentum thanks to contributors and contributions • has a BSD-style license that allows commercial spin-offs • A pre-packaged, shrink-wrapped version (“Kepler-to-GO”) is coming soon to a place near you…
The KEPLER GUI (Vergil from Ptolemy II) • Drag-and-drop utilities, director and actor libraries
Distributed Workflows in KEPLER • Web and Grid Service plug-ins • WSDL • ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard • SRB • SSH, SCP • Web Service Harvester • Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors • XSLT and XQuery transformers to link non-fitting services together • Web Service Deployment (…ongoing work…)