Towards Semantic Mediation for GEON: Facilitating Scientific Data Integration using Knowledge Representation. Bertram Ludäscher (ludaesch@sdsc.edu), Data and Knowledge Systems, San Diego Supercomputer Center, U.C. San Diego
GEON Metamorphism Equation: Geoscientists + Computer Scientists (+/- Energy) → Igneous Geoinformaticists. Acknowledgements: “Smart” Geologic Map Prototype (upcoming demo): Kai Lin (klin@sdsc.edu), Data and Knowledge Systems, San Diego Supercomputer Center • Geo-Knowledge-Engineer: Boyan Brodaric (brodaric@NRCan.gc.ca), Natural Resources Canada • ... and many GEONites: Dogan, Krishna, ..., State Geologic Surveys, Chaitan, Ilya, Michalis, Ashraf, ...
GEON and “Semantic” Data Integration: Mid-Atlantic Region, Rocky Mountains
What is Knowledge Representation? Relating theory to the world via formal models. Source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations. “All models are wrong, but some are useful!”
What is (an) “Ontology”??? (... what CS graduate students need to know ...) 1. Ontology as a philosophical discipline 2. Ontology as an informal conceptual system 3. Ontology as a formal semantic account 4. Ontology as a specification of a “conceptualization” 5. Ontology as a representation of a conceptual system via a logical theory 5.1 characterized by specific formal properties 5.2 characterized only by its specific purposes 6. Ontology as the vocabulary used by a logical theory 7. Ontology as a (meta-level) specification of a logical theory [Guarino’95] http://ontology.ip.rm.cnr.it/Papers/KBKS95.pdf
What is an Ontology? (CSE-291 cont’d ;-) • Given a logical language L ... • ... a conceptualization is a set of models of L that describes the admissible (intended) interpretations of its non-logical symbols (the vocabulary) • ... an ontology is a (possibly incomplete) axiomatization of a conceptualization. [Figure: set of all models M(L); conceptualization C(L); logic theories; ontology] [Guarino96] http://www-ksl.stanford.edu/KR96/Guarino-What/P003.html
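To make the two definitions concrete, here is a toy propositional sketch of our own (not from the cited paper): a language L with three hypothetical symbols, M(L) as all truth assignments, an assumed conceptualization C(L) of intended models, and an ontology as a deliberately incomplete axiom set. It shows how an incomplete axiomatization keeps every intended model but also admits unintended ones.

```python
from itertools import product

# Hypothetical vocabulary (non-logical symbols) of a tiny propositional language L.
SYMBOLS = ("igneous", "sedimentary", "basalt")

# M(L): every truth assignment to the symbols counts as a model of L.
ALL_MODELS = [dict(zip(SYMBOLS, values))
              for values in product([False, True], repeat=len(SYMBOLS))]


def intended(m):
    """Assumed conceptualization: basalt is igneous; nothing is igneous and sedimentary."""
    return (not m["basalt"] or m["igneous"]) and not (m["igneous"] and m["sedimentary"])


# C(L): the intended models, a subset of M(L).
CONCEPTUALIZATION = [m for m in ALL_MODELS if intended(m)]

# An incomplete ontology: it states "basalt -> igneous" but omits the disjointness.
AXIOMS = [lambda m: not m["basalt"] or m["igneous"]]


def models_of(axioms):
    """All models of L that satisfy every axiom."""
    return [m for m in ALL_MODELS if all(ax(m) for ax in axioms)]


ontology_models = models_of(AXIOMS)
covers = all(m in ontology_models for m in CONCEPTUALIZATION)  # no intended model lost
unintended = [m for m in ontology_models if not intended(m)]   # extra models admitted
print(len(ALL_MODELS), len(CONCEPTUALIZATION), len(ontology_models), covers, len(unintended))
# prints: 8 4 6 True 2
```

In this toy case, adding the missing disjointness axiom to AXIOMS makes the ontology's models coincide exactly with C(L).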
Problem: Scientific Data Integration ... from Questions to Queries ... What are the distribution and U/Pb zircon ages of A-type plutons in VA? What about their 3-D geometry? How does it relate to host rock structures? “Complex Multiple-Worlds” Mediation across GeoPhysical (gravity contours), Geologic Map (Virginia), GeoChronologic (Concordia), Foliation Map (structure DB), and GeoChemical data. [Figure: layers from raw data through data modeling, database mediation, and knowledge representation (ontologies, concept spaces) up to information integration, informed by domain knowledge]
Got Glue? Which one? What for? • XML (common syntax) • flexible (semistructured) data model • used at all levels: data / metadata exchange, message exchange (SOAP), schemas & data types (XML Schema), Semantic Web & web ontologies (RDF(S), OWL), ... • Grid infrastructure (system interoperation) • distributed computing and data management • web services • Controlled Vocabularies (“joins”) • data level: joins across different data sets • but meta-data and ontologies (concept names, relationship names, ...) are also data! • Integrated View Definitions (mediated views/virtual databases) • declarative specification of “integration logic”: XQuery, Datalog, ... • Thesauri (translator for retrieving related information) • synonyms, broader/narrower terms, e.g., UMLS (meta-thesaurus, “ontology”) • Taxonomies (classification) • shared vocabulary, concept hierarchy (is-a) • Ontologies (classification + additional semantics): • formal specification of a conceptualization, shared meaning • facilitates “smart querying”, semantic mediation
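The “controlled vocabularies (joins)” point can be illustrated with a small sketch. Two hypothetical data sets describe rock types with different local terms, so a direct equality join finds nothing. Mapping both sides to a shared vocabulary makes them joinable. All records, terms, and the mapping table are invented for illustration. The mapping table is itself data, which echoes the remark that metadata and ontologies are also data.

```python
# Hypothetical records from two sources, each with its own rock-type terms.
source_a = [{"site": "A1", "rock": "basalt (olivine)"}, {"site": "A2", "rock": "ss"}]
source_b = [{"sample": "B7", "lithology": "BASALT"}, {"sample": "B9", "lithology": "Sandstone"}]

# Assumed mapping from local terms to a shared controlled vocabulary.
# This mapping is itself data that has to be curated and shared.
TO_CV = {
    "basalt (olivine)": "basalt",
    "ss": "sandstone",
    "BASALT": "basalt",
    "Sandstone": "sandstone",
}


def join_on_vocabulary(left, left_key, right, right_key):
    """Equi-join two record lists on the controlled-vocabulary term of a field."""
    index = {}
    for right_rec in right:
        index.setdefault(TO_CV[right_rec[right_key]], []).append(right_rec)
    joined = []
    for left_rec in left:
        term = TO_CV[left_rec[left_key]]
        for right_rec in index.get(term, []):
            joined.append({"term": term, **left_rec, **right_rec})
    return joined


for row in join_on_vocabulary(source_a, "rock", source_b, "lithology"):
    print(row["term"], row["site"], row["sample"])
# prints:
# basalt A1 B7
# sandstone A2 B9
```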
Information Integration Challenges • reconciling S4 heterogeneities (System aspects, Syntax, Structure, Semantics) • “gluing” together multiple data sources • bridging information and knowledge gaps computationally • System aspects: “Grid” Middleware • distributed data & computing • Web Services, WSDL/SOAP, OGSA, … • sources = functions, files, data sets, … • Syntax & Structure: • (XML-Based) Data Mediators • wrapping, restructuring • (XML) queries and views • sources = (XML) databases • Semantics: • Model-Based/Semantic Mediators • conceptual models and declarative views • Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...) • sources = knowledge bases (DB+CMs+ICs)
Standard (XML-Based) Mediator Architecture: the USER/Client poses a query Q(G(S1,..., Sk)) against the integrated global (XML) view G. The MEDIATOR holds the integrated view definition G(..) ← S1(..) … Sk(..) and exchanges (XML) queries and results with one wrapper per source S1, S2, …, Sk; each wrapper exports an (XML) view of its source and is implemented as a web service.
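Here is a minimal sketch of the mediator pattern. It assumes that wrappers are plain Python functions standing in for the web-service wrappers, and that each one exports its source in a common record shape. The integrated view G is simply the union of the wrapped views, and a client query Q is evaluated only against G. Source names, fields, and records are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]


# Hypothetical wrappers (stand-ins for the web-service wrappers): each converts
# one source's native format into the common view schema {formation, age, source}.
def wrap_s1() -> List[Record]:
    native = [("Formation A", "Triassic")]  # e.g. rows from a relational table
    return [{"formation": f, "age": a, "source": "S1"} for f, a in native]


def wrap_s2() -> List[Record]:
    native = [{"NAME": "Formation B", "PERIOD": "Cretaceous"}]  # e.g. parsed XML
    return [{"formation": r["NAME"], "age": r["PERIOD"], "source": "S2"} for r in native]


WRAPPERS: List[Callable[[], List[Record]]] = [wrap_s1, wrap_s2]


def integrated_view() -> Iterator[Record]:
    """G(S1, ..., Sk): in this sketch simply the union of all wrapped views."""
    for wrapper in WRAPPERS:
        yield from wrapper()


def query(predicate: Callable[[Record], bool]) -> List[Record]:
    """Q(G(...)): the client sees only the integrated view, never the sources."""
    return [r for r in integrated_view() if predicate(r)]


print(query(lambda r: r["age"] == "Cretaceous"))
# prints: [{'formation': 'Formation B', 'age': 'Cretaceous', 'source': 'S2'}]
```

A real mediator would replace the plain union with a declarative view definition (e.g. in XQuery or Datalog). It would also push selections down to the wrappers rather than pulling every source in full.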
XML-Based vs. Semantic Mediation • XML-based: Integrated-DTD := XQuery(Src1-DTD, ...); XML models (XML elements) with structural constraints (DTDs, e.g. A = (B*|C),D; B = ...): parent, child, sibling, ...; no semantics / domain constraints • Semantic: Integrated-CM := CM-QL(Src1-CM, ...); conceptual models ((XML) objects) with classes, relations, is-a, has-a, ... (e.g. C1, C2 R C3); semantics and constraints in logic (IF ... THEN ...); “glue maps”: ontologies, concept spaces • CM ~ {Descr. Logic, ER, UML, RDF(S), …}; CM-QL ~ {F-Logic, …} • Both build on raw data, e.g. a record such as 0.0155381,1.54906,2,140,29,Tertiary,Trc,CHINLE FORMATION,59,57
GEON Framework for Interoperability in the Geosciences • Systems level: GEON Grid ... • enable sharing of data and tools via grid services • based on Open Grid Services Architecture (OGSA) • acquisition of cluster endpoints and initial deployment underway at several sites, including SDSC, UTEP, VT, ... • Syntactic and schema level: data integration via (meta)data standards (often XML-based) • database mediators create integrated virtual databases => dynamic creation and automatic update of data warehouses • Semantic level: data integration via “semantic” mediation • situating 4-D data in context: spatio-temporal, thematic, and process contexts can be represented as “concept spaces” • specifically: use of ontologies and logic-based knowledge representation • development guided/driven by specific scientific data integration problems
Towards Shared Conceptualizations: High-level Domain Ontology & Standard Data Model • Adoption of a standard (meta)data model => wrap data sets into unified virtual views. Source: NADAM Team (Boyan Brodaric et al.)
Towards Shared Conceptualizations: Data Contextualization via Concept Spaces
Towards Knowledge Sharing: Rock-type “Ontology” [Figure: classifications by Genesis, Fabric, Composition, Texture]
Getting Formal: Source Contextualization & Ontology Refinement in Logic. Biomedical Informatics Research Network: http://nbirn.net
Using domain knowledge via knowledge representation: AGE ONTOLOGY (Nevada example) • Show formations where AGE = ‘Paleozoic’ (with age ontology) • Show formations where AGE = ‘Paleozoic’ (without age ontology)
Querying with Multiple Classifications/Ontologies: Age, Composition, Texture, Fabric, Genesis
What to do with the “KR Glue”? • Conceptual-level information, concept spaces, ontologies, and other KR techniques for ... • ... smart data discovery • ... browsing and querying by themes, disciplines, ... • ... defining virtual/mediated databases at the conceptual level • ... supporting the “plugging together” of “data and information experiments” into Scientific Workflows (a.k.a. Analytical Pipelines in the SEEK ITR) • ... smarter user interfaces • is “find felsic sedimentary rocks” a meaningful (satisfiable) query? • ...
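Whether “find felsic sedimentary rocks” is satisfiable depends on the axioms in the ontology. The sketch below uses a toy taxonomy and assumes two illustrative axioms, not taken from any published rock ontology: felsic is a composition category under igneous, and igneous and sedimentary are declared disjoint. Under these assumptions, a conjunctive concept query is unsatisfiable when the ancestors of its concepts include a disjoint pair. This is a simple structural check and far weaker than a description-logic reasoner.

```python
# Toy is-a hierarchy (child -> parent); an illustrative assumption.
PARENT = {
    "felsic": "igneous",
    "mafic": "igneous",
    "sandstone": "sedimentary",
    "igneous": "rock",
    "sedimentary": "rock",
}

# Assumed disjointness axioms between concepts.
DISJOINT = {frozenset({"igneous", "sedimentary"})}


def ancestors(concept):
    """The concept itself plus everything above it in the is-a hierarchy."""
    result = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        result.add(concept)
    return result


def satisfiable(*concepts):
    """A conjunction is unsatisfiable if it entails two disjoint concepts."""
    implied = set().union(*(ancestors(c) for c in concepts))
    return not any(pair <= implied for pair in DISJOINT)


print(satisfiable("felsic", "sedimentary"))  # False
print(satisfiable("felsic", "igneous"))      # True
```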
Some enabling operations on “ontology data” • Concept expansion: what else to look for when asking for ‘Mafic’? [Figure: Composition ontology]
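Concept expansion can be sketched as collecting all descendants of a concept in the classification hierarchy, so that a query for ‘Mafic’ also retrieves data labeled with more specific terms. The hierarchy below is a hypothetical fragment for illustration. Its entries are not taken from the GEON composition ontology.

```python
from collections import deque

# Hypothetical fragment of a composition hierarchy (parent -> children).
CHILDREN = {
    "Mafic": ["Basalt", "Gabbro"],
    "Basalt": ["Olivine basalt"],
    "Felsic": ["Granite", "Rhyolite"],
}


def expand(concept):
    """Breadth-first collection of the concept and all of its descendants."""
    seen, queue = [concept], deque([concept])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(expand("Mafic"))  # ['Mafic', 'Basalt', 'Gabbro', 'Olivine basalt']
```

The expanded list can then be turned into a disjunctive selection over the sources' classification attribute.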
Some enabling operations on “ontology data” • Generalization: finding data that is “like” X and Y [Figure: Composition ontology]
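Generalization can be sketched as finding the most specific concept that subsumes both X and Y (their least common ancestor in the hierarchy). Data “like” X and Y is then whatever falls under that concept. The child-to-parent map below is a hypothetical example.

```python
# Hypothetical is-a hierarchy (child -> parent).
PARENT = {
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Mafic": "Igneous",
    "Felsic": "Igneous",
    "Igneous": "Rock",
    "Sandstone": "Sedimentary",
    "Sedimentary": "Rock",
}


def path_to_root(concept):
    """The concept followed by its ancestors, most specific first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def generalize(x, y):
    """Most specific concept subsuming both x and y (least common ancestor)."""
    ancestors_y = set(path_to_root(y))
    for concept in path_to_root(x):
        if concept in ancestors_y:
            return concept
    return None  # no common ancestor in this hierarchy


print(generalize("Basalt", "Gabbro"))     # Mafic
print(generalize("Basalt", "Granite"))    # Igneous
print(generalize("Gabbro", "Sandstone"))  # Rock
```

A single parent per concept keeps the sketch simple. With multiple inheritance there can be several minimal common ancestors, and a graph-based search would be needed instead.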
Towards Knowledge Sharing: Rock-type Ontology [Figure: classifications by Genesis, Fabric, Composition, Texture]
DEMO... do NOT click this ... http://kbis.sdsc.edu/GEON/ahm03-demo.html
Architecture of the Integrated Geologic Map Prototype System [Figure: client request/response via an HTTP server (Java Server Page) holding the map definition; MapServer (Minnesota) combining local and remote layers; Mediator (Java application) over Database (Arizona) and Database (Montana); Global Ontology Definitions: rock classification, geologic age]
Data Source Wrapping and Integration [Figure: source schemas of the state geologic maps (Arizona, Idaho, Colorado, Utah, Nevada, Wyoming, Montana West, Montana East, New Mexico) use differing attribute names for the same information, e.g. FORMATION, NAME, FMATN for formations; PERIOD, AGE, TIME_UNIT for age; LITHOLOGY, TYPE, ABBREV for rock descriptions; sample values: Livingston formation, Tertiary-Cretaceous, andesitic sandstone]
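The sources name the same information differently (FORMATION, NAME, FMATN; PERIOD, AGE, TIME_UNIT). Wrapping can therefore be sketched as a per-source attribute mapping onto one global schema. The attribute names come from the figure. Which state uses which attribute is an assumption for illustration, and the sample rows are hypothetical.

```python
# Assumed per-source attribute mappings onto a global schema (formation, age).
# Attribute names come from the figure; which state uses which is hypothetical.
SOURCE_MAPPINGS = {
    "Nevada": {"FMATN": "formation", "TIME_UNIT": "age"},
    "Idaho": {"NAME": "formation", "PERIOD": "age"},
    "Montana East": {"FORMATION": "formation", "PERIOD": "age"},
}


def wrap(source, rows):
    """Rename each row's source attributes to global ones; drop unmapped attributes."""
    mapping = SOURCE_MAPPINGS[source]
    return [
        {"source": source, **{mapping[k]: v for k, v in row.items() if k in mapping}}
        for row in rows
    ]


# Hypothetical sample rows, each in its source's native schema.
unified = (
    wrap("Nevada", [{"FMATN": "Formation A", "TIME_UNIT": "Permian"}])
    + wrap("Idaho", [{"NAME": "Formation B", "PERIOD": "Tertiary", "LITHOLOGY": "sandstone"}])
    + wrap("Montana East", [{"FORMATION": "Formation C", "PERIOD": "Cretaceous"}])
)
for r in unified:
    print(r["source"], "|", r["formation"], "|", r["age"])
```

Renaming resolves only structural heterogeneity. Value-level differences such as age ranges like “Tertiary-Cretaceous” or differing age vocabularies remain, and the ontologies address those.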
Ontology-Enabled Query Processing • User: “Show formations from Cenozoic!” • Query rewriting with the age ontology (Cenozoic: Quaternary, Tertiary) yields: select FORMATION where AGE=“Tertiary” or AGE=“Quaternary” • The rewritten query runs against the wrapped sources (e.g. Arizona, Montana West, with attributes such as PERIOD, FORMATION, LITHOLOGY, ABBREV) • Results go to map rendering with a color definition
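The rewriting step can be sketched as expanding the user's age term to its sub-units in the age ontology and emitting a disjunctive selection. The ontology fragment is a simplified assumption. Cenozoic follows the slide. Paleozoic lists its periods in the international scheme. The SQL-like string only mirrors the slide's notation; a real system should generate parameterized queries rather than concatenate strings.

```python
# Simplified age-ontology fragment (unit -> sub-units). Cenozoic follows the
# slide; the Paleozoic entry lists its periods in the international scheme.
AGE_ONTOLOGY = {
    "Cenozoic": ["Tertiary", "Quaternary"],
    "Paleozoic": ["Cambrian", "Ordovician", "Silurian",
                  "Devonian", "Carboniferous", "Permian"],
}


def descendants(term):
    """All (transitive) sub-units of term in the age ontology, in order."""
    result = []
    for sub in AGE_ONTOLOGY.get(term, []):
        result.append(sub)
        result.extend(descendants(sub))
    return result


def rewrite(term, attribute="AGE"):
    """Rewrite 'show formations from <term>' into a disjunctive selection."""
    terms = descendants(term) or [term]  # a unit without sub-units stays as-is
    conditions = " or ".join(f'{attribute}="{t}"' for t in terms)
    return f"select FORMATION where {conditions}"


print(rewrite("Cenozoic"))
# prints: select FORMATION where AGE="Tertiary" or AGE="Quaternary"
```

Without this step, a query for ‘Paleozoic’ matches only records literally labeled ‘Paleozoic’. A production rewriter would probably also keep the original term in the disjunction, for sources that record the era itself.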
Integration Challenges: MANY! • non-available or non-interoperable data • “dirty data”, no controlled vocabularies • many different controlled vocabularies! (“clean data”) • what is entailed by a vocabulary? → formal ontologies • extensible ontologies
What’s next? • YOU! • GEON-SCI: • science questions waiting to be turned into queries! • GEON-KR Working Group activities • guided (if not driven) by geoscientists • marry KR technologies to standards (W3C, Semantic Web: RDF, OWL, ...) • collect GEON-able KR resources (data models, controlled vocabularies, ontologies, ...) • GEON-DEV: • generalize and merge the current KR/semantic mediation architecture with the standard Grid architecture • building systems