Foundations VI: Discovery, Access and Semantic Integration. Deborah McGuinness and Peter Fox, CSCI-6962-01, Week 11, November 10, 2008.
Contents
• Review of reading, questions, comments
• Semantic Integration, using SESDI (Semantically-Enabled Scientific Data Integration) as an example
• Semantically-Enabled Search, e.g. Noesis
• Integration using Top Level Science ontologies
• Summary
• Next week
References
• Fox, P.; McGuinness, D.L.; Raskin, R.; Sinha, K. A Volcano Erupts: Semantically Mediated Integration of Heterogeneous Volcanic and Atmospheric Data. Proceedings of the First Workshop on Cyberinfrastructure: Information Management in eScience, co-located with the ACM Conference on Information and Knowledge Management, Lisbon, Portugal, November 9, 2007. ftp://ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-07-09.pdf
• Movva, S.; Ramachandran, R.; Li, X.; Cherukuri, P.; Graves, S. Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science. NSTC 2007. http://esto.nasa.gov/conferences/nstc2007/papers/Ramachandran_Rahul_A3P4_NSTC-07-0084.pdf
• Brodaric, B.; Probst, F. Enabling Cross-Disciplinary e-Science by Integrating Geoscience Ontologies with DOLCE. Under review, 2008.
• Gil, Y.; Deelman, E.; Ellisman, M.; Fahringer, T.; Fox, G.; Gannon, D.; Goble, C.; Livny, M.; Moreau, L.; Myers, J. Examining the Challenges of Scientific Workflows. Computer, vol. 40, no. 12, pp. 24-32, December 2007. http://www.isi.edu/~gil/papers/computer-NSFworkflows07.pdf
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle. Stages shown: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Adopt Technology Approach; Leverage Technology Infrastructure; Use Tools; Rapid Prototype; Science/Expert Review & Iteration; Evaluation; Open World: Evolve, Iterate, Redesign, Redeploy.]
Motivation for Semantic Integration
In order to solve problems that are inherently multi-disciplinary, researchers often need data from many varied sources. Consider problems such as global warming, or some of the problems you suggested two classes ago, e.g. the impact of earthquakes on transportation.
Semantically Enabled Scientific Data Integration (SESDI)
Slides from joint work with Fox, McGuinness, Raskin, and Sinha, and materials from McGuinness et al., Geoinformatics 2007, and Fox et al., ESTO 2008.
Mt. Spurr, AK, eruption of 8/18/1992 (USGS): http://www.avo.alaska.edu/image.php?id=319
Tropopause http://aerosols.larc.nasa.gov/volcano2.swf
Atmosphere Use Case
Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause.
From a paleoclimate researcher: Caspar Ammann, Climate and Global Dynamics Division of NCAR (CGD/NCAR).
Layperson perspective: look for indicators of acid rain in the part of the atmosphere we experience, i.e. look at measurements of sulfur dioxide in relation to sulfuric acid after volcanic eruptions, at the boundary between the troposphere and the stratosphere.
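One simple way to estimate the "statistical signatures" of two forcings on tropopause height is a two-predictor least-squares regression. This is offered as an assumption, not as the researcher's actual method; the function name and the idea of equal-length yearly series for height, a volcanic index, and a solar index are hypothetical.

```python
from statistics import fmean


def fit_forcing_signatures(height, volcanic, solar):
    """Least-squares fit of height = a + b_v * volcanic + b_s * solar.

    Returns (a, b_v, b_s). Inputs are equal-length sequences
    (hypothetical, e.g. yearly values)."""
    if not (len(height) == len(volcanic) == len(solar)):
        raise ValueError("series must have equal length")
    my, mv, ms = fmean(height), fmean(volcanic), fmean(solar)
    # Center each series so the intercept drops out of the normal equations.
    y = [h - my for h in height]
    v = [x - mv for x in volcanic]
    s = [x - ms for x in solar]
    svv = sum(a * a for a in v)
    sss = sum(b * b for b in s)
    svs = sum(a * b for a, b in zip(v, s))
    svy = sum(a * c for a, c in zip(v, y))
    ssy = sum(b * c for b, c in zip(s, y))
    det = svv * sss - svs * svs
    if det == 0:
        raise ValueError("forcing series are collinear; signatures not separable")
    # Solve the 2x2 normal equations by Cramer's rule.
    b_v = (svy * sss - svs * ssy) / det
    b_s = (svv * ssy - svs * svy) / det
    a = my - b_v * mv - b_s * ms
    return a, b_v, b_s
```

The two coefficients can only be separated when the volcanic and solar series are not collinear, which is why the sketch checks the determinant before solving.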
SESDI Impact: A Better Way to Access Data
The Problem
• Scientists often use data from only a single instrument because it is difficult to access, process, and understand data from multiple instruments. A typical data query might be:
  • “Give me the temperature, pressure, and water vapor from the AIRS instrument from Jan 2005 to Jan 2008”
  • “Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”
A Solution
• Using a simple process, SESDI allows data from various sources to be registered in an ontology so that it can be easily accessed and understood. Scientists need to use only the ontology components that relate to their data. An SESDI query might look like:
  • “Show all areas in California where sulfur dioxide (SO2) levels were above normal between Jan 2000 and Jan 2007”
• This query pulls data from all available sources registered in the ontology and allows seamless data fusion.
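The California SO2 query can be sketched as follows, assuming each registered source declares which of its local fields corresponds to a shared concept, and that every source reports SO2 as a ratio to its own baseline, so "above normal" means a value greater than 1.0. The `Source` class, the `query` function, the concept name, and the sample records are all hypothetical; a real deployment would resolve concepts through the ontology rather than a Python dictionary.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical shared concept identifier (an ontology IRI in a real system).
SO2 = "SulfurDioxideConcentration"


@dataclass
class Source:
    name: str
    field_to_concept: dict  # local field name -> shared concept
    records: list           # dicts with 'region', 'date', and local fields


def query(sources, concept, region, start, end, threshold):
    """Return (source, date, value) for every record, from any registered
    source, whose field maps to `concept` and exceeds `threshold`."""
    hits = []
    for src in sources:
        # Registration tells us which local fields mean the shared concept.
        fields = [f for f, c in src.field_to_concept.items() if c == concept]
        for rec in src.records:
            if rec["region"] != region or not (start <= rec["date"] <= end):
                continue
            for f in fields:
                if f in rec and rec[f] > threshold:
                    hits.append((src.name, rec["date"], rec[f]))
    # Fuse results from all sources into one time-ordered list.
    return sorted(hits, key=lambda hit: hit[1])
```

The query never names a source-specific field; only the concept is shared, which is the point of registering data against an ontology.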
Components to implement
• An analysis application
• Cross-domain terms, concepts and relations
• Connections to underlying data (registration)
• A framework to put these together
• An integration connector
Data Registration Framework (from data discovery to data integration)
• Level 1: Data registration at the discovery level, e.g. volcano location and activity
• Level 2: Data registration at the inventory level, e.g. list of datasets by type, time, and product
• Level 3: Data registration at the item detail level, e.g. access to individual quantities
• Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved (schema-based integration)
• Ontology-based data integration
A.K. Sinha, Virginia Tech, 2006
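The three registration levels can be modeled as an ordered enumeration. The sketch assumes, as an illustration only, that each level includes what the levels below it allow, and that only item-detail registration supports integration of individual values. The capability labels, the dataset names, and both functions are hypothetical.

```python
from enum import IntEnum


class RegistrationLevel(IntEnum):
    DISCOVERY = 1    # e.g. volcano location and activity
    INVENTORY = 2    # e.g. datasets listed by type, time, product
    ITEM_DETAIL = 3  # e.g. access to individual quantities


# Hypothetical capability unlocked at each level.
CAPABILITY = {
    RegistrationLevel.DISCOVERY: "discover",
    RegistrationLevel.INVENTORY: "list datasets",
    RegistrationLevel.ITEM_DETAIL: "access values",
}


def capabilities(level):
    # Assumption: each level includes everything the lower levels allow.
    return [CAPABILITY[lvl] for lvl in RegistrationLevel if lvl <= level]


def integrable(registry, min_level=RegistrationLevel.ITEM_DETAIL):
    """Names of datasets registered deeply enough to be integrated."""
    return sorted(name for name, lvl in registry.items() if lvl >= min_level)
```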
How to find the data? Think about it the way the data providers do
SEDRE: Semantically Enabled Data Registration Engine
• SEDRE: a system that enables scientists to semantically register data sets for optimal querying and semantic integration
• SEDRE enables mapping of heterogeneous data to concepts in domain ontologies
A. K. Sinha, A. Rezgui, Virginia Tech
Semantic Registration in SEDRE: An Overview • SEDRE is a desktop application • Users download and install SEDRE • SEDRE accesses domain ontologies • Users map data attributes (e.g., SO2) to concepts in ontologies without ‘knowing it’
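The attribute-to-concept mapping step can be illustrated by matching column headers against concept labels and synonyms, leaving anything unmatched for manual review. The namespace `http://example.org/volcano#`, the concept names, and the exact-label matching rule are hypothetical; SEDRE's own desktop interface is not reproduced here.

```python
EX = "http://example.org/volcano#"  # hypothetical namespace

# Hypothetical ontology fragment: concept IRI -> labels/synonyms users might type.
ONTOLOGY = {
    EX + "SulfurDioxide": {"so2", "sulfur dioxide", "sulphur dioxide"},
    EX + "WindSpeed": {"ws", "wind speed"},
    EX + "WindDirection": {"wd", "wind direction"},
}


def map_attributes(columns, ontology):
    """Map data column names to concept IRIs by normalized label match.

    Returns (mapping, unmatched) so unmatched columns can be reviewed by hand."""
    # Invert the ontology into a label -> IRI lookup table.
    index = {label: iri for iri, labels in ontology.items() for label in labels}
    mapping, unmatched = {}, []
    for col in columns:
        iri = index.get(col.strip().lower())
        if iri is None:
            unmatched.append(col)
        else:
            mapping[col] = iri
    return mapping, unmatched
```

Because users only see their own column names and the matched labels, the IRIs stay hidden, which is one way to read "without 'knowing it'".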
Example 1: Registration of Volcanic Data
SO2 emission from Kilauea east rift zone, vehicle-based (Source: HVO)
Location codes:
• U: Above the 180° turn at Holei Pali (upper Chain of Craters Road)
• L: Below Holei Pali (lower Chain of Craters Road)
• UL: Individual traverses were made both above and below the 180° turn at Holei Pali
• H: Highway 11
Abbreviations: t/d = metric tonne (1000 kg)/day, SD = standard deviation, WS = wind speed, WD = wind direction east of true north, N = number of traverses
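Once the location codes are registered, each traverse can be tagged with a readable location and summarized per code in the table's terms (mean t/d, SD, N). The sketch assumes a hypothetical input of (code, SO2 rate in t/d) pairs and uses the sample standard deviation; it does not reproduce HVO's own processing.

```python
import statistics

LOCATION_CODES = {
    "U": "Above the 180° turn at Holei Pali (upper Chain of Craters Road)",
    "L": "Below Holei Pali (lower Chain of Craters Road)",
    "UL": "Traverses both above and below the 180° turn at Holei Pali",
    "H": "Highway 11",
}


def summarize(traverses):
    """traverses: list of (location_code, so2_t_per_day) pairs (hypothetical format).

    Returns {code: (description, mean t/d, SD, N)}; SD is None when N == 1."""
    by_code = {}
    for code, rate in traverses:
        if code not in LOCATION_CODES:
            raise ValueError(f"unknown location code: {code!r}")
        by_code.setdefault(code, []).append(rate)
    summary = {}
    for code, rates in by_code.items():
        # Sample standard deviation is undefined for a single traverse.
        sd = statistics.stdev(rates) if len(rates) > 1 else None
        summary[code] = (LOCATION_CODES[code], statistics.mean(rates), sd, len(rates))
    return summary
```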
Registering Volcanic Data (2) • No explicit lat/long data • Volcano identified by name • Volcano ontology framework will link name to location
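Linking a volcano name to a location can be sketched as a name lookup that tolerates spelling variants such as "Kīlauea" vs. "Kilauea". The gazetteer dictionary stands in, hypothetically, for the volcano ontology's name-to-location links; the coordinates are approximate summit values in decimal degrees (west negative), and the function names are illustrative.

```python
import unicodedata


def normalize(name):
    # Strip diacritics and case so "Kīlauea" and "kilauea" match.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip().lower()


# Hypothetical gazetteer standing in for the volcano ontology.
GAZETTEER = {
    "kilauea": (19.421, -155.287),  # approximate summit location
}


def add_location(records, gazetteer):
    """Return copies of records with 'lat'/'lon' added when the volcano is known."""
    out = []
    for rec in records:
        enriched = dict(rec)
        loc = gazetteer.get(normalize(rec["volcano"]))
        if loc is not None:
            enriched["lat"], enriched["lon"] = loc
        out.append(enriched)
    return out
```

Unknown names are passed through unchanged rather than dropped, so they can be registered later.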
Example 2: Registration of Atmospheric Data
Satellite data for SO2 emissions
Abbreviation: SCD = Slant Column Density, in Dobson Units (DU)
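Registering SCD values in Dobson Units implies conversions that a consumer of the data may need. One Dobson Unit is about 2.687e16 molecules per cm², and in DOAS-style retrievals a slant column is commonly converted to a vertical column by dividing by an air mass factor (AMF). The sketch assumes the caller supplies the AMF (normally from a radiative-transfer model); the function names are hypothetical.

```python
DU_IN_MOLECULES_PER_CM2 = 2.687e16  # 1 Dobson Unit, approximately


def du_to_molecules_per_cm2(du):
    """Convert a column amount in Dobson Units to molecules per cm^2."""
    return du * DU_IN_MOLECULES_PER_CM2


def vertical_column(scd_du, air_mass_factor):
    """VCD = SCD / AMF, in the same units as the slant column."""
    if air_mass_factor <= 0:
        raise ValueError("air mass factor must be positive")
    return scd_du / air_mass_factor
```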
SEDRE+DIA: Overview DIA: Web-based System for Data Discovery, Integration and Analysis (Developed at Virginia Tech through NSF funding)
SESDI Data Registration Summary
• Semantic data framework technologies are changing the landscape of providing data to scientists
• Tools for data registration are soon to be available
• Applications to perform data integration mediated by semantics are available
• Initial results, applied to two volcanoes, led to a correlation between SO2 concentrations from the volcano and in the atmosphere, and to their relation to H2SO4
Volcano concept map after the workshop - some linked concepts are circled
Packages for an Ontology for Volcanoes
[Figure: package diagram. Imported existing ontologies (SWEET, GEON), including the NASA Semantic Web for Earth Science Units, Numerics, Physical Property, and Physical Phenomena ontologies. Packages: Planetary Structure, Geologic Time, Physical Properties, Planetary Material, GeoImage, Planetary Location, Planetary Phenomenon, Phenomenon, Material, Volcanic System, Data Types, Instruments, Climate.]
DOLCE ROCKS: Integrating Foundational and Geoscience Ontologies
Preliminary results for the integration of concepts from DOLCE, GeoSciML, and SWEET
Boyan Brodaric, Natural Resources Canada; Florian Probst, University of Muenster
From the SSKI Spring Symposium Series on Semantic Scientific Knowledge Integration
Outline • Foundational ontologies • DOLCE • DOLCE + GeoSciML • DOLCE + SWEET • DOLCE + GeoSciML + SWEET Brodaric / Probst – SSKI 2008
Foundational Ontologies: Contents
• General concepts and relations that apply in all domains: physical object, process, event, …; inheres, participates, …
• Rigorously defined: formal logic, philosophical principles, highly structured
• Examples: DOLCE, BFO, GFO, SUMO, CYC, (Sowa)
“…and then there was one…”
Foundational Ontologies. Purpose: help integrate domain ontologies.
[Figure: one foundational ontology linked to the Geophysics, Marine, Water, Planetary, Geology, Struc, and Rock domain ontologies.]
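One way to quantify the “…and then there was one…” slide, assuming each pair of domain ontologies would otherwise need its own alignment: direct pairwise integration of n ontologies needs n(n−1)/2 alignments, while aligning each one to a shared foundational ontology needs only n. For the seven domain ontologies pictured, that is 21 versus 7. The function names are hypothetical.

```python
def pairwise_alignments(n):
    # Every pair of domain ontologies aligned directly: n choose 2.
    return n * (n - 1) // 2


def hub_alignments(n):
    # Each domain ontology aligned once to the shared foundational ontology.
    return n


DOMAINS = ["Geophysics", "Marine", "Water", "Planetary", "Geology", "Struc", "Rock"]
```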
“…a place for everything, and everything in its place…”
Foundational Ontologies. Purpose: help organize domain ontologies.
[Figure: the domain terms shale, rock, formation, and lithification placed within a foundational ontology.]
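Organizing domain terms under a foundational ontology amounts to giving each term a place in one shared hierarchy, so general questions ("is this an endurant?") can be answered for every domain term. The placements below are illustrative assumptions: the DOLCE hierarchy is simplified (e.g. Stative between Perdurant and Process is omitted), and putting shale under rock as a subtype is one choice among several; the slides flag classification vs. quality vs. subtype as an open issue.

```python
# Hypothetical, simplified placement of domain terms (child -> parent).
PARENT = {
    "PhysicalEndurant": "Endurant",
    "AmountOfMatter": "PhysicalEndurant",
    "PhysicalObject": "PhysicalEndurant",
    "Process": "Perdurant",  # simplification: DOLCE places Process under Stative
    "rock": "AmountOfMatter",
    "shale": "rock",
    "formation": "PhysicalObject",
    "lithification": "Process",
}


def ancestors(term):
    """All categories above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, category):
    return term == category or category in ancestors(term)
```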
Problem scenario
• Little work has been done on linking foundational ontologies with geoscience ontologies
• Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
  • water budgets: groundwater (geology) and surface water (hydrology)
  • hazard risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economy)
  • health: toxic substances (geochemistry) and people, wildlife
  • many others…
DOLCE GeoSciML SWEET Project
• Objectives: evaluate the fit between DOLCE, GeoSciML, and SWEET; evaluate operational benefits, e.g. data discovery, integration, …
• Approach: extend DOLCE; do not alter GeoSciML or SWEET; use Protégé/OWL and SeReS
• Expected results: a unified, internally consistent ontology; increased ability to discover and integrate data
[Callout: focus of this talk]
[Figure: DOLCE 2.1 Lite-Plus, OWL 397. Endurant (Physical Endurant, Physical Object; e.g. rock body) participates in Perdurant (Process, Event, State; e.g. lithification event). Qualities inhere in particulars: color (Physical Quality) and age (Temporal Quality). Qualities are located in regions: color located in Munsell space (Physical Region, e.g. brown); age located in the GSA time scale (Abstract, Temporal Region, e.g. Ordovician).]
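The three DOLCE relations on the slide (participates, inheres, located-in) constrain what they may connect, which lets aligned geoscience data be checked. The sketch uses simplified signatures as an assumption (participates: Endurant to Perdurant; inheres: Quality to Endurant or Perdurant; located-in: Quality to Region), and the instance names are hypothetical; it is not the authors' OWL encoding.

```python
# Simplified class hierarchy (child -> parent), following the slide's labels.
SUPER = {
    "PhysicalObject": "PhysicalEndurant", "PhysicalEndurant": "Endurant",
    "Event": "Perdurant", "Process": "Perdurant",
    "PhysicalQuality": "Quality", "TemporalQuality": "Quality",
    "PhysicalRegion": "Region", "TemporalRegion": "Region",
}

# Assumed (domain, range) for each relation.
SIGNATURE = {
    "participates": ({"Endurant"}, {"Perdurant"}),
    "inheres": ({"Quality"}, {"Endurant", "Perdurant"}),
    "located-in": ({"Quality"}, {"Region"}),
}


def classes_of(cls):
    """The class itself plus all its superclasses."""
    out = {cls}
    while cls in SUPER:
        cls = SUPER[cls]
        out.add(cls)
    return out


def violations(triples, types):
    """Return triples whose subject or object falls outside the relation's signature."""
    bad = []
    for s, rel, o in triples:
        domain, range_ = SIGNATURE[rel]
        if not (classes_of(types[s]) & domain and classes_of(types[o]) & range_):
            bad.append((s, rel, o))
    return bad
```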
GeoSciML 2.0 beta
• GML-UML schema of basic geologic entities
• Focus: Geologic Unit, Earth Material
• Some classes, many relations (align relations)
DOLCE + GeoSciML (1)
• Benefits: full coverage of the GeoSciML fragment
• Issues: classification vs. quality vs. subtype; complex (non-unary) qualities; reference spaces (e.g. units of measure)
[Figure: GeoSciML classes aligned under the DOLCE categories Physical-Body, Amount-Of-Matter, Physical Quality, and Physical Region, connected by generic-constituent, has-quality, and q_location. GeoSciML classes shown: GeologicUnit, GeologicUnitType, EarthMaterial, CompoundMaterial, Rock, Lithology. Lithology has qualities ParticleType, FabricType, ProcessType, MineralClass, ChemicalClass. Example: Formation X; rock type Shale; particle type Grain; fabric type Foliated; process type Sedimentary; mineral class QAPF region; chemical class TAS region.]
CompoundMaterial (subclasses: Rock, UnconsolidatedMaterial)
• part: EarthMaterial
• plays: CompositionPart
• generic-constituent: Particle
• participant-in: GeologicProcess [0..*]
• host-of: Fabric [0..*]
• has_quality: Lithology [1..*]
• has_quality: CompositionCategory [0..*]
• has_quality: PhysicalQuality [0..*]
• has_quality: MetamorphicQuality [0..*]
• has_quality: ConsolidationDegree
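The cardinalities in the CompoundMaterial listing can be enforced on a description of a material. This sketch takes the bounds from the slide; ConsolidationDegree is left out because the slide gives it no cardinality. The dictionary-based description format and the function name are hypothetical.

```python
# (min, max) occurrences per (relation, target class); None means unbounded.
CARDINALITY = {
    ("participant-in", "GeologicProcess"): (0, None),
    ("host-of", "Fabric"): (0, None),
    ("has_quality", "Lithology"): (1, None),
    ("has_quality", "CompositionCategory"): (0, None),
    ("has_quality", "PhysicalQuality"): (0, None),
    ("has_quality", "MetamorphicQuality"): (0, None),
}


def check_cardinality(description):
    """description maps (relation, target class) -> list of values.

    Returns a list of human-readable problems (empty if all bounds hold)."""
    problems = []
    for (rel, target), (lo, hi) in CARDINALITY.items():
        count = len(description.get((rel, target), []))
        if count < lo or (hi is not None and count > hi):
            upper = "*" if hi is None else hi
            problems.append(f"{rel} {target}: {count} value(s), expected [{lo}..{upper}]")
    return problems
```

A rock described without any lithology fails the [1..*] bound, which is the kind of gap an internally consistent unified ontology is meant to catch.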