250 likes | 266 Views
A Semantically-Enabled Provenance-Aware Water Quality Portal. Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute Troy, NY, USA.
E N D
A Semantically-Enabled Provenance-Aware Water Quality Portal Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute Troy, NY, USA Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano
Introduction • Real Life Motivation Example: • In 2009, in Bristol County, Rhode Island, Children start getting sick with symptom like diarrhea. The cause was found to be polluted water. • Public concerns: “When did the contamination begin?”, “How did this happen?”, “How can we keep it from happening again?” • We need an environmental informatics systems that can automatically integrate and analyze water quality.
Challenges • Raw data from multiple sources and in different format – difficult to integrate and query. • Semantics of the water quality data are not explicitly encoded in the data – machine can’t process data automatically. • Large amount of data due to large spatial region, long time span, and large number of pollutants and regulated limit – analysis can be time consuming and complex.
TWC-SWQP • Identify point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA. • Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems. • Enable/Enpower citizens & scientists to better explore water related information.
System Architecture Virtuoso access
SemantAQUA Workflow Publish CSV2RDF4LOD Direct visualize derive derive integrate archive Archive CSV2RDF4LOD Enhance
Ontology • Core TWC Water ontology • Extends existing best practice ontologies, e.g. SWEET, OWL-Time. • Includes terms for relevant pollution concepts • Can use to conclude: “any water source that has a measurement outside of its allowable range” is a polluted water source. Portion of the TWC Water Ontology.
Ontology • Regulation Ontology • model the federal and state water quality regulations for drinking water sources • Can use to define: for example, in California, “any measurement has value 0.01 mg/L is the limit for Arsenic” • Combine with core ontology, we can infer “any water source contains 0.01 mg/L of Arsenic is a polluted water source.” Portion of Cal. Regulation Ontology.
Provenance • Preserves provenance in the Proof Markup Language (PML). • Data Source Level Provenance: • Thecaptured provenance data are used to support provenance-based queries. • Reasoning level provenance: • When water source been marked as polluted, user can access supporting provenance data for the explanations including the URLs of the source data, intermediate data and the converted data.
Visualization • Presents analyzed results with Google Map • Presents explanation of water source pollution • Presents possible health effect of contaminant • Presents “Facet” type filter to select type of data • Presents link to the authority, where user can report problems. 5 4 3 2 1 http://was.tw.rpi.edu/swqp/map.html
Visualization • Time series Visualization: • Presents data in time series visualization for user to explore and analyze the data Violation, measured value: 50 Limit value: 15 http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=3&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110000312135
Data • EPA Data: • Provides measurements of pollutants in the water discharged by the facilities, and also the threshold values for up to five test types for each pollutant. • USGS Data: • Provides measurements of substances contained in water samples collected at USGS data-collection stations • Regulation Data: • Provides lists of pollutants and their maximum contaminant level
Selected Follow-up options Violation Limit
Results • Semantic Data Integration provides an effective and low cost approach for integrating data from various sources. • SWQP integrates data from various sources, including EPA, USGS, and state governments. • Linking to external data: “twcwater:Arsenic”, linked to “dbpedia:Arsenic” using owl:sameAs. • We have generated 89.58 million triples for the USGS datasets and 105.99 million triples for the EPA datasets. Requires only 2-person days.
Results • Query and reasoning supported by semantic technologies improves responsiveness and simplifies the development of web applications. • SPARQL queries narrows down the data, we can reason over only the relevant data on one selected regulation. • Reasoning eases the complexity of queries a developer needs to write for software applications.
Results • Provenance information encoded using semantic web technology supports transparency and trust. • SWQP provides detailed provenance information: • Original data, intermediate data, data source • “What if” Senario: user may trust data from certain authorities only. • User can apply a stricter regulation from another state to a local water source.
Discussion • Future Work • Expand SWQP to support all 50 states. • Add flood/weather information, and their effect on water sources • model the health effects from exposure to the excessive pollutants in water and support reasoning over these effects. • Expand SWQP to other environmental topics: soil quality, air quality • Get community involved: user can put comment on each water source, or report problem to the authorities.
Conclusion • SWQP is a web portal that allows citizens and professionals to easily explore water quality information. • SWQP illustrated benefits of applying semantic web technologies to water quality research. • Data integration, provenance, automatic reasoning. • Architecture of SWQP can be easily apply to other environment topics • Air quality, soil quality, etc.
Questions? http://tw.rpi.edu/web/project/SemantAQUA http://inference-web.org/wiki/Semantic_Water_Quality_Portal
Related work • Other work focuses on facilitating water quality management [13, 14] and wastewater treatment [15] via knowledge sharing and reuse. • [13] presents system that integrates water quality data from multiple sources and retrieves data using semantic relationships among data. • [14] presented an ontology-based Knowledge Management system (KMS) that can be integrated into the numerical flow and water quality modeling to provide assistance on the selection of a model and its pertinent parameters • [15] is an environmental decision-support system for wastewater management, which augments classic rule-based and case-based reasoning with a domain ontology. • SWQP: • SWQP differs from these projects in that it supports provenance based query. • SWQP is built upon standard semantic technologies (e.g. OWL, SPARQL, Pellet, Virtuoso) and thus can be easily replicated or expanded.
Queries for result 2 SELECT * WHERE { ?watersource twcwater:hasMeasurement ?measurement. ?measurement twcwater:hasValue ?value; twcwater:hasCharacteristic ?charactericsitc; twcwater:hasUnit ?unit. (1) ?regulation twcwater:hasValue ?limit; twcwater:hasCharacteristic ?characteristic; twcwater:hasUnit ?unit. ?watersource geo:lat ?lat; geo:long ?long. FILTER( ?value > limit ) } SELECT * WHERE { ?watersource rdf:type twcwater:pollutedWaterSource. geo:lat ?lat; (2) geo:long ?long. }
New Ontology • New Regulation ontology • Reuse sweet:Measurement instead of use owl:sameAs • Defines cardinality restriction • Defines Datatype restriction Portion of the EPA regulation ontology
New Ontology • TWC Environment Monitoring Ontology • Can be extended to use different regulation • Uses sweet ontology • More general ontology: aim for not just monitoring water, but anything relate to environment: air quality.