Explore how the Semantic Water Quality Portal integrates diverse data sources, facilitates automatic analysis of polluted water sources, and enhances user interaction for mitigating environmental risks and health hazards. Discover the potential of semantic web technology in solving water quality challenges.
Semantic Water Quality Portal Jin Guang Zheng and Ping Wang Tetherless World Constellation
Outline • Introduction • Methods • System Architecture • Ontology • Provenance • Visualization • Demo • Claims • Conclusion • Improvements • Future Work • Contributions
Introduction • Semantic Water Quality Project • Continuing the SWQP project from last semester’s Semantic eScience class • Goal: help citizens identify polluted water sources and potential pollution sources, thereby alleviating or controlling adverse health effects. • Credits: Evan, Theodora, Ping, Jin
Motivation Use Case • Use Case: • Children start getting sick: vomiting • Residents ask the authority to check the water supply. • Authority collects data from various sources: EPA, USGS, state regulations, etc. • Authority analyzes the data • Authority reports the analyzed results • And more …
SWQP • Semantic Water Quality Portal can ease the process: • Integrate data from various sources • Perform automatic analysis (reasoning) to find polluted water sources and possible sources of pollutants: facilities that violate regulations • Present the analyzed results in a user-friendly interface.
Research Question How can we use semantic web technology to solve environment-related problems?
Ontology • Two types of ontology: • Core Ontology • Encodes the main inference and reasoning rules • Regulation Ontologies • Encode regulations from different states • Reasoning Example: • “any water source that has a measurement over a certain threshold is a polluted water source” (core ontology) • “any measurement with a value of 0.01 mg/l of arsenic is a threshold” (regulation ontology) • “any water source that contains arsenic above 0.01 mg/l is a polluted water source.” (inferred from the above rules)
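The split between a core rule and per-regulation thresholds can be sketched in plain Python. This is only an illustration, not the portal's ontology: the names `REGULATIONS`, `RegulationA`, `RegulationB`, and `is_polluted` are hypothetical, the 0.005 mg/l limit is made up, and "over the threshold" is assumed to mean a strict comparison.

```python
# Hypothetical illustration; this is not the SWQP ontology or its data.

# "Regulation ontologies": one table of thresholds (mg/l) per regulation.
REGULATIONS = {
    "RegulationA": {"Arsenic": 0.01},   # the 0.01 mg/l value quoted on the slide
    "RegulationB": {"Arsenic": 0.005},  # invented stricter limit, for contrast
}


def is_polluted(measurements, regulation):
    """Core rule: a source is polluted if any measurement is over its threshold."""
    thresholds = REGULATIONS[regulation]
    return any(
        contaminant in thresholds and value > thresholds[contaminant]
        for contaminant, value in measurements
    )


# One water source, described by (contaminant, value in mg/l) pairs.
site = [("Arsenic", 0.008)]
print(is_polluted(site, "RegulationA"))  # False
print(is_polluted(site, "RegulationB"))  # True
```

The core rule never changes; swapping the regulation table is enough to get a different classification for the same measurements.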
Provenance • Data-Level Provenance • Where are the original data? • Provide provenance-based queries. • Application-Level Provenance • What data did we use in the analysis and reasoning steps? • Provide an explanation to the user when a water source is marked as a polluted water source
Visualization • Map Visualization: • Presents analyzed results with Google Maps • Polluted Water Source, Polluting Facility • Presents an explanation of why a water source is marked as polluted • Uses a “facet”-type filter to select the type of data • Trend Visualization: • Presents data as trends for the user to explore and analyze.
Claim I - Problem • Problem: • Data are collected from various sources: • EPA, USGS, etc. • Heterogeneous Data: • Difficult to query • Data are stored using different schemas, and the semantics of the terms in different schemas can differ widely
Claim I Semantic data integration helps SWQP integrate data from various sources, eases future data integration, and makes it easier to use existing reasoners to perform reasoning.
Claim I Example • Various Data Sources: • Convert into RDF and load into a triple store. • Use SPARQL to query the data • Use the EPA ontology as the central schema to encode the converted data • Easier future data integration: • Easier to accommodate schema changes: add equivalence statements, new properties, new classes, etc. • Easier to use existing reasoners: • Jena, Pellet, etc.
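The integration idea above can be sketched without any RDF library: rows from two differently shaped sources are mapped onto shared predicates and stored as (subject, predicate, object) triples, so one query pattern covers both. All field names, predicates, and values here are invented for illustration; the actual portal uses RDF, a triple store, and SPARQL rather than this toy `match` helper.

```python
# Two hypothetical source records with different schemas (field names invented).
epa_row = {"facility": "F-1", "pollutant": "Arsenic", "amount_mg_l": 0.02}
usgs_row = {"site_no": "S-9", "param": "Arsenic", "result_va": 0.004}

# Per-source mapping from local field names to shared predicates.
# Accommodating a schema change means adding or editing an entry here.
MAPPINGS = {
    "epa": {"facility": "measuredAt", "pollutant": "hasElement", "amount_mg_l": "hasValue"},
    "usgs": {"site_no": "measuredAt", "param": "hasElement", "result_va": "hasValue"},
}


def to_triples(source, row_id, row):
    """Convert one source row into triples that use the shared predicates."""
    subject = f"{source}:{row_id}"
    return [(subject, MAPPINGS[source][field], value) for field, value in row.items()]


# A set of triples stands in for the triple store.
store = set(to_triples("epa", "m1", epa_row)) | set(to_triples("usgs", "m1", usgs_row))


def match(store, predicate, obj=None):
    """Return subjects with the given predicate (and object, if given)."""
    return sorted(s for s, p, o in store if p == predicate and (obj is None or o == obj))


print(match(store, "hasElement", "Arsenic"))  # ['epa:m1', 'usgs:m1']
```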
Claim II - Problem • Problem: • Determining whether a water source is polluted can be complex and time-consuming. • Example: • 10 contaminants in a water source. • Each contaminant has been measured 10 times. • There are 50 regulation limits.
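One way to read the numbers in this example is as an upper bound on manual comparisons. The sketch below assumes the worst case, where every measurement is checked against every limit; in practice only the limits for the same contaminant would apply, so the real count would be lower.

```python
contaminants = 10
measurements_per_contaminant = 10
regulation_limits = 50

measurements = contaminants * measurements_per_contaminant
# Worst case: every measurement is compared against every limit.
worst_case_checks = measurements * regulation_limits

print(measurements, worst_case_checks)  # 100 5000
```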
Claim II Automatic inference and reasoning supported by semantic web technologies help SWQP analyze water quality automatically.
Claim II Example • Reasoning and Inference: • Identify that the measured object is a water source • Find all measurements for the water source • Validate that each measurement measures a water contaminant. • Reason about whether the measurement exceeds the threshold • What element? What unit? What value? • Identify the type of water source: polluted?
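The reasoning steps above can be walked through in ordinary Python. This is a procedural sketch under stated assumptions, not the Jena/Pellet reasoner the project uses: the `Measurement` class, the site names, the readings, and the `classify` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    site: str
    element: str
    unit: str
    value: float


# All of the data below is invented for illustration.
WATER_SOURCES = {"well-1", "stream-2"}
THRESHOLDS = {("Arsenic", "mg/l"): 0.01}  # (element, unit) -> limit

MEASUREMENTS = [
    Measurement("well-1", "Arsenic", "mg/l", 0.03),
    Measurement("well-1", "Temperature", "degC", 12.0),  # not a regulated contaminant
    Measurement("stream-2", "Arsenic", "mg/l", 0.002),
    Measurement("stack-7", "Arsenic", "mg/l", 0.5),      # not a water source
]


def classify(site):
    # Step 1: the measured object must be a water source.
    if site not in WATER_SOURCES:
        return None
    # Step 2: find all measurements for the water source.
    readings = [m for m in MEASUREMENTS if m.site == site]
    # Step 3: keep measurements of regulated contaminants (element and unit must match).
    regulated = [m for m in readings if (m.element, m.unit) in THRESHOLDS]
    # Step 4: compare each value with its threshold.
    violations = [m for m in regulated if m.value > THRESHOLDS[(m.element, m.unit)]]
    # Step 5: decide the type of the water source.
    return "polluted" if violations else "not polluted"


for site in ["well-1", "stream-2", "stack-7"]:
    print(site, classify(site))
```

Keying thresholds on both element and unit is a deliberate choice here: a value in a different unit is never compared against the limit by accident.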
Claim III - Problem • Problem: • Users may not trust the analyzed results presented by SWQP. • I don’t think the Hudson River has been polluted. • Users may trust data from certain sources only. • I don’t trust the data collected by a student for his class project.
Claim III Provenance information encoded with semantic web technology helps SWQP solve trust-related problems.
Claim III Example • Data-Source-Based Query: • Users can select which data are analyzed. • Data Source Provenance • Explanation of a polluted water source: • Pop-up window shows the regulation used and the measured value
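A small sketch of how source-based selection and explanations might fit together: each record carries its source, the user chooses which sources to trust, and every violation is reported with the regulation and the measured value. The record layout, source names, `LIMIT`, and `explain` are hypothetical and do not reflect the portal's actual provenance encoding.

```python
# Invented records; each one carries the source it came from.
RECORDS = [
    {"site": "river-1", "element": "Arsenic", "value": 0.02, "unit": "mg/l", "source": "agency-A"},
    {"site": "river-1", "element": "Arsenic", "value": 0.004, "unit": "mg/l", "source": "class-project"},
]

# A single invented limit, tagged with the regulation it comes from.
LIMIT = {"element": "Arsenic", "value": 0.01, "unit": "mg/l", "regulation": "regulation-X"}


def explain(site, trusted_sources):
    """Return one explanation line per violation found in the trusted data."""
    lines = []
    for r in RECORDS:
        if r["site"] != site or r["source"] not in trusted_sources:
            continue  # skip other sites and data the user does not trust
        same_quantity = r["element"] == LIMIT["element"] and r["unit"] == LIMIT["unit"]
        if same_quantity and r["value"] > LIMIT["value"]:
            lines.append(
                f"{r['element']} {r['value']} {r['unit']} (source: {r['source']}) "
                f"exceeds {LIMIT['value']} {LIMIT['unit']} under {LIMIT['regulation']}"
            )
    return lines


print(explain("river-1", {"agency-A"}))
print(explain("river-1", {"class-project"}))  # []
```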
Improvements • More data: • Regulation data from CA, NY, MA, EPA • EPA and USGS data for multiple states • Provenance data are captured for both the regulation data and the EPA and USGS data. • More Features: • Provenance-based data query and analysis • Trend visualization • Speed: • ~ 15 – 30 seconds. • Main drawback now: real-time inference and reasoning over a large volume of data
Future Work • Provenance: • Support building, linking and displaying proof traces that track how the answers are derived from source data. • Health-Related Reasoning: • Model the effects of drinking from a polluted water source. • Identify which polluted water sources cause people to vomit more quickly. • Flood Reasoning: • Model floods • Identify which water sources will flood with high probability • Identify possible effects of floods on water quality • Other Work: • Pollutant-based queries: e.g. for a user interested in arsenic
Contributions • Ping: • Use Tim’s converter to convert EPA and USGS data. • Preprocess regulation data into CSV format • Implement the data visualization part of the project • Write part of this final class write-up, and present the visualization part of the demo. • Jin: • Write a script to convert data to RDF encoded using the ontology • Design the ontology to support automatic reasoning and inference • Re-implement the Jena-Pellet-based backend reasoner. • Class-related work: since this is an out-of-class project for Ping, I am responsible for most of the project-related write-up, presentation, etc.
Questions Thank you for your attention!