
Next Generation Environmental Informatics as exemplified by the Tetherless World Semantic Water Quality Portal



Ping Wang¹ (wangp5@rpi.edu), Jin Guang Zheng¹ (zhengj3@rpi.edu), Linyun Fu¹ (ful2@rpi.edu), Evan W. Patton¹ (pattoe@rpi.edu), Timothy Lebo¹ (lebot@rpi.edu), Li Ding¹ (dingl@cs.rpi.edu), Joanne S. Luciano¹ (jluciano@cs.rpi.edu), and Deborah L. McGuinness¹ (dlm@cs.rpi.edu)
¹Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, United States

Motivation
In late 2009, E. coli contaminated the public water supply in Bristol County, RI, causing illness in the population, particularly among young children. Residents requested information about when the contamination began, how it happened, and what measures were being taken to monitor it and prevent future occurrences. That event reflected the increasing demand for direct and transparent access to ecological and environmental information, and it inspired the Semantic Water Quality Portal (SemantAqua) project.

Location-based Information Retrieval
Users enter a ZIP Code™ to identify the area for their search. SemantAqua uses GeoNames to look up additional information, such as city and state, and generates location-based queries over the USGS and EPA datasets. The mobile interface also uses the W3C Geolocation API to find polluted sites near the user.

Enabling Context-Sensitive Actions
To help users take an active role in monitoring water quality where they live, SemantAqua attempts to identify useful links where users can report problems with their local water supplies. Currently, the portal supports reporting to the EPA and to some state agencies responsible for environmental preservation and protection (e.g., the California Department of Fish and Game). Work to identify the appropriate external authorities that accept reports within their jurisdictions is ongoing.

Provenance-based Query
SemantAqua captures provenance information during the data integration stages and encodes it using the Proof Markup Language version 2 (PML 2) Provenance Interlingua. This provenance supports provenance-based queries: for example, users can select and inspect data source information so that they rely only on data from sources they trust. This will become particularly important as the portal expands to include more varied sources of data (see Future Work).

Using Ontologies as Facets
Regulations are encoded as ontologies, so each regulation ontology provides a different view of the same data. Users can select from several regulation ontologies to classify the data, which lets them see the differences between state regulations and the federal regulations set by the EPA. In addition, type information from the water ontology, which describes the different types of measurement sites and their polluted counterparts, gives the user some control over what is displayed on the map.
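The idea of treating each regulation as a selectable view can be illustrated with a small sketch. It assumes that a regulation ontology can be flattened into a map from water characteristic to maximum allowed value, and that a site counts as polluted when any of its measurements exceeds the selected limit. The names `Measurement`, `REGULATIONS`, and `polluted_sites`, the regulation keys, and all limit values are hypothetical placeholders (not real EPA or state limits), and plain Python stands in for the OWL 2 classification the portal itself performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    site: str
    characteristic: str
    value: float  # assumed to be in the same unit as the regulation's limit


# Hypothetical regulation "ontologies", flattened to characteristic -> maximum
# allowed value. Placeholder numbers only; these are NOT real regulatory limits.
REGULATIONS = {
    "federal": {"lead": 15.0, "arsenic": 5.0},
    "state_x": {"lead": 10.0, "arsenic": 5.0},
}


def polluted_sites(measurements, regulation):
    """Return the sites with at least one measurement above the selected
    regulation's limit; characteristics the regulation does not cover are ignored."""
    limits = REGULATIONS[regulation]
    return {
        m.site
        for m in measurements
        if m.characteristic in limits and m.value > limits[m.characteristic]
    }


measurements = [
    Measurement("site-1", "lead", 12.0),
    Measurement("site-2", "lead", 8.0),
    Measurement("site-3", "arsenic", 6.0),
]

# The same data yields different polluted sets under different regulations.
print(sorted(polluted_sites(measurements, "federal")))  # ['site-3']
print(sorted(polluted_sites(measurements, "state_x")))  # ['site-1', 'site-3']
```

Changing the `regulation` argument plays the role of choosing a different regulation ontology in the portal: the measurements stay the same and only the classification changes.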
Next Generation Environmental Informatics
Starting with the domain of water quality, we are investigating a general framework called SemantEco that can support dynamic environmental informatics portals through semantically enabled approaches, including:
• capture of domain-knowledge semantics using a family of modular, simple OWL 2 ontologies;
• integration of environmental monitoring and regulation data from multiple sources following Linked Data principles;
• preservation of provenance metadata using the Proof Markup Language (PML) version 2;
• inference of environmental pollution events using OWL 2 inference.
Combined with distributed sensor networks and incremental OWL 2 classification, this work could provide a scaffold for near-real-time reporting of pollution events in communities.

More Customized Queries
The Characteristic, Health Concern, and Time Frame facets let users further customize their queries so that they can ask the questions most relevant to their interests, for example:
• What sites/facilities in this area are polluted with specific contaminants, e.g., fecal coliform or lead?
• What polluted sites/facilities are contaminated with pollutants that could cause particular symptoms or health problems, e.g., diarrhea?
• What sites/facilities were polluted in the past two years?

Connecting to Health Issues
To help citizens investigate the health impacts of water pollution, SemantAqua links water quality data to known health considerations. We have generated an initial ontology describing potential health impacts of overexposure to contaminants, with initial content taken from the EPA. For example, exposure to E. coli results in abdominal cramping and diarrhea and, if left untreated, can result in high blood pressure and kidney damage.
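The three example questions correspond to setting one facet at a time, and facets can also be combined. The sketch below models that filtering in plain Python over a few made-up pollution events; in the portal the equivalent filtering happens in queries over the triple store. `PollutionEvent`, `HEALTH_EFFECTS`, `query_sites`, the sites, and the dates are all hypothetical, and the only health-effect entry is the E. coli example from the health discussion.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PollutionEvent:
    site: str
    contaminant: str
    observed: date


# Hypothetical excerpt of a health-effects ontology, flattened to
# contaminant -> effects. Only the E. coli entry comes from the poster's example.
HEALTH_EFFECTS = {
    "E. coli": {"abdominal cramping", "diarrhea", "high blood pressure", "kidney damage"},
}


def query_sites(events, characteristics=None, health_concern=None, since=None):
    """Return sites whose pollution events satisfy every facet that is set.

    A facet left as None is not applied, mirroring an unselected facet.
    """
    sites = set()
    for e in events:
        if characteristics is not None and e.contaminant not in characteristics:
            continue
        effects = HEALTH_EFFECTS.get(e.contaminant, set())
        if health_concern is not None and health_concern not in effects:
            continue
        if since is not None and e.observed < since:
            continue
        sites.add(e.site)
    return sites


events = [
    PollutionEvent("site-A", "fecal coliform", date(2010, 6, 1)),
    PollutionEvent("site-B", "E. coli", date(2009, 11, 15)),
    PollutionEvent("site-C", "lead", date(2007, 3, 2)),
]

# Characteristic facet: polluted with fecal coliform or lead?
print(sorted(query_sites(events, characteristics={"fecal coliform", "lead"})))  # ['site-A', 'site-C']
# Health Concern facet: pollutants that could cause diarrhea?
print(sorted(query_sites(events, health_concern="diarrhea")))  # ['site-B']
# Time Frame facet: with a hypothetical "today" of 2011-12-01, two years back is 2009-12-01
print(sorted(query_sites(events, since=date(2009, 12, 1))))  # ['site-A']
```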
This health information is presented to the user together with the pollution details (see Data Presentation) and is also used to customize information retrieval (see More Customized Queries).

Data Presentation
Different icons distinguish polluted sites from clean sites. Clicking a polluted site displays a popup window with details about the pollution events: names of contaminants, measured values, limit values, time of measurement, and health effects.

SemantAqua Workflow
[Figure: SemantAqua workflow diagram; its stage labels include Archive, CSV2RDF4LOD (Direct, Enhance), Integrate, Derive, Reason, Publish, and Visualize.]

Time Series Visualization
The time series visualization retrieves the water quality data related to a selected water site or facility by querying the triple store and displays the data as a time series. The user selects a particular permit for a facility, the characteristic of the water, and the test type (if any) associated with that characteristic. For the EPA data there are up to five test types, which take measurements in different ways and compute the limits differently: Quantity Average, Quantity Max, Concentration Min, Concentration Average, and Concentration Max. The example visualization on the poster shows the quality of the water released by the Southeast Water Pollution Control Plant in San Francisco: enterococci measurements are plotted in green and the regulation-defined limit in blue, and three severe violations (in red) occurred during 2009 and 2010. Access to such information can help citizens become better informed and ask the state administrator to improve the handling of water at local facilities.

Visit our project page at: http://tw.rpi.edu/web/project/SemantAQUA

Future Work
Currently, twenty-seven of the fifty states have been encoded in RDF using the SemantEco and SemantAqua ontologies, and work continues on converting the remaining states.
The current portal contains regulatory information for four of the fifty states. An effort is underway to encode regulations from additional states and to identify which states simply defer to the EPA for particular pollutants, since the EPA regulations have already been encoded. In addition, linking contaminants to external resources such as DBpedia, and to symptom and health-effect information from sources such as WebMD, will provide the data needed to answer more interesting questions about the health impacts of pollution. We have also initiated work on linking to federal and state reporting systems so that users can report potential issues in their neighborhoods, making the portal a helpful tool for enacting environmental change. Lastly, we plan to augment the portal to generate reports of users' query results, which could contain the query specification, identified pollution events, relevant converted and source data, and the provenance of these data. These reports can be useful when users present their findings to authorities or environmental organizations.

Poster: IN31B-1438

Glossary:
EPA – U.S. Environmental Protection Agency
MPN – Most Probable Number
PML 2 – Proof Markup Language (PML) version 2
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
USGS – United States Geological Survey

Try it out: http://aquarius.tw.rpi.edu/projects/semantaqua/
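The Future Work plan to record which states defer to the EPA suggests a simple lookup rule: use a state's own limit when one is encoded, and fall back to the federal limit otherwise. The following is a minimal sketch of that rule under assumed data structures. `EPA_LIMITS`, `STATE_LIMITS`, `effective_limit`, the state keys, and every value are hypothetical; the poster does not say how the portal will implement the fallback.

```python
from typing import Optional

# Hypothetical encoded limits (placeholder numbers, not real regulations).
EPA_LIMITS = {"lead": 15.0, "arsenic": 5.0, "nitrate": 50.0}
STATE_LIMITS = {
    "state_x": {"lead": 10.0},  # stricter state rule for lead only
    "state_y": {},              # defers to the EPA for every pollutant
}


def effective_limit(state: str, characteristic: str) -> Optional[float]:
    """Return the state's own limit if encoded, else the EPA limit, else None.

    A state missing from STATE_LIMITS is treated as deferring to the EPA.
    """
    state_rules = STATE_LIMITS.get(state, {})
    if characteristic in state_rules:
        return state_rules[characteristic]
    return EPA_LIMITS.get(characteristic)


print(effective_limit("state_x", "lead"))     # 10.0 (state-specific)
print(effective_limit("state_x", "arsenic"))  # 5.0 (falls back to EPA)
print(effective_limit("state_y", "lead"))     # 15.0 (state defers to EPA)
print(effective_limit("state_y", "mercury"))  # None (no limit encoded)
```

One design caveat: this sketch treats a state that has not been encoded yet the same as one that explicitly defers to the EPA. A real system would likely want to keep those two cases distinct.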
