
Semantics Technology Demonstration Ecoinformatics International Technical Collaboration 

Semantics Technology Demonstration Ecoinformatics International Technical Collaboration  April 9, 2008 Research Triangle Park, North Carolina, USA. Bruce Bargmeyer Lawrence Berkeley National Laboratory and University of California, Berkeley Tel: +1 510-495-2905 bebargmeyer@lbl.gov.


Presentation Transcript


  1. Semantics Technology Demonstration Ecoinformatics International Technical Collaboration  April 9, 2008 Research Triangle Park, North Carolina, USA Bruce Bargmeyer Lawrence Berkeley National Laboratory and University of California, Berkeley Tel: +1 510-495-2905 bebargmeyer@lbl.gov

  2. Topics • Describe challenges to be addressed • Describe the demo scenarios • Describe the initial demo • Describe the technology/infrastructure • Discuss Collaboration

  3. Challenge: Access Dispersed Data. Convey a common understanding of meaning between data creators and data users. [Figure: data creators (EEA, USGS, DoD, EPA, others) each hold text and numeric data in their own information systems, labeled with topic terms in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) or Spanish (ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aire). Users need a common interpretation of what the data represents.]

  4. Challenge: Combine Data, Metadata & Concept Systems • Inference search query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003” • Concept system: Contamination, with narrower concepts Biological, Radioactive, and Chemical (mercury, lead, cadmium) • [Figure panels: Concept system, Data, Metadata]
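A minimal sketch of how such an inference query might combine the three ingredients, assuming a tiny in-memory concept system, a "flows into" relation held as metadata, and a handful of measurements. Every water body other than Fletcher Creek, all measurement values, and all function names are invented for illustration; this is not how XMDR itself is implemented.

```python
from dataclasses import dataclass
from datetime import date

# Concept system: narrower concept -> broader concept (fragment from the slide).
BROADER = {
    "mercury": "chemical",
    "lead": "chemical",
    "cadmium": "chemical",
    "chemical": "contamination",
    "biological": "contamination",
    "radioactive": "contamination",
}

# Metadata: which water body flows into which (hypothetical names).
FLOWS_INTO = {"Fletcher Creek": "Mill River", "Mill River": "Lake Example"}


@dataclass
class Measurement:
    water_body: str
    contaminant: str
    micrograms_per_liter: float
    sampled_on: date


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is narrower than it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def downstream_of(water_body):
    """Water bodies reached by following FLOWS_INTO from `water_body`."""
    found = []
    current = FLOWS_INTO.get(water_body)
    while current is not None and current not in found:
        found.append(current)
        current = FLOWS_INTO.get(current)
    return found


def contaminated_downstream(data, source, kind, threshold, start, end):
    """Downstream water bodies with a matching measurement above threshold."""
    candidates = set(downstream_of(source))
    hits = {
        m.water_body
        for m in data
        if m.water_body in candidates
        and is_a(m.contaminant, kind)
        and m.micrograms_per_liter > threshold
        and start <= m.sampled_on <= end
    }
    return sorted(hits)


# Data: invented sample measurements.
DATA = [
    Measurement("Mill River", "lead", 14.0, date(2002, 6, 1)),
    Measurement("Lake Example", "mercury", 4.0, date(2002, 7, 15)),
    Measurement("Lake Example", "cadmium", 12.5, date(2004, 1, 10)),
    Measurement("Fletcher Creek", "lead", 30.0, date(2002, 2, 1)),
]

result = contaminated_downstream(
    DATA, "Fletcher Creek", "chemical", 10.0, date(2001, 12, 1), date(2003, 3, 31)
)
print(result)
```

The point of the sketch is that the query term "chemical" never appears in the data; the match on "lead" comes only from the concept system, and "downstream" comes only from the metadata.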

  5. Challenge: Use data from systems that record the same facts with different terms • Reduce the human toil of drawing information together and performing analysis -> shift to computer processing.

  6. Same Fact, Different Terms
  Data Element Concept • Name: Country Identifiers • Context: • Definition: • Unique ID: 5769 • Conceptual Domain: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe • Maintenance Org.: • Steward: • Classification: • Registration Authority: • Others
  Data Elements • Name: • Context: • Definition: • Unique ID: 4572 • Value Domain: • Maintenance Org.: • Steward: • Classification: • Registration Authority: • Others
  Value domains for the same concept:
  • ISO 3166 English Name: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
  • ISO 3166 French Name: L'Algérie, Belgique, Chine, Danemark, Egypte, La France, . . ., Zimbabwe
  • ISO 3166 2-Alpha Code: DZ, BE, CN, DK, EG, FR, . . ., ZW
  • ISO 3166 3-Alpha Code: DZA, BEL, CHN, DNK, EGY, FRA, . . ., ZWE
  • ISO 3166 3-Numeric Code: 012, 056, 156, 208, 818, 250, . . ., 716
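As a rough illustration of why registering the value domains together matters, the sketch below translates a value from one domain to another using only the rows shown on the slide. The domain keys and the `translate` helper are hypothetical names chosen for this example, not part of any registry API.

```python
# Each column is one value domain from the slide; each row is one country.
VALUE_DOMAINS = ("english_name", "french_name", "alpha2", "alpha3", "numeric")

ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]


def build_index(rows):
    """Map (value domain, value) to the full row it belongs to."""
    index = {}
    for row in rows:
        for domain, value in zip(VALUE_DOMAINS, row):
            index[(domain, value)] = row
    return index


INDEX = build_index(ROWS)


def translate(value, source, target):
    """Return the `target`-domain value for `value`, or None if unknown."""
    row = INDEX.get((source, value))
    if row is None:
        return None
    return row[VALUE_DOMAINS.index(target)]


print(translate("DK", "alpha2", "numeric"))
print(translate("Chine", "french_name", "alpha3"))
```

Numeric codes are kept as strings so that leading zeros such as "012" survive.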

  7. Demo with Microsoft eScience • Collaborate with Microsoft Research, San Francisco Office • Collaboration already ongoing with LBNL and UCB, Berkeley Water Center. • Somewhat like Hydroseek, but with XMDR for concept systems and metadata • Hydroseek accesses EPA STORET and USGS NWIS (Water Data)

  8. Scenarios • Scenario 1 – Semantics-enabled data access • Semantics-enabled access to data and metadata that may serve as an indicator (or as input to a more complex indicator) • Scenario 2 – Data harmonization • People from different states or countries (political jurisdictions) are interested in water quality. They want to develop a particular indicator of interest based on data that crosses political jurisdictions.

  9. Scenarios (Continued) • Scenario 3 – Simulation models • Use XMDR to document parameters (input data, output data, initialization parameters, etc.) for water, air, and subsurface models, so as to support remote simulation model integration. • If a user draws a box around some geography, they can see whether any models have been run for that area.
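The "box around some geography" lookup could be sketched as a bounding-box intersection over registered model runs. The record layout, model names, and coordinates below are assumptions made for illustration; the sketch also ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True if the two boxes overlap or touch."""
        return (
            self.west <= other.east
            and other.west <= self.east
            and self.south <= other.north
            and other.south <= self.north
        )


@dataclass(frozen=True)
class ModelRun:
    model: str       # hypothetical model name
    medium: str      # water, air, or subsurface
    inputs: tuple    # documented input data
    outputs: tuple   # documented output data
    extent: BoundingBox


# Invented registry entries.
RUNS = [
    ModelRun("hypothetical-river-model", "water", ("streamflow",),
             ("nitrate load",), BoundingBox(-80.0, 35.0, -78.0, 36.5)),
    ModelRun("hypothetical-plume-model", "air", ("wind field",),
             ("ozone concentration",), BoundingBox(-123.0, 37.0, -121.0, 38.5)),
]


def runs_in(box, runs=RUNS):
    """Model runs whose documented extent overlaps the user's box."""
    return [run for run in runs if run.extent.intersects(box)]


query = BoundingBox(-79.5, 35.5, -78.5, 36.0)
found = [run.model for run in runs_in(query)]
print(found)
```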

  10. Scenario 1: Semantics-enabled access to data and metadata that may serve as an indicator (or as input to a more complex indicator) • A person uses concept systems to find variables of interest, accesses the data for the variables, and views metadata describing the data. • Use the concept system to identify possible variables that have data for a specific time and geographic coverage. • Use the concept system to create a query to access data from multiple sources. • Access/obtain the data. • The system performs mediation of results from different result sets (simple transformations based on information in the metadata registry). • Display data with links to metadata. • The user can go to the metadata to better understand the data, e.g., provenance, measurement units, collection methods.
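The mediation step could look roughly like the sketch below: result sets from two sources are merged by looking up each variable's concept and unit in registry metadata and converting to a common unit. The registry entries, variable names, and sample values are hypothetical; only the idea of "simple transformations based on information in the metadata registry" comes from the slide.

```python
# Conversion factors to a common unit (micrograms per liter).
TO_UG_PER_L = {"ug/L": 1.0, "mg/L": 1000.0}

# Hypothetical registry metadata: (source, variable) -> concept and unit.
REGISTRY = {
    ("EPA", "Lead, total"): {"concept": "lead", "unit": "ug/L"},
    ("USGS", "Pb"): {"concept": "lead", "unit": "mg/L"},
}


def mediate(result_sets):
    """Merge per-source rows into one list with a shared concept and unit."""
    merged = []
    for source, rows in result_sets.items():
        for variable, site, value in rows:
            meta = REGISTRY[(source, variable)]
            merged.append({
                "concept": meta["concept"],
                "site": site,
                "value_ug_per_l": value * TO_UG_PER_L[meta["unit"]],
                # Key back into the registry so the display can link to metadata.
                "metadata_key": (source, variable),
            })
    return merged


# Invented result sets, one per source: (variable, site, value).
RESULTS = {
    "EPA": [("Lead, total", "site-A", 12.0)],
    "USGS": [("Pb", "site-B", 0.5)],
}

mediated = mediate(RESULTS)
for row in mediated:
    print(row)
```

Each merged row keeps a key back into the registry, which is what would let a display link the data to its provenance, units, and collection methods.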

  11. Scenario 1 • Use a combination of XMDR and Hydroseek-like software • XMDR holds the concept system(s) and metadata • Hydroseek-like software interacts with the user, accesses data, and displays results. • The mediation tool is separate from XMDR, but draws on metadata from XMDR. • Also need whatever is necessary to interact with the external data source (e.g., screen scraping, database access). • Bora currently has a concept system that serves as the global ontology for variables in ~25 systems, e.g., STORET and NWIS. He used the USGS Water Words Dictionary.

  12. Hydroseek • Hydroseek is an ontology-aided search engine for finding scientific data on water quality and hydrology from approximately 1.9 million sites in the USA. Hydroseek creates a unified view over databases of agencies such as the US Geological Survey and the Environmental Protection Agency. • It helps researchers remove semantic, syntactic, and information-system heterogeneity barriers, improves the search experience, and reduces time spent on data discovery and preparation prior to processing. Depending on the method of interaction (GUI or web services) and the function invoked, output can be provided using CUAHSI WaterML, Geography Markup Language (GML) features, or Microsoft Excel. • The system uses the Microsoft Virtual Earth map interface, with OWL ontologies providing the knowledge base used to supply auto-complete keywords and to classify search results. • Hydroseek follows a Service-Oriented Architecture (SOA), and most functionality is available via SOAP web services. The system also supports queries using NASA's Global Change Master Directory (GCMD) keywords via web services.

  13. Hydroseek • public • Tagging Application Demo • Admin Interface Demo • private • Tagging Application • Admin Interface • User Management Console • other • Registration • Help & Credits

  14. Linking Concept Systems to Data

  15. Little Demo • A little demo to show that what we talked about can be done with XMDR. • Use latitudes and longitudes for 3-4 sites and what they measure. • A small ontology with 5-6 concepts and two ontologies for data sources (let’s say USGS and EPA) with 5-6 variables (variable = name of what is being measured, i.e., parameter name), measurement method metadata, etc. • The idea is to show how these (including mappings between variables) can be stored in XMDR and how they can be discovered. • So it is a matter of getting them into XMDR and putting together some sort of web interface that takes a keyword and returns a list of sites, relevant measurements, etc. • This will be done with samples from the US at first and then with content from JRC and WISE.
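The keyword-to-sites lookup described above might be sketched as follows, with plain dictionaries standing in for content stored in XMDR. The concepts, variable names, site identifiers, and coordinates are all invented for illustration and do not correspond to actual USGS or EPA parameter names.

```python
# Small concept system: concept id -> (label, broader concept id or None).
CONCEPTS = {
    "nitrogen": ("Nitrogen", None),
    "ammonia_nitrogen": ("Ammonia Nitrogen", "nitrogen"),
    "nitrate": ("Nitrate", "nitrogen"),
    "phosphorus": ("Phosphorus", None),
}

# Mappings from (data source, variable name) to a concept. Names are hypothetical.
VARIABLE_TO_CONCEPT = {
    ("USGS", "Ammonia as N"): "ammonia_nitrogen",
    ("EPA", "Nitrogen, ammonia"): "ammonia_nitrogen",
    ("EPA", "Nitrate"): "nitrate",
    ("USGS", "Phosphorus, total"): "phosphorus",
}

# Invented sites with coordinates and the variables they measure.
SITES = [
    {"id": "site-1", "source": "USGS", "lat": 38.5, "lon": -121.5,
     "variables": ["Ammonia as N"]},
    {"id": "site-2", "source": "EPA", "lat": 35.9, "lon": -78.9,
     "variables": ["Nitrate", "Nitrogen, ammonia"]},
    {"id": "site-3", "source": "USGS", "lat": 40.0, "lon": -105.3,
     "variables": ["Phosphorus, total"]},
]


def concepts_matching(keyword):
    """Concepts whose label contains the keyword, plus all narrower concepts."""
    keyword = keyword.lower()
    matched = {cid for cid, (label, _) in CONCEPTS.items()
               if keyword in label.lower()}
    changed = True
    while changed:
        changed = False
        for cid, (_, broader) in CONCEPTS.items():
            if broader in matched and cid not in matched:
                matched.add(cid)
                changed = True
    return matched


def search(keyword):
    """Return (site id, lat, lon, matching variables) for each relevant site."""
    wanted = concepts_matching(keyword)
    results = []
    for site in SITES:
        hits = [v for v in site["variables"]
                if VARIABLE_TO_CONCEPT.get((site["source"], v)) in wanted]
        if hits:
            results.append((site["id"], site["lat"], site["lon"], hits))
    return results


for row in search("nitrogen"):
    print(row)
```

Searching for "nitrogen" also returns the nitrate variable even though its name does not contain the keyword, because the concept system records nitrate as narrower than nitrogen.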

  16. Sample Data

  17. Small Concept System
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xm="http://www.xmdr.org/">
        <xm:concept rdf:about="http://www.xmdr.org/ammoniaNitrogen" dc:title="Ammonia Nitrogen">
          <xm:narrowerThan>
            <xm:concept rdf:about="http://www.xmdr.org/nitrogen" dc:title="Nitrogen"/>
          </xm:narrowerThan>
          <xm:hasMedium>
            <xm:medium rdf:about="http://www.xmdr.org/water"/>
          </xm:hasMedium>
        </xm:concept>
      </rdf:RDF>
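A short sketch of reading this concept system with Python's standard library, to show what a consumer could extract from it. The slide uses the `dc:` prefix without declaring it, so the sketch assumes the Dublin Core elements namespace; it treats the file as plain XML rather than using a full RDF parser.

```python
import xml.etree.ElementTree as ET

# The slide's RDF/XML, with an assumed declaration for the dc: prefix.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:xm="http://www.xmdr.org/">
  <xm:concept rdf:about="http://www.xmdr.org/ammoniaNitrogen" dc:title="Ammonia Nitrogen">
    <xm:narrowerThan>
      <xm:concept rdf:about="http://www.xmdr.org/nitrogen" dc:title="Nitrogen"/>
    </xm:narrowerThan>
    <xm:hasMedium>
      <xm:medium rdf:about="http://www.xmdr.org/water"/>
    </xm:hasMedium>
  </xm:concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xm": "http://www.xmdr.org/",
}
# Attribute names use ElementTree's {namespace}local form.
ABOUT = "{%s}about" % NS["rdf"]
TITLE = "{%s}title" % NS["dc"]

root = ET.fromstring(RDF_XML)
concept = root.find("xm:concept", NS)
broader = concept.find("xm:narrowerThan/xm:concept", NS)
medium = concept.find("xm:hasMedium/xm:medium", NS)

# Collect the statements as (subject, predicate, object) tuples.
triples = [
    (concept.get(ABOUT), "dc:title", concept.get(TITLE)),
    (concept.get(ABOUT), "xm:narrowerThan", broader.get(ABOUT)),
    (broader.get(ABOUT), "dc:title", broader.get(TITLE)),
    (concept.get(ABOUT), "xm:hasMedium", medium.get(ABOUT)),
]
for triple in triples:
    print(triple)
```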

  18. Technology Overview [Figure: overview diagram with XMDR components; one label reads "Metadata: Provenance, etc."] Adapted from a slide from Bora Beran

  19. Modular XMDR Architecture (third-party software named in parentheses)
  • Users: web browsers, client software
  • Metadata sources: concept systems, data elements
  • Content Loading & Transformation (LexGrid & custom)
  • Application Program Interface (REST)
  • Human User Interface (HTML from JSP and JavaScript; Exhibit)
  • Authentication Service
  • Validation (XML Schema)
  • Mapping Engine
  • Search & Content Serving (Jena, Lucene)
  • Metamodel specs (UML & editing) (Poseidon, Protege)
  • XMDR data model & exchange format: XML, RDF, OWL
  • Logic Indexer (Jena & Pellet)
  • Text Indexer (Lucene)
  • Registry Store: standard XMDR files, XMDR metamodel (OWL & XML Schema), Text Index, Logic Index, Postgres Database

  20. Video and Discussion • View Video • Discuss

  21. Acknowledgements • John McCarthy, LBNL • Kevin Keck, LBNL • Bora Beran, Microsoft Research • This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.
