
The EDEN System

Jerry Fowler, MCC, Austin, Texas. Environmental Data Exchange Network (EDEN).


Presentation Transcript


  1. The EDEN System • Jerry Fowler, MCC, Austin, Texas

  2. Environmental Data Exchange Network (EDEN)

  3. Outline • EDEN Project Overview • InfoSleuth in a microsecond • The Ontology in InfoSleuth • Value Mapping and the Environmental Data Registry • Further thoughts and work • Live Demonstration

  4. Section: EDEN Project Overview

  5. Environmental Data Exchange Network • The challenge: • Acquisition, use and dissemination of environmental information is of increasing strategic importance to EPA, DOD, DOE, and EEA • EDEN is an application of MCC's InfoSleuth technology • Employs intelligent agent technology through the Internet to conduct concept-based searches of heterogeneous, distributed information • The EDEN Project demonstrates how organizations can save time and money: • Provides easy access over intranet or the Internet • Enables users to access information from multiple sources • Simplifies the exchange and sharing of data • Reduces the reporting burden • Brings information together for presentation and analysis

  6. Sponsors’ Common Requirements • Reduce the reporting burden imposed by the parties on each other • Share best available and most timely information • Enable users to access information from multiple sources • Coordinate only the common vocabulary, not the end use of information resources; focus on the inputs, with each participant individually interpreting and communicating outputs

  7. EDEN: Access to Distributed Databases • [Diagram: data sources and owners: Oak Ridge (DOE), SMARS (Missouri), Basel (EEA), INEEL (DOE), DEPMAST (New Jersey), ITT (EPA), IRDMIS (Army), CACTS (Texas), ERPIMS (Air Force), CERCLIS (EPA), HazDat (CDC); access technologies: MS-Access, HTTP, Oracle, Sybase] • Geographically distributed data resources • Differing database software • Differing logical schemas • Not always available

  8. Section: InfoSleuth in a microsecond

  9. InfoSleuth • Consortial project in the use of agent software for distributed information management • Commercial sponsors: • General Dynamics Information Systems • Rafael • Raytheon • SAIC • Schlumberger • Texas Instruments • TRW

  10. InfoSleuth • System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous distributed information sources • Ontology-based information management • Advertise-discover paradigm supported by brokering over semantic constraints

  11. InfoSleuth System • Java-based agents • Knowledge Query and Manipulation Language (KQML) message layer provides speech-act agent interface • Agent conversation shell provides structure for KQML messages • Open Knowledge Base Connectivity (OKBC) language provides semantic communication layer • Brokering reasoning provided by Logical Data Language, LDL++ (going away…) • XML in the future

  12. Main InfoSleuth Agents • Broker • Matches agents based on semantic constraints • Resource Agents • Translate between application domain ontology and database schemata • Multi-Resource Query Agent • Supports query decomposition and result recomposition • Value Mapper • Translates to/from canonical value domains • Portal Agent • Provides user context and interface
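
The Broker's matching of agents on semantic constraints can be sketched roughly as below. This is only an illustration of the advertise-discover idea, not InfoSleuth's actual broker: the `Advertisement` and `Broker` classes, the agent names, and the constraint format (a set of covered values per ontology slot) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    """What an agent claims it can answer (hypothetical structure)."""
    agent: str
    ontology_class: str
    # slot name -> set of values the agent holds data for
    constraints: dict = field(default_factory=dict)


class Broker:
    def __init__(self):
        self.ads = []

    def advertise(self, ad):
        self.ads.append(ad)

    def discover(self, ontology_class, **wanted):
        """Return agents whose advertised constraints admit the request."""
        matches = []
        for ad in self.ads:
            if ad.ontology_class != ontology_class:
                continue
            # Assumption: an agent with no constraint on a slot covers all values.
            if all(value in ad.constraints.get(slot, {value})
                   for slot, value in wanted.items()):
                matches.append(ad.agent)
        return matches


broker = Broker()
broker.advertise(Advertisement("cerclis_agent", "Site_Contamination"))
broker.advertise(Advertisement("tnrcc_agent", "Site_Contamination",
                               {"state": {"Texas"}}))
broker.advertise(Advertisement("smars_agent", "Site_Contamination",
                               {"state": {"Missouri"}}))

print(broker.discover("Site_Contamination", state="Texas"))
```

With these made-up advertisements, a request for Texas data reaches the unconstrained agent and the Texas agent but skips the Missouri one.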

  13. More about InfoSleuth Agents • JDBC Resource agents translate between application domain ontology and database schemata • Multi-resource query agent uses either Oracle or native Java to support query decomposition and result synthesis • Value mapper translates to/from canonical value domains using EDR • Text agent supports ontology-based query • Control agent manages CLIPS rule base for task planning and execution • Sentinel and Deviation Detection agents cooperate to detect complex event patterns
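
Query decomposition and result synthesis, as attributed to the multi-resource query agent, might look like the following in miniature. The function names, resource names, and sample rows are hypothetical; the real agent works on SQL, whereas this sketch only splits a list of requested ontology slots and joins partial rows on a shared key.

```python
def decompose(query_slots, resources):
    """Give each resource the subset of requested slots it can supply."""
    return {name: [s for s in query_slots if s in slots]
            for name, slots in resources.items()
            if any(s in slots for s in query_slots)}


def recompose(partials, key):
    """Join partial rows from several resources on a shared key slot."""
    merged = {}
    for rows in partials.values():
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical resources, each holding a vertical fragment of the class.
resources = {
    "resource_a": {"site_id", "contaminant"},
    "resource_b": {"site_id", "medium"},
}
plan = decompose(["site_id", "contaminant", "medium"], resources)

# Made-up partial results, standing in for what each resource returns.
partials = {
    "resource_a": [{"site_id": "S1", "contaminant": "Benzene"}],
    "resource_b": [{"site_id": "S1", "medium": "Soil"}],
}
rows = recompose(partials, key="site_id")
print(plan)
print(rows)
```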

  14. Basic InfoSleuth Application Recipe • 6 cups ontology • 3 cups resource agent configuration • 1-3 cups user interface development • Lightly brown the multi-resource query agent • Pour in other agents out of the box • Stir and Serve... • add or remove resource agents as desired • add other functionality with more configuration effort

  15. Section: The Ontology in InfoSleuth

  16. Purpose of The Ontology in InfoSleuth • To describe the domain with minimal ambiguity • the structure defines the domain • documentation strings • To be the integration hub for DB schemas • query relaxation through the taxonomy • vertical fragmentation • multi-resource path expressions • To define the preferred value domains of attributes used in communications between agents • Value mapping may be necessary for translation of queries and results

  17. Expressing the Ontology • OKBC (Open Knowledge Base Connectivity): a standard for Knowledge Representation • Classes, Slots, Facets: • (class Site_Contamination) • (template-slot-of averaging_method Site_Contamination) • (template-facet-value :VALUE-TYPE averaging_method Site_Contamination :STRING) • (template-slot-of site Site_Contamination) • (template-facet-value :VALUE-TYPE site Site_Contamination Eden_Site) • Subclass Linkage
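
To make the class/slot/facet structure concrete, here is a small in-memory sketch that records the same assertions shown on the slide. It is not the OKBC API: the `KnowledgeBase` class and its method names are invented for illustration and merely mirror the assertion names.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy store for classes, template slots, and facet values."""

    def __init__(self):
        self.classes = set()
        self.slots = defaultdict(set)   # class name -> slot names
        self.facets = {}                # (class, slot, facet) -> value

    def add_class(self, name):
        self.classes.add(name)

    def template_slot_of(self, slot, cls):
        self.slots[cls].add(slot)

    def template_facet_value(self, facet, slot, cls, value):
        self.facets[(cls, slot, facet)] = value

    def value_type(self, cls, slot):
        return self.facets.get((cls, slot, ":VALUE-TYPE"))


kb = KnowledgeBase()
kb.add_class("Site_Contamination")
kb.template_slot_of("averaging_method", "Site_Contamination")
kb.template_facet_value(":VALUE-TYPE", "averaging_method",
                        "Site_Contamination", ":STRING")
kb.template_slot_of("site", "Site_Contamination")
kb.template_facet_value(":VALUE-TYPE", "site",
                        "Site_Contamination", "Eden_Site")

print(sorted(kb.slots["Site_Contamination"]))
print(kb.value_type("Site_Contamination", "site"))
```

Note that the `site` slot's value type is another class (`Eden_Site`), which is how one concept links to another.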

  18. Ontology Definitions

  19. Ontological Concept • Site contamination has attributes: medium, average_concentration, average_concentration_unit, contaminant, recording_date, site_id, averaging_method, source_db

  20. Concept mapping (CERCLIS3) • SELECT expressions and the ontology slots they map to, in order: • ref_media.rmedia_desc → medium • constituent_contaminant.cc_avg_conc_value_nmbr → average_concentration • ref_concentration_units.rconc_units_desc → average_concentration_unit • ref_hazardous_substance.rhs_nmbr → contaminant • constituent_contaminant.last_updated_date → recording_date • site.site_epa_id → site_id • 'Reported in CERCLIS3' → averaging_method • 'cerclis' → source_db • FROM site, ...
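
A resource agent's translation from ontology slots to database columns can be sketched as a lookup that builds a SELECT list. The slot-to-column pairs below assume the slide lists both sides in the same order, and `build_select` is a hypothetical helper; the FROM clause is elided on the slide, so it is left out here too.

```python
# Assumed pairing: ontology slot -> CERCLIS3 column or constant expression.
CERCLIS3_MAPPING = {
    "medium": "ref_media.rmedia_desc",
    "average_concentration": "constituent_contaminant.cc_avg_conc_value_nmbr",
    "average_concentration_unit": "ref_concentration_units.rconc_units_desc",
    "contaminant": "ref_hazardous_substance.rhs_nmbr",
    "recording_date": "constituent_contaminant.last_updated_date",
    "site_id": "site.site_epa_id",
    "averaging_method": "'Reported in CERCLIS3'",
    "source_db": "'cerclis'",
}


def build_select(mapping, wanted_slots):
    """Build the SELECT list for the requested ontology slots."""
    unknown = [s for s in wanted_slots if s not in mapping]
    if unknown:
        raise KeyError(f"no mapping for slots: {unknown}")
    return "SELECT " + ", ".join(f"{mapping[s]} AS {s}" for s in wanted_slots)


select_list = build_select(CERCLIS3_MAPPING, ["site_id", "medium", "source_db"])
print(select_list)
```

Constant expressions such as `'cerclis'` let a resource supply slots (here `source_db`) that have no column of their own.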

  21. Outline • EDEN Project Overview • InfoSleuth in a microsecond • The Ontology in InfoSleuth • Value Mapping and the Environmental Data Registry • Further thoughts and work • Live Demonstration

  22. Sample Query • Agents communicate to solve queries • EDEN System is accessible through Internet browsers

  23. Sample Query Results • Results are merged from several resources into a single output

  24. Locations of the Agents • [Diagram: host and agent labels as shown: MCC, public, private, broker, hazdat, erpims; CRYSTAL CITY, public_va, cerclis, irdmis, tnrcc, smars, depmast; MCC, oreis, erip, vmapper, eden, query; MCC, itt, basel; Browser]

  25. Section: Value Mapping and the Environmental Data Registry

  26. Value mapping requirements • Translate terms in queries • Allow users to choose a coding scheme for querying • Query each database in terms of its own coding scheme • Translate results of queries • Facilitate merging of data from different sources • Display results according to user preference

  27. The Value Mapping Model • [Diagram labels: Quantity, Unit Of Measure, canonical unit, Sampling Point, depth, Distance, Meter, unit, Foot, data-type FLOAT]
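
The model's labels suggest a quantity (Distance) with a canonical unit (Meter) and other units (Foot) convertible to it. A minimal sketch of that idea follows; the table and function names are assumptions, and only the foot-to-meter factor (exactly 0.3048) is a known constant.

```python
# Conversion factors from each unit to the canonical unit of Distance (meter).
TO_METERS = {"meter": 1.0, "foot": 0.3048}


def to_canonical(value, unit):
    """Convert a distance in `unit` to meters."""
    return value * TO_METERS[unit]


def from_canonical(value, unit):
    """Convert a distance in meters to `unit`."""
    return value / TO_METERS[unit]


# A sampling-point depth recorded in feet, expressed in the canonical unit.
depth_m = to_canonical(10.0, "foot")
print(depth_m)
```

Routing every conversion through one canonical unit means each new unit needs only a single factor rather than one per pair of units.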

  28. Value mapping and the ontology • A class has one or more slots • Each slot has a conceptual domain • Each slot has a preferred value domain • Resource Agents must advertise in the preferred value domain • possibly translating to/from a different value domain • Users may query and view data in a different value domain • User Agent handles translation to/from preferred value domain

  29. Value Mapping: Capability • Query translation and interpretation: SELECT * FROM site WHERE state = 'TX' is translated to SELECT * FROM site WHERE state = 'Texas' • Results translation:
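
The two directions of translation can be illustrated with a string-level sketch. It is not a SQL parser and not the Value Mapper's real interface: the function names and the abbreviation table are assumptions, and the pattern only handles simple `column = 'literal'` comparisons.

```python
import re

# Hypothetical map from the user's coding scheme to the resource's scheme.
ABBREVIATION_TO_NAME = {"TX": "Texas", "MO": "Missouri", "NJ": "New Jersey"}


def translate_query(sql, column, value_map):
    """Rewrite literals compared against `column` using `value_map`."""
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*'([^']*)'")
    return pattern.sub(
        lambda m: f"{column} = '{value_map.get(m.group(1), m.group(1))}'",
        sql,
    )


def translate_results(rows, column, value_map):
    """Map result values back into the user's coding scheme."""
    reverse = {v: k for k, v in value_map.items()}
    return [{**row, column: reverse.get(row[column], row[column])}
            for row in rows]


query = translate_query("SELECT * FROM site WHERE state = 'TX'",
                        "state", ABBREVIATION_TO_NAME)
results = translate_results([{"site_id": "S1", "state": "Texas"}],
                            "state", ABBREVIATION_TO_NAME)
print(query)
print(results)
```

Values with no entry in the map pass through unchanged, which is one simple policy for the "no match" case.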

  30. Query Processing with Value Mapping

  31. EDR Structure • A specialized resource agent (map agent) accesses a view of the EDR

  32. Linking EDR to EDEN ontology

  33. View of the EDR CREATE VIEW edr_map (conceptual_domain, cd_id, value_domain, vd_id, preferred_domain, pd_id) AS SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act WHERE a.cd_id = emc.cd_id AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id

  34. EDR lookup • SELECT preferred_value FROM edr_map WHERE actual_value = 'Benzene' AND coding_scheme = 'chemical_name' AND conceptual_domain = 'chemical_substance'
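
The lookup can be tried end to end against a stand-in table. The sketch below uses an in-memory SQLite table with just the four columns the lookup query names. The table layout, the sample row, and the choice of CAS number as the preferred value domain are assumptions; only the query shape comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the EDR view: only the columns used by the lookup query.
conn.execute(
    "CREATE TABLE edr_map (conceptual_domain TEXT, coding_scheme TEXT, "
    "actual_value TEXT, preferred_value TEXT)"
)
conn.execute(
    "INSERT INTO edr_map VALUES (?, ?, ?, ?)",
    ("chemical_substance", "chemical_name", "Benzene", "71-43-2"),
)


def lookup(conn, actual_value, coding_scheme, conceptual_domain):
    """Return the preferred value, or None when the EDR has no match."""
    row = conn.execute(
        "SELECT preferred_value FROM edr_map "
        "WHERE actual_value = ? AND coding_scheme = ? "
        "AND conceptual_domain = ?",
        (actual_value, coding_scheme, conceptual_domain),
    ).fetchone()
    return row[0] if row else None


print(lookup(conn, "Benzene", "chemical_name", "chemical_substance"))
```

Returning `None` for a missing value makes the "no match in EDR" case explicit for the caller to handle.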

  35. Modifications to EDR • Downloaded files of permissible values for CAS number and Chemical name (Merck index) from EPA site • Assigned value meanings • Created value domains for CAS code, CAS padded, ycode; loaded permissible values • Added 3 extra chemical names because Merck index file was incomplete • Built data-driven value-map for environmental media • De-normalized data for faster retrieval

  36. Value Mapping Enhancements • Functional maps • e.g., case sensitivity (‘ST LOUIS’ vs. ‘St Louis’) • One-to-many maps • e.g., Environmental media mapped • Soil vs. Topsoil, Subsoil, Soil - unspecified, SO, S
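
Both enhancements can be sketched briefly: a functional map computes a comparison key instead of storing every variant, and a one-to-many map expands one preferred value into all the source values it covers. The function names are hypothetical; the media values are the ones listed on the slide.

```python
def normalize_key(value):
    """Functional map: ignore case and extra whitespace when comparing."""
    return " ".join(value.split()).casefold()


# Source values that all mean the preferred value "Soil".
MEDIA_TO_PREFERRED = {
    "Topsoil": "Soil",
    "Subsoil": "Soil",
    "Soil - unspecified": "Soil",
    "SO": "Soil",
    "S": "Soil",
}


def expand(preferred, mapping):
    """One-to-many map: all source values covered by a preferred value."""
    return {actual for actual, pref in mapping.items() if pref == preferred}


print(normalize_key("ST LOUIS") == normalize_key("St Louis"))
print(sorted(expand("Soil", MEDIA_TO_PREFERRED)))
```

A query for "Soil" would then need to search each database for whichever of the expanded values that database uses.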

  37. Outline • EDEN Project Overview • InfoSleuth in a microsecond • The Ontology in InfoSleuth • Value Mapping and the Environmental Data Registry • Further thoughts and work • Live Demonstration

  38. Section: Further thoughts and work

  39. Outstanding mapping issues • No match in EDR for database value • differences in case (‘Texas’, ‘TEXAS’) • CAS number format (dashes, leading zeros) • word order (‘n-Propyl benzene’, ‘Benzene, n-Propyl’) • bad data • Improved functional mapping • ‘artificial intelligence’ can be used in functional value mapping • ontology-dependent heuristics... • Approximate string matching
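
Several of these issues lend themselves to simple normalization before the EDR lookup. The sketch below is one possible approach, not what EDEN implements: the helper names are invented, the CAS formatting assumes the standard digits-2-1 layout, and the approximate matching uses Python's standard `difflib` with an arbitrary cutoff.

```python
import difflib
import re


def normalize_cas(raw):
    """Strip dashes and leading zeros, then re-insert the standard dashes."""
    digits = re.sub(r"\D", "", raw).lstrip("0")
    if len(digits) < 5:
        raise ValueError(f"not a CAS number: {raw!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


def normalize_name(name):
    """Undo inverted word order ('Benzene, n-Propyl') and ignore case."""
    if ", " in name:
        head, tail = name.split(", ", 1)
        name = f"{tail} {head}"
    return name.casefold()


def closest(value, candidates, cutoff=0.8):
    """Approximate string match; returns None when nothing is close enough."""
    found = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return found[0] if found else None


print(normalize_cas("000071432"))
print(normalize_name("Benzene, n-Propyl") == normalize_name("n-Propyl benzene"))
print(closest("Benzine", ["Benzene", "Toluene", "Xylene"]))
```

Approximate matching can map bad data onto the wrong value, so in practice a low-confidence match would probably need review rather than silent acceptance.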

  40. Further adventures in Ontology • Incorporate Terminology Reference System/GEMET into EDEN • Enable Value Map Agent to configure itself dynamically directly from EDR • Expand EDEN ontology to encompass water quality for European 5th Framework EDEN-IW project

  41. Use of XML • InfoSleuth 5.5 will use • XML data transport • XML semantic advertisement • InfoSleuth should use • XML ontology representation • XML browser configuration • XML transport layer • Benefits • One parser, not home-grown • Easier incorporation of data, metadata • Better expressivity • Better interoperability

  42. DENIX EDEN Portal

  43. Summary • The EDEN pilot system shows that InfoSleuth can integrate existing databases • Value mapping (hence EDR) is crucial • EDEN may be useful to its sponsor agencies in identifying data quality issues and data gaps • EDEN has stimulated collaboration on metadata among agencies • EDEN has showcased the utility of the EDR • More work will lead to a better, broader system

  44. Shakespeare on Internet Agent Research • "I can call spirits from the vasty deep!" • "Aye, and so can I, and so can any man, but … will they come when you do call for them?"
