The EDEN System. Jerry Fowler, MCC, Austin, Texas
Outline • EDEN Project Overview • InfoSleuth in a microsecond • The Ontology in InfoSleuth • Value Mapping and the Environmental Data Registry • Further thoughts and work • Live Demonstration
Section: EDEN Project Overview
Environmental Data Exchange Network • The challenge: the acquisition, use and dissemination of environmental information are of increasing strategic importance to EPA, DOD, DOE and the EEA (European Environment Agency) • EDEN is an application of MCC's InfoSleuth technology • It uses intelligent agent technology over the Internet to conduct concept-based searches of heterogeneous, distributed information • The EDEN Project demonstrates how organizations can save time and money. EDEN: • Provides easy access over an intranet or the Internet • Enables users to access information from multiple sources • Simplifies the exchange and sharing of data • Reduces the reporting burden • Brings information together for presentation and analysis
Sponsors’ Common Requirements • Reduce the reporting burden the parties impose on each other • Share the best available and most timely information • Enable users to access information from multiple sources • Coordinate only the common vocabulary, not the end use of information resources: focus coordination on the inputs, with each participant individually interpreting and communicating the outputs
EDEN: Access to Distributed Databases. Data sources shown in the figure, with their owning organizations: Oak Ridge (DOE), SMARS (Missouri), Basel (EEA), INEEL (DOE), DEPMAST (New Jersey), ITT (EPA), IRDMIS (Army), CACTS (Texas), ERPIMS (Air Force), CERCLIS (EPA), HazDat (CDC); platforms and access methods include MS-Access, HTTP, Oracle and Sybase. • Geographically distributed data resources • Differing database software • Differing logical schemas • Not always available
Section: InfoSleuth in a microsecond
InfoSleuth • A consortium project on the use of agent software for distributed information management • Commercial sponsors: • General Dynamics Information Systems • Rafael • Raytheon • SAIC • Schlumberger • Texas Instruments • TRW
InfoSleuth • System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous distributed information sources • Ontology-based information management • Advertise-discover paradigm supported by brokering over semantic constraints
InfoSleuth System • Java-based agents • A Knowledge Query and Manipulation Language (KQML) message layer provides a speech-act agent interface • An agent conversation shell provides structure for KQML messages • Open Knowledge Base Connectivity (OKBC) provides the semantic communication layer • Broker reasoning is provided by the Logical Data Language, LDL++ (being phased out) • XML in the future
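To make the message layer concrete, here is a minimal Python sketch of how a KQML message could be rendered. It is not InfoSleuth's Java implementation: the agent names `portal-agent` and `mrq-agent`, the ontology label `eden` and the use of SQL as the content language are assumptions for illustration. The keyword parameters (`:sender`, `:receiver`, `:reply-with`, `:language`, `:ontology`, `:content`) are standard KQML ones.

```python
from dataclasses import dataclass, field
from typing import Dict


def kqml_string(text: str) -> str:
    """Quote a value as a KQML string, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class KQMLMessage:
    performative: str                                    # the speech act, e.g. "ask-all" or "tell"
    params: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # A KQML message is an s-expression: (performative :keyword value ...)
        fields = "".join(f" :{key} {value}" for key, value in self.params.items())
        return f"({self.performative}{fields})"


# Hypothetical agent names and ontology label; the content is an example SQL query.
msg = KQMLMessage("ask-all", {
    "sender": "portal-agent",
    "receiver": "mrq-agent",
    "reply-with": "q1",
    "language": "SQL",
    "ontology": "eden",
    "content": kqml_string("SELECT * FROM site WHERE state = 'TX'"),
})
print(msg.render())
```

The conversation shell would then pair the eventual `tell` reply with this request through its `:in-reply-to q1` parameter.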
Main InfoSleuth Agents • Broker • Matches agents based on semantic constraints • Resource Agents • Translate between application domain ontology and database schemata • Multi-Resource Query Agent • Supports query decomposition and result recomposition • Value Mapper • Translates to/from canonical value domains • Portal Agent • Provides user context and interface
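The advertise-discover idea behind the broker can be sketched in a few lines of Python. This is a simplified model, not the LDL++-based broker: the `Advertisement` fields, the reading of a semantic constraint as a set of covered slot values, and the advertised contents of the `cerclis`, `tnrcc` and `depmast` agents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Advertisement:
    agent: str
    agent_type: str                   # e.g. "resource" or "value-mapper"
    ontology_class: str               # ontology class the agent can answer about
    slots: Set[str]                   # slots of that class it can supply
    constraints: Dict[str, Set[str]] = field(default_factory=dict)  # slot -> values covered


class Broker:
    """Advertise-discover: agents advertise capabilities, requesters ask for matches."""

    def __init__(self) -> None:
        self.ads: List[Advertisement] = []

    def advertise(self, ad: Advertisement) -> None:
        self.ads.append(ad)

    def unadvertise(self, agent: str) -> None:
        self.ads = [ad for ad in self.ads if ad.agent != agent]

    def match(self, agent_type: str, ontology_class: str, slots: Set[str],
              constraints: Optional[Dict[str, str]] = None) -> List[str]:
        constraints = constraints or {}
        hits = []
        for ad in self.ads:
            if ad.agent_type != agent_type or ad.ontology_class != ontology_class:
                continue
            if not slots <= ad.slots:
                continue  # agent cannot supply every requested slot
            # An advertised value set excludes requests outside it; no set means "any value".
            if any(slot in ad.constraints and value not in ad.constraints[slot]
                   for slot, value in constraints.items()):
                continue
            hits.append(ad.agent)
        return hits


broker = Broker()
broker.advertise(Advertisement("cerclis", "resource", "Site", {"site_id", "state"}))
broker.advertise(Advertisement("tnrcc", "resource", "Site", {"site_id", "state"},
                               {"state": {"Texas"}}))
broker.advertise(Advertisement("depmast", "resource", "Site", {"site_id", "state"},
                               {"state": {"New Jersey"}}))
print(broker.match("resource", "Site", {"site_id"}, {"state": "Texas"}))
```

An agent that advertises no value set for a slot is treated as covering any value, so the nationwide resource answers a Texas query alongside the Texas-only one, while the New Jersey resource is filtered out.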
More about InfoSleuth Agents • JDBC resource agents translate between the application domain ontology and database schemata • The multi-resource query agent uses either Oracle or native Java to support query decomposition and result synthesis • The value mapper translates to/from canonical value domains using the Environmental Data Registry (EDR) • The text agent supports ontology-based queries • The control agent manages a CLIPS rule base for task planning and execution • Sentinel and Deviation Detection agents cooperate to detect complex event patterns
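A minimal sketch of the multi-resource query agent's job for the simple case where every resource holds the same kind of rows: send the same sub-query to each resource, union the answers and tag each row with its `source_db`. The resource functions and their rows are made up, and real decomposition (joins across resources, pushing conditions down) is more involved; InfoSleuth does this with Oracle or native Java, not the Python shown here.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Fetch = Callable[[List[str]], List[Row]]


def run_multi_resource_query(slots: List[str],
                             resources: Dict[str, Fetch]) -> Tuple[List[Row], List[str]]:
    """Send the same sub-query to each resource and union the answers."""
    merged: List[Row] = []
    unavailable: List[str] = []
    for name, fetch in resources.items():
        try:
            rows = fetch(slots)
        except ConnectionError:
            unavailable.append(name)  # sources are not always available; report, don't fail
            continue
        for row in rows:
            # Keep only the requested slots and record which resource answered.
            merged.append({**{s: row.get(s) for s in slots}, "source_db": name})
    merged.sort(key=lambda r: tuple(str(r[s]) for s in slots))
    return merged, unavailable


# Hypothetical stand-ins for resource agents, returning made-up rows.
def cerclis(slots: List[str]) -> List[Row]:
    return [{"site_id": "SITE-B", "contaminant": "Toluene"}]


def hazdat(slots: List[str]) -> List[Row]:
    return [{"site_id": "SITE-A", "contaminant": "Benzene"}]


def erpims(slots: List[str]) -> List[Row]:
    raise ConnectionError("resource agent not responding")


rows, down = run_multi_resource_query(["site_id", "contaminant"],
                                      {"cerclis": cerclis, "hazdat": hazdat, "erpims": erpims})
for row in rows:
    print(row)
print("unavailable:", down)
```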
Basic InfoSleuth Application Recipe • 6 cups ontology • 3 cups resource agent configuration • 1-3 cups user interface development • Lightly brown the multi-resource query agent • Pour in other agents out of the box • Stir and Serve... • add or remove resource agents as desired • add other functionality with more configuration effort
Section: The Ontology in InfoSleuth
Purpose of The Ontology in InfoSleuth • To describe the domain with minimal ambiguity • the structure defines the domain • documentation strings • To be the integration hub for DB schemas • query relaxation through the taxonomy • vertical fragmentation • multi-resource path expressions • To define the preferred value domains of attributes used in communications between agents • Value mapping may be necessary for translation of queries and results
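Vertical fragmentation means that different resources supply different slots of the same class. One plausible way to recombine such fragments is an inner join on a shared key slot, sketched below in Python; it assumes `site_id` identifies a row uniquely within each fragment, and the rows are invented.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_vertical_fragments(key: str, fragments: List[List[Row]]) -> List[Row]:
    """Rebuild rows of one class whose slots are split across resources.

    Assumes the key slot identifies a row uniquely within each fragment.
    Keys missing from any fragment are dropped (an inner join).
    """
    if not fragments:
        return []
    combined: Dict[object, Row] = {row[key]: dict(row) for row in fragments[0]}
    for fragment in fragments[1:]:
        by_key = {row[key]: row for row in fragment}
        combined = {k: {**row, **by_key[k]} for k, row in combined.items() if k in by_key}
    return [combined[k] for k in sorted(combined, key=str)]


# Made-up fragments: one resource holds concentrations, another holds dates.
concentrations = [
    {"site_id": "SITE-A", "contaminant": "Benzene", "average_concentration": 12.5},
    {"site_id": "SITE-B", "contaminant": "Toluene", "average_concentration": 3.0},
]
dates = [
    {"site_id": "SITE-A", "recording_date": "1999-03-01"},
    {"site_id": "SITE-C", "recording_date": "2000-01-15"},
]
print(join_vertical_fragments("site_id", [concentrations, dates]))
```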
Expressing the Ontology • OKBC (Open Knowledge Base Connectivity): a standard for knowledge representation • Classes, Slots, Facets: • (class Site_Contamination) • (template-slot-of averaging_method Site_Contamination) • (template-facet-value :VALUE-TYPE averaging_method Site_Contamination :STRING) • (template-slot-of site Site_Contamination) • (template-facet-value :VALUE-TYPE site Site_Contamination Eden_Site) • Subclass Linkage
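The OKBC forms above follow a regular pattern, so they can be generated from a class description. This Python sketch reproduces the slide's five forms; the function is hypothetical, and the `subclass-of` form used for subclass linkage is an assumption about how that linkage would be written.

```python
from typing import Dict, List, Optional


def okbc_class_forms(cls: str, slot_types: Dict[str, str],
                     superclass: Optional[str] = None) -> List[str]:
    """Render a class, its template slots and their :VALUE-TYPE facets as OKBC-style forms."""
    forms = [f"(class {cls})"]
    if superclass is not None:
        forms.append(f"(subclass-of {cls} {superclass})")  # subclass linkage
    for slot, value_type in slot_types.items():
        forms.append(f"(template-slot-of {slot} {cls})")
        forms.append(f"(template-facet-value :VALUE-TYPE {slot} {cls} {value_type})")
    return forms


for form in okbc_class_forms("Site_Contamination",
                             {"averaging_method": ":STRING", "site": "Eden_Site"}):
    print(form)
```

A slot whose value type is another class (here `Eden_Site`) is what lets queries follow paths from one concept to another.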
Ontological Concept (figure): the class Site contamination has the attributes average_concentration_unit, site_id, source_db, recording_date, contaminant, average_concentration, medium and averaging_method
Concept mapping (CERCLIS3): SELECT ref_media.rmedia_desc, constituent_contaminant.cc_avg_conc_value_nmbr, ref_concentration_units.rconc_units_desc, ref_hazardous_substance.rhs_nmbr, constituent_contaminant.last_updated_date, site.site_epa_id, 'Reported in CERCLIS3', 'cerclis' FROM site, ... The selected columns map, in order, to the ontology slots medium, average_concentration, average_concentration_unit, contaminant, recording_date, site_id, averaging_method and source_db.
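The concept mapping amounts to renaming local columns to ontology slot names. The Python sketch below builds such a SELECT from a slot-to-column table and runs it against a toy SQLite copy of the schema. Only the column names on the slide come from CERCLIS3; the join keys (`site_id`, `rmedia_code`, `rhs_code`, `rconc_code`), the join conditions and the sample row are hypothetical, because the slide elides the FROM clause.

```python
import sqlite3

# Ontology slot -> CERCLIS3 expression, paired in the order shown on the slide.
SLOT_MAP = {
    "medium": "ref_media.rmedia_desc",
    "average_concentration": "constituent_contaminant.cc_avg_conc_value_nmbr",
    "average_concentration_unit": "ref_concentration_units.rconc_units_desc",
    "contaminant": "ref_hazardous_substance.rhs_nmbr",
    "recording_date": "constituent_contaminant.last_updated_date",
    "site_id": "site.site_epa_id",
    "averaging_method": "'Reported in CERCLIS3'",
    "source_db": "'cerclis'",
}

# Hypothetical joins and key columns: the slide elides the real FROM clause.
FROM_CLAUSE = """constituent_contaminant
  JOIN site ON site.site_id = constituent_contaminant.site_id
  JOIN ref_media ON ref_media.rmedia_code = constituent_contaminant.rmedia_code
  JOIN ref_hazardous_substance
    ON ref_hazardous_substance.rhs_code = constituent_contaminant.rhs_code
  JOIN ref_concentration_units
    ON ref_concentration_units.rconc_code = constituent_contaminant.rconc_code"""


def mapping_query(slot_map: dict, from_clause: str) -> str:
    """Build a SELECT that exposes local columns under ontology slot names."""
    columns = ",\n  ".join(f"{expr} AS {slot}" for slot, expr in slot_map.items())
    return f"SELECT\n  {columns}\nFROM {from_clause}"


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE site (site_id INTEGER, site_epa_id TEXT);
CREATE TABLE ref_media (rmedia_code TEXT, rmedia_desc TEXT);
CREATE TABLE ref_hazardous_substance (rhs_code INTEGER, rhs_nmbr TEXT);
CREATE TABLE ref_concentration_units (rconc_code TEXT, rconc_units_desc TEXT);
CREATE TABLE constituent_contaminant (
  site_id INTEGER, rmedia_code TEXT, rhs_code INTEGER, rconc_code TEXT,
  cc_avg_conc_value_nmbr REAL, last_updated_date TEXT);
-- Made-up sample row.
INSERT INTO site VALUES (1, 'XX0000000001');
INSERT INTO ref_media VALUES ('GW', 'Groundwater');
INSERT INTO ref_hazardous_substance VALUES (7, 'HS-0007');
INSERT INTO ref_concentration_units VALUES ('UGL', 'ug/L');
INSERT INTO constituent_contaminant VALUES (1, 'GW', 7, 'UGL', 12.5, '1999-03-01');
""")
result = dict(conn.execute(mapping_query(SLOT_MAP, FROM_CLAUSE)).fetchone())
print(result)
```

The constant columns show how slots the source database lacks (averaging method, source database) can still be filled in by the resource agent.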
Section: Live Demonstration
Sample Query (screenshot): agents communicate to solve queries; the EDEN system is accessible through Internet browsers
Sample Query Results (screenshot): results from several resources are merged into a single output
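Locations of the Agents (figure): agents are distributed across MCC hosts (public and private) and a Crystal City host (public_va), and are accessed from a browser. Agents shown: broker, hazdat, erpims, cerclis, irdmis, tnrcc, smars, depmast, oreis, erip, vmapper, eden query, itt, basel.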
Section: Value Mapping and the Environmental Data Registry
Value mapping requirements • Translate terms in queries • Allow users to choose a coding scheme for querying • Query each database in terms of its own coding scheme • Translate results of queries • Facilitate merging of data from different sources • Display results according to user preference
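The Value Mapping Model (figure): the depth slot of Sampling Point takes values in the conceptual domain Distance, a Quantity measured in a Unit Of Measure; Meter is the canonical unit, Foot is an alternative unit, and the data-type is FLOAT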
Value mapping and the ontology • A class has one or more slots • Each slot has a conceptual domain • Each slot has a preferred value domain • Resource Agents must advertise in the preferred value domain • possibly translating to/from a different value domain • Users may query and view data in a different value domain • User Agent handles translation to/from preferred value domain
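The value mapping model (a Distance slot with canonical unit Meter and an alternative unit Foot) can be sketched as a pair of value domains and a conversion through the canonical unit. In this Python sketch the `ValueDomain` type, the factor table and the choice of Meter as the preferred domain for depth are assumptions; the factor 0.3048 m per foot is exact by definition.

```python
from dataclasses import dataclass

# Factor that converts one unit into the canonical unit of its quantity.
# Meter is the canonical unit for Distance; 1 ft = 0.3048 m exactly.
TO_CANONICAL = {
    ("Distance", "Meter"): 1.0,
    ("Distance", "Foot"): 0.3048,
}


@dataclass(frozen=True)
class ValueDomain:
    conceptual_domain: str  # e.g. "Distance"
    unit: str               # e.g. "Foot"


def convert(value: float, source: ValueDomain, target: ValueDomain) -> float:
    """Map a FLOAT value between two value domains of the same conceptual domain."""
    if source.conceptual_domain != target.conceptual_domain:
        raise ValueError("values can only be mapped within one conceptual domain")
    canonical = value * TO_CANONICAL[(source.conceptual_domain, source.unit)]
    return canonical / TO_CANONICAL[(target.conceptual_domain, target.unit)]


FEET = ValueDomain("Distance", "Foot")
METERS = ValueDomain("Distance", "Meter")  # assumed preferred domain for depth

# A resource agent stores depth in feet but advertises in meters ...
advertised = convert(10.0, FEET, METERS)
# ... and a user agent shows it in feet again for a user who prefers feet.
shown = convert(advertised, METERS, FEET)
print(advertised, shown)
```

Routing every conversion through the canonical unit means each new unit needs one factor, not one per pair of units.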
Value Mapping: Capability • Query translation and interpretation: SELECT * FROM site WHERE state = 'TX' is translated to SELECT * FROM site WHERE state = 'Texas' • Results translation (figure)
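Code translation of this kind (TX to Texas in the query, Texas back to TX in the results) can be sketched with a table of permissible values that lists one representation per coding scheme. The scheme names `postal_code` and `state_name`, the table contents and the helper functions are hypothetical, and the sketch maps directly between the user's and the database's schemes rather than through a preferred value domain.

```python
from typing import Dict, List

# Hypothetical permissible values: one value meaning per entry,
# one representation per coding scheme.
STATE_VALUES = [
    {"postal_code": "TX", "state_name": "Texas"},
    {"postal_code": "NJ", "state_name": "New Jersey"},
    {"postal_code": "MO", "state_name": "Missouri"},
]
MAPPED_SLOTS = {"state"}  # slots whose values need translation


def translate(value: str, source: str, target: str) -> str:
    for meaning in STATE_VALUES:
        if meaning[source] == value:
            return meaning[target]
    raise KeyError(f"{value!r} is not a known {source} value")


def translate_where(where: Dict[str, str], source: str, target: str) -> Dict[str, str]:
    """Query translation: rewrite constants into the database's coding scheme."""
    return {slot: translate(v, source, target) if slot in MAPPED_SLOTS else v
            for slot, v in where.items()}


def translate_rows(rows: List[Dict[str, str]], source: str, target: str) -> List[Dict[str, str]]:
    """Result translation: rewrite returned values into the user's coding scheme."""
    return [translate_where(row, source, target) for row in rows]


def to_sql(table: str, where: Dict[str, str]) -> str:
    # Illustration only: a real agent should use bound parameters, not string quoting.
    conditions = " AND ".join(f"{col} = '{val}'" for col, val in where.items())
    return f"SELECT * FROM {table} WHERE {conditions}"


user_where = {"state": "TX"}                     # user codes states as postal codes
db_where = translate_where(user_where, "postal_code", "state_name")
print(to_sql("site", db_where))                  # database stores full names
db_rows = [{"site_id": "SITE-A", "state": "Texas"}]
print(translate_rows(db_rows, "state_name", "postal_code"))
```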
EDR Structure (figure): a specialized resource agent, the map agent, accesses a view of the EDR
View of the EDR. The view reads the table edr_map_class (conceptual_domain, cd_id, value_domain, vd_id, preferred_domain, pd_id): CREATE VIEW edr_map (conceptual_domain, coding_scheme, preferred_value, actual_value) AS SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act WHERE a.cd_id = emc.cd_id AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id
EDR lookup: SELECT preferred_value FROM edr_map WHERE actual_value = 'Benzene' AND coding_scheme = 'chemical_name' AND conceptual_domain = 'chemical_substance'
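The view and the lookup can be tried end to end with SQLite from Python. The sketch assumes the view's output columns are named as the lookup query expects (conceptual_domain, coding_scheme, preferred_value, actual_value), keeps only the table columns the view reads, and uses two illustrative chemicals with CAS numbers as the preferred domain; the IDs and the choice of preferred domain are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal stand-ins for the EDR tables the view reads; IDs are illustrative.
CREATE TABLE edr_map_class (conceptual_domain TEXT, cd_id INTEGER, value_domain TEXT,
                            vd_id INTEGER, preferred_domain TEXT, pd_id INTEGER);
CREATE TABLE cd_vm_assoc (cd_id INTEGER, vm_id INTEGER);
CREATE TABLE permissible_value (vd_id INTEGER, vm_id INTEGER, pv_nm TEXT);

-- Chemical names (value domain 10) map to CAS numbers (value domain 20).
INSERT INTO edr_map_class VALUES ('chemical_substance', 1, 'chemical_name', 10, 'cas_number', 20);
INSERT INTO cd_vm_assoc VALUES (1, 100), (1, 101);  -- value meanings: benzene, toluene
INSERT INTO permissible_value VALUES
  (10, 100, 'Benzene'), (20, 100, '71-43-2'),
  (10, 101, 'Toluene'), (20, 101, '108-88-3');

CREATE VIEW edr_map (conceptual_domain, coding_scheme, preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
  AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id
  AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id;
""")


def lookup_preferred(actual: str, scheme: str, domain: str):
    """Return the preferred value for an actual value, or None if the EDR has no match."""
    row = conn.execute(
        "SELECT preferred_value FROM edr_map "
        "WHERE actual_value = ? AND coding_scheme = ? AND conceptual_domain = ?",
        (actual, scheme, domain),
    ).fetchone()
    return row[0] if row else None


print(lookup_preferred("Benzene", "chemical_name", "chemical_substance"))
```

Both permissible values hang off the same value meaning, which is what lets the view pair a name with its CAS number without a direct name-to-number table.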
Modifications to EDR • Downloaded files of permissible values for CAS number and chemical name (Merck index) from the EPA site • Assigned value meanings • Created value domains for CAS code, CAS padded and ycode, and loaded their permissible values • Added 3 extra chemical names because the Merck index file was incomplete • Built a data-driven value map for environmental media • De-normalized data for faster retrieval
Value Mapping Enhancements • Functional maps • e.g., case sensitivity ('ST LOUIS' vs. 'St Louis') • One-to-many maps • e.g., environmental media: Soil vs. Topsoil, Subsoil, Soil - unspecified, SO, S
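Both enhancements can be sketched in a few Python functions: a functional map that compares values after case-folding, and a one-to-many map that expands one preferred medium into an IN-list for the query and collapses database values back for the results. The media values come from the slide; which database stores which value, and the function names, are assumptions.

```python
from typing import Dict, List

# One-to-many map: one preferred medium -> the values a database may store for it
# (the example values listed on the slide).
MEDIA_MAP: Dict[str, List[str]] = {
    "Soil": ["Topsoil", "Subsoil", "Soil - unspecified", "SO", "S"],
}


def functional_key(value: str) -> str:
    """Functional map: ignore case and collapse runs of whitespace."""
    return " ".join(value.split()).casefold()


def expand_for_query(preferred: str) -> List[str]:
    """Query translation: one preferred term becomes every value it covers."""
    return MEDIA_MAP.get(preferred, [preferred])


def collapse_result(db_value: str) -> str:
    """Result translation: map a database value back to its preferred term."""
    key = functional_key(db_value)
    for preferred, values in MEDIA_MAP.items():
        if key in {functional_key(v) for v in values}:
            return preferred
    return db_value  # unmapped values pass through unchanged


def in_clause(column: str, values: List[str]) -> str:
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{column} IN ({quoted})"


print(in_clause("medium", expand_for_query("Soil")))
print(collapse_result("TOPSOIL"), functional_key("ST LOUIS") == functional_key("St Louis"))
```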
Section: Live Demonstration
Section: Further thoughts and work
Outstanding mapping issues • No match in the EDR for a database value, because of: • differences in case (‘Texas’, ‘TEXAS’) • CAS number format (dashes, leading zeros) • word order (‘n-Propyl benzene’, ‘Benzene, n-Propyl’) • bad data • Improved functional mapping • ‘artificial intelligence’ can be used in functional value mapping • ontology-dependent heuristics... • approximate string matching
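Several of these issues yield to simple normalization before the EDR lookup. The Python sketch below strips dashes and padding zeros from CAS numbers and validates the CAS check digit, undoes inverted chemical names, ignores case, and falls back to approximate string matching with the standard library's `difflib`. The cutoff of 0.85 and the word-order rule (swap around a comma followed by a space) are heuristic assumptions, not EDEN's method.

```python
import difflib
import re
from typing import List, Optional


def normalize_cas(raw: str) -> Optional[str]:
    """Return a CAS number in dashed form (e.g. 71-43-2), or None if it fails the check digit."""
    digits = re.sub(r"\D", "", raw).lstrip("0")  # drop dashes, spaces, padding zeros
    if len(digits) < 5:
        return None
    body, check = digits[:-1], int(digits[-1])
    # Weight body digits 1, 2, 3, ... from the right; the sum mod 10 is the check digit.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    if total % 10 != check:
        return None  # probably bad data
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


def normalize_name(name: str) -> str:
    """Ignore case and undo inverted names: 'Benzene, n-Propyl' -> 'n-propyl benzene'."""
    name = " ".join(name.split())
    if ", " in name:
        head, _, tail = name.partition(", ")
        name = f"{tail} {head}"
    return name.casefold()


def best_match(value: str, vocabulary: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Exact match on normalized names first, then approximate string matching."""
    keys = {normalize_name(v): v for v in vocabulary}
    wanted = normalize_name(value)
    if wanted in keys:
        return keys[wanted]
    close = difflib.get_close_matches(wanted, list(keys), n=1, cutoff=cutoff)
    return keys[close[0]] if close else None


vocabulary = ["n-Propyl benzene", "Benzene", "Toluene"]
print(normalize_cas("0000071432"), normalize_cas("71-43-3"))
print(best_match("Benzene, n-Propyl", vocabulary), best_match("Tolune", vocabulary))
```

A CAS number that fails the check digit returns None, which flags likely bad data rather than guessing a match.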
Further adventures in Ontology • Incorporate the Terminology Reference System/GEMET into EDEN • Enable the Value Map Agent to configure itself dynamically, directly from the EDR • Expand the EDEN ontology to encompass water quality for the European 5th Framework EDEN-IW project
Use of XML • InfoSleuth 5.5 will use • XML data transport • XML semantic advertisement • InfoSleuth should use • XML ontology representation • XML browser configuration • XML transport layer • Benefits • One parser, not home-grown • Easier incorporation of data, metadata • Better expressivity • Better interoperability
Summary • The EDEN pilot system shows that InfoSleuth can integrate existing databases • Value mapping (hence EDR) is crucial • EDEN may be useful to its sponsor agencies in identifying data quality issues and data gaps • EDEN has stimulated collaboration on metadata among agencies • EDEN has showcased the utility of the EDR • More work will lead to a better, broader system
Shakespeare on Internet Agent Research: “I can call spirits from the vasty deep!” “Aye, and so can I, and so can any man, but … will they come when you do call for them?”