Outline • EDEN Project Overview • InfoSleuth in a microsecond • The Ontology in InfoSleuth • Value Mapping and the Environmental Data Registry • Virtual demo
Environmental Data Exchange Network • The challenge: • Acquisition, use and dissemination of environmental information is of increasing strategic importance to EPA, DOD, DOE, and EEA • EDEN is an application of MCC's InfoSleuth technology • Employs intelligent agent technology over the Internet to conduct concept-based searches of heterogeneous, distributed information • The EDEN Project demonstrates how organizations can save time and money: • Provides easy access over an intranet or the Internet • Enables users to access information from multiple sources • Simplifies the exchange and sharing of data • Reduces the reporting burden • Brings information together for presentation and analysis
Common Set of Requirements • Reduce the reporting burden imposed by the parties on each other • Share the best available and most timely information • Enable users to access information from multiple sources • Coordinate only the common vocabulary, not the end use of information resources; focus on the inputs, with each participant individually interpreting and communicating outputs
Pilot Databases • CERCLIS-3: EPA Superfund (Oracle, VA) • ITT: EPA Remediation Technology (MS-Access, TX) • HazDat: EPA Hazardous Substances (Sybase, GA) • ERPIMS: Air Force Env. Restoration (Oracle, TX) • EEA: Basel Convention (MS-Access, TX) • IRDMIS: Army Installation Restoration (Oracle, MD) • DOE INEEL (Oracle, ID) • DOE ORNL (Oracle, TN)
InfoSleuth • System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous distributed information sources • Ontology-based information management • Advertise-discover paradigm supported by brokering over semantic constraints
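The advertise-discover paradigm can be pictured with a small sketch. This is not InfoSleuth's broker (the slides say its reasoning is done in LDL++ over KQML messages); the `Advertisement` and `Broker` classes, the agent names, and the constraint format are all assumptions made for illustration. Each resource agent advertises the ontology class it serves plus a constraint on one slot, and the broker returns the agents whose constraints overlap the query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical advertisement: an agent serves one ontology class,
    restricted to a set of allowed values for one slot."""
    agent: str
    ontology_class: str
    slot: str
    allowed_values: frozenset


@dataclass
class Broker:
    """Toy broker: stores advertisements and matches queries against them."""
    ads: list = field(default_factory=list)

    def advertise(self, ad: Advertisement) -> None:
        self.ads.append(ad)

    def discover(self, ontology_class: str, slot: str, wanted: set) -> list:
        """Return the agents whose advertised values intersect the wanted values."""
        return sorted(
            ad.agent
            for ad in self.ads
            if ad.ontology_class == ontology_class
            and ad.slot == slot
            and ad.allowed_values & wanted
        )


broker = Broker()
broker.advertise(Advertisement("resource_tx", "site", "state", frozenset({"TX"})))
broker.advertise(Advertisement("resource_ga", "site", "state", frozenset({"GA"})))
broker.advertise(Advertisement("resource_all", "site", "state", frozenset({"TX", "GA", "MD"})))

# Only agents that can contribute Texas sites are returned.
matching_agents = broker.discover("site", "state", {"TX"})
print(matching_agents)
```

The point of the design is that a query is routed only to resources that could hold relevant data, so adding or removing a resource agent only changes the set of advertisements.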
InfoSleuth System • Java-based agents • Knowledge Query and Manipulation Language (KQML) message layer provides speech-act agent interface • Agent conversation shell provides structure for KQML messages • Open Knowledge Base Connectivity (OKBC) language provides semantic communication layer • Brokering reasoning provided by Logical Data Language, LDL++
More InfoSleuth Agents • JDBC Resource agents translate between application domain ontology and database schemata • Multi-resource query agent uses either LDL++ or Oracle to support query decomposition and result recomposition • Value mapper translates to/from canonical value domains • Text agent supports ontology-based query • Task execution agent manages CLIPS rule base for task planning and subscription maintenance • Sentinel and Deviation detection agents cooperate to detect complex event patterns
Basic InfoSleuth Application Recipe • 6 cups ontology • 3 cups resource agent configuration • 1-3 cups user interface development • Lightly brown the multi-resource query agent • Pour in other agents out of the box • Stir • Serve • ... • add or remove resource agents as desired • add other functionality with more configuration effort
A Distributed Query • [Figure: architecture diagram. Labels: User, Viewer Applets, user agent, task agent, broker agent, multi-query agent, ontology agent, valuemap agent, resource agent, mapping info, Resources (SQL, text), Refined Data]
Purpose of The Ontology in InfoSleuth • To describe the domain with minimal ambiguity • the structure defines the domain • documentation strings • To be the integration hub for the DB schema • query relaxation through the taxonomy • vertical fragmentation • multi-resource path expressions • To provide the language of the queries and the language of expression of the results • value mapping
Expressing the Ontology • OKBC (Open Knowledge Base Connectivity): a standard for Knowledge Representation • Classes, Slots, Facets: • (class Observed_Contamination) • (template-slot-of analysis_method Observed_Contamination) • (template-facet-value :VALUE-TYPE analysis_method Observed_Contamination :STRING) • (template-slot-of site Observed_Contamination) • (template-facet-value :VALUE-TYPE site Observed_Contamination Eden_Site) • Subclass and Instance-Of Links
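As a rough illustration of how classes, slots, and the value-type facet fit together, the sketch below holds the Observed_Contamination class in a small Python structure and prints statements in the style shown on the slide. The `OntologyClass` type and `to_okbc_statements` function are hypothetical helpers, not part of OKBC or InfoSleuth.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical in-memory stand-in for a class with template slots."""
    name: str
    # slot name -> value of the slot's :VALUE-TYPE facet
    slot_value_types: dict = field(default_factory=dict)


observed_contamination = OntologyClass(
    name="Observed_Contamination",
    slot_value_types={
        "analysis_method": ":STRING",  # primitive-valued slot
        "site": "Eden_Site",           # slot whose values are instances of another class
    },
)


def to_okbc_statements(cls):
    """Render the class, its slots, and their value-type facets as statements."""
    lines = [f"(class {cls.name})"]
    for slot, value_type in cls.slot_value_types.items():
        lines.append(f"(template-slot-of {slot} {cls.name})")
        lines.append(
            f"(template-facet-value :VALUE-TYPE {slot} {cls.name} {value_type})"
        )
    return lines


statements = to_okbc_statements(observed_contamination)
print("\n".join(statements))
```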
Ontology Features • Value Mapping Modelling • [Figure: ontology diagram. Labels: Quantity, Unit Of Measure, canonical unit, height, Person, Distance, Meter, unit, Foot, data-type STRING]
Value mapping requirements • Translate terms in queries • Allow users to choose a coding scheme for querying • Query each database in terms of its own coding scheme • Translate results of queries • Facilitate merging of data from different sources • Display results according to user preference
Value mapping and the ontology • A class has one or more slots • Each slot has a conceptual domain name • Each slot has a preferred value domain • Resource Agents must advertise in the preferred value domain • possibly translating to/from a different value domain • Users may query and view data in a different value domain • User Agent handles translation to/from preferred value domain
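The round trip between a user's value domain and a slot's preferred value domain can be sketched as follows. The mapping table, the domain names `state_name` and `state_code`, and the choice of state codes as the preferred domain are assumptions for illustration; in EDEN the mappings come from the EDR.

```python
# Hypothetical tables: (conceptual domain, value domain) -> {local value: preferred value}.
# State codes are assumed to be the preferred value domain for the "state" slot.
TO_PREFERRED = {
    ("state", "state_name"): {"Texas": "TX", "Georgia": "GA"},
    ("state", "state_code"): {"TX": "TX", "GA": "GA"},
}


def to_preferred(conceptual_domain, value_domain, value):
    """Translate a value from the given value domain into the preferred domain."""
    return TO_PREFERRED[(conceptual_domain, value_domain)][value]


def from_preferred(conceptual_domain, value_domain, value):
    """Translate a preferred-domain value back into the requested value domain."""
    table = TO_PREFERRED[(conceptual_domain, value_domain)]
    inverse = {preferred: local for local, preferred in table.items()}
    return inverse[value]


# The user agent turns the user's query term into the preferred domain...
preferred = to_preferred("state", "state_name", "Texas")
# ...and turns results back into the domain the user chose for display.
displayed = from_preferred("state", "state_name", preferred)
print(preferred, displayed)
```

Routing every translation through one preferred domain means each value domain needs only one mapping, rather than one mapping per pair of domains.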
EDR contents We use a specialized resource agent (map agent) to access the EDR
Additions to EDR • Downloaded files of permissible values for CAS number and Chemical name (Merck index) from EPA site • Assigned value meanings • Created value domains for CAS code, CAS padded, ycode; loaded permissible values • Added 3 extra chemical names because Merck index file was incomplete
View of the EDR
CREATE VIEW edr_map (conceptual_domain, coding_scheme, preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id
Query translation
SELECT name FROM site WHERE state = 'Texas'
translated to
SELECT name FROM site WHERE state = 'TX'
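A minimal sketch of this rewrite, assuming a simple `column = 'value'` comparison and a hypothetical name-to-code table. It uses a regular expression for brevity; a real translator would work on a parsed query rather than on the SQL text.

```python
import re

# Hypothetical mapping from the user's coding scheme to the database's own scheme.
STATE_NAME_TO_CODE = {"Texas": "TX", "Georgia": "GA"}


def translate_query(sql, column, mapping):
    """Rewrite `column = 'value'` comparisons using the mapping.

    Values with no entry in the mapping are left unchanged.
    """
    pattern = re.compile(rf"\b{re.escape(column)}\s*=\s*'([^']*)'")

    def replace(match):
        value = match.group(1)
        return f"{column} = '{mapping.get(value, value)}'"

    return pattern.sub(replace, sql)


translated = translate_query(
    "SELECT name FROM site WHERE state = 'Texas'", "state", STATE_NAME_TO_CODE
)
print(translated)
```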
Result translation Translated to
EDR lookup
SELECT preferred_value FROM edr_map WHERE actual_value = 'Benzene' AND coding_scheme = 'chemical_name' AND conceptual_domain = 'chemical_substance'
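The lookup can be tried against an in-memory SQLite stand-in. In this sketch `edr_map` is a plain table rather than a view over the EDR, the rows are made up for illustration, and the CAS number is assumed to be the preferred value domain for chemical substances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE edr_map (
           conceptual_domain TEXT,
           coding_scheme     TEXT,
           actual_value      TEXT,
           preferred_value   TEXT)"""
)
# Illustrative rows only; 71-43-2 and 108-88-3 are the CAS numbers
# for benzene and toluene.
conn.executemany(
    "INSERT INTO edr_map VALUES (?, ?, ?, ?)",
    [
        ("chemical_substance", "chemical_name", "Benzene", "71-43-2"),
        ("chemical_substance", "chemical_name", "Toluene", "108-88-3"),
    ],
)


def edr_lookup(conceptual_domain, coding_scheme, actual_value):
    """Return the preferred value for an actual value, or None if unmapped."""
    row = conn.execute(
        "SELECT preferred_value FROM edr_map "
        "WHERE actual_value = ? AND coding_scheme = ? AND conceptual_domain = ?",
        (actual_value, coding_scheme, conceptual_domain),
    ).fetchone()
    return row[0] if row else None


benzene_code = edr_lookup("chemical_substance", "chemical_name", "Benzene")
print(benzene_code)
```

Because the comparison is exact, a lookup for `benzene` in lower case finds nothing in this sketch.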
Outstanding issues • No match in EDR for database value • differences in case (‘Texas’, ‘TEXAS’) • CAS number format (dashes, leading zeros) • word order (‘n-Propyl benzene’, ‘Benzene, n-Propyl’) • bad data • Functional mapping needed • Approximate string matching
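One possible direction for these issues is to normalize values before the lookup and fall back to approximate matching. The functions below are hypothetical helpers, not part of EDEN; the approximate matcher uses Python's standard `difflib`, which is a substitute chosen for this sketch since the slides do not name a technique.

```python
import difflib


def normalize_case(value):
    """Fold case and trim whitespace so 'Texas' and 'TEXAS' compare equal."""
    return value.strip().casefold()


def normalize_cas(value):
    """Drop dashes and leading zeros, then re-insert dashes as N..N-NN-N."""
    digits = value.replace("-", "").lstrip("0")
    if len(digits) < 5 or not digits.isdigit():
        raise ValueError(f"not a CAS number: {value!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


def normalize_word_order(name):
    """Build an order-independent key from the words of a chemical name."""
    words = name.replace(",", " ").split()
    return " ".join(sorted(word.casefold() for word in words))


def closest_match(value, candidates, cutoff=0.8):
    """Return the most similar candidate above the cutoff, or None."""
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_case("TEXAS") == normalize_case("Texas"))
print(normalize_cas("000071-43-2"), normalize_cas("71432"))
print(normalize_word_order("n-Propyl benzene") == normalize_word_order("Benzene, n-Propyl"))
print(closest_match("Benzen", ["Benzene", "Toluene"]))
```

Normalization of this kind handles the case, format, and word-order differences; it does not help with bad data, and a fuzzy match should be treated as a suggestion to review rather than as a confirmed mapping.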