Semantic Integration of Information Systems Pilot Program Proposal
Lockheed Martin Advanced Technology Laboratories, 3 Executive Campus, Cherry Hill, NJ 08002
Jon Darvill, Brian Boesch {jdarvill,bboech}@atl.lmco.com
Summary of Proposed Approach
• Harmonize heterogeneous DHS data models through Ontrapro automated ontology alignment
• Map local ontologies for legacy systems to NIEM Standard Ontology
• Enable alignment-based translation for high-priority systems
• Refine NIEM Standard Ontology by degree of alignment with local ontologies
• Generate local ontologies for legacy systems through:
  • Natural language processing of related documentation
  • Inference over instance data
  • Analysis of data exploitation applications
  • SME review
[Figure labels: Information Sources (Databases, Related Documentation, DHS Applications); Ontology Extraction; Manual Ontology Engineering from Legacy Schemas; Local Ontology; Ontology Alignment; NIEM Standard Ontology. The figure also shows a sample XML schema fragment and a sample intel report as example inputs.]
Ontology-enabled DB Federation
• Global integration of heterogeneous information resources is an unrealized goal for federal agencies
• Currently, critical information either cannot be delivered or is delivered with unacceptable latency because it is distributed among resources with incompatible data models
• The National Information Exchange Model (NIEM) is being developed, and a NIEM Ontology has been proposed, to facilitate data integration
• There are two fundamental technical challenges to achieving integration through the NIEM Ontology:
  • Ontology Extraction
  • Ontology Alignment
[Figure labels: NIEM Ontology; three LocalDB / LocalOntology pairs]
Challenge 1: Ontology Extraction
• Ontology Extraction is the process of creating a hierarchical, abstract representation of the data contained in a data source
• Discovering these abstractions is critical for aligning heterogeneous systems in a scalable way
• Ontological descriptions of data enable complex federated queries and (semantic) web services
[Figure labels: Local DB; DB Schema; Related Text and Documentation; Background Knowledge; Ontology Extraction; Local Ontology; NIEM Ontology]
Hierarchical Abstraction Learning (HAL)
• HAL Prototype learns abstractions from semi-structured data
  • Generates new knowledge from free-text documents with semantic markup
  • Uses machine learning to organize and categorize statements into related groups, then identifies relationships between groups to surface knowledge not explicitly stated in the documents
  • Applies to any semi-structured data (e.g., relational DB)
• Approach
  • Generates semantic markup using the Focused Knowledge Base (FKB) semantic (RDF) extraction tool
  • Generates rules for creating user-specified abstractions on markup using the Aleph inductive logic programming tool
  • Applies background knowledge to influence rule induction
[Figure labels: Text corpus; Fact Extractor (FKB tool); Induction Engine (Aleph tool); Abstraction Manager; Background Knowledge Store; Ontology Representation]
HAL Components
• FKB Prototype
  • Learns extraction model through training
  • Extracts RDF assertions from free text
  • Allows editing of the assertions
  • Provides persistent storage in the form of the Jena2 RDF knowledge base
  • Enables query by an inference model that computes RDF-supported inferences
• Alephize Prototype
  • Learns abstraction rules from user-driven positive and negative examples
  • Applies abstraction rules on new input assertions
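The learn-then-apply cycle described for the Alephize prototype can be illustrated with a toy sketch. This is not Aleph and not inductive logic programming proper: it assumes a much simpler hypothesis space (the set of predicate/object pairs shared by all positive examples), and every name in it (`learn_rule`, `apply_rule`, the sample assertions, the `ArmoredGroundVehicle` label) is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Feature = Tuple[str, str]       # (predicate, object)


def features(subject: str, triples: List[Triple]) -> Set[Feature]:
    """Collect the (predicate, object) pairs asserted about one subject."""
    return {(p, o) for s, p, o in triples if s == subject}


def learn_rule(triples: List[Triple], positives: List[str],
               negatives: List[str]) -> Optional[Set[Feature]]:
    """Toy rule induction: keep the features shared by every positive example.

    Returns None when no shared features exist or when the shared features
    also cover a negative example (this simple scheme cannot separate them).
    """
    if not positives:
        return None
    shared = set.intersection(*(features(s, triples) for s in positives))
    if not shared:
        return None
    if any(shared <= features(n, triples) for n in negatives):
        return None
    return shared


def apply_rule(rule: Set[Feature], label: str,
               triples: List[Triple]) -> List[Triple]:
    """Assert the abstraction label for every subject that satisfies the rule."""
    subjects = sorted({s for s, _, _ in triples})
    return [(s, "rdf:type", label) for s in subjects
            if rule <= features(s, triples)]


# Hypothetical training assertions (not taken from the proposal).
training = [
    ("tank1", "has_armor", "yes"), ("tank1", "moves_on", "ground"),
    ("apc1", "has_armor", "yes"), ("apc1", "moves_on", "ground"),
    ("uav1", "has_armor", "no"), ("uav1", "moves_on", "air"),
]
# Hypothetical new input assertions.
new_assertions = [
    ("t72", "has_armor", "yes"), ("t72", "moves_on", "ground"),
    ("t72", "crew", "3"), ("heli1", "moves_on", "air"),
]

rule = learn_rule(training, positives=["tank1", "apc1"], negatives=["uav1"])
if rule is not None:
    print(apply_rule(rule, "ArmoredGroundVehicle", new_assertions))
```

A real ILP engine searches a far richer space of first-order clauses and can use background knowledge during induction; the sketch only shows how user-supplied positive and negative examples constrain a rule that is then applied to unseen assertions.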
Challenge 2: Ontology Alignment
• One of the critical technical barriers to interoperability is that information needed by users is distributed among resources with heterogeneous data models
• Even with standard data models such as NIEM, legacy systems must be mapped
• The current practice for bridging these semantic gaps is to write static query and command translation rules manually, which is onerous, error-prone, and insufficient to support dynamic, scalable semantic integration
• “Alignability” with existing data models is also a key metric in the design of NIEM itself
[Figure labels: LocalDB; LocalOntology; Ontology Alignment; NIEM Ontology]
Ontology Translation Protocol
• Ontrapro Prototype automatically discovers semantic correspondences between elements in heterogeneous ontologies
• Key innovations include:
  • Dynamically composable alignment algorithms that compare a comprehensive range of features (e.g., syntactical, lexical, phonetic, and structural) between data models to identify semantic similarities
  • Tunable filters that maximize alignment precision and recall
  • Intuitive graphical interface for viewing alignments
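The idea of composable matchers combined with a tunable filter can be sketched as follows. This is an illustration only, not Ontrapro's algorithms or API: the two matchers (character-level similarity from Python's standard `difflib` and a token-overlap score), the weights, the threshold, and the sample terms are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Matcher = Callable[[str, str], float]  # returns a similarity in [0, 1]


def lexical(a: str, b: str) -> float:
    """Character-level similarity of two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of underscore-separated name tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def compose(weighted: List[Tuple[Matcher, float]]) -> Matcher:
    """Combine several matchers into one weighted-average matcher."""
    total = sum(w for _, w in weighted)

    def combined(a: str, b: str) -> float:
        return sum(m(a, b) * w for m, w in weighted) / total

    return combined


def align(source: List[str], target: List[str],
          matcher: Matcher, threshold: float) -> Dict[str, str]:
    """Map each source term to its best target term, if it passes the filter."""
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: matcher(s, t))
        if matcher(s, best) >= threshold:
            mapping[s] = best
    return mapping


def precision_recall(found: Dict[str, str],
                     reference: Dict[str, str]) -> Tuple[float, float]:
    """Score a discovered alignment against a hand-built reference alignment."""
    correct = sum(1 for s, t in found.items() if reference.get(s) == t)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical element names from two data models.
source_terms = ["report_date", "author_name"]
target_terms = ["date_of_report", "name_of_author", "location"]
reference = {"report_date": "date_of_report", "author_name": "name_of_author"}

matcher = compose([(lexical, 0.5), (token_overlap, 0.5)])
found = align(source_terms, target_terms, matcher, threshold=0.5)
print(found, precision_recall(found, reference))
```

Raising the threshold trades recall for precision: a stricter filter discards weak correspondences, including some correct ones, which is why the filter is worth exposing as a tunable parameter.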
Data Model Harmonization: CDR
• Common Data Repository
  • LM IS&S, System Integration, Systems Engineering and Technologies
• Requirements modeling tools
  • System Architect (SA)
  • OPNET Simulation Tool
• Example question: What network architectures use protocol X, regardless of what they call it?
  • X is being used in unproven situations
  • X is being replaced / upgraded
  • New devices can only connect through X
  • New protocol Y could interfere with X
[Figure labels: OpNet Operator; SA Operator; Ontrapro; Metadata Directory; Data Element Repository; CDR]
CDR Challenge: Different Syntax and Semantics
Architecture designed by Group A with System Architect:
    @prefix SA: <SA.n3t> .
    # PC connects to Hub via ETHERNET
    SA:DELLPC SA:connects_to SA:CISCOETHER2500 .
    SA:DELLPC SA:connects_via SA:ETHERNET .
    # Hub connects to Router via GIGABIT Ethernet
    SA:CISCOETHER2500 SA:connects_to SA:CISCOGATEWAY .
    SA:CISCOETHER2500 SA:connects_via SA:GIGABITETHERNET .
Architecture designed by Group B with OpNet:
    @prefix OPNET: <OPNET.n3> .
    OPNET:WLAN_WKSTN OPNET:connection_peer OPNET:ETHERNET32_HUB .
    OPNET:WLAN_WKSTN OPNET:connection_protocol OPNET:10BASET .
    OPNET:ETHERNET32_HUB OPNET:connection_peer OPNET:FR4_ETHERNET2_GTWY .
    OPNET:ETHERNET32_HUB OPNET:connection_protocol OPNET:1000BASEX .
ONTRAPRO aligns different terms with the same descriptive meaning
CDR: Ontrapro Finds the Map
• Yields alignment file
• Example alignment: SA’s ROUTER is OPNET’s GATEWAY
[Figure: Ontrapro user interface]
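Once an alignment exists, the example question ("What network architectures use protocol X, regardless of what they call it?") becomes a matter of translating the query terms before running them against each model. The sketch below reuses the System Architect and OpNet statements from the CDR slides as plain tuples. The alignment dictionary is an assumption made for illustration (it is not the output of Ontrapro, whose alignment file format is not shown in the slides), and `devices_using` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Statements from the two architecture descriptions.
SA_TRIPLES: List[Triple] = [
    ("SA:DELLPC", "SA:connects_to", "SA:CISCOETHER2500"),
    ("SA:DELLPC", "SA:connects_via", "SA:ETHERNET"),
    ("SA:CISCOETHER2500", "SA:connects_to", "SA:CISCOGATEWAY"),
    ("SA:CISCOETHER2500", "SA:connects_via", "SA:GIGABITETHERNET"),
]
OPNET_TRIPLES: List[Triple] = [
    ("OPNET:WLAN_WKSTN", "OPNET:connection_peer", "OPNET:ETHERNET32_HUB"),
    ("OPNET:WLAN_WKSTN", "OPNET:connection_protocol", "OPNET:10BASET"),
    ("OPNET:ETHERNET32_HUB", "OPNET:connection_peer", "OPNET:FR4_ETHERNET2_GTWY"),
    ("OPNET:ETHERNET32_HUB", "OPNET:connection_protocol", "OPNET:1000BASEX"),
]

# ASSUMED alignment (SA term -> OPNET term), written by hand for this sketch.
ALIGNMENT: Dict[str, str] = {
    "SA:connects_via": "OPNET:connection_protocol",
    "SA:connects_to": "OPNET:connection_peer",
    "SA:ETHERNET": "OPNET:10BASET",
    "SA:GIGABITETHERNET": "OPNET:1000BASEX",
}


def translate(term: str, alignment: Dict[str, str]) -> str:
    """Return the aligned term, or the term itself when no mapping exists."""
    return alignment.get(term, term)


def devices_using(protocol: str, alignment: Dict[str, str]) -> List[str]:
    """Find devices in both models that use a protocol named in SA vocabulary."""
    predicate = "SA:connects_via"
    hits = [s for s, p, o in SA_TRIPLES if p == predicate and o == protocol]
    # Translate the query into OPNET vocabulary instead of hand-writing
    # a second, model-specific query.
    opnet_predicate = translate(predicate, alignment)
    opnet_protocol = translate(protocol, alignment)
    hits += [s for s, p, o in OPNET_TRIPLES
             if p == opnet_predicate and o == opnet_protocol]
    return hits


print(devices_using("SA:GIGABITETHERNET", ALIGNMENT))
# prints ['SA:CISCOETHER2500', 'OPNET:ETHERNET32_HUB']
```

The point of the design is that the query is written once, in one vocabulary; adding a third modeling tool would require a new alignment, not new query code.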
Summary of Innovations
• Lockheed Martin Advanced Technology Laboratories (LM ATL) will aid the development, deployment, continued use, and overall benefit of the NIEM and NIEM Ontology by leveraging technologies that
  • Extract ontologies from local data sources
  • Align local ontologies with the NIEM Ontology
• This technology will increase the quality of the standard while reducing the overhead clients incur in implementing it
• LM ATL will apply its considerable experience in ontology-based integration to deliver technology that enables the sustainable interoperability of heterogeneous information resources
Information Interpretation and Integration Conference (I3CON)
• Experiment Participants
  • Jerome Pierson (INRIA)
  • John Li (Teknowledge)
  • Lewis Hart (AT&T)
  • Marc Ehrig (University of Karlsruhe)
  • Todd Hughes (LM ATL)
• Guest Speakers
  • Bill Andersen (Ontology Works)
  • Mike Pool (Information Extraction and Transport)
  • Yun Peng (University of Maryland Baltimore County)
  • Mike Gruninger (University of Maryland)
I3CON: Experiment Results
• I3CON results demonstrate that Ontrapro is consistently effective across a range of problem types
• LM ATL also co-organized the Evaluation of Ontology Tools Workshop (at ISWC) and the Integrating Ontologies Workshop (at KCAP)