
Semantic Interoperability: Automatically Resolving Vocabularies

Explore semantic interoperability, automated vocabulary resolution, and data services for integrating views of data from various sources in an enterprise environment. Learn how to achieve data consistency, reuse data services, and maximize interoperable information services.



Presentation Transcript


  1. Semantic Interoperability: Automatically Resolving Vocabularies • Chuck Mosher • 8500 Leesburg Pike, Vienna, VA • cmosher@metamatrix.com • 4th Semantic Interoperability Conference, February 10, 2006

  2. Interoperable Information Backbone: Enterprise Data Service Layer • Enterprise-wide data abstraction layer for applications • Integrated views of data from multiple sources: relational databases, applications, files • Reusable Data Services for data consistency • Metadata-driven data management and integration • Complements other data integration tools (ETL, EAI, quality, etc.) [Diagram: Applications, MetaMatrix, Data Sources]

  3. Data Services • A type of Web Service • Does all of the work to transform any data in any format to a W3C-compliant service • Implements all of the logic to effect the transformation • Provides access to data sources, regardless of source API or technology • Does not implement application logic • Decouples the data from the application while making the data discoverable and accessible

  4. Model-Based Approach Maximizes Re-use [Diagram labels] • Information Consumers: Web Services, Business Processes; Packaged Apps; Custom Apps; Reporting, Analytics; EAI, Data warehouses • Exposed Information Services: SOAP, ODBC, JDBC, each with a WSDL contract; sample XML with sale and value elements • Reusable Integrated Business Objects • Data Abstraction Without Coding • Enterprise Information Sources (EIS): databases, services, warehouses, xml, spreadsheets, geo-spatial, rich media, …

  5. Meta Object Facility (MOF) [Diagram layers: Metamodel, Model, Data]

  6. MetaMatrix MetaBase Modeler • Model disparate information sources • Relational DBs • Content Management Systems • Files • Services • Applications • Uses and retains domain-specific modeling terminology • Relational models have “Tables”, “Foreign Keys”, “Columns”, etc. • UML models have “Packages”, “Classes”, “Attributes”, etc.

  7. MetaMatrix MetaBase Modeler • Define reusable data services/business objects • Transformations defined with: • Selects • Joins • Criteria • Unions • Functions • User-defined functions • Perform schema and semantic matching, data type conversion

  8. Semantic Mediation: The Problem [Diagram, top to bottom, with layers linked by nodes labeled T] • Consumers: Business Intelligence Applications, Portal Applications, Web Services, connecting via ODBC/JDBC, JDBC, and SOAP • Aggregate Data Services: • Relational or XML • Application-specific • Access via ODBC, JDBC, or SOAP APIs • Example: a virtual XML document • Logical Data Model: • Enterprise-wide or COI-driven data model • Rationalization and harmonization • Semantic mediation layer • Data Catalog/Dictionary • Example attributes: Location_ID, Location_Type • Data Sources: • Authoritative • Redundant • Overlapping • Multiple internal/external information sources • Example attributes: bldg_type, bldg_id, Depot_Number, SITENUM, Facility_ID
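
A minimal sketch of the mediation idea on this slide, using Python's built-in sqlite3 module as a stand-in for the data service layer. This is not MetaMatrix's API. The table names, the `kind` column, and the sample rows are hypothetical; the column names bldg_id, bldg_type, Depot_Number, Location_ID, and Location_Type come from the slide, and which of them belong to sources versus the logical model is an assumption read off the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical sources that name the same concept differently.
    CREATE TABLE facilities (bldg_id INTEGER, bldg_type TEXT);
    CREATE TABLE depots (Depot_Number TEXT, kind TEXT);
    INSERT INTO facilities VALUES (101, 'warehouse');
    INSERT INTO depots VALUES ('D-7', 'depot');

    -- Logical model: one view that renames columns, converts types,
    -- and unions the sources.
    CREATE VIEW Location AS
        SELECT CAST(bldg_id AS TEXT) AS Location_ID,
               bldg_type AS Location_Type
        FROM facilities
        UNION ALL
        SELECT Depot_Number, kind FROM depots;
""")

rows = conn.execute(
    "SELECT Location_ID, Location_Type FROM Location ORDER BY Location_ID"
).fetchall()
print(rows)
```

Consumers query `Location` without knowing which source a row came from. Deciding that bldg_id and Depot_Number mean the same thing is the semantic step the rest of the deck addresses.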

  9. Building Enterprise Semantic Model(s) [Diagram, top to bottom, with layers linked by nodes labeled T] • Domains: J-1 Manpower / Personnel, J-2 Intelligence, J-3 Operations, J-4 Logistics (GCSS), J-5 Plans & Policy, J-6 C4CS, J-7 Operational Plans, J-8 Force Structure • Consumers: Business Intelligence Applications, Portal Applications, Web Services, connecting via ODBC/JDBC, JDBC, and SOAP • Enterprise-wide or COI-driven Data Models: • Rationalization • Harmonization • Data Catalogs • Data Sources: • Authoritative • Redundant • Overlapping • Multiple internal/external information sources

  10. Biggest Challenge in Creating Data Services? • Semantics!!! • Structural differences are straightforward • Differing definitions among data sources • Differing vocabularies among COIs • Established, emerging, and evolving data standards • C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, many more • Not addressed by ETL, EAI, SOA

  11. A Previously Intractable Problem • TWPDES has 1000+ core entities • NIEM has 100,000+! • Even a limited program with a dozen data sources could yield tens of thousands of potential mappings • Humans cannot address this without help • Indeed, it has stopped many data integration/reconciliation programs in their tracks.
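
A rough back-of-the-envelope sketch of why the search space grows so fast. The source sizes (a dozen sources with 50 elements each) are assumed for illustration; only the 1000-entity figure comes from the slide. The counts are candidate pairs a reviewer would have to consider, not confirmed mappings.

```python
from itertools import combinations


def pairwise_candidates(sizes):
    """Element pairs to consider when every source is compared with every other."""
    return sum(a * b for a, b in combinations(sizes, 2))


def hub_candidates(sizes, vocabulary_size):
    """Element pairs when each source is compared against one shared vocabulary."""
    return sum(sizes) * vocabulary_size


sources = [50] * 12  # assumed: a dozen sources, 50 elements each
print(pairwise_candidates(sources))   # 66 source pairs x 2,500 element pairs = 165000
print(hub_candidates(sources, 1000))  # 600 elements x 1,000 core entities = 600000
```

Even if only a small fraction of these pairs are real correspondences, someone still has to find them, which is the case for automating the first pass.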

  12. Automated Semantic Matching

  13. DISCLAIMER • Semantic matching can't really be done automatically yet! • Requires intelligence to understand the context and semantics. • So use computers to do most of the work but then have the user confirm or check the result.

  14. The Matching Problem • Given two symbols, calculate a measure of the relationship between them: • “amount” and “quantity” • Doesn’t seem so hard…

  15. The Matching Problem • Given two symbols, calculate a measure of the relationship between them: • “ftuqky” and “aqfkyeyr” • This is what a computer “sees.”

  16. The Matching Problem • Even after extracting likely symbols, matching is a difficult problem. • Symbols alone are not enough to generate good matches: • “ID” -> “SocialSecurityNumber” or “NY” • The solution relies on context: • “NJ”, “MA”, “CA”, “ID” • “Ego”, “SuperEgo”, “ID” • MatchIt provides that context
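
One way to picture "context" is as the set of symbols that appear alongside an ambiguous one. The sketch below is an illustration only, not how MatchIt works internally: the sense inventory and its cue words are hypothetical, and the scoring is a simple overlap count.

```python
# Hypothetical sense inventory: each sense lists symbols that tend to appear near it.
SENSES = {
    "ID": {
        "identifier": {"NUMBER", "CODE", "KEY", "SOCIALSECURITYNUMBER"},
        "Idaho (US state)": {"NJ", "MA", "CA", "NY"},
        "id (psychology)": {"EGO", "SUPEREGO"},
    }
}


def disambiguate(symbol, neighbors):
    """Pick the sense whose cue set overlaps most with the neighboring symbols."""
    senses = SENSES.get(symbol.upper(), {})
    context = {n.upper() for n in neighbors}
    best_sense, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(context & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense  # None when the context gives no evidence


print(disambiguate("ID", ["NJ", "MA", "CA"]))
print(disambiguate("ID", ["Ego", "SuperEgo"]))
```

With no neighbors the function returns None, which mirrors the slide's point that a symbol alone is not enough.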

  17. MatchIT 1.0 • Integrated component of the MetaMatrix Semantic Data Services product • Based on an ontology-driven semantic knowledge base • Word relationships, dictionaries, lexicons, thesauri • Plug-in architecture • Standards-compliant: • OWL • RDF • Inference engines • OSGi • Eclipse • JDBC

  18. (Semi-)Automated Semantic Mediation • An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words. • Ontology example: “Sex” is semantically related to “Gender” • A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations. • Match example: “Gender ID” matched to “Person Sex Code” (confidence of 90%) [Diagram: data source services for FBI, CBP, NYC, NY, NJ]

  19. Matching Techniques • MatchIT uses two types of matching techniques: • String Matching • Attempts to determine the similarity of two strings based on the lexical distance between them. • Semantic Matching • Attempts to determine the similarity of two symbols based on the ontological distance between them within a semantic ontology. • Generate Match Sets • Can be run individually or in combinations • Pluggable architecture allows for algorithmic extensibility
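
The "individually or in combinations" and "pluggable" bullets can be sketched as matchers that share one signature. Everything here is an assumption made for illustration: the function names, the tiny synonym table, and the fixed 0.9 confidence are hypothetical, and difflib's ratio stands in for whatever string measure the product actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus entries, stored as unordered pairs.
SYNONYMS = {frozenset({"sex", "gender"}), frozenset({"amount", "quantity"})}


def string_matcher(a, b):
    """Score by character-level similarity; returns (score, explanation)."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return score, "string similarity"


def synonym_matcher(a, b):
    """Score by thesaurus lookup; 0.9 is an assumed confidence, not a measured one."""
    if frozenset({a.lower(), b.lower()}) in SYNONYMS:
        return 0.9, "listed as synonyms"
    return 0.0, "no synonym entry"


def best_match(a, b, matchers):
    """Run every plugged-in matcher and keep the highest-scoring result."""
    return max((m(a, b) for m in matchers), key=lambda result: result[0])


print(best_match("Sex", "Gender", [string_matcher, synonym_matcher]))
print(best_match("Sex", "Gender", [string_matcher]))
```

Adding a new technique means writing one more function with the same signature and appending it to the list. Returning an explanation alongside the score gives the reviewer a reason for each proposed match.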

  20. String Matching • What is the lexical distance between two symbols? • “PUZZLE”, “PUZZ” • “ID”, “IDENTIFIER” • “STRONG”, “SONG”
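
The slide does not say which lexical distance is meant. As one common choice, the sketch below uses Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions) and applies it to the three pairs above; the `similarity` normalization is likewise an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


for left, right in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(left, right, levenshtein(left, right), round(similarity(left, right), 2))
```

Under this measure “STRONG” and “SONG” score the same as “PUZZLE” and “PUZZ” (distance 2), while “ID” and “IDENTIFIER” score poorly (distance 8) even though they mean the same thing. That is why string matching is paired with semantic matching.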

  21. Semantic Matching • How semantically similar are two concepts?
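
One simple reading of "ontological distance" is the number of relationship hops between two concepts in a graph. The sketch below assumes that reading; the miniature ontology and its edges are hypothetical, and a real knowledge base would weight relationship types differently.

```python
from collections import deque

# Hypothetical, undirected concept graph.
EDGES = [
    ("sex", "gender"),
    ("gender", "person attribute"),
    ("person attribute", "attribute"),
    ("amount", "quantity"),
    ("quantity", "measure"),
    ("measure", "attribute"),
]

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def ontological_distance(start, goal):
    """Breadth-first search: fewest hops between two concepts, or None."""
    if start not in GRAPH or goal not in GRAPH:
        return None
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbor in GRAPH[node] - seen:
            seen.add(neighbor)
            queue.append((neighbor, hops + 1))
    return None


print(ontological_distance("sex", "gender"))
print(ontological_distance("sex", "amount"))
```

Closely related concepts sit one hop apart, while unrelated ones connect only through a distant shared ancestor. A score such as 1 / (1 + hops) would turn the distance into a similarity.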

  22. Semantic Matching Objectives • Find and rank the potential matches, but let the user review and decide for sure. • I.e., eliminate 99+% of the things that don't match, and let the user review the <1%. • Many times, a user can visually scan a small list of the top 1% and very quickly agree or disagree with the results. • Favor false positives over false negatives.
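
These objectives amount to a ranking step with a deliberately permissive cutoff. A minimal sketch, in which the candidate names, the scores, the 0.3 threshold, and the limit of five are all assumed values:

```python
def shortlist(candidates, threshold=0.3, limit=5):
    """Rank (target, score) pairs and keep the best few for human review.

    A low threshold lets doubtful candidates through on purpose: a false
    positive costs the reviewer a glance, a false negative is never seen.
    """
    kept = [c for c in candidates if c[1] >= threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:limit]


# Hypothetical scores for matching "Person Sex Code" against a target model.
candidates = [
    ("Person ID", 0.45),
    ("Depot_Number", 0.05),
    ("Gender ID", 0.90),
    ("Sector", 0.35),
]
review_list = shortlist(candidates)
print(review_list)
```

The reviewer sees three ranked candidates instead of the whole target model and confirms or rejects each one.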

  23. Semantic Matching in MetaMatrix [Diagram; labels grouped approximately] • Enterprise Information Sources: RDBMS, XML, Custom, File System, Any Source • Access: MetaMatrix Connector Framework (Data/Content Access), MetaMatrix Importer Framework (Metadata Access), JDBC • Conceptual/Logical/Physical Data Model Representations: Relational, Domain [UML/ER], XML, Ontologies [OWL/RDF] • MetaBase Modeler: Import, Export, Analyze, Visualize, Collaborate, Transform • MatchIt: Find Matches, Schema-level Match, Instance-level Match, Data Harmonization • Semantic Knowledge Base: Ontology, Complete Ontological Semantics, Fact Repository, Onomasticons, Lexicons • MetaBase Repository: Models & Files [versioned], Search Index, Web Reporting

  24. Example

  25. Overall process • Import two nontrivial vocabularies • ERwin model of large data warehouse • TWPDES XML schema • Extract symbols • Schema-specific tokenization algorithms • Assign semantics to each • Symbols are keys into dictionaries • Perform semantic matching between them • Analyze results
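
The "extract symbols" and "symbols are keys into dictionaries" steps can be sketched as below. The regular expression is one possible tokenizer for camelCase and underscore-separated names, and the abbreviation dictionary is hypothetical; the example identifiers are taken from elsewhere in the deck.

```python
import re

# Uppercase runs (acronyms), capitalized or lowercase words, and digit runs.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

# Hypothetical dictionary that expands abbreviations into full words.
ABBREVIATIONS = {"bldg": "building", "id": "identifier", "num": "number"}


def extract_symbols(name):
    """Split a schema identifier into lowercase tokens and expand known ones."""
    tokens = [t.lower() for t in TOKEN.findall(name)]
    return [ABBREVIATIONS.get(t, t) for t in tokens]


for name in ["Location_ID", "bldg_type", "PersonSexCode", "SITENUM"]:
    print(name, extract_symbols(name))
```

“SITENUM” comes back as a single token because nothing in its spelling marks a word boundary. Splitting it into “site” and “num” needs a dictionary-aware tokenizer, which is one reason the slide calls for schema-specific algorithms.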

  26. ERwin Data Warehouse Model

  27. TWPDES XML Schema • Mapping classes for each XML fragment in the hierarchy

  28. Generated Symbol Dictionary (TWPDES)

  29. Generated Symbol Dictionary (ERwin model)

  30. Editing the Dictionary • Modify definition

  31. Editing the Semantics • Control senses

  32. Target Model Match Results

  33. Examine Details

  34. Match Details

  35. Matches Used to Build Mappings

  36. From Pat Cassidy & COSMO: The Integrating Function of the Common Semantic Model, via Domain-level Mapping • GenericObligation SameAs Obligation SameAs Duty

  37. MatchIt Semantic Matching Tool • A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology. • Map connections between ontologies that are being built and artifacts currently in use: • RDBMS schemas • XML and XSD files • Spreadsheet data • More coming, including ontologies! • Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure

  38. Thank you
