Semantic Interoperability: Automatically Resolving Vocabularies
Chuck Mosher
8500 Leesburg Pike, Vienna, VA
cmosher@metamatrix.com
4th Semantic Interoperability Conference, February 10, 2006
Interoperable Information Backbone: Enterprise Data Service Layer
• Enterprise-wide data abstraction layer for applications
• Integrated views of data from multiple sources
  • Relational databases, applications, files
• Reusable data services for data consistency
• Metadata-driven data management and integration
• Complements other data integration tools (ETL, EAI, quality, etc.)
(Diagram: Applications → MetaMatrix → Data Sources)
Data Services
• A type of Web service
• Transforms data in any format into a W3C-compliant service
• Implements all of the logic needed to perform the transformation
• Provides access to data sources regardless of source API or technology
• Does not implement application logic
• Decouples the data from the application while making the data discoverable and accessible
Model-Based Approach Maximizes Re-use
(Diagram, bottom to top:)
• Enterprise Information Sources (EIS): databases, services, warehouses, XML, spreadsheets, geo-spatial, rich media, packaged apps, …
• Reusable Integrated Business Objects: data abstraction without coding
• Exposed Information Services: SOAP, ODBC, JDBC, with WSDL contracts and XML payloads such as <sale><value/></sale>
• Information Consumers: Web services, business processes, custom apps, reporting, analytics, EAI, data warehouses
Meta Object Facility (MOF)
(Diagram layers: Metamodel → Model → Data)
MetaMatrix MetaBase Modeler
• Model disparate information sources
  • Relational DBs
  • Content Management Systems
  • Files
  • Services
  • Applications
• Uses and retains domain-specific modeling terminology
  • Relational models have “Tables”, “Foreign Keys”, “Columns”, etc.
  • UML models have “Packages”, “Classes”, “Attributes”, etc.
MetaMatrix MetaBase Modeler
• Define reusable data services/business objects
• Transformations defined with:
  • Selects
  • Joins
  • Criteria
  • Unions
  • Functions, including user-defined functions
• Perform schema and semantic matching and data type conversion
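To make the select/join/criteria/union list concrete, here is a minimal sketch of a virtual integrated view in plain SQL, run through Python's built-in sqlite3 module. This is not MetaMatrix's transformation syntax; the tables, rows, and view name are hypothetical, with column names borrowed from the source and logical attributes on the "Semantic Mediation" slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical sources, each with its own vocabulary.
    CREATE TABLE facilities (Facility_ID TEXT, bldg_type TEXT);
    CREATE TABLE depots (Depot_Number INTEGER, kind TEXT);
    INSERT INTO facilities VALUES ('F-1', 'warehouse'), ('F-2', 'hangar');
    INSERT INTO depots VALUES (9, 'fuel');

    -- Virtual view in the logical model's vocabulary: SELECT renames
    -- columns, a function converts the key's type, UNION ALL combines.
    CREATE VIEW location AS
        SELECT Facility_ID AS Location_ID, bldg_type AS Location_Type
        FROM facilities
        UNION ALL
        SELECT 'D-' || CAST(Depot_Number AS TEXT), kind
        FROM depots;
""")

# Consumers apply ordinary criteria; the view hides the source differences.
rows = conn.execute(
    "SELECT Location_ID, Location_Type FROM location "
    "WHERE Location_Type <> 'hangar' ORDER BY Location_ID"
).fetchall()
print(rows)
```

The view stores no data; each query is rewritten against the sources, which is the "data abstraction without coding" idea in miniature.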
Semantic Mediation: The Problem
(Diagram, top to bottom:)
• Consumers: Business Intelligence applications (ODBC/JDBC), Portal applications (JDBC), Web services (SOAP)
• Aggregate data services:
  • Relational or XML (e.g., a virtual XML document: <a> … <b> </b> </a>)
  • Application-specific
  • Access via ODBC, JDBC, or SOAP APIs
• Transformations (T) into a logical data model:
  • Enterprise-wide or COI-driven data model
  • Rationalization and semantic mediation layer
  • Harmonization
  • Data catalog/dictionary
  • Example logical attributes: Location_ID, Location_Type
• Transformations (T) from the data sources:
  • Authoritative, redundant, overlapping
  • Multiple internal/external information sources
  • Example source attributes: bldg_type, bldg_id, Depot_Number, SITENUM, Facility_ID
Building Enterprise Semantic Model(s)
(Diagram:)
• Domains: J-1 Manpower/Personnel, J-2 Intelligence, J-3 Operations, J-4 Logistics (GCSS), J-5 Plans & Policy, J-6 C4CS, J-7 Operational Plans, J-8 Force Structure
• Consumers: Business Intelligence applications (ODBC/JDBC), Portal applications (JDBC), Web services (SOAP)
• Enterprise-wide or COI-driven data models: rationalization, harmonization, data catalogs
• Transformations (T) from data sources: authoritative, redundant, overlapping; multiple internal/external information sources
Biggest Challenge in Creating Data Services?
• Semantics!!!
• Structural differences are straightforward
• Differing definitions among data sources
• Differing vocabularies among COIs
• Established, emerging, and evolving data standards
  • C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, and many more
• Not addressed by ETL, EAI, or SOA
A Previously Intractable Problem
• TWPDES has 1,000+ core entities
• NIEM has 100,000+!
• Even a limited program with a dozen data sources could yield tens of thousands of potential mappings
• Humans cannot address this without help
• Indeed, this problem has stopped many data integration/reconciliation programs in their tracks.
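As a rough illustration of how the count grows, assume (hypothetically) $n$ sources that each expose $m$ elements, and that every element of one source is a candidate match for every element of each other source. The number of candidate element pairs is then:

```latex
\binom{n}{2}\, m^{2} = \frac{n(n-1)}{2}\, m^{2},
\qquad
n = 12,\; m = 20:\quad \frac{12 \cdot 11}{2} \cdot 20^{2} = 66 \cdot 400 = 26{,}400
```

With only 20 elements per source (an assumed figure, far below the entity counts of TWPDES or NIEM), a dozen sources already produce candidate pairs in the tens of thousands, and the count grows quadratically in both $n$ and $m$.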
DISCLAIMER
• Semantic matching can't really be done automatically yet!
• It requires intelligence to understand context and semantics.
• So let computers do most of the work, then have a user confirm or check the results.
The Matching Problem
• Given two symbols, calculate a measure of the relationship between them: “amount” and “quantity”
• Doesn’t seem so hard…
The Matching Problem
• Given two symbols, calculate a measure of the relationship between them: “ftuqky” and “aqfkyeyr”
• This is what a computer “sees” (the same two words, “amount” and “quantity”, under a consistent letter substitution).
The Matching Problem
• Even after extracting likely symbols, matching is a difficult problem.
• Symbols alone are not enough to generate good matches:
  • “ID” could match “SocialSecurityNumber” (an identifier) or “NY” (a state code, as for Idaho)
• The solution relies on context:
  • “NJ”, “MA”, “CA”, “ID” (US states)
  • “Ego”, “SuperEgo”, “ID” (psychology)
• MatchIt provides that context
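The idea that context resolves an ambiguous symbol can be sketched as a simple overlap test: each candidate sense of "ID" lists terms that tend to co-occur with it, and the sense sharing the most terms with the surrounding symbols wins. The sense inventory and function name below are hypothetical; this is a toy illustration, not how MatchIt's knowledge base is organized.

```python
# Hypothetical sense inventory: each sense lists terms that tend to
# co-occur with it. A real semantic knowledge base would be far richer.
SENSES = {
    "ID": {
        "US state (Idaho)": {"NJ", "MA", "CA", "NY", "TX"},
        "psychology (the id)": {"EGO", "SUPEREGO", "LIBIDO"},
        "identifier": {"SSN", "KEY", "NUMBER", "CODE"},
    }
}

def disambiguate(symbol, context):
    """Return the sense whose related terms overlap the context the most."""
    context_terms = {c.upper() for c in context if c.upper() != symbol.upper()}
    best, best_overlap = None, 0
    for sense, related in SENSES.get(symbol.upper(), {}).items():
        overlap = len(related & context_terms)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best  # None when the context offers no evidence

print(disambiguate("ID", ["NJ", "MA", "CA", "ID"]))
print(disambiguate("ID", ["Ego", "SuperEgo", "ID"]))
```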
MatchIT 1.0
• Integrated component of the MetaMatrix Semantic Data Services product
• Based on an ontology-driven semantic knowledge base
  • Word relationships, dictionaries, lexicons, thesauri
• Plug-in architecture
• Standards-compliant:
  • OWL
  • RDF
  • Inference engines
  • OSGi
  • Eclipse
  • JDBC
(Semi-)Automated Semantic Mediation
• An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words. Example from the ontology: “Sex” is semantically related to “Gender”.
• A sophisticated set of matching algorithms provides string similarity matches and semantic matches, with confidence ratings and explanations. Example: “Gender ID” matched to “Person Sex Code” (confidence of 90%).
(Diagram: data source services from FBI, CBP, NYC, NY, NJ)
Matching Techniques
• MatchIT uses two types of matching techniques:
  • String matching: estimates the similarity of two strings from the lexical distance between them.
  • Semantic matching: estimates the similarity of two strings from the ontological distance between their concepts within a semantic ontology.
• Generates match sets
  • Techniques can be run individually or in combination
• Pluggable architecture allows new matching algorithms to be added
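A minimal sketch of "run individually or in combination" with a pluggable design: each matcher is a function returning a score and an explanation, and a match set is the weighted combination over all source/target pairs. The weights, the tiny synonym table standing in for a knowledge base, and all function names are assumptions for illustration; the string matcher uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Each plug-in matcher returns (score in [0, 1], human-readable explanation).
Matcher = Callable[[str, str], Tuple[float, str]]

def string_matcher(a: str, b: str) -> Tuple[float, str]:
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio, f"lexical similarity {ratio:.2f}"

# Hypothetical stand-in for the semantic knowledge base.
SYNONYMS = {frozenset({"sex", "gender"}), frozenset({"amount", "quantity"})}

def semantic_matcher(a: str, b: str) -> Tuple[float, str]:
    if frozenset({a.lower(), b.lower()}) in SYNONYMS:
        return 0.9, f"'{a}' is a synonym of '{b}'"
    return 0.0, "no known semantic relation"

def match_set(sources: List[str], targets: List[str],
              matchers: Dict[str, Tuple[Matcher, float]]):
    """Score every source/target pair with a weighted mix of matchers."""
    total = sum(weight for _, weight in matchers.values())
    results = []
    for s in sources:
        for t in targets:
            score, reasons = 0.0, []
            for name, (matcher, weight) in matchers.items():
                sub, why = matcher(s, t)
                score += weight * sub
                reasons.append(f"{name}: {why}")
            results.append((s, t, score / total, reasons))
    # Highest combined confidence first.
    return sorted(results, key=lambda r: r[2], reverse=True)

matches = match_set(
    ["sex", "amount"], ["gender", "quantity"],
    {"string": (string_matcher, 0.3), "semantic": (semantic_matcher, 0.7)},
)
for s, t, score, reasons in matches:
    print(f"{s} -> {t}: {score:.2f} ({'; '.join(reasons)})")
```

Adding a new algorithm means adding one entry to the `matchers` dictionary; passing a single entry runs that technique on its own.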
String Matching
• What is the lexical distance between two symbols?
  • “PUZZLE”, “PUZZ”
  • “ID”, “IDENTIFIER”
  • “STRONG”, “SONG”
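One common measure of lexical distance is Levenshtein edit distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other. The slide does not say which measure MatchIT uses, so treat this as one plausible choice; the normalization to a 0 to 1 similarity is also an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest

for a, b in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(a, b, levenshtein(a, b), round(similarity(a, b), 2))
```

The three pairs have distances 2, 8, and 2. Lexical distance rates the unrelated "STRONG"/"SONG" as close as "PUZZLE"/"PUZZ", while the genuinely related "ID"/"IDENTIFIER" scores lowest, which is why string matching is paired with semantic matching.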
Semantic Matching • How semantically similar are two concepts?
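One simple way to turn "ontological distance" into a similarity score is to count the edges on the shortest path between two concepts in a term graph, then map that count to a score that shrinks as the path grows. The toy graph and the 1/(1 + d) scoring rule below are assumptions for illustration, not MatchIT's knowledge base or formula.

```python
from collections import deque

# Hypothetical toy ontology: undirected edges to synonyms or broader terms.
EDGES = {
    "sex": {"gender"},
    "gender": {"sex", "attribute"},
    "amount": {"quantity"},
    "quantity": {"amount", "measure"},
    "measure": {"quantity", "attribute"},
    "attribute": {"gender", "measure"},
}

def ontological_distance(a, b):
    """Breadth-first search for the fewest edges between two concepts."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in EDGES.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # concepts are not connected

def semantic_similarity(a, b):
    """Closer concepts score higher; unconnected concepts score 0."""
    d = ontological_distance(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)

print(semantic_similarity("sex", "gender"), semantic_similarity("sex", "quantity"))
```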
Semantic Matching Objectives
• Find and rank the potential matches, but let the user review them and make the final decision.
  • That is, eliminate the 99+% of candidates that don't match, and let the user review the remaining <1%.
  • Often, a user can visually scan a short list of the top 1% and quickly agree or disagree with the results.
• Favor false positives over false negatives.
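The objectives translate into a ranking step that keeps only a small top slice for human review, while erring toward keeping borderline candidates. In this sketch the 1% fraction comes from the slide; rounding the slice size up and the deliberately low score floor are assumed choices that bias toward false positives. The function name is hypothetical.

```python
import math

def shortlist(candidates, keep_fraction=0.01, floor=0.2):
    """Rank candidate matches and keep a small top slice for user review.

    candidates: list of (source, target, score) tuples, score in [0, 1].
    keep_fraction: share of ranked candidates to present (1% per the slide).
    floor: a deliberately low minimum score, so borderline matches are
           shown to the user rather than silently dropped.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    k = max(1, math.ceil(len(ranked) * keep_fraction))  # round up: keep more
    return [c for c in ranked[:k] if c[2] >= floor]

candidates = (
    [("Gender ID", "Person Sex Code", 0.9), ("Location_ID", "SITENUM", 0.3)]
    + [("Location_ID", f"column_{i}", 0.1) for i in range(199)]
)
review = shortlist(candidates)
print(review)
```

Of 201 candidates, the user sees two: the strong match and one weaker match that a stricter threshold might have hidden.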
Semantic Matching in MetaMatrix
(Architecture diagram:)
• Enterprise information sources: RDBMS, XML, file system, JDBC, custom XML, any source
• MetaMatrix Connector Framework (data/content access) and MetaMatrix Importer Framework (metadata access, import/export)
• Representations: conceptual/logical/physical data models, including relational, domain [UML/ER], XML, and ontologies [OWL/RDF]
• MetaBase Modeler: analyze, visualize, collaborate, transform
• MatchIt: find matches (schema-level and instance-level), data harmonization
• Semantic knowledge base: ontology, complete ontological semantics, lexicons, onomasticons, fact repository
• MetaBase Repository: models & files [versioned], search index, Web reporting
Overall Process
• Import two nontrivial vocabularies
  • ERwin model of a large data warehouse
  • TWPDES XML schema
• Extract symbols
  • Schema-specific tokenization algorithms
• Assign semantics to each symbol
  • Symbols are keys into dictionaries
• Perform semantic matching between them
• Analyze results
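The "extract symbols" step can be sketched as a tokenizer that splits schema names on underscores and camelCase boundaries, then normalizes each token through a dictionary so it can serve as a lookup key. The abbreviation table and function name are hypothetical, and real schema-specific tokenizers would need more rules: a run-together all-caps name such as SITENUM is left whole by this sketch.

```python
import re

# Hypothetical abbreviation dictionary; tokens become keys into it.
ABBREVIATIONS = {"bldg": "building", "num": "number", "id": "identifier"}

def tokenize(symbol):
    """Split a schema name on underscores, digits, and camelCase boundaries."""
    tokens = []
    for part in re.split(r"[_\W\d]+", symbol):
        # "PersonSexCode" -> Person, Sex, Code; an all-caps run like "ID" stays whole.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens]

for name in ["bldg_type", "Location_ID", "PersonSexCode", "SITENUM"]:
    print(name, tokenize(name))
```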
TWPDES XML Schema Mapping
• Classes for each XML fragment in the hierarchy
Editing the Dictionary
(Screenshot: Modify Definition)
Editing the Semantics
(Screenshot: Control Senses)
Target Model Match Results
From Pat Cassidy & COSMO: The Integrating Function of the Common Semantic Model via Domain-Level Mapping
(Diagram: GenericObligation SameAs Obligation SameAs Duty)
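If SameAs links are treated as an equivalence relation (symmetric and transitive, as with OWL's `owl:sameAs`), terms from different domain models become interchangeable once each is linked to the common model. A union-find structure is one standard way to compute those equivalence classes; the class name and the two assertions below are an illustrative reading of the diagram, not COSMO's implementation.

```python
class SameAsIndex:
    """Union-find over SameAs assertions, treated as an equivalence relation."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)

idx = SameAsIndex()
idx.same_as("GenericObligation", "Obligation")
idx.same_as("GenericObligation", "Duty")
print(idx.equivalent("Obligation", "Duty"))  # True: both link to GenericObligation
```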
MatchIt Semantic Matching Tool
• A way to use ontologies in a world where nearly everything that already exists is not in an ontology
• Map connections between ontologies that are being built and artifacts currently in use:
  • RDBMS schemas
  • XML and XSD files
  • Spreadsheet data
  • More coming, including ontologies!
• Map an imported model to a vocabulary, and a vocabulary to an ontological structure
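The two-stage mapping (imported model to vocabulary, vocabulary to ontology) amounts to composing two lookups and reporting elements that fall through the second stage. All element names, terms, and classes below are hypothetical.

```python
# Hypothetical mappings: model element -> vocabulary term -> ontology class.
model_to_vocab = {"PERSON.SEX_CD": "sex", "SALE.AMT": "amount", "SITE.SITENUM": "sitenum"}
vocab_to_ontology = {"sex": "Gender", "amount": "Quantity"}

def compose(first, second):
    """Chain two mappings; report elements with no second-stage target."""
    mapped, unmapped = {}, []
    for element, term in first.items():
        if term in second:
            mapped[element] = second[term]
        else:
            unmapped.append(element)
    return mapped, unmapped

mapped, unmapped = compose(model_to_vocab, vocab_to_ontology)
print(mapped)
print(unmapped)
```

One plausible benefit of the intermediate vocabulary is that each new source only needs to be mapped once, to the vocabulary, and then reaches every ontology already mapped from it. The unmapped list shows where a user still needs to review or extend the vocabulary.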