MatchIT 1.1: Data Integration with Semantic Mapping Technologies Michael Schidlowsky Sr. Software Architect
Data Integration • Motivated by: • Organizational Changes • Mergers and Acquisitions • Internal reorganizations (e.g., DHS) • Data Mining • Standards Conformance • Migration Efforts • Legacy Systems • Decouple data sources from application code
Data Integration • Challenges for the integration specialist include: • Domain-specific terms • Unfamiliarity with source schemas • Large size of schema set • Semantics often not captured • Captured semantics • Stored in ad-hoc formats • Cannot be reused to facilitate future data integration efforts
Data Integration: Example • Background: Acme Inc. merges with CompuGlobalHyperMeganet. • Technical Challenge: Need a “Virtual Database” of all sales for all stores in real time. • Which fields represent customers? CUSTOMERID, CUST_ID, SSN • Which fields represent ‘Price’? Sale_Amt, Total_Sale • What if your database has 10,000 columns?
Data Integration: Example • Background: HR needs to use employee information for a new company portal. • Technical Challenge: Data must be in XML and conform to a standard HR schema. • Which fields relate to Address? RESIDENCE, PREV_RESIDENCE • What if your database has 10,000 columns?
Ideal Matching Solution • Finds lexical relationships • Captures semantic information • Finds semantic relationships • Provides programmatic access to results (API) • Fast • Scalable • Human Involvement
MatchIT Philosophy • Best Matching tool already exists! What is meant by “ID”? • “PLEASE PRESENT ID” • NY, NJ, ID • SUPEREGO, EGO, ID
MatchIT 1.1 • MatchIT is a semantic and lexical matching tool. • Session Outline: • Import and process schemas • Perform lexical matching • Create and manage a semantic vocabulary • Perform semantic matching • Demonstrate 3rd Party integration with a Data Integration tool (MetaMatrix)
Import & Process Schemas • Revelytix Models are RDF/OWL • Flexible model architecture • Extensible • Interoperable • Current Importers: • JDBC • XML Schema • MetaMatrix XMI Models • Importer Demo
Lexical Matching • Uses lexical distance measures to determine lexical similarity. • Fastest matching technique • Requires no work other than importing schemas • Often yields interesting results • Lexical Matching Demo
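The slides do not say which lexical distance measures MatchIT uses. As a minimal sketch of the general idea, the following assumes Levenshtein edit distance, normalized by the longer name's length; the function names are illustrative and are not part of the MatchIT API.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 score (1.0 = identical); case is ignored."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_matches(target: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Sort candidate field names from most to least lexically similar."""
    scored = [(name, lexical_similarity(target, name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the field names from the earlier example, `rank_matches("CUSTOMERID", ["SSN", "CUST_ID"])` puts `CUST_ID` first, while `SSN` scores low even though it may also identify a customer. That gap is what semantic matching is meant to close.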
Create Vocabulary from Schemas • A Vocabulary is • A set of symbols • Occurrences of those symbols in your schemas • Binding of each symbol to one or more semantic concepts • Created by MatchIT from schemas using tokenization algorithms. • Reusable
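The three parts of a Vocabulary listed above can be pictured as a small data structure. This is a hypothetical sketch only: the class, its method names, and the schema element names are assumptions for illustration and do not reflect how MatchIT stores vocabularies (the slides state only that Revelytix models are RDF/OWL).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """Hypothetical container: symbols, where they occur, and what they mean."""
    # symbol -> schema elements in which the symbol occurs
    occurrences: dict = field(default_factory=lambda: defaultdict(set))
    # symbol -> one or more semantic concepts bound to it
    bindings: dict = field(default_factory=lambda: defaultdict(set))

    def add_occurrence(self, symbol: str, schema_element: str) -> None:
        self.occurrences[symbol.lower()].add(schema_element)

    def bind(self, symbol: str, concept: str) -> None:
        self.bindings[symbol.lower()].add(concept)

    def unbound_symbols(self) -> list:
        """Symbols seen in schemas that still need a human-assigned concept."""
        return sorted(s for s in self.occurrences if not self.bindings.get(s))


# Illustrative data: "ID" is ambiguous, so it is bound to two concepts.
vocab = Vocabulary()
vocab.add_occurrence("ID", "SALES.CUST_ID")      # made-up table.column names
vocab.add_occurrence("ID", "STORES.STATE_ID")
vocab.add_occurrence("CUST", "SALES.CUST_ID")
vocab.bind("ID", "identifier")
vocab.bind("ID", "Idaho")
```

Because the bindings live apart from any one schema, the same vocabulary can be carried into a later integration effort, which is the sense of "Reusable" above.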
Tokenization Algorithms • Different schemas require different tokenization techniques. • Tokenization algorithms determine how symbols are extracted from schemas: • Capitalization • Delimiters • English Language • Vocabulary Demo
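The three tokenization strategies can be sketched as follows. This is an assumption-laden illustration rather than MatchIT's algorithm: the regular expressions, the function names, and the tiny `ENGLISH_WORDS` dictionary are all hypothetical.

```python
import re

# Hypothetical mini-dictionary; a real one would be far larger.
ENGLISH_WORDS = {"customer", "id", "total", "sale", "gender", "code"}


def split_delimiters(name):
    """Delimiters: break on any run of non-alphanumeric characters."""
    return [part for part in re.split(r"[^A-Za-z0-9]+", name) if part]


def split_capitalization(part):
    """Capitalization: break camelCase, keeping acronyms such as 'XML' whole."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)


def split_english(token, words=ENGLISH_WORDS):
    """English language: segment run-together words, else return the token whole."""
    token = token.lower()
    # best[i] holds one segmentation of token[:i], or None if none was found
    best = [[]] + [None] * len(token)
    for end in range(1, len(token) + 1):
        for start in range(end):
            if best[start] is not None and token[start:end] in words:
                best[end] = best[start] + [token[start:end]]
                break
    return best[-1] or [token]


def tokenize(name):
    """Apply all three strategies in sequence and return lowercase symbols."""
    symbols = []
    for part in split_delimiters(name):
        for piece in split_capitalization(part):
            symbols.extend(split_english(piece))
    return symbols
```

Under these assumptions, `tokenize("CUST_ID")` gives `["cust", "id"]`, `tokenize("GenderCode")` gives `["gender", "code"]`, and `tokenize("CUSTOMERID")` gives `["customer", "id"]`. The last case works only because both words are in the dictionary, which is why different schemas call for different techniques.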
Matching Techniques • MatchIT currently uses two types of matching techniques: • Lexical Matching • Attempts to determine the similarity of two symbols based on the lexical distance between them. • Semantic Matching • Attempts to determine the similarity of two symbols based on the ontological distance between their concepts within a semantic knowledge base.
Semantic Matching • How semantically similar are two concepts?
Semantic Matching • Uses knowledge base distance measures to determine semantic similarity. • Presents ranked candidate matches • Based on semantics captured in Vocabularies • The only way to effectively find relationships between lexically dissimilar symbols: • GenderCode / SexCode • Provider / Supplier • Amount / Quantity • Semantic Matching Demo
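To make "knowledge base distance" concrete, here is a minimal sketch that assumes the knowledge base is an undirected graph of concepts and that distance is the number of links on the shortest path. The bindings, the links, and the 1 / (1 + distance) scoring are invented for illustration; the slides do not describe MatchIT's knowledge base or its measures.

```python
from collections import deque

# Hypothetical bindings from vocabulary symbols to concepts.
BINDINGS = {
    "gender": "gender", "sex": "gender",
    "provider": "supplier", "supplier": "supplier",
    "amount": "quantity", "quantity": "quantity",
    "price": "price",
}

# Hypothetical knowledge base: undirected links between related concepts.
LINKS = {
    "quantity": {"measure"},
    "price": {"measure"},
    "measure": {"quantity", "price"},
    "gender": {"person"},
    "person": {"gender"},
    "supplier": {"organization"},
    "organization": {"supplier"},
}


def concept_distance(start, goal):
    """Breadth-first search: links on the shortest path, or None if unconnected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        concept, hops = queue.popleft()
        if concept == goal:
            return hops
        for neighbor in LINKS.get(concept, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def semantic_similarity(symbol_a, symbol_b):
    """Score 1.0 for the same concept, decaying with distance; 0.0 if unrelated."""
    concept_a = BINDINGS.get(symbol_a.lower())
    concept_b = BINDINGS.get(symbol_b.lower())
    if concept_a is None or concept_b is None:
        return 0.0  # unbound symbols cannot be matched semantically
    distance = concept_distance(concept_a, concept_b)
    return 0.0 if distance is None else 1.0 / (1 + distance)


def rank_candidates(symbol, candidates):
    """Present candidate matches ranked from most to least semantically similar."""
    scored = [(c, semantic_similarity(symbol, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this toy setup "Gender" and "Sex" score 1.0 because they are bound to the same concept, although they share no letters. "Amount" and "Price" score lower because their concepts are two links apart. A symbol with no binding scores 0.0, which shows why the vocabulary has to be built first.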
3rd Party Integration • MatchIT Integration • MatchIT Java API • Stand-alone application • Embeddable application (as Eclipse plug-ins) • Hides unapproved matches • Useful for various 3rd Party applications: • Data Integration • Data Discovery • Ontology Mediation • Search • Metadata Management • Data Cleansing • MetaMatrix Demo
Questions? MatchIT 30-day trial available at http://www.revelytix.com Michael Schidlowsky michaels@revelytix.com