A Survey of Approaches to Automatic Schema Matching. Erhard Rahm, Philip A. Bernstein. The VLDB Journal 10:334-350 (2001).
The Problem • Schema matching • Input schemas • Output mappings • Motivations • Manual schema matching • Generic and customizable schema matching
Application Domains • Schema Integration: Structures and Terminological relationships • Data warehouses: Source-to-warehouse Transformation • E-commerce: Message Translation • Semantic query processing: A Run-time Scenario
The Match Operator • Representations of Input Schemas and Output Mapping • Schema representation • Schema elements • Structure • Mapping representation • Mapping elements • Mapping expressions • Matching Function • Mathematically unsatisfying • Heuristics
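As a rough illustration of the Match operator described above, here is a minimal sketch that takes two schemas and returns a mapping whose elements carry a heuristic similarity score. The names `SchemaElement`, `MappingElement`, and `match`, the weights 0.7 and 0.3, and the threshold are all assumptions made for this example; the survey does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    name: str       # e.g. a column or XML element name
    data_type: str  # e.g. "string", "int"


@dataclass(frozen=True)
class MappingElement:
    source: str        # element name in schema S1
    target: str        # element name in schema S2
    similarity: float  # heuristic score in [0, 1]


def match(s1, s2, threshold=0.5):
    """Return mapping elements for all pairs whose heuristic score reaches the threshold."""
    mapping = []
    for a in s1:
        for b in s2:
            score = 0.0
            # Hypothetical weights: names count more than data types.
            if a.name.lower() == b.name.lower():
                score += 0.7
            if a.data_type == b.data_type:
                score += 0.3
            if score >= threshold:
                mapping.append(MappingElement(a.name, b.name, score))
    return mapping


s1 = [SchemaElement("Name", "string"), SchemaElement("Salary", "int")]
s2 = [SchemaElement("name", "string"), SchemaElement("Dept", "string")]
result = match(s1, s2)
```

Because the score is a heuristic rather than a formal criterion, the result is best treated as a set of match candidates for a user to confirm or reject.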
Architecture for Generic Match • Tool 1 (Portal schemas) • Tool 2 (E-business schemas) • Tool 3 (Data warehousing schemas) • Schema import/export • Generic Match Implementation • Internal schema representation • Global libraries (dictionaries, schemas, …)
Classification of Approaches • Individual matchers • Instance vs Schema • Element vs Structure Matching • Language vs Constraint • Matching Cardinality (1:1, 1:n, n:1, and n:m) • Auxiliary Information • Combinations of multiple matchers
Schema-level Approaches • Granularity of match (element-level vs. structure-level) • Match cardinality • Linguistic approaches • Constraint-based approaches • Reusing schema and mapping information
Linguistic Approaches • Name Matching • Equality of names • Equality of canonical name representations • Equality of synonyms • Equality of hypernyms • Similarity of names based on common substrings, edit distance, pronunciation, and soundex • User provided name matches • Description Matching • Ex. S1: empn //employee name • Ex. S2: name //name of employee
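The name-matching criteria above can be sketched as a small cascade: canonical-name equality first, then a synonym lookup, then an edit-distance similarity. The synonym table and the function names are hypothetical, and normalizing the Levenshtein distance by the longer name is only one of several possible choices.

```python
# Hypothetical synonym table; a real matcher would consult a dictionary or thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}


def canonical(name):
    """Lower-case the name and drop separators such as '_' or '-'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a, b):
    """Return a score in [0, 1] for two element names."""
    ca, cb = canonical(a), canonical(b)
    if ca == cb:                          # equality of canonical names
        return 1.0
    if frozenset({ca, cb}) in SYNONYMS:   # equality of synonyms
        return 1.0
    # Fall back to edit distance, normalized by the longer name.
    return 1.0 - edit_distance(ca, cb) / max(len(ca), len(cb))
```

For example, `name_similarity("Emp_Name", "empname")` returns 1.0 because both names share the canonical form `empname`.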
Instance-level Approaches • Linguistic characterization • Information retrieval techniques • Ex. Extracting keywords and themes • Constraint-based characterization • Numeric value ranges • Numeric value averages • Character patterns (PhoneNr, ISBNs, SSNs, …)
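A constraint-based characterization of instance data might look like the sketch below: numeric columns are summarized by range and average, and other columns by their character patterns. The `profile` and `character_pattern` helpers and the sample values are assumptions for illustration only.

```python
import re
from statistics import mean


def character_pattern(value):
    """Abstract a value: every digit becomes '9', every ASCII letter becomes 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(values):
    """Summarize a column of string values by numeric range/average or character patterns."""
    numeric = [float(v) for v in values if re.fullmatch(r"-?\d+(\.\d+)?", v)]
    if values and len(numeric) == len(values):
        return {"kind": "numeric", "min": min(numeric),
                "max": max(numeric), "mean": mean(numeric)}
    return {"kind": "text", "patterns": {character_pattern(v) for v in values}}


# Made-up sample data for two phone-number columns and one salary column.
phones_a = ["555-123-4567", "555-987-6543"]
phones_b = ["212-555-0100"]
salaries = ["42000", "58000"]
```

Two columns whose profiles share the pattern `999-999-9999` are plausible match candidates even when their names have nothing in common.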
Combining Different Matchers • Hybrid matchers • Hard-wired combination of multiple matching criteria • Better performance • Composite matchers • Independent basic matchers • Flexible execution order
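To make the hybrid/composite distinction concrete, the sketch below shows a composite matcher: independent basic matchers each return a score, and a separate step combines them. The weighted average and the two basic matchers are assumptions chosen for brevity; other combination strategies are equally possible.

```python
def exact_name_matcher(a, b):
    """Basic matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a, b):
    """Basic matcher: 1.0 if the declared data types are identical, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite(matchers, a, b):
    """Combine independent matchers, given as (function, weight) pairs, by weighted average."""
    total_weight = sum(weight for _, weight in matchers)
    return sum(weight * fn(a, b) for fn, weight in matchers) / total_weight


# Hypothetical configuration: name evidence weighted three times as much as type evidence.
score = composite(
    [(exact_name_matcher, 3), (type_matcher, 1)],
    {"name": "Name", "type": "string"},
    {"name": "name", "type": "int"},
)
```

Because the matcher list is plain data, matchers can be added, removed, or reweighted without touching the combination step, which is the flexibility a hard-wired hybrid matcher gives up.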
Sample Approaches • SEMINT • LSD • SKAT • TranScm • DIKE • ARTEMIS • CUPID
Columns: SEMINT | LSD | TranScm | Cupid | BYU Approach
Schema Type: Relational, files | XML | SGML, OO | XML, relational | OSM
Metadata representation: Attribute-based | XML | Labeled graph | Extended ER | OSM
Match granularity: 1:1 | 1:1 | 1:1 | 1:1 and 1:n | 1:1 and n:m
Schema-level match
Name-based: * * * *
Constraint-based: * * * *
Structure matching: * * * *
Instance-level match
Text-oriented: * *
Constraint-oriented: * * *
Reuse/auxiliary information used: * * * *
Combination of matches: Hybrid | Composite | Hybrid | Hybrid | Composite
Manual work/user input: * | * | * | * | *
Application area: Data integration | Data integration | Data translation | Generic | Generic
Remarks: Neural network
Conclusion • Propose a taxonomy that covers many of the existing approaches • Suggest quantitative work on the relative performance and accuracy of different approaches