Generic Schema Matching with Cupid. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. Proceedings of the 27th VLDB Conference.
Schema Matching (Cont.) • Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other • Applications • Schema integration • Data translation • XML message mapping • Data warehouse loading • Goal
Taxonomy • Schema vs. Instance based • Element vs. Structure granularity • Linguistic based • Constraint based • Matching cardinality • Auxiliary information • Individual vs. Combinational
Cupid • Schema-based • Automated linguistic-based matching • Both element-based and structure-based • Biased toward similarity of atomic elements • Exploits internal structure • Exploits keys, referential constraints and views • Makes context-dependent matches of a shared type • 1:n mapping
Similarity Coefficient Computation • First Phase: Linguistic matching • Names • Data types • Domains • Linguistic similarity coefficient: lsim • Second Phase: Structural matching • Contexts • Linguistic similarity coefficients • Structural similarity coefficient: ssim • Hybrid: wsim = w_struct * ssim + (1 - w_struct) * lsim
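The hybrid coefficient above is a weighted average of the two phases. A minimal sketch, assuming all coefficients lie in [0, 1]; the function name `weighted_similarity` and the sample values are illustrative and not taken from the slides:

```python
def weighted_similarity(ssim: float, lsim: float, w_struct: float = 0.5) -> float:
    """Combine structural (ssim) and linguistic (lsim) similarity into wsim.

    w_struct is the weight given to structure; the default of 0.5 is an
    assumption made for illustration only.
    """
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


# A pair that is structurally close but linguistically weaker.
wsim = weighted_similarity(ssim=0.8, lsim=0.4, w_struct=0.6)
print(round(wsim, 2))
```

Raising `w_struct` makes the match depend more on context; lowering it favours names.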
Linguistic Matching • Normalization • Tokenization • Expansion • Elimination • Categorization • Data types • Schema hierarchy • Linguistic contents • Comparison: Linguistic Similarity Coefficient (lsim) • Thesaurus • Sub-string matching
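The normalization and comparison steps listed above could look roughly like the following. This is a sketch under stated assumptions, not Cupid's implementation: the lookup tables `ABBREVIATIONS`, `STOP_WORDS` and `SYNONYMS` are hypothetical stand-ins for a real thesaurus, the averaging rule in `name_similarity` is one plausible way to combine token scores, and the categorization step is omitted.

```python
import re

# Hypothetical lookup tables; a real system would load these from a thesaurus.
ABBREVIATIONS = {"po": "purchase order", "qty": "quantity", "num": "number"}
STOP_WORDS = {"of", "the", "a", "an"}
SYNONYMS = {frozenset({"bill", "invoice"}): 0.9}


def normalize(name: str) -> list:
    """Tokenize a schema element name, expand abbreviations, drop stop words."""
    # Tokenization: split on camelCase boundaries, digits and punctuation.
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    tokens = [t.lower() for t in tokens]
    # Expansion: replace known abbreviations by their full words.
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    # Elimination: discard articles and prepositions.
    return [t for t in expanded if t not in STOP_WORDS]


def token_similarity(a: str, b: str) -> float:
    """Compare two tokens: exact match, then thesaurus, then sub-string."""
    if a == b:
        return 1.0
    synonym_score = SYNONYMS.get(frozenset({a, b}))
    if synonym_score is not None:
        return synonym_score
    short, long_ = sorted((a, b), key=len)
    if short in long_:
        return len(short) / len(long_)
    return 0.0


def name_similarity(name1: str, name2: str) -> float:
    """Average, over both token lists, of each token's best match in the other."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_similarity(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


print(normalize("POBillTo"))
print(name_similarity("POBillTo", "PurchaseOrderInvoiceTo"))
```

Here `POBillTo` is tokenized into `PO`, `Bill`, `To`, the abbreviation is expanded, and `bill` matches `invoice` only through the thesaurus entry.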
Bottom-up Mutually Recursive Structural Matching
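A simplified sketch of bottom-up structural matching on two schema trees. It assumes the following scheme: nodes are visited in post-order, the structural similarity of a pair is the fraction of leaves in the two subtrees that have a strong link to some leaf on the other side, and a strong or weak match between two nodes raises or lowers the similarity of their leaves, which is what makes the computation mutually recursive. The `Node` class, the `tree_match` signature, the thresholds (`th_accept`, `th_high`, `th_low`), the factors (`c_inc`, `c_dec`) and the initial leaf value are all illustrative assumptions, not values from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass(eq=False)  # identity-based hashing so nodes can serve as dict keys
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def post_order(self) -> Iterator["Node"]:
        for child in self.children:
            yield from child.post_order()
        yield self


def tree_match(
    source: Node,
    target: Node,
    lsim: Callable[[str, str], float],
    w_struct: float = 0.5,
    th_accept: float = 0.5,
    th_high: float = 0.6,
    th_low: float = 0.35,
    c_inc: float = 1.2,
    c_dec: float = 0.9,
    leaf_init: float = 0.5,
) -> Dict[Tuple[Node, Node], float]:
    # Leaf pairs start from a fixed structural similarity (assumed value).
    ssim = {(s, t): leaf_init for s in source.leaves() for t in target.leaves()}

    def wsim(s: Node, t: Node) -> float:
        return w_struct * ssim[(s, t)] + (1 - w_struct) * lsim(s.name, t.name)

    for s in source.post_order():
        for t in target.post_order():
            s_leaves, t_leaves = s.leaves(), t.leaves()
            if s.children or t.children:
                # Count leaves on each side with a strong link to the other side.
                strong_s = sum(
                    any(wsim(x, y) > th_accept for y in t_leaves) for x in s_leaves
                )
                strong_t = sum(
                    any(wsim(x, y) > th_accept for x in s_leaves) for y in t_leaves
                )
                ssim[(s, t)] = (strong_s + strong_t) / (len(s_leaves) + len(t_leaves))
            score = wsim(s, t)
            # Feed the result back into the leaves of both subtrees.
            factor = c_inc if score > th_high else c_dec if score < th_low else 1.0
            for x in s_leaves:
                for y in t_leaves:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)

    return {pair: wsim(*pair) for pair in ssim}


def exact_name_lsim(a: str, b: str) -> float:
    # Stand-in for a real linguistic matcher.
    return 1.0 if a.lower() == b.lower() else 0.0


qty_s, price_s = Node("Qty"), Node("Price")
item = Node("Item", [qty_s, price_s])
source = Node("PurchaseOrder", [item])

qty_t, price_t = Node("Qty"), Node("Price")
line = Node("Line", [qty_t, price_t])
target = Node("PO", [line])

scores = tree_match(source, target, exact_name_lsim)
print(scores[(item, line)] > scores[(item, qty_t)])
```

In this toy example `Item` and `Line` share no name, yet they score higher against each other than `Item` does against a single leaf, because all of their leaves are strongly linked.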
General Schemas • Schema Graphs • Elements • Relationships (containment, aggregation, and IsDerivedFrom) • Matching Shared Types (context-dependent mappings) • Matching Referential Constraints
Other Features • Optionality • Views • Initial Mappings • Lazy Expansion • Pruning Leaves
Comparative Study • Algorithms • MOMIS • DIKE • Cupid • Canonical Examples • Real World Example
Canonical Examples • Identical schemas • Atomic elements with the same names, but different data types • Atomic elements with the same data types, but different names (a prefix or suffix is added) • Different class names, but atomic elements with the same names and data types • Different nesting of the data: similar schemas with nested and flat structures • Type substitution or context-dependent mapping
Experimental Conclusions • Linguistic matching • Thesaurus • Linguistic similarity with no structure similarity • Granularity of similarity computation • Leaves • Structure information beyond the immediate vicinity • Context-dependent mappings • Performance parameters
Future Work • A Truly Robust Solution • Machine learning applied to instances • Natural language technology • Pattern matching to reuse known matches • Immediate Challenges • Off-the-shelf thesaurus • Schema annotations • Automatic tuning of the control parameters • Scalability analysis and testing • More comparative analysis of algorithms