Agenda • Introduction • Schema Matching in General • Why • How • Who • MOMIS • Architecture • The Global Schema • Common Thesaurus • Clustering of Elements • Construction of the Global Schema
Introduction • Industrial Electronics and Automatic Control, TU-Wien (1983-1988) • With Siemens since 1988 • Expert System Prototypes for Payload Monitoring • SW for Power Utilities • Data Model for Hydro Scheduling • Load Forecast • Cargo-Billing System for the ÖBB • Customizing Product Lifecycle Management System • Thesis on „Schema Matching“ • rainer.dobiasch@siemens.com
Schema Matching in General - Why • Enterprise Integration • Billing System • Customer Care System • Intranet • Web data extraction • ...
Schema Matching in General - How • (Analyse Data to generate Schemas) • Try to find „affinity“ between the Elements of the involved Schemas • Lexical Affinity (Acronyms, Abbreviations, Derivations) • Structural Affinity • Refine „Affinity“ Values • (Generate a Global Schema) • Set up Rules for the Mapping • (Generate Wrappers for the Data Sources)
Schema Matching in General - Who • MOMIS: Sonia Bergamaschi – University of Modena • ARTEMIS: Silvana Castano – University of Milano • Similarity Flooding: Sergey Melnik – Stanford University • Cupid: Philip A. Bernstein – Microsoft • COMA: Erhard Rahm – University of Leipzig • http://www.ifi.unizh.ch/stff/pziegler/IntegrationProjects.html
MOMIS: Mediator envirOnment for Multiple Information Sources • ARTEMIS: Analysis and Reconciliation Tool Environment for Multiple Information Sources
MOMIS Architecture (diagram) • User • Mediator: Global Schema Builder, Query Manager • Artemis • ODB-Tools • ODLI3 • Wrappers (one per source) • Sources: FileSystem, DB
Global Schema Builder Architecture (diagram) • Input: source schemata • SLIM (Source Lexical Integrator Module), using WordNet: intensional inter-schema relationships • SIM (Source Integrator Module): intensional intra-schema relationships • Designer, via SI-Designer • ODB-Tools: inferred relationships • Common Thesaurus • Artemis: clusters • Output: Global Schema & Mapping Tables
SI-Designer Architecture (diagram) • Local Schemata Acquisition (Wrappers) • Intensional intra/inter schema relationships extraction (SIM, SLIM) • Acquisition of relations provided by the designer (Designer) • Inferred and validated relationships (ODB-Tools) • Common Thesaurus • Cluster Generation (Artemis) • Global Clusters & Mapping Tables Generation
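Schema Concept • Schemata: $S = \{S_1, S_2, \ldots, S_N\}$ • $S_i = \{e_{1i}, e_{2i}, \ldots, e_{mi}\}$ • $e_{ji} = \langle n(e_{ji}), SP(e_{ji}), DP(e_{ji}) \rangle$ • $P(e_{ji}) = SP(e_{ji}) \cup DP(e_{ji})$ • $p_k \in P(e_{ji})$ • $p_k = \langle n_{p_k}, d_{p_k}, (mc, MC)_{p_k} \rangle$ • $d_{p_k} \in PRE \lor d_{p_k} \in REF$ • $PRE = \{\text{integer}, \text{smallint}, \text{decimal}, \text{float}, \text{char}[n]\}$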
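The schema concept can be sketched as a small data model. This is only an illustration: the class and field names are hypothetical, `None` as an unbounded maximum cardinality is an assumption, and so is the reading that a property with a domain in PRE is a structural property (SP) while one with a reference domain (REF) is a dependency property (DP).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Predefined domains from the slide; char[n] is simplified to "char" here.
PRE = {"integer", "smallint", "decimal", "float", "char"}


@dataclass(frozen=True)
class Property:
    """p_k = <n_pk, d_pk, (mc, MC)_pk>."""
    name: str                                         # n_pk
    domain: str                                       # d_pk: predefined type or referenced element name
    cardinality: Tuple[int, Optional[int]] = (1, 1)   # (mc, MC); None = unbounded (assumption)

    def is_structural(self) -> bool:
        # Assumption: a predefined domain marks a structural property.
        return self.domain in PRE


@dataclass
class Element:
    """e = <n(e), SP(e), DP(e)>, stored as the union P(e) = SP(e) ∪ DP(e)."""
    name: str
    properties: List[Property] = field(default_factory=list)

    @property
    def structural(self) -> List[Property]:           # SP(e)
        return [p for p in self.properties if p.is_structural()]

    @property
    def dependency(self) -> List[Property]:           # DP(e)
        return [p for p in self.properties if not p.is_structural()]


customer = Element("Customer", [
    Property("name", "char"),
    Property("orders", "Order", (0, None)),  # reference to a hypothetical element "Order"
])
```

Storing P(e) once and deriving SP(e) and DP(e) from the domain keeps the two sets disjoint by construction.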
Building the Global Schema • Common Thesaurus • Clustering of Elements • Construction of the Global Schema
Building the Common Thesaurus • MOMIS makes use of WordNet • WordNet contains • Synsets (collections of words associated with a meaning) • Relations between Synsets (BT/NT/RT) • Relations between Words • MOMIS + User assign schema elements to meanings • Term = <e, meaning> • Common Thesaurus • Terms • Associations from WordNet • Derived Associations • User-Inserted Associations
Affinity Calculation - Weights • Association Type Weights: $0 < \sigma \le 1$ • $\sigma_{SYN} \ge \sigma_{BT/NT}$ • $\sigma_{SYN} = 1$ • $\sigma_{BT/NT} = 0.8$
Affinity Calculation – Name Affinity • Name Affinity • Name Affinity Coefficient
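A minimal sketch of one way to compute a name affinity coefficient from the Common Thesaurus. It assumes that the affinity of two names is the strength of the strongest thesaurus path between them (the product of the association weights along the path) and that values below a threshold are cut to 0. The path-product definition, the function names, the symmetric treatment of BT/NT and the default threshold of 0.5 are assumptions made for illustration; only the weights SYN = 1 and BT/NT = 0.8 come from the slides.

```python
import heapq

# Association weights from the "Weights" slide.
WEIGHTS = {"SYN": 1.0, "BT/NT": 0.8}


def build_thesaurus(relations):
    """Build an undirected weighted graph from (name, relation, name) triples."""
    graph = {}
    for a, rel, b in relations:
        w = WEIGHTS[rel]
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    return graph


def name_affinity(graph, n1, n2, threshold=0.5):
    """Strength of the strongest path between n1 and n2, or 0.0 below the threshold."""
    if n1 == n2:
        return 1.0
    best = {n1: 1.0}
    heap = [(-1.0, n1)]              # max-heap on path strength via negation
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if node == n2:
            # Weights are <= 1, so the first time n2 is popped its strength is maximal.
            return strength if strength >= threshold else 0.0
        if strength < best.get(node, 0.0):
            continue                 # stale heap entry
        for nxt, w in graph.get(node, []):
            s = strength * w
            if s > best.get(nxt, 0.0):
                best[nxt] = s
                heapq.heappush(heap, (-s, nxt))
    return 0.0                       # no path in the thesaurus


thesaurus = build_thesaurus([
    ("Client", "SYN", "Customer"),
    ("Customer", "BT/NT", "Person"),
    ("Person", "BT/NT", "Employee"),
])
```

With these hypothetical terms, `Client` and `Person` are connected by one SYN and one BT/NT association, so the path strength is 1 × 0.8.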
Affinity Calculation – Structural Affinity – Affinity Classes • Properties grouped in Affinity Classes • Well-formed Affinity Classes
Clustering of Elements • Affinity Matrix • Initial Cluster for each element • Iteration till Dimension of Matrix = 1 • Search for clusters with highest Affinity • Merge the clusters • Compute new Affinity values
Hierarchical Clustering Algorithm
Let SE be the set of schema elements to be clustered
Let k be the number of schema elements in SE
1. /* Compute k*(k-1)/2 Global Affinity coefficients */
   For i := 1 to k do
     M[i,i] := 1
     For j := i+1 to k do
       M[j,i] := M[i,j] := GA(ei, ej)
2. Place each schema element ei into a cluster Cli
3. Repeat
     Select the pair of clusters Cli, Clj (i ≠ j) of current clusters such that M[i,j] is the maximum of all off-diagonal entries of M
     Cli := Cli ∪ Clj
     For l := 1 to k do
       If l ≠ i and l ≠ j then M[l,i] := M[i,l] := max{M[l,i], M[l,j]}
     Remove row and column j from M
     k := k-1
   until k = 1
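The clustering procedure can be sketched in Python as follows. The function name, the merge-history return value and the example affinities are hypothetical; `ga` stands for any symmetric Global Affinity function supplied by the caller.

```python
def hierarchical_clustering(elements, ga):
    """Agglomerative clustering; returns the merge history.

    Each history entry is (cluster_i, cluster_j, affinity at which they merged).
    """
    k = len(elements)
    # Step 1: symmetric affinity matrix with 1 on the diagonal.
    m = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            m[i][j] = m[j][i] = ga(elements[i], elements[j])

    # Step 2: one cluster per schema element.
    clusters = [[e] for e in elements]
    merges = []

    # Step 3: merge the most similar pair until one cluster remains.
    while len(clusters) > 1:
        n = len(clusters)
        i, j = max(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda pair: m[pair[0]][pair[1]],
        )
        merges.append((tuple(clusters[i]), tuple(clusters[j]), m[i][j]))
        clusters[i] = clusters[i] + clusters[j]
        # New affinity of the merged cluster to every other cluster: the maximum.
        for l in range(n):
            if l != i and l != j:
                m[l][i] = m[i][l] = max(m[l][i], m[l][j])
        # Remove row and column j.
        del m[j]
        for row in m:
            del row[j]
        del clusters[j]
    return merges


# Hypothetical affinities between four schema elements.
_GA = {
    frozenset("AB"): 0.9, frozenset("AC"): 0.2, frozenset("AD"): 0.1,
    frozenset("BC"): 0.5, frozenset("BD"): 0.3, frozenset("CD"): 0.8,
}
history = hierarchical_clustering(list("ABCD"), lambda x, y: _GA[frozenset((x, y))])
```

The merge history forms the affinity tree; cutting it at a chosen affinity level yields the clusters that become candidate global classes.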
Unification Rules • Name
Unification Rules (cont) • Domain • Cardinality
Tuning Parameters Summary • Weights for Associations: $\sigma_{SYN} \ge \sigma_{BT/NT}$ • Threshold for Name Affinity • Weights for Affinity Computation (NA<->SA): $w_{NA} + w_{SA} = 1$
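The tuning parameters can be collected in one place, as in the sketch below. It assumes that the Global Affinity is a convex combination of Name Affinity (NA) and Structural Affinity (SA). The class, the field names and the default values for the threshold and for the NA weight are hypothetical; only SYN = 1 and BT/NT = 0.8 are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningParameters:
    w_syn: float = 1.0           # weight of SYN associations (from the slides)
    w_bt_nt: float = 0.8         # weight of BT/NT associations (from the slides)
    name_threshold: float = 0.5  # hypothetical threshold for Name Affinity
    w_na: float = 0.5            # hypothetical weight of Name Affinity

    def __post_init__(self):
        if not (0 < self.w_bt_nt <= self.w_syn <= 1):
            raise ValueError("expected 0 < w_bt_nt <= w_syn <= 1")
        if not (0 <= self.w_na <= 1):
            raise ValueError("expected 0 <= w_na <= 1")

    @property
    def w_sa(self) -> float:
        # The two affinity weights always sum to 1.
        return 1.0 - self.w_na


def global_affinity(na: float, sa: float, params: TuningParameters) -> float:
    """Assumed form: GA = w_NA * NA + w_SA * SA."""
    return params.w_na * na + params.w_sa * sa
```

For example, with `w_na=0.25`, a pair with NA = 0.8 and SA = 0.4 gets 0.25 × 0.8 + 0.75 × 0.4 under this assumed form.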