MOMIS Query Manager
Prototype of a query manager for handling global queries
D. Beneventano, S. Bergamaschi, F. Mandreoli
Università degli Studi di Modena e Reggio Emilia
D2I: Integration, Warehousing and Mining of Heterogeneous Sources
Theme 1: Integration of data from heterogeneous sources
Rome, 11 October 2002
Example
Local classes (relational):
• L1(firstn, lastn, year, e_mail)
• L2(name, e_mail, dept_code, s_code)
Integration
Global class: G
Global class schema: S(G) = (Name, E_mail, Year, Dept, Section)
Local class schemata w.r.t. the global class:
• S(L1) = (Name, E_mail, Year)
• S(L2) = (Name, E_mail, Dept, Section)
Data cleaning and reconciliation
• Integration at the extensional level
• The data returned by the various sources need to be converted and reconciled
• Interpretation and merging of the data provided by the sources
Schema translation
• (example: firstn and lastn to Name)
Data conversion
• (example: ‘Rita’ + ‘Verde’ to ‘Rita Verde’)
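The two steps above can be sketched in a few lines of Python. This is only an illustration, not MOMIS code: the function names are hypothetical, the renaming of dept_code to Dept and s_code to Section is assumed from the local class schemata, and the e-mail address is made up.

```python
def l1_to_global(row):
    """Map an L1 tuple (firstn, lastn, year, e_mail) to S(L1) = (Name, E_mail, Year)."""
    return {
        # Data conversion: 'Rita' + 'Verde' becomes 'Rita Verde'
        "Name": f"{row['firstn']} {row['lastn']}",
        "E_mail": row["e_mail"],
        "Year": row["year"],
    }


def l2_to_global(row):
    """Map an L2 tuple to S(L2) = (Name, E_mail, Dept, Section).

    Assumption: dept_code feeds Dept and s_code feeds Section.
    """
    return {
        "Name": row["name"],
        "E_mail": row["e_mail"],
        "Dept": row["dept_code"],
        "Section": row["s_code"],
    }


# Hypothetical sample tuple; only the names 'Rita' and 'Verde' come from the slide.
sample = {"firstn": "Rita", "lastn": "Verde", "year": 2, "e_mail": "rita.verde@example.it"}
converted = l1_to_global(sample)
```

After this step both sources expose the attribute names of the global class, which the later join on Name relies on.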
Redundancy and Reconciliation Hypothesis
Instances of the same object in different local classes must have the same value for a common attribute.
[Figure: objects O1, O, O2 in local classes L1 and L2]
Object fusion
To identify instances of the same object and fuse them: JoinMap, the join criteria among classes.
[Figure: objects O1, O, O2 in local classes L1 and L2]
JoinMap JM(L1,L2): L1.Name = L2.Name
Object fusion: indirect map
[Figure: objects O1, O2, O3 in local classes L1 and L2]
JoinMap JM(CS.S, UNI.RS)
Global Class Instance
• GAV with the “single database property” (Lenzerini, Data Integration: A Theoretical Perspective, PODS 2002)
• The computation is based on “full disjunction” (Rajaraman, Ullman, Integrating Information by Outerjoins and Full Disjunctions, PODS 1996)
• “Computing the natural outerjoin of many relations in a way that preserves all possible connections among facts”
G: select S(G) from L1 outer join L2 on JM(L1,L2)
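For two local classes, the outer join above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the tuples are already expressed in the attribute names of S(L1) and S(L2), JM(L1,L2) is L1.Name = L2.Name, and the function names and sample data are hypothetical.

```python
S_G = ("Name", "E_mail", "Year", "Dept", "Section")


def fuse(t1, t2):
    """Fuse two instances of the same object into one tuple over S(G)."""
    merged = {attr: None for attr in S_G}
    for t in (t1, t2):
        if t is None:
            continue
        for attr, value in t.items():
            if value is None:
                continue
            # Reconciliation hypothesis: common attributes must agree.
            if merged[attr] is not None and merged[attr] != value:
                raise ValueError(f"conflicting values for {attr}")
            merged[attr] = value
    return merged


def global_class_instance(l1, l2):
    """Full outer join of l1 and l2 on Name, projected onto S(G)."""
    result, matched = [], set()
    for t1 in l1:
        partners = [i for i, t2 in enumerate(l2) if t1["Name"] == t2["Name"]]
        matched.update(partners)
        if partners:
            result.extend(fuse(t1, l2[i]) for i in partners)
        else:
            result.append(fuse(t1, None))  # object known only to L1
    # Objects known only to L2 are kept as well, padded with None.
    result.extend(fuse(None, t2) for i, t2 in enumerate(l2) if i not in matched)
    return result


# Hypothetical sample data.
l1 = [{"Name": "Rita Verde", "E_mail": "rita.verde@example.it", "Year": 2},
      {"Name": "Ugo Rossi", "E_mail": "ugo.rossi@example.it", "Year": 1}]
l2 = [{"Name": "Rita Verde", "E_mail": "rita.verde@example.it", "Dept": "Dept1", "Section": "A"},
      {"Name": "Ada Neri", "E_mail": "ada.neri@example.com", "Dept": "Dept2", "Section": "B"}]
g = global_class_instance(l1, l2)
```

Raising an error on conflicting values is one possible way to surface violations of the reconciliation hypothesis; a real system might instead apply a resolution rule.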
FULL DISJUNCTION COMPUTATION
• Question: when can a full disjunction be computed by some sequence of natural outerjoins?
• Answer: there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, γ-acyclic hypergraph (Fagin, 1983)
A global class with n local classes, n > 2: γ-cyclic hypergraph, hence a new method is needed.
[Figure: L1, L2, L3 connected pairwise by JM(L1,L2), JM(L1,L3), JM(L2,L3)]
Example: n = 3
G: select S(G) from (L1 outer join L2 on JM(L1,L2)) outer join (L1 outer join L3 on JM(L1,L3)) on JM(L2,L3)
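To make the notion of full disjunction concrete, here is a brute-force Python sketch of its definition for a handful of small relations. It is not the new method mentioned on the slide, which is not detailed here. The assumptions are: every JoinMap is a pairwise predicate, a combination of tuples is kept only if all JoinMaps defined among the chosen classes hold and those classes are connected through JoinMaps, and only maximal combinations are returned. The attributes of L3 and all sample data are hypothetical.

```python
from itertools import combinations, product


def is_connected(names, edges):
    """True if the classes in `names` form a connected graph under `edges`."""
    names = list(names)
    seen, stack = {names[0]}, [names[0]]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if cur in (a, b):
                other = b if cur == a else a
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return len(seen) == len(names)


def full_disjunction(relations, join_maps):
    """relations: {class name: list of tuples}; join_maps: {(a, b): predicate}, a < b."""
    candidates = []
    class_names = sorted(relations)
    for k in range(1, len(class_names) + 1):
        for subset in combinations(class_names, k):
            pairs = [p for p in combinations(subset, 2) if p in join_maps]
            if not is_connected(subset, pairs):
                continue
            # Try every choice of one tuple per class in the subset.
            for idxs in product(*(range(len(relations[c])) for c in subset)):
                pick = dict(zip(subset, idxs))
                if all(join_maps[(a, b)](relations[a][pick[a]], relations[b][pick[b]])
                       for a, b in pairs):
                    candidates.append(frozenset(pick.items()))
    # Keep only combinations that are not contained in a larger one.
    maximal = [c for c in candidates if not any(c < d for d in candidates)]
    return [{name: relations[name][i] for name, i in sorted(c)} for c in maximal]


def same_name(a, b):
    return a["Name"] == b["Name"]


relations = {
    "L1": [{"Name": "Rita Verde", "Year": 2}, {"Name": "Ugo Rossi", "Year": 1}],
    "L2": [{"Name": "Rita Verde", "Dept": "Dept1"}],
    "L3": [{"Name": "Ugo Rossi", "Office": "B12"}, {"Name": "Ada Neri", "Office": "C3"}],
}
join_maps = {("L1", "L2"): same_name, ("L1", "L3"): same_name, ("L2", "L3"): same_name}
fd = full_disjunction(relations, join_maps)
```

The enumeration is exponential in the number of classes, so it is useful only as a reference against which a faster outer-join based plan can be checked on small inputs.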
Query rewriting method
Global query (in DNF): Q1
Local query for the class L: Q1_L
• where-condition of Q1_L: all factors of the DNF which can be solved in L
• residual factors of Q1: factors not included in all local where-conditions
• select-list of Q1_L: attributes of the select-list of Q1 + residual factors + JoinMap
Global query reformulation: full disjunction based on the JoinMap + residual factors
Query rewriting example
Global query
Q1: select E_mail from G where (E_mail like '*.it' and Dept='Dept1') or (E_mail like '*.it' and Year=2)
Local queries
Q1_L1: select Name, Year, E_mail from L1 where (E_mail like '*.it' or Year=2)
Q1_L2: select Name, Dept, E_mail from L2 where (E_mail like '*.it' or Dept='Dept1')
Global query reformulation
Q1: select E_mail from Q1_L1 outer join Q1_L2 on JM where (Dept='Dept1' or Year=2)
Residual factor: (Dept='Dept1' or Year=2)
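The bookkeeping behind the example above can be sketched in Python: which factors each local class can solve, which factors are residual, and which attributes each local select-list needs. This is a hedged illustration, not the prototype's code. The representation of a factor as an (attribute, condition text) pair and the name plan_rewriting are assumptions, and the sketch stops short of assembling the SQL text of the where-conditions.

```python
# Global query Q1 in DNF: a list of terms, each a list of (attribute, condition) factors.
DNF = [
    [("E_mail", "E_mail like '*.it'"), ("Dept", "Dept='Dept1'")],
    [("E_mail", "E_mail like '*.it'"), ("Year", "Year=2")],
]
SELECT_LIST = ["E_mail"]
LOCAL_SCHEMATA = {
    "L1": ["Name", "E_mail", "Year"],
    "L2": ["Name", "E_mail", "Dept", "Section"],
}
JOIN_ATTRS = ["Name"]  # attributes used by JM(L1,L2)


def plan_rewriting(dnf, select_list, local_schemata, join_attrs):
    # Collect the distinct factors of the DNF, preserving their order.
    factors = []
    for term in dnf:
        for factor in term:
            if factor not in factors:
                factors.append(factor)
    # A factor can be solved in a class if the class has its attribute.
    solvable = {cls: [f for f in factors if f[0] in attrs]
                for cls, attrs in local_schemata.items()}
    # Residual factors: not included in all local where-conditions.
    residual = [f for f in factors if any(f not in fs for fs in solvable.values())]
    # Local select-list: select-list of Q1 + residual factor attributes + JoinMap.
    wanted = set(join_attrs) | {f[0] for f in residual} | set(select_list)
    select_lists = {cls: [a for a in attrs if a in wanted]
                    for cls, attrs in local_schemata.items()}
    return solvable, residual, select_lists


solvable, residual, select_lists = plan_rewriting(DNF, SELECT_LIST, LOCAL_SCHEMATA, JOIN_ATTRS)
```

On the example data this yields the E_mail and Year factors for L1, the E_mail and Dept factors for L2, and Dept='Dept1' and Year=2 as residual factors, in line with the local select-lists and the residual condition shown above.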