Schema Mediation in Peer Data Management Systems (Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov). Presented by Jiwen Sun, Lihui Zhao, 24/3/2004.
Introduction
• Why Peer to Peer
  • Integration with semantics
  • Flexible, extensible
• The paper's contribution
  • Piazza project
  • A peer mapping language
  • Algorithm for query answering
Introduction
• Traditional integration formalisms
  • Global as View (GAV)
    • Mediated schema defined as views over the data sources
    • GAV: T :- S1, S2, S3
  • Local as View (LAV)
    • Data sources defined as views of the mediated schema
    • LAV: S1 ⊆ T
[Figure: mediated schema T above data sources S1, S2, S3]
Introduction
• GAV and LAV in the Piazza (P2P) environment
  • Define semantic relations locally
  • Answer queries globally
Introduction
• Properties of a peer
[Figure: a peer, showing its peer relations and stored relations, with peer descriptions and a storage description]
Introduction • Emergency Response Example
Problem Definition
• PPL: Peer-Programming Language
  • Storage description: mappings between stored relations and peer relations
    A : R ⊆ Q
  • Peer mapping: mappings between peer relations
    • Inclusion: Q1(A1) ⊆ Q2(A2)
    • Definitional: P(x) :- P1(x)
Problem Definition
• GAV-like definition: definitional mappings in Datalog
9DC : SkilledPerson(PID, “Doctor”) :- H : Doctor(SID, h, l, s, e)
9DC : SkilledPerson(PID, “EMT”) :- H : EMT(SID, h, vid, s, e)
9DC : SkilledPerson(PID, “EMT”) :- FS : Schedule(PID, vid), FS : FirstResponse(vid, s, l, d), FS : Skills(PID, “medical”)
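
As a rough illustration of how the three GAV-like rules above populate SkilledPerson, the following Python sketch evaluates them over toy data. The sample tuples, the function name `skilled_person`, and the choice to reuse the hospital staff ID (SID) as the person ID are assumptions made for this example only.

```python
# Toy instances of the H (hospital) and FS (fire service) relations.
# All values are invented for illustration.
H_DOCTOR = [("d1", "General", "Ward3", "08:00", "16:00")]  # H:Doctor(SID, h, l, s, e)
H_EMT = [("e1", "General", "v7", "08:00", "16:00")]        # H:EMT(SID, h, vid, s, e)
FS_SCHEDULE = [("p9", "v2"), ("p5", "v3")]                 # FS:Schedule(PID, vid)
FS_FIRST_RESPONSE = [("v2", "Station1", "L1", "D1")]       # FS:FirstResponse(vid, s, l, d)
FS_SKILLS = [("p9", "medical"), ("p5", "rescue")]          # FS:Skills(PID, skill)


def skilled_person():
    """Union of the three definitional rules for 9DC:SkilledPerson."""
    result = set()
    # Rule 1: every hospital doctor is a skilled person with skill "Doctor".
    # Assumption: the staff ID stands in for the person ID.
    for sid, _h, _l, _s, _e in H_DOCTOR:
        result.add((sid, "Doctor"))
    # Rule 2: every hospital EMT is a skilled person with skill "EMT".
    for sid, _h, _vid, _s, _e in H_EMT:
        result.add((sid, "EMT"))
    # Rule 3: fire-service staff scheduled on a first-response vehicle
    # and holding the "medical" skill also count as EMTs.
    first_response_vehicles = {vid for vid, _s, _l, _d in FS_FIRST_RESPONSE}
    medical_staff = {pid for pid, skill in FS_SKILLS if skill == "medical"}
    for pid, vid in FS_SCHEDULE:
        if vid in first_response_vehicles and pid in medical_staff:
            result.add((pid, "EMT"))
    return result


print(sorted(skilled_person()))
```

Each rule contributes tuples independently, which is why a definitional (GAV-like) mapping with several rules for the same head behaves like a union.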
Problem Definition
• LAV-like definition: inclusion mappings
LH : CritBed(bed, hosp, room, PID, status) ⊆ H : CritBed(bed, hosp, room), H : Patient(PID, bed, status)
LH : EmergBed(bed, hosp, room, PID, status) ⊆ H : EmergBed(bed, hosp, room), H : Patient(PID, bed, status)
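
To make the inclusion reading concrete, here is a minimal Python sketch that evaluates the right-hand side of the CritBed mapping as a join and checks that a toy LH instance is contained in it. The sample tuples and the helper names `critbed_join` and `inclusion_holds` are hypothetical.

```python
# Toy instances; all values are invented for illustration.
H_CRITBED = {("b1", "LH", "r101"), ("b2", "GH", "r202")}        # H:CritBed(bed, hosp, room)
H_PATIENT = {("p1", "b1", "stable"), ("p2", "b2", "critical")}  # H:Patient(PID, bed, status)
LH_CRITBED = {("b1", "LH", "r101", "p1", "stable")}             # LH:CritBed(bed, hosp, room, PID, status)


def critbed_join(critbed, patient):
    """Right-hand side of the mapping: join CritBed and Patient on bed."""
    return {
        (bed, hosp, room, pid, status)
        for bed, hosp, room in critbed
        for pid, patient_bed, status in patient
        if patient_bed == bed
    }


def inclusion_holds(lhs, rhs):
    """An inclusion mapping only requires lhs to be a subset of rhs."""
    return lhs <= rhs


rhs = critbed_join(H_CRITBED, H_PATIENT)
print(inclusion_holds(LH_CRITBED, rhs))
```

The check is deliberately one-directional: the right-hand side may contain tuples (here the bed at "GH") that LH does not hold, which is what distinguishes an inclusion from an equality.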
Problem Definition
• Query answering in a PDMS
  • A peer answers a query with locally stored data
  • It reformulates the query and forwards it to neighbouring peers
  • The query is answered by chaining through mapped peers
  • Mappings are expanded using a rule-goal tree
Complexity of Query Answering
• Restrictions on peer mappings determine complexity
• In general, finding all certain answers is undecidable
• Acyclic peer mappings
  • Only inclusion mappings are used => polynomial time
• Cyclic peer mappings
  • Replication-type cycles only => polynomial time
• Comparison predicates also affect complexity
Query Reformulation Algorithm
• Algorithm overview
  • Build a rule-goal tree
  • Expand the tree by combining and interleaving GAV-style and LAV-style steps
  • Leaves that refer to stored relations (via storage descriptions) make up the reformulated query
  • The tree may become very large
Building a rule-goal tree
1. Make the query the root of the tree
2. Find views that cover the query, and expand the query using those views
Q(f1,f2) :- SameEngine(f1,f2,e), Skill(f1,s), Skill(f2,s)
[Figure: rule-goal tree with root Q(f1,f2), rule node q, and goal nodes SameEngine(f1,f2,e), Skill(f1,s), Skill(f2,s)]
Building a rule-goal tree
3. Apply the mappings between peer schemas
r0: SameEngine(f1,f2,e) :- AssignedTo(f1,e), AssignedTo(f2,e)
r1: SameSkill(f1,f2) ⊆ Skill(f1,s), Skill(f2,s)
[Figure: rule-goal tree with root Q(f1,f2) and rule node q; r0 expands SameEngine(f1,f2,e) into AssignedTo(f1,e) and AssignedTo(f2,e); r1 links the goals Skill(f1,s), Skill(f2,s) to SameSkill(f1,f2) and SameSkill(f2,f1)]
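
The definitional (GAV-style) expansion with r0 can be sketched as a plain substitution of the goal's arguments into the rule body. This is a simplified illustration, not the paper's algorithm: it assumes the rule head contains only distinct variables and that every body variable also appears in the head, and the names `unfold` and `R0` are hypothetical. The inclusion (LAV-style) step with r1 needs a more involved view-based rewriting and is not covered here.

```python
def unfold(goal, rule):
    """Replace a goal atom by the body of a definitional rule.

    Atoms are (predicate, args) pairs. Returns the list of child goals,
    or None if the rule head does not match the goal's predicate and arity.
    """
    (head_pred, head_vars), body = rule
    goal_pred, goal_args = goal
    if goal_pred != head_pred or len(goal_args) != len(head_vars):
        return None
    # Map each head variable to the corresponding argument of the goal.
    subst = dict(zip(head_vars, goal_args))
    return [
        (pred, tuple(subst.get(arg, arg) for arg in args))
        for pred, args in body
    ]


# r0: SameEngine(x, y, z) :- AssignedTo(x, z), AssignedTo(y, z)
R0 = (
    ("SameEngine", ("x", "y", "z")),
    [("AssignedTo", ("x", "z")), ("AssignedTo", ("y", "z"))],
)

children = unfold(("SameEngine", ("f1", "f2", "e")), R0)
print(children)
```

In rule-goal tree terms, the returned atoms become the goal children of a rule node labelled r0.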
Building a rule-goal tree
4. Repeat until all leaves are stored relations
Reformulated query:
Q’(f1,f2) :- S1(f1,e,_), S1(f2,e,_), S2(f1,f2) ∪ S1(f1,e,_), S1(f2,e,_), S2(f2,f1)
[Figure: the completed rule-goal tree, with root Q(f1,f2), rule nodes q, r0, r1, r2, r3, goal nodes SameEngine(f1,f2,e), Skill(f1,s), Skill(f2,s), AssignedTo(f1,e), AssignedTo(f2,e), SameSkill(f1,f2), SameSkill(f2,f1), and stored-relation leaves S1(f1,e,_), S1(f2,e,_), S2(f1,f2), S2(f2,f1)]
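
Because the reformulated query refers only to stored relations, it can be evaluated directly. The sketch below runs the two conjunctive queries over toy instances of S1 and S2 and unions the results. The sample tuples and the function name `reformulated_query` are assumptions for illustration.

```python
# Toy stored relations; all values are invented for illustration.
S1 = [("alice", "engine1", "x"), ("bob", "engine1", "y"), ("carol", "engine2", "z")]
S2 = [("alice", "bob")]


def reformulated_query(s1, s2):
    """Evaluate Q'(f1, f2) as the union of its two conjunctive queries."""
    # Shared part: S1(f1, e, _), S1(f2, e, _), joined on e.
    same_engine = {
        (f1, f2)
        for f1, e1, _ in s1
        for f2, e2, _ in s1
        if e1 == e2
    }
    pairs = set(s2)
    # First disjunct adds S2(f1, f2); second adds S2(f2, f1).
    first = {(f1, f2) for f1, f2 in same_engine if (f1, f2) in pairs}
    second = {(f1, f2) for f1, f2 in same_engine if (f2, f1) in pairs}
    return first | second


print(sorted(reformulated_query(S1, S2)))
```

The second disjunct is what lets the answer include the pair in both orders even though S2 stores it only once.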
Query Reformulation Algorithm
• Optimizations
  • Techniques for pruning rule-goal tree branches
    • Memoization of nodes
    • Constraints on nodes that contradict the query
    • Redundancy detection
  • Getting the most from these techniques
    • The order in which the tree is built is important
    • Prioritize nodes in the Piazza system
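
The memoization idea can be sketched with a heavily simplified expansion in which goals are just predicate names and arguments are ignored. The rule table below is an assumed, argument-free reading of the example tree, the mappings are assumed to be acyclic, and `expand`, `RULES`, and `STORED` are hypothetical names; Piazza's actual data structures are not described in these slides.

```python
# Each goal maps to a list of rule bodies (assumed, argument-free simplification).
RULES = {
    "Q": [["SameEngine", "Skill", "Skill"]],
    "SameEngine": [["AssignedTo", "AssignedTo"]],
    "AssignedTo": [["S1"]],
    "Skill": [["SameSkill"]],
    "SameSkill": [["S2"]],
}
STORED = {"S1", "S2"}


def expand(goal, memo, counts):
    """Return the stored relations reachable from goal, expanding each goal once."""
    if goal in memo:
        # Memo hit: reuse the earlier expansion instead of rebuilding the subtree.
        return memo[goal]
    counts[goal] = counts.get(goal, 0) + 1
    if goal in STORED:
        leaves = frozenset([goal])
    else:
        leaves = frozenset(
            leaf
            for body in RULES.get(goal, [])
            for subgoal in body
            for leaf in expand(subgoal, memo, counts)
        )
    memo[goal] = leaves
    return leaves


memo, counts = {}, {}
print(sorted(expand("Q", memo, counts)))
print(counts)
```

Skill and AssignedTo each occur twice in the tree but are expanded only once, which is the saving memoization is meant to provide.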
Experiment
• The bottleneck is finding rewritings from the tree
• Tree depth matters, not the number of nodes
Related Work
• Answering Queries Using Views (AQUV)
  • Answering queries using views [Halevy]
  • MiniCon: A scalable algorithm for answering queries using views [Pottinger & Halevy]
• PDMS vs. database federation
  • DB federation: mappings between stored relations
  • Loose relationship => scales better
  • Peers can play different roles
  • Chaining through peer mappings to locate data
Summary
• A PDMS is superior to traditional data integration systems
  • Ad hoc, scalable
  • Decentralized
• PPL describes mappings using GAV/LAV
• A query reformulation algorithm produces practical results