Containment of Partially Specified Tree-Pattern Queries. Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Pawel Placek (NJIT, USA) Stefanos Souldatos (NTUA, GREECE) Timos Sellis (NTUA, GREECE).
Outline: Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion
Motivating Example
[Figure: spare-parts tree rooted at r; node labels: GREECE, USA, ATHENS, NJ, YAMAHA, BMW, HONDA, ON-OFF, TRAVEL, 125cc, 200cc, 650cc, 1000cc, SERROW, F650GS, F650, VARADERO.]
• Tree structure (e.g. XML) with motorbike spare parts.
• We search for spare parts.
• BUT…
Stefanos Souldatos - HDMS 2006
Motivating Example: structural difference
[Figure: the same spare-parts tree, with a "?" annotation.]
• Dimitri Theodoratos lives in NJ.
• He has a Yamaha Serrow motorbike in Greece.
• He searches for spare parts in Greece or the USA.
Motivating Example: structural inconsistency
[Figure: the same spare-parts tree.]
• Theodore Dalamagas has a BMW motorbike.
• He looks for spare parts worldwide.
• The same data appears under two different orderings: ../650cc/F650GS and ../F650GS/650cc
Motivating Example: unknown structure
[Figure: the same spare-parts tree.]
• Stefanos Souldatos has a Honda Varadero.
• But he is not fully aware of the tree structure.
Motivating Example: source integration
[Figure: three copies of the spare-parts tree, standing for different sources.]
• Pawel Placek wants to buy a motorbike for which he can easily find spare parts.
• He searches in many different tree structures.
Motivation
• Querying tree-structured data, BUT:
  • the structure is not always strictly defined
  • the user does not always deal with structure: "Find Honda spare parts in Greece."
Data Model
Dimension Graph
[Figure: the spare-parts tree next to its dimension graph over the nodes R, C, L, B, T, M and E.]
• dimension graph = summary of the tree structure
• Dimensions: R(oot), C(ountry), L(ocation), B(rand), T(ype), M(odel), E(ngine)
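As a rough illustration of "dimension graph = summary of the tree structure", the sketch below maps every tree node to its dimension and records which dimension appears directly under which. The `DIMENSION_OF` mapping and the `TREE` fragment are hypothetical: they reuse labels from the slide, but the exact shape of the slide's tree is not recoverable here, so this is only an assumed sample.

```python
from collections import defaultdict

# Hypothetical label-to-dimension mapping (labels taken from the slide).
DIMENSION_OF = {
    "r": "R",
    "GREECE": "C", "USA": "C",
    "ATHENS": "L",
    "YAMAHA": "B", "BMW": "B",
    "ON-OFF": "T", "TRAVEL": "T",
    "200cc": "E", "650cc": "E",
    "SERROW": "M", "F650GS": "M",
}

# Hypothetical tree fragment as (label, children) pairs; NOT the slide's exact tree.
TREE = ("r", [
    ("GREECE", [
        ("ATHENS", [("YAMAHA", [("ON-OFF", [("200cc", [("SERROW", [])])])])]),
        ("BMW", [("TRAVEL", [("F650GS", [("650cc", [])])])]),
    ]),
    ("USA", [
        ("BMW", [("TRAVEL", [("650cc", [("F650GS", [])])])]),
    ]),
])


def dimension_graph(tree):
    """Collect the dimension-level parent -> child edges seen anywhere in the tree."""
    edges = defaultdict(set)
    stack = [tree]
    while stack:
        label, children = stack.pop()
        for child in children:
            edges[DIMENSION_OF[label]].add(DIMENSION_OF[child[0]])
            stack.append(child)
    return {dim: sorted(targets) for dim, targets in edges.items()}


graph = dimension_graph(TREE)
for dim in "RCLBTEM":
    print(dim, "->", graph.get(dim, []))
```

In this sample the two orderings ../650cc/F650GS and ../F650GS/650cc show up as edges in both directions between E and M, which is how a summary graph can absorb a structural inconsistency.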
Partially Specified Tree-Pattern Query
[Figure: the dimension graph (R, C, L, B, T, M, E) next to a query with two paths, PSP p1 and PSP *p2, over the nodes C = {Greece}, B = {BMW}, M = ? and E = ?.]
• Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece (+ structural info).
• Figure callouts:
  • partially specified paths (PSP)
  • output path (*)
  • edges: parent-child and ancestor-descendant
  • node sharing expression (NSE)
• Dimensions: R(oot), C(ountry), L(ocation), B(rand), T(ype), M(odel), E(ngine)
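One way to hold such a query in memory is sketched below. The class and field names (`PSP`, `Query`, `sharing`) are hypothetical, and the concrete edges and the shared node in the example are assumptions: the transcript names the nodes and the two paths, but not which edges are child or descendant edges, nor which nodes are shared.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class PSP:
    """One partially specified path; at most one node per dimension."""
    name: str
    output: bool = False  # True for a path marked with '*'
    # dimension -> allowed values; None stands for the '?' annotation
    nodes: Dict[str, Optional[FrozenSet[str]]] = field(default_factory=dict)
    child: Set[Tuple[str, str]] = field(default_factory=set)       # A -> B
    descendant: Set[Tuple[str, str]] = field(default_factory=set)  # A => B


@dataclass
class Query:
    psps: Dict[str, PSP]
    # node sharing expressions as (dimension, path, other path)
    sharing: Set[Tuple[str, str, str]] = field(default_factory=set)

    def output_paths(self) -> List[str]:
        return sorted(name for name, psp in self.psps.items() if psp.output)


greece = frozenset({"Greece"})
bmw = frozenset({"BMW"})

# Assumed edges: the slide does not let us tell child from descendant here.
p1 = PSP("p1", nodes={"C": greece, "B": bmw, "M": None},
         descendant={("C", "B"), ("B", "M")})
p2 = PSP("p2", output=True, nodes={"C": greece, "B": bmw, "E": None},
         descendant={("C", "B"), ("B", "E")})

# Assumed node sharing: both paths go through the same BMW node.
query = Query({"p1": p1, "p2": p2}, sharing={("B", "p1", "p2")})
print(query.output_paths())
```

Keeping child and descendant edges in separate sets makes it easy to leave a relationship unspecified, which is the point of a partially specified path.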
Additional Concepts
Additional Concepts
[Figure: the query (C = {Greece}, B = {BMW}, M = ?, E = ?; PSP p1, PSP *p2), the dimension graph, and a dimension tree rooted at R.]
• Full Form Query
• Dimension Trees: DIMENSION TREES = QUERY + GRAPH
Query Containment
Absolute Containment
• Each result of Q1 is a result of Q2: Q1 ⊆ Q2
• Checked via a homomorphism from Q2 to Q1.
[Figure: example queries Q1 and Q2 with paths PSP *p1, PSP p2, PSP *p3 and PSP p4 over the dimensions C, B, M and E.]
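A heavily simplified sketch of the homomorphism idea, restricted to queries with a single path, is given below. It is not the authors' algorithm: the dictionary layout and the function names are hypothetical, node sharing and output paths are ignored, and a node of Q2 is assumed to map to the node of the same dimension in Q1.

```python
def descendant_closure(child, descendant):
    """Descendant pairs implied by child edges plus transitivity."""
    desc = set(child) | set(descendant)
    while True:
        extra = {(a, d) for a, b in desc for c, d in desc if b == c} - desc
        if not extra:
            return desc
        desc |= extra


def contained_in(q1, q2):
    """True if q2 maps onto q1 (single-path queries), suggesting q1 is contained in q2."""
    for dim, allowed in q2["nodes"].items():
        if dim not in q1["nodes"]:
            return False
        values = q1["nodes"][dim]
        # '?' (None) in q2 accepts anything; otherwise q1 must be at least as strict.
        if allowed is not None and (values is None or not values <= allowed):
            return False
    if not set(q2["child"]) <= set(q1["child"]):
        return False
    closed = descendant_closure(q1["child"], q1["descendant"])
    return set(q2["descendant"]) <= closed


# Hypothetical queries: q1 is the more specific one.
q1 = {"nodes": {"C": {"Greece"}, "B": {"BMW"}, "M": None},
      "child": {("C", "B")}, "descendant": {("B", "M")}}
q2 = {"nodes": {"C": None, "M": None},
      "child": set(), "descendant": {("C", "M")}}

print(contained_in(q1, q2), contained_in(q2, q1))
```

The mapping runs from the more general query to the more specific one: everything Q2 demands must already be demanded, or implied, by Q1.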
Relative Containment (w.r.t. G)
• Each result of Q1 in G is a result of Q2 in G: Q1 ⊆_G Q2
• Checked via a homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1.
[Figure: a dimension tree of Q1 and a dimension tree of Q2, over the dimensions R, C, B, T, M and E.]
Relative Containment Heuristic (RCH)
• Sound but not complete.
• Extract structural information from the Dimension Graph.
• Insert it in the query Q1.
• Check Q1 ⊆ Q2 instead of Q1 ⊆_G Q2.
[Figure: time scale marking Absolute Containment (AC) at 1 msec and Relative Containment (RC) at 100 msec, with RCH on the same scale.]
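Relative Containment Heuristic: Example
[Figure: the dimension graph (R, C, L, B, T, M, E) with queries Q1 (PSP *p1) and Q2 (PSP *p2) over the nodes C = ?, B = ? and T = ?; after the heuristic is applied, a node R = ? also appears.]
• Structural information extracted from the dimension graph: B=>T : R->C, C=>B
• Result: Q1 ⊆_G Q2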
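The extraction step can be pictured with the sketch below. This is an assumed strategy, not necessarily the one used in the talk: it collects the dimensions that lie on every root-to-target path of the graph and emits a child edge when two of them are always adjacent, and a descendant edge otherwise. The `GRAPH` edges are hypothetical, chosen so that the output matches the slide's R->C, C=>B.

```python
# Hypothetical dimension graph edges (adjacency lists).
GRAPH = {
    "R": ["C"], "C": ["B", "L"], "L": ["B"], "B": ["T"],
    "T": ["E", "M"], "E": ["M"], "M": ["E"],
}


def simple_paths(graph, start, goal, seen=()):
    """Yield every cycle-free path from start to goal as a tuple of dimensions."""
    seen = seen + (start,)
    if start == goal:
        yield seen
        return
    for nxt in graph.get(start, []):
        if nxt not in seen:
            yield from simple_paths(graph, nxt, goal, seen)


def extract_structure(graph, dim, root="R"):
    """Return (child, descendant) edges that hold on every root-to-dim path."""
    paths = list(simple_paths(graph, root, dim))
    if not paths:
        return set(), set()
    common = set(paths[0]).intersection(*paths[1:])
    chain = [d for d in paths[0] if d in common]  # ends with dim itself
    child, descendant = set(), set()
    for a, b in zip(chain, chain[1:]):
        adjacent = all(p[p.index(a) + 1] == b for p in paths)
        (child if adjacent else descendant).add((a, b))
    return child, descendant


child, descendant = extract_structure(GRAPH, "B")
print("child:", child, "descendant:", descendant)
```

The extracted edges would then be added to Q1 before running the cheaper absolute containment check. Enumerating all simple paths is exponential in the worst case, so this only suits small graphs.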
Experiments
Experiments
• We measured…
  • execution time for
    • Absolute Containment (AC)
    • Relative Containment (RC)
    • Relative Containment Heuristic (RCH)
  • accuracy for RCH
• …for various graph sizes
• …for various query sizes
Time
[Figure: execution time (msec) of AC, RCH and RC. Upper panels: graphs with 20, 30 and 40 dimensions; x-axis: graph paths, with ranges 10-80, 15-120 and 20-160. Lower panels: queries with 1 and 2 PSPs; x-axis: nodes per PSP, 3-6.]
Accuracy of RCH
• 80% for graphs of common sizes
  • based on XML benchmarks (XMach, XMark, etc.)
• 50% for graphs of higher density
Conclusion
Conclusion
• Query containment for Partially Specified Tree-Pattern Queries (PSTPQs).
• Sound technique for checking Relative Query Containment
  • Time: one order of magnitude
  • Accuracy: over 80%
Future Work
• Heuristics for checking Relative Containment
  • precomputed and on-the-fly
  • trade-off between time and accuracy
• Special forms of queries, e.g. swings:
[Figure: a swing query with paths PSP p1, PSP p2 and PSP *p3 over the dimensions A, B and C.]
Links
Introduction, Data Model, Additional Concepts, Query Containment, Experiments, Conclusion, Appendix
Who defines the dimensions?
• Automatic
  • XML tags (dimension graph = “path summary”, “path index”, “structural summary”)
• Semi-automatic
  • Graph administrator + XML tags (dimension = group of XML tags)
  • Graph administrator + ontology
• Manual
  • Graph administrator
Inference Rules
[Figure: the query (C = {Greece}, B = {BMW}, M = ?, E = ?; PSP p1, PSP *p2) and the dimension graph. The rules are used for step 1, the Full Form Query.]
Notation: A[p] -> B[p] is a child relationship on path p, A[p] => B[p] is a descendant relationship, and A[p1] ≈ A[p2] is node sharing between paths p1 and p2.
(IR1) |- R[p1] ≈ R[p2]
(IR2) A[p1] ≈ A[p2], A[p2] ≈ A[p3] |- A[p1] ≈ A[p3]
(IR3) a structural expression that involves A[p] |- R[p] => A[p]
(IR4) A[p] -> B[p] |- A[p] => B[p]
(IR5) A[p] => B[p], B[p] => C[p] |- A[p] => C[p]
(IR6) A[p] -> B[p], A[p] => C[p] |- B[p] => C[p]
(IR7) A[p] -> B[p], C[p] => B[p] |- C[p] => A[p]
(IR8) A[p1] -> B[p1], B[p1] ≈ B[p2] |- A[p2] -> B[p2]
(IR9) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p2] => B[p2]
(IR10) A[p1] => B[p1], A[p1] ≈ A[p2], R[p2] => B[p2] |- A[p2] => B[p2]
(IR11) A[p1] => B[p1], B[p1] ≈ B[p2] |- A[p1] ≈ A[p2]
(IR12) A[p1] -> B[p1], C[p2] -> B[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
(IR13) A[p1] -> B[p1], A[p2] -> C[p2], D[p1] ≈ D[p2] |- D[p1] => A[p1]
(IR14) A[p1] => B[p1], B[p2] => A[p2], C[p1] ≈ C[p2] |- C[p1] => A[p1]
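To show how such rules can be applied until nothing new is derived, here is a minimal sketch that covers only the single-path rules IR3 to IR7, taking -> as a child edge and => as a descendant edge. The function name `full_form` and the example input are hypothetical, and the guards that keep a dimension from becoming its own descendant (for instance `c != b` in IR6) are an added assumption that the slide does not state.

```python
def full_form(child, descendant, root="R"):
    """Close one path's relationships under IR3-IR7; returns the descendant pairs."""
    child = set(child)
    desc = set(descendant)
    dims = {d for pair in child | desc for d in pair}
    desc |= {(root, d) for d in dims if d != root}                     # IR3
    while True:
        new = set(child)                                               # IR4
        new |= {(a, d) for a, b in desc for c, d in desc if b == c}    # IR5
        new |= {(b, c) for a, b in child
                for a2, c in desc if a2 == a and c != b}               # IR6
        new |= {(c, a) for a, b in child
                for c, b2 in desc if b2 == b and c != a}               # IR7
        if new <= desc:
            return desc
        desc |= new


# Hypothetical path: B -> T (child) and C => T (descendant).
closed = full_form(child={("B", "T")}, descendant={("C", "T")})
print(sorted(closed))
```

In this example IR7 derives C => B: since B is the parent of T and C is an ancestor of T, C must sit above B on the same path.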
Dimension Trees
[Figure: the query (C = {Greece}, B = {BMW}, M = ?, E = ?; PSP p1, PSP *p2), the dimension graph, and the resulting dimension trees, each rooted at R with C = {Greece} and B = {BMW} above T, M and E.]
• r/Greece/BMW/*T[*E]/*M
• r/Greece/BMW/*T/*M[*E]
• r/Greece/BMW/*T/*E/*M
• r/Greece/BMW/*T[*M/*E]/*E*M
Previous Approaches
• Keyword-based search approach
  • Absence of structure
• Naive approach
  • All possible query patterns are generated (Honda=>Greece, Greece=>Honda)
• Approximation techniques
  • Relax the query to get more answers
• Traditional integration approach
  • Global structure and mapping rules