Flexible querying of XML data whose structure is only partly known, illustrated with hotel data for Athens and Crete. The deck introduces partial queries, then covers their processing, evaluation and containment, followed by experiments.
FLEXIBLE SEARCH IN XML DATA
STEFANOS SOULDATOS
FLEXIBLE SEARCH IN XML DATA
Outline: • Partial queries • Query processing • Query evaluation • Query containment • Experiments • Conclusion
Difficulties on Querying XML Data
[Figure, repeated on the following slides: XML tree of hotel data with node labels Hotels, Athens, City, Island, Creta, Location, City Center, Poros, Chania, Heraklio.]
Difficulties on Querying XML Data: search problem
Name: Xiaoying Wu • Place: Athens Center, Heraklio • Purpose: Sightseeing (Parthenon, 438 BC; Phaistos’ Disk, 1700 BC) • Problem: structural difference
Difficulties on Querying XML Data: search problem
Name: Theodore Dalamagas • Place: Islands • Purpose: Sea sports (windsurf, jet ski) • Problem: structural inconsistency
Difficulties on Querying XML Data: search problem
Name: Dimitri Theodoratos • Place: Heraklio • Purpose: HDMS Conference (HDMS 2008) • Problem: unknown structure
Difficulties on Querying XML Data: search problem
Name: Stefanos Souldatos • Place: Any island (Creta, 1400 islands) • Purpose: Escape from PhD! • Problem: multiple sources (theHotel.gr, hotels.gr, holidays.gr)
Difficulties on Querying XML Data
• Can we use existing query languages (XPath, XQuery) to express our queries?
• Can we use existing techniques to evaluate our queries?
Partial Queries in XPath
[Figure: five example queries over the labels Hotels, City, Athens and Island, placed on an axis from 0% structure to 100% structure. Queries 1 to 3 are path queries; queries 4 and 5 are tree-pattern queries.]
1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
2. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]]
3. //Hotels[/City//Athens]
4. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]][//City [descendant-or-self::*[ancestor-or-self::Island]]]
5. //Hotels[/City//Athens][/City//Island]
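Query 1 above asks only that Hotels, City and Athens occur on one root-to-node path, in any order. A minimal Python sketch of that reading follows. The function name `same_path_matches` and the three tiny documents are assumptions made for illustration, and the standard-library `xml.etree.ElementTree` stands in for a real XPath engine.

```python
import xml.etree.ElementTree as ET

def same_path_matches(root, anchor, others):
    """Return the `anchor`-labelled elements lying on a root-to-node path
    that also contains every label in the set `others` (order is free)."""
    found = []

    def walk(node, path):
        path = path + [node]
        labels = {n.tag for n in path}
        if others <= labels:
            for n in path:
                if n.tag == anchor and not any(n is f for f in found):
                    found.append(n)
        for child in node:
            walk(child, path)

    walk(root, [])
    return found

# Hypothetical documents: same labels, different structure.
nested = ET.fromstring("<Hotels><City><Athens><Center/></Athens></City></Hotels>")
swapped = ET.fromstring("<Hotels><Athens><City><Center/></City></Athens></Hotels>")
siblings = ET.fromstring("<Hotels><City/><Athens/></Hotels>")

wanted = {"City", "Athens"}
hits = [len(same_path_matches(doc, "Hotels", wanted))
        for doc in (nested, swapped, siblings)]
```

The first two documents match even though City and Athens swap places; the third does not, because City and Athens sit on different paths.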
Partial Queries
[Figure: example partial query with root r and nodes labelled a, b, c, d.]
Legend: • r: root node (optional) • a: query node labelled by “a” • edge types: child relationship, descendant relationship
Conclusions (up to now)
• Need for queries with partial structure
• We introduce partial queries
• Partial queries can be expressed in XPath
Query Processing
[Figure: pipeline. partial path query → QUERY PROCESSING → partial path query in canonical form → QUERY EVALUATION]
Query Processing
• Full form • Satisfiability • Redundant nodes • Canonical form
[Figure: example partial query with root r and nodes labelled a, b, c, d.]
Query Processing: full form
INFERENCE RULES
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
x, y, z, w: query nodes; ai, bj: nodes labelled by a and b
[Figure: example query with root r and nodes labelled a, b, c, d; successive animation frames highlight IR1, IR4, IR6 and IR8.]
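The inference rules can be applied repeatedly until nothing new is derived. The Python sketch below does this for IR1 to IR5 only; the function name `full_form`, the edge-set representation and the three-node example query are assumptions for illustration. It also assumes that IR4 and IR5 require ai and bj to carry different labels.

```python
def full_form(labels, child, desc, root="r"):
    """Close the descendant relation under IR1-IR5.

    labels: dict mapping node id -> label; child, desc: sets of (x, y) pairs.
    Returns (child, desc), where desc holds every derived x//y relationship.
    """
    child, desc = set(child), set(desc)
    desc |= {(root, n) for n in labels if n != root}          # IR1
    changed = True
    while changed:
        new = set(child)                                       # IR2
        new |= {(x, z) for (x, y) in desc
                for (y2, z) in desc if y == y2}                # IR3
        for (x, a) in child:
            for (x2, b) in desc:
                if x == x2 and labels[a] != labels[b]:
                    new.add((a, b))                            # IR4
        for (a, x) in child:
            for (b, x2) in desc:
                if x == x2 and labels[a] != labels[b]:
                    new.add((b, a))                            # IR5
        changed = not new <= desc
        desc |= new
    return child, desc

# Hypothetical query: a1/c1 and b1//c1 below an optional root r.
labels = {"r": "r", "a1": "a", "b1": "b", "c1": "c"}
_, closed = full_form(labels, {("a1", "c1")}, {("b1", "c1")})
```

In this example IR5 derives b1//a1: c1 is a child of a1, so any other ancestor of c1 on the same path must lie above a1.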
Query Processing: satisfiability
A query is unsatisfiable if its full form contains a trivial cycle.
[Figure: example query with root r and nodes labelled a, b, c, d; the cycle is drawn between nodes x and y.]
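The satisfiability test can be sketched as cycle detection over the relationships of the full form. The reading of "trivial cycle" as a node that is reachable from itself is an assumption, as are the function name `has_cycle` and the two example edge sets.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (x, y) pairs has a cycle."""
    graph = {}
    for x, y in edges:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set())

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the stack, finished
    colour = dict.fromkeys(graph, WHITE)

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            # A grey successor means we walked back into the current path.
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)

# Hypothetical full forms: x//y together with y//x cannot be satisfied.
unsatisfiable = has_cycle({("r", "x"), ("x", "y"), ("y", "x")})
satisfiable = not has_cycle({("r", "x"), ("x", "y")})
```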
Query Processing: redundant nodes
A node y is redundant if one of the following patterns occurs: a), b), c).
[Figure: the three patterns a) to c), drawn over nodes x, y and z, next to the example query.]
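Query Processing: canonical form
canonical form of a satisfiable query = full form – IR2 – IR3 – redundant nodes
(that is, the full form without the relationships derivable by IR2 and IR3 and without the redundant nodes)
[Figure: example query with root r and nodes labelled a, b, c, d.]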
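The "– IR2 – IR3" part of this definition can be sketched as follows, assuming a full form that is acyclic and already closed under the rules. The function name `drop_derivable` and the example edge sets are hypothetical. Removal of redundant nodes is not covered, because it depends on the patterns shown in the figure.

```python
def drop_derivable(child, desc):
    """Keep only the descendant relationships of a full form that are
    derivable neither by IR2 (from a child edge) nor by IR3 (transitivity)."""
    nodes = {n for edge in desc for n in edge}
    kept = set()
    for (x, z) in desc:
        by_ir2 = (x, z) in child
        by_ir3 = any((x, y) in desc and (y, z) in desc for y in nodes)
        if not by_ir2 and not by_ir3:
            kept.add((x, z))
    return kept

# Hypothetical full form of the query a1/c1, b1//c1 under root r.
child = {("a1", "c1")}
desc = {("r", "a1"), ("r", "b1"), ("r", "c1"),
        ("a1", "c1"), ("b1", "c1"), ("b1", "a1")}
canonical_desc = drop_derivable(child, desc)
```

What remains here is r//b1 and b1//a1, which together with the child edge a1/c1 reads as the single path r//b1//a1/c1.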
Canonical Form
• partial path query: directed acyclic graph with same-path constraint
• partial tree-pattern query: directed acyclic graph with same-path constraints
[Figure: two example dags over the nodes r, d, b, c, e.]
Conclusions (up to now)
• Need for queries with partial structure
• We introduce partial queries
• Partial queries can be expressed in XPath
• We can process any partial query dag
Evaluation Algorithms
[Figure: two example query dags over the nodes r, d, b, c, e.]
Partial Path Queries
• PQGen: produce path queries
• PathJoin: decompose into paths
• PartialMJ: decompose into spanning tree paths
• PartialPathStack: novel holistic
Partial Tree-Pattern Queries
• TPQGen: produce TPQs
• PPJoin: decompose into PPs
• PartialTreeStack: novel holistic
Partial Path Queries: PQGen
Producing all possible path queries…
[Figure: a partial path query over the nodes r, b, d, c, e and the path queries generated from it.]
1. Produce all possible path queries
2. Evaluate paths using existing algorithms
3. Keep all results
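Step 1 can be sketched as enumerating every linear order of the query nodes that respects the dag. This is a simplification under stated assumptions: each query node is placed exactly once, and a child edge forces its two nodes to be adjacent. The function name `path_queries` and the example query are hypothetical.

```python
from itertools import permutations

def path_queries(nodes, child, desc, root="r"):
    """Enumerate the path queries compatible with a partial path query.

    child, desc: sets of (x, y) pairs meaning x/y and x//y.
    Returns the sorted list of paths written with '/' and '//'.
    """
    results = []
    others = [n for n in nodes if n != root]
    for perm in permutations(others):
        order = (root,) + perm
        pos = {n: i for i, n in enumerate(order)}
        # Every relationship must point downwards along the path.
        if any(pos[x] >= pos[y] for x, y in child | desc):
            continue
        # A child edge leaves no room for a node in between.
        if any(pos[y] != pos[x] + 1 for x, y in child):
            continue
        steps = [root]
        for prev, cur in zip(order, order[1:]):
            steps.append(("/" if (prev, cur) in child else "//") + cur)
        results.append("".join(steps))
    return sorted(results)

# Hypothetical partial path query: r//b, r//d, d/c.
generated = path_queries(["r", "b", "d", "c"],
                         child={("d", "c")},
                         desc={("r", "b"), ("r", "d")})
```

The number of generated queries can grow factorially with the number of unordered nodes, which is the cost this approach pays for reusing existing path-query algorithms.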
Partial Path Queries: PathJoin
Decomposing into root-to-leaf paths…
[Figure: a partial path query over the nodes r, d, b, c, e and its root-to-leaf paths.]
1. Decompose into root-to-leaf paths
2. Evaluate paths using existing algorithms
3. Join conditions (identity, path)
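Step 1 and the identity part of step 3 can be sketched as below. The function names `root_to_leaf_paths` and `shared_nodes` and the example dag are assumptions for illustration; the path conditions of step 3 are not modelled.

```python
from collections import Counter

def root_to_leaf_paths(edges, root="r"):
    """List every root-to-leaf path of a query dag given as (x, y) pairs."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    paths = []

    def extend(path):
        nxt = sorted(succ.get(path[-1], []))
        if not nxt:                      # reached a leaf
            paths.append(path)
        for n in nxt:
            extend(path + [n])

    extend([root])
    return paths

def shared_nodes(paths):
    """Nodes occurring in more than one path: candidates for identity joins."""
    counts = Counter(n for p in paths for n in p)
    return sorted(n for n, c in counts.items() if c > 1)

# Hypothetical dag: r reaches c through b and through d, then c leads to e.
dag = {("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("c", "e")}
paths = root_to_leaf_paths(dag)
joins = shared_nodes(paths)
```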
Partial Path Queries: PartialMJ
Using a spanning tree…
[Figure: a partial path query over the nodes r, d, b, c, e, a spanning tree of it, and the root-to-leaf paths of that tree.]
1. Create a spanning tree of the query
2. Decompose into root-to-leaf paths
3. Evaluate paths using an extension of PathStack
4. Join conditions (identity, structural, path)
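Step 1 can be sketched with a breadth-first traversal that keeps one incoming edge per node. Which spanning tree the original algorithm picks is not stated on the slide, so the choice below is an assumption, as are the function name `spanning_tree` and the example dag. The edges left out of the tree are the ones that would have to be checked as structural join conditions.

```python
from collections import deque

def spanning_tree(edges, root="r"):
    """Split the edges of a query dag into a spanning tree and the rest."""
    succ = {}
    for x, y in sorted(edges):           # sorted for a deterministic result
        succ.setdefault(x, []).append(y)
    tree = []
    seen = {root}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, []):
            if y not in seen:            # first edge reaching y joins the tree
                seen.add(y)
                tree.append((x, y))
                queue.append(y)
    extra = sorted(set(edges) - set(tree))
    return tree, extra

# Hypothetical dag: c is reachable from both b and d.
dag = {("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("c", "e")}
tree, extra = spanning_tree(dag)
```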
Partial Path Queries: PartialPathStack
[Figure: animation over nine frames. Two query diagrams over the nodes r, b, d, c, e (labels: leaf node, leaf nodes) are evaluated against an XML tree with nodes r, b1, d1, c1, e1, d2, c2, e2. PathStack and PartialPathStack each keep one stack per query node (Sr, Sb, Sd, Sc, Se). The tree nodes are pushed in the order r, b1, d1, c1, e1, d2, c2, e2.]
Results on the final frame:
PathStack: ra1b1d1c1e1, ra1b1d1c1e2
PartialPathStack: ra1b1d1c1e1, ra1b1d1c2e1, ra1b1d1c1e2
Partial Path Queries: PartialPathStack
[Figure: the two query diagrams and the XML tree from the previous slides.]
• PathStack [Bruno et al, 2002]: optimal for path queries, O(input + output)
• PartialPathStack [Souldatos et al, 2007]: optimal for partial path queries, O(input*indegree + output*outdegree)
Evaluation Algorithms
[Figure: two example query dags over the nodes r, d, b, c, e.]
Partial Path Queries
• PQGen: produce path queries
• PathJoin: decompose into paths
• PartialMJ: decompose into spanning tree paths
• PartialPathStack: novel holistic
Partial Tree-Pattern Queries
• TPQGen: produce TPQs
• PartialPathJoin: decompose into PPs
• PartialTreeStack: novel holistic
Partial Tree-Pattern Queries: TPQGen
Producing all possible tree-pattern queries…
[Figure: a partial tree-pattern query over the nodes r, b, d, c, e and the tree-pattern queries generated from it.]
1. Produce all possible tree-pattern queries
2. Evaluate queries using existing algorithms
3. Keep all results