Evaluation of Partial Path Queries on XML Data
Stefanos Souldatos (NTUA, Greece), Xiaoying Wu (NJIT, USA), Dimitri Theodoratos (NJIT, USA), Theodore Dalamagas (NTUA, Greece), Timos Sellis (NTUA, Greece)
Evaluation of Partial Path Queries on XML Data: outline
• Partial path queries
• Query processing
• Query evaluation
• Experiments
• Conclusion
Difficulties on Querying XML Data
[Figure: XML tree of theHotel.gr. Node labels: Location, City, Island, Athens, Creta, Center, Poros, Chania, Heraklio.]
Search problems:
• Name: Xiaoying Wu. Place: Athens Center, Heraklio. Purpose: Sightseeing (Parthenon, 438 BC; Phaistos’ Disk, 1700 BC). Problem: structural difference.
• Name: Theodore Dalamagas. Place: Islands. Purpose: Sea sports (windsurf, jet ski). Problem: structural inconsistency.
• Name: Dimitri Theodoratos. Place: Heraklio. Purpose: HDMS Conference (HDMS 2008). Problem: unknown structure.
• Name: Stefanos Souldatos. Place: Any island. Purpose: Escape from PhD! Problem: multiple sources (theHotel.gr, hotels.gr, holidays.gr; 1400 islands).
Questions:
• Can we use existing query languages (XPath, XQuery) to express our queries?
• Can we use existing techniques to evaluate our queries?
Path Queries in XPath
Queries over the labels theHotel.gr, City and Island, from no structure to full structure:
• no structure (keywords): //theHotel.gr[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Island]]
• partial path query: //theHotel.gr//City[descendant-or-self::*[ancestor-or-self::Island]]
• full structure (path pattern): /theHotel.gr/City//Island
Partial Path Queries
[Figure: a partial path query over nodes labelled r, a, b, c, d. Legend: r is the root node (optional); a is a query node labelled by “a”; edges denote either a child relationship or a descendant relationship.]
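To make the semantics concrete, here is a minimal brute-force sketch of matching a partial path query against one root-to-leaf path of an XML tree. The function name `embeddings` and the edge-list representation are assumptions made for illustration, not the authors' data structures, and the sketch does not require distinct query nodes to map to distinct elements.

```python
from itertools import product

def embeddings(path_labels, query_labels, child_edges, desc_edges):
    """Yield every mapping of query nodes to positions on one root-to-leaf path.

    path_labels: element labels from the root down to a leaf.
    query_labels: dict mapping a query node name to its label.
    child_edges / desc_edges: lists of (x, y) pairs meaning x/y and x//y.
    """
    nodes = list(query_labels)
    # Candidate positions for each query node: elements carrying its label.
    candidates = [
        [i for i, lab in enumerate(path_labels) if lab == query_labels[q]]
        for q in nodes
    ]
    for combo in product(*candidates):
        pos = dict(zip(nodes, combo))
        child_ok = all(pos[y] == pos[x] + 1 for x, y in child_edges)
        desc_ok = all(pos[y] > pos[x] for x, y in desc_edges)
        if child_ok and desc_ok:
            yield pos

# Hypothetical query: City and Island both below theHotel.gr, in either order.
query = {"h": "theHotel.gr", "c": "City", "i": "Island"}
partial = [("h", "c"), ("h", "i")]
print(list(embeddings(["theHotel.gr", "Island", "City"], query, [], partial)))
```

Because no edge relates City and Island, the same query matches both the City-above-Island and the Island-above-City arrangement, which is the point of leaving the structure partial.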
Partial Path Queries
[Figure: partial path query → QUERY PROCESSING → partial path query in canonical form → QUERY EVALUATION.]
Query Processing
• Full form
• Satisfiability
• Redundant nodes
• Canonical form
Query Processing: Full form
INFERENCE RULES
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
x, y, z, w: query nodes; ai, bj: nodes labelled by a and b, respectively.
[Figure: the example query over nodes r, a, b, c, d, with IR1 and IR4 applied in successive frames.]
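As an illustration of how such rules can be applied to a fixpoint, the following sketch closes a set of edges under IR2 and IR3 only. The function name and the pair-based edge representation are assumptions; the remaining rules (IR1, IR4 to IR13) are deliberately left out, so this is not the complete full-form computation.

```python
def close_ir2_ir3(child, desc):
    """Return all descendant edges derivable from the input with IR2 and IR3.

    child: iterable of (x, y) pairs meaning x/y.
    desc:  iterable of (x, y) pairs meaning x//y.
    """
    # IR2: every child edge x/y also gives the descendant edge x//y.
    closed = set(desc) | set(child)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closed):
            for (y2, z) in list(closed):
                # IR3: x//y and y//z give x//z.
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed

print(sorted(close_ir2_ir3({("a", "b")}, {("b", "c")})))
```

The loop repeats until no new edge is added, which is the usual way to saturate a rule set; a production version would use an indexed worklist instead of rescanning every pair.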
Query Processing: Satisfiability
A query is unsatisfiable if its full form contains a trivial cycle.
[Figure: a trivial cycle, drawn on query nodes x and y.]
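The cycle test can be sketched as plain reachability. The slide's figure is not recoverable from the text, so this sketch assumes that a cycle means some query node reaches itself when child and descendant edges are both read as "is above"; the function name is hypothetical, and the full inference rules are not applied first.

```python
def is_unsatisfiable(edges):
    """Return True if some node can reach itself by following the edges.

    edges: iterable of (x, y) pairs, each meaning x is above y (x/y or x//y).
    """
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)

    def reaches(start, target, seen):
        # Depth-first search from start, looking for target.
        for nxt in succ.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return any(reaches(x, x, set()) for x in succ)

print(is_unsatisfiable([("x", "y"), ("y", "x")]))
print(is_unsatisfiable([("x", "y"), ("y", "z")]))
```

A node cannot be both above and below another node on the same path, which is why a cycle rules out any match.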
Query Processing: Redundant nodes
A node y is redundant if one of the following patterns occurs:
[Figure: four patterns a) to d) over query nodes x, y, z; the drawings are not recoverable from the extracted text.]
Query Processing: Canonical form
canonical form of a satisfiable query = full form – IR2 – IR3 – redundant nodes
The canonical form of a query is a directed acyclic graph (DAG).
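One reading of "full form – IR2 – IR3" is that edges derivable by IR2 or IR3 from other edges are dropped. The sketch below implements that reading only; it is an assumption about the slide's shorthand, the function name is hypothetical, and redundant-node removal is not covered.

```python
def drop_derivable(child, desc):
    """Drop descendant edges that IR2 or IR3 would re-derive from other edges.

    child, desc: iterables of (x, y) pairs meaning x/y and x//y.
    Returns (child_edges, kept_descendant_edges). Assumes the graph is acyclic.
    """
    child, desc = set(child), set(desc)
    desc -= child                      # x//y follows from x/y (IR2)
    all_edges = child | desc

    def longer_path(x, z):
        # Is there a path from x to z that has at least two edges?
        stack = [y for (a, y) in all_edges if a == x and y != z]
        seen = set(stack)
        while stack:
            n = stack.pop()
            for (a, b) in all_edges:
                if a == n:
                    if b == z:
                        return True
                    if b not in seen:
                        seen.add(b)
                        stack.append(b)
        return False

    # x//z follows by transitivity (IR2 + IR3) when a longer path exists.
    kept = {(x, z) for (x, z) in desc if not longer_path(x, z)}
    return child, kept

print(drop_derivable({("a", "b")}, {("a", "b"), ("b", "c"), ("a", "c")}))
```

On an acyclic graph this amounts to a transitive reduction of the descendant edges, which keeps the query small without changing what it matches.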
Evaluation Algorithms
• Based on PathStack [Bruno et al. ’02]
  • Produce all possible path queries…
  • Decompose into root-to-leaf paths…
  • PartialMJ: Decompose a spanning tree into paths…
• Extending PathStack [Bruno et al. ’02]
  • PartialPathStack: Produce a topological order of the query nodes and extend PathStack to handle it…
Based on PathStack: 1. Producing all possible path queries…
[Figure: a partial path query over nodes r, a, b, c, d, e, f, g, followed by the path queries generated from it, each arranging the same nodes in a different order.]
Problems:
• too many queries to evaluate
• multiple traversals of the XML tree
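The "too many queries" problem can be illustrated by enumerating every linear order of the query nodes that respects the query's edges. This is a simplified sketch: it treats each edge only as an ordering constraint, so the adjacency that child edges demand is ignored, and `linear_orders` is a hypothetical name.

```python
def linear_orders(nodes, before):
    """Yield every ordering of nodes consistent with the (x, y) pairs in before.

    A pair (x, y) means x must appear somewhere before y.
    """
    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for n in sorted(remaining):
            # n may come next only if all its predecessors are already placed.
            if all(x in prefix for (x, y) in before if y == n):
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))

# Three mutually unrelated nodes under r already give 3! = 6 path queries.
orders = list(linear_orders("rabc", {("r", "a"), ("r", "b"), ("r", "c")}))
print(len(orders))
```

With k mutually unrelated nodes the count grows as k!, and each generated query needs its own pass over the data.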
Based on PathStack: 2. Decomposing into root-to-leaf paths…
[Figure: the query over nodes r, a, b, c, d, e, f, g decomposed into root-to-leaf paths, each evaluated with PathStack.]
Problems:
• path overlaps
• more than one component to evaluate
• intermediate results
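A decomposition of a query DAG into root-to-leaf paths can be sketched as below. The example edges are hypothetical (the figure's edges are not recoverable), and the function name is an assumption.

```python
def root_to_leaf_paths(root, edges):
    """Return every path from root to a node without outgoing edges."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)

    def walk(node, path):
        children = succ.get(node, [])
        if not children:
            yield path
        for c in children:
            yield from walk(c, path + [c])

    return list(walk(root, [root]))

# Hypothetical query DAG: b and c both sit between a and d.
edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
for p in root_to_leaf_paths("r", edges):
    print(p)
```

Both resulting paths share r, a and d, so the shared nodes are matched once per path and the per-path results then have to be combined, which is the overlap problem named on the slide.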
Based on PathStack: PartialMJ. Using a spanning tree…
[Figure: the query over nodes r, a, b, c, d, e, f, g, its spanning tree, and the paths obtained from the tree, shown over several frames.]
• Remove edges to create a spanning tree.
• Decompose the spanning tree into paths and evaluate each path with PathStack.
• Combine the path results using join conditions (identity, structural, path).
Problems:
• path overlaps
• more than one component to evaluate
• intermediate results
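The first step, removing edges to obtain a spanning tree, can be sketched with a breadth-first traversal. The slides do not say which edges PartialMJ removes, so the choice made here (keep the first edge that reaches each node) is an arbitrary assumption, as are the function name and the example edges.

```python
from collections import deque

def spanning_tree(root, edges):
    """Split edges into a spanning tree rooted at root and the removed edges."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, []).append(y)
    tree, seen, queue = [], {root}, deque([root])
    while queue:
        x = queue.popleft()
        for y in succ.get(x, []):
            if y not in seen:
                # First edge reaching y is kept in the tree.
                seen.add(y)
                tree.append((x, y))
                queue.append(y)
    # Every other edge is removed and must be re-checked afterwards.
    removed = [e for e in edges if e not in tree]
    return tree, removed

edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
tree, removed = spanning_tree("r", edges)
print(tree)
print(removed)
```

Each removed edge corresponds to a condition that the tree alone no longer enforces, which is presumably where the join conditions on the slide come in.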
Extending PathStack: PartialPathStack. Employ a topological order…
[Figure: the query over nodes r, a, b, c, d, e, f, g with its nodes arranged in a topological order, which is passed to PartialPathStack.]
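A topological order of the query nodes can be produced with Kahn's algorithm, sketched below. This covers only the ordering step, not the stack handling of PartialPathStack; the function name and the example edges are assumptions.

```python
from collections import deque

def topological_order(nodes, edges):
    """Return the nodes so that x precedes y for every edge (x, y)."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for x, y in edges:
        succ[x].append(y)
        indeg[y] += 1
    # Start with the nodes that have no incoming edge.
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle")
    return order

edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(topological_order(["r", "a", "b", "c", "d"], edges))
```

A topological order exists exactly when the graph is acyclic, which matches the statement that the canonical form of a satisfiable query is a DAG.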
PartialPathStack Example
[Figure: animation over 12 frames. Query nodes r, a, d, b, c, e, each with its own stack (Sr, Sa, Sb, Sd, Sc, Se); the sink nodes of the query are marked. XML tree with nodes r, a1, b1, d1, c1, e1, d2, c2, e2.]
The frames add r, a1, b1, d1, c1 and e1 one at a time. Once e1 has been added, the frames show OUTPUT!!! and the result ra1b1d1c1e1 is listed under "results".