Exploiting Local Similarity for Indexing Paths in Graph-Structured Data
by Raghav Kaushik, Pradeep Shenoy, Philip Bohannon and Ehud Gudes
Abdullah Mueen
Outline
• No Outline
• No Confusing Syntax
• No Pseudocode
• Examples
• Results
XML as Data Graph
• [Figure: XML document drawn as a data graph; callouts mark oid, label(3) and value(13)]
• Non-tree edges: model IDREF relationships in the document
Some Notations
• node path:
  • 1.2.3.7.14
• label path:
  • ROOT.metro.cultural.museum.name
• 1.2.3.7 matches ROOT.metro.cultural.museum
• 2.3.7 does not match metro.cultural.museum.name
• 7 and 6 both match ROOT.metro.cultural.museum
• k-path:
  • label path of length ≤ k
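The matching rule on this slide can be sketched in a few lines of Python. The label table holds only the five nodes of the example node path, as given on the slide; the function names are hypothetical and chosen for illustration.

```python
# Labels of the nodes on the example node path 1.2.3.7.14 (from the slide).
LABEL = {1: "ROOT", 2: "metro", 3: "cultural", 7: "museum", 14: "name"}

def label_path(node_path):
    """Translate a dotted node path such as '1.2.3.7' into its label path."""
    return ".".join(LABEL[int(oid)] for oid in node_path.split("."))

def node_path_matches(node_path, wanted_label_path):
    """A node path matches a label path when the labels agree position by position."""
    return label_path(node_path) == wanted_label_path

print(label_path("1.2.3.7.14"))
print(node_path_matches("1.2.3.7", "ROOT.metro.cultural.museum"))
print(node_path_matches("2.3.7", "metro.cultural.museum.name"))
```

A node (rather than a node path) matches a label path when at least one node path ending in that node matches it, which is how nodes 7 and 6 can both match the same label path.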
Path Expression
• Operators (figure callouts): label sequencing (.), matches any label (-), alternation (|), repetition (*), optional (?)
• ROOT.metro.cultural.museum
  • 6, 7
• ROOT.(-.-.-).name
  • 12, 14, 16, 19, 22, 24
• ROOT.-*.hotel
  • All hotel nodes
• ROOT.metro.neighborhoods.neighborhood.(-|-.-)?.(hotel|museum).name
  • 12, 14, 16, 19
• XPath and other query languages that use path expressions
  • http://saxon.sourceforge.net/saxon6.5.3/expressions.html
  • http://www.w3.org/1999/09/ql/docs/xquery.html
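One way to experiment with these path expressions is to translate them into ordinary regular expressions over label paths. The sketch below is an illustration only, not the evaluation method of the paper: the helper names `to_regex` and `matches` are hypothetical, labels are assumed to be plain identifiers, and unrecognized characters in an expression are silently skipped.

```python
import re

# A token is either a label (identifier) or one of the operator characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|[-.|()*?]")

def to_regex(path_expr):
    """Translate a path expression into a regex over '/'-terminated labels."""
    parts = []
    for tok in TOKEN.findall(path_expr.replace(" ", "")):
        if tok == ".":
            continue                              # sequencing is concatenation
        elif tok == "-":
            parts.append(r"(?:[^/]+/)")           # any single label
        elif tok in "|()*?":
            parts.append(tok)                     # same meaning as in regexes
        else:
            parts.append("(?:" + re.escape(tok) + "/)")
    return re.compile("".join(parts))

def matches(path_expr, labels):
    """True if the label path (a list of labels) matches the path expression."""
    encoded = "".join(label + "/" for label in labels)
    return to_regex(path_expr).fullmatch(encoded) is not None

print(matches("ROOT.-*.hotel", ["ROOT", "metro", "hotel"]))
print(matches("ROOT.(-.-.-).name", ["ROOT", "metro", "cultural", "museum", "name"]))
```

Each label is encoded with a trailing "/" so that repetition and optional groups never have to deal with a dangling "." separator.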
The Problem
• Given a graph G and a path expression P, which nodes of G match P?
• A possible solution is to evaluate the path expression query directly on the data graph.
• But the data graph can be too large to fit in main memory, and too large to search completely even if it fits.
Indexing Data Graph
• No Schema
• No Keys
• Only structural information is available, and it can be summarized by a smaller graph I(G). This summary graph serves as an index for the whole data graph.
Indexing Data Graph: Example (1)
• Precise index, e.g. DataGuide, 1-index
• [Figure: data graph G (nodes 0 to 9) beside index graph I(G) (nodes 11 to 18)]
• Extents of the index nodes: {1}, {2,4}, {3}, {5,7}, {6}, {8,9}
• Extent: ext(17) = {5,7}, ext(13) = {2,4}
• On I(G): R.A.-*.C = {5,7}, R.-.B = {4,2}
• On G: R.A.-*.C = {5,7}, R.-.B = {4,2}
Indexing Data Graph: Example (2)
• Safe index
• [Figure: data graph G (nodes 0 to 9) beside index graph I(G) (nodes 11 to 15)]
• Extents of the index nodes: {1}, {2,4}, {3,5,7}, {6,8,9}
• On I(G): R.A.-*.C = {3,5,7}, R.-.-*.B = {2,4}
• On G: R.A.-*.C = {5,7}, R.-.-*.B = {4}
Indexing Data Graph: Example (3)
• Unsafe index
• [Figure: data graph G (nodes 0 to 9) beside index graph I(G) (nodes 11 to 15)]
• Extents of the index nodes: {1}, {2,4}, {3,5,7}, {6,8,9}
• Query results shown, in slide order:
  • R.A.-*.C = {5,7}, R.-.-*.B = {2}
  • R.A.-*.C = {3,5,7}, R.-.-*.B = { }
Bisimilarity
• Two nodes u and v are called bisimilar (u ≈b v) if label(u) = label(v) and every incoming label path from ROOT to u matches at least one incoming label path from ROOT to v, and vice versa.
• [Figure: data graph G with nodes 0 to 9]
• 2,4 are bisimilar
• 5,7 are bisimilar
• 8,9 are bisimilar
• 6,8 are not bisimilar
• ≈b is an equivalence relation, so it partitions the nodes of G into equivalence classes
• Finding the partition takes O(m log n) time
• On G: R.A.-*.C = {5,7}, R.-.B = {4,2}
Equivalence classes of ≈b → the 1-index
• [Figure: data graph G (nodes 0 to 9) beside index graph I(G) (nodes 11 to 18)]
• Extents of the index nodes: {1}, {2,4}, {3}, {5,7}, {6}, {8,9}
• On I(G): R.A.-*.C = {5,7}, R.-.B = {4,2}
• On G: R.A.-*.C = {5,7}, R.-.B = {4,2}
Revisiting Bisimilarity
• The size of the 1-index is bounded above by the size (number of nodes) of the data graph
• For large real-world documents it is almost 45% of the size of the data graph
• Bisimilarity partitions nodes by considering all incoming paths from ROOT, which is a global comparison between nodes.
k-bisimilarity
• Two nodes u and v are called k-bisimilar (u ≈k v) if label(u) = label(v) and every incoming label path of length ≤ k to u matches at least one incoming label path of length ≤ k to v, and vice versa.
• [Figure: data graph G with nodes 0 to 9]
• ≈k is an equivalence relation, so it partitions the nodes of G into equivalence classes
• The algorithm for computing k-bisimulation will be shown later
• 2,4 are 0-bisimilar
• 5,7 are 1-bisimilar
• 8,9 are 2-bisimilar
• 6,8 are 1-bisimilar
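The definition on this slide can be checked directly by comparing sets of incoming label paths. In the sketch below the node labels follow the label grouping shown in the deck, but the edge list is an assumption: the figure's edges did not survive in the transcript, so they were reconstructed to agree with the partitions the slides state. Path length is counted in edges.

```python
LABEL = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
         5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}

# ASSUMPTION: edges reconstructed to agree with the stated partitions;
# they are not given in the transcript.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (7, 9)]

PARENTS = {node: [] for node in LABEL}
for parent, child in EDGES:
    PARENTS[child].append(parent)

def incoming_label_paths(node, k):
    """All label paths with at most k edges that end at `node`."""
    paths = {(LABEL[node],)}
    if k > 0:
        for parent in PARENTS[node]:
            for prefix in incoming_label_paths(parent, k - 1):
                paths.add(prefix + (LABEL[node],))
    return paths

def k_bisimilar(u, v, k):
    """Slide definition: same label and the same incoming label paths up to length k."""
    return incoming_label_paths(u, k) == incoming_label_paths(v, k)

for u, v, k in [(2, 4, 0), (5, 7, 1), (8, 9, 2), (6, 8, 1)]:
    print(u, v, k, k_bisimilar(u, v, k))
```

Raising k by one for any of these pairs makes the check fail on the assumed graph, which is why each pair sits in one block of A(k) but is split in A(k+1).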
Equivalence classes of ≈0 → A(0) index
• [Figure: data graph G (nodes 0 to 9) beside index graph A(0) (nodes 11 to 15)]
• Extents of the index nodes: {1}, {2,4}, {3,5,7}, {6,8,9}
• Label grouping / label partition
Equivalence classes of ≈1 → A(1) index
• [Figure: data graph G (nodes 0 to 9) beside index graph A(1) (nodes 11 to 17)]
• Extents of the index nodes: {1}, {2}, {3}, {4}, {5,7}, {6,8,9}
A(k) index family
• [Figure: data graph G beside the index graphs A(0), A(1), A(2) and A(3)]
• A(0): {1}, {2,4}, {3,5,7}, {6,8,9}
• A(1): {1}, {2}, {3}, {4}, {5,7}, {6,8,9}
• A(2): {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8,9}
• A(3) = 1-index: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}
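Given a partition of the data nodes, the index graph can be built by making one index node per block and connecting two blocks whenever some data edge runs between their members. The sketch below does this for the four partitions listed on the slide, with the root added as its own block. The edge list is an assumption reconstructed to agree with those partitions, and `build_index` is a hypothetical helper name.

```python
# ASSUMPTION: edges reconstructed to agree with the partitions on the slide.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (7, 9)]

# Partitions from the slide, with the root (node 0) added as its own block.
A0 = [{0}, {1}, {2, 4}, {3, 5, 7}, {6, 8, 9}]
A1 = [{0}, {1}, {2}, {3}, {4}, {5, 7}, {6, 8, 9}]
A2 = [{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}]
A3 = [{n} for n in range(10)]

def build_index(partition, edges):
    """Return (extents, index_edges) for the index graph induced by `partition`."""
    block_of = {node: i for i, block in enumerate(partition) for node in block}
    # An index edge exists when any data edge connects members of the two blocks.
    index_edges = {(block_of[u], block_of[v]) for u, v in edges}
    return list(partition), index_edges

for name, partition in [("A(0)", A0), ("A(1)", A1), ("A(2)", A2), ("A(3)", A3)]:
    extents, index_edges = build_index(partition, EDGES)
    print(name, len(extents), "index nodes,", len(index_edges), "index edges")
```

The index grows with k until it stops changing; on this small example that happens at k = 3, where every block is a single node.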
Properties of A(k) index
• [Figure: data graph G (nodes 0 to 9) beside index graph A(1) with extents {1}, {2}, {3}, {4}, {5,7}, {6,8,9}]
How to compute A(1) index
• [Figure: data graph G with nodes 0 to 9]
• Label partition: {1} {2,4} {3,5,7} {6,8,9}
• Lookup (stays fixed during the round): {1} {2,4} {3,5,7} {6,8,9}
• Refining (successive states):
  • {1} {2} {4} {3,5,7} {6,8,9}
  • {1} {2} {4} {3} {5,7} {6,8,9}
• 1-bisimilar partition: {1} {2} {4} {3} {5,7} {6,8,9}
How to compute A(2) index
• [Figure: data graph G with nodes 0 to 9]
• 1-bisimilar partition: {1} {2} {4} {3} {5,7} {6,8,9}
• Lookup (stays fixed during the round): {1} {2} {4} {3} {5,7} {6,8,9}
• Refining (successive states):
  • {1} {2} {4} {3} {5} {7} {6,8,9}
  • {1} {2} {4} {3} {5} {7} {6} {8,9}
• 2-bisimilar partition: {1} {2} {4} {3} {5} {7} {6} {8,9}
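The two "how to compute" slides suggest a round-based refinement: keep the previous partition for lookups and split each block by the blocks of its members' parents. The following is a minimal sketch of that idea, not necessarily the exact algorithm of the paper. The edge list is an assumption reconstructed to agree with the partitions on the slides, and the function names are hypothetical.

```python
LABEL = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
         5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}

# ASSUMPTION: edges reconstructed to agree with the partitions on the slides.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (7, 9)]

def refine(block_of, parents):
    """One round: `block_of` is the lookup copy, the returned dict is the refined one."""
    return {
        node: (block_of[node], frozenset(block_of[p] for p in parents[node]))
        for node in block_of
    }

def k_bisimulation(label, edges, k):
    """Partition the nodes into k-bisimilar blocks using k refinement rounds."""
    parents = {node: [] for node in label}
    for u, v in edges:
        parents[v].append(u)
    block_of = dict(label)            # round 0: the label partition
    for _ in range(k):
        block_of = refine(block_of, parents)
    blocks = {}
    for node, key in block_of.items():
        blocks.setdefault(key, set()).add(node)
    return sorted(sorted(block) for block in blocks.values())

for k in range(4):
    print(k, k_bisimulation(LABEL, EDGES, k))
```

Each round costs time proportional to the number of edges, so building A(k) this way takes roughly k passes over the graph; the root node 0 appears as its own block in the output.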
Query Evaluation: Forward or Backward
• [Figure: index graph A(1) with extents {1}, {2}, {3}, {4}, {5,7}, {6,8,9}; query labels R, A, -, C]
• R.A.-*.C = {5,7}
• Repeated state is prevented
• O(|A|*m)
• Backward evaluation using label-group
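Forward evaluation can be pictured as walking the index graph and a query automaton in lockstep, never revisiting an (index node, automaton state) pair. The sketch below hard-codes a small automaton for R.A.-*.C and an A(1) index whose edges and node numbering are assumptions: they follow from a data graph reconstructed to agree with the partitions in this deck, not from the figure itself.

```python
# ASSUMPTION: index node ids, labels and edges are reconstructed, not read
# from the figure. Extents are the A(1) extents, plus the root.
INDEX_LABEL = {11: "R", 12: "A", 13: "B", 14: "C", 15: "B", 16: "C", 17: "D"}
EXTENT = {11: {0}, 12: {1}, 13: {2}, 14: {3}, 15: {4}, 16: {5, 7}, 17: {6, 8, 9}}
INDEX_CHILDREN = {11: [12, 13, 14], 12: [15], 13: [16], 14: [17], 15: [16], 16: [17]}

# Hand-written automaton for R.A.-*.C; a label of None means "any label".
NFA = {0: [("R", 1)], 1: [("A", 2)], 2: [(None, 2), ("C", 3)]}
START, ACCEPT = 0, 3

def evaluate_forward(root=11):
    """Union of the extents of index nodes reached in the accepting state."""
    result, seen = set(), set()
    stack = [(root, nxt) for lab, nxt in NFA[START]
             if lab in (None, INDEX_LABEL[root])]
    while stack:
        node, state = stack.pop()
        if (node, state) in seen:     # repeated state is prevented
            continue
        seen.add((node, state))
        if state == ACCEPT:
            result |= EXTENT[node]
        for child in INDEX_CHILDREN.get(node, []):
            for lab, nxt in NFA.get(state, []):
                if lab is None or lab == INDEX_LABEL[child]:
                    stack.append((child, nxt))
    return result

print(evaluate_forward())
```

Because each (index node, state) pair is expanded at most once, the work is bounded by the number of automaton states times the number of index edges, in line with the O(|A|*m) bullet.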
Query Evaluation: Validation
• [Figure: index graph A(1) with extents {1}, {2}, {3}, {4}, {5,7}, {6,8,9}; query labels R, A, B, C, D]
• R.A.B.C.D = {6,8,9}
• Repeated state is prevented
• O(|A|*m)
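Validation re-checks each candidate returned by the index against the data graph itself. A minimal sketch for a plain label path, walking backward along parent edges, is given below. The edge list is an assumption reconstructed to agree with the partitions in this deck, so the validated answer holds only for that assumed graph; `validate` is a hypothetical helper name.

```python
LABEL = {0: "R", 1: "A", 2: "B", 3: "C", 4: "B",
         5: "C", 6: "D", 7: "C", 8: "D", 9: "D"}

# ASSUMPTION: edges reconstructed; they are not given in the transcript.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (7, 9)]

PARENTS = {node: [] for node in LABEL}
for parent, child in EDGES:
    PARENTS[child].append(parent)

def validate(node, labels):
    """True if some path in the data graph ending at `node` spells out `labels`."""
    if LABEL[node] != labels[-1]:
        return False
    if len(labels) == 1:
        return True
    return any(validate(parent, labels[:-1]) for parent in PARENTS[node])

candidates = {6, 8, 9}                      # answer of A(1) for R.A.B.C.D
query = ["R", "A", "B", "C", "D"]
validated = {node for node in candidates if validate(node, query)}
print(validated)
```

The query is longer than the paths A(1) distinguishes, so the index answer can contain false positives and the backward walk filters them out.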
Avoiding Validation
• [Figure: index graph A(1) with extents {1}, {2}, {3}, {4}, {5,7}, {6,8,9}]
• R.-*.C.D = {6,8,9}
• For queries like R.-*.p, we can safely avoid validation on A(k) if p is a k-path.
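The rule on this slide can be turned into a simple syntactic test on the query string. The sketch below assumes that path length is counted in edges, which is inferred from the slide treating C.D as a 1-path on A(1). It recognizes only the R.-*.p shape, so it is a conservative check, and `can_skip_validation` is a hypothetical helper name.

```python
import re

def can_skip_validation(query, k, root="R"):
    """True if `query` has the form root.-*.p with p a plain label path of <= k edges."""
    pattern = re.escape(root) + r"\.-\*\.(\w+(?:\.\w+)*)"
    match = re.fullmatch(pattern, query)
    if match is None:
        return False                      # other query shapes: validate to be safe
    labels = match.group(1).split(".")
    return len(labels) - 1 <= k           # ASSUMPTION: length counted in edges

print(can_skip_validation("R.-*.C.D", 1))
print(can_skip_validation("R.-*.B.C.D", 1))
print(can_skip_validation("R.A.B.C.D", 1))
```

Returning False does not mean the index answer is wrong, only that this rule alone does not guarantee it is exact.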
Results
Conclusion
• The A(k) index is smaller than precise indexes and has its own advantages, such as faster execution time with significant accuracy.
• Future presentations
  • Changes to the indexes under updates
  • Incorporating more complex queries