Incremental Maintenance of XML Structural Indexes
Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
(1) Department of Computer Science, Duke University
(2) IBM T. J. Watson Research Center
Motivation
• XML has gained tremendous popularity in recent years
  • Used to represent many kinds of data
• Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  • IBM DB2, Oracle, Microsoft SQL Server
  • Tamino, Natix, X-Hive, …
Overview
[Figure: example XML data graph rooted at a paper node, with data nodes numbered 1 to 18, element labels paper, section, title, algorithm, exp, proof, about, and uses, and text values “intro”, “experiments”, “A(k)-index”, and “1-index”.]
Label Path Expressions
Example: /paper/section/algorithm
[Figure: the example data graph, annotated with the label path /paper/section/algorithm.]
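A label path expression such as /paper/section/algorithm can be evaluated directly on the data graph by walking one label at a time from the root. The following is a minimal sketch under stated assumptions: the function name `eval_label_path`, the dictionary-based graph encoding, and the toy graph are all hypothetical and are not taken from the talk.

```python
def eval_label_path(root, labels, children, path):
    """Return the set of nodes reached from `root` by following `path`.

    labels:   dict mapping node -> label
    children: dict mapping node -> list of child nodes
    path:     a label path such as "/paper/section/algorithm"
    """
    steps = [s for s in path.split("/") if s]
    # The first step must match the root's own label.
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step.
        frontier = {c for n in frontier
                    for c in children.get(n, ())
                    if labels[c] == step}
    return frontier


# Hypothetical toy graph (NOT the graph shown on the slide).
labels = {1: "paper", 2: "section", 3: "title",
          4: "algorithm", 5: "section", 6: "algorithm"}
children = {1: [2, 5], 2: [3, 4], 5: [6]}

result = eval_label_path(1, labels, children, "/paper/section/algorithm")
print(sorted(result))  # [4, 6]
```

Without an index, every step touches all children of the current frontier, which is the cost that a structural index is meant to reduce.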
Structural Indexes
• Why do we need them?
  • Speed up the evaluation of path expressions
  • Provide a structural summary of the data graph
• Structural indexes
  • DataGuide [Goldman & Widom 97]
  • 1-index [Milo & Suciu 99]
  • A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  • Integration of structural indexes and inverted lists [Kaushik et al. 04]
• Focus on maintenance
  • Has a major effect on index efficiency
  • Remains an overlooked issue
Outline
[Figure: the example data graph from the Overview slide.]
1-Index: Definition
• Constructed by using bisimilarity
• Definition based on stability
  • Partition data nodes into index nodes
  • A data node (dnode) v belongs to an index node (inode) I[v]
  • I[u] is an index parent of v if u is a parent of v
  • An inode is stable if all of its dnodes have the same index parents
  • In a 1-index, all inodes are stable
[Figure: dnodes u and v and their inodes I[u] and I[v].]
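The stability definition above suggests a simple way to build a 1-index from scratch: start from the partition by label and keep splitting inodes until every inode is stable. The sketch below is a naive refinement loop written for illustration only; it is not the Paige-Tarjan algorithm and not the split/merge algorithm of the talk. The names `one_index` and `extents`, the graph encoding, and the toy graph are assumptions.

```python
def one_index(labels, parents):
    """Return a dict node -> inode id for the coarsest stable partition.

    labels:  dict mapping node -> label
    parents: dict mapping node -> list of parent nodes
    """
    block = dict(labels)  # initial partition: one inode per label
    while True:
        # A node's signature is its current inode plus its set of index parents.
        sig = {v: (block[v],
                   frozenset(block[u] for u in parents.get(v, ())))
               for v in labels}
        ids = {}
        new_block = {v: ids.setdefault(sig[v], len(ids)) for v in labels}
        # Refinement only splits inodes, so an unchanged count means stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def extents(block):
    """Group nodes by inode and return the extents as sorted lists."""
    groups = {}
    for v, b in block.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy graph: R -> A, B; A -> C1, C2; B -> C3; Ci -> Di.
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
parents = {"A": ["R"], "B": ["R"],
           "C1": ["A"], "C2": ["A"], "C3": ["B"],
           "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}

index = one_index(labels, parents)
print(extents(index))
```

In this toy graph C1 and C2 share an inode because both have the single index parent I[A], while C3 is separated because its index parent is I[B]; the same split then propagates to the D nodes.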
1-Index: Example
[Figure: the example data graph and its 1-index, with the label path /paper/section/algorithm. The inodes of the 1-index have extents {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {7}, {11}, {12}, {15,16}, and {17,18}.]
1-Index: Quality
• Assigning bisimilar dnodes to different inodes
  • does not affect correctness,
  • but does affect efficiency
• The quality of an index:
  quality = (# inodes / # inodes in the minimum 1-index − 1) × 100%
• Ideal: quality = 0%
[Figure: 1-index of the example data graph, showing the section extent {2,4,8,13} alongside {2,4} and {8,13}.]
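The quality measure is a plain ratio, so a short worked example may help. The function name `index_quality` and the inode counts below are hypothetical and are not results from the talk.

```python
def index_quality(num_inodes, num_inodes_minimum):
    """Percentage by which an index exceeds the size of the minimum 1-index."""
    return (num_inodes / num_inodes_minimum - 1) * 100.0


# Hypothetical counts: a maintained index with 10 inodes
# versus a minimum 1-index with 8 inodes.
print(index_quality(10, 8))  # 25.0
print(index_quality(8, 8))   # 0.0 (the ideal case)
```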
Previous Results
• Construction
  • The PT algorithm [Paige & Tarjan 87], in time O(m log n)
  • m = # edges, n = # nodes
• Edge changes
  • The propagate algorithm [Kaushik et al. 02]
  • Quality of the 1-index after update
    • No guarantee on the quality of the resulting index
    • 3% to 5% after 500 edge insertions in experiments
• Subgraph addition
  • Index reconstruction
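Edge Insertion: An Example (1)
[Figure: three panels over nodes R, A, B, C1, C2, C3, D1, D2, D3.
Data Graph: every node shown individually.
1-Index: inodes {C1,C2}, {C3}, {D1,D2}, {D3}.
Split 1: inodes {C1}, {C2}, {C3}, {D1,D2}, {D3}.]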
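Edge Insertion: An Example (2)
[Figure: three panels continuing the example.
Split 2: inodes {C1}, {C2}, {C3}, {D1}, {D2}, {D3}.
Merge 1: inodes {C1}, {C2,C3}, {D1}, {D2}, {D3}.
Merge 2: inodes {C1}, {C2,C3}, {D1}, {D2,D3}.]
The result is indeed the minimum 1-index for the data graph after the update. Not a coincidence!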
Minimum & Minimal Indexes
• Minimum: with the smallest number of inodes
• Minimal: no two inodes can be merged
[Figure: three panels titled Data graph, Minimum 1-index, and Minimal 1-index, over nodes R, A1, A2, B1, B2; the index panels show the groupings {A1,A2} and {B1,B2}.]
Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal 1-index
• Lemma: For acyclic data graphs, there is a unique minimal 1-index
  • The minimum 1-index is always maintained
• For cyclic data graphs, there could be more than one minimal 1-index
  • One of them is maintained
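Minimality ("no two inodes can be merged") can be checked by brute force straight from the definitions: a partition is minimal if it is stable and merging any two inodes with the same label breaks stability. The sketch below is an illustration under stated assumptions, not the split/merge algorithm itself; the names `is_stable` and `is_minimal`, the graph encoding, and the toy graph are hypothetical, and every inode is assumed to contain dnodes of a single label.

```python
from itertools import combinations


def is_stable(block, parents):
    """True if all dnodes in each inode have the same set of index parents."""
    seen = {}
    for v, b in block.items():
        index_parents = frozenset(block[u] for u in parents.get(v, ()))
        if seen.setdefault(b, index_parents) != index_parents:
            return False
    return True


def is_minimal(block, labels, parents):
    """True if `block` is stable and no two same-label inodes can be merged."""
    if not is_stable(block, parents):
        return False
    # Assumption: all dnodes of an inode carry the same label.
    label_of = {b: labels[v] for v, b in block.items()}
    for a, b in combinations(sorted(label_of), 2):
        if label_of[a] != label_of[b]:
            continue
        # Try merging inode b into inode a and re-check stability.
        merged = {v: (a if x == b else x) for v, x in block.items()}
        if is_stable(merged, parents):
            return False
    return True


# Hypothetical toy graph: R -> A, B; A -> C1, C2; B -> C3.
labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"], "C3": ["B"]}

finest = {v: v for v in labels}  # one inode per dnode
coarse = {"R": 0, "A": 1, "B": 2, "C1": 3, "C2": 3, "C3": 4}

print(is_minimal(finest, labels, parents))  # False: C1 and C2 can be merged
print(is_minimal(coarse, labels, parents))  # True
```

The finest partition is a valid (stable) index but has poor quality, which is the situation that arises when an update algorithm only splits and never merges.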
Outline
[Figure: the example data graph from the Overview slide.]
A(k)-Index: Definition
• k-bisimilarity
• Definition based on stability
  • A(0)-index: partition by label
  • …
  • A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
• Only interested in paths of length ≤ k
• Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
• But no efficient maintenance algorithms are known!
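The level-by-level definition can be turned into a from-scratch construction: begin with the partition by label, A(0), and refine it k times, each time splitting inodes according to their index parents at the previous level. This is a minimal sketch for illustration, not a maintenance algorithm; the name `a_k_index`, the graph encoding, and the toy graph are assumptions.

```python
def a_k_index(labels, parents, k):
    """Return a dict node -> inode id for the A(k)-index partition.

    labels:  dict mapping node -> label
    parents: dict mapping node -> list of parent nodes
    """
    block = dict(labels)  # A(0): partition by label
    for _ in range(k):
        # Split each A(i-1) inode by the set of A(i-1) index parents.
        sig = {v: (block[v],
                   frozenset(block[u] for u in parents.get(v, ())))
               for v in labels}
        ids = {}
        block = {v: ids.setdefault(sig[v], len(ids)) for v in labels}
    return block


# Hypothetical toy graph: R -> A, B; A -> C1; B -> C2, C3; Ci -> Di.
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
parents = {"A": ["R"], "B": ["R"],
           "C1": ["A"], "C2": ["B"], "C3": ["B"],
           "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}

for k in range(3):
    index = a_k_index(labels, parents, k)
    print(k, len(set(index.values())))  # 0 5, then 1 6, then 2 7
```

In this toy graph the C nodes split at level 1 and the D nodes only at level 2, because distinguishing D1 from D2 requires looking two edges up.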
A(k)-index: Example
[Figure: four panels over nodes R, A, B, C1 to C6.
Data graph: every node shown individually.
A(2) (= 1-index): inodes {C1}, {C2,C3}, {C4}, {C5,C6}.
A(1): inodes {C1}, {C2,C3}, {C4,C5,C6}.
A(0): inodes {C1,C2,C3}, {C4,C5,C6}.]
Maintenance of the A(i)-index requires the information in the A(i-1)-index
A(k)-index: Refinement Tree
[Figure: four panels over nodes R, A, B, C1 to C6.
Data graph: every node shown individually.
A(2) (= 1-index): inodes {C1}, {C2,C3}, {C4}, {C5,C6}.
A(1): inodes {C1}, {C2,C3}, {C4,C5,C6}.
A(0): inodes {C1,C2,C3}, {C4,C5,C6}.]
A(k)-index: Refinement Tree
[Figure: four panels (Data graph, A(2), A(1), A(0)) in which the C inodes are drawn as nodes labeled C.]
• Reduce storage cost
• Reduce maintenance cost
0.5% to 13% additional storage
Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal A(k)-index
• Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic
  • Hence the minimum A(k)-index is always maintained
Outline
[Figure: the example data graph from the Overview slide.]
Experiments on Edge Changes
• Datasets
  • Real-life: IMDB (272,000 nodes)
  • Benchmark: XMark (198,000 nodes)
• Setup
  • First delete a portion of the existing ID-REF links
  • Then do random mixed insertions/deletions
• Compare with
  • 1-index: propagate (+ reconstruction)
  • A(k)-index: recompute affected portion (+ reconstruction)
Experiment Results: A(k)-index running times
Conclusions
• The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
  • Effective: quality guarantee on the resulting index
  • Efficient: the algorithms themselves are fast
Thank you!
Graphical Illustration
[Figure: diagram with the labels “size”, “valid 1-index”, “index”, “split”, and “merge”.]
The index can only grow in size due to splitting if merging is not enforced.