
Incremental Maintenance of XML Structural Indexes

Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1). (1) Department of Computer Science, Duke University; (2) IBM T. J. Watson Research Center.


Presentation Transcript


  1. Incremental Maintenance of XML Structural Indexes • Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1) • (1) Department of Computer Science, Duke University • (2) IBM T. J. Watson Research Center

  2. Motivation • XML has gained tremendously in popularity in recent years • Used to represent many kinds of data • Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval • IBM DB2, Oracle, Microsoft SQL Server • Tamino, Natix, X-Hive, …

  3. Overview [Figure: data graph of a paper, nodes numbered 1 to 18, with sections titled “intro”, “1-index”, “A(k)-index”, and “experiments”; other labels include title, algorithm, proof, exp, about, and uses]

  4. Label Path Expressions • Example: /paper/section/algorithm [Figure: the same data graph as on slide 3, nodes numbered 1 to 18]
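
As a rough illustration of how a label path expression such as /paper/section/algorithm is evaluated directly on a data graph, the following Python sketch walks the graph one label at a time. The function name `eval_label_path` and the small graph (`labels`, `edges`) are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

def eval_label_path(edges, labels, root, path):
    """Return the set of nodes reached from `root` by following `path`."""
    steps = [s for s in path.split("/") if s]
    children = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
    # The first step must match the label of the root itself.
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    for label in steps[1:]:
        # Keep only the children whose label matches the current step.
        frontier = {c for n in frontier for c in children[n] if labels[c] == label}
    return frontier

# Hypothetical toy data graph (not the graph from the slides).
labels = {1: "paper", 2: "section", 3: "title",
          4: "section", 5: "algorithm", 6: "algorithm"}
edges = [(1, 2), (1, 4), (2, 3), (2, 5), (4, 6)]

result = eval_label_path(edges, labels, 1, "/paper/section/algorithm")
```

A structural index lets the same walk run over the (usually much smaller) index graph instead of the data graph.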

  5. Structural Indexes • Why do we need them? • Speed up the evaluation of path expressions • Provide a structural summary of the data graph • Structural indexes • DataGuide [Goldman & Widom 97] • 1-index [Milo & Suciu 99] • A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04] • Integration of structural indexes and inverted lists [Kaushik et al. 04] • Focus on maintenance • Has a major effect on index efficiency • Remains an overlooked issue

  6. Outline [Figure: the same data graph as on slide 3]

  7. 1-Index: Definition • Constructed by using bisimilarity • Definition based on stability • Partition data nodes into index nodes • dnode (v) and inode (I[v]) • I[u] is v’s index parent if u is v’s parent • An inode is stable if all of its dnodes have the same index parents • In a 1-index, all inodes are stable [Figure: dnodes u and v with their inodes I[u] and I[v]]
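
The stability condition above can be checked mechanically. The sketch below is a minimal Python illustration under stated assumptions: the helper name `unstable_inodes` and the toy graph are hypothetical, and the check covers only the stability condition from the slide (it does not verify that the dnodes in an inode share a label).

```python
from collections import defaultdict

def unstable_inodes(partition, edges):
    """Return the inodes whose dnodes disagree on their index parents."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # Map every dnode to the position of its inode in the partition.
    inode_of = {v: i for i, block in enumerate(partition) for v in block}
    bad = []
    for block in partition:
        # One signature per dnode: the set of inodes holding its parents.
        signatures = {frozenset(inode_of[u] for u in parents[v]) for v in block}
        if len(signatures) > 1:
            bad.append(set(block))
    return bad

# Hypothetical toy data graph: C3 has parents A and B, C1 and C2 only A.
edges = [("R", "A"), ("R", "B"), ("A", "C1"), ("A", "C2"),
         ("A", "C3"), ("B", "C3")]
stable = [{"R"}, {"A"}, {"B"}, {"C1", "C2"}, {"C3"}]
too_coarse = [{"R"}, {"A"}, {"B"}, {"C1", "C2", "C3"}]
```

Here `stable` passes the check, while `too_coarse` fails because C3 has an index parent (the inode of B) that C1 and C2 lack.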

  8. 1-Index: Example [Figure: the data graph (nodes 1 to 18) next to its 1-index, with the path /paper/section/algorithm marked; the inodes include {2,4,8,13} (section), {3,5,9,14} (title), {6,10} (algorithm), {15,16} (exp), and {17,18}]

  9. 1-Index: Quality • Assigning bisimilar dnodes to different inodes • does not affect correctness, • but does affect efficiency • The quality of an index: quality = (# inodes / # inodes in the minimum 1-index − 1) × 100% • Ideal: quality = 0% [Figure: a 1-index in which the section inode {2,4,8,13} is split into {2,4} and {8,13}]
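
The quality measure is simple arithmetic on two inode counts. A small Python sketch follows; the function name `index_quality` and the counts in the usage lines are hypothetical and are not figures from the slides.

```python
def index_quality(num_inodes: int, num_minimum_inodes: int) -> float:
    """Relative excess of inodes over the minimum 1-index, in percent."""
    if num_minimum_inodes <= 0 or num_inodes < num_minimum_inodes:
        raise ValueError("an index cannot be smaller than the minimum 1-index")
    return (num_inodes / num_minimum_inodes - 1) * 100.0

# Hypothetical counts: one unnecessary split on top of 8 minimum inodes.
ideal = index_quality(8, 8)
one_extra_split = index_quality(9, 8)
```

With these counts, `ideal` is 0.0 and `one_extra_split` is 12.5, so a single redundant split in a small index already shows up clearly in the measure.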

  10. Previous Results • Construction • The PT algorithm [Paige & Tarjan 87], in time O(m log n) • m = # edges, n = # nodes • Edge changes • The propagate algorithm [Kaushik et al. 02] • Quality of the 1-index after update • No guarantee on the quality of the resulting index • 3 ~ 5% after 500 edge insertions in experiments • Subgraph addition • Index reconstruction
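
For intuition about what construction computes, here is a naive fixed-point refinement in Python. It is explicitly not the PT algorithm and does not achieve O(m log n); it only produces the same coarsest stable partition on small inputs. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def refine_once(block_of, parents):
    """Split each block by the set of blocks that its nodes' parents lie in."""
    ids = {}
    return {
        v: ids.setdefault((b, frozenset(block_of[u] for u in parents[v])), len(ids))
        for v, b in block_of.items()
    }

def minimum_1_index(labels, edges):
    """Map each dnode to an inode id of the coarsest stable partition."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # Start from the partition by label, then refine until nothing splits.
    ids = {}
    block_of = {v: ids.setdefault(lab, len(ids)) for v, lab in labels.items()}
    while True:
        refined = refine_once(block_of, parents)
        # Refinement only splits blocks, so an equal count means no change.
        if len(set(refined.values())) == len(set(block_of.values())):
            return refined
        block_of = refined

# Hypothetical toy data graph (not taken from the slides).
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
edges = [("R", "A"), ("R", "B"), ("A", "C1"), ("B", "C2"), ("B", "C3"),
         ("C1", "D1"), ("C2", "D2"), ("C3", "D3")]
inode = minimum_1_index(labels, edges)
```

On this graph C2 and C3 end up in one inode, C1 in another, and the D nodes split the same way one round later, for 7 inodes in total.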

  11. Edge Insertion: An Example (1) [Figure, three panels: the data graph (R; A, B; C1, C2, C3; D1, D2, D3); its 1-index with inodes {C1,C2}, {C3}, {D1,D2}, {D3}; and the index after Split 1, with inodes {C1}, {C2}, {C3}, {D1,D2}, {D3}]

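  12. Edge Insertion: An Example (2) [Figure, three panels: after Split 2, inodes {C1}, {C2}, {C3}, {D1}, {D2}, {D3}; after Merge 1, inodes {C1}, {C2,C3}, {D1}, {D2}, {D3}; after Merge 2, inodes {C1}, {C2,C3}, {D1}, {D2,D3}] • The result is indeed the minimum 1-index for the data graph after the update • Not a coincidence!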

  13. Minimum & Minimal Indexes • Minimum: with the smallest number of inodes • Minimal: no two inodes can be merged [Figure, three panels: a data graph (R; A1, A2; B1, B2), a minimum 1-index, and a minimal 1-index]

  14. Quality Guarantee • Theorem: The split/merge algorithm always maintains a minimal 1-index • Lemma: For acyclic data graphs, there is a unique minimal 1-index • The minimum 1-index is always maintained • For cyclic data graphs, there could be more than one minimal 1-index • One of them is maintained

  15. Outline [Figure: the same data graph as on slide 3]

  16. A(k)-Index: Definition • k-bisimilarity • Definition based on stability • A(0)-index: partition by label • … • A(k)-Index • An inode in A(k)-index is stable if all of its dnodes have the same index parents in A(k-1)-index • Only interested in paths of length ≤k • Shown to be much smaller and more efficient than 1-index [Kaushik et al. 02] • But, no efficient maintenance algorithms are known!
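
The layered definition above (A(0) by label, A(k) stable with respect to A(k-1)) can be written as k rounds of refinement. The Python sketch below recomputes the index from scratch, so it illustrates the definition only and is not the incremental split/merge maintenance algorithm of the talk. The function name `a_k_index` and the toy graph are hypothetical.

```python
from collections import defaultdict

def a_k_index(labels, edges, k):
    """Return the A(k) partition of the dnodes as a set of frozensets."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # A(0): partition by label.
    ids = {}
    block_of = {v: ids.setdefault(lab, len(ids)) for v, lab in labels.items()}
    for _ in range(k):
        # A(i): same A(i-1) block and same set of parent blocks in A(i-1).
        ids = {}
        block_of = {
            v: ids.setdefault((b, frozenset(block_of[u] for u in parents[v])), len(ids))
            for v, b in block_of.items()
        }
    groups = defaultdict(set)
    for v, b in block_of.items():
        groups[b].add(v)
    return {frozenset(g) for g in groups.values()}

# Hypothetical toy data graph (not taken from the slides).
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
edges = [("R", "A"), ("R", "B"), ("A", "C1"), ("B", "C2"), ("B", "C3"),
         ("C1", "D1"), ("C2", "D2"), ("C3", "D3")]
a0, a1, a2 = (a_k_index(labels, edges, k) for k in range(3))
```

On this graph the C nodes split into {C1} and {C2, C3} at k = 1, while the D nodes stay together until k = 2, which shows how each level depends on the level below it.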

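  17. A(k)-index: Example [Figure, four panels: a data graph (R; A, B; C1, C2, C3; C4, C5, C6) and its A(2)-index (= 1-index), A(1)-index, and A(0)-index; the lower nodes are grouped as {C1}, {C2,C3}, {C4}, {C5,C6} in A(2), as {C1}, {C2,C3}, {C4,C5,C6} in A(1), and as {C1,C2,C3}, {C4,C5,C6} in A(0)] • Maintenance of the A(i)-index requires the information in the A(i-1)-index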

  18. A(k)-index: Refinement Tree [Figure: the same data graph and A(2)-, A(1)-, and A(0)-indexes as on slide 17]

  19. A(k)-index: Refinement Tree [Figure: the data graph (R; A, B; C1, C2, C3; C4, C5, C6) with the A(2)-, A(1)-, and A(0)-indexes, whose inodes are drawn only as C] • Reduce storage cost • Reduce maintenance cost • 0.5% ~ 13% additional storage

  20. Quality Guarantee • Theorem: The split/merge algorithm always maintains a minimal A(k)-index • Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic • Hence the minimum A(k)-index is always maintained

  21. Outline [Figure: the same data graph as on slide 3]

  22. Experiments on Edge Changes • Datasets • Real-life: IMDB (272,000 nodes) • Benchmark: XMark (198,000 nodes) • Setup • First delete a portion of existing ID-REF links • Then do random mixed insertions/deletions • Compare with • 1-index: propagate (+ reconstruction) • A(k)-index: recompute affected portion (+ reconstruction)

  23. Experiment Results: 1-index

  24. Experiment Results: A(k)-index running times

  25. Conclusions • The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient • Effective: quality guarantee on the resulting index • Efficient: the algorithms themselves are fast • Thank you!

  26. Graphical Illustration [Figure: diagram with the labels size, valid 1-index, index, split, and merge] • The index can only grow in size due to splitting if merging is not enforced
