
Branch Code: A Labeling Scheme for Efficient Query Answering on Trees

Yanghua Xiao, Ji Hong, Wanyun Cui, Zhenying He, Wei Wang, Guodong Feng. April 2012.


Presentation Transcript


  1. Branch Code: A Labeling Scheme for Efficient Query Answering on Trees • Yanghua Xiao, Ji Hong, Wanyun Cui, Zhenying He, Wei Wang, Guodong Feng • April 2012

  2. Background • The tree is a widely used data model • XML data • File directories • Spanning trees in graphs • One typical task on tree data is querying structural relationships • PC: Parent/Child • AD: Ancestor/Descendant • SR: Sibling Relation • LCA: Lowest Common Ancestor
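As a point of reference for the four relationships listed above, here is a minimal baseline sketch that answers them by walking parent pointers. It is not the branch code scheme; the function names are illustrative, and the tree shape is taken from the Dewey labels on slide 6 (A is the root, B and C are its children, D and E are children of B, F is a child of E).

```python
# Baseline sketch: PC, AD, SR and LCA answered with parent pointers only.
# Tree shape follows the Dewey labels on slide 6; all names are illustrative.
parent = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "E"}


def ancestors(v):
    """Return the proper ancestors of v, nearest first."""
    out = []
    v = parent[v]
    while v is not None:
        out.append(v)
        v = parent[v]
    return out


def is_parent(u, v):
    """PC: u is the parent of v."""
    return parent[v] == u


def is_ancestor(u, v):
    """AD: u is a proper ancestor of v (costs O(depth) here)."""
    return u in ancestors(v)


def are_siblings(u, v):
    """SR: u and v are distinct nodes sharing the same parent."""
    return u != v and parent[u] is not None and parent[u] == parent[v]


def lca(u, v):
    """LCA: first node on v's root path that also lies on u's root path."""
    on_u_path = set([u] + ancestors(u))
    for w in [v] + ancestors(v):
        if w in on_u_path:
            return w
    return None
```

Every query except PC and SR costs time proportional to the depth of the tree in this baseline, which is the cost that labeling schemes aim to reduce.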

  3. Previous Labeling Schemes • Interval-based • A triple <start, end, level>, generated by pre-order/post-order traversal • Cannot support SR • Hard to compute LCA • Hard to update • Prefix-based • Dewey code and its variants • Storage-costly for deep trees • Hard to update • Prime-based (integer-based) • Uses primes to encode (X. Wu et al., ICDE’04) • Storage-costly
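To make the interval-based idea concrete, the sketch below assigns each node a <start, end, level> triple with a single entry/exit counter during a depth-first traversal. This is one common variant and an assumption here; the slide does not fix the exact numbering. The tree and the function names are illustrative.

```python
def interval_labels(children, root):
    """Assign (start, end, level) using one counter bumped on entry and exit."""
    labels = {}
    counter = 0

    def visit(v, level):
        nonlocal counter
        start = counter
        counter += 1
        for c in children.get(v, []):
            visit(c, level + 1)
        labels[v] = (start, counter, level)
        counter += 1

    visit(root, 0)
    return labels


def is_ancestor(labels, u, v):
    """AD: u's interval strictly contains v's interval."""
    su, eu, _ = labels[u]
    sv, ev, _ = labels[v]
    return su < sv and ev < eu


def is_parent(labels, u, v):
    """PC: containment plus a level difference of exactly one."""
    return is_ancestor(labels, u, v) and labels[v][2] == labels[u][2] + 1


# Hypothetical tree: A has children B and C, B has D and E, E has F.
children = {"A": ["B", "C"], "B": ["D", "E"], "E": ["F"]}
labels = interval_labels(children, "A")
```

AD and PC reduce to integer comparisons, but inserting a node shifts the counters of every node visited later, which illustrates why the slide calls this family hard to update.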

  4. Our Labeling Scheme: Branch Codes • Supports various queries efficiently • PC, AD in constant time • LCA in O(d), where d is the depth of the tree • Space efficient • Exact labeling costs O(Nd) space, but in most cases uses less space than other labelings • Approximate labeling allows us to trade accuracy for space • Supports updates on trees • Amortized O(log N) modification cost using a splay tree

  5. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Method • Experimental Evaluation • Conclusion

  6. Basic Idea • Prefix-based: A: *, B: *.0, C: *.1, D: *.0.0, E: *.0.1, F: *.0.1.0 • Prime-based: A: 2, B: 3 × A, C: 5 × A, D: 7 × B, E: 11 × B, F: 13 × E • Our Idea
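The two labelings on this slide can be checked mechanically. The sketch below encodes the Dewey labels as tuples (the leading * is the empty tuple) and builds the prime labels exactly as written on the slide; the helper names are illustrative. It tests ancestor/descendant by prefix comparison and by divisibility respectively.

```python
# Prefix-based (Dewey) labels from the slide; "*" is the empty tuple.
dewey = {"A": (), "B": (0,), "C": (1,), "D": (0, 0), "E": (0, 1), "F": (0, 1, 0)}


def dewey_is_ancestor(u, v):
    """u is a proper ancestor of v iff u's label is a proper prefix of v's."""
    a, b = dewey[u], dewey[v]
    return len(a) < len(b) and b[:len(a)] == a


# Prime-based labels from the slide: own prime times the parent's label.
prime = {"A": 2}
prime["B"] = 3 * prime["A"]
prime["C"] = 5 * prime["A"]
prime["D"] = 7 * prime["B"]
prime["E"] = 11 * prime["B"]
prime["F"] = 13 * prime["E"]


def prime_is_ancestor(u, v):
    """u is a proper ancestor of v iff u's label divides v's label."""
    return u != v and prime[v] % prime[u] == 0


# Both schemes should agree on every ordered pair of nodes.
agree = all(
    dewey_is_ancestor(u, v) == prime_is_ancestor(u, v)
    for u in dewey
    for v in dewey
)
```

Even on this six-node tree the label of F is already 858, which hints at the storage cost attributed to prime-based labels on the earlier slide.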

  7. Representation of Numbers • Simple Radix • Decimal (base 10): 123, 78, 23472, … • Binary (base 2): 0, 1, 101, 1010, 1101, … • Complex Radix • Digit Vector: D = <d0, d1, d2, …, dn> • Radix Vector: R = <r0, r1, r2, …, rn> • S(D, R) = [formula not captured in transcript]
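A digit vector paired with a radix vector can be read as a mixed-radix number. The sketch below assumes the convention S = d0 + r0 × (d1 + r1 × (d2 + …)), with the least-significant digit first; the slide's own formula is not preserved in this transcript, so both the convention and the function name `mixed_radix_value` are assumptions rather than the authors' definition.

```python
def mixed_radix_value(digits, radices):
    """Evaluate d0 + r0*(d1 + r1*(d2 + ...)), least-significant digit first.

    Assumed convention: digit d_i must satisfy 0 <= d_i < r_i.
    """
    if len(digits) != len(radices):
        raise ValueError("digits and radices must have the same length")
    value = 0
    # Horner-style evaluation, starting from the most significant digit.
    for d, r in zip(reversed(digits), reversed(radices)):
        if not 0 <= d < r:
            raise ValueError("digit out of range for its radix")
        value = value * r + d
    return value


# Simple radix is the special case where every radix is equal.
decimal_123 = mixed_radix_value([3, 2, 1], [10, 10, 10])
binary_101 = mixed_radix_value([1, 0, 1], [2, 2, 2])

# A familiar complex-radix number: 2 h 15 min 30 s expressed in seconds.
seconds = mixed_radix_value([30, 15, 2], [60, 60, 24])
```

Under this convention the top radix only bounds the top digit; it never multiplies anything.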

  8. Complex Radix • The complex-radix representation can be formalized recursively: [formula not captured in transcript] • Prefix form

  9. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Mechanism • Experimental Evaluation • Conclusion

  10. Definition of BranchCode • Definition I (B Code): The B code of a node v in an ordered tree T is a function defined as: [formula not captured in transcript], where d(v) is the depth of v and p(v) is the parent of v; x(v) is the degree of v and y(v) is the order (counted from 0) of v among its siblings.

  11. Example [3 , -] [3 , 1] [2 , 1] [- , 1] • R = <2, 3, 3> • D = <1,1,1> • b(n) = S(D, R) = 1 + 2 × (1 + 3 × 1) = 13

  12. Query Answering • 1. Ancestor/Descendant (AD) determination [condition not captured in transcript] • 2. Navigability • 3. Lowest Common Ancestor (LCA): stems from navigability • Sibling relationship [condition not captured in transcript]

  13. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Mechanism • Experimental Evaluation • Conclusion

  14. BranchCode for Dynamic Trees • S(D,R), where D = <d0,d1,d2,…dn> R = <r0,r1,r2,…rn> • S’(D’,R’), where D’ = <d0,d1,d2,…,di’,…dn> R’ = <r0,r1,r2,…,ri’,…,rn> • Delta = |S’ – S| • How to calculate Delta?

  15. Incremental Update of BranchCode • Lemma 6 (Effect on g function): If a new node is inserted as a child of node s, then for any node k in the subtree Ts other than the newly added node, the increment of g(k) satisfies the following equation:

  16. Incremental Update of BranchCode • Lemma 7 (Effect on h function (degree change)): For node s in a tree T, if its degree increases by one, then for any node k in the subtree Ts, the increment of h(k) caused by the degree change of s satisfies the following equation:

  17. Incremental Update of BranchCode • Lemma 8 (Effect on h function (order change)): If the order of a node s in tree T increases by one, then for each of its descendants k in Ts, the increment of k’s h function caused by the order change of s is:

  18. Incremental Update of BranchCode • Theorem 10 (Increment of BranchCode): If we insert a new node as a child of s, then for any node k in Ts except s, the increment of b(k) is given by:

  19. Example

  20. Affected Nodes after Update • When we insert (or delete) a child of a particular node, all its descendants are affected. • According to the mathematical proofs, in some bad cases O(n) nodes in expectation can be affected by a single insertion, where n is the size of the tree.

  21. Affected Nodes after Update (Cont’d) • Post-order traversal on trees. Seq = {2, 3, 6, 7, 4, 5, 1} • Two properties of the post-order sequence: • All descendants of a single node are consecutive in the post-order sequence. • All descendants of a set of consecutive siblings are consecutive in the post-order sequence. • Use a splay tree to maintain the sequence.
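The two contiguity properties can be checked on a small tree. The figure for this slide is not preserved, so the tree below is a hypothetical one chosen to reproduce the sequence {2, 3, 6, 7, 4, 5, 1}: node 1 is the root with children 2, 3, 4, 5, and node 4 has children 6 and 7. The function names are illustrative.

```python
# Hypothetical tree consistent with the post-order sequence on the slide.
children = {1: [2, 3, 4, 5], 4: [6, 7]}


def post_order(v):
    """Visit all children left to right, then the node itself."""
    seq = []
    for c in children.get(v, []):
        seq.extend(post_order(c))
    seq.append(v)
    return seq


def descendants(v):
    """All proper descendants of v."""
    out = []
    for c in children.get(v, []):
        out.append(c)
        out.extend(descendants(c))
    return out


def is_contiguous(seq, nodes):
    """True if the given distinct nodes occupy one unbroken run of positions."""
    pos = sorted(seq.index(n) for n in nodes)
    return not pos or pos[-1] - pos[0] + 1 == len(pos)


seq = post_order(1)

# Property 1: the descendants of a single node form one run.
single_ok = all(is_contiguous(seq, descendants(v)) for v in seq)

# Property 2: consecutive siblings 3 and 4, together with their subtrees.
block = [3, 4] + descendants(3) + descendants(4)
siblings_ok = is_contiguous(seq, block)

# Non-consecutive siblings 3 and 5 do not form a single run.
gap = is_contiguous(seq, [3, 5])
```

Because the affected nodes always map to one range of positions, an update can be expressed as a range operation on the sequence, which is the kind of operation a balanced sequence structure such as a splay tree supports.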

  22. Update and query based on splay tree • Update Based on Splay Tree

  23. Maintenance of Buffered Marks

  24. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Mechanism • Experimental Evaluation • Conclusion

  25. Compressed BranchCode • Definition of Compressed Code:

  26. Property of Compressed Code • Congruence: • CA Determination:

  27. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Mechanism • Experimental Evaluation • Conclusion

  28. Accuracy of Compressed Code

  29. Results on Real Data • Data sets:

  30. Results on Real Data (Cont’d)

  31. Results on Synthetic Data

  32. Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Mechanism • Experimental Evaluation • Conclusion

  33. Conclusions • We systematically explore the basic properties of branch codes and construct conditions for correctly determining the relationships of nodes in trees. • The compressed BranchCode reduces the storage cost to linear complexity. • We also design an incremental approach (with O(log N) amortized update cost and query cost) based on a splay tree to maintain branch codes on dynamic trees.

  34. Open Question • Compressed code: false positive (FP) answers • Multiple moduli reduce the possibility of FP • How can the possibility of FP be estimated theoretically for a given set of moduli?

  35. Thank you for your attention!

  36. Motivation of Problem • Why do you study this problem?

  37. Related works • How did people solve this problem in previous works? • Survey of any other related works • Problems that are similar to your work • Techniques that are used in your solution • Any other related works

  38. Problem definition • Formal definition • Property of proposed problem • Is this problem novel? • Difference between this problem and the related problems • Does this problem deserve our research efforts? • Challenges of this problem • Is this problem NP-hard? If so, give the proof

  39. Baseline Solution • What is the naive solution to this problem? • Why is this solution unacceptable? • Complexity • Scalability • Or any other issues

  40. Your solution • Basic idea of your solution • Example if exists • Algorithm framework of your solution

  41. Key technique of your solution • For each technique, give the following • Rationale of this technique • Procedure of the technique • Can we prove the efficiency or effectiveness of your solution? If so, give the proofs • Optimization of your technique when handling large data or dynamic data

  42. Planning of next step • What do you plan to do as the next step? • Checkpoint • Delivery
