Full Disjunctions: Polynomial-Delay Iterators in Action

This paper presents novel algorithms and optimizations for computing full disjunctions in relational databases. The full disjunction extends the natural join to combine data from multiple relations, and the paper introduces efficient methods for evaluating it.

Presentation Transcript


  1. Full Disjunctions: Polynomial-Delay Iterators in Action • Sara Cohen (Technion, Israel), Itzhak Fadida (Technion, Israel), Yaron Kanza (University of Toronto, Canada), Benny Kimelfeld (Hebrew University, Israel), Yehoshua Sagiv (Hebrew University, Israel) • VLDB 2006, Seoul, Korea

  2. Computing Full Disjunctions • The full disjunction is a relational operator that maximally combines data from several relations • It extends the natural join by allowing incompleteness • It extends the binary outerjoin to many relations • This paper presents algorithms and optimizations for computing full disjunctions • Theoretically, full disjunctions are more tractable than previously known • Practically, a significant improvement over the state of the art: an iterator-like evaluation

  3. Contents • Full Disjunctions • Complexity • Contributions • Algorithms • Algorithm NLOJ for Tree-Structured Schemes • Algorithm PDelayFD for General Schemes • Algorithm BiComNLOJ − Main Algorithm • Experimental Results • Conclusion

  5. The Natural Join Operator • Example relations: Climates, Accommodations, Sites • Result: Climates ⋈ Accommodations ⋈ Sites

  6. The Natural Join Misses Information • Bahamas is not in Sites, so the natural join Climates ⋈ Accommodations ⋈ Sites misses it

  7. The Natural Join Misses Information • Empty space means null value • Bahamas is not in Sites, so the natural join misses it • Mount Logan is not in a city, hence missed

  8. The Natural Join Misses Information • A looser notion of join is needed: one that enables joining tuples from some of the tables

  9. The Natural Join Operator • A tuple of the join Climates ⋈ Accommodations ⋈ Sites corresponds to a set of tuples from the source relations • The set is join-consistent • The set is connected (no Cartesian product) • The set is complete (one tuple from each relation)

  10. Join-Consistent Sets of Tuples • A set T of tuples is join-consistent if every two tuples of T are join-consistent • Two tuples t1 and t2 are join-consistent if for every common attribute A: 1. t1[A] and t2[A] are non-null 2. t1[A] = t2[A]
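
The two conditions above translate almost directly into code. The following is a minimal sketch, assuming tuples are modeled as Python dicts from attribute name to value and that `None` stands for a null; the function names and the sample rows are hypothetical and not taken from the slides.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """Pairwise test: the tuples agree, with non-null values, on every common attribute."""
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t2[a] is not None and t1[a] == t2[a]
               for a in common)

def set_join_consistent(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

# Hypothetical rows; None plays the role of a null value.
climate = {"Country": "Canada", "Climate": "diverse"}
hotel = {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"}
site = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(climate, hotel))              # True: they agree on Country
print(join_consistent(hotel, site))                 # False: City is null in `site`
print(set_join_consistent([climate, hotel, site]))  # False
```

Note that a null in a common attribute blocks the pairing even though the other common attribute (Country) matches.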

  11. Connected Sets of Tuples • A set of tuples is connected if its join graph is connected • The join graph of a set T of tuples: the nodes are the tuples of T, and there is an edge between every two tuples with a common attribute
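
As a rough illustration of the connectivity test, the sketch below walks the join graph with a depth-first search. It assumes tuples are dicts keyed by attribute name, so "having a common attribute" means sharing a key; `is_connected` and the sample rows are hypothetical names chosen for this example.

```python
def is_connected(tuples):
    """True if the join graph of `tuples` (edge = shared attribute name) is connected."""
    if not tuples:
        return True
    reached, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, other in enumerate(tuples):
            # An edge exists when the two tuples have at least one attribute in common.
            if j not in reached and tuples[i].keys() & other.keys():
                reached.add(j)
                stack.append(j)
    return len(reached) == len(tuples)

# Hypothetical rows.
climate = {"Country": "Canada", "Climate": "diverse"}
hotel = {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"}
flight = {"Airline": "AC", "Hub": "YYZ"}  # shares no attribute with the others

print(is_connected([climate, hotel]))          # True
print(is_connected([climate, hotel, flight]))  # False
```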

  12. Natural Join (w/o Cartesian Product) • Each tuple of the result corresponds to a set T of tuples from the source relations such that: 1. T is join-consistent 2. T is connected (no Cartesian product) 3. T is complete (one tuple from each relation) • Conditions 1 and 2 together: T is JCC (join-consistent and connected)

  13. Full Disjunction (Galindo-Legaria 1994) • Each tuple of the result corresponds to a set T of tuples from the source relations such that: 1. T is join-consistent 2. T is connected (no Cartesian product) 3. T is maximal (not properly contained in any JCC set) • Conditions 1 and 2 together: T is JCC • Maximality replaces the natural join's requirement that T is complete (one tuple from each relation)
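
To make the definition concrete, here is a brute-force sketch that enumerates tuple sets, keeps the maximal JCC ones, and joins each with null padding. It is exponential and is not one of the paper's algorithms; it only spells out the definition. The table contents are not preserved in this transcript, so the rows below are invented for illustration, and all function names are hypothetical.

```python
from itertools import combinations, product

def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def is_jcc(rows):
    """Join-consistent and connected (join graph: edge = shared attribute)."""
    if not all(join_consistent(a, b) for a, b in combinations(rows, 2)):
        return False
    reached, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(rows)):
            if j not in reached and rows[i].keys() & rows[j].keys():
                reached.add(j)
                stack.append(j)
    return len(reached) == len(rows)

def full_disjunction(relations):
    """Brute force: enumerate tuple sets, keep the maximal JCC ones, join and pad."""
    # Assuming no duplicate rows, a JCC set holds at most one tuple per relation,
    # so it suffices to pick one row or none from each relation.
    options = [[(n, i) for i in range(len(rows))] + [None]
               for n, rows in relations.items()]
    jcc = set()
    for choice in product(*options):
        ids = frozenset(c for c in choice if c is not None)
        if ids and is_jcc([relations[n][i] for n, i in ids]):
            jcc.add(ids)
    maximal = [T for T in jcc if not any(T < U for U in jcc)]
    attrs = sorted({a for rows in relations.values() for r in rows for a in r})
    result = []
    for T in maximal:
        joined = dict.fromkeys(attrs)  # every attribute starts as null
        for n, i in T:
            joined.update({a: v for a, v in relations[n][i].items() if v is not None})
        result.append(joined)
    return result

# Invented sample data; None is a null value.
relations = {
    "Climates": [{"Country": "Canada", "Climate": "diverse"},
                 {"Country": "Bahamas", "Climate": "tropical"}],
    "Accommodations": [{"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"},
                       {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton"}],
    "Sites": [{"Country": "Canada", "City": "Toronto", "Site": "CN Tower"},
              {"Country": "Canada", "City": None, "Site": "Mount Logan"}],
}
fd = full_disjunction(relations)
for row in fd:
    print(row)
```

On this data the natural join of the three relations has a single tuple (the Toronto one), while the full disjunction also keeps the Bahamas tuple and the Mount Logan tuple, each padded with nulls.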

  14. An Example of a Full Disjunction • Input relations R: Climates, Accommodations, Sites • Result: FD(R)

  20. Padding Joined Tuple Sets with Nulls

  21. The Outerjoin Operator • The outerjoin of two relations R1 and R2, written R1 ⟗ R2: the natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls

  22. Example of an Outerjoin • Input relations: Climates, Accommodations • Result: Climates ⟗ Accommodations
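
A small sketch of the binary outerjoin may help here. It assumes relations are lists of dicts with `None` as null and uses a plain nested loop; the sample rows are invented, since the slide's tables are not preserved in the transcript, and `outerjoin` is a hypothetical helper name.

```python
def join_consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def outerjoin(r1, r2):
    """Natural join of r1 and r2 plus all dangling tuples, padded with nulls."""
    attrs = {a for t in r1 + r2 for a in t}

    def pad(t):
        return {**dict.fromkeys(attrs), **t}

    result, matched = [], set()
    for t1 in r1:
        partners = [j for j, t2 in enumerate(r2) if join_consistent(t1, t2)]
        matched.update(partners)
        if partners:
            result.extend(pad({**t1, **r2[j]}) for j in partners)
        else:
            result.append(pad(t1))  # dangling tuple of r1
    # Dangling tuples of r2: those that joined with nothing in r1.
    result.extend(pad(t2) for j, t2 in enumerate(r2) if j not in matched)
    return result

# Invented sample rows.
climates = [{"Country": "Canada", "Climate": "diverse"},
            {"Country": "UK", "Climate": "temperate"}]
accommodations = [{"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"},
                  {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton"}]

rows = outerjoin(climates, accommodations)
for row in rows:
    print(row)
```

Here the UK tuple and the Bahamas tuple are dangling, so each appears once with nulls in the attributes of the other relation.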

  23. Combining Relations using Outerjoins • The outerjoin operator is not associative • For more than two relations, the result depends on the order in which the outerjoin is applied • In general, outerjoins cannot maximally combine relations (no matter what order is used) • Outerjoin is not suitable for combining more than two relations!

  25. Efficiency of Evaluation • The full-disjunction operator (as well as other operators like the Cartesian product or the natural join) can generate an exponential (in the input size) number of tuples • Polynomial running time is not a suitable yardstick • The usual notion: polynomial time in the combined size of the input and the output

  26. History of Algorithms for Full Disjunctions
      Source   Databases   Time
      RU96     γ-acyclic   O(n + F²)
      KS03     general     O(n⁵N²F²)
      CS05     general     O(n³NF²), "incremental polynomial"
      This paper: linear dependence on F
      n: number of relations • N: number of tuples in the DB • F: number of tuples in the FD
      F is typically very large: it can be exponential in the size of the database

  27. Polynomial Delay • One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay • An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input
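
The notion of polynomial delay can be illustrated independently of full disjunctions. The sketch below is not from the paper: it enumerates all subsets of a list with a Python generator, so the output is exponential in the input while the work between two successive answers stays linear in the input size. The name `subsets` is hypothetical.

```python
from itertools import islice

def subsets(items):
    """Yield every subset of `items`; work between two yields is O(len(items))."""
    n = len(items)
    for mask in range(1 << n):  # range is lazy, so nothing is materialized up front
        yield [items[i] for i in range(n) if (mask >> i) & 1]

# There are 2**60 subsets, yet the first answers arrive immediately because
# the consumer pulls them one at a time, as with a next() operator.
first_three = list(islice(subsets(list(range(60))), 3))
print(first_three)  # [[], [0], [1]]
```

The consumer can stop whenever it has seen enough, which is the behavior the iterator-style evaluation aims for.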

  28. Other Benefits of Polynomial Delay • Incremental evaluation • First tuples are generated quickly • Full disjunctions are large, yet the user need not wait for the whole result to be generated • Suitable for Web applications, where users expect to get the first few pages quickly • In addition, the user can decide anytime that enough information has been shown • Enable parallel query processing • While one processor generates the FD tuples, other processors apply further processing

  30. Main Contributions • Substantial improvement over the state of the art is proved theoretically and experimentally 1. First algorithm for computing full disjunctions with polynomial delay 2. First algorithm for computing full disjunctions in time linear in the output 3. A general optimization technique for computing full disjunctions: division into biconnected components

  32. Our Algorithms • Algorithm NLOJ: tree schemes • Algorithm PDelayFD: general schemes • Optimization: division into biconnected components • Combined: Algorithm BiComNLOJ, the main algorithm, for general schemes

  34. Tree Schemes • Tree schemes: scheme graphs w/o cycles • In the scheme graph, the relation schemes are the nodes and there is an edge between every two schemes with one or more common attributes • Figure: an example scheme graph over relations R1 to R7

  35. Left-Deep Sequence of Outerjoins • Algorithm NLOJ (Nested Loop OuterJoin) • R: a set of relations with a tree scheme • R1,…,Rn: a connected-prefix order of R • Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn • 1. Compute a connected-prefix order of R 2. Apply outerjoins in a left-deep order

  36. Connected-Prefix Order of Relations • A connected-prefix order of relations: each prefix forms a (connected) subtree • Example for the tree over R1 to R7: R1, R3, R2, R7, R4, R5, R6
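
One way to obtain a connected-prefix order is to traverse the scheme graph from any starting relation, since every newly visited node is adjacent to a node visited earlier. The sketch below assumes schemes are given as a dict from relation name to a set of attribute names; the schemes, attribute names, and function names are hypothetical, and the slides do not say how the authors compute the order.

```python
def scheme_graph(schemes):
    """Nodes are relation names; an edge joins two schemes sharing an attribute."""
    return {r: [s for s in schemes if s != r and schemes[r] & schemes[s]]
            for r in schemes}

def connected_prefix_order(schemes, start):
    """Graph traversal: each relation is listed after one of its neighbours."""
    graph = scheme_graph(schemes)
    order, seen, stack = [], {start}, [start]
    while stack:
        r = stack.pop()
        order.append(r)
        for s in graph[r]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return order

# A hypothetical tree scheme: R4 - R1 - R2 - R3 - R5.
schemes = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"C", "D"},
    "R4": {"A", "E"},
    "R5": {"D", "F"},
}
graph = scheme_graph(schemes)
order = connected_prefix_order(schemes, "R1")
print(order)
```

Any such traversal order works for the purpose of the definition; which order gives the best running time is a separate question not addressed here.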

  37. Achieving Polynomial Delay • Algorithm NLOJ (Nested Loop OuterJoin): 1. Compute a connected-prefix order of R 2. Apply outerjoins in a left-deep order: (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn • Problem: an intermediate result can already have exponential size, hence exponential delay • Solution: use iterators

  38. Iterators • To obtain polynomial delay, we use iterators • Operate on top of an enumeration algorithm • Implement next() by controlling the execution of the algorithm

  39. Using Iterators for Outerjoins • Figure: a chain of iterators, Iterator 1 through Iterator n, layered over the relations R1, …, Rn of the left-deep sequence of outerjoins
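
The idea of stacking iterators over a left-deep sequence of outerjoins can be sketched with Python generators, where each generator pulls tuples from the one below it only on demand. This is an illustrative sketch under assumptions, not the authors' implementation: relations are lists of dicts with `None` as null, the schemes form a hypothetical path R1, R2, R3 already given in a connected-prefix order, and the names `outerjoin_iter` and `nloj` are made up for this example.

```python
def join_consistent(t, r):
    common = t.keys() & r.keys()
    return all(t[a] is not None and t[a] == r[a] for a in common)

def outerjoin_iter(left, left_attrs, right_rows, right_attrs):
    """Lazily yield the outerjoin of the tuple stream `left` with a stored relation."""
    attrs = left_attrs | right_attrs
    matched = set()
    for t in left:  # pulls one tuple at a time from the iterator below
        dangling = True
        for j, r in enumerate(right_rows):
            if join_consistent(t, r):
                matched.add(j)
                dangling = False
                yield {**dict.fromkeys(attrs), **t, **r}
        if dangling:
            yield {**dict.fromkeys(attrs), **t}
    for j, r in enumerate(right_rows):  # dangling tuples of the right relation
        if j not in matched:
            yield {**dict.fromkeys(attrs), **r}

def nloj(ordered):
    """`ordered`: list of (attribute set, rows) pairs in a connected-prefix order."""
    attrs, rows = ordered[0]
    stream = iter(rows)
    for next_attrs, next_rows in ordered[1:]:
        stream = outerjoin_iter(stream, attrs, next_rows, next_attrs)
        attrs = attrs | next_attrs
    return stream

# Hypothetical tree (path) scheme with invented rows.
ordered = [
    ({"Country", "Climate"}, [{"Country": "Canada", "Climate": "diverse"},
                              {"Country": "UK", "Climate": "temperate"}]),
    ({"Country", "Hotel"}, [{"Country": "Canada", "Hotel": "Plaza"},
                            {"Country": "Bahamas", "Hotel": "Hilton"}]),
    ({"Hotel", "Stars"}, [{"Hotel": "Plaza", "Stars": 4},
                          {"Hotel": "Ritz", "Stars": 5}]),
]
stream = nloj(ordered)
first = next(stream)  # computed without materializing any intermediate outerjoin
rest = list(stream)
print(first)
print(len(rest))
```

The first answer is produced after touching only one tuple of the first relation, which is the point of replacing materialized intermediate results with iterators.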

  40. Outerjoins are not Always Applicable • It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins • Rajaraman and Ullman [PODS 96]: some full disjunctions cannot be formulated as expressions of outerjoins, even with arbitrary placement of parentheses

  42. About the Algorithm • Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees) • Algorithm PDelayFD has a polynomial delay, but the delay is larger than that of NLOJ • Nevertheless, PDelayFD by itself is a significant improvement over the state of the art

  43. Shifting a Maximal JCC Tuple Set • t-shifting T, for a maximal JCC tuple set T and a tuple t: 1. Add t to T 2. Extract the max. JCC subset containing t 3. Extend to a maximal JCC set • The result is the t-shift of T

  44. Algorithm PDelayFD • Initialization: 1. Generate a max. JCC set T0 2. Insert T0 into Q • Repeat until Q is empty: 1. Move some T from Q to C 2. Print the join of T, padded with nulls 3. Insert into Q a t-shift of T, for all tuples t in the database (validate that the t-shift is not already in Q or C) • Theorem: PDelayFD(R) computes FD(R) with polynomial delay
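
The steps above can be sketched as a Python generator. This is a simplified reading of the slide, not the authors' code: the database is assumed to be a dict from a tuple id to a row (a dict with `None` as null), Q is taken to be a FIFO queue, the extension step is a plain greedy loop, and the generator yields the id set of each maximal JCC set rather than printing the padded join. All names (`pdelay_fd`, `shift`, `extend`, the ids) and the rows are hypothetical.

```python
from collections import deque

def consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def linked(t1, t2):
    return bool(t1.keys() & t2.keys())  # edge of the join graph

def extend(S, db):
    """Greedily grow the JCC set S (tuple ids) until no tuple can be added."""
    S = set(S)
    changed = True
    while changed:
        changed = False
        for u in db:
            if (u not in S and all(consistent(db[u], db[s]) for s in S)
                    and any(linked(db[u], db[s]) for s in S)):
                S.add(u)
                changed = True
    return frozenset(S)

def shift(T, t, db):
    """t-shift of T: add t, keep the largest JCC subset containing t, then extend."""
    keep = {s for s in T if consistent(db[s], db[t])}
    comp, stack = {t}, [t]
    while stack:  # connected component of t among the tuples consistent with it
        u = stack.pop()
        for s in keep - comp:
            if linked(db[u], db[s]):
                comp.add(s)
                stack.append(s)
    return extend(comp, db)

def pdelay_fd(db):
    """Yield the maximal JCC sets of `db` one at a time."""
    if not db:
        return
    first = extend({next(iter(db))}, db)
    queue, seen = deque([first]), {first}  # `seen` holds everything in Q or C
    while queue:
        T = queue.popleft()
        yield T
        for t in db:
            shifted = shift(T, t, db)
            if shifted not in seen:
                seen.add(shifted)
                queue.append(shifted)

# Invented rows, keyed by hypothetical tuple ids.
db = {
    "c1": {"Country": "Canada", "Climate": "diverse"},
    "c2": {"Country": "Bahamas", "Climate": "tropical"},
    "a1": {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza"},
    "a2": {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton"},
    "s1": {"Country": "Canada", "City": "Toronto", "Site": "CN Tower"},
    "s2": {"Country": "Canada", "City": None, "Site": "Mount Logan"},
}
answers = list(pdelay_fd(db))
for T in answers:
    print(sorted(T))
```

Because results are yielded as soon as a set leaves the queue, a caller can stop after the first few answers; the bookkeeping in `seen` is what prevents the same set from being produced twice.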

  46. NLOJ vs. PDelayFD • NLOJ: shorter delays, less space, simpler to implement • PDelayFD: applicable to general schemes • Figure: example scheme graphs over relations R1 to R10, including one for which the choice of algorithm is open (?) • Our approach: divide and conquer

  47. Biconnected Components • Biconnected component: a maximal subset B of relations, s.t. the scheme graph has two (or more) disjoint paths between every two relations of B • Figure: an example scheme graph over relations R1 to R9 and its division into biconnected components
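
For reference, biconnected components can be computed with the classical depth-first-search algorithm of Hopcroft and Tarjan; the slides do not say which method the authors use. The sketch below assumes the scheme graph is a simple undirected graph given as an adjacency dict, and the example graph is hypothetical. As in the standard block decomposition, an edge that lies on no cycle is reported as a component of two nodes, and an isolated node yields no component.

```python
def biconnected_components(adj):
    """Return the biconnected components of an undirected graph as sets of nodes."""
    depth, low, edge_stack, comps = {}, {}, [], []

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        for v in adj[u]:
            if v == parent:
                continue
            if v not in depth:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:  # u separates the subtree of v
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp |= {a, b}
                        if (a, b) == (u, v):
                            break
                    comps.append(comp)
            elif depth[v] < depth[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for root in adj:
        if root not in depth:
            dfs(root, None, 0)
    return comps

# Hypothetical scheme graph: two triangles connected by the edge R3 - R4.
adj = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R3"],
    "R3": ["R1", "R2", "R4"],
    "R4": ["R3", "R5", "R6"],
    "R5": ["R4", "R6"],
    "R6": ["R4", "R5"],
}
comps = biconnected_components(adj)
print(sorted(sorted(c) for c in comps))
# [['R1', 'R2', 'R3'], ['R3', 'R4'], ['R4', 'R5', 'R6']]
```

The recursion depth grows with the number of relations, which is small in this setting; a very large graph would call for an iterative variant.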

  48. Left-Deep Sequence of Outerjoins • R: a set of relations • Theorem: there exists an (efficiently computable) order B1,…,Bk of the biconnected components of R, s.t. FD(R) = (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk) • Optimized Algorithm: 1. Compute the biconnected components of R 2. Compute the full disjunction of each component 3. Apply outerjoins in a suitable order

  49. BiComNLOJ: a Naïve Attempt 1. Divide R into biconnected components B1,…,Bk, in a suitable order 2. Compute FD(B1),…,FD(Bk) using PDelayFD 3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk) • Problem: each FD(Bi) can be exponential in the input, so computing it in full gives non-polynomial delay! • Solution: use an iterator for each FD(Bi)

  50. Retaining Polynomial Delay: 1st Problem • For simplification, assume only two components, B1 and B2 (figure: a scheme graph over relations R1 to R8 divided into B1 and B2) • After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t • Non-polynomial delay if all of FD(B2) is computed for finding these tuples! • Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t • Details in the proceedings…
