RankSQL: Query Algebra and Optimization for Relational Top-k Queries

RankSQL: Query Algebra and Optimization for Relational Top-k Queries. Authors: Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, Sumin Song. Presenter: Roman Yarovoy. October 3, 2007.


Presentation Transcript


  1. RankSQL: Query Algebra and Optimization for Relational Top-k Queries • Authors: Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, Sumin Song • Presenter: Roman Yarovoy • October 3, 2007

  2. Before RankSQL • Ranking (top-k) queries: the query result is sorted by score and limited to the top k results. • RDBMSs lacked support for ranking. • Previously, only isolated cases of top-k query processing had been studied. • There was no way to integrate top-k operations with other relational operations.

  3. Previous (traditional) approach • Query processing without ranking support: (1) Evaluate the select-project-join (SPJ) query and materialize the result. (2) Sort the result according to the given ranking function. (3) Keep only the top k tuples. • Associated problems: • The user has no interest in a total order over all the results, yet the full sort computes one. • Evaluating the ranking function(s) can be expensive, and it is done for every result tuple.
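The materialize-then-sort strategy on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the paper or the talk: the function name `naive_top_k`, the toy relations, and the two-predicate scoring function are assumptions made up for the example.

```python
from itertools import product

def naive_top_k(R, T, join_cond, score, k):
    """Materialize the join, score every result tuple, sort everything, keep k."""
    joined = [(r, t) for r, t in product(R, T) if join_cond(r, t)]  # full SPJ result
    scored = sorted(((score(r, t), r, t) for r, t in joined),
                    key=lambda x: x[0], reverse=True)               # total order
    return scored[:k], len(joined)

# Hypothetical data: a1/a2/b1/b2 are ordinary attributes, p1/p2 are predicate scores.
R = [{"a1": 1, "a2": 5, "p1": 0.9},
     {"a1": 2, "a2": 3, "p1": 0.4},
     {"a1": 1, "a2": 7, "p1": 0.6}]
T = [{"b1": 1, "b2": 4, "p2": 0.8},
     {"b1": 2, "b2": 1, "p2": 0.7},
     {"b1": 1, "b2": 6, "p2": 0.2}]

top, n_scored = naive_top_k(
    R, T,
    join_cond=lambda r, t: r["a1"] == t["b1"] and r["a2"] > t["b2"],
    score=lambda r, t: r["p1"] + t["p2"],
    k=2,
)
```

Here the ranking function is evaluated on all four join results even though only two are returned, which is the cost the slide points at.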

  4. Key contribution • Li et al. proposed extending relational algebra to support ranking as a first-class database construct. • Consequence: rank-aware relational query engine → rank-aware query optimization.

  5. Top-k query: Example 1 [Figure: example relations R and T]

  6. Example 1 (cont’d) SELECT * FROM R r, T t WHERE r.a1 = t.b1 AND r.a2 > t.b2 ORDER BY p1+p2+p3+p4+p5 LIMIT 2 (where F = p1+p2+p3+p4+p5)

  7. Rank-relational algebra • There was no way to express such a query in relational algebra. • Extend relational algebra by adding rank as a first-class operation. • Judging from existing first-class constructs (e.g. selection), two properties are required to support ranking: • Splitting: rank evaluation proceeds predicate by predicate. • Interleaving: the rank operator can be swapped with other operators (i.e. ranking is not applied only after filtering).

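  8. Ranking Principle • Def: Given a ranking function F(p1, p2, …, pn) and a set of evaluated predicates P ⊆ {p1, p2, …, pn}, the maximal-possible score of a tuple t is defined as F_P[t] = F(q1, …, qn), where qi = pi[t] if pi ∈ P and qi = 1 (the maximal predicate score) otherwise. • Ranking Principle: If F_P[t1] > F_P[t2], then t1 must be ranked before t2.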
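A small Python sketch of the maximal-possible score may help here. It assumes predicate scores lie in [0, 1], so that an unevaluated predicate is bounded above by 1; the helper name `max_possible_score` and the sample scores are hypothetical, not taken from the slides.

```python
def max_possible_score(F, predicates, known, max_score=1.0):
    """Upper bound on a tuple's final score.

    Evaluated predicates contribute their real score; every predicate not yet
    evaluated is replaced by max_score (an assumption: scores lie in [0, 1]).
    """
    return F([known.get(p, max_score) for p in predicates])

preds = ["p1", "p2", "p3"]          # F = p1 + p2 + p3
t1 = {"p1": 0.5}                    # only p1 has been evaluated on t1
t2 = {"p1": 0.25}                   # only p1 has been evaluated on t2

ub_t1 = max_possible_score(sum, preds, t1)   # 0.5  + 1 + 1
ub_t2 = max_possible_score(sum, preds, t2)   # 0.25 + 1 + 1

# Evaluating one more predicate can only keep or lower the bound (F is monotonic).
ub_t1_after_p2 = max_possible_score(sum, preds, {**t1, "p2": 0.5})
```

With only p1 evaluated, t1 is ranked before t2 because its bound is higher; the bound on t1 then tightens once p2 is known.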

  9. Rank-Relation • Def: For a monotonic scoring function F(p1, …, pn) and a subset P of {p1, …, pn}, a relation R augmented with the ranking induced by P is called a rank-relation, denoted R_P. • The implicit attribute of R_P is the score of each tuple t, that is, F_P[t]. • Order relationship of R_P: • For all t1, t2 ∈ R_P: t1 <_{R_P} t2 ⟺ F_P[t1] < F_P[t2]

  10. Operators of rank-relations • The rank operator (μ) “adds” a predicate p to the set P, i.e. μ_p(R_P) ≡ R_{P ∪ {p}}. • Example 2: μ_p1(R_{p2}) ≡ R_{p1, p2}, where F = ∑(p1, p2, p3).

  11. Extended operators

  12. Example 3: Extended Join • Rank-relational expression: π_{a1,a2,b2}(σ_c(R_{p1, p2, p3} JOIN T_{p4, p5})) • SQL: SELECT r.a1, r.a2, t.b1 FROM R r, T t WHERE c ORDER BY F LIMIT 2 (F = ∑ P and c = r.a1+r.a2 < t.b1)

  13. Extended operators (cont’d) • Note: • Cartesian product is defined similarly to join, but not discussed in the paper. • Projection operator π has not changed. • Computation is based on both Boolean and ranking logical properties. • Perform Boolean operations and maintain the order induced by all given ranking predicates.

  14. Equivalence relations • In the extended rank-relational model, ranking is a first-class construct. • Algebraic equivalences can be derived from the definitions of the operators (proofs are omitted). • Example 4: • σ_c(R_P) ≡ (σ_c R)_P • R_P1 ∩ T_P2 ≡ (R ∩ T)_{P1 ∪ P2} • Thus, we can interleave the rank operator with other operators (i.e. push μ down across operators).
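The first equivalence in Example 4 (selection commutes with ranking) can be checked on a toy relation. The sketch below models a rank-relation simply as a list sorted by maximal-possible score; the helpers `rank` and `select`, the attribute names, and the data are illustrative assumptions, and scores are assumed to lie in [0, 1].

```python
def rank(rel, P, preds=("p1", "p2"), max_score=1.0):
    """Order rel by the maximal-possible score of F = sum(preds), given evaluated set P."""
    def upper_bound(t):
        return sum(t[p] if p in P else max_score for p in preds)
    return sorted(rel, key=upper_bound, reverse=True)

def select(c, rel):
    """Boolean selection that preserves the input order."""
    return [t for t in rel if c(t)]

# Hypothetical relation: attribute a plus two predicate scores.
R = [{"a": 1, "p1": 0.25, "p2": 0.5},
     {"a": 2, "p1": 0.75, "p2": 0.25},
     {"a": 3, "p1": 0.5,  "p2": 1.0}]
c = lambda t: t["a"] != 2

lhs = select(c, rank(R, {"p1"}))   # select applied to the rank-relation
rhs = rank(select(c, R), {"p1"})   # rank applied to the selected relation
```

Both sides contain the same tuples in the same order, which is what lets an optimizer choose freely whether to filter before or after ranking. The data has no score ties; with ties, equality would hold only up to the tie-breaking order.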

  15. Equivalence relations (cont’d)

  16. Equivalence relations (cont’d) • Note: • Proposition 1 states that ranking can be done in stages (i.e. one predicate at a time). • By Propositions 2, 3, and 4, the operators obey commutative and associative laws. • By Propositions 4 and 5, μ can be swapped with other operators.

  17. Incremental execution • Blocking operators (e.g. sort) force materialization of intermediate results. • Goal: avoid materialization and implement a pipelined execution strategy. • We want to split rank computation into stages and reduce the number of tuples considered in later stages. • We can output a tuple t (i.e. advance it to the next stage) as soon as its score is greater than or equal to the maximal-possible score of any future tuple t′′.

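  18. Incremental execution (cont’d) • To apply μ_p to R_P, maintain a priority queue of tuples ordered by F_{P ∪ {p}}. • Let X = the stream of tuples from the preceding stage, arriving in F_P order. • Draw t′ from X. • Let t be a tuple in the queue. If F_{P ∪ {p}}[t] ≥ F_P[t′], then, since F_P[t′] ≥ F_P[t′′] for any future t′′ drawn from X and F_P[t′′] ≥ F_{P ∪ {p}}[t′′] (evaluating p can only lower a maximal-possible score), we get F_{P ∪ {p}}[t] ≥ F_{P ∪ {p}}[t′′], so t can be output (proceed to the next stage).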

  19. Example 5: Top 2 of W • Given F = AVG(p6, p7, p8) • Plan: idxScan_p6(W) → μ_p7 → μ_p8
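The pipeline of Example 5 (an index scan on p6 followed by rank operators for p7 and p8) can be sketched with Python generators. This is an illustrative sketch under stated assumptions, not the paper's implementation: the contents of W are invented (the slide's table is not in the transcript), predicate scores are assumed to lie in [0, 1], and the names `idx_scan`, `mu`, and `calls` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count, islice

PREDS = ("p6", "p7", "p8")
calls = Counter()  # how many times each rank operator evaluates its predicate

def F(known):
    """AVG over PREDS; an unevaluated predicate counts as the maximal score 1.0."""
    return sum(known.get(p, 1.0) for p in PREDS) / len(PREDS)

def idx_scan(rows, p):
    """Stand-in for an index scan: yield rows in non-increasing order of p."""
    for row in sorted(rows, key=lambda r: r[p], reverse=True):
        yield row, {p: row[p]}

def mu(stream, p):
    """Rank operator: input ordered by F_P, output ordered by F_{P ∪ {p}}."""
    heap, tie = [], count()  # max-heap via negated scores; tie avoids comparing dicts
    for row, known in stream:
        bound = F(known)  # F_P[t'] bounds this tuple and every future tuple
        while heap and -heap[0][0] >= bound:
            _, _, r, k = heapq.heappop(heap)
            yield r, k    # safe to output: no future tuple can beat it
        calls[p] += 1     # p is evaluated only when the tuple is actually reached
        new_known = {**known, p: row[p]}
        heapq.heappush(heap, (-F(new_known), next(tie), row, new_known))
    while heap:           # input exhausted: flush the buffer in score order
        _, _, r, k = heapq.heappop(heap)
        yield r, k

# Hypothetical contents of W.
W = [{"id": "w1", "p6": 0.875, "p7": 0.875, "p8": 0.75},
     {"id": "w2", "p6": 0.75,  "p7": 0.25,  "p8": 0.875},
     {"id": "w3", "p6": 0.625, "p7": 0.75,  "p8": 0.875},
     {"id": "w4", "p6": 0.25,  "p7": 0.875, "p8": 0.875},
     {"id": "w5", "p6": 0.125, "p7": 0.5,   "p8": 0.5}]

plan = mu(mu(idx_scan(W, "p6"), "p7"), "p8")
top2 = [row["id"] for row, _ in islice(plan, 2)]
```

On this toy input the pipeline returns w1 and w3 after evaluating p7 on four tuples and p8 on two, whereas materialize-then-sort would evaluate both predicates on all five tuples. Because the operators are generators, nothing past the second result is ever pulled.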

  20. Different evaluation plans • There exist algorithms to implement rank-aware operators as well as incremental evaluation. • Efficiency of query evaluation will now depend not only on the regular operators, but also on the rank-aware operators. • Due to algebraic equivalence laws, we can define additional evaluation plans. • Hence, we want a query optimizer to take additional execution plans into consideration.

  21. Rank-aware optimizer • Extended algebra → extended search space. • Impact on enumeration algorithm: • Li et al. designed a two-dimensional enumeration algorithm: Dimension 1 = join size, Dimension 2 = ranking predicates. • The algorithm is exponential in both dimensions. • Heuristics are applied to reduce the search space. • Impact on cost model: • For ranking queries, it is more difficult to estimate the cardinality of intermediate results, and the accuracy of that estimate is the core of the cost model. • The authors proposed estimating cardinality by randomly sampling tuples.
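To give a feel for why the search space grows in two dimensions, the sketch below enumerates candidate subplan "signatures": a set of joined relations paired with a set of ranking predicates already evaluated. It only illustrates the shape of the space and is not the paper's enumeration algorithm. The `signatures` helper is hypothetical, and the assignment of p1 to p3 to R and p4, p5 to T follows Example 3.

```python
from itertools import combinations

def signatures(relations, predicates):
    """Yield (joined relations, evaluated predicates) pairs.

    predicates maps each ranking predicate to the set of relations it needs;
    a predicate is usable only once all of those relations are in the subplan.
    """
    rels = sorted(relations)
    for i in range(1, len(rels) + 1):            # dimension 1: join size
        for SR in combinations(rels, i):
            usable = sorted(p for p, need in predicates.items() if need <= set(SR))
            for j in range(len(usable) + 1):     # dimension 2: number of predicates
                for SP in combinations(usable, j):
                    yield SR, SP

preds = {"p1": {"R"}, "p2": {"R"}, "p3": {"R"}, "p4": {"T"}, "p5": {"T"}}
space = list(signatures({"R", "T"}, preds))
n_signatures = len(space)
```

Even with two relations and five predicates there are 44 signatures (8 for R alone, 4 for T alone, 32 for the join), which is why heuristics are needed to prune the space.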

  22. Critique • Erroneous examples. • No example of “tie-breaking” function. • Bad explanation of incremental evaluation.

  23. Future research directions • Cardinality estimation: New/improved techniques for random sampling over joins. • Dynamically determined/chosen k. • Exploring physical properties of rank-aware execution plans.
