RankSQL: Query Algebra and Optimization for Relational Top-k Queries

Authors: Chengkai Li, Kevin Chen-Chuan Chang, Ihab F. Ilyas, Sumin Song. Presenter: Roman Yarovoy. October 3, 2007

Before RankSQL • Ranking (top-k) queries: the query result is sorted by rank and limited to the top k results. • RDBMSs lacked support for ranking. • Previously, only isolated cases of top-k query processing had been studied. • There was no way to integrate top-k operations with other relational operations.

Previous (traditional) approach • Query processing without ranking support: • Evaluate the select-project-join (SPJ) query and materialize the result. • Sort the result according to a given ranking function. • Take only the top k tuples. • Associated problems: • The user has no interest in a total order over all results, yet the full sort computes one. • Evaluating the ranking function(s) can be expensive.
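The three steps of the traditional approach can be sketched in Python as follows. This is an illustrative sketch only: the function name naive_top_k, the toy tables R and T, and the predicate scores p1 and p2 are hypothetical and do not come from the paper. It shows that the ranking function is evaluated on every join result even though only k tuples are returned.

```python
from itertools import product

def naive_top_k(R, T, join_pred, score, k):
    """Materialize-then-sort: join everything, score everything, sort, cut."""
    # Step 1: evaluate the SPJ query and materialize the full result.
    joined = [(r, t) for r, t in product(R, T) if join_pred(r, t)]
    # Step 2: evaluate the ranking function on every result and sort.
    scored = [(score(r, t), r, t) for r, t in joined]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Step 3: keep only the top k; also report how many tuples were scored.
    return scored[:k], len(scored)

# Hypothetical toy data (scores chosen to be exact in binary floating point).
R = [{"a1": 1, "a2": 9, "p1": 0.75}, {"a1": 2, "a2": 9, "p1": 0.5}]
T = [{"b1": 1, "b2": 3, "p2": 0.5},
     {"b1": 2, "b2": 3, "p2": 0.25},
     {"b1": 2, "b2": 3, "p2": 0.125}]

top, evaluated = naive_top_k(
    R, T,
    join_pred=lambda r, t: r["a1"] == t["b1"] and r["a2"] > t["b2"],
    score=lambda r, t: r["p1"] + t["p2"],
    k=2,
)
```

Here all three join results are scored and sorted although only two are kept; on real data the gap between the number of scored tuples and k is the cost that a rank-aware engine tries to avoid.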

Key contribution • Li et al. proposed extending relational algebra to support ranking as a first-class database construct. • Consequence: a rank-aware relational query engine and rank-aware query optimization.

Top-k query: Example 1 (figure: tables R and T)

Example 1 (cont’d) SELECT * FROM R r, T t WHERE r.a1 = t.b1 AND r.a2 > t.b2 ORDER BY p1 + p2 + p3 + p4 + p5 LIMIT 2 (where F = p1 + p2 + p3 + p4 + p5)

Rank-relational algebra • There was no way to express such a query in relational algebra. • Extend relational algebra by adding rank as a first-class operation. • Based on observations about existing first-class constructs (e.g. selection), two requirements must be met to support ranking: • Splitting: predicate-by-predicate rank evaluation. • Interleaving: swapping the rank operator with other operators (i.e. ranking is not applied only after filtering).

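Ranking Principle • Def: Given a ranking function F and a set of evaluated predicates P ⊆ {p1, p2, …, pn}, the maximal-possible score of a tuple t, written F_P[t], is F computed with each predicate in P at its evaluated score for t and each unevaluated predicate at its maximal possible value. • Ranking Principle: If F_P[t1] > F_P[t2], then t1 must be ranked before t2.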
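A minimal sketch of the maximal-possible score, under two assumptions that are not stated on the slide: every predicate score lies in [0, 1], so an unevaluated predicate is bounded by 1.0, and the scores are already stored as fields of the tuple. The function name max_possible_score and the sample tuple are hypothetical.

```python
def max_possible_score(F, names, evaluated, t, max_value=1.0):
    """Upper bound F_P[t]: evaluated predicates contribute their score for t,
    unevaluated ones contribute max_value (assumed upper limit of any score)."""
    return F([t[name] if name in evaluated else max_value for name in names])

NAMES = ("p1", "p2", "p3")
t = {"p1": 0.5, "p2": 0.25, "p3": 0.125}  # hypothetical predicate scores

# The bound as P grows from the empty set to all predicates, with F = sum.
bounds = [
    max_possible_score(sum, NAMES, P, t)
    for P in (set(), {"p1"}, {"p1", "p2"}, set(NAMES))
]
```

For a monotonic F the bound can only shrink as more predicates are evaluated, which is what makes it safe to order tuples by F_P before the full score is known.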

Rank-Relation • Def: For a monotonic scoring function F(p1, …, pn) and a subset P of {p1, …, pn}, a relation R augmented with the ranking induced by P is called a rank-relation, denoted R_P. • The implicit attribute of R_P is the score of each tuple t, that is F_P[t]. • Order relationship of R_P: for all t1, t2 ∈ R_P: t1 <_{R_P} t2 ⟺ F_P[t1] < F_P[t2]

Operators of rank-relations • The rank (or μ) operator “adds” a predicate p to the set P. • i.e. μ_p(R_P) ≡ R_{P ∪ {p}}. • Example 2: μ_p1(R_{p2}) ≡ R_{p1, p2}, where F = ∑(p1, p2, p3).
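Example 2 can be mimicked with a small materialized model of a rank-relation. This is a sketch under assumptions: predicate scores lie in [0, 1], a rank-relation is represented as a pair (rows sorted by the maximal-possible score, evaluated predicate set), and the names bound, rank_relation, mu and the two sample rows are hypothetical.

```python
PREDICATES = ("p1", "p2", "p3")

def bound(t, P):
    """Maximal-possible score with F = sum; unevaluated predicates count as 1.0."""
    return sum(t[p] if p in P else 1.0 for p in PREDICATES)

def rank_relation(rows, P):
    """Materialized stand-in for R_P: rows in descending bound order, plus P."""
    return sorted(rows, key=lambda t: bound(t, P), reverse=True), frozenset(P)

def mu(p, rel):
    """Rank operator: add predicate p to the evaluated set and re-order."""
    rows, P = rel
    return rank_relation(rows, P | {p})

R = [{"id": "r1", "p1": 0.25, "p2": 1.0, "p3": 0.5},
     {"id": "r2", "p1": 1.0, "p2": 0.5, "p3": 0.5}]

before = rank_relation(R, {"p2"})   # R_{p2}
after = mu("p1", before)            # mu_p1(R_{p2})
```

With these rows, r1 leads under {p2} alone, but evaluating p1 moves r2 ahead, and the result of mu matches the relation ranked directly by {p1, p2}.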

Extended operators

Example 3: Extended Join π_{a1,a2,b2}(σ_c(R_{p1, p2, p3} JOIN T_{p4, p5})) SELECT r.a1, r.a2, t.b1 FROM R r, T t WHERE c ORDER BY F LIMIT 2 (F = ∑ P and c = r.a1 + r.a2 < t.b1)

Extended operators (cont’d) • Note: • Cartesian product is defined similarly to join, but not discussed in the paper. • Projection operator π has not changed. • Computation is based on both Boolean and ranking logical properties. • Perform Boolean operations and maintain the order induced by all given ranking predicates.

Equivalence relations • In the extended rank-relational model, ranking is a first-class construct. • Algebraic equivalences can be derived from the definitions of the operators (proofs are omitted). • Example 4: • σ_c(R_P) ≡ (σ_c R)_P • R_{P1} ∩ T_{P2} ≡ (R ∩ T)_{P1 ∪ P2} • Thus, we can interleave the rank operator with other operators (i.e. push μ down across operators).
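The first equivalence of Example 4 can be checked on toy data. This sketch assumes F = sum, predicate scores in [0, 1], and a materialized list as a stand-in for a rank-relation; the names bound, rank, select, the attribute a, and the condition c are hypothetical.

```python
PREDICATES = ("p1", "p2")

def bound(t, P):
    """Maximal-possible score with F = sum; unevaluated predicates count as 1.0."""
    return sum(t[p] if p in P else 1.0 for p in PREDICATES)

def rank(rows, P):
    """Order rows by descending bound (Python's sort is stable)."""
    return sorted(rows, key=lambda t: bound(t, P), reverse=True)

def select(rows, c):
    """Boolean selection that preserves the incoming order."""
    return [t for t in rows if c(t)]

R = [{"a": 1, "p1": 0.5, "p2": 0.25},
     {"a": 5, "p1": 1.0, "p2": 0.5},
     {"a": 7, "p1": 0.25, "p2": 1.0}]
c = lambda t: t["a"] > 2
P = {"p1"}

lhs = select(rank(R, P), c)   # sigma_c(R_P): rank first, then filter
rhs = rank(select(R, c), P)   # (sigma_c R)_P: filter first, then rank
```

Both sides yield the same tuples in the same order, so an optimizer is free to choose whichever placement of the rank operator is cheaper.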

Equivalence relations (cont’d)

Equivalence relations (cont’d) • Note: • Proposition 1 states that ranking can be done in stages (i.e. one predicate at a time). • By Propositions 2, 3, and 4, the operators obey commutative and associative laws. • By Propositions 4 and 5, μ can be swapped with other operators.

Incremental execution • Blocking operators (e.g. sort) force materialization of intermediate results. • Goal: avoid materialization and implement a pipelined execution strategy. • We want to split rank computation into stages and reduce the number of tuples considered in later stages. • We can output a tuple t (i.e. advance it to the next stage) whenever the score of t is greater than or equal to the score of any future tuple t″.

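Incremental execution (cont’d) • Apply μ_p to R_P and maintain a priority queue ordered by P ∪ {p}. • Let X = the stream of tuples from the preceding stage, delivered in the order of F_P. • Draw t′ from X and insert it into the queue. • Let t be the tuple at the head of the queue. If F_{P ∪ {p}}[t] ≥ F_P[t′], then, because F_P[t′] ≥ F_P[t″] for any future t″ drawn from X and F_P[t″] ≥ F_{P ∪ {p}}[t″], we have F_{P ∪ {p}}[t] ≥ F_{P ∪ {p}}[t″], and t can be output (proceed to the next stage).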

Example 5: Top 2 of W • Given F = AVG(p6, p7, p8) • Plan operators: idxScan_p6(W), μ_p7, μ_p8
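The output condition for incremental execution and the plan of Example 5 can be sketched as a pipeline of Python generators. This is a sketch under assumptions, not the authors' implementation: predicate scores lie in [0, 1], the index scan is simulated by sorting on p6, the operators are chained as scan, then μ_p7, then μ_p8, and the contents of W as well as the names bound, idx_scan, mu and plan are hypothetical.

```python
import heapq
from itertools import islice

PREDICATES = ("p6", "p7", "p8")

def bound(t, P):
    """Maximal-possible score with F = AVG; unevaluated predicates count as 1.0."""
    return sum(t[p] if p in P else 1.0 for p in PREDICATES) / len(PREDICATES)

def idx_scan(rows, p, drawn):
    """Stand-in for an index scan: yields rows in descending order of p.
    `drawn` records which tuples were actually pulled from the scan."""
    for t in sorted(rows, key=lambda t: t[p], reverse=True):
        drawn.append(t["id"])
        yield t

def mu(p, source, P):
    """Rank operator: consumes tuples ordered by F_P, emits them ordered by
    F_{P ∪ {p}}, releasing a tuple as soon as no future tuple can beat it."""
    P_after = P | {p}
    heap, seq = [], 0  # max-heap via negated scores; seq breaks ties
    for t_prime in source:
        heapq.heappush(heap, (-bound(t_prime, P_after), seq, t_prime))
        seq += 1
        threshold = bound(t_prime, P)  # upper limit for every future tuple
        while heap and -heap[0][0] >= threshold:
            yield heapq.heappop(heap)[2]
    while heap:  # input exhausted: flush the rest in order
        yield heapq.heappop(heap)[2]

def plan(rows, drawn):
    stream = idx_scan(rows, "p6", drawn)
    stream = mu("p7", stream, frozenset({"p6"}))
    return mu("p8", stream, frozenset({"p6", "p7"}))

W = [{"id": "w1", "p6": 1.0, "p7": 0.5, "p8": 0.5},
     {"id": "w2", "p6": 0.75, "p7": 1.0, "p8": 1.0},
     {"id": "w3", "p6": 0.5, "p7": 1.0, "p8": 0.25},
     {"id": "w4", "p6": 0.25, "p7": 0.5, "p8": 1.0}]

drawn = []
top2 = [t["id"] for t in islice(plan(W, drawn), 2)]
```

Because every stage is a generator, tuples are pulled only on demand: on this data the first answer is produced after the scan has delivered just two tuples, and no stage sorts its whole input up front.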

Different evaluation plans • There exist algorithms to implement rank-aware operators as well as incremental evaluation. • Efficiency of query evaluation will now depend not only on the regular operators, but also on the rank-aware operators. • Due to algebraic equivalence laws, we can define additional evaluation plans. • Hence, we want a query optimizer to take additional execution plans into consideration.

Rank-aware optimizer • Extended algebra → extended search space. • Impact on the enumeration algorithm: • Li et al. designed a two-dimensional enumeration algorithm: dimension 1 = join size, dimension 2 = ranking predicates. • The algorithm is exponential in both dimensions. • Heuristics are applied to reduce the search space. • Impact on the cost model: • For ranking queries it is harder to estimate the cardinality of intermediate results, and the accuracy of that estimate is the core of the cost model. • The authors proposed estimating cardinality by randomly sampling tuples.
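The growth of the search space along both dimensions can be illustrated with a small count. The sketch below is a simplification and not the authors' enumeration algorithm: it assumes a subplan is identified by a pair (set of joined relations, set of evaluated ranking predicates), it ignores heuristics and plan costs, and it assigns p1, p2, p3 to R and p4, p5 to T as in Example 3. The names subsets and enumerate_signatures are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """All subsets of items, smallest first."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def enumerate_signatures(pred_relations):
    """Yield every (relations, predicates) pair in which each predicate
    only references relations that are already part of the subplan."""
    relations = set().union(*pred_relations.values())
    for SR in subsets(relations):          # dimension 1: join size
        if not SR:
            continue
        applicable = [p for p, rels in pred_relations.items() if rels <= SR]
        for SP in subsets(applicable):     # dimension 2: ranking predicates
            yield SR, SP

# Assumed assignment of ranking predicates to relations (from Example 3).
pred_relations = {"p1": {"R"}, "p2": {"R"}, "p3": {"R"},
                  "p4": {"T"}, "p5": {"T"}}
signatures = list(enumerate_signatures(pred_relations))
```

Even for two relations and five predicates the count is 8 + 4 + 32 = 44 pairs, which suggests why heuristics are needed to prune the space.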

Critique • Erroneous examples. • No example of “tie-breaking” function. • Bad explanation of incremental evaluation.

Future research directions • Cardinality estimation: New/improved techniques for random sampling over joins. • Dynamically determined/chosen k. • Exploring physical properties of rank-aware execution plans.
