Lecture 5 on Query Optimization
This lecture demonstrates techniques for optimizing SQL queries for efficient performance. The optimizer transforms an SQL query into an equivalent query, expressed in relational algebra, that costs less to execute. The lecture shows how to apply heuristic rules to reduce the cost of a query.
Query Optimizer
An expression of relational algebra can be read not only as a specification of the semantics of a query, but also as a specification of a sequence of operations. From this viewpoint, two expressions with the same semantics can describe two different sequences of operations.
Given:
Relation EMP(Empnum, Name, Sal, Tax, Mgrnum, Deptnum)
Relation DEPT(Deptnum, Name, Area, Mgrnum)
Relation SUPPLIER(Snum, Name, City)
Relation SUPPLY(Snum, Pnum, Deptnum, Quan)
The expressions
PJ_{NAME,DEPTNUM} SL_{DEPTNUM=15} EMP
and
SL_{DEPTNUM=15} PJ_{NAME,DEPTNUM} EMP
are equivalent but define two different sequences of operations. The equivalence holds on the condition that the attribute used by the selection (DEPTNUM) is among the projected attributes, so that the projection keeps the data the selection needs.
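The equivalence of the two expressions can be checked on a small example. The sketch below is only an illustration: the rows of EMP are made-up sample data, and the helper names `select` and `project` are assumptions standing in for SL and PJ.

```python
# Toy EMP relation; the rows are invented sample data, not from the lecture.
EMP = [
    {"Empnum": 1, "Name": "Ann", "Sal": 30000, "Tax": 3000, "Mgrnum": 373, "Deptnum": 15},
    {"Empnum": 2, "Name": "Bob", "Sal": 40000, "Tax": 5000, "Mgrnum": 373, "Deptnum": 15},
    {"Empnum": 3, "Name": "Cat", "Sal": 35000, "Tax": 4000, "Mgrnum": 120, "Deptnum": 20},
]


def select(pred, rel):
    """SL: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """PJ: keep only attrs and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def in_dept_15(t):
    return t["Deptnum"] == 15


# PJ_{NAME,DEPTNUM} SL_{DEPTNUM=15} EMP: select first, then project
plan_a = project(["Name", "Deptnum"], select(in_dept_15, EMP))
# SL_{DEPTNUM=15} PJ_{NAME,DEPTNUM} EMP: project first, then select
plan_b = select(in_dept_15, project(["Name", "Deptnum"], EMP))

print(plan_a == plan_b)
```

Both plans return the same tuples, but `plan_a` projects only the selected tuples, while `plan_b` projects every tuple of EMP before filtering.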
The operator tree of an expression of relational algebra can be regarded as the parse tree of the expression itself, assuming the following grammar:
R -> identifier
R -> (R)
R -> un_op R
R -> R bin_op R
un_op -> SL_F | PJ_A
bin_op -> CP | UN | DF | JN_F | NJN_F | SJ_F | NSJ_F
Two relations are equivalent when their tuples represent the same mapping from attribute names to values, even if the order of attributes is different.
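One possible in-memory form of such an operator tree is sketched below. The class names `Rel`, `Unary`, and `Binary` and the helpers `show` and `leaves` are hypothetical choices made for this illustration; the lecture does not prescribe any data structure.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Rel:              # R -> identifier
    name: str


@dataclass(frozen=True)
class Unary:            # R -> un_op R
    op: str             # "SL" or "PJ"
    spec: str           # formula F or attribute list A
    child: "Node"


@dataclass(frozen=True)
class Binary:           # R -> R bin_op R
    op: str             # "CP", "UN", "DF", "JN", "NJN", "SJ", "NSJ"
    spec: str           # join formula F, or "" when the operator has none
    left: "Node"
    right: "Node"


Node = Union[Rel, Unary, Binary]


def label(op: str, spec: str) -> str:
    """Render an operator with its subscript, e.g. SL_{DEPTNUM=15}."""
    return op + "_{" + spec + "}" if spec else op


def show(n: Node) -> str:
    """Print the tree back as a relational algebra expression."""
    if isinstance(n, Rel):
        return n.name
    if isinstance(n, Unary):
        return label(n.op, n.spec) + " " + show(n.child)
    return "(" + show(n.left) + " " + label(n.op, n.spec) + " " + show(n.right) + ")"


def leaves(n: Node) -> List[str]:
    """List the operand relations found at the leaves, left to right."""
    if isinstance(n, Rel):
        return [n.name]
    if isinstance(n, Unary):
        return leaves(n.child)
    return leaves(n.left) + leaves(n.right)


tree = Unary("PJ", "NAME,DEPTNUM", Unary("SL", "DEPTNUM=15", Rel("EMP")))
join_tree = Binary("JN", "DEPTNUM=DEPTNUM", Rel("EMP"),
                   Unary("SL", "MGRNUM=373", Rel("DEPT")))
print(show(tree))
print(show(join_tree), leaves(join_tree))
```

Keeping the nodes immutable makes it easier to express a transformation as a function that returns a new tree.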
Here U denotes a unary operation and B a binary operation.
• Commutativity of unary operations: U1 U2 R <-> U2 U1 R
• Commutativity of operands of binary operations: R B S <-> S B R
• Associativity of binary operations: (R B S) B T <-> R B (S B T)
• Idempotence of unary operations: U R <-> U1 U2 R
• Distributivity of unary operations with respect to binary operations: U (R B S) -> U(R) B U(S)
• Factorization of unary operations (this transformation is the inverse of distributivity): U(R) B U(S) -> U (R B S)
Commutativity of unary operations
SL_F1 SL_F2 R → SL_F2 SL_F1 R
SL_F1 PJ_A2 R → PJ_A2 SL_F1 R, provided Attr(F1) ⊆ A2
Commutativity and associativity of binary operations
R UN S → S UN R
R CP S → S CP R
R JN_F S → S JN_F R
(R UN S) UN T → R UN (S UN T)
(R CP S) CP T → R CP (S CP T)
Idempotence of unary operations
PJ_A R → PJ_A1 PJ_A2 R, provided A = A1 and A ⊆ A2
SL_F R → SL_F1 SL_F2 R, provided F = F1 ∧ F2
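The idempotence and commutativity rules for selection can be checked on a small relation. This is a sketch on invented data; the attribute names A and B and the formulas F1 and F2 are assumptions chosen for the example.

```python
# Toy relation R(A, B) with 16 tuples; invented for illustration.
R = [{"A": a, "B": b} for a in range(4) for b in range(4)]


def sl(formula, rel):
    """SL_F: keep the tuples that satisfy the formula."""
    return [t for t in rel if formula(t)]


def F1(t):
    return t["A"] > 1


def F2(t):
    return t["B"] < 2


def F(t):
    # F = F1 AND F2
    return F1(t) and F2(t)


combined = sl(F, R)             # SL_F R
cascaded = sl(F1, sl(F2, R))    # SL_F1 SL_F2 R (idempotence)
swapped = sl(F2, sl(F1, R))     # SL_F2 SL_F1 R (commutativity)

print(combined == cascaded == swapped, len(combined))
```

Splitting one selection into a cascade is what later allows each part to be pushed toward a different operand of the tree.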
Distributivity of unary operations
SL_F (R UN S) → (SL_F R) UN (SL_F S)
SL_F (R DF S) → (SL_F R) DF (SL_F S)
SL_F (R SJ_F3 S) → (SL_F R) SJ_F3 (SL_FS S), with FS = true, so the selection on S removes no tuples
PJ_A (R CP S) → (PJ_AR R) CP (PJ_AS S), with AR = A - Attr(S) and AS = A - Attr(R)
Factorization of unary operations from binary operations
(PJ_AR R) CP (PJ_AS S) → PJ_A (R CP S), with A = AR ∪ AS
(SL_FR R) JN_F1 (SL_FS S) → SL_F (R JN_F1 S), with F = FR ∧ FS
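A quick way to build confidence in the distributivity rules is to evaluate both sides on sample data. The sketch below does this for selection over union and difference and for projection over Cartesian product. The relations and the formula F are invented, and attributes are addressed by position, which is an assumption of this illustration.

```python
# Two union-compatible toy relations as sets of tuples (invented data).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}


def sl(formula, rel):
    """SL_F on a set of tuples."""
    return {t for t in rel if formula(t)}


def pj(positions, rel):
    """PJ_A, with the attributes in A given by position."""
    return {tuple(t[i] for i in positions) for t in rel}


def cp(r, s):
    """CP: concatenate every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}


def F(t):
    return t[0] >= 2


union_lhs, union_rhs = sl(F, R | S), sl(F, R) | sl(F, S)
diff_lhs, diff_rhs = sl(F, R - S), sl(F, R) - sl(F, S)
# A = first attribute of R and first attribute of S,
# which sit at positions 0 and 2 of R CP S.
cp_lhs, cp_rhs = pj([0, 2], cp(R, S)), cp(pj([0], R), pj([0], S))

print(union_lhs == union_rhs, diff_lhs == diff_rhs, cp_lhs == cp_rhs)
```

The right-hand sides apply the unary operation before the binary one, so the binary operation works on smaller inputs.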
Qualified Relations
A qualified relation is a pair [R: qR], where R is a relation called the body of the qualified relation and qR is a predicate called its qualification. For example, horizontal fragments are qualified relations whose qualification is the partitioning predicate. In effect, [R: qR] records a unary operation applied to relation R, such as a selection (horizontal fragmentation) and/or a projection (vertical fragmentation).
Algebra of qualified relations
Applying a unary operation to a qualified relation,
Un_op [R: qR1] → [Un_op R: qR2],
produces a qualified relation with Un_op R as its body and the predicate qR2 as its qualification. On the left-hand side, the qualification qR1 is applied first, followed by the unary operation. On the right-hand side, the unary operation is applied first, followed by the qualification qR2.
Rule 1: SL_F [R: qR] → [SL_F R: F AND qR] (horizontal fragmentation)
Both F and qR hold for all the tuples of the result.
Rule 2: PJ_A [R: qR] → [PJ_A R: qR] (vertical fragmentation)
Figure: Horizontal fragmentation and vertical fragmentation.
Rule 3: [R: qR] CP [S: qS] → [R CP S: qR AND qS]
The two qualifications apply to disjoint attributes of R CP S.
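Rules 1 to 3 can be sketched as functions that carry the qualification along with the body. The representation below is an assumption made for illustration only: the body is kept as text, and the qualification is kept as a set of atomic predicates that are read as a conjunction. The class name `Qualified` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualified:
    body: str          # the algebra expression, kept as plain text
    qual: frozenset    # atomic predicates, read as a conjunction (AND)


def sl(formula: str, q: Qualified) -> Qualified:
    """Rule 1: SL_F [R: qR] -> [SL_F R: F AND qR]."""
    return Qualified("SL_{" + formula + "} " + q.body, q.qual | {formula})


def pj(attrs: str, q: Qualified) -> Qualified:
    """Rule 2: PJ_A [R: qR] -> [PJ_A R: qR]."""
    return Qualified("PJ_{" + attrs + "} " + q.body, q.qual)


def cp(r: Qualified, s: Qualified) -> Qualified:
    """Rule 3: [R: qR] CP [S: qS] -> [R CP S: qR AND qS]."""
    return Qualified("(" + r.body + " CP " + s.body + ")", r.qual | s.qual)


dept1 = Qualified("DEPT1", frozenset({"DEPTNUM<10"}))
selected = sl("DEPTNUM=1", dept1)
projected = pj("DEPTNUM", selected)
product = cp(Qualified("R", frozenset({"qR"})), Qualified("S", frozenset({"qS"})))

print(selected.body, sorted(selected.qual))
print(product.body, sorted(product.qual))
```

An optimizer can then inspect `qual` for a contradiction without looking at any stored tuple.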
Properties to simplify an operator tree
• R NJN R → R
• R UN R → R
• R DF R → 0
• R NJN SL_F R → SL_F R
• R UN SL_F R → R
• R DF SL_F R → SL_{NOT F} R
• (SL_F1 R) NJN (SL_F2 R) → SL_{F1 AND F2} R
• (SL_F1 R) UN (SL_F2 R) → SL_{F1 OR F2} R
• (SL_F1 R) DF (SL_F2 R) → SL_{F1 AND NOT F2} R
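The properties above can be verified mechanically on a toy relation. In this sketch R has a single attribute, so each tuple is represented by its value; F1 and F2 are invented formulas. The natural join of two relations over the same schema is computed as set intersection, which is an assumption that holds only because both operands share all their attributes.

```python
R = set(range(10))   # single-attribute toy relation (invented data)


def sl(formula, rel):
    """SL_F on a set of values."""
    return {x for x in rel if formula(x)}


def njn(a, b):
    """NJN of two relations over the same schema reduces to intersection."""
    return a & b


def F1(x):
    return x % 2 == 0


def F2(x):
    return x < 5


checks = {
    "R NJN R = R": njn(R, R) == R,
    "R UN R = R": (R | R) == R,
    "R DF R = 0": (R - R) == set(),
    "R NJN SL_F R = SL_F R": njn(R, sl(F1, R)) == sl(F1, R),
    "R UN SL_F R = R": (R | sl(F1, R)) == R,
    "R DF SL_F R = SL_{NOT F} R": (R - sl(F1, R)) == sl(lambda x: not F1(x), R),
    "NJN of selections": njn(sl(F1, R), sl(F2, R)) == sl(lambda x: F1(x) and F2(x), R),
    "UN of selections": (sl(F1, R) | sl(F2, R)) == sl(lambda x: F1(x) or F2(x), R),
    "DF of selections": (sl(F1, R) - sl(F2, R)) == sl(lambda x: F1(x) and not F2(x), R),
}

for name, ok in checks.items():
    print(name, ok)
```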
Simplification when a subexpression of a query is empty
• SL_F (0) → 0
• PJ_A (0) → 0
• R CP 0 → 0
• R UN 0 → R
• R DF 0 → R
• 0 DF R → 0
• R JN_F 0 → 0
• R SJ_F 0 → 0
• 0 SJ_F R → 0
where 0 is the empty relation.
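These rules lend themselves to a bottom-up rewrite of the operator tree. The following is a minimal sketch under stated assumptions: an expression is either a relation name or a tuple `(un_op, child)` or `(bin_op, left, right)`, operator names are written without their subscripts, and the function name `simplify` is hypothetical. The cases `0 UN R`, `0 CP R`, and `0 JN R` are not listed above; they follow from commutativity.

```python
EMPTY = "0"


def simplify(e):
    """Remove empty subexpressions, working from the leaves upward."""
    if isinstance(e, str):                 # a relation name or the empty relation
        return e
    if len(e) == 2:                        # unary operation: SL or PJ
        op, child = e[0], simplify(e[1])
        return EMPTY if child == EMPTY else (op, child)
    op, left, right = e[0], simplify(e[1]), simplify(e[2])
    if op in ("CP", "JN", "SJ"):           # empty if either operand is empty
        return EMPTY if EMPTY in (left, right) else (op, left, right)
    if op == "UN":                         # the empty operand disappears
        if left == EMPTY:
            return right
        if right == EMPTY:
            return left
    if op == "DF":
        if left == EMPTY:                  # 0 DF R -> 0
            return EMPTY
        if right == EMPTY:                 # R DF 0 -> R
            return left
    return (op, left, right)


example = ("UN", ("SL", "0"), ("JN", "EMP", "DEPT"))
print(simplify(example))
```

Because the children are simplified first, an empty leaf can collapse a whole subtree in a single pass.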
Criteria for Query Optimization
• Criterion 1: Use idempotence of selection and projection to generate appropriate selections and projections for each operand relation.
• Criterion 2: Push selections and projections down in the tree as far as possible.
• Criterion 3: Push selections down to the leaves of the tree, and then apply them using the algebra of qualified relations. Substitute the selection result with the empty relation if the qualification of the result is contradictory.
• Criterion 4: Use the algebra of qualified relations to evaluate the qualification of the operands of joins. Substitute the subtree, including the join and its operands, with the empty relation if the qualification of the result of the join is contradictory.
• Criterion 5: In order to distribute joins that appear in the global query, unions (representing fragment collections) must be pushed up, beyond the joins that we want to distribute.
A modified operator tree for query Q1 by criterion 1 (use of idempotence)
Note: R DF SL_F R → SL_{NOT F} R
Decompose query Q2 by criterion 2 (push selections and projections down in the tree)
• Q2: Give the names of employees who work in a department whose manager has number 373 but who do not earn more than $35,000.
PJ_{EMP.NAME} ((EMP JN_{DEPTNUM=DEPTNUM} SL_{MGRNUM=373} DEPT) DF (SL_{SAL>35000} EMP JN_{DEPTNUM=DEPTNUM} SL_{MGRNUM=373} DEPT))
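Q2 has the shape R DF SL_F R, with R being the join of EMP with the selected departments, so the property R DF SL_F R → SL_{NOT F} R applies and the remaining selection on SAL can be pushed down to EMP. The sketch below compares the two forms on invented rows; the reduced schemas EMP(Name, Sal, Deptnum) and DEPT(Deptnum, Mgrnum) and the helper `join` are assumptions made to keep the example short.

```python
# Made-up sample rows; only the attributes used by Q2 are kept.
EMP = [("Ann", 30000, 1), ("Bob", 40000, 1), ("Cat", 35000, 2), ("Dan", 20000, 3)]
DEPT = [(1, 373), (2, 373), (3, 120)]


def join(emps, depts):
    """EMP JN_{DEPTNUM=DEPTNUM} DEPT as a set of concatenated tuples."""
    return {e + d for e in emps for d in depts if e[2] == d[0]}


# SL_{MGRNUM=373} DEPT
managed_by_373 = [d for d in DEPT if d[1] == 373]

# Original form: two joins and a difference.
high_paid = [e for e in EMP if e[1] > 35000]
original = join(EMP, managed_by_373) - join(high_paid, managed_by_373)

# Rewritten form: SL_{NOT SAL>35000} pushed down to EMP, one join.
not_high_paid = [e for e in EMP if not e[1] > 35000]
rewritten = join(not_high_paid, managed_by_373)

# PJ_{EMP.NAME}
names_original = {t[0] for t in original}
names_rewritten = {t[0] for t in rewritten}
print(names_original == names_rewritten)
```

The rewritten form evaluates one join instead of two and filters EMP before joining.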
Decompose query Q3 by criterion 3 (eliminate empty set)
Q3: SL_{DEPTNUM=1} DEPT
Given: DEPT = [DEPT1: DEPTNUM<10] UN [DEPT2: 10<DEPTNUM<20] UN [DEPT3: DEPTNUM>20]
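For Q3, pushing the selection down to each fragment and applying Rule 1 gives the qualification DEPTNUM=1 AND (fragment predicate) for every fragment. A fragment whose combined qualification is contradictory can be replaced by the empty relation. The sketch below encodes the three fragment predicates exactly as given above; the dictionary `FRAGMENTS` and the function `relevant_fragments` are hypothetical names, and the check works only for equality selections on DEPTNUM.

```python
# Fragment qualifications as stated for DEPT1, DEPT2, DEPT3.
FRAGMENTS = {
    "DEPT1": lambda d: d < 10,
    "DEPT2": lambda d: 10 < d < 20,
    "DEPT3": lambda d: d > 20,
}


def relevant_fragments(deptnum):
    """Keep the fragments whose qualification is compatible with DEPTNUM = deptnum.

    For an equality selection, "DEPTNUM = v AND q" is contradictory
    exactly when q is false at v, so evaluating q at v is enough.
    """
    return [name for name, qual in FRAGMENTS.items() if qual(deptnum)]


print(relevant_fragments(1))
```

Under these assumptions only DEPT1 survives, so Q3 reduces to SL_{DEPTNUM=1} DEPT1 and the other two fragments never need to be accessed.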
Decompose query Q4 by criterion 4 (eliminate irrelevant joins) and criterion 5 (push union up in the tree)
Q4: PJ_SNUM (SUPPLY NJN SUPPLIER)
Given:
SUPPLIER => [SUPPLIER1: CITY="SF"] UN [SUPPLIER2: CITY="LA"]
SUPPLY => [SUPPLY1: Snum=SUPPLIER1.Snum] UN [SUPPLY2: Snum=SUPPLIER2.Snum]
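Distributing the join over the two unions yields four fragment joins. The joins SUPPLY1 NJN SUPPLIER2 and SUPPLY2 NJN SUPPLIER1 have contradictory qualifications, because a supplier number cannot belong to both SUPPLIER1 and SUPPLIER2, so they can be dropped. The sketch below illustrates this on invented rows; the helper names `derived` and `njn` are assumptions of this example.

```python
# Made-up sample rows.
SUPPLIER = [(1, "S1", "SF"), (2, "S2", "LA"), (3, "S3", "SF")]   # (Snum, Name, City)
SUPPLY = [(1, 10, 15, 100), (2, 11, 15, 50),
          (3, 10, 20, 70), (2, 12, 20, 30)]                       # (Snum, Pnum, Deptnum, Quan)

SUPPLIER1 = [s for s in SUPPLIER if s[2] == "SF"]
SUPPLIER2 = [s for s in SUPPLIER if s[2] == "LA"]


def derived(supply, suppliers):
    """Derived fragment: the SUPPLY tuples whose Snum appears in suppliers."""
    snums = {s[0] for s in suppliers}
    return [t for t in supply if t[0] in snums]


SUPPLY1 = derived(SUPPLY, SUPPLIER1)
SUPPLY2 = derived(SUPPLY, SUPPLIER2)


def njn(supply, suppliers):
    """SUPPLY NJN SUPPLIER on the common attribute Snum."""
    return {sp + sr[1:] for sp in supply for sr in suppliers if sp[0] == sr[0]}


# Global query: PJ_SNUM (SUPPLY NJN SUPPLIER)
global_result = {t[0] for t in njn(SUPPLY, SUPPLIER)}

# The two cross joins are empty and can be eliminated (criterion 4).
cross_12 = njn(SUPPLY1, SUPPLIER2)
cross_21 = njn(SUPPLY2, SUPPLIER1)

# Union pushed above the two remaining joins (criterion 5).
distributed = ({t[0] for t in njn(SUPPLY1, SUPPLIER1)}
               | {t[0] for t in njn(SUPPLY2, SUPPLIER2)})

print(cross_12, cross_21, global_result == distributed)
```

Each remaining join involves only one fragment of each relation, so in a distributed setting it can be evaluated where those fragments are stored.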
Lecture summary
Many heuristic rules can be applied to reduce the cost of an SQL query:
• Use idempotence of select and project.
• Push select and project down in the operator tree.
• Qualify relations first in order to eliminate contradictory relation(s).
• Evaluate join operations with qualified relation(s).
• Push the union operation up in the operator tree.
Review Question 5
(1) Discuss the reasons for converting SQL queries into relational algebra queries before optimization is done.
(2) Show an example of qualifying the following relation R (Key, Attribute 1, Attribute 2) into relations R1 and R2 by (a) horizontal fragmentation (b) vertical fragmentation.
(3) Show how to reconstruct relation R from relations R1 and R2 after (a) horizontal fragmentation (b) vertical fragmentation.
Make-up Tutorial Question 5
Given:
Relation PATIENT (Pnum, Name, Dept, Treat, Dnum)
Relation CARE (*Pnum, Drug, Quan)
A query is needed to find the names of the patients who are taking the drug "aspirin". List the query
(i) in SQL (20%)
(ii) in relational algebra (20%)
(iii) as an operator tree (20%)
(iv) Given its qualified relations (fragments) as follows:
Relation PATIENT1 = SL_{Dept="surgery" AND Treat="intensive"} PATIENT
Relation PATIENT2 = SL_{Dept="surgery" AND Treat≠"intensive"} PATIENT
Relation PATIENT3 = SL_{Dept≠"surgery"} PATIENT
Relation CARE1 = CARE SJ_{Pnum=Pnum} PATIENT1
Relation CARE2 = CARE SJ_{Pnum=Pnum} PATIENT2
Relation CARE3 = CARE SJ_{Pnum=Pnum} PATIENT3
translate the query into fragment queries (20%) and simplify them for optimization (20%).
Reading Assignment
Chapter 15, "Algorithms for Query Processing and Optimization", of "Fundamentals of Database Systems" by Elmasri and Navathe, 5th edition, Pearson, 2007.