180 likes | 269 Views
Query Processing – Query Trees. Evaluation of SQL. Conceptual order of evaluation Cartesian product of all tables in from clause Rows not satisfying where clauses eliminated Rows grouped by Groups not satisfying having eliminated Select clause target list evaluated
E N D
Evaluation of SQL • Conceptual order of evaluation • Cartesian product of all tables in from clause • Rows not satisfying where clauses eliminated • Rows grouped by • Groups not satisfying having eliminated • Select clause target list evaluated • If distinct eliminate duplicate rows • Union after each subselect evaluated • Rows sorted in order by
Actual Order of Evaluation • Order chosen by query optimizer • Determines most efficient way • Syntax checking phase • scan - identify tokens of text • parse- check syntax • validate - check attributes and relation names • Query optimization phase • create internal representation - query tree • identify execution strategies - plan • Maintains statistics for tables and columns, indexes • choose suitable plan in query to optimize query e.g. order of execution of ops, use indexes, etc.
Evaluation cont’d • Execution phase • query optimizer produces execution plan • code generator generates code • runtime db processor runs query code • To minimize run time • chosen strategy NOT optimal, but reasonably efficient For procedural languages limited need for query optimization
Heuristics in Query Optimization • Apply select and project before join or other binary operations. Why? • select and project reduce size • strategy is obvious, but challenge was to show could be done with rules
Query Optimization • Canonical form (initial query tree - conceptual order of evaluation) • Leaf nodes are tables • Internal nodes are operations • Begin by separating select conditions from joins (end up with X) • Combine all selects then all projects • transform to final query tree using general transformation rules for relational algebra
Query tree for SQL query Select lname From employee, works_on, project Where pname=‘Aquarius’ and pnumber=pno and essn=ssn and bdate > ‘1957-12-31’
General Transformation Rules for Relational Algebra • Cascade of s • Commutativity of s • Cascade of p • Commuting s with p • Commutativity of |X| (and X) • Commuting s with |X| (or X) • Commuting p with |X| (or X)
General Transformation Rules for Relational Algebra 8. Commutativity of set operations U and ∩ 9. Associativity of |X|, X, U and ∩ 10. Commuting s with set operations 11. The p operation commuted with U 12. Converting a (s, X) sequence into |X|
Outline of a Heuristic Algebraic Optimization Algorithm • Use Rule 1 break up conjunctive s’s into cascades of s’s • Use Rules 2,4,6, 10 for commutativity of s to: • move s’s as far down tree as possible • Use Rules 5 and 9 for commutativity and associativity of binary operations to: • Place most restrictive s (and |X|) so executed first • fewest tuples, smallest absolute size or smallest selectivity • But make sure no cartesian products result
Outline of a Heuristic Algebraic Optimization Algorithm • Use Rule 12, combine Cartesian product with s to: • create |X| • Use Rules 3, 4, 7, 11 concerning cascade of p’s and commuting p with other ops to: • move down tree as far as possible • Identify subtrees that represent groups of operations than can be executed by a single algorithm
Summary of Heuristics • Apply first operations that reduce size of intermediate results • Perform s’s and p’s as early as possible (move down tree as far as possible) • Execute most restrictive s and |X| first (reorder leaf nodes but avoid cartesian products)
Multiple table joins • Multiple table joins • Query plan identifies most efficient ordering of joins • may not have to materialize actual table resulting from join • instead use pipelining - successive rows output from one step fed into next plan
Converting trees into Query plans • pipelined evaluation • Forward result from an operation directly to next operation • Result from s, placed in buffer • |X| consumes tuples from buffer • Result from |X| pipelined to p
Query Tree Question • Should we do a ppname, pnumber then spname = ‘Aquarius’ then ppnumber ? • No, since the operations are done together • the processor would read a row of project, see if pname = ‘Aquarius’ then use pnumber to perform the join.
Algorithms • DBMS has general access algorithms to implement select, join, or combinations of ops • Don't have separate access routines for each op • Creating temporary files is inefficient • Generate algorithms for combinations of operations • join, select, project - code is created dynamically to implement multiple operations
Materialized table • Think about what operations require utilizing a materialized table • Input to select? • Input to project? • Input to join? • How is this implemented? Next topic