180 likes | 323 Views
CS 257, Spring’08 Presented By: Gayatri Gopalakrishnan ID : 201. Agenda. Query Compilation Physical-Query-Plan operators Scanning Tables Sorting while Scanning Parameters for Measuring cost I/O Cost for Scan Operators Iterators for Implementation of Physical Operators.
E N D
CS 257, Spring’08 Presented By: Gayatri Gopalakrishnan ID : 201
Agenda • Query Compilation • Physical-Query-Plan operators • Scanning Tables • Sorting while Scanning • Parameters for Measuring cost • I/O Cost for Scan Operators • Iterators for Implementation of Physical Operators
Query Processing High Level SQL Query Query Processor Low Level Data manipulation steps
Query Compilation Query Compilation consists of • Parsing – • Parse tree is constructed based on query and its structure • Query Rewrite – • Parse tree converted into the initial query plan. • The initial query plan is an algebraic representation of the query. • Initial query plan is transformed to equivalent query plan which would take lesser time to execute. • Also called Logical Query Plan • Physical plan execution – • Physical Query Plan is generated. • Algorithms selected to implement each operator present in the Logical Query Plan. • After parsing the logical query plan the physical plan is represented by an Expression Tree • Contains details like how the relations are accessed and when they need to be sorted
QueryCompilation SQL Query Parse Query Select Logical Query Plan Select Physical Query Plan Execute Plan
... BDATE LNAME SSN ESSNPNO HOURS PNAME PLOCATION PNUMBER DNUM Example of Transforming a Query • Find the last names of employees born after 1957 who work on a project named ‘Aquarius’. • SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT WHERE PNAME=‘Aquarius’ AND NUMBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’; EMPLOYEE WORKS_ON PROJECT
pLNAME ... PROJECT BDATE LNAME SSN ESSNPNO HOURS PNAME PLOCATION PNUMBER DNUM WORKS_ON EMPLOYEE Initial Tree - Push down select sPNAME=‘Aquarius’ & PNUMBER=PNO & ESNN=SSN & BDATE > 1957-12-31 Select LNAME From EMPLOYEE, WORKS_ON, PROJECT Where PNUMBER=PNO and ESNN=SSN and BDATE > 1957-12-31 and PNAME=‘Aquarius’ X X
pLNAME ESNN=SSN PNUMBER=PNO s BDATE 1957-12-31 EMPLOYEE sPNAME=‘Aquarius’ WORKS_ON PROJECT pESSN pSSN, LNAME pPNUMBER pESSN, PNO
Introduction to Physical Query Plan Operators • Physical query plans are built from physical operators • Physical Operators are implementations of relational algebra operators • There are other kinds of physical operators, that do functions like “SCAN Table”, bring contents of a relation into the main memory • Iterator is a method by which operators of a physical plan can pass tuples among themselves
Scanning Tables • Basic in most physical query plans • Join queries, Union Queries, Queries with a predicate • Two Basic approaches to locate tuples in a relation • Table Scan • Index Scan
Scanning Tables Table Scan • If R is stored on secondary storage • Tuples of R are arranged in blocks • Fetch the blocks one by one Index Scan • Index on an attribute of R • Sparse Index on R can lead us to the blocks holding R • Fetch the tuples based on the Index
Sorting While Scanning Tables • ORDER BY clause requires a Relation R to be sorted • Some operators of relational algebra might require one or both of its arguments to be sorted relations • Physical query plan operator “SORT-SCAN” takes a relation R and specifications of attributes on which the sort is to be made
Implementing SORT-SCAN Operator • If R is sorted by attribute a • R is stored as an indexed sequential file or there is B-tree index on a attribute • Scan of the Index will produce R is sorted order • If R is small enough • Retrieve all tuples into main memory by table or index scan • Use efficient main memory sorting algorithms • If R is too large • Multiway merge approach • Produce sorted block at a time in the main memory as tuples are needed
Parameters for measuring Costs • Memory required to store arguments and intermediate results of the operator • Say M is the number of buffers available for execution of an operator known by the query optimizer • M may be decided during execution, so if the actual availability is less than M, the query takes longer than predicted • Cost of accessing the argument relations • Size and distribution of data in a relation computed periodically to help the optimizer • B( R ) - Gives number of blocks on which the relation is stored • T ( R ) – Gives the number of tuples in relation • V ( R , a) – Number of distinct values that appear in each column • Based on memory and access data parameters the query Optimizer will choose the best physical operator
I/O Costs for Scan Parameters Cost for Table Scan • Assumptions for example • Relation R fits in main memory • Requires a Two phase Multiway merge sort • If R is clustered then number of I/O’s for table scan is 3B • B for reading R in sublists • B for writing out the sublists • B for re-reading the sublists • If R is not clustered then the number of I/O’s can be as much as T + 2B • T for reading R in sublists • B for writing the sublists • B for re-reading the sublists Cost for Index Scan – Examining Index+ Reading the Relation.
Iterators for Implementation of Physical Operators • Iterator is a group of 3 functions • Iterator allows the result of a physical operator to get one Tuple at a time • The three functions in an Iterator • Open • Initializes data structures needed for operation • GetNext • Returns the next Tuple in the result • Adjusts the data structure so that subsequent tuple can be obtained • If No more tuples present, it returns a NotFound • Close • Ends the iteration after all the tuples are obtained
Iterator for Table Scan Open() { b := the first block of B t := the first tuple of block B } GetNext() { IF (t is past the last tuple on block B){ increment B to next block; IF (there is no next block) RETURN NotFound ELSE /* B is a new block */ t := first tuple on block B } /* now we are ready to return t and increment */ oldt= t; increment t to the next tuple of B RETURN oldt; } Close() { }
Thank You References • Databases the Complete Book - by Hector Garcia-Molina (Author), Jeffrey D. Ullman (Author), Jennifer D. Widom (Author) • www.sky.fit.qut.edu.au