210 likes | 333 Views
CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book. Krishna Vellanki 124. Introduction. What is Query Processor? Group of components of a DBMS that converts a user queries and data-modification commands into a sequence of database operations
E N D
CS 257Chapter – 15.9 Summary of Query ExecutionDatabase Systems: The Complete Book Krishna Vellanki 124
Introduction What is Query Processor? • Group of components of a DBMS that converts a user queries and data-modification commands into a sequence of database operations • It also executes those operations • Must supply detail regarding how the query is to be executed
Building Blocks of Query processing • Query Execution: • The algorithms that manipulate the data of the database. • Focus on the operations of extended relational algebra. 3
Outline of Query Compilation 4 Query compilation • Parsing: A parse tree for the query is constructed • Query Rewrite: The parse tree is converted to an initial query plan and transformed into logical query plan (less time) • Physical Plan Generation: Logical Q Plan is converted into physical query plan by selecting algorithms and order of execution of these operator.
Scanning Tables 5 • One of the basic thing we can do in a Physical query plan is to read the entire contents of a relation R. • Variation of this operator involves simple predicate, read only those tuplesof the relation R that satisfy the predicate. • Basic approaches to locate the tuplesof a relation R • Table Scan • Relation R is stored in secondary memory with its tuples arranged in blocks • It is possible to get the blocks one by one • Index-Scan • If there is an index on any attribute of Relation R, we can use this index to get all the tuples of Relation R
Sorting While Scanning Tables 6 Number of reasons to sort a relation • Query could include an ORDER BY clause, requiring that a relation be sorted. • Algorithms to implement relational algebra operations requires one or both arguments to be sorted relations. • Physical-query-plan operator sort-scan takes a relation R, attributes on which the sort is to be made, and produces R in that sorted order
Parameters for Measuring Costs 7 • Parameters that affect the performance of a query • Buffer space availability in the main memory at the time of execution of the query • Size of input and the size of the output generated • The size of memory block on the disk and the size in the main memory also affects the performance • B: The number of blocks are needed to hold all tuples of relation R. Also denoted as B(R). • T is the number of tuples in relation R, also denoted as T(R). • V: The number of distinct values that appear in a column of a relation R • V(R, a)- is the number of distinct values of column for a in relation R
One-Pass Algorithms for Database Operations The choice of an algorithm for each operator is an essential part of the process of transforming a logical query plan into a physical query plan. • Main classes of Algorithms: • Sorting-based methods • Hash-based methods • Index-based methods • Division based on degree difficulty and cost: • 1-pass algorithms • 2-pass algorithms • 3 or more pass algorithms
One-Pass Algorithm Methods 9 One-Pass Algorithms for Tuple-at-a-Time Operations: selection and projection One-Pass Algorithms for Unary, fill-Relation Operations: Duplicate Elimination and Grouping One-Pass Algorithms for Unary, fill-Relation Operations: Binary operations including Union, Intersection, Difference, Product and Join
Nested Loop Joins • Used for relations of any side. • Not necessary that relation fits in main memory • Uses “One-and-a-half” pass method in which for each variation: • One argument read just once. • Other argument read repeatedly. • Two kinds: • Tuple-Based Nested Loop Join • Block-Based Nested Loop Join
Improvement & Modification To decrease the cost • Method 1: Use algorithm for Index-Based joins • We find tuple of R that matches given tuple of S • We need not to read entire relation R • Method 2: Use algorithm for Block-Based joins • Tuples of R & S are divided into blocks • Uses enough memory to store blocks in order to reduce the number of disk I/O’s.
Physically Unrealizable Behaviors Read too Late Transaction T tries to read too late
Write too Late Transaction T tries to write too late
Problem with dirty data T could perform a dirty read if it is reads X
A write is cancelled because of a write with a later timestamp, but the writer then aborts
Two passed Algorithm based on hashing Hashing is done if the data is too big to store in main memory buffers. • Hash all the tuples of the argument(s) using an appropriate hash key. • For all the common operations, there is a way to select the hash key so all the tuples that need to be considered together when we perform the operation have the same hash value. • This reduces the size of the operand(s) by a factor equal to the number of buckets.
Steps to be followed for a Two passed Algorithm based on hashing • Duplicate Elimination • Grouping and Aggregation • Union, Intersection, and Difference • Hash-Join Algorithm
Sort based Vs Hash based • For binary operations, hash-based only limits size to min of arguments, not sum • Sort-based can produce output in sorted order, which can be helpful • Hash-based depends on buckets being of equal size • Sort-based algorithms can experience reduced rotational latency or seek time
15.6Index based Algorithms • Clustered Relation: Tuples are packed into roughly as few blocks as can possibly hold those tuples • Clustering indexes: Indexes on attributes that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them • A relation that isn’t clustered cannot have a clustering index • A clustered relation can have nonclustering indexes