350 likes | 526 Views
Chapter 7. Query Processing. Seoul National University Department of Computer Engineering OOPSLA Lab. CONTENTS. Introduction Traversal of Nodes in a Query Graph Cost Model. Introduction. Phases of Query Processing Query Graph Similarities between OO and Relational Queries Join Methods.
E N D
Chapter 7. Query Processing Seoul National University Department of Computer Engineering OOPSLA Lab.
CONTENTS • Introduction • Traversal of Nodes in a Query Graph • Cost Model
Introduction • Phases of Query Processing • Query Graph • Similarities between OO and Relational Queries • Join Methods
Object Algebra Expression Standardized Expression Declarative Query Declarative Optimizer Translator into Algebraic Structure Type Check Execution of the Chosen Plans Type Consistent Expression Optimized Expression Algebraic Optimizer Generator of Execution Plans Phases of Query Processing • Evaluating relational query in two phases • first phase for possible optimization • second phase for actual execution • Only small changes for OO query processing
Terminology Associated with Query • Simple predicate • <name-attribute operator value> • value can be a primitive or an object • Complex predicate • has an adjacent sequence of attributes on a branch of the aggregation hierarchy of a class • Single operand query • Multiple operand query • Target class
Query Graph in OODB • A subset of the schema graph(SG) • Definition of Query Graph (QG) for a single operand query on class C • QG is a connected subgraph of SG for C • QG includes only nodes of SG used in the query • an arc from attr. to a domain in SG can be changed into an arc to a subclass of the domain in QG • the set of arcs from C to the domains of the attributes of C form a direct graph rooted in C • leaf nodes of acyclic branch have simple predicate, whereas others can have simple or complex predicates
Example of Query Graph select :T from Task :T where :T man_years > 2 and :T leader specialization IN {“DB”,”IS”} and :T participant group_name LIKE ‘%CAD%’
Single Operand Query • In relational data model • on a single relation • In OO data model • aggregation hierarchy rooted on the target class • implicit join • inheritance hierarchy rooted on the target class • more powerful than a query on a single relation
Explicit Join • Multiple operand query • Joins between arbitrary pairs of classes • Definition of explicit join between attr. Si, Sj of class Ci, Cj • Ci, Cj are both the root of QGs that correspond to a single operand query on Ci , Cj • QGs for Ci, Cj partially overlap and Si, Sj are both the root of a hierarchy of classes • Query can have more than one target class
Similarities between OO and Relational Queries • Structure of queries is essentially same except for class hierarchies • Implicit and explicit joins between the classes are equivalent to joins in relations • Relational query processing techniques can be used on OO data model • all reasonable permutations between the relations • generation of an execution plan for each permutation • evaluation of the cost of each execution plan • in distributed DB, generation of a global plan
Join Methods considering Class Permutation • Forward order join where class D is the domain of // attribute A of class C • Reverse order join for c in C do begin retrieve c.A; retrieve d in D such that OID(d) = c.A if A is single-valued OID(d) c.A if A is multi-valued; evaluate the predicate on d; end. for d in D do begin retrieve u = OID(d); evaluate the predicate on d; retrieve c in C such that c.A = u if A is single-valued uc.A if A is multi-valued ; end.
Traversal of Nodes in a Query Graph • Visiting Classes in Query Graph • Access Algorithms to Instances of Classes • Query Execution Strategies • Query Execution Graph • Query Optimization • Differences in Query Processing compared with the Relational Model
Methods of Visiting Classes in QG(1) • Forward traversal • C1 is the root class of QG • any depth-first order in QG • ex) (Task Group Researcher) (Task Researcher Group) • Reverse traversal • C1 is a leaf class of QG • Ci can be a leaf class of QG • all the children classes must be visited previously than the parent class • ex) (Group Researcher Task) (Researcher Group Task)
Methods of Visiting Classes in QG(2) • Mixed traversal • C1 is any class of QG • Ci is connected to one of C1,..., Ci-1 by a direct arc towards Ci • Ci and one of C1 ,…, Ci-1 have a common parent in QG • one or more children class of Ci must be in C1,…, Ci-1 • ex) (Group Task Researcher) (Researcher Task Group)
Access Algorithms to Instances of Classes • Nested-loop access method • each instance is processed separately and completely • attributes of an instance are evaluated with concerned predicates • the instance is passed to the parent or child class • Sort-domain (or sort-merge) methods • all instances are instantiated and processed together • they are passed all together to the next proper node • storage pages of instances are accessed only once
Query Execution Strategies • Combining strategies for traversing QG and accessing instances • Nested-loop forward traversal(NLFT) • Sort-domain forward traversal(SDFT) • Nested-loop reverse traversal(NLRT) • Sort-domain reverse traversal(SDRT)
Basic Operations Composing Query • [P] projection • [S] selection • [] intersection • [I] index scanning • [NJ] nested-loop join • [SJ] sort-domain join • [O] arrangement
An Example of Query SELECT Task WHERE man_year > 2 AND Group.group_name LIKE ‘%CAD%’ AND Researcher.specialization IN {“DB”,”IS”}
Query Execution Graph by NLFT group_name LIKE ‘%CAD%’ Man_year>2 JP(1) Group Task S NJ S Specialization IN {“DB” , “IS”} JP(2) Researcher Group P NJ S {Task.OID} {Task.OID} Researcher P JP(1):Task.participant=Group.OID JP(2):Task.leader=Researcher.OID
Query Execution Graph by NLRT {Group.OID} Group S P Group_name LIKE ‘%CAD%’ {Task.OID} man_year>2 NJ P S P {Task.OID} Task JP(1) Specialization IN {“DB” , “IS”} {Researcher.OID} Researcher S P {Task.OID} man_year>2 NJ P S P {Task.OID} JP(2) Task JP(1):Task.participant=Group.OID JP(2):Task.leader=Researcher.OID
Considerations on Query Optimization • Too expensive to generate all possible execution plans • To reduce the complexity of choosing the best strategy • analyze the structure of DB • collect statistics on the frequencies of access • Use of methods in queries • difficult to estimate the cost and selectivity • possible to depend the evaluation order
Differences in Query Processing compared with the Relational Model • Indices • on aggregation hierarchy • on inheritance hierarchy • Multiple-valued attributes • existential or universal quantifiers • Methods on queries • Database statistics • number of instances along class hierarchy • number of pages containing such instances
Cost Model • Mathematical functions to determine the cost of the various types of traversal • Definition of Path • Model Parameters • Cost Functions
Definition of Path • Given an aggregation hierarchy H,path P is defined as C1.A1.A2....An(n1) where • C1is a class in H • A1 is an attribute of class C1 • Ai is an attribute of class Ci in H,such that Ci is the domain of attribute Ai-1 of class Ci-1(1<in) • length(P) : the length of the path • classes(P) : the set of classes along the path • dom(P) : the domain of attribute Anof class Cn
Logical Data Parameters(1) • nci :number of classes in the inheritance having class Ci-1 • Dij : average number of distinct values for attribute Aiof class Ci,j • Di :average number of values for attribute Ai, evaluated W.R.T all the members of class Ci,1 • Ni,j :cardinality of class Ci,j • Nhi :cardinality of the set of members of class Ci,1
Logical Data Parameters(2) • fani,j : average number of references to members of class Ci+1,1 contained in attribute Ai of an instance of class Ci,j • fani : average number of references to members of class Ci+1,1 contained in attribute Ai of a member of class Ci,1 • di,j : average number of instances in class Ci,j • di : average number of members of class Ci,1 • ki,j : average number of instances in class Ci,j • khi : average number of members of class Ci,1
Physical Data Parameters • Pij : number of pages for the instances of class Ci,j • Phi :number of pages for the members of class Ci,1 • ri : binary variable which assumes the value • 1, if the members of Ci,1 have reverse reference to the members of Ci-1,1 • 0, if not, 2 i n
Query Parameters • NIi : number of members of class Ci,1 on which the nested predicate must be evaluated • APi : number of pages effectively accessed to find the members of class Ci,1
Derived Parameters • RefBy(i, s, y, k) : average # of values contained in Ay for a set of kinstances of Ci,s • RefByh(i, s, y, k) : ... for a set of kmembers of Ci,s • : average number of instances of class Ci,j • : average number of members of class Ci,1 • Ref(i, y, s, k) : average # of instances of class Ci,s with avalue in a set of k elements, 1 i y n, as the value of the nested Attr. Ay • Refh(i, y, k) : average # of members ...
Assumptions for Cost Model • Independent distributions of values of the various attributes • Uniform distribution of the values of the each attribute • No relation between the cardinality of the instances of a class and that of the other classes in the same inheritance hierarchy
Cost Function • Nested predicate C1.A1...An op exp • op is a relation operator • exp is an expression • evaluated with a forward or reverse traversal • Types of distribution of the values of attribute Ai • disjunctive distribution • each value of is Ai used in a single class • inclusive distribution
NLFT Cost Functions 1. 2. 3. Assume the total # of instances = # of pages APk = NIk 4. Total Cost
SDFT Cost Functions 1. 2. 3.Total cost
NLRT Cost Functions 1. 2. 2.1. 2.2. 3. 4. Total cost
SDRT Cost Functions 1. 1.1 2. 2.1 3. Total cost