290 likes | 497 Views
Csci 8735 – Advanced Database Systems. Paper Presentation by: Pavithra Panneerselvam February 14, 2011. Semantic Optimization Techniques for Preference Queries. Author: Jan Chomicki Department of Computer Science and Engineering University of Buffalo, Buffalo, New York.
E N D
Csci 8735 – Advanced Database Systems Paper Presentation by: PavithraPanneerselvam February 14, 2011
Semantic Optimization Techniques for Preference Queries Author: Jan Chomicki Department of Computer Science and Engineering University of Buffalo, Buffalo, New York
Semantic Optimization • Semantic information present in the form of Integrity Constraints can be used for Semantic Optimization. • Query optimization improves the query execution time. • Integrity constraints are used to ensure accuracy and consistency in data in a database. (eg. UNIQUE, NOT NULL, FOREIGN KEY etc).
Preferences • Preferences are used to filter and personalize the information reaching the users of any information system. • Find the best answers to a query instead of all the answers. • In database systems, preferences are usually captured as preference relations that are used to build preference queries. • Preference operators - Winnow (ωC )selects from the given relation a set of most preferred tuples according to given preference relation.
Preferences (Contd) • Relation: Book • Preference Relation (≻C ): Prefer one Book tuple to another if and only if their ISBNs are the same and the Price of the first is lower. • Preference Operator : Winnow (ωC) return the tuples in table 2.
Problem Statement • In the context of semantic optimization of preference queries, two fundamental semantic properties to be identified: • Containment of preference relations relative to integrity constraints. • Satisfaction of order axioms relative to integrity constraints. • To develop optimizing techniques that: • Remove redundant occurrences of winnow. • Coalesce consecutive applications of winnow. • Recognize when more efficient evaluation of winnow is possible. • Domains considered are infinite domains: • D (un-interpreted constants) • Q (Rational numbers)
Definition 1 Given a relation schema R(A1 · · ·Ak) such that Ui, 1 ≤ i ≤ k, is the domain (either D or Q) of the attribute Ai, A relation ≻ is a preference relation over R if it is a subset of (U1 × · · · × Uk) × (U1 × · · · × Uk). • Irreflexivity: ∀x, x ≠≻ x • Asymmetry: ∀x, y, x ≻ y ⇒ y ≠≻ x, • Transitivity: ∀x, y, z, (x ≻ y ∧ y ≻ z) ⇒ x ≻ z, • Negativetransitivity: ∀x, y, z. (x ≠≻ y ∧ y ≠≻ z) ⇒ x ≠≻ z, • Connectivity: ∀x, y, (x ≻ y) ∨ (y ≻ x) ∨ x = y
Definition 1 -Contd The relation ≻ is said to be: • A strict partial order if it is irreflexive, asymmetric and transitive. • A weak order if it is a negatively transitive strict partial order; • A total order if it is a connected strict partial order.
Definition 2 • A preference formula (pf) C(t1, t2) is a first-order formula defining a preference relation ≻C in the standard sense, namely t1 ≻C t2 iffC(t1, t2). • An intrinsic preference formula (ipf) is a preference formula that uses only built-in predicates.
Domains • Two specific domains, D and Q, are considered. • Two types of variables, D-variables and Q-variables, and two kinds of atomic formulas: • Equality constraints: x = y, x ≠ y, x = c, or x ≠ c, where x and y are D-variables, and c is an un-interpreted constant; • Rational-order constraints: xθy or xθc, where θ ∈ {=, ≠,<,>,≤,≥}, x and y are Q-variables, and c is a rational number.
Indifference Relation and Prioritized Composition Every preference relation ≻C generates an indifference relation ∼C: Two tuples t1 and t2 are indifferent (t1 ∼C t2) if neither is preferred to the other one, i.e., t1 ≠≻C t2 and t2 ≠≻Ct1. • Complex preference relations can be easily defined using Boolean connectives. • Here we define a special operator: prioritized composition. The prioritized composition ≻C1 ≻C2has the following intuitive reading: prefer according to ≻C2unless ≻C1is applicable.
Definition 3 • Consider two preference relations ≻C1and ≻C2defined over the same schema R. The prioritized composition ≻C12= ≻C1C2of ≻C1and ≻C2 is a preference relation over R defined as: • t1 ≻C1C2t2 ≡ t1 ≻C1 t2 ∨ (t1 ∼C1 t2 ∧ t1 ≻C2t2).
Winnow Operator If R is a relation schema and C a preference formula defining a preference relation ≻Cover R, then the winnow operator is written as ωC(R), and for every instance r of R: • ωC(r) = {t ∈ r | ¬∃t′ ∈ r, t′ ≻C t}. • A preference query is a relational algebra query containing at least one occurrence of the winnow operator.
Examples Example a: • Consider the relation Book(ISBN, Vendor, Price) • The preference relation ≻C1 from this example can be defined using the rational-order ipf C1: • (i, v, p) ≻C1 (i′, v′, p′) ≡ (i = i′) ∧ (p < p′) • The answer to the preference query ωC1(Book) provides for every book the information about the vendors offering the lowest price for that book. Note that the preference relation ≻C1 is a strict partial order. Example b: • I prefer Warsaw to any other city and prefer any city to Moscow. • This preference relation can be formulated as an equality ipf C3: • x ≻C3 y ≡ x = ′Warsaw′ ∧ y ≠ ′Warsaw′ ∨ x ≠ ′Moscow′ ∧ y = ′Moscow′
Constraint-Generating Dependencies • A single relation schema R is considered and all the integrity constraints are over that schema. • The set of all instances of R satisfying a set of integrity constraints F is denoted as Sat (F). • F entails an integrity constraint f, written F ⊢ f, if every instance satisfying F also satisfies f.
Definition 5 • A constraint-generating dependency (CGD) can be expressed a formula of the following form: • ∀t1. . . . ∀tk, R(t1) ∧ · · · ∧ R(tk) ∧ γ (t1, . . . tk) ⇒ γ′(t1, . . . tk) where γ(t1, . . . tk) and γ′(t1, . . . tk) are constraints. Such a dependency is called a k-dependency. • Functional dependencies (FDs) are 2-CGDs, because a functional dependency (FD) f ≡ X → Y , where X and Y are sets of attributes of R, can be written down as the following logic formula: • ∀t1.∀t2. R(t1) ∧ R(t2) ∧ t1[X] = t2[X] ⇒ t1[Y ] = t2[Y ].
Examples: • Consider the relation Empwith attributes Name, Salary, and Manager, with Name being the primary key. The constraint that no employee can have a salary greater than that of her manager is a CGD: • ∀n, s,m, s′,m′, Emp(n, s,m) ∧ Emp(n, s′,m′) ⇒ s ≤ s′. • Single-tuple constraints are a special case of CGDs. For example, the constraint that no employee can have a salary over $200000 is expressed as: • ∀n, s,m, Emp(n, s,m) ⇒ s ≤ 200000.
Definition 6 • Containment of preference relations and satisfaction of order axioms are defined as: • A preference relation ≻C1 over a schema R is contained in a preference relation ≻C2 over the same schema relative to a set of integrity constraints F, written as ≻C1⊆F ≻C2if • ∀r ∈ Sat (F). ∀t1, t2 ∈ r. t1 ≻C1t2 ⇒ t1 ≻C2t2. • Satisfaction of order axioms relative to integrity constraints is defined similarly – by relativizing the universal quantifiers in the axioms.
Definitions 7 and 8 • A preference ≻Cover a schema R is a strict partial order relative to a set of integrity constraints F if: • ∀r ∈ Sat (F). ∀t ∈ r. t ≠≻Ct • ∀r ∈ Sat (F). ∀t1, t2, t3 ∈ r. t1 ≻ Ct2 ∧ t2 ≻ Ct3 ⇒ t1 ≻ Ct3. • A preference ≻ Cover a schema R is a weak order relative to a set of integrity constraints F if it is a strict partial order relative to F and • ∀r ∈ Sat (F). ∀t1, t2, t3 ∈ r. t1 ≠≻ Ct2 ∧ t2 ≠≻ Ct3 ⇒ t1 ≠≻ Ct3.
Eliminating Redundant Occurrences of Winnow Operator • Two situations where winnow operator is considered to be eliminated: • Single application of winnow does not remove any tuples. • Interaction between two consecutive winnows is such that one can be eliminated. • Given a set of integrity constraints F, the operator ωC is redundant relative to a set of integrity constraints F if ∀r ∈ Sat(F), ωC(r) = r. • Assume F is a set of integrity constraints over a schema R. If ≻C1 and ≻C2 are preference relations over R such that ≻C1 and ≻C2 are strict partial orders relative to F and ≻C1 ⊆F ≻C2 , then for all instances r ∈ Sat(F): • ωC1(ωC2(r)) = ωC2(ωC1(r)) = ωC2(r).
Computing Winnow • Algorithms that have good blocking behavior and thus are capable of efficiently processing very large data sets. • Blocked Nested Loops: • Basic algorithm for evaluating winnow. • Needs a fixed amount of main memory and a temporary tables for tuples whose status cannot be determined. • Winnow for Weak Orders: • Requires single pass over input. • Uses additional memory to keep track of current bucket.
BNL: Algorithm • (1) clear the window W and the temporary table F; • (2) make r the input; • (3) repeat the following until the input is empty: • (a) for every tuple t in the input: • t is dominated by a tuple in W ⇒ ignore t, • T dominates some tuples in W ⇒ eliminate the dominated tuples and insert t into W, • if t and all tuples in W are mutually indifferent ⇒insert t into W (if there is room), otherwise add t to F; • (b) output the tuples from W that were added there when F was empty, • (c) make F the input, clear the temporary table.
Winnow for Weak Orders • (1) top := the first input tuple • (2) B := {top} • (3) for every subsequent tuple t in the input: • t is dominated by top ⇒ ignore t, • t dominates top ⇒ top := t; B := {t} • t and top are indifferent ⇒ B := B ∪ {t} • (4) output B
Propagation of Integrity Constraints • How do we know whether a specific CGD holds in a relation? If this is a database relation, then the CGD may be enforced by the DBMS or the application. • If the relation is computed, then we need to determine if the CGD is preserved in the expression defining the relation. • We already know that the CGD dC1holds in the result of the winnow ωC. • Winnow returns a subset of a given relation, thus it preserves all the CGDs holding in the relation. The following theorem characterizes all the dependencies holding in the result of winnow. • Assume F is a set of CGDs, f a CGD over a schema R, and ≻C an irreflexive preference relation over R. Then F ∪ dC1 ⊢ f iff for every r ∈ Sat (F), ωC(r) ∈ Sat (f).
Computational complexity • The computational issues involved in checking the semantic properties essential for the semantic optimization of preference queries are addressed. • It is shown that such properties can be formulated in terms of the entailment of CGDs. • The paper deals with k-dependencies for some fixed k ≥ 1. For example, for FDs k = 2.
Related Work • U. S. Chakravarthy, J. Grant, and J. Minker. Logic-Based Approach to Semantic Query Optimization. ACM Transactions on Database Systems, 15(2):162–207, 1990. • Q. Cheng, J. Gryz, F. Koo, C. Leung, L. Liu, X. Qian, and B. Schiefer. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. In International Conference on Very Large Data Bases (VLDB), 1999. • M. Maher and J. Wang. Optimizing Queries in Extended Relational Databases. In International Conference on Database and Expert Systems Applications (DEXA), pages 386–396, 2000. • D. E. Simmen, E. J. Shekita, and T. Malkemus. Fundamental Techniques for Order Optimization. In ACM SIGMOD International Conference on Management of Data, pages 57–67, 1996. • X. Zhang and Z. M. Ozsoyoglu. Implication and Referential Constraints: • A New Formal Reasoning. IEEE Transactions on Knowledge and Data Engineering, 9(6):894–910, 1997.
Conclusion The paper: • Presented several novel techniques for semantic optimization of preference queries, focusing on the winnow operator. • Characterized the necessary and sufficient conditions for the applications of those techniques in terms of the entailment of constraint-generating dependencies. • Was able to leverage some of the computational complexity results of other papers and proved several new complexity results: Theorems 9 and 10. Theorem 3 is also completely new.
Future Work Further work can address, for example, the following issues: • Identifying other semantic optimization techniques for preference queries. • Expanding the class of integrity constraints by considering, e.g., tuple-generating dependencies and referential integrity constraints. • Deriving further tractable cases of (relative) containment and satisfaction of order axioms, studying the preservation of general constraint-generating dependencies by relation algebra operators and expressions. • Identifying weaker but easier to check sufficient conditions for the application of our techniques.
Thank You! Questions?