480 likes | 636 Views
Topics covered. Databases QL Query containment An evaluation of QL. Subsystem1. Subsystem2. Subsystem3. A simple case of information integration. SQL Server. Global Schema. SQL. Open, Scan. A single table: T. A “local as view” (LAV) integration schema: T ´ Q 2 .
E N D
Topics covered • Databases • QL • Query containment • An evaluation of QL
Subsystem1 Subsystem2 Subsystem3 A simple case of information integration SQL Server Global Schema SQL Open, Scan A single table: T • A “local as view” (LAV) integration schema: T ´Q2. • User submits Q1. • Query optimizer must determine if a scan of T suffices. • True iff Q1 is equivalent to Q2.
In the beginning … Infinite countable sets of each of the following kinds of symbols: C = {C1, C2, … } (primitive concepts) A = {A1, A2, …} [ {B1, B2, …} (attributes) R = {R1, R2, … } (roles) Conventions: Attributes (resp. primitive concepts and roles) correspond to words in lower case or to positive integers (resp. words in upper case and words in mixed case).
For a particular database I hD, (¢)Ii where D is a countable possibly infinite domain, and where for each symbol (C)IµD (A)I : D!D (R)Iµ (D£D)
A e1 e2 R e1 e2 Partial databases e e2D e : {C1, … , Cn} e2 (Ci)I (A)I(e1) = e2 (e1, e2) 2 (R)I e1=e2 e1 e2 e1 e2 e1¹e2
name “John” EMP name age e1: {EMP} “John” “Mary” 33 33 age 33 e2: {EMP} name “Mary” Relational databases
name “John” {e1, e2, “John”, 33, “Mary”} µD e1: {EMP} age 33 e2: {EMP} name “Mary” Relational databases (cont’d) {e1, e2} µ (EMP)I (name)I(e1) = “John” (age)I(e1) = (age)I(e2) = 33 e1¹e2 “John” ? 33
Conjunctive QL with bag semantics† Positive QL with bag semantics‡ First order QL with bag semantics (semantics) Conjunctive QL Positive QL First order QL (expressiveness) Dialects of QL †[Khizder et al., 1999], ‡[Lui et al., 2002]
Conjunctive QL Q ::= D as A (quantification) | A1 = A2.R (unnest) | A1.Pf1 = A2.Pf2(selection) | elim A1, … , AnQ (projection) | true (null tuple) | from Q1, Q2 (natural join) | ( Q ) D ::= THING | C (basic description) Pf ::= id | A.Pf (path function)
Well formed queries: a(Q) • a(D as A) ´ {A} • a(A1 = A2.R) ´ {A1, A2} • a(A1.Pf1 = A2.Pf2) ´ {A1, A2} • a(elim A1, … , AnQ) ´ {A1, … , An} • a(true) ´; • a(from Q1, Q2) ´a(Q1) [a(Q2) Require {A1, … , An} µa(Q) for projection operators.
Tuples and bags A (duplicate) tuplet with attribute bindings for attributes {A1, … , An} over a database I = hD, (¢)Ii has the general form hA1 : e1, … , An : en , cnt : ii, where {e1, … , en} µD, “cnt” is a distinct symbol not used as an attribute, and i a positive integer. A set of duplicate tuples that contain the same attribute bindings is called a bag.
Operations on tuples • a(t) ´ set of attributes occurring in t • t@cnt ´ integer i such that “cnt : i” occurs in t • t@A´ element e2D such that “A : e” occurs in t; • defined only when A2a(t) • t[{A1, … , An}] ´ {A1 : t@A1, … , An : t@An}; • defined only when {A1, … , An} µa(t) • [t] ´t[a(t)]
Semantics The meaning of a query Q, denoted «Q¬, is a function that maps databases to bags. The behavior of this function on a particular database I = hD, (¢)Ii is defined as follows. «THING as A¬(I) ´ {hA : e, cnt : 1i : e2D } «C as A¬(I) ´ {hA : e, cnt : 1i : e2 (C)I} «A1 = A2.R¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (e2, e1) 2RI} «A1.Pf1 = A2.Pf2¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (Pf1)I(e1) = (Pf2)I(e2)} where (id)I´ {(e, e) : e2D} (A.Pf )I´ {(e1, e2) : (Pf )I((A)I(e1)) = e2}
Semantics (cont’d) «elim A1, … , An Q¬(I) ´; , if not well formed; otherwise {hA1 : t@A1, … , An : t@An , cnt : 1i : t2«Q¬(I)} «true¬(I) ´ {hcnt : 1i} «from Q1, Q2¬(I) ´ {t : a(t) = a(Q1) [a(Q2) Æ9t12«Q1¬(I), t22«Q2¬(I) : t@cnt = t1@cnt £t2@cnt Æt[a(t1)] = [t1] Æt[a(t2)] = [t2]}
Syntactic sugar A1. L .An.id ´A1. L .An select distinct A1, … , An Q´ elim A1, … , An Q select * Q´Q Q1 where Q2´ from Q1, Q2 Q1 and Q2´ from Q1, Q2 from ´ true from Q1, Q2, … , Qn´ from (from Q1, Q2, …) Qn
Examples The names of employees who have the same age as another employee with a given name. select distinct :p, name from EMP as e, ( select distinct :p, e1 from EMP as e1, EMP as e2 where e1.age = e2.age and e2.name = :p ) where e.name = name and e.id = e1.id
Method calls (more syntactic sugar) A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.Pfn select distinct A1, … , An ´ from C as A where A.1 = A1.Pf1 and … and A.n = An.Pfn A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) as An ´A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.id
Examples (cont’d) The names of employees who have an age double that of another employee. select distinct name from EMP as e, ( select distinct e1 from EMP as e1, EMP as e2 where e2.age.+(e2.age) = e1.age ) where e.name = name and e.id = e1.id
Conjunctive datalog (more syntactic sugar) C(A1, … , An) select distinct A1, … , An ´ from C as A where A.1 = A1.id and … and A.n = An.id (A1, … , Am) :- Q1, … , Qn. ´ select distinct A1, … , Am from Q1, … , Qn
Positive QL • Q ::= empty A1, … , An (empty set) • | Q1 union Q2 (union) • a(empty A1, …, An) ´ {A1, …, An} • a(Q1 union all Q2) ´a(Q1) • Require a(Q1) = a(Q2) in union operations.
Semantics «empty A1, … , An¬(I) ´; «Q1 union Q2¬(I) ´ {t : t@cnt =1 Æa(t) = a(Q1) Æa(t) = a(Q2) Æ ( ( 9 t12«Q1¬(I) : [t] = [t1] Æ:9t22«Q2¬(I) : [t] = [t2] ) Ç ( 9t22«Q2¬(I) : [t] = [t2] Æ:9 t12«Q1¬(I) : [t] = [t1] ) Ç ( 9 t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1] Æ [t] = [t2] ) )}
First order QL Q ::= Q1 minus Q2 (difference) a(Q1 minus Q2) ´a(Q1) Require a(Q1) = a(Q2) in difference operations.
Semantics «Q1 minus Q2¬(I) ´ {t : t@cnt = 1 Æa(t) = a(Q1) Æa(t) = a(Q2) Æ ( 9t1 2 «Q1¬(I) : [t] = [t1] ) Æ ( :9t22«Q2¬(I) : [t] = [t2] )}
QL with duplicates Q ::= select A1, … , An Q (duplicate preserving projection) | Q1 union all Q2 (bag union) | Q1 minus all Q2 (bag difference)
Well formed queries (cont’d) • a(select A1, … , An Q) ´ {A1, … , An} • a(Q1 union all Q2 ) ´a(Q1) • a(Q1 minus all Q2) ´a(Q1) • Require a(Q1) = a(Q2) in bag union and bag difference operations, and that {A1, … , An} µa(Q) in (duplicate preserving) projection operations.
Semantics «select A1, … , An Q¬(I) ´; , if not well formed and representable†; otherwise {hA1 : t1@A1, … , An : t1@An , cnt : ni : t12«Q¬(I) Æn = å (t2@cnt) } t22«Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}] †The selection operation is representable on database I iff, for every t12«Q¬(I), |{t22«Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}]}| is finite.
Example A duplicate preserving projection operation that is not representable in any database with an infinite domain. select e1 from THING as e1, THING as e2 Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.
Semantics (cont’d) «Q1 union all Q2¬(I) ´; , if not well formed; otherwise {t2«Q1¬(I) : :9t22«Q2¬(I) : [t] = [t2]} [ {t2«Q2¬(I) : :9t12«Q1¬(I) : [t] = [t1]} [ {t : 9t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1]} Æ [t] = [t2] Æt@cnt = t1@cnt + t2@cnt} «Q1 minus all Q2¬(I) ´; , if not well formed; otherwise {t2«Q1¬(I) : :9t22«Q2¬(I) : [t] = [t2]} [ {t : 9t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1]} Æ [t] = [t2] Æt@cnt = t1@cnt -t2@cnt Æt1@cnt >t2@cnt }
at, =, elim, true, from, select at, =, elim, true, from, select, empty, union all at, =, elim, true, from, select, empty, union all, minus all (bag semantics) at, =, elim, true, from at, =, elim, true, from, empty, union at, =, elim, true, from, empty, union, minus (set semantics) (positive) (first order) (conjunctive) Summary
Query contexts An expression Q[] in the language QL enriched by an additional terminal symbol [] is called a query context. For a query Q12QL, the expression Q1[Q2] denotes the syntactical substitution of Q2 for []. Q2 is compatible with Q1 if Q1[Q2] 2QL. For example, Q2 is compatible with Q1 in the following. Q2: EMP as e where e.name = :p Q1: select distinct :p, d from DEPT as d, [] where d = e.dept
The query equivalence problem Q1is equivalent toQ2for databaseI, written I² (Q1´Q2 ), if «Q1¬(I) = «Q2¬(I). A query equivalence dependencyEhas the form (Q1´Q2). E = (Q1´Q2) is an axiom if, for any database I, I² (Q1´Q2 ). A query equivalence problem for a given set of query equivalence dependencies is to determine if a given member of the set is an axiom.
Some axioms Question: Is it true that any E with the following form is an axiom? (elim A1, … , AmQ1)[elim B1, … , BnQ2] ´ elim A1, … , AmQ1 [Q2] Answer: No. However, any such E is an axiom if any attribute in (a(Q2) – {B1, … , Bn}) does not occur in query context (elim A1, … , AmQ1 []).
Excluding variable reuse in QL Q has an occurrence ofvariable reuse if there is a query context Q1[] and a query of the form elim A1, … , AnQ2 or of the form select A1, … , AnQ2 such that Q = Q1[Q2] and there exists A in (a(Q2) – {A1, … , An}) that also occurs in Q1[]. Observation: For any Q1, there exists an equivalent class of query Q2 that has no occurrence of variable reuse.
The query containment problem Q1is contained inQ2fordatabaseI, written I² (Q1vQ2), if, for any tuple t1 in «Q1¬(I), there exists t2 in «Q2¬(I) such that [t1] = [t2] and t1@cnt t2@cnt. A query containment dependencyC has the form (Q1vQ2). C = (Q1vQ2) is an axiom if, for any database I, I² (Q1vQ2). A query containment problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom.
Equivalence and containment Observation: Equivalence reduces to containment. Q1´Q2 iff Q1vQ2 and Q2vQ1 Observation: Containment reduces to equivalence in first order QL. Q1vQ2 iff (Q1 minus all Q2) ´ empty a(Q1)
Some complexity results Theorem: The query equivalence and containment problems for conjunctive QL is NP-complete.† †Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.
A decision procedure • Theorem: The following procedure decides if C = (Q1vQ2) is an • axiom for conjunctive QL.† • Freeze the body of Q1 by creating a partial database consisting of individuals that include its variables. • If the tuple • hA1 : A1 , … , An : An , cnt : 1i • occurs in «Q2¬(I), where a(Q1) = {A1, … , An}, then return true; otherwise return false.‡ • †Derived from [Ullman, 1999]. • ‡Use forced semantics for selection operations.
R A2 A1 A2 Am … A1 B2 Bn … B1 Obtaining a partial database from Q A THING as A A1 = A2.R C as A A : {C} A1.A2. L .Am = B1.B2. L .Bn
A A u : L1 v : L3 u : L1 v : L3 A A w : L2 x : L4 w : L2 x : L4 A A w : L w : L u : L1 u : L1 A A v : L2 v : L2 Derived partial databases (cont’d)
n1 : L1 n2 : L2 n1 : L1[L2 n2 : L1[L2 n1 : L1 n2 : L2 n3 : L3 n1 : L1 n2 : L2 n3 : L3 Deriving partial databases (cont’d)
Evaluating selections on partial databases Note that selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. In particular, two nodes n1 and n2 satisfy a selection condition iff the condition has the form n1.Pf1.Pf = n2.Pf2.Pf where (Pf1)I(n1) and (Pf2)I(n2) are defined and lead to nodes connected by an equality arc.
Some complexity results (cont’d) Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete. Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time. Example:† In conjunctive QL with bag semantics, the query containment dependency Q1vQ2 is an axiom, where Q1 and Q2 have the respective definitions select x, z select x, z from P as x, R as z from P as x, R as z where x = u.Q and z = v.Q where y = u.Q and y = v.Q † [Chaudhuri and Vardi, 1993]
The query membership problem A database schema, denoted T, consists of a finite set {C1, … , Cn} of query containment dependencies. C is an axiom relative to database schemaT = {C1, … , Cn}, written T²C, if, for any database I, I²C if I²Ci for each i. A query membership problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom relative to a given database schema also consisting of members of the set.
More complexity results Theorem: The query membership problem for conjunctive QL is undecidable. Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL. Proof: Assignment.
Evaluating QL • Defining database schema • Expressing access plans • Fine grained APIs: record addresses • Protocols • Safety • Binding patterns • Adequacy: SQL, OQL, XQuery
Modeling generalization taxonomies • Consider a simple object-oriented schema language consisting of sentences • of the following form.† • class C {A1 : ref C1, … , Am : ref Cm} [isa C1 , … , Cn]; • Assignment: Encode a fixed collection of such sentences as a database • schema in conjunctive QL. Your encoding should be as compact as possible • and should enable the following questions to be expressed as query • containment dependencies over your schema. • Is C a defined class? • Is attribute A defined on class C? • Can an object reside in both class C1 and class C2? • †Assume that any object in a database was created with respect to a single • class.
Modeling pipelined query access plans • (syntax) (defn ofb(¢)) • (parameter) Q ::=(PARAM as A) {A} • (index scan) | (from C as A, A.1 = B1, … , A.n = Bn) {A} • (nested loops) | (from Q1, Q2) b(Q1) [b(Q2) • (noop) | (select A1, … , An Q) b(Q) Å {A1, … , An} • (record field access) | (A1 = A2.B) {A1} • (comparison) | (A1 = A2) ; • (catenation) | (Q1 union all Q2) b(Q1) Åb(Q2) • (cut) | (elim A1, … , An Q) ; • | … • Require: • (a(Q2) – b(Q2)) µa(Q1) for nested loops, and • a(Q) = b(Q) for top-level queries.
Alternative semantics • Require richer models theories for • sort operations, and • named cuts.