1 / 48

Topics covered

Topics covered. Databases QL Query containment An evaluation of QL. Subsystem1. Subsystem2. Subsystem3. A simple case of information integration. SQL Server. Global Schema. SQL. Open, Scan. A single table: T. A “local as view” (LAV) integration schema: T ´ Q 2 .

yeva
Download Presentation

Topics covered

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topics covered • Databases • QL • Query containment • An evaluation of QL

  2. Subsystem1 Subsystem2 Subsystem3 A simple case of information integration SQL Server Global Schema SQL Open, Scan A single table: T • A “local as view” (LAV) integration schema: T ´Q2. • User submits Q1. • Query optimizer must determine if a scan of T suffices. • True iff Q1 is equivalent to Q2.

  3. In the beginning … Infinite countable sets of each of the following kinds of symbols: C = {C1, C2, … } (primitive concepts) A = {A1, A2, …} [ {B1, B2, …} (attributes) R = {R1, R2, … } (roles) Conventions: Attributes (resp. primitive concepts and roles) correspond to words in lower case or to positive integers (resp. words in upper case and words in mixed case).

  4. For a particular database I hD, (¢)Ii where D is a countable possibly infinite domain, and where for each symbol (C)IµD (A)I : D!D (R)Iµ (D£D)

  5. A e1 e2 R e1 e2 Partial databases e e2D e : {C1, … , Cn} e2 (Ci)I (A)I(e1) = e2 (e1, e2) 2 (R)I e1=e2 e1 e2 e1 e2 e1¹e2

  6. name “John” EMP name age e1: {EMP} “John” “Mary” 33 33 age 33 e2: {EMP} name “Mary” Relational databases

  7. name “John” {e1, e2, “John”, 33, “Mary”} µD e1: {EMP} age 33 e2: {EMP} name “Mary” Relational databases (cont’d) {e1, e2} µ (EMP)I (name)I(e1) = “John” (age)I(e1) = (age)I(e2) = 33 e1¹e2 “John” ? 33

  8. Conjunctive QL with bag semantics† Positive QL with bag semantics‡ First order QL with bag semantics (semantics) Conjunctive QL Positive QL First order QL (expressiveness) Dialects of QL †[Khizder et al., 1999], ‡[Lui et al., 2002]

  9. Conjunctive QL Q ::= D as A (quantification) | A1 = A2.R (unnest) | A1.Pf1 = A2.Pf2(selection) | elim A1, … , AnQ (projection) | true (null tuple) | from Q1, Q2 (natural join) | ( Q ) D ::= THING | C (basic description) Pf ::= id | A.Pf (path function)

  10. Well formed queries: a(Q) • a(D as A) ´ {A} • a(A1 = A2.R) ´ {A1, A2} • a(A1.Pf1 = A2.Pf2) ´ {A1, A2} • a(elim A1, … , AnQ) ´ {A1, … , An} • a(true) ´; • a(from Q1, Q2) ´a(Q1) [a(Q2) Require {A1, … , An} µa(Q) for projection operators.

  11. Tuples and bags A (duplicate) tuplet with attribute bindings for attributes {A1, … , An} over a database I = hD, (¢)Ii has the general form hA1 : e1, … , An : en , cnt : ii, where {e1, … , en} µD, “cnt” is a distinct symbol not used as an attribute, and i a positive integer. A set of duplicate tuples that contain the same attribute bindings is called a bag.

  12. Operations on tuples • a(t) ´ set of attributes occurring in t • t@cnt ´ integer i such that “cnt : i” occurs in t • t@A´ element e2D such that “A : e” occurs in t; • defined only when A2a(t) • t[{A1, … , An}] ´ {A1 : t@A1, … , An : t@An}; • defined only when {A1, … , An} µa(t) • [t] ´t[a(t)]

  13. Semantics The meaning of a query Q, denoted «Q¬, is a function that maps databases to bags. The behavior of this function on a particular database I = hD, (¢)Ii is defined as follows. «THING as A¬(I) ´ {hA : e, cnt : 1i : e2D } «C as A¬(I) ´ {hA : e, cnt : 1i : e2 (C)I} «A1 = A2.R¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (e2, e1) 2RI} «A1.Pf1 = A2.Pf2¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (Pf1)I(e1) = (Pf2)I(e2)} where (id)I´ {(e, e) : e2D} (A.Pf )I´ {(e1, e2) : (Pf )I((A)I(e1)) = e2}

  14. Semantics (cont’d) «elim A1, … , An Q¬(I) ´; , if not well formed; otherwise {hA1 : t@A1, … , An : t@An , cnt : 1i : t2«Q¬(I)} «true¬(I) ´ {hcnt : 1i} «from Q1, Q2¬(I) ´ {t : a(t) = a(Q1) [a(Q2) Æ9t12«Q1¬(I), t22«Q2¬(I) : t@cnt = t1@cnt £t2@cnt Æt[a(t1)] = [t1] Æt[a(t2)] = [t2]}

  15. Syntactic sugar A1. L .An.id ´A1. L .An select distinct A1, … , An Q´ elim A1, … , An Q select * Q´Q Q1 where Q2´ from Q1, Q2 Q1 and Q2´ from Q1, Q2 from ´ true from Q1, Q2, … , Qn´ from (from Q1, Q2, …) Qn

  16. Examples The names of employees who have the same age as another employee with a given name. select distinct :p, name from EMP as e, ( select distinct :p, e1 from EMP as e1, EMP as e2 where e1.age = e2.age and e2.name = :p ) where e.name = name and e.id = e1.id

  17. Method calls (more syntactic sugar) A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.Pfn select distinct A1, … , An ´ from C as A where A.1 = A1.Pf1 and … and A.n = An.Pfn A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) as An ´A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.id

  18. Examples (cont’d) The names of employees who have an age double that of another employee. select distinct name from EMP as e, ( select distinct e1 from EMP as e1, EMP as e2 where e2.age.+(e2.age) = e1.age ) where e.name = name and e.id = e1.id

  19. Conjunctive datalog (more syntactic sugar) C(A1, … , An) select distinct A1, … , An ´ from C as A where A.1 = A1.id and … and A.n = An.id (A1, … , Am) :- Q1, … , Qn. ´ select distinct A1, … , Am from Q1, … , Qn

  20. Positive QL • Q ::= empty A1, … , An (empty set) • | Q1 union Q2 (union) • a(empty A1, …, An) ´ {A1, …, An} • a(Q1 union all Q2) ´a(Q1) • Require a(Q1) = a(Q2) in union operations.

  21. Semantics «empty A1, … , An¬(I) ´; «Q1 union Q2¬(I) ´ {t : t@cnt =1 Æa(t) = a(Q1) Æa(t) = a(Q2) Æ ( ( 9 t12«Q1¬(I) : [t] = [t1] Æ:9t22«Q2¬(I) : [t] = [t2] ) Ç ( 9t22«Q2¬(I) : [t] = [t2] Æ:9 t12«Q1¬(I) : [t] = [t1] ) Ç ( 9 t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1] Æ [t] = [t2] ) )}

  22. First order QL Q ::= Q1 minus Q2 (difference) a(Q1 minus Q2) ´a(Q1) Require a(Q1) = a(Q2) in difference operations.

  23. Semantics «Q1 minus Q2¬(I) ´ {t : t@cnt = 1 Æa(t) = a(Q1) Æa(t) = a(Q2) Æ ( 9t1 2 «Q1¬(I) : [t] = [t1] ) Æ ( :9t22«Q2¬(I) : [t] = [t2] )}

  24. QL with duplicates Q ::= select A1, … , An Q (duplicate preserving projection) | Q1 union all Q2 (bag union) | Q1 minus all Q2 (bag difference)

  25. Well formed queries (cont’d) • a(select A1, … , An Q) ´ {A1, … , An} • a(Q1 union all Q2 ) ´a(Q1) • a(Q1 minus all Q2) ´a(Q1) • Require a(Q1) = a(Q2) in bag union and bag difference operations, and that {A1, … , An} µa(Q) in (duplicate preserving) projection operations.

  26. Semantics «select A1, … , An Q¬(I) ´; , if not well formed and representable†; otherwise {hA1 : t1@A1, … , An : t1@An , cnt : ni : t12«Q¬(I) Æn = å (t2@cnt) } t22«Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}] †The selection operation is representable on database I iff, for every t12«Q¬(I), |{t22«Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}]}| is finite.

  27. Example A duplicate preserving projection operation that is not representable in any database with an infinite domain. select e1 from THING as e1, THING as e2 Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.

  28. Semantics (cont’d) «Q1 union all Q2¬(I) ´; , if not well formed; otherwise {t2«Q1¬(I) : :9t22«Q2¬(I) : [t] = [t2]} [ {t2«Q2¬(I) : :9t12«Q1¬(I) : [t] = [t1]} [ {t : 9t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1]} Æ [t] = [t2] Æt@cnt = t1@cnt + t2@cnt} «Q1 minus all Q2¬(I) ´; , if not well formed; otherwise {t2«Q1¬(I) : :9t22«Q2¬(I) : [t] = [t2]} [ {t : 9t12«Q1¬(I), t22«Q2¬(I) : [t] = [t1]} Æ [t] = [t2] Æt@cnt = t1@cnt -t2@cnt Æt1@cnt >t2@cnt }

  29. at, =, elim, true, from, select at, =, elim, true, from, select, empty, union all at, =, elim, true, from, select, empty, union all, minus all (bag semantics) at, =, elim, true, from at, =, elim, true, from, empty, union at, =, elim, true, from, empty, union, minus (set semantics) (positive) (first order) (conjunctive) Summary

  30. Query contexts An expression Q[] in the language QL enriched by an additional terminal symbol [] is called a query context. For a query Q12QL, the expression Q1[Q2] denotes the syntactical substitution of Q2 for []. Q2 is compatible with Q1 if Q1[Q2] 2QL. For example, Q2 is compatible with Q1 in the following. Q2: EMP as e where e.name = :p Q1: select distinct :p, d from DEPT as d, [] where d = e.dept

  31. The query equivalence problem Q1is equivalent toQ2for databaseI, written I² (Q1´Q2 ), if «Q1¬(I) = «Q2¬(I). A query equivalence dependencyEhas the form (Q1´Q2). E = (Q1´Q2) is an axiom if, for any database I, I² (Q1´Q2 ). A query equivalence problem for a given set of query equivalence dependencies is to determine if a given member of the set is an axiom.

  32. Some axioms Question: Is it true that any E with the following form is an axiom? (elim A1, … , AmQ1)[elim B1, … , BnQ2] ´ elim A1, … , AmQ1 [Q2] Answer: No. However, any such E is an axiom if any attribute in (a(Q2) – {B1, … , Bn}) does not occur in query context (elim A1, … , AmQ1 []).

  33. Excluding variable reuse in QL Q has an occurrence ofvariable reuse if there is a query context Q1[] and a query of the form elim A1, … , AnQ2 or of the form select A1, … , AnQ2 such that Q = Q1[Q2] and there exists A in (a(Q2) – {A1, … , An}) that also occurs in Q1[]. Observation: For any Q1, there exists an equivalent class of query Q2 that has no occurrence of variable reuse.

  34. The query containment problem Q1is contained inQ2fordatabaseI, written I² (Q1vQ2), if, for any tuple t1 in «Q1¬(I), there exists t2 in «Q2¬(I) such that [t1] = [t2] and t1@cnt  t2@cnt. A query containment dependencyC has the form (Q1vQ2). C = (Q1vQ2) is an axiom if, for any database I, I² (Q1vQ2). A query containment problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom.

  35. Equivalence and containment Observation: Equivalence reduces to containment. Q1´Q2 iff Q1vQ2 and Q2vQ1 Observation: Containment reduces to equivalence in first order QL. Q1vQ2 iff (Q1 minus all Q2) ´ empty a(Q1)

  36. Some complexity results Theorem: The query equivalence and containment problems for conjunctive QL is NP-complete.† †Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.

  37. A decision procedure • Theorem: The following procedure decides if C = (Q1vQ2) is an • axiom for conjunctive QL.† • Freeze the body of Q1 by creating a partial database consisting of individuals that include its variables. • If the tuple • hA1 : A1 , … , An : An , cnt : 1i • occurs in «Q2¬(I), where a(Q1) = {A1, … , An}, then return true; otherwise return false.‡ • †Derived from [Ullman, 1999]. • ‡Use forced semantics for selection operations.

  38. R A2 A1 A2 Am … A1 B2 Bn … B1 Obtaining a partial database from Q A THING as A A1 = A2.R C as A A : {C} A1.A2. L .Am = B1.B2. L .Bn

  39. A A u : L1 v : L3 u : L1 v : L3 A A w : L2 x : L4 w : L2 x : L4 A A w : L w : L u : L1 u : L1 A A v : L2 v : L2 Derived partial databases (cont’d)

  40. n1 : L1 n2 : L2 n1 : L1[L2 n2 : L1[L2 n1 : L1 n2 : L2 n3 : L3 n1 : L1 n2 : L2 n3 : L3 Deriving partial databases (cont’d)

  41. Evaluating selections on partial databases Note that selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. In particular, two nodes n1 and n2 satisfy a selection condition iff the condition has the form n1.Pf1.Pf = n2.Pf2.Pf where (Pf1)I(n1) and (Pf2)I(n2) are defined and lead to nodes connected by an equality arc.

  42. Some complexity results (cont’d) Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete. Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time. Example:† In conjunctive QL with bag semantics, the query containment dependency Q1vQ2 is an axiom, where Q1 and Q2 have the respective definitions select x, z select x, z from P as x, R as z from P as x, R as z where x = u.Q and z = v.Q where y = u.Q and y = v.Q † [Chaudhuri and Vardi, 1993]

  43. The query membership problem A database schema, denoted T, consists of a finite set {C1, … , Cn} of query containment dependencies. C is an axiom relative to database schemaT = {C1, … , Cn}, written T²C, if, for any database I, I²C if I²Ci for each i. A query membership problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom relative to a given database schema also consisting of members of the set.

  44. More complexity results Theorem: The query membership problem for conjunctive QL is undecidable. Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL. Proof: Assignment.

  45. Evaluating QL • Defining database schema • Expressing access plans • Fine grained APIs: record addresses • Protocols • Safety • Binding patterns • Adequacy: SQL, OQL, XQuery

  46. Modeling generalization taxonomies • Consider a simple object-oriented schema language consisting of sentences • of the following form.† • class C {A1 : ref C1, … , Am : ref Cm} [isa C1 , … , Cn]; • Assignment: Encode a fixed collection of such sentences as a database • schema in conjunctive QL. Your encoding should be as compact as possible • and should enable the following questions to be expressed as query • containment dependencies over your schema. • Is C a defined class? • Is attribute A defined on class C? • Can an object reside in both class C1 and class C2? • †Assume that any object in a database was created with respect to a single • class.

  47. Modeling pipelined query access plans • (syntax) (defn ofb(¢)) • (parameter) Q ::=(PARAM as A) {A} • (index scan) | (from C as A, A.1 = B1, … , A.n = Bn) {A} • (nested loops) | (from Q1, Q2) b(Q1) [b(Q2) • (noop) | (select A1, … , An Q) b(Q) Å {A1, … , An} • (record field access) | (A1 = A2.B) {A1} • (comparison) | (A1 = A2) ; • (catenation) | (Q1 union all Q2) b(Q1) Åb(Q2) • (cut) | (elim A1, … , An Q) ; • | … • Require: • (a(Q2) – b(Q2)) µa(Q1) for nested loops, and • a(Q) = b(Q) for top-level queries.

  48. Alternative semantics • Require richer models theories for • sort operations, and • named cuts.

More Related