1 / 47

A Brief History of DB Theory

A Brief History of DB Theory. Functional dependencies --- Normalization More dependencies: multivalued, general Universal relations Acyclic hypergraphs Logical query languages: Datalog. Functional Dependencies.

shania
Download Presentation

A Brief History of DB Theory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Brief History of DB Theory • Functional dependencies --- Normalization • More dependencies: multivalued, general • Universal relations • Acyclic hypergraphs • Logical query languages: Datalog

  2. Functional Dependencies • Setting: A relation = table with column headers (schema ) and a finite number of rows (tuples ). A B C 0 1 2 0 3 4

  3. FD X ->Y • A statement about which instances (finite sets of tuples) are “legal” for a given relation. • If two tuples agree in all the attributes of set X , then then must also agree in all attributes of set Y. • Common case: X is the key of the relation, Y is the other attributes.

  4. Issue #1: Inference • Armstrong’s Axioms • X ->Y if Y is a subset of X • X ->Y implies XZ ->YZ (XZ , etc. = “union”) • X ->Y and YZ ->W implies XZ ->W • Closure test (Bernstein): given FD’s and a possible consequent X ->Y , use the FD’s to see what X determines, and then see if Y is contained therein.

  5. Issue #2: Redundancy • Certain combinations of FD’s lead to redundancy and other “anomalies.” • Example: B ->C is the only FD: A B C 0 1 2 3 1 ? • The FD lets us predict the value of ?.

  6. Issue #3: Normalization • Eliminate redundancy by splitting schemas. • A FD X ->Y allows us to split schema XYZ into XY and XZ . • Without the FD, the decomposition would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.

  7. How I Met Moshe & Learned Database Theory Tsichritzis Bernstein Beeri Ullman Vardi

  8. Why I Like Working With Vardi • Taste, selection of issues. • Best inventor of constructions. • Name comes after mine in alphabet.

  9. Lossless Joins • When relations are decomposed, do the pieces allow reconstruction of the original? • Only way: join the projected relations. • You always get back what you started with. • Bad case is when you get more.

  10. Example A B C 0 1 2 3 1 4 A B A B C B C 0 1 0 1 4 1 2 3 1 3 1 2 1 4 0 1 2 3 1 4

  11. More Kinds of Dependencies • In essence, a dependency is any predicate that tells whether a given set of tuples is OK for a given schema. • The language chosen determines what constraints can be expressed and what we can decide about relations. • FD’s are just one, simple example

  12. Multivalued Dependencies • These occur when a relation tries to connect one class of objects to independent sets from two other classes.. • Notation: X ->->Y means: • If two tuples agree on X , then we may swap the Y components and get two tuples that are also in the relation.

  13. Example • EmpID ->-> Phone EmpID Addr Phone Project If: 123 a1 p1 j1 123 a2 p2 j2 Then: 123 a1 p2 j1 123 a2 p1 j2

  14. MVD Origins • Independent work of Delobel, Fagin, Zaniolo • Inference of MVD’s + FD’s: Beeri, Fagin, and Howard.

  15. Generalized Dependencies • Equality-generating dependencies: • If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal. • Generalizes FD’s. • Tuple-generating dependencies: • If tuples with this pattern of equal symbols appear in a relation instance, then a certain tuple must also appear. • Generalizes MVD’s.

  16. Example: EGD • FD A ->B in schema ABC : A B C a b1 c1 (Hypotheses) a b2 c2 b1 = b2 (Conclusion)

  17. Example: TGD • MVD A ->->B in schema ABC : A B C a b1 c1 (Hypotheses) a b2 c2 a b1 c2 (Conclusion)

  18. Full Versus Embedded GD’s • Full GD’s have a conclusion with only symbols that appear in the hypotheses. • Embedded GD’s may have new symbols in the conclusion. • Existentially quantified. • Embedded EGD makes no sense, but embedded TGD’s are quite interesting.

  19. The Chase for Inferring GD’s • Test whether a GD G follows from given GD’s. • Start with R = the hypotheses of G . • Apply GD H by mapping the hypotheses of H to some rows of R . • If so, infer the (mapped) conclusion of H --- equate two symbols of R or add a tuple to R . • If you eventually infer the conclusion of G , then G follows, else it does not.

  20. Example: If A ->B Then A ->->B • R has tuples (a,b1,c1) and (a,b2,c2). Does it have (a,b1,c2)? • Apply A ->B . • It’s hypotheses are the same as the two tuples of R , so surely they map. • Lets us conclude b1=b2. • Thus, since R has (a,b2,c2), It surely has (a,b1,c2).

  21. Decision Properties of GD’s • If all dependencies are full, no new symbols ever appear, so the chase must terminate. • Either we prove the desired conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.

  22. Undecidability of Embedded GD’s • First undecidability results involved the inference of untyped GD’s (symbols may appear in several columns). • Beeri and Vardi • Chandra, Lewis, and Makoswky • Tough result is undecidability even when GD’s are typed (one column per symbol). • Vardi • Gurevich and Lewis

  23. History of GD’s • Similar ideas were developed independently by several people: • Yannakakis and Papadimitriou • Beeri and Vardi • Fagin • Sadri and Ullman

  24. The Universal Relation • Idea: attributes carry information regardless of how they are placed in relation schemas. • One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept. • Typically touted by some HCI type as a “wonderful new interface to databases.”

  25. UR Query Systems Query UR Stored Relations

  26. Example • Stored relations: ES(Emp,Sal), ED(Emp,Dept), DM(Dept,Mgr) • Universal relation: U(Emp,Sal, Dept,Mgr) • UR query: SELECT Mgr WHERE Emp=a • Translated to: SELECT Mgr FROM ED, DM WHERE Emp=a AND ED.Dept=DM.Dept • U != join(ES, ED, DM); empty ES proves it.

  27. Theory of Universal Relations • Some were quite upset by the UR idea. • Objection: the informal or ad-hoc way queries were translated from UR to stored relations. • Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”

  28. Some Early Approaches to the UR • Selected joins of stored relations. • Representative instance = pad stored relations with nulls in missing attributes; then chase with given dependencies. • Contributions by Vassiliou, Honeyman, Mendelzon, Sagiv, Yannakakis. • Maximal objects = union of joins within “acyclic hypergraphs.” • Maier, Ullman

  29. Window Functions • Two stage process: 1. If query involves set of attributes X , compute the window function [X ] = some relation with schema X derived from the UR. 2. Apply the query to [X ]. • Different UR definitions give different window functions, e.g., join-based, rep.-instance, maximal-object. • From Maier, Ullman, Vardi [1984].

  30. Acyclic Hypergraphs • Several different applications led to the identification of a class of database schemas (collection of relation schemas) with useful properties. • Sensible connections among stored relations in maximal-object theory of UR. • Joins of many relations in time polynomial in the size of the input and output (Yannakakis). • Etc., etc.

  31. Hypergraphs • Nodes (typically attributes) • (Hyper)edges = sets of any number of nodes. A B C D E F

  32. GYO Reduction • Graham, Yu-Ozsuyoglu, independently defined acyclic hypergraphs thusly: • Two transformations: 1. Delete a node that is in only one hyperedge. 2. Delete a hyperedge contained in another. • Hypergraph is acyclic iff the result is a single, empty edge. • Limit is unique for any hypergraph.

  33. Example • GYO steps: delete C; delete D; delete {B}; delete F; delete {B,E}; delete A; delete B; delete E; A B C D E F

  34. History • Acyclic hypergraph idea from Bernstein and Goodman. • But the definition is rather different. • Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others. • And don’t forget the GYO folks. • Fagin defined several related-but-distinct concepts of “acyclic.” • This one is “alpha-acyclic.”

  35. Logical Query Languages • Collections of Horn-clause rules without function symbols behave almost like SQL, but can do recursion. • Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).

  36. Conjunctive Queries • Example: path(X,Y) :- edge(X,Z) & edge(Z,Y) • Atoms in the body (hypothesis) refer to stored relations (EDB or “Extensional Database”). • Atom in head (conclusion) is the result. • Q1 is contained in Q2 if for every database D, Q1(D ) is a subset of Q2(D ).

  37. Containment Mappings • Mapping from the variables of Q2 to the variables of Q1 that: 1. Turns the head of Q2 into the head of Q1. 2. Turns every atom in the body of Q2 into some atom in the body of Q1. • Containment mapping exists iff Q1 is contained in Q2.

  38. Example Q1: answer(X,Y) :- e(X,Y) & e(Y,X) Q2: answer(X,Y) :- e(X,W) & e(W,Z) & e(Z,Y)

  39. Datalog Programs • Collection of conjunctive queries. • Some predicates are EDB (stored). • Other predicates are IDB (intensional database; defined by the CQ’s only). • One IDB predicate is the answer = least fixedpoint of the CQ’s.

  40. Example path(X,Y) :- edge(X,Y) path(X,Y) :- path(X,Z) & edge(Z,Y) • Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge .

  41. Optimization of Datalog Programs • Problem: often the query asks for only a fraction of the answer. • e.g., find path(0,Y) for fixed node 0. • Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away. • Top-down (goal seeking) can get stuck in left-recursion.

  42. Linear-Recursion Methods • Recursion is linear if at most one IDB atom in the body. • Henschen-Naqvi. • Magic-sets for linear (Bancilhon, Maier, Sagiv, Ullman). • Left- and right-linear special cases (Naughton). • Conversion of nonlinear to linear (Ramakrishnan, Sagiv, Ullman, Vardi).

  43. Magic Sets • Several similar techniques for rewriting or executing Datalog programs in a way that avoids generation of useless facts. • Rohmer, Lescoeur, and Kerisit (earliest exposition). • Beeri and Ramakrishnan (reordering of atoms). • Sacca and Zaniolo (simplification of rules). • Dietrich and Warren (tabulation). • Vielle (query-subquery).

  44. Bounded Recursion • Sometimes a recursive Datalog program is equivalent to a nonrecursive program. • Sufficient conditions (Naughton, Ioannidis, Naughton and Sagiv). • Undecidability (Gaifman, Mairson, Sagiv, and Vardi). • Decidable cases (Vardi, Cosmodakis, Gaifman, Kanellakis, and Vardi)

  45. Containment for Datalog • Undecidable if one Datalog program is contained in another (Shmueli). • NP-complete whether a CQ is contained in a Datalog program. • Triply exponential whether a Datalog program is contained in a CQ (Chaudhuri and Vardi).

  46. What Is It All Good For? • Dependencies and normalization are now familiar to most CS graduates. • Difference between BCNF and 3NF. • Tradeoffs: lossless joins versus maintaining dependencies. • Magic-sets, recursion used in IBM’s DB/2. • Universal-relation interfaces despite Codd.

  47. More “What Is It All Good For?” • Information integration . While logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole. • Tsimmis (Papakonstantinou, Vassalos). • Information Manifold (Levy). • Infomaster (Duschka, Genesereth).

More Related