210 likes | 287 Views
CPSC 504: Data Management Review of Relational Model 1/2. Laks V.S. Lakshmanan Dept. of CS UBC. The Basic Model. A DB = a (finite) set of (finite) relations.
E N D
CPSC 504: Data ManagementReview of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC
The Basic Model • A DB = a (finite) set of (finite) relations. • A relation has a well-defined schema – its set of attributes, each with its domain: e.g., dom(songName), dom(Artist), dom(AlbumNo), dom(fileFormat), … • A relation – a subset of the Cartesian (aka cross) product of its attribute domains: • e.g., myfavs dom(AlbumNo) x dom(songName) x dom(Artist) x dom(fileFormat).
How do you get there? • In practice, you often use the ER (for entity-relationship) model as an informal spec of a design of a relational database schema. • Schema = structure or table template. • Relation = table instance.
Example ER schema A schema in the ER model is specified using an ER diagram. Actor Movie Stars in Name DOB Addr Title Year Dir Role
Some constraints • Each relation has a key: candidate keys, primary keys, superkeys, …? • Functional dependency (FD): formalization and generalization of key. • r(R), X, Y R. r |= XY iff any pair of tuples in r that agree on X, necessarily agree on Y. • e.g.: AlbumNoArtist, AlbumNo, songNameAlbumNo, …, fileFormat. • {AlbumNo, songName} – a key, but not AlbumNo. • Note: ER diagrams can express a rich class of (but not all) constraints.
More on FDs • FDs help determine keys: there are efficient algorithms as well as an assortment of (computationally) hard problems. • FDs are pointers to redundancies in data: e.g., same Artist (value) will appear many times, once for each song in an album of that artist. • Why is redundancy (in representation) bad? • efficient algorithms for redundancy removal – the so-called normal forms. • There is a full assortment of them – 1NF, 2NF, 3NF, BCNF, 4NF, … • FDs/keys help with indexing, with query optimization, with data exchange, etc.
More on Normal Forms • 1NF really says tuples must be “flat”. • Other NFs really seek to capture different degrees of redundancy (and eliminate them). • 3NF, BCNF – some of the most important ones in practice. • Normalization algorithms: start from U, the set of all attributes, and figure out which attr sets should stay together, which ones should split up.
Some more on FDs • Constraints such as FDs are called integrity constraints. Why? • ICs are statements in predicate logic. • What does an FD really say? • Need to reason with ICs, so: • Can determine keys, • Can check if certain ICs are enforced. • Can understand interaction of ICs.
Last bits on FDs • FDs admit a sound and complete axiom system: • Reflexivity: if X Y, then YX. • Augmentation: if WZ and X Y, then WYZX. • Transitivity: if XY and YZ, then XZ. • Not all classes of ICs admit a S&C axiom system. • Some consequences of the above axiom system for FDs.
ICs, Tableaux, and Queries • An FD can also be stated in a “pictorial” form: Schema/column headers Pattern rows. x2 = x3 Summary row/ Consequent/ Conclusion.
ICs, Tableuax, and Queries • Tableaux are general; can express what conditions must hold when certain patterns are present in data. • Can express equality generating dependencies (egd’s) and tuple generating dependencies (tgd’s). • Can also express queries (by using summary row for saying what you want in the output).
ICs, Tableaux, and Queries • Tableaux can also be expressed in the form of rules, also known as “Horn clauses”. • Examples: X2 = X3 r(X1,X2,_,_), r(X1,X3,_,_). Which really is saying … X2 = X3 r(X1,X2,Y3,Y4), r(X1,X3,Y5,Y6). Logically speaking, what are these rules saying? p(X,Y) a(X,Z), b(Z,Y), a(Y,W), W>100.
Some more on NFs – 3NF • Def. of 3NF: r(R), given set of applicable FDs F, is in 3NF iff for every XA that holds for r, X is a superkey for r according to F OR A belongs to some key. • e.g.: r(AlbumNo, Artist, songName, fileFormat) is not in 3NF, while r1(AlbumNo, Artist), r2(AlbumNo, songName, fileFormat) are. Why?
Some more on NFs – BCNF • Def. of BCNF: r(R), given applicable FDs F, is in BCNF iff for every XA that holds for r, X is a superkey for r according to F. • Clearly stronger than 3NF. • e.g.: {S(treet), (s)T(ate), Z(ip)}, with STZ, ZT. – it’s in 3NF. • Not in BCNF. What’s the redundancy here? • {ZT, ZS} is in BCNF. • Can you anticipate any problems with the design {ZT, ZS}? • Majority of 3NF designs in practice tend to be in BCNF.
More on ICs • Inclusion depencency (IND): r1(Emp, Dept, Sal), r2(Mgr, Dept, Budget). r2[Mgr] r1[Emp]: every manager is an employee. • INDs are a generalization of referential integrity constraints. • FDs, INDs. – most fundamental. • What can we say about reasoning with INDs, or with INDs + FDs?
More on INDs • Good news -- a complete axiomatization. • Reflexivity: r[X]r[X]. • Projection+permutation: if r[A1,…,Am] s[B1…Bm], then r[Ai1…Aik] s[Bi1…Bik]. • Transitivity: if r[X] s[Y] and s[Y] t[Z], then r[X] t[Z]. • But, while testing whether F |= f can be done in linear time, testing I |= i is PSPACE-complete!
Story of FDs+INDs • Recall: FDs admit complete axiomatization, as do INDs. • But, put together, there is no complete (k-ary) axiomatization for INDs+FDs! • We have seen some (very) positive results and some negative results. • What approaches can you think of for tackling the negative situations?
Last bits on ICs • There is a rich landscape of lot more classes of ICs. • Fundamentally split into two syntactic classes: • Equality generating dependencies (egds), e.g., FDs. • Tuple generating dependencies (tgds), e.g., INDs. • The value is NOT that these fancier ICs arise as is in practice. • Rather, they arise in solving various problems: e.g., integrating data from diverse sources. • E.g. tgd: (\forall M,D,B) [r2(M, D, B) (\exists S) r1(M, D, S)]. • What is it saying, logically speaking?
More Reading • J.D. Ullman: Principles of Database and Knowledge-Base Systems, vol. I and II. CS Press, 1988. • R. Ramakrishnan & J. Gehrke. Database Management Sytems. Mc-Graw Hill, latest edition. • S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. 1995. Available online from DBLP.
Some Resources for Research Literature • DBLP • Citeseer • ACM digital library • If you aren’t already a member, consider becoming ACM student member: really a great deal!
Some top conferences/journals for DMM Research • ACM SIGMOD/PODS • VLDB / PVLDB • IEEE ICDE • ICDT / EDBT • KDD, PKDD, ICDM, SDM, … • CIKM • WWW, WSDM, ICWSM, • TODS, IS, VLDBJ, TKDE, TCS, JACM, JCSS, I&C, …