1 / 21

CPSC 504: Data Management Review of Relational Model 1/2

CPSC 504: Data Management Review of Relational Model 1/2. Laks V.S. Lakshmanan Dept. of CS UBC. The Basic Model. A DB = a (finite) set of (finite) relations.

donnel
Download Presentation

CPSC 504: Data Management Review of Relational Model 1/2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CPSC 504: Data ManagementReview of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

  2. The Basic Model • A DB = a (finite) set of (finite) relations. • A relation has a well-defined schema – its set of attributes, each with its domain: e.g., dom(songName), dom(Artist), dom(AlbumNo), dom(fileFormat), … • A relation – a subset of the Cartesian (aka cross) product of its attribute domains: • e.g., myfavs  dom(AlbumNo) x dom(songName) x dom(Artist) x dom(fileFormat).

  3. How do you get there? • In practice, you often use the ER (for entity-relationship) model as an informal spec of a design of a relational database schema. • Schema = structure or table template. • Relation = table instance.

  4. Example ER schema A schema in the ER model is specified using an ER diagram. Actor Movie Stars in Name DOB Addr Title Year Dir Role

  5. Some constraints • Each relation has a key: candidate keys, primary keys, superkeys, …? • Functional dependency (FD): formalization and generalization of key. • r(R), X, Y  R. r |= XY iff any pair of tuples in r that agree on X, necessarily agree on Y. • e.g.: AlbumNoArtist, AlbumNo, songNameAlbumNo, …, fileFormat. • {AlbumNo, songName} – a key, but not AlbumNo. • Note: ER diagrams can express a rich class of (but not all) constraints.

  6. More on FDs • FDs help determine keys: there are efficient algorithms as well as an assortment of (computationally) hard problems. • FDs are pointers to redundancies in data: e.g., same Artist (value) will appear many times, once for each song in an album of that artist. • Why is redundancy (in representation) bad? • efficient algorithms for redundancy removal – the so-called normal forms. • There is a full assortment of them – 1NF, 2NF, 3NF, BCNF, 4NF, … • FDs/keys help with indexing, with query optimization, with data exchange, etc.

  7. More on Normal Forms • 1NF really says tuples must be “flat”. • Other NFs really seek to capture different degrees of redundancy (and eliminate them). • 3NF, BCNF – some of the most important ones in practice. • Normalization algorithms: start from U, the set of all attributes, and figure out which attr sets should stay together, which ones should split up.

  8. Some more on FDs • Constraints such as FDs are called integrity constraints. Why? • ICs are statements in predicate logic. • What does an FD really say? • Need to reason with ICs, so: • Can determine keys, • Can check if certain ICs are enforced. • Can understand interaction of ICs.

  9. Last bits on FDs • FDs admit a sound and complete axiom system: • Reflexivity: if X  Y, then YX. • Augmentation: if WZ and X  Y, then WYZX. • Transitivity: if XY and YZ, then XZ. • Not all classes of ICs admit a S&C axiom system. • Some consequences of the above axiom system for FDs.

  10. ICs, Tableaux, and Queries • An FD can also be stated in a “pictorial” form: Schema/column headers Pattern rows. x2 = x3 Summary row/ Consequent/ Conclusion.

  11. ICs, Tableuax, and Queries • Tableaux are general; can express what conditions must hold when certain patterns are present in data. • Can express equality generating dependencies (egd’s) and tuple generating dependencies (tgd’s). • Can also express queries (by using summary row for saying what you want in the output).

  12. ICs, Tableaux, and Queries • Tableaux can also be expressed in the form of rules, also known as “Horn clauses”. • Examples: X2 = X3  r(X1,X2,_,_), r(X1,X3,_,_). Which really is saying … X2 = X3  r(X1,X2,Y3,Y4), r(X1,X3,Y5,Y6). Logically speaking, what are these rules saying? p(X,Y)  a(X,Z), b(Z,Y), a(Y,W), W>100.

  13. Some more on NFs – 3NF • Def. of 3NF: r(R), given set of applicable FDs F, is in 3NF iff for every XA that holds for r, X is a superkey for r according to F OR A belongs to some key. • e.g.: r(AlbumNo, Artist, songName, fileFormat) is not in 3NF, while r1(AlbumNo, Artist), r2(AlbumNo, songName, fileFormat) are. Why?

  14. Some more on NFs – BCNF • Def. of BCNF: r(R), given applicable FDs F, is in BCNF iff for every XA that holds for r, X is a superkey for r according to F. • Clearly stronger than 3NF. • e.g.: {S(treet), (s)T(ate), Z(ip)}, with STZ, ZT. – it’s in 3NF. • Not in BCNF. What’s the redundancy here? • {ZT, ZS} is in BCNF. • Can you anticipate any problems with the design {ZT, ZS}? • Majority of 3NF designs in practice tend to be in BCNF.

  15. More on ICs • Inclusion depencency (IND): r1(Emp, Dept, Sal), r2(Mgr, Dept, Budget). r2[Mgr]  r1[Emp]: every manager is an employee. • INDs are a generalization of referential integrity constraints. • FDs, INDs. – most fundamental. • What can we say about reasoning with INDs, or with INDs + FDs?

  16. More on INDs • Good news -- a complete axiomatization. • Reflexivity: r[X]r[X]. • Projection+permutation: if r[A1,…,Am] s[B1…Bm], then r[Ai1…Aik] s[Bi1…Bik]. • Transitivity: if r[X] s[Y] and s[Y] t[Z], then r[X] t[Z]. • But, while testing whether F |= f can be done in linear time, testing I |= i is PSPACE-complete!

  17. Story of FDs+INDs • Recall: FDs admit complete axiomatization, as do INDs. • But, put together, there is no complete (k-ary) axiomatization for INDs+FDs! • We have seen some (very) positive results and some negative results. • What approaches can you think of for tackling the negative situations?

  18. Last bits on ICs • There is a rich landscape of lot more classes of ICs. • Fundamentally split into two syntactic classes: • Equality generating dependencies (egds), e.g., FDs. • Tuple generating dependencies (tgds), e.g., INDs. • The value is NOT that these fancier ICs arise as is in practice. • Rather, they arise in solving various problems: e.g., integrating data from diverse sources. • E.g. tgd: (\forall M,D,B) [r2(M, D, B)  (\exists S) r1(M, D, S)]. • What is it saying, logically speaking?

  19. More Reading • J.D. Ullman: Principles of Database and Knowledge-Base Systems, vol. I and II. CS Press, 1988. • R. Ramakrishnan & J. Gehrke. Database Management Sytems. Mc-Graw Hill, latest edition. • S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. 1995. Available online from DBLP.

  20. Some Resources for Research Literature • DBLP • Citeseer • ACM digital library • If you aren’t already a member, consider becoming ACM student member: really a great deal!

  21. Some top conferences/journals for DMM Research • ACM SIGMOD/PODS • VLDB / PVLDB • IEEE ICDE • ICDT / EDBT • KDD, PKDD, ICDM, SDM, … • CIKM • WWW, WSDM, ICWSM, • TODS, IS, VLDBJ, TKDE, TCS, JACM, JCSS, I&C, …

More Related