190 likes | 329 Views
CMSC424: Database Design. Instructor: Amol Deshpande amol@cs.umd.edu. Today…. Integrity Constraints Relational Database Design. Referential Integrity Constraints. Idea: prevent “dangling tuples” (e.g.: a loan with a bname, Kenmore , when no Kenmore tuple in branch). Referencing
E N D
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu
Today… • Integrity Constraints • Relational Database Design
Referential Integrity Constraints • Idea: prevent “dangling tuples” (e.g.: a loan with a bname, Kenmore, when no Kenmore tuple in branch) Referencing Relation (e.g. loan) Referenced Relation (e.g. branch) “foreign key” bname primary key bname Ref Integrity: ensure that: foreign key value primary key value (note: don’t need to ensure , i.e., not all branches have to have loans)
Referential Integrity Constraints bname bname x Referencing Relation (e.g. loan) Referenced Relation (e.g. branch) x x In SQL: CREATE TABLE branch( bname CHAR(15) PRIMARY KEY ....) CREATE TABLE loan ( ......... FOREIGN KEY bname REFERENCES branch); Affects: 1) Insertions, updates of referencing relation 2) Deletions, updates of referenced relation
Referential Integrity Constraints c c ti x x tj x what happens when we try to delete this tuple? A B Ans: 3 possibilities 1) reject deletion/ update 2) set ti [c], tj[c] = NULL 3) propagate deletion/update DELETE: delete ti, tj UPDATE: set ti[c], tj[c] to updated values
Referential Integrity Constraints c c ti x x tj x what happens when we try to delete this tuple? A B CREATE TABLE A ( ..... FOREIGN KEY c REFERENCES B action .......... ) Action: 1) left blank (deletion/update rejected) 2) ON DELETE SET NULL/ ON UPDATE SET NULL sets ti[c] = NULL, tj[c] = NULL 3) ON DELETE CASCADE deletes ti, tj ON UPDATE CASCADE sets ti[c], tj[c] to new key values
Global Constraints • Idea: two kinds • 1) single relation (constraints spans multiple columns) • E.g.: CHECK (total = svngs + check) declared in the CREATE TABLE • 2) multiple relations: CREATE ASSERTION SQL examples: 1) single relation: All Bkln branches must have assets > 5M CREATE TABLE branch ( .......... bcity CHAR(15), assets INT, CHECK (NOT(bcity = ‘Bkln’) OR assets > 5M)) Affects: insertions into branch updates of bcity or assets in branch
Global Constraints SQL example: 2) Multiple relations: every loan has a borrower with a savings account CHECK (NOT EXISTS ( SELECT * FROM loan AS L WHERE NOT EXISTS( SELECT * FROM borrower B, depositor D, account A WHERE B.cname = D.cname AND D.acct_no = A.acct_no AND L.lno = B.lno))) Problem: Where to put this constraint? At depositor? Loan? .... Ans: None of the above: CREATE ASSERTION loan-constraint CHECK( ..... ) Checked with EVERY DB update! very expensive.....
SQL • Is that it ? • Unfortunately No • SQL 3 standard is several hundreds of pages (if not several thousands) • And expensive too.. • We will discuss a few more constructs along the way • E.g. Embedded SQL, creating indexes etc • Again, this is what the reference books are for; you just need to know where to look in the reference book
Questions ? • Next: • Relational Database Design
Relational Database Design • Where did we come up with the schema that we used ? • E.g. why not store the actor names with movies ? • Topics: • Formal definition of what it means to be a “good” schema. • How to achieve it.
Movies Database Schema • Movie(title, year, length, inColor, studioName, producerC#) • StarsIn(movieTitle, movieYear, starName) • MovieStar(name, address, gender, birthdate) • MovieExec(name, address, cert#, netWorth) • Studio(name, address, presC#) • Changed to: • Movie(title, year, length, inColor, studioName, producerC#, starName) • <StarsIn merged into above> • MovieStar(name, address, gender, birthdate) • MovieExec(name, address, cert#, netWorth) • Studio(name, address, presC#)
Movie(title, year, length, inColor, studioName, producerC#, starName) • Issues: • Redundancy higher storage, inconsistencies (“anomalies”) • Need nulls • Unable to represent some information without using nulls • How to store movies w/o actors (pre-productions etc) ?
Movie(title, year, length, inColor, studioName, producerC#, starNames) • Issues: • 3. Avoid sets • - Hard to represent • - Hard to query
Smaller schemas always good ???? Split Studio(name, address, presC#) into: Studio1 (name, presC#) Studio2(name, address)??? • This process is also called “decomposition” • Issues: • 4. Requires more joins (w/o any obvious benefits) • 5. Hard to check for some dependencies • What if the “address” is actually the presC#’s address ? • No easy way to ensure that constraint (w/o a join).
Smaller schemas always good ???? Decompose StarsIn(movieTitle, movieYear, starName) into: StarsIn1(movieTitle, movieYear) StarsIn2(movieTitle, starName) ??? • Issues: • 6. “joining” them back results in more tuples than what we started with • (King Kong, 1933, Watts) & (King Kong, 2005, Faye) • This is a “lossy” decomposition • We lost some constraints/information • The previous example was a “lossless” decomposition.
Desiredata • No sets • Correct and faithful to the original design • Avoid lossy decompositions • As little redundancy as possible • To avoid potential anomalies • No “inability to represent information” • Nulls shouldn’t be required to store information • Dependency preservation • Should be possible to check for constraints
Approach • We will encode and list all our knowledge about the schema somehow • Functional dependencies (FDs) • SSN name (SSN “implies” length) • If two tuples have the same “SSN”, they must have the same “name” • movietitle length --- Not true. • But, (movietitle, movieYear) length --- True. • We will define a set of rules that the schema must follow to be considered good • “Normal forms”: 1NF, 2NF, 3NF, BCNF, 4NF, … • Rules specify constraints on the schemas and FDs