160 likes | 168 Views
Decomposition and Higher Forms of Normalization. Chapter 15.8 (skim). "Lossless" Joins. The main idea: if you decompose a relation schema, then join the parts of an instance via a natural join, you might get more rows than you started with, i.e., spurious tuples This is bad!
E N D
Decomposition and Higher Forms of Normalization Chapter 15.8 (skim)
"Lossless" Joins • The main idea: if you decompose a relation schema, then join the parts of an instance via a natural join, you might get more rows than you started with, i.e., spurious tuples • This is bad! • Called a "lossy join". • Goal: decompositions which produce only "lossless" joins • "non-additive" join is more descriptive
Preserving FDs • What if, when a relation is decomposed, the X of an XY ends up only in one of the new relations and the Y ends up only in another? • Such a decomposition is not “dependency-preserving.” • Goal: Always have FD-preserving decompositions
Decomposing R Into BCNF • Pick an X-> A which violates BCNF, where A is a single attribute. • Decompose so that R1={X,A} is in one relation, R2=R-A is in the other. • Repeat if necessary on R2. • Guaranteed lossless... but may lose dependencies • SBD with SB->D, D->B
Decomposing R into 3NF • The algorithm is much more complicated • 1. Get a “minimal cover” of FDs • 2. Find a lossless-join decomposition of R (which might miss dependencies) • 3. Add additional relations to the decomposition to cover any missing FDs of the cover • Result will be lossless, will be dependency-preserving 3NF; might not be BCNF
Fact of life... Finding a decomposition which is both lossless and dependency-preserving is not always possible.
Contracts schema CSJDPQV Contracts (cid, supplierid, projectid, deptid, partid, qty, value) cid is a key; JP -> C; SD -> P What are candidate keys? BCNF, 3NF decompositions? What if J->S is added?
Multivalued Dependencies (MVDs) • XY means that given X, there is a unique set of possible Y values (which do not depend on other attributes of the relation) • PARENTNAMECHILDNAME • An FD is also a MVD • MVD problems arise if there are two independent 1:N relationships in a relation.
Textbook Example (3-way pun) • course, teacher, book • Each teacher teaches a set of courses • teacher courses • Each course has a recommended set of books (independent of teacher) • course book • No FDs!
MVD Theory • A whole theory exists, analogous to FD theory • Armstrong’s axioms plus 5 additional rules
Fourth Normal Form • A relation R is in 4NF if for every nontrivial XY, X is a superkey of R. • Decomposition into 4NF: If there is a non-trivial XY, form one relation with only X and Y, and another with R-Y. • In other words, allow only one MVD per relation • This will be lossless, but not necessarily FD-preserving. • Achieving 4NF is a trade-off
Fifth Normal Form • Sometimes a relation cannot be losslessly decomposed into two relations, but can be into three or more. • 5NF captures the idea that a relation scheme must have some particular lossless decomposition ("join dependency"). • Finding actual 5NF cases is difficult.
Normalization Summary • 1NF: usually part of the woodwork • even so, know how to decompose • 2NF: usually skipped • but lots of defs. that make great exam Q's! • 3NF: a biggie • Always aim for this • BCNF and 4NF: tradeoffs start here • in re: d-preserving and losslessness • 5NF: You can say you've heard of it...
Caveat • Normalization is not the be-all and end-all of DB design • Example: suppose attributes A and B are always used together, but normalization theory says they should be in different tables. • Decomposition might produce unacceptable performance loss (extra disk reads) • Plus -- there are constraints other than FDs and MVDs
Looking Ahead: One Query Is Usually Not Enough • A meaningful unit of DB work might involve doing one or more queries, checking constraints, performing calculations, interacting with user, etc. • Such a unit is termed a "transaction" • "Transaction Processing" will be our next topic
Current Trends • Object DBs and Object-Relational DB’s • May permit complex attributes • 1st normal form unnecessary • Data Warehouses • Huge historical databases, seldom or never updated after creation • Joins expensive or impractical • Argues against normalization • Every-day Relational DBs • Aim for BCNF, settle for 3NF