640 likes | 835 Views
Normalization. Joe Meehean. Redundancies. Repeated data in database Wastes space Can cause modification anomalies unexpected side effect when changing data make building software on top of DB difficult Normalization process of removing redundancies. Modification Anomalies.
E N D
Normalization Joe Meehean
Redundancies • Repeated data in database • Wastes space • Can cause modification anomalies • unexpected side effect when changing data • make building software on top of DB difficult • Normalization • process of removing redundancies
Modification Anomalies • Insert anomaly • extra data must be known to insert a row into a table • Update anomaly • must change multiple rows to modify a single fact • Deletion anomaly • deleting a row causes other data to be deleted • deletes more data than is necessary or desired
Bad College Database • All data in 1 table
Bad College Database • Insert anomaly • adding Rush Daniels as a student • requires knowing which offerings Rush isenrolled in • cannot add Rush as a student until he enrolls
Bad College Database • Update anomaly • if Emily changes her name to Emma • need to change multiple rows
Bad College Database • Delete anomaly • if Roger drops out of college and we delete him • we also delete that there is an offering of DB in the spring
Functional Dependencies (FDs) • Constraint between 2 or more columns • Represented by → • X determines Y (X→Y) if there exists at most 1 value of Y for each value of X • like a mathematical function f(x) = y • left hand side (or LHS) is called the determinant • e.g., StdNo determines Student first name • StdNo → First Name
Organizing FDs • Make a list • can condense list by listing all dependent columns for a given determinant • e.g., StdNo→First Name, Last Name • Determinants should be minimal • least # of columns required to determine values of other columns • e.g., StdNo,First Name → Last Name
Bad College Database • StdNo→ First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr. • Std No, Offer No → Grade
Identifying FDs • From business narrative • Look for words like unique • e.g., “Each student has a unique student number, a first name, and a last name.” • Look for 1-M relationships • child (M-side) is the determinant (LHS) • e.g., “Faculty teach many offerings.” • e.g., Offer No → Faculty Id
Identifying FDs • From relational tables • FDs where determinant (LHS) is not the PK or a candidate key • recall, a candidate key is column(s) that unique identify a row • e.g., Zip →State • Combined PKs • does 1 column determine values of some other columns? • e.g., StdNo→ First Name, Last Name
Normal Forms • Normalization • remove redundancies in tables • removes modification anomalies • makes data easier to modify • Normal form • rules about functional dependencies (FDs) allowed • each successive normal form removes FDs
Normal Forms 1NF 2NF 3NF/BCNF
1st Normal Form • All relational tables are already in 1NF by definition
2nd Normal Form • Key columns • columns that are part (or all of) a candidate key • recall a candidate key is a key that uniquely identifies a row • Non-key columns • columns that are not part of a candidate key
2nd Normal Form • A table is in 2NF if each non-key column • depends on all candidate keys • NOT on any subset of any candidate key • check functional dependencies (FDs) • A 2NF violation • a FD where part of a key determines a non-key column
2nd Normal Form • 2NF Violations • StdNo → First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr.
3rd Normal Form • A table is in 3NF • if it is in 2NF • AND each non-key column depends only on candidate keys • NOT other non-key columns • e.g., CourseNr → Course Desc. • 3NF violation • a non-key column on the right-hand side (RHS) • AND anything other than a candidate key on LHS
3rd Normal Form • 3NF prohibits transitive dependencies • Transitive dependencies • if A → B & B → C, then A → C • e.g., Offer No → Course No • & Course No → Course Desc. • then Offer No → Course Desc.
Combined 2NF & 3NF • A table is in 3NF if each non-key column depends on • all candidate keys • whole candidate keys • and nothing but candidate keys
3rd Normal Form • 2NF Violations • StdNo → First Name, Last Name • OfferNo → Term, Year, Course No, Course Descr. • 3NF Violations • CourseNo → Course Descr. • OfferNo→ Course Descr.
Boyce-Codd Normal Form (BCNF) • Revised, simpler version of 3NF • Covers additional special cases • A table is in BCNF if every determinant is a candidate key • Violations are easy to detect • determinant (LHS) is not a candidate key • e.g., StdNo → Last Name
Boyce-Codd Normal Form (BCNF) • Excludes 2 redundancies that 3NF does not • part of a key determines part of a key • a non-key determines part of a key
Boyce-Codd Normal Form (BCNF) • BCNF Violations • Email → StdNo
Simple Synthesis (BCNF) • Convert tables into BCNF • Eliminate extraneous columns from LHS of FDs • Remove derived (transitive) FDs • Arrange FDs into groups by determinant • For each FD group make table with determinant as primary key • Merge tables where one table include all columns of other table • choose PK of one of the tables to be PK of new table
Bad College Database (1) • StdNo→ First Name • StdNo → Last Name • OfferNo → Term • OfferNo→ Year • Offer No → Course No • Offer No → Course Descr. • Std No, Offer No → Grade • Course No → Course Descr.
Bad College Database (2) • StdNo→ First Name • StdNo → Last Name • OfferNo → Term • OfferNo→ Year • Offer No → Course No • Offer No → Course Descr. • Std No, Offer No → Grade • Course No → Course Descr.
Bad College Database (3) • StdNo→ First Name, Last Name • OfferNo → Term, Year, Course No • Std No, Offer No → Grade • Course No → Course Descr.
Importance of Normal Form Violations • We have the BCNF synthesis process • we can just make BCNF tables • why do we care about detecting NF violations? • DBA has 2 jobs • make new databases • maintain old ones • Making new DBs requires using BCNF synthesis process • Maintaining old DBs requires detecting NF violations • perhaps made by other employees • detecting violations narrows scope of DB redesign
4th Normal Form (4NF) • M-way relationships • associative entity types (weak entities) • multiple associations • primary key made of FKs from 3 or more tables • often represent important documents • glue multiple things together • e.g., invoice • can sometimes contain redundancies
4th Normal Form (4NF) Student Offering Textbook Enroll StdNo Name OfferNo Location TextNo TextTitle
4th Normal Form (4NF) Enroll Table
Multivalued Dependencies (MVDs) • Given table R with columns X,Y, and Z • X→→ Y • each X maps to a set of Ys (between 1 and M) • X →→ Z • each X maps to a set of Zs (between 1 and M) • Y & Z are independent • knowing Y doesn’t tell you anything about Z and vice-versa • Y →→ Z & Y → Z • Z →→ Y & Z → Y • also Y,V →→ Z, unless V →→ Z • Every FD is an MVD • not every MVD is an FD
Trivial MVDs • MVD X →→ Y is trivial if • Y is a subset of X • OR X and Y are the only columns in the table • OR X → Y and X → Z • e.g., has-job table • E# →→ P# • e.g. offering table • C#, S# →→ #S
Multivalued Dependences (MVDs) • non-trivial MVDs manifest as redundancies in tables • there exist rows where X and Y are the same but Z is different • e.g., enroll table • O# →→ S# • O# →→ T# • S# independent of T# • if Emily drops 242 it doesn’t change the text books
Multivalued Dependences (MVDs) • non-trivial MVDs manifest as redundancies in tables • there exist rows where X and Y are the same but Z is different • e.g., enroll table • O# →→ S# • O# →→ T# • S# independent of T# • if Emily drops 242 it doesn’t change the text books
4th Normal Form (4NF) • 4th normal form • table in BCNF • AND all MVDs are trivial • Detecting a violation • are there any MVDs? • are those MVDs non-trivial?
4th Normal Form (4NF) • Resolving violations • X →→ Y • X →→ Z
More Examples • S →→ O & S →→ G ? • O →→ G & O →→ S ? • G →→ S & G →→ O ?
More Examples • S →→ O & S →→ G ? • O →→ G & O →→ S ? • G →→ S & G →→ O ? • Offering and Grade not independent • Grade and Student not independent • Student and Offering not indepedent
More Examples • B →→ E & B →→ C • Is this a trivial MVD?
More Examples • B →→ E & B →→ C • Is this a trivial MVD? • E is not a subset of B & C is not a subset of B • B and E are not the only columns in the table • B → E & B → C • NO!!!
QUIZ BREAK!!! • PQty →→ PDesc & PQty →→ Part# ?