520 likes | 631 Views
Relational Database Design III. Normalization:. A table is good if it is about one thing. A table is good if the only FDs are of the form Consider the table below with the following FDs. We know that its CKs are {A,F}, {B,F} and {C,F}. So R is a bad table with several bad FDs.
E N D
Relational Database Design III Database Principles
Normalization: • A table is good if it is about one thing. • A table is good if the only FDs are of the form • Consider the table below with the following FDs. We know that its CKs are {A,F}, {B,F} and {C,F}. • So R is a bad table with several bad FDs. key attribute bad B, E -> C A, F -> B C -> A, D B -> E good bad bad Database Principles
Exercise: • Label the FDs of the following table as good or bad. The only CK is {B}. bad D -> C, A B -> F F -> C, E B -> D good bad good Database Principles
Exercise: • Label the FDs of the following table as good or bad. The CKs are {B,E}, {D,E} and {F,E}. A, B -> C D, E -> B F -> D B, E -> F D -> A bad good bad good bad Database Principles
Database Principles Prime and Non-prime Attributes • Attributes that belong to some CK are called prime. • Other attributes are called non-prime. • The CKs of R are {A,F}, {B,F} and {C,F}. • Prime Attributes: A, B, C, F • Non-prime Attributes: D, E B, E -> C A, F -> B C -> A, D B -> E
Database Principles Prime and Non-prime Attributes • The CK of R is {B}. • Prime Attributes: B • Non-prime Attributes: A, C, D, E, F D -> C, A B -> F F -> C, E B -> D
Database Principles Prime and Non-prime Attributes • The CKs are {B,E}, {D,E} and {F,E}. • Prime Attributes: B, D, F, E • Non-prime Attributes: A, C A, B -> C D, E -> B F -> D B, E -> F D -> A
Database Principles BCNF 1NF 3NF 2NF ... 4NF The Normal Forms: • The process of turning bad tables into good ones is called normalization. • The process of taking good tables and combining them in such a way as to create a bad table (typically done for the sake of efficiency) is called denormalization. unnormalizedtables
Database Principles BCNF 1NF 3NF 2NF ... 4NF Basic Idea: • Start with a table in one of the outer rings of the diagram and decompose it into two tables that belong to an inner ring. • NOTE: The closer to the center the “better” the tables.
Database Principles Decomposing Tables: • Decomposing a table means breaking it up into two or more tables using the relational algebra project operator. R A B C D E S T A B C C D E S = πA,B,C ( R ) T = πC,D,E ( R ) Exercise: Write these relational algebra operations in SQL.
Database Principles Beware: Not all Table Decompositions are Safe: • Consider the table • Suppose we decompose the table into two tables Student SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS T = πdob,major ( Student ) S = πsid,sname,dob ( Student ) SID SName DOB DOB Major s1 John 1982 s2 Mary 1983 s3 Alfredo 1975 s4 Tracey 1982 1982 Hist 1983 Eng 1975 Math 1982 CS
Database Principles Question: • Do S and T contain the same information as Student? • Examples:So you can see that S and T do NOT contain the same info as Student. • The easy test is to try to recreate Student from S and T using JOIN. Math What is Alfredo’s major? What year was Mary born in? 1983 History or Computer Science What is John’s major?
Database Principles Answer: • Join(S,T) • We notice that the join table contains noise rows. • This means we can not reconstruct Student exactly so we have lost info using this decomposition. • It is called a lossy-join decomposition but we might call it a noisy-join decomposition. SID SName DOB Major s1 John 1982 Hist s1 John 1982 CS s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 Hist s4 Tracey 1982 CS
Database Principles Lossless-Join Decomposition • A decomposition of a table R into two tables, S and T, is called a lossless-join decomposition if R = join(S,T), with no new rows and no rows lost. • We might call this a noiseless-join decomposition. • If a decomposition of R into two tables, S and T, is a lossless-join decomposition then no information found in the original table R is lost.
Database Principles Main Theorem: • If R is a table and we decompose R into two tables, S and T, such that • union(S ,T)= R and intersect( S , T ) is a key of either S or T then this is a lossless-join decomposition of R and R = join(S,T). NOTE: What makes our original decomposition a lossy-join decomposition is that {DOB} = intersect( S , T ) is neither a key to S nor to T. ∩
Database Principles Theorem Proof: • If R is a table and we decompose R into two tables, S and T, such that • union(S ,T) = R and • intersect( S , T ) is a key of T. Let |R| denote the number of rows in R. |S| <= |R| and |T| <= |R| since both S and T are produced by projecting over R. R subset_of join(S,T) so |R| <= |join(S,T)| Since intersect( S , T ) is a key to T, each value of intersect( S , T ) appears once and only once in T. Thus each row in S will join to only one row of T and so |join(S,T)| <= |S|. Since |S| <= |R| we conclude |join(S,T)| == |R| and so the two tables are equal.
Database Principles Lossless-join Example: • Consider the table • Suppose we decompose the table into two tables Student SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS T = πsid,dob,major ( Student ) S = πsid,sname,dob ( Student ) SID SName DOB SID DOB Major s1 John 1982 s2 Mary 1983 s3 Alfredo 1975 s4 Tracey 1982 s1 1982 Hist s2 1983 Eng s3 1975 Math s4 1982 CS
Database Principles Question: • Do S and T contain the same information as Student? • Examples:So you can see that S and T do contain the same info as Student. • Try to recreate Student from S and T using JOIN. Math What is Alfredo’s major? What year was Mary born in? 1983 History What is John’s major?
Database Principles Answer: • Join(S,T) • We notice that the join table contain no noise rows. • This means we can reconstruct Student exactly so we have not lost info using this decomposition. • For that reason it is called a lossless-join decomposition but we might call it a noiseless-join decomposition. SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS
Database Principles We Use Lossless-join Decompositions. • In all of the work that follows, the only decompositions we will be applying are lossless-join decompositions.
Database Principles Normalization (1NF): • First Normal Form: A table is in First Normal Form (1NF) if all of its values are atomic. Enrollment SID CrsID NOT 1NF s1 c1,c2,c3 Enrollment SID CrsID 1NF s1 c1 s1 c2 s1 c3 NOTE: All our tables will be assumed 1NF
Database Principles Normalization (2NF): • Second Normal Form: A table is in Second Normal Form (2NF) if it is in 1NF and there are no bad FDs of the form: • Note 1: This situation can not occur if all CKs are singletons. In that case all prime attributes are also keys. • Note 2: Suppose the CKs for a table are {A,E} and {B, E} and suppose A, B --> C. This is not a 2NF violation since {A,B} is not a subset of a CK. We will deal with this situation alter. prime (proper subset of candidate key) -> non-prime
Database Principles ISBN author title pub_name pub_date c_price a_addr 2NF Violation Example: • Suppose we have an alternative BOOK table whose CKs are {ISBN} and {author,title}. • Explanation: The prime attributes are {ISBN, author, title} and a_addr is non-prime so the FD violates the 2NF rule. Book1 CKs: {ISBN}, {author,title} FD: author --> a_addr -- 2NF violation
Database Principles author a_addr ISBN author title pub_name pub_date c_price 2NF Violation Resolution • To remove an FD that is a 2NF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD • remove the attributes from the right-hand-side of the bad FD from the original table Author CK: {author} FD: author --> a_addr -- good Book CK: {ISBN} FDs: all good
Database Principles intersect(Book , Author ) = {author} Did We Lose Information? • {author} is the intersection of the two table schemas and is the key to one of the tables (Author) so this is a lossless-join decomposition. • In other words: join(Book, Author) = Book1
2NF Exercise: • Consider the table below with the following FDs and whose CKs are {A,F}, {B,F} and {C,F}. • Step 1: Identify the prime and non-prime attributes • Step 2: Identify the 2NF violating FDs. B, E -> C A, F -> B C -> A, D B -> E prime: A, B, C, F non-prime: D, E C --> D and B --> E Database Principles
2NF Exercise: • Step 3: Decompose the original table R, removing the bad FDsdecomposes to S R’ T C D A B C F B E CKs: {A, F}, {B, F}, {C,F} FDs: B -> C -- bad A, F -> B -- good C -> A -- bad PK: C FD: C -> D -- good PK: B FD: B -> E -- good Database Principles
Database Principles Exercise: • In the previous example we combined the two FDsas • Prove this is so. B, E -> C -- FD 1 B -> E -- FD 2 B -> C (i) B -> B, E -- FD1, ident(B), aug+ (ii) B -> C -- (i), FD1, trans
Normalization (3NF): • Third Normal Form: A table is in Third Normal Form (3NF) if it is in 2NF and there are no bad FDs of the form: non-prime -> non-prime Database Principles
3NF Violation Example: • Suppose we have the original Cardholder table created from the ER Diagram. prime: borroweridnon-prime: b_name, b_addr, b_status, loan_limit borrowerid -> b_name -- goodborrowerid -> b_addr -- good borrowerid -> b_status -- good borrowerid -> loan_limit -- good b_status -> loan_limit -- 3NF violation Database Principles
Database Principles 3NF Violation Resolution • To remove an FD that is a 3NF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD • remove the attributes from the right-hand-side of the bad FD from the original table Status CK: {b_status} FD: b_status --> loan_limit -- good b_status loan_limit Cardholder CK: {borrowerid} FDs: all good borrowerid b_name b_addr b_status
3NF Exercise: • The following table has only one CK and that is {B}. Decompose the table. • Step 1: List the prime and non-prime attributes Step 2: Identify the 3NF violating FDs. bad D -> C, A B -> F F -> C, E B -> D good bad good prime: B non-prime: A, C, D, E, F D -> C, A and F -> C, E Exercise: Prove that a table with only singleton CKs has no 2NF violations. Database Principles
3NF Exercise: • Step 3: Decompose the original table R, removing the bad FDsdecomposes to D -> C, A B -> F F -> C, E B -> D S R’ T B D F F E D C A PK: B FDs: B -> D,F -- good PK: D FD: D -> C,A -- good PK: F FD: F -> E -- good NOTE: The column C is missing from the table T because we removed it from R when we created S so it was notavailable when we created T. Database Principles
Database Principles Boyce-Codd Normal Form (BCNF): • Boyce-Codd Normal Form: A table is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and there are no remaining bad FDs of any kind. • Any remaining bad FDs must be of one of the following forms: prime(not a key) -> prime prime(not a subset of a candidate key) -> non- prime prime(not subset of a candidate key), non-prime -> prime prime(not subset of a candidate key), non-prime ->non-prime Exercise: Prove non-prime --> prime is impossible.
Database Principles ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r w a BCNF Violation Example: • The table below describes the assignment of a police officer (o) to drive a certain vehicle (v) during a certain shift (s) on a certain day (d). The assignment is given a certain readiness level (r ) that depends on the skills of the officer and the equipment in the vehicle. The weather (w) is a function of the date and the shift. • The only CK is {VIN,Shift,Date}.
Database Principles ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r w a BCNF Violation Example: Prime: VIN, Shift, Date Non-prime: Officer, R_Level, Weather FDs: VIN Shift, Date -> Officer -- good VIN, Officer -> R_Level -- BCNF violation Shift, Date -> Weather -- BCNF violation
Database Principles BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD Readiness CK: {VIN, Officer} FD: VIN, Officer --> R_Level -- good VIN Officer R_Level Weather CK: {Shift, Date} FD: Shift, Date --> Weather -- good Shift Date Weather
Database Principles pk BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: • remove the attributes from the right-hand-side of the bad FDs from the original table ShiftAssignment PK: {VIN, Shift, Date} FD: VIN, Shift, Date --> Officer -- good Officer VIN Shift Date
Database Principles R A B C D E Exercise: FDs: A -> D D -> B A, C -> E E -> A CKs: (A, C} {E, C} Prime: A, E, C Non-prime: B, D Violations: -- 2NF viol A -> D D -> B A, C -> E E -> A -- 3NF viol -- good -- BCNF viol
Database Principles R’ A B C E R’ A B C E Exercise: • Remove 2NF violations • Remove 3NF violations: S A D FD: A -> D -- good pk A, C -> E -- good E -> A -- BCNF viol D -> B seems to have disappeared?? But did it? A -> D together with D -> B imply A -> B This new FD is another 2NF violation
Database Principles R’ S A D B A B C E pk Exercise: • Remove 2NF violations (second try) • Remove 3NF violations: FD: A -> D,B -- good D -> B -- 3NF viol R” S S D B A D A C E pk pk D -> B -- good A -> D -- good A, C -> E -- good E -> A -- BCNF viol
Database Principles R’” C E U A E Exercise: • Remove BCNF violation S S D B A D pk pk pk D -> B -- good A -> D -- good NOTE: The FD A, C -> E seems to have disappeared but performing a join of R’” and U will reproduce R” R” = join(R’”,U) and so recover the original FD (and its associated Enterprise Rule). pk E -> A -- good
Database Principles E1 E2 E3 Fourth Normal Form (4NF): • By this time there are no more bad FDs. • This makes the tables pretty good but sometimes it is still possible to improve them. • 4NF violations can come about from Enterprise Rules that can not be expressed as an FD. • 4NF violations typically come about when the two rela- tionships, r1and r2, described below are converted into a single table. r1 r2 e1 e2 e3 (1,n) (1,n) (1,n) (1,n)
Database Principles car_type maker_name country_name sells sells_in car_type maker_name maker_name country_name 4NF Violation Example: • From the ERD • We produce the table • Instead of CAR type MAKER name COUNTRY name sold_in sells (1,n) (1,n) (1,n) (1,n) sells_sold_in
Database Principles 4NF violation example What rows would the table sells_sold_in contain? One possibility is sells_sold_in = join(sells,sold_in) This will mean that in sells_sold_in every car_type that a particular car maker produces will be matched with every country that the same car maker sells cars in. If this is the case, then the sells_sold_in table contains a 4NF violation.
Database Principles car_type maker_name country_name c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 How to Recognize a 4NF Violation: • Suppose we have a car maker that sells two models in two countries. • In the single sells_sold_in table this would appear as sells sells_in car_type maker_name maker_name country_name c1 m1 c2 m1 m1 crty1 m1 crty2 sells_sold_in
Database Principles make_sells_in car_type maker_name country_name c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 How to Recognize a 4NF Violation: • Within this table there may be a hidden Enterprise Rule Enterprise Rule: If a car maker sells one car type in a country then it sells all its car types in the same country. Exercise: Verify that the insert, update and delete anomalies exist in the table above and they do not exist in the tables below sells sells_in car_type maker_name maker_name country_name c1 m1 c2 m1 m1 crty1 m1 crty2
Database Principles How to Recognize a 4NF Violation: A table is a 4NF violation only if the permanent Enterprise Rule exists and not whether or not, on a particular day, it happens to contain all the rows of a join of two other tables. The sells_sold_in table has a 4NF violation only if every time you might ever construct this join. sells_sold_in = join(sells, sold_in) NOTE: intersect(sells, sold_in ) is not a key in either sells or sold_in so as a decomposition, this is not guaranteed to be a lossless-join decomposition. It is lossless-join if sells_sold_in in a 4NF violation however. prove this.
Database Principles sells maker_sells_in car_type maker_name car_type maker_name country_name c1 m1 c2 m1 c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 sells_in maker_name country_name m1 crty1 m1 crty2 How to Recognize a 4NF Violation looking at Rows: • sells_sold_in is a 4NF violation if join(sells, sells_in) is always equal to sells_sold_in join(sells, sells_in) = maker_sells_in = join
Database Principles 4NF Violation Resolution Decompose the table with a 4NF violation into two tables. Since the 4NF-violation table is always equal to the join the two decomposition tables this decomposition will be a lossless-join decomposition.