1 / 52

Relational Database Design III

Relational Database Design III. Normalization:. A table is good if it is about one thing. A table is good if the only FDs are of the form Consider the table below with the following FDs. We know that its CKs are {A,F}, {B,F} and {C,F}. So R is a bad table with several bad FDs.

efuru
Download Presentation

Relational Database Design III

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Relational Database Design III Database Principles

  2. Normalization: • A table is good if it is about one thing. • A table is good if the only FDs are of the form • Consider the table below with the following FDs. We know that its CKs are {A,F}, {B,F} and {C,F}. • So R is a bad table with several bad FDs. key  attribute bad B, E -> C A, F -> B C -> A, D B -> E good bad bad Database Principles

  3. Exercise: • Label the FDs of the following table as good or bad. The only CK is {B}. bad D -> C, A B -> F F -> C, E B -> D good bad good Database Principles

  4. Exercise: • Label the FDs of the following table as good or bad. The CKs are {B,E}, {D,E} and {F,E}. A, B -> C D, E -> B F -> D B, E -> F D -> A bad good bad good bad Database Principles

  5. Database Principles Prime and Non-prime Attributes • Attributes that belong to some CK are called prime. • Other attributes are called non-prime. • The CKs of R are {A,F}, {B,F} and {C,F}. • Prime Attributes: A, B, C, F • Non-prime Attributes: D, E B, E -> C A, F -> B C -> A, D B -> E

  6. Database Principles Prime and Non-prime Attributes • The CK of R is {B}. • Prime Attributes: B • Non-prime Attributes: A, C, D, E, F D -> C, A B -> F F -> C, E B -> D

  7. Database Principles Prime and Non-prime Attributes • The CKs are {B,E}, {D,E} and {F,E}. • Prime Attributes: B, D, F, E • Non-prime Attributes: A, C A, B -> C D, E -> B F -> D B, E -> F D -> A

  8. Database Principles BCNF 1NF 3NF 2NF ... 4NF The Normal Forms: • The process of turning bad tables into good ones is called normalization. • The process of taking good tables and combining them in such a way as to create a bad table (typically done for the sake of efficiency) is called denormalization. unnormalizedtables

  9. Database Principles BCNF 1NF 3NF 2NF ... 4NF Basic Idea: • Start with a table in one of the outer rings of the diagram and decompose it into two tables that belong to an inner ring. • NOTE: The closer to the center the “better” the tables.

  10. Database Principles Decomposing Tables: • Decomposing a table means breaking it up into two or more tables using the relational algebra project operator. R A B C D E S T A B C C D E S = πA,B,C ( R )‏ T = πC,D,E ( R )‏ Exercise: Write these relational algebra operations in SQL.

  11. Database Principles Beware: Not all Table Decompositions are Safe: • Consider the table • Suppose we decompose the table into two tables Student SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS T = πdob,major ( Student )‏ S = πsid,sname,dob ( Student )‏ SID SName DOB DOB Major s1 John 1982 s2 Mary 1983 s3 Alfredo 1975 s4 Tracey 1982 1982 Hist 1983 Eng 1975 Math 1982 CS

  12. Database Principles Question: • Do S and T contain the same information as Student? • Examples:So you can see that S and T do NOT contain the same info as Student. • The easy test is to try to recreate Student from S and T using JOIN. Math What is Alfredo’s major? What year was Mary born in? 1983 History or Computer Science What is John’s major?

  13. Database Principles Answer: • Join(S,T) • We notice that the join table contains noise rows. • This means we can not reconstruct Student exactly so we have lost info using this decomposition. • It is called a lossy-join decomposition but we might call it a noisy-join decomposition. SID SName DOB Major s1 John 1982 Hist s1 John 1982 CS s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 Hist s4 Tracey 1982 CS

  14. Database Principles Lossless-Join Decomposition • A decomposition of a table R into two tables, S and T, is called a lossless-join decomposition if R = join(S,T), with no new rows and no rows lost. • We might call this a noiseless-join decomposition. • If a decomposition of R into two tables, S and T, is a lossless-join decomposition then no information found in the original table R is lost.

  15. Database Principles Main Theorem: • If R is a table and we decompose R into two tables, S and T, such that • union(S ,T)= R and intersect( S , T ) is a key of either S or T then this is a lossless-join decomposition of R and R = join(S,T). NOTE: What makes our original decomposition a lossy-join decomposition is that {DOB} = intersect( S , T ) is neither a key to S nor to T. ∩

  16. Database Principles Theorem Proof: • If R is a table and we decompose R into two tables, S and T, such that • union(S ,T) = R and • intersect( S , T ) is a key of T. Let |R| denote the number of rows in R. |S| <= |R| and |T| <= |R| since both S and T are produced by projecting over R. R subset_of join(S,T) so |R| <= |join(S,T)| Since intersect( S , T ) is a key to T, each value of intersect( S , T ) appears once and only once in T. Thus each row in S will join to only one row of T and so |join(S,T)| <= |S|. Since |S| <= |R| we conclude |join(S,T)| == |R| and so the two tables are equal.

  17. Database Principles Lossless-join Example: • Consider the table • Suppose we decompose the table into two tables Student SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS T = πsid,dob,major ( Student ) S = πsid,sname,dob ( Student ) SID SName DOB SID DOB Major s1 John 1982 s2 Mary 1983 s3 Alfredo 1975 s4 Tracey 1982 s1 1982 Hist s2 1983 Eng s3 1975 Math s4 1982 CS

  18. Database Principles Question: • Do S and T contain the same information as Student? • Examples:So you can see that S and T do contain the same info as Student. • Try to recreate Student from S and T using JOIN. Math What is Alfredo’s major? What year was Mary born in? 1983 History What is John’s major?

  19. Database Principles Answer: • Join(S,T) • We notice that the join table contain no noise rows. • This means we can reconstruct Student exactly so we have not lost info using this decomposition. • For that reason it is called a lossless-join decomposition but we might call it a noiseless-join decomposition. SID SName DOB Major s1 John 1982 Hist s2 Mary 1983 Eng s3 Alfredo 1975 Math s4 Tracey 1982 CS

  20. Database Principles We Use Lossless-join Decompositions. • In all of the work that follows, the only decompositions we will be applying are lossless-join decompositions.

  21. Database Principles Normalization (1NF): • First Normal Form: A table is in First Normal Form (1NF) if all of its values are atomic. Enrollment SID CrsID NOT 1NF s1 c1,c2,c3 Enrollment SID CrsID 1NF s1 c1 s1 c2 s1 c3 NOTE: All our tables will be assumed 1NF

  22. Database Principles Normalization (2NF): • Second Normal Form: A table is in Second Normal Form (2NF) if it is in 1NF and there are no bad FDs of the form: • Note 1: This situation can not occur if all CKs are singletons. In that case all prime attributes are also keys. • Note 2: Suppose the CKs for a table are {A,E} and {B, E} and suppose A, B --> C. This is not a 2NF violation since {A,B} is not a subset of a CK. We will deal with this situation alter. prime (proper subset of candidate key) -> non-prime

  23. Database Principles ISBN author title pub_name pub_date c_price a_addr 2NF Violation Example: • Suppose we have an alternative BOOK table whose CKs are {ISBN} and {author,title}. • Explanation: The prime attributes are {ISBN, author, title} and a_addr is non-prime so the FD violates the 2NF rule. Book1 CKs: {ISBN}, {author,title} FD: author --> a_addr -- 2NF violation

  24. Database Principles author a_addr ISBN author title pub_name pub_date c_price 2NF Violation Resolution • To remove an FD that is a 2NF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD • remove the attributes from the right-hand-side of the bad FD from the original table Author CK: {author} FD: author --> a_addr -- good Book CK: {ISBN} FDs: all good

  25. Database Principles intersect(Book , Author ) = {author} Did We Lose Information? • {author} is the intersection of the two table schemas and is the key to one of the tables (Author) so this is a lossless-join decomposition. • In other words: join(Book, Author) = Book1

  26. 2NF Exercise: • Consider the table below with the following FDs and whose CKs are {A,F}, {B,F} and {C,F}. • Step 1: Identify the prime and non-prime attributes • Step 2: Identify the 2NF violating FDs. B, E -> C A, F -> B C -> A, D B -> E prime: A, B, C, F non-prime: D, E C --> D and B --> E Database Principles

  27. 2NF Exercise: • Step 3: Decompose the original table R, removing the bad FDsdecomposes to S R’ T C D A B C F B E CKs: {A, F}, {B, F}, {C,F} FDs: B -> C -- bad A, F -> B -- good C -> A -- bad PK: C FD: C -> D -- good PK: B FD: B -> E -- good Database Principles

  28. Database Principles Exercise: • In the previous example we combined the two FDsas • Prove this is so. B, E -> C -- FD 1 B -> E -- FD 2 B -> C (i) B -> B, E -- FD1, ident(B), aug+ (ii) B -> C -- (i), FD1, trans

  29. Normalization (3NF): • Third Normal Form: A table is in Third Normal Form (3NF) if it is in 2NF and there are no bad FDs of the form: non-prime -> non-prime Database Principles

  30. 3NF Violation Example: • Suppose we have the original Cardholder table created from the ER Diagram. prime: borroweridnon-prime: b_name, b_addr, b_status, loan_limit borrowerid -> b_name -- goodborrowerid -> b_addr -- good borrowerid -> b_status -- good borrowerid -> loan_limit -- good b_status -> loan_limit -- 3NF violation Database Principles

  31. Database Principles 3NF Violation Resolution • To remove an FD that is a 3NF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD • remove the attributes from the right-hand-side of the bad FD from the original table Status CK: {b_status} FD: b_status --> loan_limit -- good b_status loan_limit Cardholder CK: {borrowerid} FDs: all good borrowerid b_name b_addr b_status

  32. 3NF Exercise: • The following table has only one CK and that is {B}. Decompose the table. • Step 1: List the prime and non-prime attributes Step 2: Identify the 3NF violating FDs. bad D -> C, A B -> F F -> C, E B -> D good bad good prime: B non-prime: A, C, D, E, F D -> C, A and F -> C, E Exercise: Prove that a table with only singleton CKs has no 2NF violations. Database Principles

  33. 3NF Exercise: • Step 3: Decompose the original table R, removing the bad FDsdecomposes to D -> C, A B -> F F -> C, E B -> D S R’ T B D F F E D C A PK: B FDs: B -> D,F -- good PK: D FD: D -> C,A -- good PK: F FD: F -> E -- good NOTE: The column C is missing from the table T because we removed it from R when we created S so it was notavailable when we created T. Database Principles

  34. Database Principles Boyce-Codd Normal Form (BCNF): • Boyce-Codd Normal Form: A table is in Boyce-Codd Normal Form (BCNF) if it is in 3NF and there are no remaining bad FDs of any kind. • Any remaining bad FDs must be of one of the following forms: prime(not a key) -> prime prime(not a subset of a candidate key) -> non- prime prime(not subset of a candidate key), non-prime -> prime prime(not subset of a candidate key), non-prime ->non-prime Exercise: Prove non-prime --> prime is impossible.

  35. Database Principles ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r w a BCNF Violation Example: • The table below describes the assignment of a police officer (o) to drive a certain vehicle (v) during a certain shift (s) on a certain day (d). The assignment is given a certain readiness level (r ) that depends on the skills of the officer and the equipment in the vehicle. The weather (w) is a function of the date and the shift. • The only CK is {VIN,Shift,Date}.

  36. Database Principles ShiftAssignment Officer VIN Shift Date R_Level Weather o v s d r w a BCNF Violation Example: Prime: VIN, Shift, Date Non-prime: Officer, R_Level, Weather FDs: VIN Shift, Date -> Officer -- good VIN, Officer -> R_Level -- BCNF violation Shift, Date -> Weather -- BCNF violation

  37. Database Principles BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: • create a new table containing all the attributes of the bad FD Readiness CK: {VIN, Officer} FD: VIN, Officer --> R_Level -- good VIN Officer R_Level Weather CK: {Shift, Date} FD: Shift, Date --> Weather -- good Shift Date Weather

  38. Database Principles pk BCNF Violation Resolution • To remove an FD that is a BCNF violation from a table perform the following decomposition: • remove the attributes from the right-hand-side of the bad FDs from the original table ShiftAssignment PK: {VIN, Shift, Date} FD: VIN, Shift, Date --> Officer -- good Officer VIN Shift Date

  39. Database Principles R A B C D E Exercise: FDs: A -> D D -> B A, C -> E E -> A CKs: (A, C} {E, C} Prime: A, E, C Non-prime: B, D Violations: -- 2NF viol A -> D D -> B A, C -> E E -> A -- 3NF viol -- good -- BCNF viol

  40. Database Principles R’ A B C E R’ A B C E Exercise: • Remove 2NF violations • Remove 3NF violations: S A D FD: A -> D -- good pk A, C -> E -- good E -> A -- BCNF viol D -> B seems to have disappeared?? But did it? A -> D together with D -> B imply A -> B This new FD is another 2NF violation

  41. Database Principles R’ S A D B A B C E pk Exercise: • Remove 2NF violations (second try) • Remove 3NF violations: FD: A -> D,B -- good D -> B -- 3NF viol R” S S D B A D A C E pk pk D -> B -- good A -> D -- good A, C -> E -- good E -> A -- BCNF viol

  42. Database Principles R’” C E U A E Exercise: • Remove BCNF violation S S D B A D pk pk pk D -> B -- good A -> D -- good NOTE: The FD A, C -> E seems to have disappeared but performing a join of R’” and U will reproduce R” R” = join(R’”,U) and so recover the original FD (and its associated Enterprise Rule). pk E -> A -- good

  43. Database Principles E1 E2 E3 Fourth Normal Form (4NF): • By this time there are no more bad FDs. • This makes the tables pretty good but sometimes it is still possible to improve them. • 4NF violations can come about from Enterprise Rules that can not be expressed as an FD. • 4NF violations typically come about when the two rela- tionships, r1and r2, described below are converted into a single table. r1 r2 e1 e2 e3 (1,n) (1,n) (1,n) (1,n)

  44. Database Principles car_type maker_name country_name sells sells_in car_type maker_name maker_name country_name 4NF Violation Example: • From the ERD • We produce the table • Instead of CAR type MAKER name COUNTRY name sold_in sells (1,n) (1,n) (1,n) (1,n) sells_sold_in

  45. Database Principles 4NF violation example What rows would the table sells_sold_in contain? One possibility is sells_sold_in = join(sells,sold_in) This will mean that in sells_sold_in every car_type that a particular car maker produces will be matched with every country that the same car maker sells cars in. If this is the case, then the sells_sold_in table contains a 4NF violation.

  46. Database Principles car_type maker_name country_name c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 How to Recognize a 4NF Violation: • Suppose we have a car maker that sells two models in two countries. • In the single sells_sold_in table this would appear as sells sells_in car_type maker_name maker_name country_name c1 m1 c2 m1 m1 crty1 m1 crty2 sells_sold_in

  47. Database Principles make_sells_in car_type maker_name country_name c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 How to Recognize a 4NF Violation: • Within this table there may be a hidden Enterprise Rule Enterprise Rule: If a car maker sells one car type in a country then it sells all its car types in the same country. Exercise: Verify that the insert, update and delete anomalies exist in the table above and they do not exist in the tables below sells sells_in car_type maker_name maker_name country_name c1 m1 c2 m1 m1 crty1 m1 crty2

  48. Database Principles How to Recognize a 4NF Violation: A table is a 4NF violation only if the permanent Enterprise Rule exists and not whether or not, on a particular day, it happens to contain all the rows of a join of two other tables. The sells_sold_in table has a 4NF violation only if every time you might ever construct this join. sells_sold_in = join(sells, sold_in) NOTE: intersect(sells, sold_in ) is not a key in either sells or sold_in so as a decomposition, this is not guaranteed to be a lossless-join decomposition. It is lossless-join if sells_sold_in in a 4NF violation however. prove this.

  49. Database Principles sells maker_sells_in car_type maker_name car_type maker_name country_name c1 m1 c2 m1 c1 m1 ctry1 c1 m1 crty2 c2 m1 crty1 c2 m2 crty2 sells_in maker_name country_name m1 crty1 m1 crty2 How to Recognize a 4NF Violation looking at Rows: • sells_sold_in is a 4NF violation if join(sells, sells_in) is always equal to sells_sold_in join(sells, sells_in) = maker_sells_in = join

  50. Database Principles 4NF Violation Resolution Decompose the table with a 4NF violation into two tables. Since the 4NF-violation table is always equal to the join the two decomposition tables this decomposition will be a lossless-join decomposition.

More Related