250 likes | 602 Views
Normalisation. Relation A B C D E F. Relation1 A B. 1NF?. 2NF?. Relation2 A * C D E*. 3NF?. Help me Codd!!. Relation3 E F. Reading: Connolly and Begg 13 & 14 (4th ed), . Normalisation. From this…. …to this. In 3+ easy(?) steps. What is normalisation?.
E N D
Normalisation Relation A B C D E F Relation1 A B 1NF? 2NF? Relation2 A* C D E* 3NF? Help me Codd!! Relation3 E F Reading: Connolly and Begg 13 & 14 (4th ed),
Normalisation From this… …to this In 3+ easy(?) steps
What is normalisation? • A method for database design • Theory examines how “good” is a schema? • Transform non-normalised schemas • Minimise storage • Takes a set of attributes and derives the relational model • By separating out the required tables • Completely different approach to ERM • But should get the same result • A minimum of 3 steps are used: • For each stage, the normal form gets stronger (i.e. removes redundancy) so less open to update anomalies • All based on functional dependencies
Functional Dependency • Underpins normalisation process • If every value of column A uniquely determines the value in column B, then • B is functionally dependent on A (B depends on A) • A determines B, or, formally, A B (A is called the determinant) • For example, • EmpID Age, Dept (AB,C) Employee ID, Project Role (X, Y Z) • Note multiple attributes are often involved EmpIDProject Age Dept Dsize Budget Role
Rules for functional dependency • A B does NOT automatically mean B A • E.g. student ID name but not name ID • Transitive dependency:If AB and BC then AC • Many other rules • E.g. if X,YZ but XZ also • In this case Zis partially dependenton X,Y • “Transitive” and “partial” dependency are two key concepts of the normalisation process
A Question for you! EmpIDProject Age Dept Dsize Budget Role A B C D EmpIDProject Age Dept Dsize Budget Role E1 P2 33 D2 10 100 Analyst E1 P1 33 D2 10 200 Prog. E2 P1 34 D5 10 200 Prog. E2 P2 34 D5 20 100 Analyst Which functional dependency is violated by the data?
Unnormalised Form Relation contains: non-atomic attribute values ID Employee Salary Project 1 Grey 31000 A 2 Brown 35000 B,C 3 White 55000 A,B,C 4 Black 47000 A,C Violation of 1NF non-atomic values
redundancy Repeating First Normal Form ID Employee Salary Project Budget 1 Grey 31000 A 10 2 Brown 35000 B 5 2 Brown 35000 C 5 3 White 55000 A 5 3 White 55000 B 5 3 White 55000 C 5 4 Black 47000 A 10 4 Black 47000 C 5 Permits only single (atomic) attribute values Remove Repeating Group along with primary key from other Table ID (fk) Project Budget 1 A 10 2 B 5 2 C 5 3 A 5 3 B 5 3 C 5 4 A 10 4 C 5 ID Employee Salary 1 Grey 31000 2 Brown 35000 3 White 55000 4 Black 47000
Second Normal Form • Full Functional Dependency (FFD) • X Y is FFD • if removal of any attribute from X removes the dependency • X Y is partially dependent • if removal of attribute from X leaves the dependency intact • 2NF test • involves testing for partial dependency on the PK (therefore PK MUST be composite to test for 2NF) • Relation R is in 2NF if: • every non-primary-key attribute in R is FFD on the primary key of R
EmpIDProject Age Dept Dsize Budget Role • So which FD’s are violating 2NF? • “Second Normalised” by: • removing non-primary-key attributes and forming a FFD on appropriate part of primary key {EmpID ,Age, Dept , Dsize} {EmpID*, Project*, Role} {Project , Budget} 2NF
Third Normal Form • Remove Transitive Dependency • Conditions • A non-primary-key attribute Z is transitively dependent on primary key X if: • X Y; Y Z (Y attribute provides the transition to the PK) [EmpID* Project* Role] A [EmpID Age Dept Dsize] B [Project Budget] C D None of the above Which of the above could have transitive dependency?
Here is an un-normalised Table Ord# Date Cust# Name Prod# Desc Qty Supplier Tel 1 12/1/01 1 Jones 1 Disk 3 X 101 1 12/1/01 1 Jones 2 CD 5 Y 223 2 13/1/01 2 Black 1 Disk 1 X 101 2 13/1/01 2 Black 2 CD 1 Y 223 2 13/1/01 2 Black 3 Mouse 1 X 101 3 13/1/01 1 Jones 3 Mouse 1 X 101
Normalise it to 1NF Ord# Date Cust# Name Prod# Desc Qty Supplier Tel 1 12/1/01 1 Jones 1 Disk 3 X 101 1 12/1/01 1 Jones 2 CD 5 Y 223 2 13/1/01 2 Black 1 Disk 1 X 101 2 13/1/01 2 Black 2 CD 1 Y 223 2 13/1/01 2 Black 3 Mouse 1 X 101 3 13/1/01 1 Jones 3 Mouse 1 X 101 fk Ord#Prod# Desc Qty Supplier Tel 1 1 Disk 3 X 101 1 2 CD 5 Y 223 2 1 Disk 1 X 101 2 2 CD 1 Y 223 2 3 Mouse 1 X 101 3 3 Mouse 1 X 101 Ord# Date Cust# Name 1 12/1/01 1 Jones 2 13/1/01 2 Black 3 13/1/01 1 Jones
Ord# Date Cust# Name 1 12/1/01 1 Jones 2 13/1/01 2 Black 3 13/1/01 1 Jones Ord#Prod# Desc Qty Supplier Tel 1 1 Disk 3 X 101 1 2 CD 5 Y 223 2 1 Disk 1 X 101 2 2 CD 1 Y 223 2 3 Mouse 1 X 101 3 3 Mouse 1 X 101 Already in 2NF Prod# Desc Supplier Tel 1 Disk X 101 2 CD Y 223 3 Mouse X 101 Ord#Prod# Qty 1 1 3 1 2 5 2 1 1 2 2 1 2 3 1 3 3 1 Now we normalise this to 2NF remembering to test on the PK for any partial dependency fk fk
So, any transitive dependency? Ord#Prod# Qty 1 1 3 1 2 5 2 1 1 2 2 1 2 3 1 3 3 1 Ord# Date Cust# Name 1 12/1/01 1 Jones 2 13/1/01 2 Black 3 13/1/01 1 Jones Prod# Desc Supplier Tel 1 Disk X 101 2 CD Y 223 3 Mouse X 101 fk fk
Yes! But not in all ……………. Prod# Desc Supplier Tel 1 Disk X 101 2 CD Y 223 3 Mouse X 101 Ord#Prod# Qty 1 1 3 1 2 5 2 1 1 2 2 1 2 3 1 3 3 1 Cust# Name 1 Jones 2 Black Supplier Tel X 101 Y 223 Ord# Date Cust# Name 1 12/1/01 1 Jones 2 13/1/01 2 Black 3 13/1/01 1 Jones OK! Prod# Desc Supplier (fk) 1 Disk X 2 CD Y 3 Mouse X Ord# Date Cust# (fk) 1 12/1/01 1 2 13/1/01 2 3 13/1/01 1
Final Decomposition Ord#{fk} Prod#{fk} Qty 1 1 3 1 2 5 2 1 1 2 2 1 2 3 1 3 3 1 Ord# Date Cust# (fk) 1 12/1/01 1 2 13/1/01 2 3 13/1/01 1 Cust# Name 1 Jones 2 Black Prod# Desc Supplier (fk) 1 Disk X 2 CD Y 3 Mouse X Supplier Tel X 101 Y 223 Now in 3NF
The underlying E-R Model ….. Ord# Date Cust# Name Prod# Desc Qty Supplier Tel 1 12/1/01 1 Jones 1 Disk 3 X 101 1 12/1/01 1 Jones 2 CD 5 Y 223 2 13/1/01 2 Black 1 Disk 1 X 101 2 13/1/01 2 Black 2 CD 1 Y 223 2 13/1/01 2 Black 3 Mouse 1 X 101 3 13/1/01 1 Jones 3 Mouse 1 X 101 makes Customer Order 1..1 0..* 0..* has How many tables would you get from mapping? 0..* despatches Product Supplier 1..* 1..1
So Normalisation to 3NF is Normal!! • Remember, 2NF and 3NF disallow partial and transitive dependencies respectively on the PK, otherwise they are open to update anomalies • But ….. even at 3NF, a relation may be open to update anomalies on rare occasions due to redundancy too • So we look briefly at these • Boyce-Codd • 4NF
Boyce-Codd NF • Is a stronger normalised form then 3NF • Definition: A relation is in BCNF, if and only if, every determinant is a candidate key • And remember that a candidate key is any key that could become the PK of the relation (i.e. there may be competition for it!) • Potential to violate BCNF comes from: • A relation containing at least 2 composite candidate keys • Or candidate keys overlapping (i.e. they have at least one attribute in common)
BCNF Example • Consider the candidate keys for: • FD1 {PK}: clientNo, interviewDate interviewTime, staffNo, roomNo • FD2 {CK}: staffNo, interviewDate, interviewTime clientNo • FD3 {CK}: roomNo, interviewDate, interviewTime staffNo, clientNo • FD4: staffNo, interviewDate roomNo PK is primary key and CK is candidate key. But what about FD4? It is not a CK Adapted from Connolly and Begg, 2005, 4th ed. Page 420
So new decomposition? So duplication in the room number is now eradicated
4NF • Comes from 2 multi-valued attributes in a relation • E.g. for each value of A there is a set of values for B and a set for C, while B and C remain independent of each other So if you model your databases from ERM’s this type of dependency should not arise.
Example of 4NF Note: if step 9 applied to multi-valued attributes then we should map this correctly and avoid such redundancy as the two tables on the right would be the result of the mapping! Adapted from Connolly and Begg, 2005, 4th ed. Page 428
Normal Form Summary • A Relation’s degree of normalisation • Stronger in format at each stage • less vulnerable to update anomalies • First Normal Form (1NF) • The relation has no non-atomic values • Or the relation has “no repeating group” • 2nd Normal Form (2NF) • The relation has no partial dependencies • All non-key attributes are fully functionally dependent on the PK • 3rd Normal Form (3NF) • The relation has no transitive dependencies • Boyce-Codd • Every determinant is a candidate key • 4NF – no multi-valued dependencies