330 likes | 791 Views
Data Analysis. Improving Database Design. Normalization. The process of transforming a data model into a flexible, stable structure. Reduces anomalies Anomaly – An unintended negative consequence of changing the contents of the data. Anomalies. SID. Activity. Fee. Functional Dependency.
E N D
Data Analysis Improving Database Design
Normalization • The process of transforming a data model into a flexible, stable structure. • Reduces anomalies • Anomaly – An unintended negative consequence of changing the contents of the data.
Anomalies SID Activity Fee
Functional Dependency • A relationship between attributes such that if the value of one attribute is known, the value of another attribute can be determined. In a database including social security numbers and names, given the value of SSN the value of Name can be determined. SSN Name SSN functional determines Name Name is functionally dependent on SSN SSN is the determinant of Name
Functional Dependency (cont.) • May exist among groups of attributes. SID Class Grade Functional Dependency: (SID, Class) Grade
Keys • A key is an attribute or group of attributes that uniquely identifies a row in a table. • If a key is a group of attributes, it is called a composite key. • A key functionally determines the entire row. • Often called the primary key.
Uniqueness • Keys must be unique in a table. • Determinants may or may not be unique in a table. SID Dorm Rent
Normal Forms • Classifications of tables based on the types of anomalies to which they are vulnerable. • There are currently 7 normal forms (1NF, 2NF, 3NF, etc.) • Each normal form eliminates a particular type of anomaly. • Normal forms are cumulative.
First Normal Form (1NF) A table is in 1NF if: • Every cell contains a single value (no repeating groups or arrays) • Each column has a unique name • All values in a column are of the same kind • The order of the columns is insignificant • Every row is unique • The order of the rows is insignificant
1NF Example EMPLOYEE Emp_ID Emp_FName Emp_LName Emp_Phone Emp_DepName Employee has multiple phone numbers. Employee has multiple Dependents.
1NF Example Solution EMPLOYEE DEPENDENT Emp_ID Emp_FName Emp_LName Emp_OfficePhone Emp_HomePhone Emp_CellPhone Emp_ID Emp_DepName
Second Normal Form (2NF) A table is in 2NF if it is in 1NF and it has no partial dependencies. • 2NF is only a concern if a table has a composite key. • A partial dependency is when a non-key attribute is functionally dependent on only part of a composite key.
2NF Example SID Activity Fee Expertise Recreation(SID, Activity, Fee, Expertise) Key: (SID, Activity) Functional Dependency: Activity Fee
2NF Example solution SID Activity Expertise Activity Fee
Third Normal Form (3NF) • A table is in 3NF if it is in 2NF and has no transitive dependencies. • A transitive dependency is when one non-key attribute determines another non-key attribute.
3NF Example SID Dorm Rent Housing(SID, Dorm, Rent) Key: SID Functional Dependency: Dorm Rent
3NF Example solution Dorm Rent SID Dorm
Boyce-Codd Normal Form (BCNF) • Special form of 3NF • A table is in BCNF if it is in 3NF and every determinant is a candidate key. • Arises when a non-key attribute determines part of a composite key.
BCNF Example A student can have many majors. A student has a different advisor for each major. Each advisor advises for only one major.
BCNF Example Advising (SID, Major, Advisor) Candidate Keys: (SID, Major) or (SID, Advisor) Functional Dependency: Advisor Major
Fourth Normal Form (4NF) • A table is in 4NF if it is in BCNF and has no multivalued dependencies (MVDs). • A multivalued dependency exists when one attribute determines multiple values for two or more other attributes that are independent of each other.
Fifth Normal Form (5NF) • A table is in 5NF if it is already in 4NF and cannot have any lossless decompositions. • The table cannot be represented by a set of smaller tables that can reconstruct the original table. • Also called Projection-Join Normal Form (PJNF) • Defines a point where a table cannot be decomposed further.
Domain Key Normal Form (DKNF) • Theoretical structure that is free of all anomalies. • “Every constraint on the database is a logical consequence of the definition of keys and domains.”
Denormalization • Tables may be denormalized to improve performance. • Normalization increases the number of tables and relationships • Accessing multiple tables across relationships requires more processing than accessing a single table
Normalized Model • Evaluate the attributes of the tables to ensure compliance with normalization rules. • Create new tables as needed. • Place foreign keys for new tables.