210 likes | 388 Views
Schema Refinement and Normal Forms. Normalization. It is a process that we can use to remove design flaws from a database A number of normal forms, which are sets of rules describing what we should and should not do in our table structure
E N D
Schema Refinement and Normal Forms CS3754 Class Notes #7, John Shieh
Normalization • It is a process that we can use to remove design flaws from a database • A number of normal forms, which are sets of rules describing what we should and should not do in our table structure • 3NF is sufficient to avoid the data redundancy problem of a designed relational database CS4753/2006F
Problems caused by redundancy • Redundant Storage • Some information is stored repeatedly. • Update Anomalies • If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated. • Insertion anomalies • It may not be possible to store certain information unless some other, unrelated, information is stored. • Deletion Anomalies • It may not be possible to delete certain information without losing some other, unrelated, information. CS3754 Class Notes #7, John Shieh
Redundant Storage • The hourly wages depend on rating levels. So, for example, hourly wage 10 for rating level 8 is repeated three times. • Update Anomalies • The hourly_wages in the first tuple could be updated without making a similar change in the second tuple. CS3754 Class Notes #7, John Shieh
Insertion Anomalies • We cannot insert a tuple for an employee unless we know the hourly wage for the employee’s rating value. • Deletion Anomalies • If we delete all tuples with a given rating value (e.g. tuples of Smethurst and Guldu) we lose the association between the rating value and its hourly_wage value. CS3754 Class Notes #7, John Shieh
Decompositions • Intuitively, redundancy arise when a relational schema forces an association between attributes that is not natural. • Functional dependencies can be used to identify such situations and suggest refinements to the schema. • The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relation. CS3754 Class Notes #7, John Shieh
Functional dependency: - rating determines Hourly_wages A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas each of which contains a subset of attributes of R and together include all attributes in R CS3754 Class Notes #7, John Shieh
Functional Dependencies • A functional dependency (FD) is a kind of IC that generalizes the concept of a key. • Let R be a relation schema, and X and Y be sets of nonempty sets of attributes in R. • An FD X Y exists, if in every relation instance for R, any two tuples that agree on the value of X also agree on the value of Y. • More formally • Let R be a relation schema and let X and Y be nonempty sets of attributes in R. An FD X Y exists in R if every instance of R preserves the FD X Y. • We say that an instance r of Rpreserves the FD X Y if the following holds for every pair of tuples t1 and t2 in r If t1.X = t2.X, then t1.Y = t2.Y The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X CS3754 Class Notes #7, John Shieh
Take Examples: course_ID course_name is preserved? {student_ID, course_ID} course_name is preserved ? if no two rows agree on value, then is trivially preserved. yes yes CS3754 Class Notes #7, John Shieh
The table instance also preserves the following • student_ID student_name • Student_ID, course_ID {student_name, course_name} • student_ID, course_ID • {student_ID, student_name, course_ID, course_Name} • student_name student_name (a trivial dependency) • student_name, course_name student_name (also trivial) • many more …. CS3754 Class Notes #7, John Shieh
How do we know if a FD exist in R? • Can we check all instances of R to see if the FD is preserved? • Definitely, not possible! • Whether or not a functional dependency exists must be determined by assumptions given in advance, or common sense, not by individual relation instances. • Given an instance r of R, we can check if r preserves some • functional dependency f, but we cannot tell if f holds over R. • course_ID student_name ? • Although it is preserved by this table, it does not fit the assumption. no CS3754 Class Notes #7, John Shieh
The assumptions given in advance, or common sense, impose some constraints, and are called the semantics of a database • Assumptions given in advance impose explicit constraints; common sense imposes implicit constraints CS3754 Class Notes #7, John Shieh
Example: • Application is to keep track of information about • employees in a company. • Information to be kept track of includes: • eid: employee’s id number • ename: employee name • address: address of the employee • sex: employee’s sex • dname: name of the department that the employee works for • dhname: department head’s name • dhsex: department head’s sex CS3754 Class Notes #7, John Shieh
Let’s construct a relation schema as follows: • Which of the following dependencies are true? • eid ename • ename eid • eid address • eid sex • sex address • dhname dname • dhname eid • dhsex sex Employee eid ename address sex dname dhname dhsex Assumptions: a:Employee’s id number is unique b:Each employee has a unique address c:Each employee works for only one dept. d:A person can be the head of at most one department e:All department heads have different names Implicit: common sense CS3754 Class Notes #7, John Shieh
is a superkey for relation schema R iff attri(R) • where attri(R) denotes the set of all the attributes in schema R • is a candidate key (or simply, key) for R iff • attri(R), and • is minimal, i.e., for any , attri(R) • In other words,a candidate key is a minimal superkey (student_ID, course_ID) is a candidate key (and the only one) (student_ID, course_ID, course_name) is a superkey, but not a candidate key (student_ID, course_ID, student_name) is another non-candidate superkey (student_ID, course_ID, course_name, student_name) is also a non-candidate superkey CS3754 Class Notes #7, John Shieh
Normal Forms CS3754 Class Notes #7, John Shieh
Normal Form (NF) • 1NF: each attribute or column value must be atomic • 2NF: if a schema is 1NF, and if its all attributes that are not part of the primary key are fully functionally dependent on the primary key • 3NF: if a schema is 2NF, and all transitive dependencies have been removed Ex: employeeDept(employeeID, name, job, deptID, deptName) has to convert to employee(employeeID, name, job, deptID) Dept(deptID, deptName) CS4753/2006F
2NF • It means that each non-key attribute must be functionally dependent on all parts of the primary key (i.e., the combination of the composite attributes of the key). • Example: not 2NF Employee(employeeID, name, job, departmentID, skill) employeeID, skill name, job, departmentID employeeID name, job, departmentID (Note: determine) • Break the table into two tables to become 2NF Employee(employeeID, name, job, departmentID) employeeSkills(employeeID, skill) CS4753/2006F
3NF • Example: 2NF but not 3NF Employee(employeeID, name, job, departmentID, departmentName) Here employeeID departmentID employeeID departmentName Also departmentID departmentName, departmentID is not a key Therefore, employeeID departmentName is a transitive dependency • Convert the schema to 3NF by breaking to two tables: Employee(employeeID, name, job, departmentID) Department(departmentID, departmentName) CS4753/2006F
Normal Forms Defined Informally • 1st normal form • All attributes depend on the key • 2nd normal form • All attributes depend on the whole key • 3rd normal form • All attributes depend on nothing but the key CS4753/2006F
SUMMARY OF NORMAL FORMS based on Primary Keys CS4753/2006F