CS157A Lecture 14 Keys and Functional Dependency Prof. Sin-Min Lee Department of Computer Science San Jose State University
Data Normalization • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. • The process of decomposing relations with anomalies to produce smaller, well-structured relations. • Primary objectives: reduce redundancy, reduce nulls • Improve “modify” activities: • insert, • update, • delete, • but not read • Price: degraded query, display, and reporting performance, since more joins are needed
Functional Dependency and Keys • Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. • Candidate Key: Each non-key field is functionally dependent on every candidate key.
Functional dependency • a constraint between two attributes (columns) or two sets of columns • A → B if “for every valid instance of A, that value of A uniquely determines the value of B” • or … A → B if “there exists at most one value of B for every value of A”
Functional Dependencies • FDs are defined over two sets of attributes: X, Y ⊆ R • Notation: X → Y reads as “X determines Y” • If X → Y, then all tuples that agree on X must also agree on Y
Example instance of R(X, Y, Z):
X  Y  Z
1  2  3
2  4  5
1  2  4
1  2  7
2  4  8
3  7  9
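Functional Dependencies (example) • In the instance of R(X, Y, Z) above, X → Y holds: every tuple with X = 1 has Y = 2, every tuple with X = 2 has Y = 4, and X = 3 occurs only once (with Y = 7). • X → Z does not hold: X = 1 appears with Z = 3, 4, and 7.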
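As a minimal sketch (the helper name `fd_holds` is hypothetical, not from the lecture), the rule “all tuples that agree on X must also agree on Y” can be checked mechanically against the six-tuple instance of R(X, Y, Z). Note that a single instance can only refute an FD; it cannot prove that the FD is a constraint of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs.

    Checks ONE relation instance only: a violation disproves the FD,
    but passing does not prove the FD holds for every valid state.
    """
    seen = {}  # maps each lhs value tuple to the rhs value tuple first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # same X value, different Y value: FD violated
        seen.setdefault(x, y)
    return True

# The instance of R(X, Y, Z) from the slide.
R = [dict(zip("XYZ", t)) for t in
     [(1, 2, 3), (2, 4, 5), (1, 2, 4), (1, 2, 7), (2, 4, 8), (3, 7, 9)]]

print(fd_holds(R, ["X"], ["Y"]))  # True
print(fd_holds(R, ["X"], ["Z"]))  # False: X = 1 appears with Z = 3, 4, 7
```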
… functional dependency • some examples • SSN → Name, Address, Birthdate • VIN → Make, Model, Color • note: the left-hand side (LHS) is the determinant • so “functional dependency” is the technical term for “determines”
Candidate Keys • an attribute (or set of attributes) that uniquely identifies a row • the primary key is a special candidate key • its values cannot be null • e.g. • ENROLL (Student_ID, Name, Address, …) • PK = Student_ID • another candidate key = (Name, Address), assuming no two students share both
… candidate key • a candidate key must satisfy: • unique identification • implies that each nonkey attribute is functionally dependent on the key (for A → B to fail, some value of A must occur in more than one row with different values of B) • nonredundancy • no attribute can be removed from the key without losing uniqueness • i.e. a minimal set of columns (Simsion)
Keys and dependencies • EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary) • determinant: Emp_ID • functional dependency: Emp_ID → Name, Dept_Name, Salary
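EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed) • primary key: (Emp_ID, Course_Title) • Name, Dept_Name, and Salary are not fully functionally dependent on the primary key: they depend on Emp_ID alone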
determinants & candidate keys • a candidate key is always a determinant (one way to find a determinant) • a determinant may or may not be a candidate key • a candidate key is a determinant that uniquely identifies the remaining (nonkey) attributes • a determinant may be • a candidate key • part of a composite candidate key • a nonkey attribute
Introduction • Data integrity is maintained by various constraints on data • Functional dependencies are application constraints that help the database model real-world entities • Join dependencies are a further constraint that help resolve some limitations of FD constraints
Normal Forms provide database designers with: • A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. • A series of tests that can be carried out on individual relation schemas so that the relational database can be normalized to any degree.
Keys • superkey: a superkey is a set of attributes S ⊆ R = {A1, A2, …, An} with the property that no two distinct tuples t1 and t2 in any relation state r of R will have t1[S] = t2[S]. • A key K is a superkey with the additional property that removing any attribute from K will cause K not to be a superkey anymore.
Keys • The difference between a key and a superkey is that a key has to be “minimal”. • Example: • {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN, ENAME}, and {SSN, ENAME, BDATE} are all superkeys.
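A sketch of the superkey/key distinction on a small hypothetical EMPLOYEE instance (the rows and the helper names `is_superkey` and `is_key` are made up for illustration). Because the definition quantifies over every relation state, checking one instance can only disprove a superkey, never prove one: BDATE happens to be unique in this sample, yet it is clearly not a key in general.

```python
# Hypothetical EMPLOYEE instance (values made up for illustration).
EMPLOYEE = [
    {"SSN": "123456789", "ENAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": "333445555", "ENAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": "999887777", "ENAME": "Smith", "BDATE": "1968-07-19"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs (checked in this instance only)."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(projected)

def is_key(rows, attrs):
    """A key is a superkey that stops being one if any single attribute is removed."""
    if not is_superkey(rows, attrs):
        return False
    # Dropping one attribute at a time is enough: supersets of superkeys are
    # superkeys, so if no one-smaller subset is a superkey, no smaller one is.
    return all(not is_superkey(rows, [a for a in attrs if a != drop])
               for drop in attrs)

print(is_key(EMPLOYEE, ["SSN"]))                # True
print(is_superkey(EMPLOYEE, ["SSN", "ENAME"]))  # True
print(is_key(EMPLOYEE, ["SSN", "ENAME"]))       # False: SSN alone suffices
print(is_superkey(EMPLOYEE, ["ENAME"]))         # False: two rows named Smith
```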
Keys • If a relation schema has more than one “minimal” key, each is called a candidate key.
Keys • one of the candidate keys is designated to be the primary key. • Each relation schema must have a primary key. • For example, {SSN} is the only candidate key for EMPLOYEE, so it is also the primary key.
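R(A, B, C, D, E) • FD1. A → C • FD2. BC → D • FD3. E → AB • Computing {A}+: start with result = {A} • By FD1 (A → C): A ⊆ result, so add C; result = {A, C} • By FD2 (BC → D): BC ⊄ result, so nothing is added; result = {A, C} • By FD3 (E → AB): E ⊄ result, so nothing is added; result = {A, C} • A further pass adds nothing, so {A}+ = {A, C}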
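The closure computation above is a fixed-point loop: keep applying any FD whose left-hand side is already in the result until a full pass adds nothing. A minimal Python sketch, assuming single-letter attribute names so that an FD such as BC → D can be written as the pair ("BC", "D"); the function name `closure` is an illustrative choice.

```python
def closure(attrs, fds):
    """Compute the attribute closure attrs+ under fds.

    fds is a list of (lhs, rhs) pairs of single-letter attribute strings,
    e.g. ("BC", "D") stands for BC -> D.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in result, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [("A", "C"), ("BC", "D"), ("E", "AB")]  # FD1, FD2, FD3
print(sorted(closure("A", FDS)))  # ['A', 'C']
print(sorted(closure("E", FDS)))  # ['A', 'B', 'C', 'D', 'E']
```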
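Similarly, {B}+ = {B} • {C}+ = {C} • {D}+ = {D} • {E}+ = {A, B, C, D, E}, so E is a candidate key • For the pairs and triples without E: {AB}+ = {A, B, C, D} • {AC}+ = {A, C} • {AD}+ = {A, C, D} • {BC}+ = {B, C, D} • {BD}+ = {B, D} • {CD}+ = {C, D} • {ABC}+ = {A, B, C, D} • {ABD}+ = {A, B, C, D} • {BCD}+ = {B, C, D} • {ACD}+ = {A, C, D} • No FD has E on its right-hand side, so every key must contain E; since {E} alone is already a key, it is the only candidate key.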
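Checking every subset by hand does not scale, but for a five-attribute schema a brute-force search is fine. The sketch below assumes single-letter attribute names, with FD1 to FD3 encoded as ("A", "C"), ("BC", "D"), ("E", "AB"); the helper names are hypothetical. It enumerates attribute sets by increasing size, keeps those whose closure is all of R, and skips any set that contains a key already found, so only minimal keys survive.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds are (lhs, rhs) pairs of single-letter strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) == set(schema):
                keys.append(s)
    return keys

FDS = [("A", "C"), ("BC", "D"), ("E", "AB")]
print(candidate_keys("ABCDE", FDS))  # [{'E'}]
```

The search is exponential in the number of attributes, which is acceptable for classroom-sized schemas only.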
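What is the highest normal form of this table? R(A, B, C, D, E) • FD1. A → C • FD2. BC → D • FD3. E → AB • Answer: {E} is the only candidate key of R • The non-prime attributes are A, B, C, D • R is in 2NF: the key is a single attribute, so no partial dependency is possible • R is not in 3NF: by FD1, A → C with A non-prime, so E → A → C is a transitive dependency (FD2, BC → D, is another) • Thus R is in 2NF but not 3NF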
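The answer can be checked mechanically. A sketch under the same single-letter encoding (helper names are hypothetical): 2NF fails if a proper part of some candidate key determines a non-prime attribute; 3NF fails if some given FD X → A has X not a superkey and A non-prime. For 3NF it is sufficient to test the FDs as given rather than every implied FD, but the candidate keys still come from an exhaustive search.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds are (lhs, rhs) pairs of single-letter strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(schema):
                keys.append(s)
    return keys

def is_2nf(schema, fds):
    keys = candidate_keys(schema, fds)
    nonprime = set(schema) - set().union(*keys)
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty parts of the key
            for part in combinations(sorted(key), size):
                if closure(part, fds) & nonprime:
                    return False  # partial dependency on part of a key
    return True

def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == set(schema)
        for a in set(rhs) - set(lhs):  # skip the trivial part of the FD
            if not lhs_is_superkey and a not in prime:
                return False  # non-superkey determinant -> non-prime attribute
    return True

FDS = [("A", "C"), ("BC", "D"), ("E", "AB")]
print(is_2nf("ABCDE", FDS))  # True: the only key, E, has no proper parts
print(is_3nf("ABCDE", FDS))  # False: A -> C, A not a superkey, C non-prime
```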
What is Normalization? • The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. By following the principles of normalization, we can achieve a design that is highly flexible, allowing the model to be extended when needed to account for new attributes, entity sets, and relationships.
Normal Forms • A relation is in a specific normal form if it satisfies the set of requirements or constraints for that form. The normal forms are nested: each satisfies the constraints of the previous one, but is a "better" form because it eliminates flaws found in the previous one.
1NF • a relation is in first normal form if it contains no multivalued attributes • remove repeating groups to a new table as already demonstrated, “carrying” the PK as an FK
First Normal Form (1NF) • the domains of attributes must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
First Normal Form (1NF) • example: DEPARTMENT
DNAME           DNUMBER  DMGRSSN    DLOCATIONS
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  {Stafford}
Headquarters    1        888665555  {Houston}
• If the domain of DLOCATIONS contains atomic values but some tuples can hold a set of these values, then DNUMBER does not functionally determine DLOCATIONS (DNUMBER ↛ DLOCATIONS). • If instead the domain of DLOCATIONS contains sets of values, it is non-atomic. • Either way, DEPARTMENT is not in 1NF.
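A minimal sketch of the 1NF fix (“remove repeating groups to a new table, carrying the PK as an FK”) applied to the DEPARTMENT example: DLOCATIONS moves to its own relation with DNUMBER carried along as a foreign key. The in-memory representation and the names `to_1nf`, `dept_locations`, and `DLOCATION` are assumptions for illustration; the key of the new relation would be (DNUMBER, DLOCATION).

```python
# In-memory copy of the DEPARTMENT example; DLOCATIONS holds a set, violating 1NF.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": {"Stafford"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRSSN": "888665555",
     "DLOCATIONS": {"Houston"}},
]

def to_1nf(departments):
    """Split out the multivalued DLOCATIONS, carrying DNUMBER as a foreign key."""
    dept = [{k: v for k, v in d.items() if k != "DLOCATIONS"} for d in departments]
    dept_locations = [
        {"DNUMBER": d["DNUMBER"], "DLOCATION": loc}  # one atomic value per row
        for d in departments
        for loc in sorted(d["DLOCATIONS"])  # sorted only for a stable output order
    ]
    return dept, dept_locations

dept, dept_locations = to_1nf(DEPARTMENT)
for row in dept_locations:
    print(row["DNUMBER"], row["DLOCATION"])
```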
Our Example in 1NF • (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) • Key: (PROJ_NUM, EMP_NUM) • Given PROJ_NUM, PROJ_NAME is determined: PROJ_NUM → PROJ_NAME • Given EMP_NUM, EMP_NAME, JOB_CLASS, and CHG_HOUR are determined: EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR
2NF • a relation is in second normal form if it is in first normal form AND every nonkey attribute is fully functionally dependent on the primary key • i.e. remove partial functional dependencies, so that no nonkey attribute depends on just part of the key
Second Normal Form (2NF) • it is based on the concept of full functional dependency. • A functional dependency X → Y is a full functional dependency if, for any attribute A ∈ X, (X − {A}) ↛ Y, i.e. removing any attribute from X means the dependency no longer holds.
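The definition of full functional dependency translates directly into code: X → Y is full if it holds and, for every A ∈ X, the closure of X − {A} no longer contains Y. The sketch below uses the FDs of the 1NF project example, plus (PROJ_NUM, EMP_NUM) → HOURS, which holds because (PROJ_NUM, EMP_NUM) is the key; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) tuples of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """X -> Y is full if it holds and no X - {A} still determines Y."""
    if not set(rhs) <= closure(lhs, fds):
        return False  # X -> Y does not hold at all
    return all(not set(rhs) <= closure(set(lhs) - {a}, fds) for a in lhs)

# FDs of the 1NF project example; (PROJ_NUM, EMP_NUM) is the key.
FDS = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
]
KEY = ("PROJ_NUM", "EMP_NUM")
print(is_full_fd(KEY, ("HOURS",), FDS))      # True
print(is_full_fd(KEY, ("PROJ_NAME",), FDS))  # False: PROJ_NUM alone determines it
```

A partial dependency like KEY → PROJ_NAME is exactly what 2NF removes.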
Second Normal Form • A relation is in second normal form (2NF) if and only if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key.
Second Normal Form • A table is in second normal form (2NF) if: • It is in 1NF • It includes no partial dependencies. No attribute is dependent on only a portion of the primary key.
2NF • a relation is in 2NF if it is in 1NF and any one of these is true: • the PK consists of only 1 attribute • all attributes are part of the PK (no nonkey attributes) • every nonkey attribute is functionally dependent on the whole PK
2NF (Example) • R(A, B, C, D) with 2 candidate keys • R with key {A, B} is NOT in 2NF • R with key {A, C} is NOT in 2NF
Second Normal Form (2NF) • {SSN, PNUMBER} → HOURS is a full dependency: neither SSN → HOURS nor PNUMBER → HOURS holds.
Second Normal Form (2NF) • The functional dependencies fd1, fd2, fd3 lead to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, EP3, each of which is in 2NF.
1NF → 2NF • EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed) is decomposed into • EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary) • and • EMP_COURSE (Emp_ID, Course_Title, Date_Completed) • EMPLOYEE1 satisfies condition 1 (the PK is a single attribute) • EMP_COURSE satisfies condition 3 (every nonkey attribute depends on the whole PK)
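The EMPLOYEE2 split can be sketched as a small procedure: give each nonkey attribute to the smallest part of the primary key that determines it, and form one relation per determinant. This assumes Emp_ID → Name, Dept_Name, Salary and (Emp_ID, Course_Title) → Date_Completed, which is what the decomposition into EMPLOYEE1 and EMP_COURSE implies; the function name `decompose_2nf` is hypothetical, and the sketch handles a single primary key only and does not address transitive dependencies.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) tuples of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def decompose_2nf(attrs, key, fds):
    """Group each nonkey attribute with the smallest part of the key that determines it."""
    groups = {}
    for a in attrs:
        if a in key:
            continue
        for size in range(1, len(key) + 1):  # smallest key parts first
            det = next((part for part in combinations(key, size)
                        if a in closure(part, fds)), None)
            if det is not None:
                groups.setdefault(det, []).append(a)
                break
    groups.setdefault(tuple(key), [])  # keep the full key so the split stays lossless
    return [list(det) + deps for det, deps in groups.items()]

# Assumed FDs for EMPLOYEE2 (consistent with the EMPLOYEE1 / EMP_COURSE split).
EMPLOYEE2 = ["Emp_ID", "Course_Title", "Name", "Dept_Name", "Salary", "Date_Completed"]
KEY = ("Emp_ID", "Course_Title")
FDS = [
    (("Emp_ID",), ("Name", "Dept_Name", "Salary")),
    (("Emp_ID", "Course_Title"), ("Date_Completed",)),
]
for relation in decompose_2nf(EMPLOYEE2, KEY, FDS):
    print(relation)
# ['Emp_ID', 'Name', 'Dept_Name', 'Salary']
# ['Emp_ID', 'Course_Title', 'Date_Completed']
```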
3NF • a relation is in third normal form if it is in 2NF AND no transitive dependencies exist • a transitive dependency is a functional dependency between nonkey attributes (key → nonkey attribute → another nonkey attribute)
… transitive dependency • causes the same problems as partial dependencies • insertion anomaly (a salesperson cannot be recorded without a customer) • deletion anomaly (if a salesperson is assigned to only 1 customer and that customer is deleted, we lose the salesperson!) • modification (update) anomaly (reassigning a salesperson to a new region means updating every customer row for that salesperson)
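To make the update and deletion anomalies concrete, here is a sketch with hypothetical CUSTOMER rows in which Cust_ID → Salesperson and Salesperson → Region (the attribute names, data, and helper name are invented for illustration). In the unnormalized form, reassigning a salesperson's region touches every customer row for that salesperson, and deleting a salesperson's only customer loses the region; after moving Salesperson → Region into its own relation, both problems disappear.

```python
# Hypothetical CUSTOMER rows with a transitive dependency:
# Cust_ID -> Salesperson and Salesperson -> Region.
customers = [
    {"Cust_ID": 1, "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 2, "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 3, "Salesperson": "Hicks", "Region": "West"},
]

# 3NF version: Salesperson -> Region moves to its own relation.
customer_3nf = [{"Cust_ID": r["Cust_ID"], "Salesperson": r["Salesperson"]}
                for r in customers]
salesperson_region = {r["Salesperson"]: r["Region"] for r in customers}

def reassign_unnormalized(rows, salesperson, region):
    """Update anomaly: the region is repeated in every row for that salesperson."""
    touched = 0
    for r in rows:
        if r["Salesperson"] == salesperson:
            r["Region"] = region
            touched += 1
    return touched

changed = reassign_unnormalized(customers, "Smith", "East")
print(changed)                        # 2 rows had to change
salesperson_region["Smith"] = "East"  # in 3NF, exactly one fact changes

# Deletion anomaly: Hicks has only one customer; deleting it loses Hicks's region.
customers = [r for r in customers if r["Cust_ID"] != 3]
customer_3nf = [r for r in customer_3nf if r["Cust_ID"] != 3]
print(any(r["Salesperson"] == "Hicks" for r in customers))  # False: region lost
print(salesperson_region["Hicks"])                           # West: still recorded
```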