540 likes | 697 Views
CSE-503 :: Database Management System. Normalization. 2. National Institute of Technology Agartala Aug-Dec,2010. ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala. Outline. Functional dependencies Normalization. Normalization – use Functional Dependencies.
E N D
CSE-503 :: Database Management System Normalization 2 National Institute of Technology Agartala Aug-Dec,2010 ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala
Outline • Functional dependencies • Normalization
Normalization – use Functional Dependencies Product Support Coverage ProductNumber,Date -> Time, EmployeeID, PhoneNumber EmployeeID, Date, Time -> ProductNumber PhoneNumber, Date, Time -> EmployeeID, ProductNumber EmployeeID, Date -> PhoneNumber Date -> PayRate PayRate -> Withholding
ProductNumber Date Time EmployeeID PhoneNumber PayRate Withholding Illustrate FDs Product Coverage Support ProductNumber,Date -> Time, EmployeeID, PhoneNumber EmployeeID, Date, Time -> ProductNumber PhoneNumber, Date, Time -> EmployeeID, ProductNumber EmployeeID, Date -> PhoneNumber Date -> PayRate PayRate -> Withholding
Functional Dependencies • A functional dependency is a constraint between two sets of attributes in a relational database. • If X and Y are two sets of attributes in the same relation T, then X Y means that X functionally determines Y so that • the values of the attributes in X uniquely determine the values of the attributes in Y • for any two tuples t1 and t2 in T, t1[X] = t2[X] implies that t1[Y] = t2[Y] • if two tuples in T agree in their X column(s), then their Y column(s) should also be the same.
FD and Keys • Key constraint is a special kind of functional dependency • K is a superkey for relation schema R if and only if K → R • K is a candidate key for R if and only if • K → R, and • for no α ⊂ K, α → R • Key is on LHS, all attributes are on RHS • ROLL ROLL, Name, Address • For a key, no two rows share the same values, thus by default, when ever a tuple agrees on LHS it agrees on the RHS.
WELCOME 2ND DAY OF NORMALIZATION
Armstrong’s Axioms of FDs • Reflexivity: If X Y then X Y (trivial FD) • Name, Address Name • Augmentation: If X Y then X Z YZ • If Town Zip then Town, Name Zip, Name • Transitivity: If X Y and Y Z then X Z But keep in mind 2 other rules that are useful: • Union: If X → Y and X → Z, then X → YZ • Decomposition: • If X → YZ, then X → Y and X → Z
Soundness • Axioms are sound: • If an expression f: X Ycan be derived from a set of FDs F using the axioms, then f is a FD. We say F entailsf. Completeness • Axioms are complete: • If F entails f , then f can be derived from F using the axioms • As a result, to determine if F entails f, use the axioms in all possible ways to generate F+(the set of possible FD’s is finite so this can be done) and see if f is in F+
Functional Dependency Closure (F+) • Set F of Functional Dependencies (given) Relation: • EmpProj: SSN, Pnumber, Hours, Ename, Pname, Plocation • FDs F: • {SSN → Ename} • Pnumber →{Pname, Plocation} • {SSN, Pnumber} →Hours} • Closures: • {SSN}+ = {SSN, Ename} • {Pnumber}+ = {Pnumber, Pname, Plocation} • F+ • {SSN, Pnumber}+ = {SSN, Pnumber, Ename, Pname, Plocation, Hours}
Generating F+ F AB C AB BCD A D AB BD AB BCDE AB CDE D E BCD BCDE union decomp aug trans aug Thus, AB BD, AB BCD, AB BCDE, and AB CDE are all elements of F+
Attribute Closure • Calculating attribute closure is a more efficient way of checking entailment • The attribute closure of a set of attributes, X, with respect to a set of functional dependencies, F, (denoted X+F) is the set of all attributes, A, such that X A • X +F1 is not necessarily the same asX +F2 • Checking entailment: Given a set of FDs, F, then X Y if and only if X+F Y (by union & decomposition rule)
Computation of Attribute Closure Example Problem: Compute the attribute closure of AB with respect to the set of FDs : AB C (a) A D (b) D E (c) AC B (d) Solution: Initially closure = {AB} Using (a) closure = {ABC} Using (b) closure = {ABCD} Using (c) closure = {ABCDE}
Computation of Attribute Closure X+F closure := X; --since X X+F repeat old := closure; if there is an FD Z V in F such that Z closure thenclosure := closure V untilold = closure -- If T closure then X T is entailed by F
Example - Computing Attribute Closure X XF+ A {A, D, E} AB {A, B, C, D, E} (Hence AB is a key) B {B} D {D, E} F: AB C A D D E AC B Is AB E a FD? Yes Is D C a FD? No Result: XF+ allows us to determine FDs entailed by F of the form X Y
Normal Forms • Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies) • The two commonly used normal forms are • third normal form (3NF) and • Boyce-Codd normal form (BCNF)
Levels of Normalization 1 NF 2 NF 3 NF BCNF
Normal Forms • Considerations: • Relational design by analysis • Normal forms are based on functional dependencies (FDs) • Intuitive, perhaps, but identifying a strictly controlled procedure allows a programmatic process • Should consider 2 additional properties • Lossless join (nonadditive join property) • required • Dependency preservation property • use when possible
Lossless Joins and Dependency Preservation • If relation R and FDs F hold over R, then decomposing R into R1 and R2 is lossless if the closure of F contains either: • FD R1 ∩ R2 -> R1 or • FD R1 ∩ R2 -> R2 • If the closure of the attributes in R1, independent of those attributes in R2, unioned with the closure of attributes of R2, independent of those attributes in R1, are equivalent to the closure F, then dependency is preserved
First Normal Form (1NF) • A relational schema R is in first normal form if the domains of all attributes of R are atomic (Atomicity) • Domain is atomic if its elements are considered to be indivisible units • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • Requirements: • 1NF disallows multivalued attributes, or composite attributes, or their combinations, by requiring only single atomic (indivisible) values in the domain of an attribute
Business Rules Example 1NF • Staffing hours (S) are on a per project activity (activities within projects) basis - AN • Managers (PM) and their departments (D) are assigned to projects (PN) • A department is assigned to a project managers • A project manager is assigned to projects
Prime Attribute: An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R Non Prime Attribute: An attribute of relation schema R is called a non prime attribute of R if it is not a member of any candidate key.
Second Normal Form (2NF) • 2NF based on the concept of full functional dependency. • A functional dependency is full functional dependency if removal of any attribute A from X means that the dependency does not hold anymore; i.e. for any attribute AX, (X-{A}) does not functionally determine Y. • If FD X→Y, removal of A eliminates the FD • Partial functional dependency : • For any attribute AX, (X-{A})→Y • If A can be removed and FD remains, X→Y is a partial functional dependency (a violation of 2NF)
Partial functional dependency (a violation of 2NF) {SSN,PNUMBERHOURS is a full dependency. {SSN,PNUMBER} ENAME is partial because ENO ENAME holds
Second Normal Form (2NF) • Definition of 2NF: • A relational schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
2NF 1NF Staffing is on a per project activity ( and activities within projects) basis Managers and their departments are assigned to projects {PN,AN→h} PN →{PM,D}
WELCOME 3rd DAY OF NORMALIZATION
Third Normal Form (3NF) • Definition of 3NF: • A relational schema R is in 3NF if it satisfies 2NFand no nonprime attribute of R is transitively dependent on the primary key. • 3NF is based on the concept of transitive dependency of nonprime attributes on another nonprime attribute. {X→Y,Y→Z}⊨X→Z • Transitive dependencies - is a 3NF violations • LHS of FD should be superkey, or RHS is a prime attribute.
3NF A department is assigned to a project manager A project manager is assigned to projects PN→PM →D 2NF
3 NF BCNF Boyce-Codd Normal Form (BCNF) • BCNF is a simpler form of 3NF that is more restrictive. • Every relationship in BCNF is also in 3NF; however 3NF is not necessarily in BCNF. • Definition of BCNF: • A relation schema R is in BCNF if whenever a nontrivial functional dependencies X→A holds in R, then X is a superkey of R. • LHS of a FD should be superkey • Note: • Each attribute is identified by nothing but the key • Sometimes too restrictive, may not be dependency-preserving with regard to closure
BCNF KEY A1 A1 A1 …
New example: LOT 1NF 2NF
New example: LOT 3NF
Let there exist a new FD5; • FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A • FD5 satisfy 3NF in LOTS1A because COUNTRY_NAME is a prime attribute, but this condition does not exist in definition of BCNF • So decompose LOTS1A in to LOTS1AX and LOTS1AY FD5
Other dependences and normal forms (not commonly used) • Multivalued dependences (4NF) • If X and Y are subsets of attributes of relation schema R: • MVD X↠Y independent of the values in other attributes • R is in 4NF if for every MVD X↠Y that holds over R, one of the following is true: • YX or XY = R (trivial MVD), or • X is a superkey
MVD An employee can be assigned to any project and, within those projects, to any activities, but the assignments are consistent for that employee A project or activity can have any number of employees assigned to it EN→P EN→A