230 likes | 495 Views
Normalization. Why Normalization?. To remove potential redundancy in design Redundancy causes several anomalies : insert, delete and update Redundancy wastes storage, and often slows down query processing Examples to follow next. What is Normalization?.
E N D
Normalization cs3431
Why Normalization? • To remove potential redundancy in design • Redundancy causes several anomalies: insert, delete and update • Redundancy wastes storage, and often slows down query processing • Examples to follow next. cs3431
What is Normalization? • Normalization uses concept of dependencies • Functional Dependencies • Technique used: Decomposition • Break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D) cs3431
Insert Anomaly Student Question: Could we insert any professor ? Note: We cannot insert a professor who has no students. Insert Anomaly: We are not able to insert “valid” value/(s) cs3431
Delete Anomaly Student Question: Can we delete a student and keep a professor info ? Note: We cannot delete a student that is the only student of a professor. Delete Anomaly: We are not able to perform a delete without losing some “valid” information. Note: In both cases, minimum cardinality of Professor in the corresponding ER schema is 0 cs3431
Update Anomaly Student Question: Can we simply update a professor’s name ? Note: To update the name of a professor, we have to update in multiple tuples. Update Anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy. Note the maximum cardinality of Professor in the corresponding ER schema is * cs3431
Normalization • Need a method to find such “dependencies” exist between attributes • Functional dependencies • Need a method to remove such harmful dependencies, when they exist • Relational decomposition cs3431
Keys : Revisited • A key for a relation R (a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R. • A key is minimal: no subset of K is a key. • A superkey need not be minimal • A prime attribute: an attribute that is part of a key cs3431
Functional Dependencies (FDs) Student Suppose we have the FD: sName address That is,there is a function from sName to address Meaning: For any two rows in the Student relation with the same value for sName, the value for address must be same. cs3431
FD and Keys Student Primary Key : <sNumber> FD : sName address • Questions : • Does a key implies functional dependencies? Which ones ? • Does a functional dependency imply keys ? Which ones ? Observation : Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R. cs3431
Properties of FDs • Consider A, B, C, Z are sets of attributes • Reflexive (trivial FD): if A B, then A B • Transitive: if A B, and B C, then A C • Augmentation: if A B, then AZ BZ • Union: if A B, A C, then A BC • Decomposition: if A BC, then A B, A C Note: Sound and completeinference rules for FDs cs3431
Inferring FDs • Suppose we have : • a relation R (A, B, C) and • functional dependencies A B, B C, C A • Questions : • What is a key for R? • Should we split R into multiple relations? • We can infer A ABC, B ABC, C ABC. • Hence A, B, C are all keys. cs3431
Reasoning About FDs • An FD f is implied bya set of FDs F if f holds whenever all FDs in F hold. • = closure of F is the set of all FDs that are implied by F. • Computing closure of a set of FDs can be expensive. • Size of closure is exponential in # attrs! cs3431
Reasoning About FDs • Instead of computing closure F+ of a set of FDs • Too expensive • Typically, we just need to know if a given FD X Y is in closure of a set of FDs F. • Algorithm for efficient check: • Compute attribute closureof X (denoted ) wrt F: • Set of all attributes A such that X A is in • There is a linear time algorithm to compute this. • Check if Y is in X+ . If yes, then X A in F+. cs3431
Reasoning About FDs (Contd.) • Does F = {A B, B C, C D E } imply A E? • Question : • i.e, is A E in the functional set closure ? • Equivalent Question : • Is E in the attribute closure ? cs3431
Algorithm for Inference of FDs • Computing the closure of set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+ • Let X = {A1, A2, …, An} • If there exists a FD : B1, B2, …, Bm C, such that every Bi X, then X = X C • Repeat step 2 until no more attributes can be added. • {A1, A2, …, An}+ = X cs3431
Inferring FDs: Example 1 • Consider R (A, B, C, D, E) • with FDs A B, B C, CD E • Does A E? (Is A E in F+ ?) • Rephrase as : Is E in A+ ? • Let us compute {A}+ • {A}+ = {A, B, C} • Therefore, A E is false cs3431
Inferring FDs: Example 2 • Given R (A, B, C), and • FDs : A B, B C, C A • What are possible keys for R ? • Compute the closure of attributes: • {A}+ = {A, B, C} • {B}+= {A, B, C} • {C}+= {A, B, C} • So keys for R are <A>, <B>, <C> cs3431
Decomposing Relations StudentProf FDs: pNumber pName Student Professor cs3431
Decomposition Decomposition: Must be Lossless (no spurious tuples) cs3431
Decomposition: Lossless Join Property Student Professor StudentProf Spurious Tuples cs3431
Normalization • How do I decide if I need to further decompose? • Once decided, what is the algorithm for decomposing? cs3431