370 likes | 388 Views
Schema Refinement and Normal Forms Chapter 19. The Evils of Redundancy. Redundancy causes several problems associated with relational schemas: Redundant storage Insert anomalies Delete anomalies Update anomalies What are the problems in the following schema?
E N D
The Evils of Redundancy • Redundancycauses several problems associated with relational schemas: • Redundant storage • Insert anomalies • Delete anomalies • Update anomalies • What are the problems in the following schema? Hourly_Emps (ssn, name, lot, rating, hourly_wages, hours_worked)
Example • Hourly_Emps (ssn, name, lot, rating, hourly_wages, hours_worked) • Abbreviate an attribute name to a single letter • Refer to a relation schema by a string of letters, one per attribute. • In the example, hourly_wages is determined by rating. This IC is an example of a functional dependency (RW). • What does it lead to?
Example • Problems: • Redundant storage • Update anomaly: Can we change W in just the 1st tuple of SNLRWH? • Solutions to the anomalies?
Example • Problems: • Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating? • Solution? • Null values • What if the s is unknow?
Example • Problems: • Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5! • Solutions?
Fighting the Evils of Redundancy • Redundancyis at the root of evil - true or not? • Integrity constraints, in particularfunctional dependencies, can be used to identify schemas with such problems and to suggest refinements. • RW • Decomposition(replace SNLRWH with SNLRH and RW) Wages Hourly_Emps2
Fighting the Evils of Redundancy • RW • Decomposition(replace SNLRWH with SNLRH and RW) • Do we solve all this problems? • Redundant storage • Insert anomalies • Delete anomalies • Update anomalies Hourly_Emps2 Wages
Fighting the Evils of Redundancy • Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD). • Problems related to decomposition: • Is there reason to decompose a relation? • Normal forms help us to decide whether or not to decompose a relation in further. • How? • What problems (if any) does the decomposition cause? • Lossless join decomposition • Dependency-preservation decomposition
Properties of decomposition • Lossless join decomposition • Enable us to recover any instance of the decomposed relation from corresponding instances of the smaller relations. • Dependency-preservation decomposition • Enable us to enforce any constraints on the original relation by simple enforcing some constraints on each of smaller relations. • Key constrains Wages Hourly_Emps2
Good? • List N, W, H • How?
Good DB designer • Normal forms and what problems they do or do not alleviate? • Decomposition and what problems with decompositions. • Questions: • Is a relation in a given normal form? • Is a decomposition dependency preserving?
Functional Dependencies (FDs) • A functional dependencyX Y holds over relation R if, for every allowable instance r of R: • t1 r, t2 r, t1.X =t2.X implies t1.Y =t2.Y • i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
Functional Dependencies (FDs) • An FD is a statement about all allowable relations. • Must be identified based on semantics of application. • Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R! • K is a candidate key for R means that K R • However, KR does not require K to be minimal!
FDs- example FD: ABC Question: Do all tuples in the relation satisfy the Functional Dependents ABC Can we add a tuple <a1,b1,c2,d1> to the relation? Why?
exercise • List all the functional dependencies that this relation instance satisfies. • Z → Y, X → Y, and XZ → Y
Example: Constraints on Entity Set • Consider relation obtained from Hourly_Emps: • Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) • Notation: We will denote this relation schema by listing the attributes: SNLRWH • This is really the set of attributes {S,N,L,R,W,H}. • Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for SNLRWH) • Some FDs on Hourly_Emps: • ssn is the key: SSNLRWH • rating determines hrly_wages: RW
Wages Example (Contd.) Hourly_Emps2 • Problems due to RW : • Update anomaly: Can we change W in just the 1st tuple of SNLRWH? • Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating? • Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5! Will 2 smaller tables be better?
Reasoning About FDs • Given some FDs, we can usually infer additional FDs: • For example: • Workers(ssn,name,lot,did,since) • Ssndid holds, • did lot holds, implies ssnlot • An FD f is implied bya set of FDs F if f holds whenever all FDs in F hold. • F+ = closure of F is the set of all FDs that are implied by a given set F of FDs.
Closure of a set of FDs • Infer or compute the closure of a given set of F of FDs • Following three rules (Armstrong’s Axioms) • Armstrong’s Axioms (X, Y, Z are sets of attributes): • Reflexivity: If XY, then YX • Augmentation: If XY, then XZYZ for any Z • Transitivity: If XY and YZ, then XZ • These are sound and completeinference rules for FDs!
Soundness and Completeness • Completeness: given F, the rules allows us to determine all dependencies in • Soundness: we cannot generate FDs not in • Addition rules: • Union: if XY, XZ then XYZ • Decomposition: if XYZ, then XY and XZ
Soundness and Completeness • R=(A,B,C) and F={AB, BC} • Reflexivity: if Y is subset of X, then XY • Transitivity : if XY and YZ then XZ • AC • Augmentation : if XY then XZYZ for any Z • ACBC • ABBC • ABAC • ABC F+ ={A B, BC, AC, ACBC,ABAC. …}
Reasoning About FDs • FD is a statement about the world as we understand it • Ssndid, did lot implies ssnlot • Is it correct to state that “Since X functionally determines Y, if we know X, we know Y.” Or “if XY, then X identifies Y”
Reasoning About FDs (Contd.) • Couple of additional rules (that follow from AA): • Union: If X Y and X Z, then XYZ • Decomposition: If XYZ, then XY and XZ • Example: Contracts(cid,sid,jid,did,pid,qty,value), and: • C is the key: CCSJDPQV • Project purchases each part using single contract: JPC • Dept purchases at most one part from a supplier: SDP
Reasoning About FDs (Contd.) • C is the key: CCSJDPQV • Project purchases each part using single contract: JPC • Dept purchases at most one part from a supplier: SDP • JPC, CCSJDPQV imply JPCSJDPQV • SDP implies SDJJP • SDJJP, JPCSJDPQV imply SDJCSJDPQV
Exercise • Consider a relation R with five attributes ABCDE. You are given the following dependencies: A→B, BC→E, and ED→A. • List all keys for R. • CDE, ACD, BCD
Normal Forms • Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed! • If a relation is in a certain normal form(BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposing the relation will help.
Normalization • Often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. • Four most commonly used normal forms are first (1NF), second (2NF), third (3NF) and Boyce-Codd (BCNF) normal forms. • Based on functional dependencies among the attributes of a relation. • A relation can be normalized to a specific form to prevent the possible occurrence of update anomalies
First Norm Forms(1NF) • There are no repeating or duplication fields. • Each cell contains only a single value. • Each record is unique. • Identified by primary key
Second Normal Forms (2NF) • All non-key fields depend on all components of the primary key. • Guaranteed when primary key is a single field.
Third Normal Forms (3NF) • No non-key field depends on another. • All non-key fields depend only on primary key.