550 likes | 1.03k Views
CS157B. Lecture 3 Functional Dependency and Normal Forms. Prof. Sin-Min Lee Department of Computer Science. Database Design Process. Application 1. Application 2. Application 3. Application 4. External Model. External Model. External Model. External Model. Application 1.
E N D
CS157B Lecture 3 Functional Dependency and Normal Forms Prof. Sin-Min Lee Department of Computer Science
Database Design Process Application 1 Application 2 Application 3 Application 4 External Model External Model External Model External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Conceptual Model Logical Model Internal Model Application 3 Conceptual requirements Application 4 Conceptual requirements
Relational Database Model Relations Source: ESRI Advanced ArcInfo
Attribute Relationships Functional Dependency: refers to the relationships between attributes within a relation. If the value of attribute A determines the value of attribute B, then attribute B is functionally dependent upon attribute A.
Functional Dependencies X -> Y means: • X functionally determines Y • Y depends on X • Values of Y component depend on, determined by values of X component
Functional Dependencies Given t1 and t2: • if t1[X] = t2 [X] then t1[Y] = t2 [Y] (1) • In other words if the values of X are equal, then Y value are equal • Values of X component uniquely (functionally) determine values of Y component iff (1)
Data Normalization • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. • The process of decomposing relations with anomalies to produce smaller, well-structured relations. • Primary Objective: Reduce Redundancy,Reduce nulls, • Improve “modify” activities: • insert, • update, • delete, • but not read • Price: degraded query, display, reporting
Normal Forms • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF)
Unnormalized Relations First normal form Functional dependencyof nonkey attributes on the primary key - Atomic values only Second normal form No transitive dependency between nonkey attributes Third normal form Boyce- Codd and Higher Full Functional dependencyof nonkey attributes on the primary key All determinants are candidate keys - Single multivalued dependency Normalization
Unnormalized Relations • First step in normalization is to convert the data into a two-dimensional table • In unnormalized relations data can repeat within a column
First Normal Form • To move to First Normal Form a relation must contain only atomic values at each row and column. • No repeating groups • A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.
Second Normal Form • A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key. • That is, every nonkey attribute needs the full primary key for unique identification
Third Normal Form • A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes • When one nonkey attribute can be determined with one or more nonkey attributes there is said to be a transitive functional dependency. • The side effect column in the Surgery table is determined by the drug administered • Side effect is transitively functionally dependent on drug so Surgery is not 3NF
Functional Dependency and Keys • Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. • Candidate Key: Each non-key field is functionally dependent on every candidate key.
Normalization – most used • Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF). • Based on functional dependencies among the attributes of a relation. • A relation can be normalized to a specific form to prevent possible occurrence of update anomalies.
First Normal Form • No multi-valued attributes. • Every attribute value is atomic. • Why are the following tables not in 1NF Employee (ssn, Name, Salary, Address, ListOfSkills) Department (Did, Dname, ssn)
Second Normal Form • 1NF and every non-key attribute is fully functionally dependent on the primary key. • Every non-key attribute must be defined by the entire key, not by only part of the key. • No partial functional dependencies. Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vechicle Table below, why is the table not in 2NF ? Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)
Third Normal Form & BCNF • 2NF and no transitive dependencies (functional dependency between non-key attributes = BCNF) Why are the following tables not in 3NF or BCNF ? • Why is Employee [ssn, name, salary, did, dname] • Customer
3NF & BCNF • It is very rare for a Table to be in 3NF and not be in BCNF (violation of BCNF). • Given a Relation R with attributes A, B and C where A and B are together the composite PK, IF A, B -> C and C -> B THEN R is in 3NF and is not in BCNF Example: Student, course -> Instructor Instructor -> Course
Steps in Normalization • 1NF: a table, without multivalued attributes • if not, then decompose • 2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key • if not, then decompose • 3NF: 2NF and no transitive dependencies • if not, then decompose • GENERAL: • Each table should describe a single theme • Modification anomalies are minimized Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY
Consider the following scheme from an airline database system: • ( P (pilot) , F (flight# ), D (date), T (scheduled time to depart) ) • We have the following FD's : • F ----> T PDT ----> F FD ----> P • Provide some superkeys: • PDT is a superkey, and FD is a superkey. • Is PDT a candidate key? • PD is not a superkey, nor is DT, nor is PT. • So, PDT is a candidate key. • FD is also a candidate key, since neither F or D are superkeys. EXAMPLE - OBTAIN CANDIDATE KEYS
CLOSURE OF A SET OF FD'S • If F is a set of functional dependencies for a relation R, the set of all functional dependencies that can be derived from F, denoted by F+, is called the CLOSURE of F. • We can use Armstrong's axioms, and the 3 derived rules, to compute the closure of F, F+.
WORKING TO GET THE CLOSURE F+ • GIVEN: scheme (A, B, C, G, H, I) • GIVEN: FD set (A--->B, A--->C, CG--->H, CG--->I, B--->H) • Some members of F+ are • A--->H {Transitivity Rule applied to A--->B and B--->H) • CG--->HI {Union Rule applied to CG--->H and CG--->I} • AG--->I {By Augmentation Rule, AG--->CG; then Transitivity}
THE CLOSURE OF A SET OF ATTRIBUTES • GIVEN: FD set F and a given attribute A (or set of attributes A) • FIND : The set of attributes functionally dependent on A, called the closure of A, and denoted by A+ • IMPORTANT USE FOR THIS: To determine if A is a superkey, we compute A+, the set of attributes functionally dependent on A. If A+ consists of ALL the attributes in the relation, then A is a superkey • HOW DO WE FIND A+? The following algorithm does the trick!
ALGORITHM TO FIND THE CLOSURE OF ATTRIBUTE A, DENOTED BY A+ • result := A; • while { result changes } • for each functional dependency B--->C • begin • if B is contained in result, then result := result U C ' end • endwhile • A+ := result
EXAMPLE TO FIND THE CLOSURE A+ OF AN ATTRIBUTE A • GIVEN: Relation R with attributes W, X, Y, Z and FD's W ---> Z YZ ---> X WZ ---> Y • FIND : WZ+ • PSEUDO TRACE OF THE ALGORITHM: • result := WZ • from first 2 FD's, no change to "result" • from WZ ---> Y, since WZ is contained in result, we • get result := WZY • since YZ is contained in result, we get result := WZYX • Thus, every attribute in R is in WZ+, so WZ is a superkey!
Normalization • Normalization of data - method for analyzing schemas • Unsatisfactory schemas decomposed into smaller ones with desirable properties • Objectives of normalization • good relation schemas disallowing update anomalies
Formal framework • database normalized to any degree (1, 2, 3, 4, 5, etc.) • normalization is not done in isolation • need: • lossless join • dependency preservation • additional normal forms meet other desirable criteria
Normal Forms • 1st, 2nd, 3rd, BCNF consider only FD and key constraints • constraints must not be hard to understand or detect • need not normalize to highest form (e.g. for performance reasons)
1NF - 1st normal form • part of the formal definition of a relation • disallow multivalued attributes, composite attributes and their combination • In 1NF single (atomic, indivisible) values
Normalize into 1NF? • How to normalize nested relations into 1NF? • Remove nested relation attributes into new relation • propagate PK • combine PK and partial PK • recursively unnest - multilevel nesting • useful in converting hierarchical schemes into 1NF
Difficulties with 1NF • insert, delete, update • Determine if describe entity identified by PK? • If not, called non-full FDs • we need full FDs for good inserts, deletes, updates
Second Normal Form - 2NF • Uses the concepts of FDs, PKs and this definition: • An FD is a Full functional dependency if: given Y -> Z Removal of any attribute from Y means the FD does not hold any more
2NF • A relation schema R is in 2NF if: • Relation is in 1NF • Every non-prime attribute A in R is fully functionally dependent on the primary key Prime attribute - attribute that is a member of the primary key K • R can be decomposed into 2NF relations via the process of 2NF normalization • Remove partial dependencies • create new relations where partials are full
Simplifying Functional Dependencies through Normalization Normalization: the identification of functional dependencies and the modifications required to structurally change the database to remove undesirable dependencies