170 likes | 322 Views
Your name here. Improving Schemas and Normalization. What are redundancies and anomalies? What are functional dependencies and how are they related to schema quality? What is a superkey? What is a inference rule and how can we infer functional dependencies
E N D
Improving Schemas and Normalization • What are redundancies and anomalies? • What are functional dependencies and how are they related to schema quality? • What is a superkey? • What is a inference rule and how can we infer functional dependencies • How are keys determined by functional dependencies • How can we modify a schema to improve it? • What are normal forms and why are they important?
Redundancy and Anomalies in Relation Schemas • Anomalies occur when data is inconsistent • Redundancy of values is the source of anomalies • Update anomaly occurs when values are inconsistent • if title, genre, length or rating changed in any one or two of the green rows
Redundancy and Anomalies in Relation Schemas • Anomalies occur when data is inconsistent • Redundancy of values is the source of anomalies • Deletion anomaly caused by deletion of row with videoId1243 (pink) • Information about movie is deleted along with video • Insertion anomaly caused by last row (blue) • Length and rating are inconsistent with other rows
Functional Dependencies Between Attributes • A functional dependency is a strong connection between two or more attributes in a table. • one attribute is functionally dependent on another attribute when any two rows of the table that have the same value of the second attribute must have the same value for the first • Example: movieId determines title, genre, length, rating • Each row with movieId 123 has the same values for other attributes • FD2: movieId {title, genre, length, rating}
City, State, Zipcode Dependencies • FD4: zipcode {city, state} • FD5: {street, city, state} zipcode
Superkeys and Keys • A key constraint is a functional dependency • Example: accountId is key of Customer • FD6: accountId {lastName, firstName, street, city, state, zipcode} • A superkey is a set of attributes that determine the rest of the attributes of a schema • FD7: {accountId, lastName} (firstName, street, city, state, zipcode}
Using Functional Dependencies • Functional dependencies are used for • Determining keys • Finding sources of redundancy and hence trouble • Functional dependencies are declared • Designer defines FDs based on the semantics of the schemas • Additional dependencies can be found from those that are declared • Keys and redundancies are based on the full set of FDs • All declared FDs • FDs inferred by applying inference rules
Inferring Additional Functional Dependencies • Main inference rules • Rule 1: Reflexivity, a set of attributes X determines a subset Y of itself: • If X Y, then X Y. • Rule 2:Augmentation, a set of attributes Z can be added to both sides of X Y: • If X Y, then XZ YZ. • Rule 3:Transitivity, we can follow chains of dependencies from X to Y to Z: If X Y and Y Z, then X Z. • Additional rules for convenience • Rule 4:Decomposition, we can remove a set of attributes Z from the right side of X YZ: • if X YZ, then X Y. • Rule 5: Union, we can put two dependencies X Y and X Z together if they have the same left side Z: • if X Y and X Z then X YZ • Rule 6:Pseudo-transitivity, a combination of augmentation by adding W to both sides of X Y and transitivity in going from WX to WY to Z: • if X Y and WY Z, then WX Z. • Apply rules to FDs to find new rules • Closure is the set of all FDs that can be inferred
Example of Inference • Consider these how to infer FD7 from FD6 • FD6: accountId {lastName, firstName, street, city, state, zipcode} • FD7: {accountId, lastName} (firstName, street, city, state, zipcode} • Infer FD8 with augmentation: add lastName to left side • FD8: {accountId, lastName} (firstName, street, city, state, zipcode, lastName} • Use decomposition: remove lastName from right side • FD7: {accountId, lastName} (firstName, street, city, state, zipcode}
Determining Keys from Functional Dependencies • Start with closure of functional dependencies • Any functional dependency that includes all attributes has a superkey as the left side • If no subset of the left side is a super key • The left side is a key • A set of attributes is a key if and only if the above holds • Some terminology • Key is a set of attributes that determine all other attributes • Keyattribute is an attribute that is part of a key • Non-key attribute is an attribute that is not part of any key • Primary key is one of the keys that has been selected to identify the objects of the schema • Secondary key is a key that is not the primary key
Normalization • Normalization is the process of transforming some objects into a structural form that satisfies some collection of rules • Any schema that is in normal form is guaranteed to have certain quality characteristics • Each normal form has a rule that describes what kinds of functional dependencies the normal form allows. • Normalization is the process of transforming schemas in order to remove violations of the normal form rules. • Normalization is applied independently to each relation schema in a database schema. • A a database schema is said to be in normal form if each of its relation schemas is in the normal form.
Third Normal Form • A relation schema is in third normal form (3NF) if for every functional dependency • The left side (determinant) is a superkey or • The right side attributes are all key attributes • A functional dependency is a 3NF violation if • The left side is not a superkey and • The right side attributes are all non-key attributes • Consider the schema and FDs • VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating) • FD1: movieId title • FD2: movieId {title, genre, length, rating} • FD9: videoId (dateAcquired, movieId} • FD10: videoId movieId • FD11: videoId (title, genre, length, rating} • FD12: videoId (dateAcquired, movieId, title, genre, length, rating} • FD1, FD2 are 3NF violations • FD9, FD10, FD11, FD12 are not 3NF violations because videoId (left side) is a key
Decomposition • Remove violations by decomposition • Create a new schema from FD • Remove right hand attributes of FD from original schema • Left side of FD becomes foreign key in original schema • Consider the schema and 3NF violations • VideoMovie:(videoId, dateAcquired, movieId, title, genre, length, rating) • FD1: movieId title • FD2: movieId {title, genre, length, rating} • Can decompose by either FD1 or FD2 • Better to use the larger FD • New schemas • Video: (videoId, dateAcquired, movieId references Movie) • Movie: (movieId, title, genre, length, rating)
First and Second Normal Form • The traditional presentation of 3NF includes two other normal forms: 1NF and 2NF • E.F. Codd (1970) defined several normal forms • Subsequent analysis simplified the definition of 3NF • 1NF specifies that every attribute must be single valued • 1NF has been incorporated into definition of relational model • 2NF makes a technical distinction about why an FD is a violation • The goal of normalization is to achieve 3NF • 2NF is an intermediate step and never a goal of normalization
Boyce Codd Normal Form • A schema is in BCNF if every functional dependency has a superkey as its determinant • No exclusion for key attributes in left side • Important in the context of multi-attribute keys • Consider the example of schema and FD • R6: (street, city, state, zipcode, secondary key {street, zipcode}) • FD4: zipcode {city, state} • FD4 has BCNF violation even though city and state are key attributes • FD4 is not a 3NF violation • Decomposition of R6 by FD4 into R7 and R8 in BCNF • R7: (street, zipcode references R8) • R8: (zipcode, city, state) • Note that the schemas have one key each. • Multiple keys have been removed by decomposition
Case in Point: Normalizing a Car Registration Schema • Example from text • Illustrates the way that normalization can be a source of schema definition • Process of design • Define relevant FDs • Apply inference rules • Normalize • Rename resulting schemas