730 likes | 972 Views
The Relational Model – Functional Dependencies & Normalization. Optimal Database Design Selection Of Appropriate Relations/Tables For A Given Set Of Attributes Minimize Update Anomalies Redundancy Update Inconsistent Data Additions Deletions. Objectives. Definition of Anomaly.
E N D
The Relational Model – Functional Dependencies & Normalization
Optimal Database Design • Selection Of Appropriate Relations/Tables For A Given Set Of Attributes • Minimize Update Anomalies • Redundancy • Update • Inconsistent Data • Additions • Deletions Objectives
Definition of Anomaly • Something that deviates from our expectations
CUSTNUMB CUSTNAME CUSTADDR SNUMB SLSRNAME Example 123 Jones, R. 19 Oak St. 3 Adams, M. 456 Lan, J. 4 Pine St. 6 Smith, R. 461 Chu, W. 22 Main St. 12 Brown, M. 489 Obie, S. 76 High St. 6 Smith, R. 514 Wise, R 17 Birch St. 3 Adams, M. ... ... ... ... ... 999 Side, E. 87 Bay St. 12 Brown, N.
Redundancy • Why repeat the Sales Rep Name for Adams in each record? Suppose Adams has 500 customers? That means 500 times you repeat Adams’ name! • Update • Suppose Slsr Mary Adams marries and changes her name? How many rows do we need to update? • Inconsistent data • Notice Brown's first initial varies : M, N • Additions • New Slsr J. Doe can't be entered until he has a customer • Deletion • Delete all customers of Adams, and we lose the name of the salesrep Adams Specific Anomalies In This Relation
CUSTNUMB CUSTNAME CUSTADDR SNUMB 123 Jones, R. 19 Oak St. 3 Decomposition Of Relations The previous table can be decomposed into the following two tables 456 Lan, J. 4 Pine St. 6 461 Chu, W. 22 Main St. 12 489 Obie, S. 76 High St. 6 514 Wise, R 17 Birch St. 3 ... ... ... ... 12 999 Side, E. 87 Bay St. SNUMB SLSRNAME 3 Adams, M. 6 Smith, R. 12 Brown, M.
REDUNDANCY • NONE EXISTS • UPDATE • JUST CHANGE MARY ADAMS' LAST NAME (ONCE) IN salesrep relation • INCONSISTENT DATA • IMPOSSIBLE - M. BROWN'S NAME APPEARS ONLY ONCE! • ADDITIONS • ADD NEW SLSR J. DOE TO salesrep relation • DELETIONS • WE CAN DELETE ALL OF ADAMS' CUSTOMERS AND STILL HAVE ADAMS IN salesrep Notice That This Decomposition Resolved All Database Anomalies
Conceptual Tools Needed For Decomposition • Functional Dependencies • Lossless Join Decomposition • Normal Forms
Functional Dependencies • Common Issue in Designing a New Database From Existing Data • We have obtained one or more tables of existing data (such as from a spreadsheet or extracts from an existing corporate database). • The data is to be stored in a new database. • DATABASE DESIGN QUESTION: Should the data be stored as received, or should it be transformed for storage?
Should We Combine ORDER_ITEM and SKU_DATA into One Table (SKU_DATA)? Should we store these two tables as they are, or should we combine them into one table in our new database?
But First— • We need to understand: • The relational model • Relational model terminology
The Relational Model • Introduced in 1970 • Created by E.F. Codd • He was an IBM engineer • The model used mathematics known as “relational algebra” • Now the standard model for commercial DBMS products.
Important Relational Model Terms • Entity • Relation • Functional Dependency • Determinant • Candidate Key • Composite Key • Primary Key • Surrogate Key • Foreign Key • Referential integrity constraint • Normal Form • Multivalued Dependency (new for us)
Entity • An entity is some identifiable thing that users want to track: • Customers • Computers • Sales
Relations • A relation is a two-dimensional table that has the following characteristics: • Rows contain data about an entity. • Columns contain data about attributes of the entity. • All entries in a column are of the same kind. • Each column has a unique name. • Cells of the table hold a single value. • The order of the columns is unimportant. • The order of the rows is unimportant. • No two rows may be identical
An INVALID relation (Cells in a valid relation are supposed to hold a single value, but the Phone “cell” for Employees 400 and 700 have multiple phone numbers)
Alternative Terminology • Although not all tables are relations, as we have seen on the previous slides, the terms table and relation are generally used interchangeably. • The following sets of terms are equivalent:
Functional Dependency • A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s): StudentID StudentName StudentID (DormName, DormRoom, Fee) • The attribute on the left side of the functional dependency is called the determinant. • Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice) ExtendedPrice • But, function dependencies are definitely not equations!
Functional Dependencies Are Not Equations: An Example We can deduce the following set of Functional Dependencies from the above diagram • ObjectColor Weight • ObjectColor Shape • ObjectColor (Weight, Shape) • But, does Shape functionally determine anything? (NO!)
Composite Determinants Composite determinant: a determinant of a functional dependency that consists of more than one attribute. (StudentName, ClassName) (Grade)
Functional Dependency Rules(Not a complete list) • If A (B, C), then A B and A C • If (A,B) C, then neither A nor B determines C by itself
Functional Dependency Review • A functional dependency occurs when the value of one (or set of) attribute(s) determines the value of a second (or set of) attribute(s): StudentID StudentName StudentID (DormName, DormRoom, Fee) • The attribute on the left side of the functional dependency is called the determinant, the attribute on the right side is called the dependent. • Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice) ExtendedPrice • Function dependencies are not equations
Composite Determinants • Composite determinant: A determinant of a functional dependency that consists of more than one attribute • Example of a Composite Determinant: (StudentName, ClassName) (Grade)
Find the functional dependencies in the SKU_DATA Table Ask yourself the question – if we know the value of a particular attribute, will that value determine a unique value of some other attribute? (If “yes,” then we have a functional dependency between the attributes.)
Functional Dependencies in the SKU_DATA Table • SKU (SKU_Description, Department, Buyer) • SKU_Description (SKU, Department, Buyer) • Buyer Department
Functional dependencies in ORDER_ITEM Table • (OrderNumber, SKU) (Quantity, Price, ExtendedPrice) • Note that OderNumber by itself does not functionally determine any other attribute • While SKU, from the data, does appear to functionally determine Price, we always need to be very careful in making inferences from data. Prices may change in the future, and the price might often be tied to a particular order. So, we would prefer to use the composite of SKU and OrderNumber as a determinant in a functional dependency, rather than SKU by itself. • (Quantity, Price) (ExtendedPrice) • Note that this is derived from the equation ExtendedPrice = Quantity * Price
When are determinant values unique? • A determinant has unique values (i.e., all values are different) in a relation if, and only if, it functionally determines every other attribute in the relation • So, in SKU_Data, SKU has all different (unique) values, and it functionally determines every attribute in the table. On the other hand, Buyer, though a determinant, does not have unique values, and does not functionally determine all the other attributes in the relation. • So, you cannot find the determinants of all functional dependencies simply by looking for unique values in one column
A B C D E a(1) b(1) c(1) d(1) e(1) a(1) b(1) c(2) d(1) e(1) a(2) b(1) c(1) d(1) e(1) a(2) b(2) c(1) d(2) e(1) a(2) b(2) c(2) d(3) e(2) BC ----> D (True or False?) B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)
The Answers BC ----> D (True or False?) B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)
Deducing Functional Dependencies • Since BC ----> D and D ----> BE, can we conclude that BC ----> BE ? • YES! (We will call this transitivity) • If BC ----> D and BC ----> A, can we conclude that D ----> A ? • NO! Nor can we conclude A ----> D.
A superkey is an attribute or a set of attributes that identify an entity UNIQUELY. • In a relation (table), a SUPERKEY is any column or set of columns whose values can be used to distinguish onerow from another. • Since a superkey identifies each item uniquely, it functionally determines all the attributes of a relation. • STUID is a superkey • SOCSEC is a superkey • STUNAME is NOT a superkey • STUID,STUNAME IS a superkey • STUID,ANY OTHER SET OF ATTRIBUTES is a superkey Superkeys & FD's
The Formal Theory Definition Of A Superkey • A set of attributes K is a superkey of relation (table) R, if K ----> R • In other words, a superkey functionally determines all the attributes in R
A superkey is a candidate key if it is minimal, i.e., if X is a superkey, then X minus {any attribute of X} is NOT a superkey. • A primary key is a candidate key which we choose to be THE "key." More On Superkeys
Superkey: a set of attributes which functionally determines all of the attributes in the relation • Candidate key:from the set of superkeys, we eliminate all those superkeys which have "extra" attributes (a superkey will have an "extra" attribute if, when we remove this attribute, the resulting set of attributes is also a superkey). • Primary key: if there is more than 1 candidate key, then the candidate key we choose for THE key is called the primary key - if there is exactly 1 candidate key, then that candidate key is the primary key. Superkeys, Candidate Keys And Primary Keys
Consider the following scheme from an airline database system:( P(pilot) , F(flight# ), D(date), T (scheduled time to depart) ) We have the following FD's : • F ----> T PDT ----> F FD ----> P Provide some superkeys: • PDT is a superkey, and FD is a superkey. • Is PDT a candidate key? • PD is not a superkey, nor is DT, nor is PT. • So, PDT is a candidate key. • FD is also a candidate key, since neither F or D are superkeys. Example - Obtain Candidate Keys
Surrogate Keys • A surrogate key is an artificial attribute/column added to a relation to serve as a primary key: • Often DBMS supplied • Short, numeric and never changes – an ideal primary key! • Has artificial values that are meaningless to users • Normally hidden in forms and reports
Example of Surrogate Keys (NOTE: The primary key of the relation is underlined below) • RENTAL_PROPERTY without surrogate key: RENTAL_PROPERTY (Street, City,State/Province, Zip/PostalCode, Country, Rental_Rate) • RENTAL_PROPERTY with surrogate key: RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate
A functional dependency is defined to be trivial if it is satisfied by every relation • Example of a trivial functional dependency: • AB ----> A is satisfied by every relation involving A. Trivial FD's
Generalization and rule for trivial FD's: • An FD is trivial if it has the form: X ----> Y, where Y is a subset of X. • So, ABCD ----> ABC is a trivial FD. • A trivial FD does not make a significant statement about real world constraints - we are thus only interested in non-trivial FD's. Trivial Fd's
Another FD “Rule” • If (A,B) C, then neither A nor B by itself will functionally determine C.
There are numerous "normal forms" which are categorizations based upon the kinds of “problems” that relations have. • These will be discussed: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) Normal Forms
A relation is in first normal form (1NF) iff every attribute in every row can contain only a single value. A 1NF relation cannot have any row that contains a repeating grouping of attribute values. FIRST NORMAL FORM
Ordnumb Orddte Partnumb Numbord 12489 30109 AX12 11 12491 30209 BT04 1 BZ66 1 12495 30409 CX11 2 *We can convert the above table to 1NF by flattening * Ordnumb Orddte Partnumb Numbord 12489 30109 AX12 11 12491 30209 BT04 1 12491 30209 BZ66 1 12495 30409 CX11 2 Example Of A Relation Not In 1NF
Definition: an attribute is a non-key attribute if it is not a part of the primary key • Definition: A relation is in second normal form (2NF) if it is in first normal form and no non-key attribute is dependent on only a portion of the primary key (when the primary key is composite - consisting of 2 or more attributes) Second Normal Form
Ordnumb Orddte Partnumb PartDesc Numbord Quoprice 12489 90509 AX12 MOUSE 11 14.95 12491 90509 BT04 DRV270G 1 120.99 12491 90509 BZ66 DRV180G 1 80.95 12495 90709 AX12 MOUSE 4 14.95 *****The following FD's hold on this relation******* Ordnumb ----> Orddte Partnumb ---> PartDesc Ordnumb, Partnumb ----> Numbord, Quoprice ******The relation is NOT in 2NF because ...********* PartDesc is dependent on only a portion of primary key, and similarly for Orddte Example Of A Relation In 1NF, But Not 2NF