630 likes | 796 Views
Logical Database Design and the Relational Database. Professor Chen School of Business Administration Gonzaga University Spokane, WA 99258. What is a Relation?. A relation is a named , two-dimensional table of data.
E N D
Logical Database Design andthe Relational Database Professor Chen School of Business Administration Gonzaga University Spokane, WA 99258
What is a Relation? • A relation is a named, two-dimensional table of data. • Each relation consists of a set of named columns and an arbitrary number of unnamed rows. • Not all table are relations
Emp_ID Name Dept_Name Salary 100 Margaret Simpson Marketing 48,000 140 Allen Beeton Accounting 52,000 110 Chris Lucero Info. System 43,000 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 Fig. 6-1: EMPLOYEE1 Relation with sample data EMPLOYEE1
Relational Definitions • Relation • Every relation has a unique name. • Every attribute value is atomic (singled-value) (Fig. 6-1) • Every row is unique. • Attributes in tables have unique names. • The order of the columns is irrelevant. • The order of the rows is irrelevant.
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X Java 8/12/199X Fig. 6-2: Eliminating multi-valued attributes (a) Table with repeating groups (Un-Normalized)
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X Java 8/12/199X Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X Fig. 6-2: Eliminating multi-valued attributes (b) EMPLOYEE2 Relation (Normalized) EMPLOYEE2
Relational Keys and Structures • Primary Key • Composite Key • Foreign Key • One-to-Many Relationship • Many-to-Many Relationship • Intersection Data • Candidate Key • Surrogate Key
Fig. 6-3: Schema for four relations (Pine Valley Furniture) Graphical Representation
Fig. 6-3: Schema for four relations (Pine Valley Furniture) Graphical and Text Representations CUSTOMER(Customer_ID, Customer_name,Address, City,State,Zip) ORDER(Order_ID, Order_Date,Customer_ID) ORDER_LINE(Order_ID, Product_ID,Quantity) PRODUCT(Product_ID, Product_Description, Product_Finish,Unit_Price, On_Hand)
Integrity Constraints • Domain Constraints • Allowable values for an attribute. • A domain definition contains: domain name, data type, size, meaning, and allowable values/range (if applicable). • Entity Integrity • No primary key attribute may be null. • Operational Constraints • Business rules (see Chapter 4)
Fig. 6-5: Referential integrity constraints (Pine Valley Furniture) pk fk pk ck/pk fk fk pk
Referential Integrity (Summary) • Addition - you can’t add (insert) an ORDER record if Customer_ID (FK) does not exist (or does not match) a Customer_ID (PK) in the CUSTOMER table. • Deletion- you can’t delete a CUSTOMER record if there is (are) related Customer_ID in the ORDER table.
Integrity Constraints • Referential Integrity • A rule that states that either each foreign key value must match a primary key value in the other relation or else the foreign key value must be null. • For example: Delete Rules • Restrict • Cascade • Set-to-Null
Referential Integrity (Conclusion) pk common field fk pk
Order of Entering Data and Referential Integrity LOCATION FACULTY STUDENT
Well-Structured Relations • A well-structured relation contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. • The following anomalies should be removed for a well-structured relation: • Insertion Anomaly • Deletion Anomaly • Modification Anomaly
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X Is EMPLOYEE2 a Well-Structured relation? EMPLOYEE2
Insertion Anomaly: Inserting a new row, the user must supply values for both Emp_ID (PK) and Course_Title (CK and FK). This is an (insertion) anomaly, since the user should be able to enter employee data without knowing (supplying) course (title) data. EMPLOYEE2 Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X
Deletion Anomaly: Deleting the employee number 140, it results in losing not only the employee’s information but also the course had an offering that completed on that date. EMPLOYEE2 Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X
Modification Anomaly: If the employee number 100 gets a salary increase, we must record the increase in each of the rows for that employee (two occurences); otherwise the data will be inconsistent. EMPLOYEE2 Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X Emp_ID Name Dept_Name Salary 100 Margaret Simpson Marketing 48,000 140 Allen Beet Accounting 52,000 110 Chris Lucero Info. System 43,000 190 Lorenzo Davis Finance 55,000 150 Sususan Martin Marketing 42,000 Emp_IDCourse_ Date_ Title Completed 100 SPSS 6/19/199X 100 Surveys 10/7/199X 140 Tax Acc 12/8/199X 110 SPSS 1/12/199X 110 C++ 4/22/199X 150 SPSS 6/19/199X 150 Java 8/12/199X Fig. 6-7: Normalized Relations from EMPLOYEE2 EMPLOYEE2 EMPLOYEE1 EMP_COURSE
Normalization • Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured and stable relations.
Data Normalization • Normalization is a formal process for deciding which attributes should be grouped together in a relation (see Fig.6-7 next). • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X Emp_ID Name Dept_Name Salary 100 Margaret Simpson Marketing 48,000 140 Allen Beet Accounting 52,000 110 Chris Lucero Info. System 43,000 190 Lorenzo Davis Finance 55,000 150 Sususan Martin Marketing 42,000 Emp_IDCourse_ Date_ Title Completed 100 SPSS 6/19/199X 100 Surveys 10/7/199X 140 Tax Acc 12/8/199X 110 SPSS 1/12/199X 110 C++ 4/22/199X 150 SPSS 6/19/199X 150 Java 8/12/199X Fig. 6-7: Normalized Relations from EMPLOYEE2 EMPLOYEE2 EMPLOYEE1 EMP_COURSE
Functional Dependencies and Keys • Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. • Candidate Key:An attribute, or combination of attributes, that uniquely identifies a row in a relation. • A candidate key is always a determinant, while a determinant may or may not be a candidate key. • Each non-key field is functionally dependent on every candidate key.
Fig. 6-22: Steps in normalization
First Normal Form • Normal form is a state of a relation that results from applying simple rules regarding functional dependencies (or relationships between attributes) to that relation. • No multi-valued attributes. • Every attribute value is atomic. • Fig. 6-2a, b.
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X Java 8/12/199X Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X
Second Normal Form • 1NF and every non-key attribute is fully functionally dependent on the primary key. • Every non-key attribute must be defined by the entire key (either a single PK or a CK), not by only part of the key. • No partial functional dependencies. • Fig. 6-1,6-23a
Fig. 6-23: A Process of 1NF to 2NF (EMPLOYEE2 - - 1NF) Emp_ID Course_Title Name Dept_Name Salary Date_Completed Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X (b) Functional Dependencies in EMPLOYEE2 Partial Depend.
Fig. 6-23: A Process of 1NF to 2NF Emp_ID Name Dept_Name Salary 100 Margaret Simpson Marketing 48,000 140 Allen Beeton Accounting 52,000 110 Chris Lucero Info. System 43,000 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 Emp_ID Name Dept_Name Salary EMPLOYEE1 (a) Functional dependencies in EMPLOYEE1 (2NF)
Emp_ID Name Dept_Name Salary Emp_ID Course_Title Date_Completed Fig. 6-23: Summary on Normalization EMPLOYEE2 Emp_ID Course_Title Name Dept_Name Salary Date_Completed Partial Depend. EMPLOYEE1 EMP_COURSE 2NF 3NF ?
Emp_ID Name Dept_Name Salary Course_ Date_ Title Completed 100 Margaret Simpson Marketing 48,000 SPSS 6/19/199X 100 Margaret Simpson Marketing 48,000 Surveys 10/7/199X 140 Allen Beeton Accounting 52,000 Tax Acc 12/8/199X 110 Chris Lucero Info. System 43,000 SPSS 1/12/199X 110 Chris Lucero Info. System 43,000 C++ 4/22/199X 190 Lorenzo Davis Finance 55,000 150 Susan Martin Marketing 42,000 SPSS 6/16/199X 150 Susan Martin Marketing 42,000 Java 8/12/199X Emp_IDCourse_ Date_ Title Completed 100 SPSS 6/19/199X 100 Surveys 10/7/199X 140 Tax Acc 12/8/199X 110 SPSS 1/12/199X 110 C++ 4/22/199X 150 SPSS 6/19/199X 150 Java 8/12/199X Emp_ID Name Dept_Name Salary 100 Margaret Simpson Marketing 48,000 140 Allen Beet Accounting 52,000 110 Chris Lucero Info. System 43,000 190 Lorenzo Davis Finance 55,000 150 Sususan Martin Marketing 42,000 Fig. 6-23: Summary on Normalization EMPLOYEE2 (1NF) EMPLOYEE1 (3NF) EMP_COURSE (3NF)
Third Normal Form • 2NF and no transitive dependencies (functional dependency between non-key attributes.) • Fig. 6-23, 6-24, 6-25.
Fig. 6-24: Relation with transitive dependency (a) SALES relation with simple data
What Anomalies might be in SALES relation? • Insertion anomaly ? • Deletion anomaly ? • Modification anomaly ?
Fig. 6-24: (b) Transitive dependency in SALES relation Cust_ID ---> Name,Salesperson, Region and Salesperson ---> Region therefore ... Cust_ID ---> Region
Fig. 6-25: Removing a transitive dependency (a) Decomposing the SALES relation
Snum Origin Destination Distance 409 Seattle Denver 1,537 618 Chicago Dallas 1,058 723 Boston Atlanta 1,214 824 Denver Las Angeles 1,150 629 Minneapolis St. Louis 587 353 Seattle Denver 1,537 Fig. 6-26: Another example with transitive dependencies Insertion anomaly? Deletion anomaly? Modification anomaly?
SHIPMENT Snum Origin Destination Distance Snum Origin Destination 409 Seattle Denver 618 Chicago Dallas 723 Boston Atlanta 824 Denver Las Angeles 629 Minneapolis St. Louis 353 Seattle Denver Origin Destination Distance Seattle Denver 1,537 Chicago Dallas 1,058 Boston Atlanta 1,214 Denver Las Angeles 1,150 Minneapolis St. Louis 587 Seattle Denver 1,537 Fig. 6-26: Another example with transitive dependencies
Fig. 6-22: Steps in normalization