920 likes | 1.38k Views
Building the Database E-R Diagrams and Normalization of Database Tables. MIS 304 Winter 2006. Review of Last Class. The relational database is set of linked tables. The links are accomplished by using keys: Primary keys to identify a unique value in each row (entity instance).
E N D
Building the Database E-R Diagrams and Normalization of Database Tables MIS 304 Winter 2006
Review of Last Class • The relational database is set of linked tables. • The links are accomplished by using keys: • Primary keys to identify a unique value in each row (entity instance). • Foreign keys use an attribute from another table(s) primary key to connect the tables. • In order to query two linked tables we need to do a join. • Joins are powerful but potentially costly in terms of computer resources.
Class Objective • Fill out our understanding of E-R diagrams. • Look at how databases get “loaded” • Understand the concept of a “Normal Form” • Understand why these concepts are important. • Be able to move a database or table from one level “Form” to another.
The Building Blocks of an E-R Diagram • Entity • Attribute • Relationship • Identifier
Data Models: Degrees of Data Abstraction Figure 3.1
The Entity Relationship (E-R) Model • Represents conceptual view • Main Components • Entities • Corresponds to entire table, not row • Represented by rectangle • Attributes • Relationships
Attributes • Characteristics of entities • Domain is set of possible values • Primary keys underlined Figure 3.6
The Entity Relationship (E-R) Model • A derived attribute is not physically stored within the database; instead, it is derived by using an algorithm. • Example: AGE can be derived from the data of birth and the current date.
Multivalued Attribute in Relational DBMS • The relational DBMS cannot implement multivalued attributes. • Possible courses of action for the designer • Within the original entity, create several new attributes, one for each of the original multivalued attribute’s components (Figure 4.9). • Create a new entity composed of the original multivalued attribute’s components
A New Entity Set Composed of Multivalued Attribute’s Components
Splitting the Multivalued Attributes into New Attributes
Connectivity and Cardinality in an ERD Figure 3.12
Weak Entity • Existence-dependent on another entity • Has primary key that is partially or totally derived from parent entity
Relationship Participation • The participation is optionalif one entity occurrence does not require a corresponding entity occurrence in a particular relationship. • An optional entity is shown by a small circle on the side of the optional entity.
Relationship Degree Unary • Single entity, “Recursive” • Exists between occurrences of same entity set • Binary • Two entities associated • Ternary • Three entities associated
The Crow’s Foot Representation of the Invoicing Problem
Database Tables and Normalization • Normalization is a process for assigning attributes to entities. It reduces data redundancies and helps eliminate the data anomalies. • Normalization works with data modeling to make the database more useful. • The highest level of normalization is not always the most desirable.
Normalization Stages • Normalization works through a series of stages called normal forms: • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • Fourth normal form (4NF) • Fifth normal form (5NF)
The Need for Normalization • Case of a Construction Company • Building project -- Project number, Name, Employees assigned to the project. • Employee -- Employee number, Name, Job classification • The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position. • Periodically, a report is generated. • The table whose contents correspond to the reporting requirements is shown in Table 5.1.
Artifacts • Companies have ingrained processes and procedures. • There are often visible by looking at the paper trail. • Data reports and forms are a big part of that trail. • You can look at the output artifacts as well as the input ones.
A Table Whose Structure Matches the Report Format
Database Tables and Normalization • Problems with the Figure 5.1 • The project number is intended to be a primary key, but it contains nulls. • The table displays data redundancies. • The table entries invite data inconsistencies. • The data redundancies yield the following anomalies: • Update anomalies. • Addition anomalies. • Deletion anomalies.
Conversion to First Normal Form • A relational table must not contain repeating groups. • Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s).
What do you notice? • How much redundancy is there in the table? • What are the keys?
Tools to help in Normalization • Data Models • Dependency Diagrams, A new tool.
Data Models • Last time we saw how to use a data model to evaluate the Cost/Benefit of adding and removing relations between tables.
Dependency Diagram (1NF) • Dependency Diagram • The primary key components are bold, underlined, and shaded in a different color. • The arrows above entities indicate all desirable dependencies, i.e., dependencies that are based on PK. • The arrows below the dependency diagram indicate less desirable dependencies -- partial dependencies and transitive dependencies. Figure 4.4
Forms of Dependency • Dependency • Partial Dependency • Transitive Dependency
Database Tables and Normalization • 1NF Definition • The term first normal form (1NF) describes the tabular format in which: • All the key attributes are defined. • There are no repeating groups in the table. • All attributes are dependent on the primary key.
Conversion to Second Normal Form • Starting with the 1NF format, the database can be converted into the 2NF format by using the Dependency Diagram and: • Writing each key component on a separate line, and then writing the original key on the last line and • Writing the dependent attributes after each new key. PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
2NF Definition • A table is in 2NF if: • It is in 1NF and • It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key. (It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes.)
Effect on Queries • How does this differ from the 1NF version. • How does this effect the SQL queries you are likely to run on the database. • Number of “Joins” • Number of “Sub Queries”
Conversion to Third Normal Form • Create a separate table with attributes in a transitive functional dependence relationship. PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR)
3NF Definition • A table is in 3NF if: • It is in 2NF and • It contains no transitive dependencies.
The Query Effects • Now how many “Joins” do we have? • What is stored where?
Additional DB Enhancements Figure 4.6
Boyce-Codd Normal Form (BCNF) • A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.
3NF Table Not in BCNF Figure 4.7
Sample Data for a BCNF Conversion Table 5.2
Decomposition into BCNF Figure 4.9
Normalization and Database Design • Database Design and Normalization Example:(Construction Company) • Summary of Operations: • The company manages many projects. • Each project requires the services of many employees. • An employee may be assigned to several different projects. • Some employees are not assigned to a project and perform duties not specifically related to a project. Some employees are part of a labor pool, to be shared by all project teams. • Each employee has a (single) primary job classification. This job classification determines the hourly billing rate. • Many employees can have the same job classification.
Normalization and Database Design • Normalization should be part of the design process • E-R Diagram provides macro view • Normalization provides micro view of entities • Focuses on characteristics of specific entities • May yield additional entities • Difficult to separate normalization from E-R diagramming • Business rules must be determined