1 / 39

Normalizing Your Database

Redundancy breeds errors. Same data defined in multiple places is BAD ... CAT 333-22-1111 Admin. 25. What is wrong with the following?

Kelvin_Ajay
Download Presentation

Normalizing Your Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    Slide 1:Normalizing Your Database and Why you WANT to do it!

    INFYS540 Lesson 7 Chapter 5 Appendix

    Slide 2:Why do we make our “databases” in spreadsheets?

    We use a few massive tables “Lots of tables make the database complex” Discomfort with databases and multiple tables Because we “think it’s simple” Skip organizing the data into relational tables Go straight to designing forms NAME POSITION SPOUSE CHILDREN PHONE Jones Chief Gloria, Karen 3274 Smith Clerk Betty 3241 Jones Chief Mary Glorai, Karen 3296 these problems are calld data redundancy problem: same data stored many times in different placesthese problems are calld data redundancy problem: same data stored many times in different places

    Slide 3:Data Redundancy Problems

    Redundancy breeds errors Same data defined in multiple places is BAD Spelling/typographical error prone Lack of data integrity Inability to perform simple queries Inflexibility and inscalability Impossible to MAINTAIN!

    Slide 4:Shared Data

    Poorly organized data prevents sharing that data with other “databases” Think of all the “databases” that lists your name, department, etc.: Messiah College Phone List Database Students Using College Networked Computers Students Using Dining Facility Students Using Nursing Facility if can’t share data, can’t share database we haven't even discussed the clinic, post office, library.....if can’t share data, can’t share database we haven't even discussed the clinic, post office, library.....

    Slide 5:Relational Database

    PROJECTCHIEF Project Project Chief Computing 333-22-1111 Intranet 987-65-4321 Contracting 123-45-6789 CAT 333-22-1111 DEPARTMENTS Dept Dept. Director Room MLD 181-94-5676 B115 C2G 987-65-4321 123 M&B 123-45-6789 147 EMPLOYEES LName FName SSN Dept Jones Mike 123-45-6789 M&B Smith Tony 987-65-4321 C2G Lee Bruce 567-89-1234 MLD Doodle Yankee 333-22-1111 M&B 1 1 What is a candidate key? What is a primary key? What is a foreign key? relational database works by linking key fields and extracting important information by following links without data redundancy There are other types of databases: object oriented, hyperlink, knowledge bases from this point forward I will only be talking about relational databases DB1relational database works by linking key fields and extracting important information by following links without data redundancy There are other types of databases: object oriented, hyperlink, knowledge bases from this point forward I will only be talking about relational databases DB1

    Slide 6:Database Management System

    Computer program designed to help a user store and retrieve data Access, Oracle, DB2

    abcd 1234 defg 12 asd 123 kghb 112 abcd 1234 defg 12 asd 123 kghb 112 abcd 1234 123 asd 123 12 kghb 112 34

    Slide 7:DBMS

    abcd 1234 defg 12 asd 123 kghb 112 tables forms query abcd 1234 defg 12 asd 123 kghb 112 abcd 1234 123 asd 123 12 kghb 112 34 reports labels Data Dictionary Outside Applications DBMS Engine attached import DBMS - the brain of the system -- tables-- can link to tables through networks many systems use same tables form - convenient way to layout entering, changing, or viewing records you can include lists of values to choose from, use color to highlight, display messages when incorrect values are entered, have Access automatically fill in data display the results of calculations reports - display data on printed page with totals automatically calculated and displayed statistical analysis, mailing labels link to outside applications Security/encryption Error Correction In this class we will be concerned with relational databasesDBMS - the brain of the system -- tables-- can link to tables through networks many systems use same tables form - convenient way to layout entering, changing, or viewing records you can include lists of values to choose from, use color to highlight, display messages when incorrect values are entered, have Access automatically fill in data display the results of calculations reports - display data on printed page with totals automatically calculated and displayed statistical analysis, mailing labels link to outside applications Security/encryption Error Correction In this class we will be concerned with relational databases

    Slide 8:First Things First Purpose of the DB

    Who will use it What type of tasks What are the data sources What output is required

    Slide 9:Data Modeling Determine Data Requirements

    Entity Class something that can be identified in environment each entity class is a separate table each entity becomes a separate row in a table Attributes property or characteristic of entity each characteristic of an entity class become a column each characteristic of an entry become an entry in table Keys one or more attributes that uniquely identified an entity Constraints values or rules the DBMS must enforce

    Slide 10:Example

    Employee SSN L Name F Name Rank Spouse Children Office Phone# Home Phone# Office Room# Dept Dept. Chief EmpProj Project Name Employee SSN Function Must know all constraints on data project name is unique only one chief per project employees can have more than one phone# employees can have only one office many employees can use the same office there are many other constraints you must know about, like SSN uniquely identifies each person DBMS can enforce the constraints automatically there are many other constraints you must know about, like SSN uniquely identifies each person DBMS can enforce the constraints automatically

    Slide 11:Purpose of Normalization

    Take advantage of the powerful tools available in a DBMS There are five levels of Normalization The higher the Normal Form the “better” and more efficient the database But, increasing the levels of Normal Form takes time and effort For most applications, 3rd Normal Form will solve most potential problems with a DB

    Slide 12:Normalizing Database

    Process of creating well-structured tables. Improve performance, integrity of data 5-step process (w/ 2 rules) to achieve Third Normal Form (3NF) First two steps put DB into a form so you can normalize it Normalization is the process of evaluating a table and if necessary breaking it down into sub components. can go beyond 3rd normal form, but that is beyond the scope of this class just as writing a paper, each paragraph must have a single theme, in database design each table must have a single theme, if not break the table into two or more tables. Normalization is the process of evaluating a table and if necessary breaking it down into sub components. can go beyond 3rd normal form, but that is beyond the scope of this class just as writing a paper, each paragraph must have a single theme, in database design each table must have a single theme, if not break the table into two or more tables.

    Slide 13:Rule #1 in Databases

    Never design redundant data into a Database duplicate data is not consistent duplicate data wastes space if employee changes position/rank every file that contains that data must be updated so, how do you reduce redundancy... if employee changes position/rank every file that contains that data must be updated so, how do you reduce redundancy...

    Slide 14:Step 1. Primary Keys

    A primary key is one or more data fields (columns) that uniquely identify each record in the table What would the primary key be below? “table of employees, assigned to a department.” EMPLOYEES LName FName SSN Dept Jones Mike 123-45-6789 Math Smith Tony 987-65-4321 M&B Lee Bruce 567-89-1234 Science using access, set primary key in Design Tableusing access, set primary key in Design Table

    Slide 15:Step 1. Primary Keys

    Answer: The SSN It is the only “guaranteed” unique column in the table. Names are easily repeated. EMPLOYEES LName FName SSN Dept Jones Mike 123-45-6789 Math Smith Tony 987-65-4321 M&B Lee Bruce 567-89-1234 Science

    Slide 16:Step 1. Primary Keys

    Now try the following example: “A table of projects assigned to employees, listing the project name and the employee’s function on the project.” EmpProj Counter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch A Counter --The MS Access Default Key

    Slide 17:Step 1. Primary Keys

    It is the combination of the SSN and the Project fields. Why? EMPLOYEES’ PROJECTS Counter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch

    Slide 18:Step 1. Primary Keys

    Because, you can have the following: EMPLOYEES’ PROJECTS Counter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Dining Designer 3 987-65-4321 Intranet Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch Redundant records! (Redundancy = BAD)

    Slide 19:Rule #2 about Databases

    NEVER Use a Counter as a Primary Key

    Slide 20:Step 2: Eliminate Many-to-Many Relationships

    What is wrong with the following table? “a table of personnel authorized access to a project” PROJECTS QUERY ACCESS Project Access_1 Access_2 Access_3 Dining 222-99-7777 181-94-5676 Computing 222-99-7777 181-94-5676 Intranet 987-65-4321 818-49-6765 123-45-6789 You have a table that is relating many projects to many people having access. The table restricts you to 3 persons authorized. The table wastes space if less than 3 are authorized How can you easily query it to see who has access? Must query on Access_1, Access_2, and Access_3 fields Must use a union query to combine the results waitYou have a table that is relating many projects to many people having access. The table restricts you to 3 persons authorized. The table wastes space if less than 3 are authorized How can you easily query it to see who has access? Must query on Access_1, Access_2, and Access_3 fields Must use a union query to combine the results wait

    Slide 21:Step 2: Eliminate Many-to-Many Relationships

    Here’s essentially what this table looks like within the Access relationships diagram: Projects: Project Project Chief Department Access_1 Access_2 Access_3 Employees: SSN Last Name First Name .... has access to info about

    Slide 22:Step 2: Eliminate Many-to-Many Relationships

    Here’s how you model it in a database: Break it up into two one-to-many relationships Projects: Project Project Chief Department .... Employees: SSN Last Name First Name .... Access to Project Info: Project SSN 1 1

    Slide 23:Step 2: Eliminate Many-to-Many Relationships

    How to do it: The primary key of the new table is the composite of the primary keys of the existing tables. Primary key of Projects = Project Name Primary key of Employees = SSN New table primary key of Project Name and SSN DB2DB2

    Slide 24:Step 2: Eliminate Many-to-Many Relationships

    No artificial restrictions on number of people with access You can add attributes about the types of access granted You can easily query who has access to information about each project EMPLOYEE LName FName SSN Jones Mike 123-45-6789 Smith Tony 987-65-4321 Lee Bruce 567-89-1234 Doodle Yankee 333-22-1111 PROJ QUERY ACCESS Project SSN Dining 222-99-7777 Dining 181-94-5676 Computing 222-99-7777 Computing 181-94-5676 Intranet 987-65-4321 Intranet 818-49-6765 Intranet 123-45-6789 PROJECT Project ProjectChief Dept Computing 333-22-1111 MATH Intranet 987-65-4321 M&B Contracting 123-45-6789 M&B CAT 333-22-1111 Admin

    Slide 25:What is wrong with the following?

    “A table of PCs, which are loaded with many different applications, and assigned to a user.” PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones 10301 Word, Powerpoint, Lotus Notes Smith 10311 Word, LotusNotes, Borland C++ Hacker first, there is a space between LotusNotes no integrity checkingfirst, there is a space between LotusNotes no integrity checking

    Slide 26:“Atomic” - the data occupying a field cannot be further broken down. i.e., no multi-data entries i.e., “No attributes can have more than one value for a single instance of an entity” PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones If not atomic, updating is complex and error prone If not atomic, can not easily query the database

    Step 3: Achieving 1NF: All Data must be Atomic Take five minutes and put this table into 1NF.Take five minutes and put this table into 1NF.

    Slide 27:Step 3 Answer

    PCSerial# LoadedSoftware Assigned 10291 Word Jones 10291 Powerpoint Jones 10291 ccMail Jones 10301 Word Smith 10301 Powerpoint Smith 10301 LotusNotes Smith 10311 Word Hacker 10311 LotusNotes Hacker 10311 Borland C++ Hacker

    Slide 28:Step 3. Achieving 1NF: All Data must be Atomic

    Another source of redundancy: calculated fields TotalYTD Age DaysRemaining Solution: Use a Query! Remove all calculated fields from table and create a query ...then use the query whenever you need up-to-date data

    Slide 29:Step 4. Achieving 2NF: Eliminate Partial Dependencies

    What is a partial dependency? Look at the table. What’s redundant? “A table of functions an employee is assigned to for a project, and the project chief.” EMPLOYEES’ PROJECTS SSN Project Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 987-65-4321 222-99-7777 Dining Overwatch 222-99-7777

    Slide 30:Step 4. Achieving 2NF: Eliminate Partial Dependencies

    Function depends on the entire primary key: SSN and Project. ProjectChief is dependent on just a portion of the primary key if eliminate 123456789 you loose the fact that 333885656 is the Computing chief likewise can’t add a project chief unless assign an employee to a projectif eliminate 123456789 you loose the fact that 333885656 is the Computing chief likewise can’t add a project chief unless assign an employee to a project

    Slide 31:Step 4. Achieving 2NF: Eliminate Partial Dependencies

    Why is this bad? Well, what’s wrong with the following? EMPLOYEES’ PROJECTS SSN Project Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 222-99-7777 222-99-7777 Dining Overwatch 222-99-7777

    Slide 32:Step 4. Achieving 2NF: Eliminate Partial Dependencies

    A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key. PDs are a relation. So, we need a new table..... EMPLOYEES’ PROJECTS SSN Project Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 987-65-4321 222-99-7777 Dining Overwatch 222-99-7777

    Slide 33:Step 4. Achieving 2NF: Eliminate Partial Dependencies

    Here’s how it should look...... EMPLOYEES’ PROJECTS SSN Project Function 123-45-6789 Dining Designer 123-45-6789 Computing Designer 123-45-6789 Intranet Member 987-65-4321 Intranet Designer 444-55-6666 Intranet Webmaster 222-99-7777 Dining Overwatch PROJECTS Project Project Chief Dining 222-99-7777 Computing 333-88-5656 Intranet 987-65-4321

    Slide 34:Step 5: Achieving 3NF: Eliminate Transitive Dependencies

    What is wrong with the following table? PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet 987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 123 CAT 333-22-1111 Grounds 123-45-6789 147

    Slide 35:Step 5: Achieving 3NF: Eliminate Transitive Dependencies

    We have fields dependent on a non-key field: The Director and Room fields clearly relate to the Dept., and have nothing to do with the project. (Dept is a “determinant” that is not a candidate key) PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet 987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 123 CAT 333-22-1111 GRND 123-45-6789 147

    Slide 36:Step 5: Achieving 3NF: Eliminate Transitive Dependencies

    A transitive dependency occurs when a non-key field depends on another non-key field. Why is this bad?. A typo appeared in the Contracting line. A database without the transitive dependency would not have allowed this to happen. PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet 987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 124 CAT 333-22-1111 GRND 123-45-6789 147 also deleting the cat tool eliminated the GRND from the database can’t add a dept without adding projectsalso deleting the cat tool eliminated the GRND from the database can’t add a dept without adding projects

    Slide 37:Step 5: Achieving 3NF: Eliminate Transitive Dependencies

    How to do it: a. Which fields are dependent on a non-key field in the table? (Director, Room) b. Which fields are these dependent on? (Dept) c. Create a new table with (b) as the primary key. d. Put (a) in the new table. e. Remove (a) from the old table.

    Slide 38:Step 5: Achieving 3NF: Eliminate Transitive Dependencies

    Here are the new tables. PROJECTS Project Project Chief Dept. Dining 222-99-7777 Admin Computing 333-88-5656 Admin Intranet 987-65-4321 M&B Contracting 187-87-8787 M&B CAT 333-22-1111 GRND DEPARTMENTS Dept. Name Dept. Director Room Admin 181-94-5676 B115 M&B 818-49-6765 123 GRND 123-45-6789 147

    Slide 39:Data Analysis: Normalization An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity. An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it. An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.

    Any attributes that can have multiple values actually describe a separate entity, possibly an entity (and relationship) that we haven’t yet included in our data model . Any non-key attributes that are dependent on only part of the primary key should be moved to any entity where that partial key becomes the full key. Again, this may require creating a new entity and relationship on the model. Any non-key attributes that are dependent on other non-key attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model. Any attributes that can have multiple values actually describe a separate entity, possibly an entity (and relationship) that we haven’t yet included in our data model . Any non-key attributes that are dependent on only part of the primary key should be moved to any entity where that partial key becomes the full key. Again, this may require creating a new entity and relationship on the model. Any non-key attributes that are dependent on other non-key attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model.

    Slide 40:Common Sense Test

    Sometimes it is not worth normalizing a table example: zip codes is a functional dependency city/state are attributes of the zip code and not a person’s address may not want to normalize a table if it is significantly easier to process as is duplicates are not important

    Slide 41:Conclusion

    Rule1: Never design redundant data into a database Rule2: Never use a counter as Primary Key Identify proper primary keys (1NF) Break up many-to-many relationships (1NF) 1NF: Break all data into atomic components 2NF: Identify/eliminate partial dependencies 3NF: Eliminate transitive dependencies Common sense test

    Slide 42:What is a Good Data Model? A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity. A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity. A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.

    While a data model effectively communicates database requirements, it does not necessarily represent a good database design. It may contain structural characteristics that reduce flexibility and expansion, or create unnecessary redundancy. Therefore, we must ‘prepare’ the data model for database design and implementation. While a data model effectively communicates database requirements, it does not necessarily represent a good database design. It may contain structural characteristics that reduce flexibility and expansion, or create unnecessary redundancy. Therefore, we must ‘prepare’ the data model for database design and implementation.

    Slide 43:Database Design

    Introduction The design of any database will usually involve the DBA and database staff. They will handle the technical details and cross-application issues. It is useful for the systems analyst to understand the basic design principles for relational databases. No additional notes provided.No additional notes provided.

    Slide 44:Goals and Prerequisites to Database Design The data model may have to be divided into multiple data models to reflect database distribution and database replication decisions. Data distribution refers to the distribution of either specific tables, records, and/or fields to different physical databases. Data replication refers to the duplication of specific tables, records, and/or fields to multiple physical databases. Each sub-model or view should reflect the data to be stored on a single server.

    No additional notes provided.No additional notes provided.

    Slide 45:The Database Schema The design of a database is depicted as a special model called a database schema. A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model. A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules. A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system.

    No additional notes provided.No additional notes provided.

    Slide 46:The Database Schema Transforming the logical data model into a physical relational database schema rules and guidelines: Each fundamental, associative, and weak entity is implemented as a separate table. The primary key is identified as such and implemented as an index into the table. Each secondary key is implemented as its own index into the table. Each foreign key will be implemented as such. Attributes will be implemented with fields. These fields correspond to columns in the table.

    No additional notes provided.No additional notes provided.

    Slide 47:The Database Schema Transforming the logical data model into a physical relational database schema rules and guidelines: (continued) The following technical details must usually be specified for each attribute. Data type. Each DBMS supports different data types, and terms for those data types. Size of the Field. Different DBMSs express precision of real numbers differently. NULL or NOT NULL. Must the field have a value before the record can be committed to storage? Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data. Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value.

    Data type. For example, different systems may designate a large alphanumeric field differently (e.g., MEMO in Access and LONG VARCHAR in Oracle). Also, some databases allow the choice of no compression versus compression of unused space (e.g., CHAR versus VARCHAR in Oracle). Size of the Field. For example, in Oracle, a size specification of NUMBER (3,2) supports a range from -9.99 to 9.99. NULL or NOT NULL. Again, different DBMSs may require different reserved words to express this property. Primary keys can never be allowed to have null values. Domains. This can be a great benefit to ensuring data integrity independent from the application programs. If the programmer makes a mistake, the DBMS catches the mistake. But for DBMSs that support data integrity, the rules must be precisely specified in a language that is understood by the DBMS. Many of the above specifications were documented as part of a complete data model. If that data model was developed with a CASE tool, the CASE tool may be capable of automatically translating the data model into the language of the chosen database technology. Data type. For example, different systems may designate a large alphanumeric field differently (e.g., MEMO in Access and LONG VARCHAR in Oracle). Also, some databases allow the choice of no compression versus compression of unused space (e.g., CHAR versus VARCHAR in Oracle). Size of the Field. For example, in Oracle, a size specification of NUMBER (3,2) supports a range from -9.99 to 9.99. NULL or NOT NULL. Again, different DBMSs may require different reserved words to express this property. Primary keys can never be allowed to have null values. Domains. This can be a great benefit to ensuring data integrity independent from the application programs. If the programmer makes a mistake, the DBMS catches the mistake. But for DBMSs that support data integrity, the rules must be precisely specified in a language that is understood by the DBMS. Many of the above specifications were documented as part of a complete data model. If that data model was developed with a CASE tool, the CASE tool may be capable of automatically translating the data model into the language of the chosen database technology.

    Slide 48:The Database Schema Transforming the logical data model into a physical relational database schema rules and guidelines: (continued) Supertype/subtype entities present additional options as follows: Most CASE tools do not currently support object-like constructs such as supertypes and subtypes. Most CASE tools default to creating a separate table for each entity supertype and subtype. If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table. Evaluate and specify referential integrity constraints.

    Would you ever want to compromise the third normal form entities when designing the database? For example, would you ever want to combine two third normal form entities into a single table (that would, by default, no longer be in third normal form)? Usually not! Although a DBA may create such a compromise to improve database performance, he or she should carefully weigh the advantages and disadvantages. Although such compromises may mean greater convenience through fewer tables or better overall performance, such combinations may also lead to the possible loss of data independence—should future, new fields necessitate resplitting the table into two tables, programs will have to be rewritten. As a general rule, combining entities into tables is not recommended. Please refer to figure 11.16 on page 422 in the textbook. The authors apologize that this figure is not available at this time.Would you ever want to compromise the third normal form entities when designing the database? For example, would you ever want to combine two third normal form entities into a single table (that would, by default, no longer be in third normal form)? Usually not! Although a DBA may create such a compromise to improve database performance, he or she should carefully weigh the advantages and disadvantages. Although such compromises may mean greater convenience through fewer tables or better overall performance, such combinations may also lead to the possible loss of data independence—should future, new fields necessitate resplitting the table into two tables, programs will have to be rewritten. As a general rule, combining entities into tables is not recommended. Please refer to figure 11.16 on page 422 in the textbook. The authors apologize that this figure is not available at this time.

    Slide 49:Data and Referential Integrity There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity. Key Integrity: Every table should have a primary key (which may be concatenated). The primary key must be controlled such that no two records in the table have the same primary key value. The primary key for a record must never be allowed to have a NULL value.

    No additional notes provided.No additional notes provided.

    Slide 50:Data and Referential Integrity Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values. Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.

    For example, if GRADE POINT AVERAGE is defined to be a number between 0.00 and 4.00, then controls must be implemented to prevent negative numbers and numbers greater than 4.00. Not long ago, application programs were expected to perform all data editing. Today, most database management systems are capable of data editing. For the foreseeable future, the responsibility for data editing will continue to be shared between the application programs and the DBMS. The architecture of relational databases implements relationships between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of referential integrity errors. For example, an INVOICES table usually includes a foreign key, CUSTOMER NUMBER, to ‘reference back to’ the matching CUSTOMER NUMBER primary key in the CUSTOMERS table. What happens if we delete a CUSTOMER record? There is the potential that we may have INVOICE records whose CUSTOMER NUMBER has no matching record in the CUSTOMERS table. Essentially, we have compromised the referential integrity between the two tables. For example, if GRADE POINT AVERAGE is defined to be a number between 0.00 and 4.00, then controls must be implemented to prevent negative numbers and numbers greater than 4.00.Not long ago, application programs were expected to perform all data editing. Today, most database management systems are capable of data editing. For the foreseeable future, the responsibility for data editing will continue to be shared between the application programs and the DBMS. The architecture of relational databases implements relationships between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of referential integrity errors.For example, an INVOICES table usually includes a foreign key, CUSTOMER NUMBER, to ‘reference back to’ the matching CUSTOMER NUMBER primary key in the CUSTOMERS table. What happens if we delete a CUSTOMER record? There is the potential that we may have INVOICE records whose CUSTOMER NUMBER has no matching record in the CUSTOMERS table. Essentially, we have compromised the referential integrity between the two tables.

    Slide 51:Referential Integrity: Referential integrity is specified in the form of deletion rules as follows: No restriction. Any record in the table may be deleted without regard to any records in any other tables. Delete:Cascade. A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table. Delete:Restrict. A deletion of a record in the table must be disallowed until any matching records are deleted from a related table. Delete:Set Null. A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.

    No additional notes provided. No additional notes provided.

    Slide 52:Database Design

    Roles Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table. The decision to require role names or not is usually established by the data or database administrator. Some database shops insist that no two fields have exactly the same name. This constraint serves to simplify documentation, help systems, and metadata definitions. This presents an obvious problem with foreign keys. By definition, a foreign key must have a corresponding primary key. During logical data modeling, using the same name suited our purpose of helping the users understand that these foreign keys allow us to match up related records in different entities. But in a physical database, it is not always necessary or even desirable to have these redundant field names in the database.Some database shops insist that no two fields have exactly the same name. This constraint serves to simplify documentation, help systems, and metadata definitions. This presents an obvious problem with foreign keys. By definition, a foreign key must have a corresponding primary key. During logical data modeling, using the same name suited our purpose of helping the users understand that these foreign keys allow us to match up related records in different entities. But in a physical database, it is not always necessary or even desirable to have these redundant field names in the database.

    Slide 53:Database Prototypes Prototyping is not an alternative to carefully thought out database schemas. On the other hand, once the schema is completed, a prototype database can usually be generated very quickly. Most modern DBMSs include powerful, menu-driven database generators that automatically create a DDL and generate a prototype database from that DDL. A database can then be loaded with test data that will prove useful for prototyping and testing outputs, inputs, screens, and other systems components.

    No additional notes provided.No additional notes provided.

    Slide 54:Database Capacity Planning A database is stored on disk. The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available. Database capacity planning can be calculated with simple arithmetic as follows. For each table, sum the field sizes. This is the record size for the table. For each table, multiply the record size times the number of entity instances to be included in the table. This is the table size.

    This simple formula ignores factors should as packing, coding, and compression, but by leaving out those possibilities, you are adding slack capacity. For each table, sum the field sizes. This is the record size for the table. Avoid the implications of compression, coding, and packing – in other words, assume that each stored character and digit will consume one byte of storage. Note that formatting characters (e.g., commas, hyphens, slashes, etc.) are almost never stored in a database. Those formatting characters are added by the application programs that will access the database and present the output to the users. For each table, multiply the record size times the number of entity instances to be included in the table. It is recommended that growth be considered over a reasonable time period (e.g., three years). This is the table size. This simple formula ignores factors should as packing, coding, and compression, but by leaving out those possibilities, you are adding slack capacity. For each table, sum the field sizes. This is the record size for the table. Avoid the implications of compression, coding, and packing – in other words, assume that each stored character and digit will consume one byte of storage. Note that formatting characters (e.g., commas, hyphens, slashes, etc.) are almost never stored in a database. Those formatting characters are added by the application programs that will access the database and present the output to the users. For each table, multiply the record size times the number of entity instances to be included in the table. It is recommended that growth be considered over a reasonable time period (e.g., three years). This is the table size.

    Slide 55:Database Capacity Planning Database capacity planning can be calculated with simple arithmetic as follows. (continued) Sum the table sizes. This is the database size. Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above. This is the anticipated database capacity.

    This simple formula ignores factors should as packing, coding, and compression, but by leaving out those possibilities, you are adding slack capacity. For each table, sum the field sizes. This is the record size for the table. Avoid the implications of compression, coding, and packing – in other words, assume that each stored character and digit will consume one byte of storage. Note that formatting characters (e.g., commas, hyphens, slashes, etc.) are almost never stored in a database. Those formatting characters are added by the application programs that will access the database and present the output to the users. For each table, multiply the record size times the number of entity instances to be included in the table. It is recommended that growth be considered over a reasonable time period (e.g., three years). This is the table size. This simple formula ignores factors should as packing, coding, and compression, but by leaving out those possibilities, you are adding slack capacity. For each table, sum the field sizes. This is the record size for the table. Avoid the implications of compression, coding, and packing – in other words, assume that each stored character and digit will consume one byte of storage. Note that formatting characters (e.g., commas, hyphens, slashes, etc.) are almost never stored in a database. Those formatting characters are added by the application programs that will access the database and present the output to the users. For each table, multiply the record size times the number of entity instances to be included in the table. It is recommended that growth be considered over a reasonable time period (e.g., three years). This is the table size.

    Slide 56:Database Structure Generation CASE tools are frequently capable of generating SQL code for the database directly from a CASE-based database schema. This code can be exported to the DBMS for compilation. Even a small database model can require 50 pages or more of SQL data definition language code to create the tables, indexes, keys, fields, and triggers. Clearly, a CASE tool’s ability to automatically generate syntactically correct code is an enormous productivity advantage. Furthermore, it almost always proves easier to modify the database schema and re-generate the code, than to maintain the code directly.

    Figure 11.8 on pages 427 and 428 is a sample page of code generated by System Architect from the SoundStage database schema. The authors apologize that this figure is not available at this time. Figure 11.8 on pages 427 and 428 is a sample page of code generated by System Architect from the SoundStage database schema. The authors apologize that this figure is not available at this time.

    The Next Generation of Database Design Introduction Relational database technology is widely deployed and used in contemporary information system shops. One new technology is slowly emerging that could ultimately change the landscape dramatically – object database management systems. The heir apparent to relational DBMSs, object database management systems store true objects, that is, encapsulated data and all of the processes that can act on that data. Because relational database management systems are so widely used, we don’t expect this change to happen quickly. It is expected that these vendors will either build object technology into their existing relational DBMSs, or they will create new, object DBMSs and provide for the transition between relational and object models.

    Slide 57:

    Summary Introduction Conventional Files Versus the Database Database Concepts for the Systems Analyst Data Analysis for Database Design File Design Database Design The Next Generation of Database Design
More Related