200 likes | 269 Views
Normalization. A337. Structure. What is a database? Tables of information Rows are referred to as records Columns are referred to as fields or attributes Record identifier is referred to as a record key Types Relational - Most common, Object-Oriented
E N D
Normalization A337
Structure • What is a database? • Tables of information • Rows are referred to as records • Columns are referred to as fields or attributes • Record identifier is referred to as a record key • Types • Relational - Most common, Object-Oriented • Hierarchical, Network (much older types) A337 - Reed Smith
Database structure • Two approaches to the structure issue: • Conceptual (you start with the question of “what information should I have?”) • ERD from “scratch” • Empirical (you already know what data there will be - you just want to organize it into tables) – • NORMALIZATION A337 - Reed Smith
Database Tables and Normalization • Normalization - Process for evaluating and correcting table structures to minimize data redundancies • Works through a series of stages called normal forms: • Normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • There are higher forms but are rarely necessary
Normalization • Why? • Data structures need to: • Minimize redundancy • Avoid insertion, update, and deletion anomalies • How? • Restructure information such that: • Only flat (rectangular) files exist (1st normal form) – No Nulls • All items in each record depend upon) the primary record key (2nd normal form) – No Partial Dependencies • If a field depends upon another then the “other” must be a primary key (3rd normal form) – No Transitive Dependencies A337 - Reed Smith
Normalize the following table: A337 - Reed Smith
What is wrong with this solution? A337 - Reed Smith
First Normal Form • Eliminate Nulls/Repeating Groups – • Eliminate repeating groups by eliminating nulls, filling in cells with implied values with actual values • Select a primary key • may be a composite key
1NF: A337 - Reed Smith
What is wrong with this solution? • Partial Dependencies • For example, the sales order number is not relevant in the determination of the item name • Similarly, the customer code and customer name do not depend upon the Item ID, they only depend upon the sales order number. A337 - Reed Smith
Second Normal Form • Eliminate Partial Dependencies • Write each key component on separate line, and then write the original (composite) key on the last line • Each component will become the key in a new table • Identify the Dependent Attributes • Determine which attributes are dependent on which other attributes
Creating 3 Tables • SO_Number, Item_Number • SO_Number, • Item_Number
2NF: A337 - Reed Smith
What is wrong with this solution? • Transitive Dependencies • Notice that the third column of the Sales Orders file has the Customer name and that depends upon the customer number. • But the customer number is not the primary key A337 - Reed Smith
Third Normal Form • For every transitive dependency, write its determinant as a PK for a new table • Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency • Remove the dependent attributes in transitive relationship(s) from each table that has such a transitive relationship Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
3NF: A337 - Reed Smith
Denormalization • Creation of normalized relations is important database design goal • Processing requirements should also be a goal • If tables decomposed to conform to normalization requirements • Number of database tables expands
Denormalization (continued) • Joining larger number of tables takes additional disk input/output (I/O) operations and processing logic • Reduces system speed • Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization A337 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel