140 likes | 317 Views
Normalization (Codd, 1972). Practical Information For Real World Database Design. Requirements for Relational DB. Table format Supports Boolean Algebra Selects, joins, projects + 5 other operations to define queries Supports mathematical, relational, and logical operators (And-Or-Not)
E N D
Normalization(Codd, 1972) Practical Information For Real World Database Design
Requirements for Relational DB • Table format • Supports Boolean Algebra • Selects, joins, projects + 5 other operations to define queries • Supports mathematical, relational, and logical operators (And-Or-Not) • Codd’s Twelve Rules (abstracted) • Null values can be present except in primary key • All data represented in tables • Must be able to update views • Access data with table name, field name, or value • Data and programs should be independent • Should enforce integrity and validity constraints
Normalization Defined • Normalization • Purpose is to avoid potential update problems called anomalies • Assigned attributes to entities using 1NF, 2NF, 3NF, 4NF, 5NF • Denormalization • Moving back a level to gain better performance in the real-world database; in practice 3NF is most common, however, to gain efficiency and speed, minor changes may need to be made
Why Normalization Is Important • If not done, updates are less efficient (larger tables, possibly more than one update per data item change) • If not done, indexing is more cumbersome – impractical to build large databases • If not done, no simple strategies for creating views required by users
Design Rules • Determine Business Rules • A company manages many different projects • Each project requires the services of many employees • Employees may be assigned to work on more than one project • Each employee has a job classification • Many employees have the same job classification • Translate business rules to validity constraints and relationships
Design Rules, Continued • Analyze documents, interview key users, etc. to develop a field list • Determine entities to be used (see next slide for definition) • Determine relationships between entities • Assign the attributes to the entities • Identify primary and foreign keys • Check for 1NF, 2NF, and 3 NF
Definitions • Entity – the subject to be modeled by the database file (table or relation) • Primary Key – the field value that uniquely identifies the entity entry (row, tuple, record); all other attributes are functionally dependent on it; can’t be null • Foreign Key – the field (attribute, column) that relates the table to a pre-existing table • Functional Dependence - determines or depends on, e.g. advisor name depends on advisor ID
Definitions, Continued • Views • Selected group of records (select) • Selected group of fields (project) • Selected group of records and fields from two or more tables (join) • A query • A report • A set of labels
Definitions, Continued • Determinant • Determines value of another attribute; e.g. primary key • Indexes • Tables that contain record numbers only arranged in an order based on some field value • Entity Integrity • Every table must have a field to uniquely identity each record and there must be a field value for every record • Referential Integrity • If a record has a value in a foreign key field, it must match an exiting value in the original table to which it is linked
1 Normal Form • Table does not contain repeating groups • To put it another way, each record has at least one field that differentiates it from every other record in the file; e.g. a unique primary key Examples: Faculty ID is primary key and the same faculty id is associated with two or more courses Solve by creating a course file Faculty ID is primary key and the same faculty id is associated with two or more offices Solve by redesigning database to include offices as a separate table
2 Normal Form • Table must be in 1 Normal Form • No non-key attribute is dependent on only part of the concatenated key • Concatenated key (two or more fields taken together represent primary key) • In course table, concatenated key is faculty ID and Catalog No – every field in table must be dependent on both faculty ID and catalog number
Anomalies Avoided By 2 NF • Only have one data item to change when update is made • Avoids “loose” data when deletes are made • When a part number is deleted, could lose reference to invoice • How do you add a new course when there is no associated faculty ID? • A new office with no assigned faculty? • Avoids inconsistent data
3 Normal Form • The only determinants are candidate keys • Candidate keys in student file are social security number and patron ID (both are unique) • To put it another way, there are no transitive dependencies • If student file contains Dept ID (foreign key) and department name, this is a transitive dependency
4 and 5 Normal Form • 4 Normal Form • There are no multivalued dependencies – is like Boyce Codd • 5 Normal Form • Holds only theoretical interest