CS 430 Database Theory

CS 430Database Theory Winter 2005 Lecture 7: Designing a Database Logical Level

Logical Design • Define the Relations and Attributes for a Relational Database • Key problems: • What tables will there be? • What columns will each table have? • Additional issue: • Where will Integrity Constraints be enforced • Intrinsic (Database design), Triggers and other DBMS mechanisms, Application

Outline • Informal Guidance • First Normal Form • Functional Dependencies and Second, Third and Boyce-Codd Normal Forms • Join Dependencies and Fourth and Fifth Normal Forms • Material from Chapters 10 and 11 in Text Book

“Informal” Guidance • Informal means: without a formal (i.e. mathematical) theory • Outline: • Keep to real world semantics • Avoid redundant information • Avoid update anomalies • Minimize use of null values • Avoid generation of spurious tuples

Real World Semantics • (To the extent possible) Your design should track the semantics of the real world • “Database is a record of executed policy” • Each relation should correspond to a single real world concept (e.g. Person) • Attributes should be attributes of that real world concept • Each attribute of each relation should correspond to an independent real world fact • You should be able to easily explain to the user the meaning of all the relations and attributes • You should choose semantically meaningful names

Avoid Redundant Informationand Update Anomalies • See Figures 10.3 and 10.4 for examples • Insertion Anomalies • Can’t insert data without having the redundant information available (or filling with nulls) • No way to insert a (false) subentity without inserting the parent (e.g. Department can’t exist if they have no Employees) • Deletion Anomalies • May delete a (false) subentity (e.g. delete a department by deleting the last employee)

Update Anomalies (Continued) • Modification (Update) Anomalies • Must change an attribute in all the duplicated locations or have multiple versions of same fact • Bottom line: • Design the database to avoid Update Anomalies • If not, document the discrepancies and make sure that these are handled correctly by applications that update the database • Data is sometimes duplicated to improve performance

Minimize use of null values • Problem: null may mean: • Value is not applicable • Value is not known • Value is missing • Value will be provided later • Another problem: Large numbers of nulls can waste space • Advice: • Try to avoid null values • Restrict null to exceptional cases • Use a separate table when nulls are common • This usually implies either a class/sub-class relationship or an entity with multiple states

Avoid Generation of Spurious Tuples • Natural Joins should represent True facts • See example Figures 10.5 and 10.6 • Equi-joins of non-key attributes (primary or foreign keys) are problematic • Non-key attributes represent unique facts associated with the entity designated by the key • Advice: • Design relations so that they are joined via primary or foreign keys

Normal Forms • Formal basis for design of database • First normal form • Follows Relational Model • Second, Third, and Boyce-Codd Normal Forms • Individual relations have the right keys • Fourth and Fifth Normal Forms • Multi-value and Join dependencies • Collection of Relations behave correctly

Practical Guidance on Normal Forms • Do a preliminary design of the database first • Use ER modeling and your understanding of the problem • Follow the informal guidance • Check that the design observes normalization rules • Iterate when there is a “bug” in the design • Document discrepancies you choose to accept • If performance becomes a problem, consider “denormalization” of the database • Most frequent denormalization: Storing joins • Beware premature optimization

First Normal Form • “Tables are square” • No repeated values • No variant records • Today: This is part of the relational model • When Codd defined it: This was counter to standard practice • Tomorrow: Object-Relational and XML databases allow repeated values • See Figure 10.9 for example

Fixing First Normal Form Violations Possible strategies: • (Preferred) Create a new table to store multi-valued attributes • Repeat the whole tuple for each value • Usually creates redundancy problems and update anomalies • Store the values as multiple attributes in a single tuple • Requires knowing the maximum number of values or creating overflow relations • Creates problems when querying the database • Code multiple values into a single string • Yuck!

CS 430 Database Theory