150 likes | 305 Views
Normalization. By Albert Lin. Basics. Process of efficiently organizing data in a database. Goals Eliminate redundant data Ensure data dependency sensibility. Benefits. Faster sorting and index creation Larger clustered indexes Narrow and compact indexes Fewer indexes per table.
E N D
Normalization By Albert Lin
Basics • Process of efficiently organizing data in a database. • Goals • Eliminate redundant data • Ensure data dependency sensibility
Benefits • Faster sorting and index creation • Larger clustered indexes • Narrow and compact indexes • Fewer indexes per table. • Fewer null values • Less inconsistencies
Brief Description of Normal Forms • 1st Normal Form • No repeating groups • 2nd Normal Form • Each column depends entirely on the primary key. • 3rd Normal Form • Each column depends directly on the primary key. • 4th Normal Form • Isolate independent multiple relationships. • 5th Normal Form • Isolate Semantically related multiple relationships. • Boyce-Codd Normal Form • Non Trivial dependencies between candidate keys. • Optimal Normal Form • Limited to simple Facts • Domain-Key Normal Form • No modification anomolies
Additional Information • 1st Normal Form • Addresses the structure of an isolated table. • 2nd, 3rd and Boyce Codd Normal Form • Addresses 1-1 and 1-many relationships. • 4th and 5th Normal Form • Many-Many relationships. • These forms are cumulative.
Anomalies • Insertion Anomalies • Duplication of data enforced • impossible due to entity integrity • Deletion Anomalies • Leads to undesired loss of data. • Modification Anomalies • Modification of multiple rows can lead to inconsistencies. • Databases that are sufficiently normalized can reduce anomalies from occurring.
Tips • Make a table for each list. • Use non-meaningful primary keys • Eliminate Repeating Groups • Eliminate Columns not dependent on Primary Key • Each table should have an identifier • Should only store data for a single type of entity. • Avoid null columns
Additional Tips • Records are free, new fields are expensive. • Know when data requires duplication • Use referential integrity • The higher normal form generally results in faster data retrieval.
Questions to ask • What data do you need? • What are you going to do with the data? • How are the data related? • What is the future of the data?
Problems with the database • In a large scale database, information would be repeated numerous times, resulting in redundant data • Retrieval of data would be difficult and long • Index creation would be difficult.
In Practice • Many databases are “de-normalized” to some degree. • This is due to performance issues. • It may require fewer joins and result in faster retrievals. • However, before doing “de-normalization” performance issues must exist and de-normalization must dramatically improve it before introducing a suboptimal design • A de-normalized table can be harder to update.
Overall topics • Normalization helps organization and speed of organizing a database, which can help a company produce a database system that is quick and easily accessible. • The higher the normal form, the less chance anomalies will arise. • In case of performance issues, de-normalization can be done in order to increase performance.
Final Motto: Strive for Single Themed Tables.
References • Litt, Steve. “Normalization”. 1996 http://www.troubleshooters.com/littstip/ltnorm.html • “Rules of Data Normalization”. 2005. http://www.datamodel.org/NormalizationRules.html • “Normalization” http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_8_des_02_2oby.asp • Chung,Luke. “Database Normalization Tips”. FMS. 2001. http://www.fmsinc.com/tpapers/genaccess/databasenorm.html • Janert, Phillip. “Practical Database”. IEEE. 2004. http://www.devx.com/ibm/Article/20859 • Reus, Bernhard. “Databases”. University of Sussex. 2004. http://www.informatics.sussex.ac.uk/users/bernhard/db2005/Slides/dbXIII.pdf • Wyllys, R.E. “Steps in Normalization” University of Texas at Austin. 2003. http://www.gslis.utexas.edu/~wyllys/DMPAMaterials/normstep.html#Section%206.%20Anomalies%20and%20Normalization