Introduction to Database Design Methodology

Introduction to Database Design Methodology Chapter 6 Data Redundancy and Anomalies

Learning Goals • To explain the problems associated with poorly-designed database • To identify good and poor table structures

Table of Contents • Problems with Poorly-designed Database • Data Redundancy and Data Inconsistency • Anomalies in Modifying Data • Example of Poorly-designed Tables

Problems with Poorly-designed Database

Problems with Poorly-designed Database • In relational database, a poorly-designed database is the one with poorly-designed tables. • Poorly-designed tables may lead to: • unnecessary data redundancy, which will • increase the chance of data inconsistency • require more storage space for storing the redundant data • different anomalies in modifying data in the database

Data Redundancy and Data Inconsistency Remark: • Data redundancy occurs when the same data are stored more than once in one or more tables in a relational database. • Data inconsistency means that the various copies of the data in a database no longer agree.

Anomalies in Updating Data • Anomalies refer to the inconsistencies or errors that may arise when a user attempts to modify data in a database containing redundant data. • There are three main types of anomalies in modifying data in a database: • Update AnomalyIn updating a data item in a database, data inconsistency will arise if not all the corresponding copies of redundant data are updated. • Insertion AnomalyIt may not be able to insert certain data in a database unless some data of other entities are stored. • Deletion AnomalyDeletion of some data may result in unintended loss of data in other entities.

Example of Poorly-designed Tables Example 1 • Suppose you are a pop song fan and would like to develop a database to store the information of pop songs. The original design of the database consists of a table with the following structure: Song_Artist Assume that each song is associated with one artist only.

Example of Poorly-designed Tables Example 1 (con’t) Questions: • What is the primary key of the above table structure? • Are there any redundant data in the above table? If yes, give an example. • Give some examples of anomalies in updating data in the above table. Song_Artist

Example of Poorly-designed Tables Example 1 (con’t) Answers: • What is the primary key of the given table structure? Clearly, song_id is the primary key field as its value is unique for each record. • Are there any redundant data in the given table? If yes, give an example. Some data in the artist and artist_web fields are redundant. For example, records with song_id “0001” and “0004” have the same artist and artist_web values. • Give some examples of anomalies in modifying data in the above table. (See next slide)

Example of Poorly-designed Tables Example 1 (con’t) Example of Update Anomaly: Suppose the artist_web of Alan Chan should be amended to www.alanyyy.org. Carelessly, you only updated the artist_web of the first record (song_id = “0001”). The other records of Alan Chan (e.g. the 4th record) have not been updated. The term for describing this situation is called data ___________ . Song_Artist

Example of Poorly-designed Tables Example 1 (con’t) Example of Insertion Anomaly: Suppose we want to insert the information of a new artist, King Kong (art_id = “099”), into the above table. King Kong has not yet published any song. The information of King Kong cannot be stored unless we also enter his song information. (Why?) Song_Artist

Example of Poorly-designed Tables Example 1 (con’t) Example of Deletion Anomaly: Peter Pan had only published one song (“Freedom”) up to now. Suppose you want to delete the record of the song “Freedom” (song_id=“0002”) from the table. What would be the problem of deleting the record (song_id=“0002”) from the above table? Song_Artist

Activity 1 Task 1

Task 1 Complete Task 1 in Activity 1 of Chapter 6.

Example of Poorly-designed Tables Example 1 (con’t) Suppose the original table Song_Artist is decomposed into two tables Song and Artist: Song_Artist Song Artist (relationship)

Example of Poorly-designed Tables Example 1 (con’t) Questions • Are there any redundant data in the above tables? • Will similar anomalies occur when modifying the records with this design? Answers • Except the data in the foreign key (art_id) of the table Song, there is no redundant data in the new tables. • All the anomalies in modifying data in the database have been removed.

Activity 1 Task 2

Task 2 Complete Task 2 in Activity 1 of Chapter 6.

Conclusions from Example 1 Conclusions from the findings in Example 1: • Poorly-designed table structures may lead to: • Unnecessary data redundancy • Anomalies in: • Updating • Insertion • Deletion • Poor table structures can be “repaired” by decomposing the tables into smaller ones. • In the above Example, it is quite nature to decompose the original table Song_Artist into the tables Song and Artist. • If we construct the ER diagram for the problem, the two basic entities that would be identified are probably Song and Artist.

Techniques for Constructing Well-Designed Tables • In Chapters 4 and 5, you learnt how to construct anER diagram and to transform the diagram into table structures for designing a relational database. • Although this approach can help us identify the entities, attributes and relationships in a database, the identification process is mainly based on the collected users’ requirements and some common senses. • It is still possible that the resulting tables are not well-designed.

Techniques for Constructing Well-Designed Tables • Badly-structured tables may arise from the errors in the original ER diagram or in the process of translating the ER diagram into tables. • Therefore, we need a more standardized technique for: • Checking whether the tables are well-structured; • “Repairing” poorly-designed table structures (mainly by decomposing the original tables); • Assisting the construction of tables from the collected users’ requirements. • The technique we will study in next Chapter is called “normalisation”, which was introduced by Dr. E.F. Codd in 1972. (http://en.wikipedia.org/wiki/Database_normalization )

References • http://en.wikipedia.org/wiki/Database_normalization • http://adbc.kennesaw.edu/ • http://mis.chna.edu.tw/teacher_web/hhhsu/資料庫系統_9.ppt • http://web.ydu.edu.tw/~taniah/f7970-03.ppt • Peter Rob, Carlos Coronel, Database Systems – Design, Implementation, & Management, 5th Ed., Thomson Learning, Chapter 4. • Thomas Connolly, Carolyn Begg, Database Solutions, end Ed., Pearson Addison Wesley, Chapter 8.

Introduction to Database Design Methodology