Denormalization

Denormalization
- Introduces redundancy in exchange for faster reads; referential integrity between the merged tables is no longer enforced.
- Denormalize when
• specific queries occur frequently,
• strict performance requirements must be met, and
• the data is not heavily updated.
- So, denormalize only when there is a very clear advantage to doing so, and carefully document the reason for doing so.

Typical denormalization techniques
(1) Flatten a repeating group into one table.
Instead of
EMP (E#, Ename)
SKILL (E#, Skill)
use
EMP (E#, Skill, Ename)
when EMP has a small number of attributes.
- This means using Method 2 of the 1NF algorithm. Be aware of the danger of this method, as discussed under MVD (multivalued dependencies).
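The trade-off in technique (1) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the lowercase table and column names (emp_flat, e_no, and so on) and the sample rows are assumptions made for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: EMP (E#, Ename) and SKILL (E#, Skill)
    CREATE TABLE emp   (e_no INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE skill (e_no INTEGER REFERENCES emp(e_no), skill TEXT,
                        PRIMARY KEY (e_no, skill));
    INSERT INTO emp   VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO skill VALUES (1, 'SQL'), (1, 'Python'), (2, 'SQL');

    -- Flattened form: EMP (E#, Skill, Ename); the key becomes (e_no, skill)
    CREATE TABLE emp_flat (e_no INTEGER, skill TEXT, ename TEXT,
                           PRIMARY KEY (e_no, skill));
    INSERT INTO emp_flat
        SELECT e.e_no, s.skill, e.ename
        FROM emp e JOIN skill s ON s.e_no = e.e_no;
""")

# Normalized schema: finding who knows SQL needs a join.
joined = conn.execute(
    "SELECT e.ename FROM emp e JOIN skill s ON s.e_no = e.e_no "
    "WHERE s.skill = 'SQL' ORDER BY e.ename"
).fetchall()

# Flattened schema: the same question is a single-table lookup.
flat = conn.execute(
    "SELECT ename FROM emp_flat WHERE skill = 'SQL' ORDER BY ename"
).fetchall()

# The price: Ename is stored once per skill of the employee.
ann_copies = conn.execute(
    "SELECT COUNT(*) FROM emp_flat WHERE ename = 'Ann'"
).fetchone()[0]
```

Both queries return the same employees, but in the flattened table Ann's name is repeated for each of her skills, which is why the technique is suggested only when EMP has few attributes.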

Cont’d
(2) Embed a stable code-interpretation (reference) table.
Instead of
FLIGHT (F#, Departs, From_Code, To_Code)
CODE (Code, Airport_Name)
use
FLIGHT (F#, Departs, From_AP, From_Code, To_AP, To_Code)
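As a rough sketch of technique (2), the snippet below builds both versions of FLIGHT in an in-memory SQLite database. The airport codes and names ('AAA', 'Alpha Field', and so on) are made-up sample data, and the lowercase identifiers are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized form: FLIGHT references the CODE lookup table twice
    CREATE TABLE code   (code TEXT PRIMARY KEY, airport_name TEXT);
    CREATE TABLE flight (f_no TEXT PRIMARY KEY, departs TEXT,
                         from_code TEXT, to_code TEXT);
    INSERT INTO code   VALUES ('AAA', 'Alpha Field'), ('BBB', 'Bravo Field');
    INSERT INTO flight VALUES ('F100', '08:00', 'AAA', 'BBB');

    -- Denormalized form: airport names embedded as From_AP and To_AP
    CREATE TABLE flight_denorm (f_no TEXT PRIMARY KEY, departs TEXT,
                                from_ap TEXT, from_code TEXT,
                                to_ap TEXT, to_code TEXT);
    INSERT INTO flight_denorm
        SELECT f.f_no, f.departs, a.airport_name, f.from_code,
               b.airport_name, f.to_code
        FROM flight f
        JOIN code a ON a.code = f.from_code
        JOIN code b ON b.code = f.to_code;
""")

# Normalized schema: two joins against CODE to show both airport names.
with_joins = conn.execute(
    "SELECT f.f_no, a.airport_name, b.airport_name "
    "FROM flight f "
    "JOIN code a ON a.code = f.from_code "
    "JOIN code b ON b.code = f.to_code"
).fetchall()

# Denormalized schema: no join at all.
without_joins = conn.execute(
    "SELECT f_no, from_ap, to_ap FROM flight_denorm"
).fetchall()
```

The embedding is reasonable only because the reference table is stable: if an airport were renamed, every flight row carrying the old name would have to be updated.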

Cont’d
(3) Combine 1:1 or 1:N relationships when (a) N is small and (b) the record on the "one" side is small (so the amount of redundancy will be small).
Instead of
SALE (S#, SPName, SaleDate)
SALE_ITEMS (S#, Line#, Code, Qty)
use
SALE (S#, Line#, SPName, SaleDate, Code, Qty)
- "How many T179's did we sell yesterday?" can be answered without a join.
• Another example: Order_Item (O#, I#, C#, Cname, I_Desc, Qty, I_Price)
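The "T179" question can be sketched against the combined SALE table as follows. The salesperson names, the dates, and the second product code 'B200' are invented sample data, and a fixed date stands in for "yesterday" so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined form: SALE (S#, Line#, SPName, SaleDate, Code, Qty)
    CREATE TABLE sale (s_no INTEGER, line_no INTEGER, sp_name TEXT,
                       sale_date TEXT, code TEXT, qty INTEGER,
                       PRIMARY KEY (s_no, line_no));
    INSERT INTO sale VALUES
        (1, 1, 'Lee', '2024-03-01', 'T179', 2),
        (1, 2, 'Lee', '2024-03-01', 'B200', 1),
        (2, 1, 'Kim', '2024-03-01', 'T179', 3),
        (3, 1, 'Kim', '2024-03-02', 'T179', 5);
""")

def units_sold(code, sale_date):
    """Total quantity of one product code sold on one date, with no join."""
    row = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM sale "
        "WHERE code = ? AND sale_date = ?",
        (code, sale_date),
    ).fetchone()
    return row[0]

t179_sold = units_sold("T179", "2024-03-01")
```

With the normalized pair SALE and SALE_ITEMS, the same question would need a join on S# to reach SaleDate; here SPName and SaleDate are simply repeated on every line of a sale.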

Cont’d
(4) Merge the other entity when it is not interesting by itself.
Order (O#, ODate, OShipTerms, PmtTerms, Cname, CAddr)
(5) Replicate infrequently updated attributes to avoid a JOIN.
WORK_ON (ESSN, P_NUM, PName, Hours)

Problems of denormalization
• Makes rows longer
• Makes data transfer take longer
• Needs more memory for in-memory processing
• Causes redundancy and expensive updates
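The "expensive update" problem can be made concrete with the WORK_ON example. The sketch below counts how many rows a project rename touches in each design; the PROJECT table, the sample SSNs, and the project names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized home of the project name (assumed for the comparison)
    CREATE TABLE project (p_num INTEGER PRIMARY KEY, pname TEXT);
    -- Denormalized WORK_ON (ESSN, P_NUM, PName, Hours) with PName replicated
    CREATE TABLE work_on (essn TEXT, p_num INTEGER, pname TEXT, hours REAL,
                          PRIMARY KEY (essn, p_num));
    INSERT INTO project VALUES (1, 'ProductX');
    INSERT INTO work_on VALUES
        ('111', 1, 'ProductX', 10),
        ('222', 1, 'ProductX', 20),
        ('333', 1, 'ProductX', 5);
""")

# Renaming the project where the name is stored once touches one row.
normalized_rows = conn.execute(
    "UPDATE project SET pname = 'ProductY' WHERE p_num = 1"
).rowcount

# The replicated copy must be changed in every WORK_ON row of that project.
denormalized_rows = conn.execute(
    "UPDATE work_on SET pname = 'ProductY' WHERE p_num = 1"
).rowcount
```

The denormalized update grows with the number of employees on the project, and any row that is missed leaves the copies inconsistent.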

Adding redundant data
- Add summary attributes or derived attributes
- Redundant relationships can improve performance at the cost of update overhead
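A summary attribute and its update overhead might look like the following sketch. The order_total column and the add_item helper are hypothetical names introduced for this example; the idea is only that every write to the detail rows must also maintain the stored summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- order_total is a redundant summary attribute (hypothetical column)
    CREATE TABLE orders (o_no INTEGER PRIMARY KEY,
                         order_total REAL NOT NULL DEFAULT 0);
    CREATE TABLE order_item (o_no INTEGER, i_no INTEGER, qty INTEGER,
                             i_price REAL, PRIMARY KEY (o_no, i_no));
    INSERT INTO orders (o_no) VALUES (1);
""")

def add_item(o_no, i_no, qty, i_price):
    """Insert an item and maintain the stored total in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO order_item VALUES (?, ?, ?, ?)",
            (o_no, i_no, qty, i_price),
        )
        conn.execute(
            "UPDATE orders SET order_total = order_total + ? WHERE o_no = ?",
            (qty * i_price, o_no),
        )

add_item(1, 10, 2, 5.0)
add_item(1, 11, 1, 2.5)

# Reading the summary needs no aggregation over order_item.
stored = conn.execute(
    "SELECT order_total FROM orders WHERE o_no = 1"
).fetchone()[0]

# The derived value it must always agree with.
recomputed = conn.execute(
    "SELECT SUM(qty * i_price) FROM order_item WHERE o_no = 1"
).fetchone()[0]
```

Reads become a single-row lookup, while every insert now costs two statements, which is the update overhead the slide refers to.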

Schema translation
• Reduce the number of relations to JOIN by using mapped translation
• Handle null values
• Combine 1:1 relationships
• Relax participation constraints
• Divide a big table into two, if A & B are distinct in R(A, B)
• Ignore FDs among co-occurring attributes that are not updated, e.g. ZIP --> CITY
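When an FD such as ZIP --> CITY is deliberately left undecomposed, the DBMS will not enforce it, so an occasional audit query can be useful. The following is one possible check, assuming a hypothetical customer table that keeps zip and city together; the rows are invented, with one inconsistent spelling planted on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- zip and city kept in the same table although ZIP --> CITY holds
    CREATE TABLE customer (c_no INTEGER PRIMARY KEY, cname TEXT,
                           zip TEXT, city TEXT);
    INSERT INTO customer VALUES
        (1, 'Ann', '10001', 'Springfield'),
        (2, 'Bob', '10001', 'Springfield'),
        (3, 'Cy',  '20002', 'Riverton'),
        (4, 'Di',  '20002', 'Rivertown');  -- inconsistent copy
""")

# ZIP values that map to more than one CITY violate the ignored FD.
violations = conn.execute(
    "SELECT zip FROM customer "
    "GROUP BY zip HAVING COUNT(DISTINCT city) > 1 "
    "ORDER BY zip"
).fetchall()
```

An empty result means the stored data still satisfies the dependency; any ZIP listed has conflicting city values that need to be reconciled.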

Primary key
- Choose the most frequently used attributes
- Prefer small-sized attributes (they are used in indexes and referential integrity checks)

Index
- Create a set of appropriate indexes to optimize queries (this will be discussed further in the physical DB chapters).
