100 likes | 267 Views
Denormalization. - Causes redundancy, but fast performance & no referential integrity - Denormalize when • specific queries occur frequently, • a strict performance is required and • it is not heavily updated
E N D
Denormalization - Causes redundancy, but fast performance & no referential integrity - Denormalize when • specific queries occur frequently, • a strict performance is required and • it is not heavily updated -So, denormalize only when there is a very clear advantage to doing so and document carefully the reason for doing so
typical denormalization techniques • Flatten a repeating group in one table Instead of EMP (E#, Ename) SKILL (E#, Skill) Use EMP (E#, Skill, Ename) when Emp has a smaller # of attributes. - This means use Method 2 of 1NF algorithm. But know the danger of this method as we discussed in MVD.
Cont’ (2) Embed stable Code-Interpretation (Reference) Table. Instead of FLIGHT (F#, Departs, From_Code, To_Code) CODE (Code, Airport_Name) Use FLIGHT (F#, Departs, From_AP, From_Code, To_AP, To_Code)
Cont’ Combine1:1 or 1:N (a) when N is small and (b) the record on the "one" side is small (thus the amount of redundancy will be small) Instead of SALE (S#, SPName, SaleDate), SALE_ITEMS (S#, Line#, Code, Qty) Use SALE(S#, Line#, SPName, SaleDate, Code, Qty) -- "How many T179's did we sell yeaterday?" can be answered without join. • Another example: Order_Item(O#, I#, C#, Cname, I_Desc, Qty, I_Price)
Cont’ (4) When the other entity in is not interesting by itself Order(O#, ODate, OShipTerms, PmtTerms, Cname, CAddr) (5) Replicate non-frequently updated attributes to avoid JOIN WORK_ON (ESSN, P_NUM, PName, Hours)
Problems of denormalization • Makes row longer • Makes data transfer longer • Needs more memory for memory processing • Cause redundancy and expensive update
Adding redundant data - Add summary attributes or derived attributes - Redundant relationships can improve performance with the cost of update overhead
Schema translation • Reduce #of relations for JOIN by using mapped translation • Handling null values • Combine 1:1 relationships • Relax participation constraints • Divide the big table into two, if A & B are distinct in R(A, B) • Ignore FDs based on co-occurring attributes, which are not updated ZIP --> CITY
Primary key - Most frequently used attributes - Prefer small sized attributes (used in indexes, Ref. integrity)
Index - Create a set of appropriate indexes optimzing queries (This will be discussed more in physical DB chapters.)