130 likes | 284 Views
INFS 6510 – Competitive Intelligence Systems. Normalization vs. Denormalization. Relational Databases (RDBMS). Collection of linked tables Tables linked by Primary Key / Foreign Key relationships (Referential Integrity) Primary Key – column whose values make each record unique
E N D
INFS 6510 – Competitive Intelligence Systems Normalization vs. Denormalization
Relational Databases (RDBMS) • Collection of linked tables • Tables linked by Primary Key / Foreign Key relationships (Referential Integrity) • Primary Key – column whose values make each record unique • Foreign Key – value in column that links to Primary Key in another table • SQL – Structured Query Language (language to access data in relational tables)
Relational DB Example Cust #Cust Name 100 Bob 101 Sue 102 Juan Order #Prod#QtyCust# 1 QR22 1 100 2 QR22 25 100 3 SB56 3 102 CUSTOMER TABLE ORDER TABLE Foreign Key Primary Key
Database Structure & Design • 2 Approaches: Conflict 1. Optimize for Data Capture i.e., CapturingTransactions 2. Optimize for Data Access i.e., Queries & Reporting
Approach #1: Optimize for Data Capture • To optimize for data capture, you must: • Eliminate redundancy of data (or else wasted space & processing occurs) • Ensure data integrity (or else data anomalies) • Ensure that changes in data (modifications, deletions, etc. only have to happen in one place) • Normalization – process in which a DBMS is optimized for data capture • All data “redundancy” is removed from Database • Has multiple forms (0, 1st, 2nd, 3rd, et al.)
Moving from 0NF to 1NFRule: Make a separate table for each set of related attributes, and give each table a primary key of unique values. Cust # CustName 100, 100, 101 Bob, Sue, Juan CUSTOMER TABLE ONF Cust #Cust Name 100 Bob 101 Sue 102 Juan CUSTOMER TABLE 1NF Primary Key Created with Unique values
Dependency on Primary Key 100 Bob 100 Bob Moving from 1NF to 2NFRule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign) Cust #Cust NameOrder# 100 Bob 1 100 Bob 2 101 Sue 3 TABLE X 1NF Cust #Cust Name 100 Bob 101 Sue Order #Cust# 1 100 2 100 3 101 CUSTOMER TABLE ORDER TABLE 2NF
Dependency b/t 2 non-key columns PGH 2 days PGH 2 days Moving from 2NF to 3NFRule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column) Cust #CityOrder#ShipTime 100 PGH 1 2 days 101 PGH 2 2 days 102 LA 3 5 days TABLE X 2NF City #CityShipTime 10 PGH 2 days 20 LA 5 days Cust #City# 100 10 101 10 102 20 SHIP TIME TABLE CUSTOMER TABLE 3NF
Approach #2: Optimize for Data Access(in a separate, read-only Data Warehouse) • To optimize for data access, you must: • Allow data redundancy • Reduce the number of table joins (links among tables) • Denormalizing – Adding redundancy & reducing joins in a DBMS
Denormalizing – Most Common Approach • Star Schema (Clustering) • Fact (core or transaction) Tables in middle of star • Dimensional (structural or “lookup”) Tables around “points” of star Cust #CustName 100 Bob 101 Sue 102 Juan Prod #ProdName QR22 Rake SR56 Spade TW43 Mulch CUSTOMER TABLE PRODUCT TABLE Order #DateCust#Prod#Rep# 1 06/15/XX 100 QR22 1000 2 07/19/XX 100 QR22 1000 3 08/30/XX 101 SR56 2000 ORDER TABLE Rep #RepName 1000 Lee 2000 James 3000 Natasha Date Quarter 06/29/XX 2 Bob 06/30/XX 2 Sue 07/01/XX 3 Juan REP TABLE DATE/TIME
Denormalizing (continued)• Stars are linked via common (i.e., Conformed) Dimensions to form Data Warehouse Cust #CustName 100 Bob 101 Sue 102 Juan CUSTOMER TABLE Order #DateCust#Prod#Rep# 1 06/15/XX 100 QR22 1000 2 07/19/XX 100 QR22 1000 3 08/30/XX 101 SR56 2000 ORDER TABLE ORDER TABLE Prod #ProdName QR22 Rake SR56 Spade TW43 Mulch PRODUCT TABLE Date Quarter 06/29/XX 2 Bob 06/30/XX 2 Sue 07/01/XX 3 Juan CUSTOMER TABLE TIME DATE/TIME Rep #RepName 1000 Lee 2000 James 3000 Natasha REP TABLE Prod#ProdName Stock DateUnits QR22 Rake 03/23/XX 150 TW43 Mulch 04/15/XX 1452 SR56 Spade 05/01/XX 997 INVENTORY TABLE
Mapping Normalized Tables to Denormalized (Data Warehouse) TablesUsing ETL Tools