90 likes | 452 Views
Alternative Database topology: The star schema. D.W. O.L.A.P Data mining . Customer. Cust Purchases. Product Ref. Product Code ProdRef Eff. Date. Customer ID Activity Date Product Code. Customer ID Status Date. ProdRef End Date Product Name Unit Price Product Category
E N D
Alternative Database topology: The star schema D.W. O.L.A.P Data mining
Customer Cust Purchases Product Ref Product Code ProdRef Eff. Date Customer ID Activity Date Product Code Customer ID Status Date ProdRef End Date Product Name Unit Price Product Category Product Type Product Sub Type Cust Addr State Cust ZIP Code Customer Type Customer Status ... Product Name Sales Rep ID Qty Purchased Total Dollars Promotion Flag Cust Averages Outlet Reference Customer ID Cust Average Date Sales Rep Ref Store ID Sales Rep ID Store Name Store Location Distribution Channel Cust Avg. End Date Cust Avg. Rev. Cust Longevity Sales Person Name Store ID The Atomic Schema
Dimension Table 3 Dimension Table 2 Dimension Table 4 The Star Schema Dimension Table 1 Fact Table Dimension Key 1 Dimension Key 3 Dimension Key 1 Dimension Key 2 Dimension Key 3 Dimension Key 4 Description 1 Aggregatn Lvl 1.1 Aggregatn Lvl 1.2 Aggregatn Lvl 1.n Description 3 Aggregatn Lvl 3.1 Aggregatn Lvl 3.2 Aggregatn Lvl 3.n Fact 1 Fact 2 Fact 3 Fact 4 . . . Fact n Dimension Key 2 Dimension Key 4 Description 2 Aggregatn Lvl 2.1 Aggregatn Lvl 2.2 Aggregatn Lvl 2.n Description 4 Aggregatn Lvl 4.1 Aggregatn Lvl 4.2 Aggregatn Lvl 4.n
Dimension Table 1 Dimension Key 1 Description 1 Aggregatn Lvl 1.1 Aggregatn Lvl 1.2 Aggregatn Lvl 1.n Dimension Table • Describes the data that has been organized in the Fact Table • Key should either be the most detailed aggregation level necessary (e.g. country vs. county), if possible, or... • Surrogate keys may be necessary, but will decrease the natural value of the key • Manageable number of aggregation levels
Fact Table Dimension Key 1 Dimension Key 2 Dimension Key 3 Dimension Key 4 Fact 1 Fact 2 Fact 3 Fact 4 . . . Fact n Fact Table • Quantifies the data that has been described by the Dimension Tables • Key made up of unique combination of values of dimension keys • ALWAYS contains date or date dimension • Fact values should be additive • Aggregations of quantities or amounts from atomic level • No percentages or ratios • May be non-additive, time-variant data
Customer Location Selling Responsibility Cust ZIP Code Purchases 1 Sales Rep ID City State/Province Country Sales Rep ID Product Code Cust ZIP Code Customer Type Week Ending Date Sales Rep Name Store ID Store Name Store Location Sales Channel Customer Type Customer Type Days of Activity Unit Price Total Quantity Total Dollars Returned Qty Returned Dollars Promotion Qty Cust Type Desc Product Product Code Date Information Product Name Prod. Category Product Type Prod Sub Type Week Ending Date Month Quarter Year For Example:
Star Schema Query Select E.Month, B.Customer_Type, C.Product_Type, D.Store_Location, sum(A.Total_Quantity) From Purchases_1 A, Customer_Type B, Product C, Selling_Responsibility D, Date_Information E Where B.Customer_Type = A.Customer_Type and C.Product_Code = A.Product_Code and D.Sales_Rep_ID = A.Sales_Rep_ID and E.Week_Ending_Date = A.Week_Ending_Date and E.Year = “1996” and C.Product_Category = “V” Group by E.Month, B.Customer_Type, C.Product_Type, D.Store_Location;
Weekly Monthly D1 D3 D1 D3 Date Date D2 D4 D2 D4 Answer: Distinct Time Period Fact Tables • Create separate fact tables to account for different time periods • Date still part of each fact table key • Same dimension tables used by both fact tables • Improves overall performance (loading and accessing) for each time period • Will not increase amount of managed redundancy