380 likes | 558 Views
Creating the Dimensional Model. Data Warehouse Database Design Objectives. Easy to understand Optimum performance Adaptable. Data Warehouse Data Type. Description. Fact (detail, atomic, raw) base data. Business measures. Dimension data. Query drivers. Derived fact data.
E N D
Data Warehouse Database Design Objectives • Easy to understand • Optimum performance • Adaptable
Data Warehouse Data Type Description Fact (detail, atomic, raw) base data Business measures Dimension data Query drivers Derived fact data Calculated data Summary (aggregate) data Pre-calculated data Metadata Warehouse map Data Warehouse Data Types
Designing the Dimensional Model • Phase III: Defining the dimensional model
Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount…Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount… Order History (Dimension table) Product (Dimension table) Customer (Dimension table) Channel (Dimension table) (Facttable) Star Dimensional Modeling
Advantages of Using a Star Dimensional Model • Supports multidimensional analysis • Design improves performance • Optimizers yield better execution plans • Parallels end-user perceptions • Provides an extensible design • Broadens the choices for data access tools
Fact Tables • Contain numerical metrics of the business • Can hold large volumes of data • Can grow quickly • Can contain base, derived, and summarized data • Are typically additive • Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Factless Fact Tables • Event tracking • Coverage
More on Factless Fact Tables Employee dimension Emp_PK Grade dimension Grade_PK Emp_FK Sal_FK Age_FK Ed_FK Grade_FK Salary dimension Sal_PK Education dimension Ed_PK Age dimension Age_PK PK = Primary Key & FK = Foreign Key
Identify Base and Derived Measures • Identify candidate facts • Remove duplicate facts • Discover and document the underlying calculations • Cross reference base facts • Obtain final derived fact approval
Base and Derived Data Payrolltable Emp_FK Month_FK Salary Comm Comp 101 05 1,000 0 1,000 102 05 1,500 100 1,600 103 05 1,000 200 1,200 104 05 1,500 1,000 2,500 Basedata Deriveddata
Translating Business Measures into a Fact Table Business measures Facts Fact Business Measures Base Number of Items Number of Items Item Amount Base Amount Base Item Cost Cost Derived Profit Profit
Fact Table Measures Nonadditive: Cannot be added along any dimension Additive: Added across all dimensions Semiadditive: Added along some dimensions
Dimension Tables • Contain descriptors of the business • Contain relatively static data • Contain textual and discrete data • Are usually smaller than fact tables • Are joined to a fact table through a foreign key reference
Translating Business Dimensions into Dimension Tables DimensionTables Business Dimensions Product Store SourceSystemInformation Customer Day
Translating a Product Business Dimension into a Product Dimension Table Business Dimension Business DimensionProduct for Product Product_Id (Natural Key) Product Category Supplier Type Warehouse Promotion Catalog Product_name Product_desc Category Supplier_Id Product_status List_price Catalog_Id Product_type Product_code Promotion_Code Warehouse_location
Date Dimension • Should contain the attributes required by all fact tables. • Can be categorized into 4 groups • Date formats • Calendar date attributes : day, month name, year • Fiscal attributes : fiscal week, fiscal period • Indicator columns : Boolean values such as it is a national holiday
Slowly Changing Dimensions Business Dimension for Product Business DimensionProduct Product_Id (Natural Key) Product_PK (surrogate key) Product Category Supplier Type Warehouse Promotion Catalog Product_name Product_desc Category Supplier_Id Product_status List_price Catalog_Id Product_type Product_code Promotion_code Warehouse_location Valid_from_date Valid_to_date Where Product_key is a calculated number stored within the database
Types of Database Keys • Primary keys (PKs) • Foreign keys (FKs) • Composite keys • Surrogate keys
Using Surrogate Keys • Advantages of surrogate keys include: • Control over data • Reduced fact table size • Avoid using the following as data • warehouse keys: • OLTP Production (natural) keys • Smart keys (embedded meaning) Surrogate key Product key: 38972
1 1 Smith 200 030199 2 2 Jones 300 050599 3 3 Harvey 300 060599 22 1 Smith 400 061001 1 073258 Coffee Hot YUBN 032200 2 073258 Coffee Hot MAXH 110100 3 011172 Pop Cold SCHW 061001 4 011173 Tea Hot RRSE 061001 Surrogate Keys Example Emp_FK Salesperson_ID Salesperson_Name Manager_ID Emp_Change_Date Surrogate Keys Prod_FK Prod_ID Prod_Name Prod_Grouping Brand_Code Prod_Change_Date
Cost Discount Margin Sales_amt Units Sales_channel_FK(surrogate key) Sales_product_FK (surrogate_key) Sales_promotion_FK (surrogate key) Sales_time Adding a Surrogate Key • Channel, product, promotion, and time surrogate keys are added. • These can be used to build aggregate tables. Business Dimension for Sales
Bracketed Dimensions • Enhance performance and analytical capabilities • Create groups of values for attributes with many unique values, such as income ranges and age brackets • Minimize the need for full table scans by pre-aggregating data
Bracket dimension Bracket_PK Income fact Customer_PKBracket_FK Customer dimension Customer_PKBracket_FK Bracket_PK Income (10Ks) Marital Status Gender Age 1 60-90 Single Male <21 2 60-90 Single Male 21-35 3 60-90 Single Male 35-55 4 60-90 Single Male >55 5 60-90 Single Female <21 6 60-90 Single Female 21-35 Bracketing Dimensions
Models for Hierarchical Data • Analytical activities using hierarchies are supported through different models: • Business • Multiple • Multiple time • Hierarchical data is stored in dimension tables. • Dimensions can contain one or more hierarchies.
Identifying Analytical Hierarchies Business hierarchies describe organizational structure and logical parent-child relationships within the data. Storedimension Organizationhierarchy Region Store IDStore DescLocationSizeTypeDistrict IDDistrict DescRegion IDRegion Desc District Store
Multiple Hierarchies Storedimension Store IDStore DescLocationSizeTypeDistrict ID District DescRegion IDRegion DescCity IDCity DescCounty IDCounty DescState IDState Desc Organizationhierarchy Geography hierarchy Region State City County Store District Store
Multiple Time Hierarchies Fiscaltimehierarchy Calendartimehierarchy Fiscal year Calendar year Fiscal quarter Calendar quarter Fiscal month Calendar month Fiscal week Calendar week
Drilling Up and Drilling Down Market Hierarchy Group Region 1 Region 2 District 1 District 2 District 3 District 4 Store 1 Store 2 Store 3 Store 4 Store 5 Store 6
Drilling Across Market hierarchy Group Region Region District District City hierarchy Store Store City City Stores> 20,000 sq. ft.
Documenting the Granularity of Dimensions • Is an important design consideration • Determines the level of detail • Is determined by business needs Low-level grain (Transaction-level data) High-level grain (Summary data)
Fiscal Year Fiscal Quarter Fiscal Month Fiscal Week Day Defining Time Granularity Fiscal Time Hierarchy Current dimension grain Future dimension grain
A Star Dimensional Model • Denormalized Model Order History Product History_FKCustomer_FKProduct_FKChannel_FK Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt… Product_PK History_PK . . . . . . . . Customer Channel Customer_PK Channel_PK . . . . . . . .
Star Dimensional Model Characteristics • The model is easy for users to understand. • Primary keys represent a dimension. • Nonforeign key columns are values. • Facts are usually highly normalized. • Dimensions are completely denormalized. • Fast response to queries is provided. • Performance is improved by reducing table joins. • End users can express complex queries. • Support is provided by many front-end tools.
Channel_PK Web_PK Channel_desc Snowflake Model Order History Product History_FKCustomer_FKProduct_FKChannel_FK Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt… History_PK Product_PK . . . . . . . . Customer Channel Customer_PK . . . . Web Web_PK Web_url
Constellation Configuration Atomicfact
Updating the Meta Data • Dimensions and attributes detail (Primary key, attribute definition, and so on) • Facts and measures detail (Measure description, additivity, and so on) • Data source definitions (Business owner, platform, description, and so on) • Source to target data mappings (Data type, length, target column description, and so on)