1 / 27

Modeling the Data Warehouse

Modeling the Data Warehouse. Chapter 7. Data Warehouse Database Design Phases. Defining the business model (conceptual model) Creating the dimensional model (logical model) Modeling summaries Creating the physical model. Select a business process. 2,3. Physical model.

marietta
Download Presentation

Modeling the Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling the Data Warehouse Chapter 7

  2. Data Warehouse Database Design Phases • Defining the business model (conceptual model) • Creating the dimensional model (logical model) • Modeling summaries • Creating the physical model Select a business process 2,3 Physical model

  3. Performing Strategic Analysis Phase 1: Defining the Business Model Select a business process • Performing strategic • analysis • Creating the business • (conceptual) model

  4. Creating the Business Model Phase 1: Defining the Business Model Performing strategic analysis • Creating the business (conceptual)model - Defining business requirements - Identifying the business measures - Identifying the dimensions - Identifying the grain - Identifying the business definitions and rules - Verifying data sources

  5. Creating the Business Model Phase 1: Defining the Business Model Performing strategic analysis • Creating the business (conceptual) model - Defining business requirements - Identifying the business measures - Identifying the dimensions - Identifying the grain - Identifying the business definitions and rules - Verifying data sources

  6. Business Requirements Drive the Design Process Primary input Business requirements Other inputs Existing metadata Production ERD model Research Nonrelational legacy systems

  7. Identifying Measures and Dimensions Measures Dimensions • The attribute varies • continuously: • Balance • United Sold • Cost • Sales • The attribute is perceived as • a constant or discrete value: • Description • Location • Color • Size

  8. Determining Granularity YEAR? QUARTER? MONTH? WEEK? DAY?

  9. Identifying Business Rules Location Geographic proximity 0 - 1 miles 1 - 5 miles > 5 miles Product Type Monitor Status PC 15 inch New Server 17 inch Rebuilt 19 inch Custom None Time Month>Quarter>Year Store Store>District>Region

  10. Creating the Dimensional Model • Identify fact tables - Translate business measures into fact tables - Analyze source system information for additional measures - Identify base and derived measures - Document additivity of measures • Identify dimension tables • Link fact tables to the dimension tables • Create views for users

  11. Dimension Tables Dimension tables have the following characteristics: • Contain textual information that represents the attributes of the business • Contain relatively static data • Are joined to a fact table through a foreign key reference Product Facts (units, price) Channel Time Customer

  12. Fact Tables Fact tables have the following characteristics: • Contain numeric measures (metric) of the business • May contain summarized (aggregated) data • May contain date-stamped data • Are typically additive • Have key value that is typically a concatenated key composed of the primary keys of the dimensions • Joined to dimension tables through foreign keys that reference primary keys in the dimension tables

  13. Fact table Product Facts (units, price) Channel Customer Time Dimension tables

  14. Star Schema Model Product Table Product_id Product_desc Store Table Store_id District_id • Central fact table • Radiating dimensions • Denormalized model Sales Fact Table Product_id Store_ id Item_id Day_id Sales_dollars Sales_units Time Table Day_id Month_id Year_id Item Table Item_id Item_desc

  15. Star Schema Model • Easy for users to understand • Fast response to queries • Simple metadata • Supported by many front end tools • Less robust to change • Slower to build • Does not support history

  16. Snowflake Schema Model Product Table Product_id Product_desc Store Table Store_id District_id District Table District_id District_desc Sales Fact Table Product_id Store_ id Item_id Day_id Sales_dollars Sales_units Dept Table Dept_id Dept_desc Mgr_id Mgr Table Dept_id Mgr_id Mgr_name Time Table Day_id Month_id Year_id Item Table Item_id Item_desc

  17. Snowflake Schema Model • Direct use by some tools • More flexible to change • Provides for speedier data loading • May become large and unmanageable • Degrades query performance • More complex metadata Country State County City

  18. Using Summary Data Phase 3: Modeling summaries • Provides fast access to precomputed data • Reduces use of I/O, CPU, and memory • Is distilled from source systems and precalculated summaries • Usually exists in summary fact tables

  19. Designing Summary Tables • Average • Maximum • Total • Percentage Units Sales($) Store Product A Total Product B Total Product C Total

  20. Summary Tables Example SALES BY MONTH/REGION Month Region Tot_Sales$ Jan 99 North 41,000 Jan 99 East 10,000 Feb 99 South 40,000 Mar 99 West 17,000 SALES FACTS Sales$ Region Month 10,000 North Jan 99 12,000 North Feb 99 11,000 South Jan 99 15,000 West Mar 99 18,000 South Feb 99 20,000 North Jan 99 10,000 East Jan 99 2,000 West Mar 99 SALES BY_MONTH Month Tot_Sales Jan 99 51,000 Feb 99 40,000 Mar 99 17,000

  21. Summary Management in Oracle8i Sales summary Region State City Product Time Summary advisor Summary usage Space requirements Summary recommendations

  22. Using Time in the Data Warehouse

  23. The Time Dimension • Time is critical to the data warehouse • A consistent representation of time is required for extensibility Sales fact Time dimension Where should the element of time be stored?

  24. Creating the Physical Model Phase 4: Creating the Physical Model • Translate the dimensional design to a physical model for implementation • Define storage strategy for tables and indexes • Perform database sizing • Define initial indexing strategy • Define partitioning strategy • Update metadata document with physical information

  25. Physical Model Design Tasks • Define naming and database standards • Perform database sizing • Design tablespaces • Develop initial indexing strategy • Develop data partition strategy • Define storage parameters • Set initialization parameters • Use parallel processing

  26. Using Data Modeling Tools • Tools with a GUI enable definition, modeling, and reporting • Avoid a mix of modeling techniques caused by: - Development pressure - Developers with lack of knowledge - No strategy • Determine a strategy • Write and publish formally • Make available electronically Spreadsheets CASE tools Paper and pencil

  27. Summary This lesson discussed the following topics: • Creating a business model • Creating a dimensional model • Modeling the summaries • Creating a physical model Select among business processes 2,3 Business model Dimensional model Physical model

More Related