1 / 31

MIS 451 Building Business Intelligence Systems

MIS 451 Building Business Intelligence Systems. Logical Design (1). Project Planning. Requirements Analysis. Logical Design. Physical Design. Data Staging. Data Analysis (OLAP). Introduction to Dimensional Modeling.

dino
Download Presentation

MIS 451 Building Business Intelligence Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MIS 451Building Business Intelligence Systems Logical Design (1)

  2. Project Planning Requirements Analysis Logical Design Physical Design Data Staging Data Analysis (OLAP)

  3. Introduction to Dimensional Modeling • Dimensional Modeling is a DW logical design technique that seeks to present data in a standard framework that is intuitive for data access and allows for high performance data access. • Intuitive: easy to write SQL • High performance: high performance SQL

  4. Dimensional Model (Star Schema) ER Model For detailed information, please refer handout 1.

  5. Introduction to Dimensional Modeling • Analytical Report: 2-dimension January sales report by customer state and product category • Query: list sales in Jan. by customer state and product category?

  6. Introduction to Dimensional Modeling Query based on Dimensional Model: Select State, PCName, SUM(Sales) From Sales S, Customer C, Product P, Time T Where S.Time_ Key = T.Time_Key and S.Product_ Key = P.Product_Key and S.Customer_Key = C.Customer_Key and T.Month= ’JAN’ Group by State, PCName Join: 4 tables Query based on ER Model: Select State, PCName, SUM(Price*Quantity) From OrderLine OL, Customer C, Product_Category PC, Product P, Order O Where OL.OID = O.OID and OL.PID = P.PID and O.CID = C.CID and to_char(O.OrderDate,’MON’) = ’JAN’ and P.PCID = PC.PCID Group by State, PCName Join: 5 tables

  7. Fact and Dimension Fact table Dimension table

  8. Fact and Dimension • There are two types of tables in dimensional modeling: • Fact table: attributes in fact tables are measurements for analysis or contents in reports. • Dimension table: attributes in dimension tables are constraints for the measurements or headers in reports. Dimensions Facts

  9. Facts and Dimensions

  10. Facts and Dimensions • How to identify facts and dimensions? • Requirements Analysis: • Analytical requirements: Marketing managers want to know sales performance for different product category in different states? • Information requirements: quantity of product sold, sales amount, product category, and customer states • ER Model

  11. F1: Calculation F: refers to special considerations for fact table or special type of fact table

  12. F1: Calculation • Normalization in RDB • 1NF • 2NF • 3NF • Non-volatile property of data warehouse enables DW design to resist normalization and improve query performance.

  13. D1: Slowly changing dimension D: refers to special considerations for dimension table or special type of dimension table

  14. CID CName State City 101 Jon Arizona Tucson 102 Tom Arizona Tucson 103 Mark Arizona Phoenix D1: Slowly changing dimension • Values of attributes in dimension tables may evolve over time. For example, customers moved from one city to another city. Phoenix Tom moved from Tucson to Phoenix

  15. CID CID CName CName State State City City 101 101 Jon Jon Arizona Arizona Tucson Tucson 102 102 Tom Tom Arizona Arizona Tucson Phoenix 103 103 Mark Mark Arizona Arizona Phoenix Phoenix D1: Slowly changing dimension • There are three ways to handle slowly changing dimension. • Method 1: Overwrite old values with new values

  16. D1: Slowly changing dimension Drawbacks of method 1: Historical information is totally lost. We will never know that customer 102 lived in Tucson before. Moreover, when listing sales by city, all the sales of customer 102 will be counted as part of Phoenix sales, although 102 was in Tucson before.

  17. CID CID CName CName State State Original City City Current City 101 Jon Arizona Tucson 101 Jon Arizona Tucson Tucson 102 Tom Arizona Tucson 102 Tom Arizona Tucson Phoenix 103 Mark Arizona Phoenix 103 Mark Arizona Phoenix Phoenix D1: Slowly changing dimension • Method 2: Add a new attribute to record current value of the changing attribute.

  18. D1: Slowly changing dimension Drawbacks of method 2: Only partial Historical information (original & current) is kept. Considering that customer 102 moved from Tucson to Flagstaff then to Phoenix, the customer information of customer 102 only includes Tucson and Phoenix.

  19. CID CName State City 101 Jon Arizona Tucson 102 Tom Arizona Tucson 103 Mark Arizona Phoenix D1: Slowly changing dimension • Method 3: Add a record whenever a dimension attribute changes.

  20. D1: Slowly changing dimension Method 3 keep all the information. However, Is there any problem?

  21. CID CName State City 101 Jon Arizona Tucson 102 Tom Arizona Tucson 103 Mark Arizona Phoenix D1: Slowly changing dimension • Method 4: warehouse key + method 3 • Warehouse key is a sequence of non-negative integers served as primary keys of tables in data warehouse. Warehouse key

  22. D1: Slowly changing dimension • Why warehouse key is needed in data warehouse? • Solve slowly changing dimension problem • Compared with natural keys (i.e., primary keys of tables in RDB, such as CID of customer table), warehouse keys have high join performance.

  23. D1: Slowly changing dimension • Warehouse key • Primary keys in dimensional tables are warehouse keys. • Primary key in fact table is a collection of warehouse keys of all/part of its associated dimensions.

  24. D1: Slowly changing dimension Notation: Primary key

  25. D2: Time Dimension D: refers to special considerations for dimension table or special type of dimension table

  26. D2: Time Dimension • Data warehouse needs an explicit time dimension table instead of just a time attribute (e.g, ORDERDATE). • Besides the time attribute, time dimension table includes the following additional attributes: • Day_of_week (1-7); Day_number_in_month (1-31); • Day_number_in_year (1-365) • Week_number (1-52); month (1-12), Quarter (1-4) • Holiday_flag (y/n) • Fiscal_quarter, Fiscal_year

  27. D2: Time Dimension • Time dimension can: • Save computation effort and improve query performance • Complex queries regarding calendar calculation are hidden from end users of data warehouse.

  28. D3: Snowflake D: refers to special considerations for dimension table or special type of dimension table

  29. D3: Snowflake Snowflake structure

  30. D3: Snowflake • Snowflake structure should be avoided in data warehouse design • Tradeoff of avoiding snowflake • Advantage: improve query performance • Disadvantage: require more storage space

More Related