300 likes | 532 Views
Contents of this slideshow :. What is a data warehouse? Multi-dimensional data modeling. A star shema datawarehouse has a central table ( the Fact table ) surrouded by dimension tables with on-to-many relationships towards the fact table. An example of a Datawarehouse:.
E N D
Contents of this slideshow: • What is a data warehouse? • Multi-dimensional data modeling
A star shema datawarehouse has a central table (theFact table) surrouded by dimension tables with on-to-many relationships towards the fact table. An example of a Datawarehouse: The fixed data base structure implies that application programs (drilling functions/aggregates) can be generated automatically!
A dimension hierarchy is a set of tables connected by one-to-many relationships towards the fact table: Dimension hierarchies: In a dimension hierarchiy it is possible to aggregate data from the fact table to the different levels of the hierachy. Drill-down= “de-aggregate” = break an aggregate into its constituents. Roll-up= aggregate along one or more dimensions.
Two different types of drilling: • Drilling in dimension hierarchies. • Drilling between dimensions.
Which star schemas or data marts can be build by using the illustrated integrated E-commerce/ERP data model?Which star schema would you recommend to be implemented first?
Data marts = Kimball uses the word for any multidimensional database/star schema. A galaxy is a set of multidimensional databases with conformed (fælles tilpassede) dimensions: The value chain Suppose an entreprise has a datamart for Purchase and another datamart for Sale as illustrated above. Is it possible to calculate the revenue per month for the last year by using such a galaxy?
Conformed dimensions = dimensions designed to be common for different data marts in order to make drill across operations possible.Conformed facts = measures with common units of measurement and granularities that make it possible to integrate measures from different fact tables. The value chain Is it possible to calculate the revenue per month for the last year if the datamart for Purchase and the datamart for Sale do not have conformed dimensions or facts?
Contents of this slideshow: • What is a datawarehouse? • Multi-dimensional data modelling
SELECT Product#, SUM(Qty*Price) AS omsætningFROM Orderdetails JOIN ProductsGROUP BY Product# Datawarehouse aggregating to the product level:
SELECT Product#, Salesman#, SUM(Qty*Price) AS omsætningFROM Orderdetails JOIN Products JOIN Salesmen GROUP BY Product#, Salesman#; Drill down to the Product per Salesman level: Where should the Price be stored?
A dimension hierarchi is a set of tables connected by one-to-many relationships towards the fact table: Dimension hierarchies: A Snowflake schema may in contrast to star schemas have dimension hierarchies. Describe advantage and disadvantage by using dimension hierarchies/Snowflake schema?
A Snowflake schema may have branches in the dimension hierarchies: Snowflake schema with branches: Are Customers related to the regions?
The aggregation level is the argument to the GROUP BY statement.
Roll up can be executed by removing one or more argument to the GROUP BY statement. Roll up to the top level: Roll up to the product level. Roll up to the top level.
Non-linear dimensions as e.g. the Date Dimension: • The granularity is day. • Many different hierarchies. • Two major problems: • Calender Week do not aggregate to year. • Type of Day distinguish between working day and holiday. However, they are idependent of the other dimensions (e.g. Easter). Fiscal Year Calendar Year Calendar Fiscal Quarter Quarter Calendar Fiscal Month Month Fiscal Week What aggregation level would you use to calculate the average sale in non-hollyday mondays per month? Calendar Type of Day Week Day of Week Day
The time dimension: • The granularity is minute. • The top level is a hole day. Day Part AM/PM Flag Hour Why do you think Kimball recommends to separate the date and time dimensions? Minute
Degenerated dimension = A dimension that is not created because nobody want to aggregate data to the degenerated level. Example: The Order dimension should be deleted while the Time and Customer attributes should be created as new dimensions to which it is meaningful to aggregate data.
Exercise: The figure illustrates an ER-diagram of a car rental company like Hertz or Avis. Design a snowflake shema, star shema or Galaxy for the car rental company!
Major problems in data warehouse design: Drilling in many-to-many relationships and tree structures. Inconsistensies caused by ”slowly changing dimensions”.
If the attributes of a dimension is dynamic (e.i. they may be updated) we say that they are slowly changing. May the Branch-size of a Branch-office change after e.g. a renovation?May the Branch-name of a Branch-office change? Slowly Changing Dimensions (SCD)
Soppose the attribute Branch-size is dynamic and aggregations is made to the levels (Branch-size, Year) or (Branch-size, Month) . Does this aggregation make sense and how would you solve possible problems? Exercise in SCD:
Exercise: Is the region of the customer a dynamic attribute of the customer? Does it make sense to aggregate the rental revenue to the region of the customers?
It is possible to cheat the application generator. That is, special very complicated data structures may function as many-to-many or networt relationships when they are dealt with as 1-to-many relationships.How would you recommend to design a datawarehouse where it is possible to aggregate Sale to the Stock locations used for the sale?
Design a data warehouse (or galaxy) for an ERP system with as many meaningful dimensions as possible:
End of session Thank you !!!
Table 1[S1] [S1]Prøv evt. At få tabelnavnet op… Denne side er helt blank
Where do the responses of SCDs store historic information? • Response 1 does not store historic information. • Response 2 store historic information in a new record version. • Response 3 store at one historic value in a new dimension attribute. • Response 4 store historic information in a new dimension relationship. • Response 5 store historic information in a new fact attribute. • Response 6 can sometimes deminish the aggregation error of response 1 as finer granularity in a state fact more acurately can be related to the right dimension record. • Response 7 store historic information in a new fact table.