1 / 28

ISCG 6425 Data Warehousing (DW) Week 3 DW Data Modeling

ISCG 6425 Data Warehousing (DW) Week 3 DW Data Modeling. Lecturer : Dr. Sira Yongchareon. OLTP Database Analysis (Northwind Case study) OLAP Database Requirement Analysis DW Data Modeling. Outline. OLTP ( O n- L ine T ransaction P rocessing) Databases

birch
Download Presentation

ISCG 6425 Data Warehousing (DW) Week 3 DW Data Modeling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ISCG 6425Data Warehousing(DW)Week 3DWData Modeling Lecturer : Dr. Sira Yongchareon

  2. OLTP Database Analysis (Northwind Case study) OLAP Database Requirement Analysis DW Data Modeling Outline

  3. OLTP(On-Line Transaction Processing) Databases Database designed for day-to-day operations / transactions Queries contain SELECT, INSERT, UPDATE, DELETE Normalized schema (no duplicates, no inconsistency data) OLAP (On-Line Analytical Processing) Database Database designed for business analytics, summary reports, decision making Queries have SELECTonly(no INSERT, DELETE, UPDATE) De-normalized schema (lots of duplicates/redundancies) Revisit: OLTP vs OLAP Databases

  4. In year 1996, Northwind Retail Company deployed an MS-SQL database management system to store their transactional data related to their customers, products, and sales transactions. Their OLTP database was named northwind1. Case Study : Northwind This database model was drawn in MS Visio (we will use it in this course). Different drawing but similar to the ERD we have learnt.

  5. Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998,group by product name Case Study : Northwind Can you write an SQL query that can answer the above question?

  6. Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998, group by product name Case Study : Northwind Group by “product name” For a product ordered in 1998 Calculated revenue = (UnitPrice x Quantity)

  7. Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998, group by product name Case Study : Northwind SELECTp.ProductName, sum(od.UnitPrice*Od.Quantity) as [Total Revenue] FROM Orders o, [Order Details] od, Products p WHEREp.ProductId=od.ProductID and o.OrderID=od.OrderID AND o.OrderDate BETWEEN '1998-01-01' AND '1998-12-31' GROUP BY p.ProductName

  8. Try one more….. Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group and order by product name Case Study : Northwind Can you write an SQL query that can answer the above question?

  9. Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group and order by product name Case Study : Northwind Only ‘UK’ Group by “product name” For a product ordered in Q1 of 1996, 1997 and 1998 Calculated revenue = (UnitPrice x Quantity)-%Discount CategoryName = ‘Beverages’

  10. Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group by and order by product name Case Study : Northwind Every time of select, system calculates these figures SELECT p.ProductName, sum((od.UnitPrice*Od.Quantity)-((od.UnitPrice*Od.Quantity)*od.Discount)) as [Total Revenue] FROM Orders o, [Order Details] od, Products p, Categories ca, Customers cu WHERE p.ProductId=od.ProductID AND o.OrderID=od.OrderID AND ca.CategoryID=p.CategoryID AND cu.CustomerID=o.CustomerID AND ca.CategoryName = 'Beverages' AND cu.Country ='UK' AND ((o.OrderDate BETWEEN '1996-01-01' AND '1996-03-30') OR (o.OrderDate BETWEEN '1997-01-01' AND '1997-03-30') OR (o.OrderDate BETWEEN '1998-01-01' AND '1998-03-30')) GROUP BY p.ProductName ORDER BYp.ProductName Look at this date checking statement!! This just for 3 years, what about for 10 years ?!?!?!

  11. Querying from OLTP database seems OK…. However, The query can be very complicated … Also requires to compute every time (e.g., total revenue = Qty * Price) The more complicated join, the slower respond Think about to join 10,000,000 records of Orders with 100,000 customers, and 10,000 products… Our northwind has all data in just one database, What about joining 100 databases (from different branches around the world) DW is our HERO… born to solve these problems!! What’s the problem?

  12. Revisit: DW Architecture

  13. Revisit: OLAP Schemas (1) Star Schema (2)Snowflake Schema (3) Fact constellation Schema

  14. OLAPhas a better (de-normalized) schema than OLTP Dimension tables are around a single fact table Query is much less complicated! Most measures (numbers) are pre-calculated, for example Total price = Qty x UnitPrice Tax cost = Qty x UnitPrice x TaxRate Profit = Revenue -Cost OLAP Database Solution

  15. OLAP Data modeling  called “Dimensional Modelling” Choose the right schema that best suits the queries required by users (managements) Star Schema is easy and simple to understand Analyse OLTP Database and Model Dimension tables (called Dimensional modeling) and data required for OLAP Model Fact table(s) A face table must contain some measure(s) DW Data Modeling : Overview

  16. Revisiting our Northwind example DW Data Modeling Star Schema Normalized OLTP Schema

  17. Identify the fact !! Recall the purposes of OLAP database? Answer queries that related to some “measure” and “time” DW Data Modeling: Fact table Q: What “fact subject” are we interested in ? Q: What “measure” we are looking at in Orders?

  18. Dimension tables Derived from the OLTP schema (not every table) Some are de-normalized PK of dimension table are auto generated (auto increment) Original “Natural Key” should remain !! But no longer as PK DW Data Modeling: Dimensions ? Q: Why does CategoryID not appear in dimProducts?

  19. DW Data Modeling : Dimensions De-normalizedProducts dimension Remember: Star schema has no normalized dimension tables

  20. Time dimension Attributes vary, depending on time period interested Don’t worry, time dimension table is auto-generated SQL Script provided for you  DW Data Modeling: Dimensions

  21. Date (Time) dimension : An example DW Data Modeling : Dimensions

  22. Design a Fact table and its dimensions DW Data Modeling: Fact table All related “Dimensions” are linked through their “Surrogatekeys” A fact table use a “composite” primary key

  23. Design a Fact table and its dimensions DW Data Modeling: Fact table Or, you can also implement “a surrogate key” for the PK of a fact table

  24. Northwind'sOLAP Database schema (Subject:Sale Orders) DW Data Modeling: An insight picture Time Dimension ALWAYSin OLAP!! Both RequiredDate and OrderDate reference to the same dimTimetable ButOrderDateKey is part of the PK De-normalized from Categories table A pre-calculated attribute. To be used for sum() function The key of each table is auto increment (generated by a system)

  25. OLAP Data modeling Choose the right schema that best suits the queries required by users (managements) Star vs. Snowflake vs. Fact constellation Usually, Star schema is chosen if data analysis requires only one FACT table and it is simpler and easier to query Model all Dimension tables and data required for OLAP Decide which dimension table is required Decide if any dimension table needs to be normalized Time dimension alwaysin OLAP schema!! Model Fact table(s) Decide how many fact table is required Each fact table, indicate measure(s), usually additive DW Data Modeling: A Summary

  26. Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group by and order by product name Northwind : A Query for OLAP Thef.TotalPriceis pre-calculated and stored in the fact table when generating a DW. SELECT p.ProductName, sum(f.TotalPrice) as [Total Revenue] FROM dimProducts p, dimCustomers c, factOrders f, dimTimet WHEREf.ProductKey=p.ProductKey AND f.CustomerKey=c.CustomerKey AND f.OrderDateKey=t.TimeKey AND p.CategoryName = 'Beverages‘ AND c.Country='UK‘ AND t.QuaterOfYear = 1 AND t.Year in (1996,1997,1998) GROUP BY p.ProductName ORDER BYp.ProductName Look at this date checking statement!! Much shorter and easier to understand than the one written for the OLTP database

  27. Compare them! SELECT p.ProductName, sum((od.UnitPrice*Od.Quantity)-((od.UnitPrice*Od.Quantity)*od.Discount)) as [Total Revenue] FROM Orders o, [Order Details] od, Products p, Categories ca, Customers cu WHERE p.ProductId=od.ProductID AND o.OrderID=od.OrderID AND ca.CategoryID=p.CategoryID AND cu.CustomerID=o.CustomerID AND ca.CategoryName = 'Beverages' AND cu.Country ='UK' AND ((o.OrderDate BETWEEN '1996-01-01' AND '1996-03-30') OR (o.OrderDate BETWEEN '1997-01-01' AND '1997-03-30') OR (o.OrderDate BETWEEN '1998-01-01' AND '1998-03-30')) GROUP BY p.ProductName ORDER BYp.ProductName Query in OLTP Database SELECT p.ProductName, sum(f.TotalPrice) as [Total Revenue] FROM dimProducts p, dimCustomers c, factOrders f, dimTimet WHEREf.ProductKey=p.ProductKey AND f.CustomerKey=c.CustomerKey AND f.OrderDateKey=t.TimeKey AND p.CategoryName= 'Beverages‘ AND c.Country ='UK‘ AND t.QuaterOfYear = 1 AND t.Year in (1996,1997,1998) GROUP BY p.ProductName ORDER BYp.ProductName Query in OLAP Database

  28. Some useful VDOs Dimensional modeling http://www.youtube.com/watch?v=cwpL-3rkRYQ Dimension tables vs. Fact tables http://www.youtube.com/watch?v=6k3nwXXpnMY Questions?

More Related