320 likes | 827 Views
ISCG 6425 Data Warehousing (DW) Week 3 DW Data Modeling. Lecturer : Dr. Sira Yongchareon. OLTP Database Analysis (Northwind Case study) OLAP Database Requirement Analysis DW Data Modeling. Outline. OLTP ( O n- L ine T ransaction P rocessing) Databases
E N D
ISCG 6425Data Warehousing(DW)Week 3DWData Modeling Lecturer : Dr. Sira Yongchareon
OLTP Database Analysis (Northwind Case study) OLAP Database Requirement Analysis DW Data Modeling Outline
OLTP(On-Line Transaction Processing) Databases Database designed for day-to-day operations / transactions Queries contain SELECT, INSERT, UPDATE, DELETE Normalized schema (no duplicates, no inconsistency data) OLAP (On-Line Analytical Processing) Database Database designed for business analytics, summary reports, decision making Queries have SELECTonly(no INSERT, DELETE, UPDATE) De-normalized schema (lots of duplicates/redundancies) Revisit: OLTP vs OLAP Databases
In year 1996, Northwind Retail Company deployed an MS-SQL database management system to store their transactional data related to their customers, products, and sales transactions. Their OLTP database was named northwind1. Case Study : Northwind This database model was drawn in MS Visio (we will use it in this course). Different drawing but similar to the ERD we have learnt.
Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998,group by product name Case Study : Northwind Can you write an SQL query that can answer the above question?
Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998, group by product name Case Study : Northwind Group by “product name” For a product ordered in 1998 Calculated revenue = (UnitPrice x Quantity)
Northwind's Requirements Sales manager wants to list all product names and their totalrevenue generated in year 1998, group by product name Case Study : Northwind SELECTp.ProductName, sum(od.UnitPrice*Od.Quantity) as [Total Revenue] FROM Orders o, [Order Details] od, Products p WHEREp.ProductId=od.ProductID and o.OrderID=od.OrderID AND o.OrderDate BETWEEN '1998-01-01' AND '1998-12-31' GROUP BY p.ProductName
Try one more….. Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group and order by product name Case Study : Northwind Can you write an SQL query that can answer the above question?
Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group and order by product name Case Study : Northwind Only ‘UK’ Group by “product name” For a product ordered in Q1 of 1996, 1997 and 1998 Calculated revenue = (UnitPrice x Quantity)-%Discount CategoryName = ‘Beverages’
Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group by and order by product name Case Study : Northwind Every time of select, system calculates these figures SELECT p.ProductName, sum((od.UnitPrice*Od.Quantity)-((od.UnitPrice*Od.Quantity)*od.Discount)) as [Total Revenue] FROM Orders o, [Order Details] od, Products p, Categories ca, Customers cu WHERE p.ProductId=od.ProductID AND o.OrderID=od.OrderID AND ca.CategoryID=p.CategoryID AND cu.CustomerID=o.CustomerID AND ca.CategoryName = 'Beverages' AND cu.Country ='UK' AND ((o.OrderDate BETWEEN '1996-01-01' AND '1996-03-30') OR (o.OrderDate BETWEEN '1997-01-01' AND '1997-03-30') OR (o.OrderDate BETWEEN '1998-01-01' AND '1998-03-30')) GROUP BY p.ProductName ORDER BYp.ProductName Look at this date checking statement!! This just for 3 years, what about for 10 years ?!?!?!
Querying from OLTP database seems OK…. However, The query can be very complicated … Also requires to compute every time (e.g., total revenue = Qty * Price) The more complicated join, the slower respond Think about to join 10,000,000 records of Orders with 100,000 customers, and 10,000 products… Our northwind has all data in just one database, What about joining 100 databases (from different branches around the world) DW is our HERO… born to solve these problems!! What’s the problem?
Revisit: OLAP Schemas (1) Star Schema (2)Snowflake Schema (3) Fact constellation Schema
OLAPhas a better (de-normalized) schema than OLTP Dimension tables are around a single fact table Query is much less complicated! Most measures (numbers) are pre-calculated, for example Total price = Qty x UnitPrice Tax cost = Qty x UnitPrice x TaxRate Profit = Revenue -Cost OLAP Database Solution
OLAP Data modeling called “Dimensional Modelling” Choose the right schema that best suits the queries required by users (managements) Star Schema is easy and simple to understand Analyse OLTP Database and Model Dimension tables (called Dimensional modeling) and data required for OLAP Model Fact table(s) A face table must contain some measure(s) DW Data Modeling : Overview
Revisiting our Northwind example DW Data Modeling Star Schema Normalized OLTP Schema
Identify the fact !! Recall the purposes of OLAP database? Answer queries that related to some “measure” and “time” DW Data Modeling: Fact table Q: What “fact subject” are we interested in ? Q: What “measure” we are looking at in Orders?
Dimension tables Derived from the OLTP schema (not every table) Some are de-normalized PK of dimension table are auto generated (auto increment) Original “Natural Key” should remain !! But no longer as PK DW Data Modeling: Dimensions ? Q: Why does CategoryID not appear in dimProducts?
DW Data Modeling : Dimensions De-normalizedProducts dimension Remember: Star schema has no normalized dimension tables
Time dimension Attributes vary, depending on time period interested Don’t worry, time dimension table is auto-generated SQL Script provided for you DW Data Modeling: Dimensions
Date (Time) dimension : An example DW Data Modeling : Dimensions
Design a Fact table and its dimensions DW Data Modeling: Fact table All related “Dimensions” are linked through their “Surrogatekeys” A fact table use a “composite” primary key
Design a Fact table and its dimensions DW Data Modeling: Fact table Or, you can also implement “a surrogate key” for the PK of a fact table
Northwind'sOLAP Database schema (Subject:Sale Orders) DW Data Modeling: An insight picture Time Dimension ALWAYSin OLAP!! Both RequiredDate and OrderDate reference to the same dimTimetable ButOrderDateKey is part of the PK De-normalized from Categories table A pre-calculated attribute. To be used for sum() function The key of each table is auto increment (generated by a system)
OLAP Data modeling Choose the right schema that best suits the queries required by users (managements) Star vs. Snowflake vs. Fact constellation Usually, Star schema is chosen if data analysis requires only one FACT table and it is simpler and easier to query Model all Dimension tables and data required for OLAP Decide which dimension table is required Decide if any dimension table needs to be normalized Time dimension alwaysin OLAP schema!! Model Fact table(s) Decide how many fact table is required Each fact table, indicate measure(s), usually additive DW Data Modeling: A Summary
Sales manager wants to list all beverageproducts and their totalrevenue (after discount)generated in Quarter 1 of 1996-1998 from a customer who lived in UK , group by and order by product name Northwind : A Query for OLAP Thef.TotalPriceis pre-calculated and stored in the fact table when generating a DW. SELECT p.ProductName, sum(f.TotalPrice) as [Total Revenue] FROM dimProducts p, dimCustomers c, factOrders f, dimTimet WHEREf.ProductKey=p.ProductKey AND f.CustomerKey=c.CustomerKey AND f.OrderDateKey=t.TimeKey AND p.CategoryName = 'Beverages‘ AND c.Country='UK‘ AND t.QuaterOfYear = 1 AND t.Year in (1996,1997,1998) GROUP BY p.ProductName ORDER BYp.ProductName Look at this date checking statement!! Much shorter and easier to understand than the one written for the OLTP database
Compare them! SELECT p.ProductName, sum((od.UnitPrice*Od.Quantity)-((od.UnitPrice*Od.Quantity)*od.Discount)) as [Total Revenue] FROM Orders o, [Order Details] od, Products p, Categories ca, Customers cu WHERE p.ProductId=od.ProductID AND o.OrderID=od.OrderID AND ca.CategoryID=p.CategoryID AND cu.CustomerID=o.CustomerID AND ca.CategoryName = 'Beverages' AND cu.Country ='UK' AND ((o.OrderDate BETWEEN '1996-01-01' AND '1996-03-30') OR (o.OrderDate BETWEEN '1997-01-01' AND '1997-03-30') OR (o.OrderDate BETWEEN '1998-01-01' AND '1998-03-30')) GROUP BY p.ProductName ORDER BYp.ProductName Query in OLTP Database SELECT p.ProductName, sum(f.TotalPrice) as [Total Revenue] FROM dimProducts p, dimCustomers c, factOrders f, dimTimet WHEREf.ProductKey=p.ProductKey AND f.CustomerKey=c.CustomerKey AND f.OrderDateKey=t.TimeKey AND p.CategoryName= 'Beverages‘ AND c.Country ='UK‘ AND t.QuaterOfYear = 1 AND t.Year in (1996,1997,1998) GROUP BY p.ProductName ORDER BYp.ProductName Query in OLAP Database
Some useful VDOs Dimensional modeling http://www.youtube.com/watch?v=cwpL-3rkRYQ Dimension tables vs. Fact tables http://www.youtube.com/watch?v=6k3nwXXpnMY Questions?