180 likes | 264 Views
Virtual University of Pakistan. Data Warehousing. Lecture-13 Dimensional Modeling (DM). Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan@cluxing.com.
E N D
Virtual University of Pakistan Data Warehousing Lecture-13 Dimensional Modeling (DM) Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan@cluxing.com
The need for ER modeling? • Problems with early COBOLian data processing systems. • Data redundancies • From flat file to Table, each entity ultimately becomes a Table in the physical schema. • Simple O(n2) Join to work with Tables
Why ER Modeling has been so successful? • Coupled with normalization drives out all the redundancy out of the database. • Change (or add or delete) the data at just one point. • Can be used with indexing for very fast access. • Resulted in success of OLTP systems.
Need for DM: Un-answered Qs • Lets have a look at a typical ER data model first. • Some Observations: • All tables look-alike, as a consequence it is difficult to identify: • Which table is more important ? • Which is the largest? • Which tables contain numerical measurements of the business? • Which table contain nearly static descriptive attributes?
12 7 6 12 3 7 11 4 8 8 9 1 10 10 9 11 1 6 3 2 5 2 5 4 Need for DM: Complexity of Representation • Many topologies for the same ER diagram, all appearing different. • Very hard to visualize and remember. • A large number of possible connections to any two (or more) tables
Need for DM: The Paradox • The Paradox: Trying to make information accessible using tables resulted in an inability to query them! • ER and Normalization result in large number of tables which are: • Hard to understand by the users (DB programmers) • Hard to navigate optimally by DBMS software • Real value of ER is in using tables individually or in pairs • Too complex for queries that span multiple tables with a large number of records
How to simplify a ER data model? • Two general methods: • De-Normalization • Dimensional Modeling (DM)
What is DM?… • A simpler logical model optimized for decision support. • Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables. • Multi-part key for the fact table • Dimensional tables with a single-part PK. • Keys are usually system generated
What is DM?... • Results in a star like structure, called star schema or star join. • All relationships mandatory M-1. • Single path between any two levels. • Supports ROLAP operations.
Books Cloths Fiction Text Men Women Engg Medical Dimensions have Hierarchies Items Analysts tend to look at the data through dimension at a particular “level” in the hierarchy
Snow-flake Star The two Schemas
“Simplified” 3NF (Retail) CITY DISTRICT M DIVISION PROVINCE 1 BACK district 1 1 division zone M M DISTRICT DIVISION ZONE CITY 1 store week M 1 STORE # STREET ZONE ... DATE WEEK 1 M quarter sale_header M M MONTH QTR RECEIPT # STORE # DATE ... 1 1 M M 1 WEEK MONTH M sale_detail month 1 RECEIPT # ITEM # ... $ YEAR QTR M M 1 year 1 ITEM # CATEGORY ITEM # SUPPLIER item_x_cat M item_x_splir 1 CATEGORY DEPT cat_x_dept
Product Dim ITEM# CATEGORY DEPT facts SUPPLIER Vastly Simplified Star Schema Geography Dim 1 1 STORE# Fact Table ZONE RECEIPT# CITY STORE# M DISTRICT ITEM# M DIVISION DATE Time Dim M PROVINCE . . . DATE 1 WEEK Sale Rs. MONTH QUARTER YEAR
The Benefit of Simplicity Beauty lies in close correspondence with the business, evident even to business users.
Features of Star Schema Dimensional hierarchies are collapsed into a single table for each dimension. Loss of Information? A single fact table created with a single header from the detail records, resulting in: • A vastly simplified physical data model! • Fewer tables (thousands of tables in some ERP systems). • Fewer joins resulting in high performance. • Some requirement of additional space.
Quantifying space requirement Quantifying use of additional space using star schema There are about 10 million mobile phone users in Pakistan. Say the top company has half of them = 500,000 Number of days in 1 year = 365 Number of calls recorded each day = 250,000 (assumed) Maximum number of records in fact table = 91 billion rows Assuming a relatively small header size = 128 bytes Fact table storage used = 11 Tera bytes Average length of city name = 8 characters 8 bytes Total number of cities with telephone access = 170 (1 byte) Space used for city name in fact table using Star = 8 x 0.091 = 0.728 TB Space used for city code using snow-flake = 1x 0.091= 0.091 TB Additional space used 0.637 Tera byte i.e. about 5.8%