1 / 15

Data Warehouse Design

Data Warehouse Design. Alan Schneider. Overview. Operational Design Review Data Warehouse Dimensional Model Star Schema Multiple Star Schemas Fact Tables Granularity Dimension Tables Time Hierarchical Drilling Aggregates / Summaries. Operational Design Review.

Faraday
Download Presentation

Data Warehouse Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse Design Alan Schneider

  2. Overview • Operational Design Review • Data Warehouse Dimensional Model • Star Schema • Multiple Star Schemas • Fact Tables • Granularity • Dimension Tables • Time • Hierarchical Drilling • Aggregates / Summaries

  3. Operational Design Review • Decisional queries require the summation of hundreds to tens of thousands of figures stored in perhaps as many rows in the database. • Such processing on a fully normalized data structure is slow and cumbersome. • For example: If a business manager requires a Product Sales per Customer report the program code must access the Customer, Account, Account Type, Order, Order Line Item, and Product tables to compute the totals.

  4. Dimensional Modeling • Refers to a set of data modeling techniques that have gained popularity and acceptance for data warehouse implementations. • Two types of tables are used in dimensional modeling: • Fact tables • Dimensional tables.

  5. Dimensional Modeling Star Schema • Is a Fact Table Plus Its Related Dimension Tables • Visually, a dimensional schema looks very much like a star, hence the use of the term star schema to describe dimensional models. • Fact tables reside at the center of the schema, and their dimensions are typically drawn around it, • The dimensions are Client, Time, Product and Organization. The fields in these tables are used to describe the facts in the Sales Fact table.

  6. Multiple Star Schemas(Snow Flake) • A data warehouse will most likely have multiple star schemas, i.e., many Fact tables. Each schema is designed to meet a specific set of information needs. • Multiple schemas, each focusing on a different aspect of the business, are natural in a dimensional warehouse. • Equally normal is the use of the same Dimension table in more than one schema. The classic example of this is the Time dimension. • For example, a retail company that has one star schema to track profitability per store may make use of the same Time dimension table in the star schema that tracks profitability by product.

  7. Fact Tables • Fact tables are used to record actual facts or measures in the business. Facts are the numeric data items that are of interest to the business. • Below are examples of facts for different industries: • Retail. Number of units sold, sales amount • Telecommunications. Length of call in minutes, average number of calls • Banking. Average daily balance, transaction amount • Insurance. Claims amounts • Airline. Ticket cost, baggage weight • Are Fully Normalized

  8. Fact TableGranularity • The term Granularity is used to indicate the level of detail stored in the fact table. The granularity of the Fact table follows naturally from the level of detail of its related dimensions. • For example, if each Time record represents a day, each Product record represents a product, and each Organization record represents one branch, then the grain of a sales Fact table with these dimensions would likely be: sales per product per day per branch. • Proper identification of the granularity of each schema is crucial to the usefulness and cost of the warehouse.

  9. Fact Table GranularityToo High • Severely limits the ability of users to obtain additional detail. • For example, if each time record represented an entire year, there will be one sales fact record for each year, and it would not be possible to obtain sales figures on a monthly or daily basis.

  10. Fact Table GranularityToo Low • Results in an exponential increase in the size requirements of the warehouse. • For example, if each time record represented an hour, there will be one sales fact record for each hour of the day • 8,760 sales fact records for a year with 365 days for each combination of Product, Client, and Organization • If daily sales facts are all that are required, the number of records in the database can be reduced dramatically.

  11. Dimension Tables • Establish the context of the facts. • Dimensional tables store fields that describe the facts. • Below are examples of dimensions for the same industries: • Retail. Store name, store zip code, product name, product category, day of week • Telecommunications. Call origin, call destination • Banking. Customer name, account number, data, branch, account officer • Insurance. Policy type, insured party • Airline. Flight number, flight destination, airfare class • Dimensions Are De-normalized: Product Example:

  12. Time Dimension • The One of the goals of the data warehouse is to offload historical data from the operational systems. • Each fact in the data warehouse must therefore be time-stamped. • This requirement is met through the Time dimension, which is always present in any warehouse schema. • Each record in the Time dimension represents a meaningful chunk of time for the enterprise. • Time dimension depends entirely on the business requirements.

  13. Hierarchical Drilling • As a result of denormalization of the dimensions, each dimension will quite likely have hierarchies that imply the grouping and structure. • For example: • Time dimension has a Day-Month-Quarter-Year hierarchy. Similarly, • Store dimension may have a City-Country-Region-All Stores hierarchy. • Product dimension may have a Product-Product Category-Product Department-All Products hierarchy. • When warehouse users drill up and down for detail, they typically drill up and down these dimensional hierarchies to obtain more or less detail about the business.

  14. Aggregates / Summaries • Aggregates or Summaries are one of the most powerful concepts in data warehousing. • The proper use of aggregates dramatically improves the performance of the data warehouse in terms of query response times, and therefore improves the overall performance and usability of the warehouse. • For queries that require only high-level or summarized data. • Users run queries against aggregated data versus detailed data--significantly smaller number of records.

  15. Design Summary • Dimensional modeling is simple • Dimensional modeling techniques make it possible for warehouse designers to create database schemas that business users can easily grasp and comprehend. • Dimensional modeling promotes data quality • The Star schema allows the warehouse to enforce referential integrity checks. Since the fact record key is a concatenation of the keys of its related dimensions--adds a line of defense against corrupted warehouse data. • Performance optimization is possible through aggregates • As the size of the warehouse increases, performance optimization becomes a pressing concern--aggregates are one of the most manageable ways by which query performance can be optimized.

More Related