760 likes | 974 Views
Chapter 16. Data Warehouse Concepts and Design. Outline. Basic concepts and characteristics Business architectures and applications Data cube concepts and operators Relational database design for data warehouses Enterprise data warehouse development. Historical Perspective.
E N D
Chapter 16 Data Warehouse Concepts and Design
Outline • Basic concepts and characteristics • Business architectures and applications • Data cube concepts and operators • Relational database design for data warehouses • Enterprise data warehouse development
Historical Perspective Failure of relational database technology for decision making Failure causes Lack of decision-making value of individual databases Infeasible to use a database for both operational support and decision making Missing features for summary data Data warehouse technology and organizational deployment developed in 1990s to address these issues.
Comparison of Environments • Transaction processing • Uses operational databases • Short-term decisions: fulfill orders, resolve complaints, provide staffing • Decision support processing • Uses integrated and summarized data • Medium and long-term decisions: capacity planning, store locations, new lines of business
Definition and Characteristics • A central repository for summarized and integrated data from operational databases and external data sources • Key Characteristics • Subject-oriented • Integrated • Time-variant • Nonvolatile
Architectures and Applications • Data warehouse projects • Top-down architectures • Bottom-up architecture • Applications and data mining
Data Warehouse Projects • Large efforts with much coordination across departments • Enterprise data model • Important artifact of data warehouse project • Structure of data model • Meta data for data transformation • Top-down vs. bottom-up business architectures
Maturity Model Importance • Guidance for investment decisions • Stages provide a framework to view an organization’s progress • Insights: difficulty moving between stages • Infant to child stages because of investment level • Teenager to adult because of strategic importance of data warehouse
Data Mining • Discover significant, implicit patterns • Target promotions • Change mix and collocation of items • Requires large volumes of transaction data • Important application for data warehouses
Data Cube Concepts and Operators • Basics • Dimension and measure details • Operators
Data Cube Basics • Multidimensional arrangement of data • Users think about decision support data as data cubes • Terminology • Dimension: subject label for a row or column • Member: value of dimension • Measure: quantitative data stored in cells
Dimensions and Measures • Dimensions • Hierarchies: members can have sub members • Sparsity: many cells do not have data • Measures • Derived measures • Multiple measures in cells
Measure Aggregation Properties Additive Summarized by addition across all dimensions Common measures such as sales, cost, and profit Semi-Additive Summarized in some but not all dimensions such as time Periodic measurements such as account balances and inventory levels Non-Additive Cannot be summarized by addition through any dimension Historical facts such as unit price for a sale
Measure Aggregation Example • Dimensions • Course: course id, degree, department, and college • Student: student id, major, department, and college • Time: semester, academic year, academic decade • Measures: • Credit hours • Grade • Unit tuition • Tuition • Aggregation properties for measures: ?
Time Series Data • Common data type in trend analysis • Reduce dimensionality using time series • Time series properties • Data type • Start date • Calendar • Periodicity • Conversion
Slice Operator • Focus on a subset of dimensions • Set dimension to specific value: 1/1/2010 (Location Product Slice for Time = 1/1/2010)
Dice Operator • Focus on a subset of member values • Replace dimension with a subset of values • Dice operation often follows a slice operation
Other Operators • Operators for hierarchical dimensions • Drill-down: add detail to a dimension • Roll-up: remove detail from a dimension • Recalculate measure values • Pivot: rearrange dimensions
Motivation for Data Warehouse Design • Representation of long-term memory for organizational decision making • Schema design captures measures and variables important to decision making • Poor schema design can lead to decision making problems.
Data Modeling Basics • Dimension table: contains member values • Fact table: contains measure values • 1-M relationships from dimension to fact tables • Grain: most detailed measure values stored
Types of Fact Tables • Transaction • Most common • Usually additive measures • Snapshot • Periodic view of asset level • Usually semi-additive measures • Factless • Event occurrence • No measures, just FKs
Summarizability Problems Motivation Intra dimension patterns and problems Fact-dimension patterns and problems
Summarizability Motivation Ability to compute summary computations for a coarse level of detail from finer levels of details Violations of summarizability Incorrect results Erroneous decision making and user confusion Inability to use performance optimizations Relationships among dimension levels and dimension and fact tables
Non Strict Dimension-Fact Relationship (a) Unit sales by salesperson (b) Shared unit sales by salesperson