260 likes | 412 Views
Competing on Analytics II. BI Tools and Techniques Robert Monroe April 1, 2008. Goals. Introduce dimensional models (aka star schemas) and why they are useful in data warehousing
E N D
Competing on Analytics II BI Tools and Techniques Robert Monroe April 1, 2008
Goals • Introduce dimensional models (aka star schemas) and why they are useful in data warehousing • Explore some common business processes, both inward-facing and outward-facing, that companies choose as a basis for analytic competition • Understand the type of data management and warehousing infrastructure needed to support those analytic capabilities
Quick Review: The Three-Layer Data Architecture • Data goes through three common stages during ETL • Operational Data • transactional data stored in individual systems of record throughout the organization • Reconciled Data • detailed, current data intended to be the single, authoritative source for all decision support applications • Derived Data • data that have been selected, formatted, and aggregated for end-user decision support applications Derived Data Reconciled Data Operational Data
Quick Review: The ETL Process Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Quick Review: Typical Data Warehouse Structure ReconcileData Derive Data Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Derived Data Reconciled Data Operational Data Derived Data • Although reconciled data provides a consistent, hiqh-quality collection of enterprise data it is not necessarily in an efficient form for use by BI tools • Derived data objectives: • Ease of use for decision support applications • Fast response to predefined user queries • Customized data for particular target audiences • Ad-hoc query support • Data mining capabilities • Characteristics • Detailed (mostly periodic) data • Aggregated (for summary) • Processed • Distributed (to data marts)
Dimensional Modeling: Facts and Dimensions • Dimensional Modeling • a simple database design in which dimensional data are separated from fact or event data. Dimensional models are also sometimes called star schemas. • Dimensional models are a common way to represent derived data for informational data stores • Well suited to ad-hoc queries and OLAP • Poorly suited for transaction processing • Commonly used for data warehouse/mart storage model
Dimensional Modeling Dimensional modeling is a simple database design pattern in which dimensional data are separated from fact or event data. Dimensional models are also sometimes called star schemas. Dimensional models are a common way to represent derived data for analytic data stores • Well suited to ad-hoc queries and OLAP • Poorly suited for transaction processing
Dimension tables contain descriptions about the subjects of the business Star Schema Structure Fact tables contain factual or quantitative data Dimension tables are denormalized to maximize performance 1:N relationship between dimension tables and fact tables Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Fact table provides statistics for sales broken down by product, period and store dimensions Dimension tables provides details on stores, products, and time periods Star Schema Example Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Star Schema Example With Data Product Period Store Sales Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Dimensional Model Benefits • Simple and predictable framework • Well suited to ad-hoc analytical queries • Relatively straightforward mapping from most transactional systems • Dimensional independence • Query performance is somewhat independent of dimensions used in the query • Straightforward model extensions support evolution
Dimension Hierarchies • Dimension tables can capture hierarchies • Dimensions use levels to represent hierarchies • Each sub-level subdivides the parent level with finer granularity • Examples • Dimension: Time Period • Levels – Year :: Quarter :: Month :: Week :: Day • Dimension: Organization • Levels – Company :: Division :: Department :: Employee • Exercise: • Define the levels for a Geography dimension for customer locations
Facts and Measures • Measures represent the interesting data at the intersection of different dimensions • There is a space for a measure at every intersection of every level of every dimension • Base facts are stored in the intersections of lowest-level dimensions (either simple or calculated measures) • Aggregate or computed values are stored at the intersections of where all of the dimensions are not at the lowest level (aggregate values must be calculated measures)
Modeling Hierarchies • Dimension tables frequently model hierarchies • Example: • Customers dimension stores data about your customers • You may sell to several divisions of a single company • You want to be able to analyze sales to the individual divisions and also capture “rolled-up” values for the parent company Divisions of ABC Automotive Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Modeling Hierarchies With Denormalized Tables • Hierarchical dimensions are frequently represented with denormalized tables • This approach simplifies and speeds queries • … at the expense of introducing anomalies
In-Class Exercise: Star Schema • Form teams of 2-3 people • Complete exercise 2, question #1 on handout • Build a star schema to store grades at Millenium College
Modeling Dates and Time • Dates, times, and time periods are almost always included in BI systems, but they can be tricky to model • Decide on the granularity of dates carefully (big effect on size) • A common approach to modeling dates as a dimension: Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Multiple Fact Tables • It is frequently useful to store more than one type of fact in a single multidimensional database (star schema) • This can be handled by using multiple fact tables that share dimensions • Example: modeling products sold and products purchased Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
Conformed Dimensions • When dimensions are shared across multiple fact tables they must be conformed dimensions • Conformed dimensions • One or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table • Conformed dimensions allow users to: • Query across multiple fact tables • Improve consistency of meaning and structure for derived and retrieved information
Factless Fact Tables – Tracking Events • “Factless” fact tables store only foreign keys, no facts • Factless fact tables allow the tracking of what types of events happened, and under what circumstances they happened Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.
In-Class Exercise 2: Extending The Star Schema • Form 2-3 person teams • Add the ability to store new facts for FCE data: • Specific facts to store: • Aggregate average course eval rating (1-5) for each course section offered • Aggregate average instructor eval rating (1-5) for each course section offered
ETL Tools – Microsoft SSIS Demo Image Source: Conchango Consulting