240 likes | 318 Views
Atlanta Microsoft Database Forum. Introduction to Data Warehousing Concepts. Presented by. Brian Thomas. Solution Builders, Inc. March 8, 2004. Brian.Thomas@SolutionBuilders.com. What is a Data Warehouse?.
E N D
Atlanta Microsoft Database Forum Introduction to Data Warehousing Concepts Presented by Brian Thomas Solution Builders, Inc. March 8, 2004 Brian.Thomas@SolutionBuilders.com
What is a Data Warehouse? Data collected from one or many systems that exist within and outside the organization. The Data is structured in such a way as to reduce the amount of time that it takes to produce reliable information.
Why Build a Data Warehouse? • To Provide a Consistent Common Source for Corporate Information • To Store Large Volumes of Historical Detail Data from Mission Critical Applications • Improve the Ability to Access, Report Against, and Analyze Information • To Solve or Improve Upon Business Processes
System Generated Reports Sales Analysis is extrapolated from the System Reports. Turning Data into Information Functional Data Warehouse Sales System
Functional Data Warehouse of Sales Information Sales Information is available to a wider audience of decision makers. Turning Data into Information Functional Data Warehouse Sales System
Analysis performed and Decisions drawn from the Cross Organizational Sales Data Centralized Data Warehouse of Sales Data from across the Organization Turning Data into Information Cross Organizational Functional Data Warehouse Division A Sales System Division B Sales System Division C Sales System
Corporate Performance Analysis is extrapolated from the System Reports. System Generated Reports Turning Data into Information Cross Functional Data Warehouse Marketing System Sales System Production Systems
Cross Functional Data Warehouse of Information Corporate Performance Analysis is available to a wider audience. Turning Data into Information Cross Functional Data Warehouse Marketing System Sales System Production Systems
Analysis is performed and Decisions made from the Cross Functional Organizational Performance Data Centralized Cross Functional Data Warehouse of Information Turning Data into Information Cross Organizational & Cross Functional Data Warehouse Division A Division B Division C
Enterprise Data Warehouse Increased Local Specifications Increased Level of Standardization Data Warehouse Architecture Management Systems Access Methods Source Systems Data Warehouse Components Planning & Forecasting Corporate Level Portal / Web Interface Division A Analytics & Modeling Desktop Applications Division B Business Group Level Performance Management DW / DM Data Access & Query Management Services DW / DM Extraction Transformation Load (ETL) DW / DM Printed Reports Division C Scorecards & Dashboards Email Divisional Level DM DM DM DM DM DM Query & Reporting External Data Mobile Devices
Data Warehouse Architecture Source Systems Data Staging Area Data Warehouse Repository Division A Division B Extract, Transformation and Load (ETL) Division C External Data
Data Warehouse Architecture Data Staging Area • Subject Area Oriented • Data Structure more closely mirrors Operational System Data Layouts • Supports Identification of Changed Data • Acts as a Working Area to Support the Transformation Process
Data Warehouse Architecture Extraction, Transformation & Load (ETL) • Perform Attribute Standardization and Cleansing • Apply Business Rules and Calculations • Consolidate using Matching and Merge / Purge Logic • Ensure Proper Linking and Tracking of History Extract, Transformation and Load (ETL)
Data Warehouse Architecture Extraction, Transformation & Load (ETL) App. A: Male , Female App. B: 1 , 0 App. C: x , y App. D: m , f Male, Female Lookup Function App. A: pipeline (cm) App. B: pipeline (inches) App. C: pipeline (mcf) App. D: pipeline (yds) pipeline (cm) Conversion Function App. A: Date (julian) App. B: Date (yyyymmdd) App. C: Date (mm/dd/yyyy) App. D: Date (absolute) Date (julian) Formatting Function App. A: Description App. B: Description App. C: Description App. D: Description Description Merging Function App. A: balance on hand App. B: current balance App. C: cash in house App. D: balance Balance Mapping Function
Data Warehouse Architecture Data Warehouse Repository • Organized around Conformed Dimensions and Facts • Promotes Usability and Intuitiveness • Consolidated and Cross-Functional • Historical and Atomic Representation of Data • Insulated from Source System Modifications and Additions
Data Warehouse Repository Star Schema Concepts Fact Table This table is the core of the Star Schema Structure and contains the Facts or Measures available through the Data Warehouse. These Facts answer the questions of “What”, “How Much”, or “How Many”. Some Examples: Sales Dollars, Units Sold, Gross Profit, Expense Amount, Net Income, Unit Cost, Number of Employees, Turnover, Salary, Tenure, etc.
Data Warehouse Repository Star Schema Concepts Dimension Tables These tables describe the Facts or Measures. These tables contain the Attributes and may also be Hierarchical. These Dimensions answer the questions of “Who”, “What”, “When”, or “Where”. Some Examples: • Day, Week, Month, Quarter, Year • Sales Person, Sales Manager, VP of Sales • Product, Product Category, Product Line • Cost Center, Unit, Segment, Business, Company
Shipper_Dim Time_Dim Customer_Dim Employee_Dim Product_Dim ShipperKey CustomerKey ProductKey EmployeeKey TimeKey Sales_Fact CustomerID . . . ProductID . . . ShipperID . . . TheDate . . . EmployeeID . . . TimeKey EmployeeKey ProductKey CustomerKey ShipperKey Required Data (Business Metrics) or (Measures) . . . Data Warehouse Repository Star Schema Concepts
Data Warehouse Repository Cube Concepts Atlanta Chicago MarketsDimension Denver Grapes Cherries Dallas Melons Product Dimension Apples Q4 Q1 Q2 Q3 TimeDimension
SalesFact Data Warehouse Repository Cube Concepts Atlanta Chicago MarketsDimension Denver Grapes Cherries Dallas Melons Product Dimension Apples Q4 Q1 Q2 Q3 TimeDimension
Data Warehouse Repository Storage Concepts • Relational On-Line Analytical Processing (ROLAP): The information that is stored in the Data Warehouse is held in a relational structure. Aggregations are performed on the fly either by the database or in the analysis tool. • Multidimensional On-Line Analytical Processing (MOLAP): This information is aggregated in a predefined manner based on the characteristics of the Measures and the defined hierarchy of the Dimensions. Since the data is pre-aggregated, navigating through the hierarchies is instantaneous. The user is simply navigating to a point within the Multidimensional Cube and not performing any on the fly aggregations. • Hybrid On-Line Analytical Processing (HOLAP): This is a combination of MOLAP and ROLAP. A portion of the data is predefined and aggregated. This would typically be the set of information that is accessed most frequently. Additional detail can be held in a ROLAP structure and allow a user to drill through the MOLAP structure into the ROLAP structure.
Client perspective MOLAP HOLAP ROLAP Query performance Fastest Faster Fast Storage consumption High Medium Low Data Warehouse Repository Cube Concepts
Enterprise Data Warehouse Increased Local Specifications Increased Level of Standardization Microsoft Office, Reporting Services and .NET Framework Where does Microsoft fit in? SQL Server DTS SQL Server Relational Database and Analysis Services Management Systems Access Methods Source Systems Data Warehouse Components Planning & Forecasting Corporate Level Portal / Web Interface Division A Analytics & Modeling Desktop Applications Division B Business Group Level SharePoint Portal, Exchange, and .NET Framework Performance Management DW / DM Data Access & Query Management Services DW / DM Extraction Transformation Load (ETL) DW / DM Printed Reports Division C Scorecards & Dashboards Email Divisional Level DM DM DM DM DM DM Query & Reporting External Data Mobile Devices SQL Stored Procedures, SQL Views, MDX, and .NET Web Services