250 likes | 794 Views
Populating Data Warehouse Structures. Examining the Star Schema. Sales Star Schema. Fact Table. Dimension Tables. Dimension Table. Implementing the Star Schema. 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data
E N D
Examining the Star Schema Sales Star Schema Fact Table Dimension Tables Dimension Table
Implementing the Star Schema 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data 3. Load Data Into Dimension Tables and Fact Tables
External Files Internal Files External Files The Star Schema Data Load Heterogeneous Data Sources Polaris Data Warehouse Northwind OLTP Staging Area Sales Star DTS DTS Financial Inventory Star Extracting Data From Transforming Loading the Heterogeneous Sources Data Star Schema DTS DTS
Verifying the Dimension Source Data Verifying Accuracy of Source Data • Integrating data from multiple sources • Applying business rules • Checking structural requirements Correcting Invalid Data • Transforming data • Reassigning data values Managing Invalid Data • Rejecting invalid data • Saving invalid data to a log
buyer_name reg_id Barr, Adam 2 Chai, Sean 4 Smith, Jane 2 Paper, Anne 4 buyer_code buyer_code buyer_first buyer_last buyer_last buyer_last reg_id reg_id reg_id buyer_name reg_id U999 Adam Barr Barr Barr 2 2 2 Barr, Adam A123 2 A123 Sean Chai Chai Chai 4 4 4 Chai, Sean B456 4 Erin B456 O’Melia O’Melia O’Melia 6 6 6 O’Melia, Erin ... 6 ... ... ... ... ... ... ... ... ... ... buyer_name reg_id Barr, Adam II Chai, Sean IV buyer_name reg_id Smith, Jane 2 Paper, Anne 4 Dimension Data Load Examples: DTS DTS DTS
Maintaining Integrity of the Dimension • Assigning a Surrogate Key to Each Record • Defines the dimension’s primary key • Relates to the foreign key fields of the fact table • Loading One Record Per Application Key • Maintains uniqueness in the dimension • Depends on how you manage changing dimension data • Maintains integrity of the fact table
Managing Changing Dimension Data • Dimensions with Changing Column Values • Inserts of new data • Updates of existing data • Slowly-Changing Dimension Design Solutions • Type 1: Overwrite the dimension record • Type 2: Write another dimension record • Type 3: Add attributes to the dimension record
Product Dimension After product key product name product size product package product dept product cat product subcat ... 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks ... Before 12 oz. 001 Rice Puffs 12 Oz Bag Grocery Dry Goods Snacks ... Existing record is changed Type 1: Overwriting the Dimension Slide
Product Dimension Before product key product name product size product package product dept product cat product subcat effective_date … 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks 05-01-1995 ... 001 Rice Puffs 10 Oz Bag Grocery Dry Goods Snacks 05-01-1995 ... Type 2: Writing Another Dimension Record After Rice Puffs 12 Oz Bag Grocery Dry Goods Snacks 10-15-1998 ... 731 12 oz. 10 oz. Adds a new record
Before After Product Dimension 001 Rice Puffs 10 Oz Bag Grocery Dry Goods Snacks 05-01-1995 11 Oz 03-20-1994 (null) (null) ... 001 Rice Puffs 12 oz. Bag Grocery Dry Goods Snacks 10-15-1998 10 oz. 05-01-1995 11 Oz 03-20-1994 ... product key product name product size product package product dept product cat product subcat current product size date previous product size previous product size date 2nd previous product size 2nd previous product size date ... 12 oz 10 oz. product size 10-15-1998 previous product size previous product size date 11 oz. 03-20-1994 05-01-1995 11 oz. 03-20-1994 Additional information is storedin an existing record Type 3: Adding Attributes in the Dimension Record
Verifying the Fact Table Source Data Verifying Accuracy of Source Data • Integrating data from multiple sources • Applying business rules • Checking structural requirements Correcting Invalid Data • Transforming data • Reassigning data values Managing Invalid Data • Rejecting invalid data • Saving invalid data to a log
time_dim 134 1/1/2000 Assigning Foreign Keys DimensionTables Source Data customer_dim customer id product id order date quantity_sales amount_sales 201 ALFI Alfreds ALFI 123 123 1/1/2000 1/1/2000 400 400 10,789 10,789 product_dim 25 123 Chai Sales Fact Data cust_key prod_key time_key quantity_sales amount_sales 201 123 25 1/1/2000 134 400 400 10,789
product_id 9GZ 1KJ 0ZA customer_key product_key qty ... 100 512 32 238 207 48 437 338 9 customer_id price qty total_sales ... ... ... VINET .55 32 17.60 ALFI 1.10 48 52.80 HANAR .98 9 8.82 ... ... ... ... Defining Measures • Loading Measures from the Source System • Calculating Additional Measures Source System Data Fact Table Data
Maintaining Data Integrity • Adhering to the Fact Table Grain • A fact table can only have one grain • You must load a fact table with data at the same level of detail as defined by the grain • Enforcing Column Constraints • NOT NULL constraints • FOREIGN KEY constraints
Implementing Staging Tables • Centralize and Integrate Source Data • Break Up Complex Data Transformations • Facilitate Error Recovery market_stage Staging Area sales_stage shipments_stage inventory_stage
DTS Functionality • Accessing Heterogeneous Data Sources • Importing, Exporting, and Transforming Data • Creating Reusable Transformations and Functions • Automating Data Loads • Managing Metadata • Customizing and Extending Functionality
Defining DTS Packages • Identifies Data Sources and Destinations • Defines Tasks or Actions • Implements Transformation Logic • Defines Order of Operations
Identifying Package Components • ConnectionsAccess Data Sources and Destinations • TasksDescribe Data Transformations or Functions • StepsDefine the Order of Task Operations or Workflow • Global Variables Store Data that Can Be Shared Across Tasks
Creating Packages • Using the DTS Import / Export Wizard • Perform ad-hoc table and data transfers • Develop a prototype package • Using DTS Package Designer • Edit packages created with the DTS Import/Export Wizard • Create packages with a wide range of functionality • Programming DTS Applications • Directly access the functionality of the DTS Object Model • Requires Microsoft Visual Basic or Microsoft Visual C++
Using DTS to Populate the Sales Star • Populating the Sales Star Dimensions • Populating the Sales Star Fact Table
Populating the Sales Star Dimensions product_dim Product Tab Delimited Files DTS customer_dim Northwind OLTP DTS time_dim SQL Server Stored Procedure DTS
product_dim sales_stage customer_dim time_dim Populating the Sales Star Fact Table Sales Data File sales_stage DTS sales_fact DTS
Designing Modular Packages • Creating Modular Packages • Simplify complex workflows • Create more readable packages • Produce smaller packages that are easier to debug • Using Outer Packages • Execute multiple packages within a single package • Combine modular packages into logical workflows • Reuse modular packages in different workflows • Execute packages in parallel