1 / 25

Populating Data Warehouse Structures

Populating Data Warehouse Structures. Examining the Star Schema. Sales Star Schema. Fact Table. Dimension Tables. Dimension Table. Implementing the Star Schema. 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data

adamdaniel
Download Presentation

Populating Data Warehouse Structures

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Populating Data Warehouse Structures

  2. Examining the Star Schema Sales Star Schema Fact Table Dimension Tables Dimension Table

  3. Implementing the Star Schema 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data 3. Load Data Into Dimension Tables and Fact Tables

  4. External Files Internal Files External Files The Star Schema Data Load Heterogeneous Data Sources Polaris Data Warehouse Northwind OLTP Staging Area Sales Star DTS DTS Financial Inventory Star Extracting Data From Transforming Loading the Heterogeneous Sources Data Star Schema DTS DTS

  5. Verifying the Dimension Source Data Verifying Accuracy of Source Data • Integrating data from multiple sources • Applying business rules • Checking structural requirements Correcting Invalid Data • Transforming data • Reassigning data values Managing Invalid Data • Rejecting invalid data • Saving invalid data to a log

  6. buyer_name reg_id Barr, Adam 2 Chai, Sean 4 Smith, Jane 2 Paper, Anne 4 buyer_code buyer_code buyer_first buyer_last buyer_last buyer_last reg_id reg_id reg_id buyer_name reg_id U999 Adam Barr Barr Barr 2 2 2 Barr, Adam A123 2 A123 Sean Chai Chai Chai 4 4 4 Chai, Sean B456 4 Erin B456 O’Melia O’Melia O’Melia 6 6 6 O’Melia, Erin ... 6 ... ... ... ... ... ... ... ... ... ... buyer_name reg_id Barr, Adam II Chai, Sean IV buyer_name reg_id Smith, Jane 2 Paper, Anne 4 Dimension Data Load Examples: DTS DTS DTS

  7. Maintaining Integrity of the Dimension • Assigning a Surrogate Key to Each Record • Defines the dimension’s primary key • Relates to the foreign key fields of the fact table • Loading One Record Per Application Key • Maintains uniqueness in the dimension • Depends on how you manage changing dimension data • Maintains integrity of the fact table

  8. Managing Changing Dimension Data • Dimensions with Changing Column Values • Inserts of new data • Updates of existing data • Slowly-Changing Dimension Design Solutions • Type 1: Overwrite the dimension record • Type 2: Write another dimension record • Type 3: Add attributes to the dimension record

  9. Product Dimension After product key product name product size product package product dept product cat product subcat ... 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks ... Before 12 oz. 001 Rice Puffs 12 Oz Bag Grocery Dry Goods Snacks ... Existing record is changed Type 1: Overwriting the Dimension Slide

  10. Product Dimension Before product key product name product size product package product dept product cat product subcat effective_date … 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks 05-01-1995 ... 001 Rice Puffs 10 Oz Bag Grocery Dry Goods Snacks 05-01-1995 ... Type 2: Writing Another Dimension Record After Rice Puffs 12 Oz Bag Grocery Dry Goods Snacks 10-15-1998 ... 731 12 oz. 10 oz. Adds a new record

  11. Before After Product Dimension 001 Rice Puffs 10 Oz Bag Grocery Dry Goods Snacks 05-01-1995 11 Oz 03-20-1994 (null) (null) ... 001 Rice Puffs 12 oz. Bag Grocery Dry Goods Snacks 10-15-1998 10 oz. 05-01-1995 11 Oz 03-20-1994 ... product key product name product size product package product dept product cat product subcat current product size date previous product size previous product size date 2nd previous product size 2nd previous product size date ... 12 oz 10 oz. product size 10-15-1998 previous product size previous product size date 11 oz. 03-20-1994 05-01-1995 11 oz. 03-20-1994 Additional information is storedin an existing record Type 3: Adding Attributes in the Dimension Record

  12. Verifying the Fact Table Source Data Verifying Accuracy of Source Data • Integrating data from multiple sources • Applying business rules • Checking structural requirements Correcting Invalid Data • Transforming data • Reassigning data values Managing Invalid Data • Rejecting invalid data • Saving invalid data to a log

  13. time_dim 134 1/1/2000 Assigning Foreign Keys DimensionTables Source Data customer_dim customer id product id order date quantity_sales amount_sales 201 ALFI Alfreds ALFI 123 123 1/1/2000 1/1/2000 400 400 10,789 10,789 product_dim 25 123 Chai Sales Fact Data cust_key prod_key time_key quantity_sales amount_sales 201 123 25 1/1/2000 134 400 400 10,789

  14. product_id 9GZ 1KJ 0ZA customer_key product_key qty ... 100 512 32 238 207 48 437 338 9 customer_id price qty total_sales ... ... ... VINET .55 32 17.60 ALFI 1.10 48 52.80 HANAR .98 9 8.82 ... ... ... ... Defining Measures • Loading Measures from the Source System • Calculating Additional Measures Source System Data Fact Table Data

  15. Maintaining Data Integrity • Adhering to the Fact Table Grain • A fact table can only have one grain • You must load a fact table with data at the same level of detail as defined by the grain • Enforcing Column Constraints • NOT NULL constraints • FOREIGN KEY constraints

  16. Implementing Staging Tables • Centralize and Integrate Source Data • Break Up Complex Data Transformations • Facilitate Error Recovery market_stage Staging Area sales_stage shipments_stage inventory_stage

  17. DTS Functionality • Accessing Heterogeneous Data Sources • Importing, Exporting, and Transforming Data • Creating Reusable Transformations and Functions • Automating Data Loads • Managing Metadata • Customizing and Extending Functionality

  18. Defining DTS Packages • Identifies Data Sources and Destinations • Defines Tasks or Actions • Implements Transformation Logic • Defines Order of Operations

  19. Identifying Package Components • ConnectionsAccess Data Sources and Destinations • TasksDescribe Data Transformations or Functions • StepsDefine the Order of Task Operations or Workflow • Global Variables Store Data that Can Be Shared Across Tasks

  20. Creating Packages • Using the DTS Import / Export Wizard • Perform ad-hoc table and data transfers • Develop a prototype package • Using DTS Package Designer • Edit packages created with the DTS Import/Export Wizard • Create packages with a wide range of functionality • Programming DTS Applications • Directly access the functionality of the DTS Object Model • Requires Microsoft Visual Basic or Microsoft Visual C++

  21. Using DTS to Populate the Sales Star • Populating the Sales Star Dimensions • Populating the Sales Star Fact Table

  22. Populating the Sales Star Dimensions product_dim Product Tab Delimited Files DTS customer_dim Northwind OLTP DTS time_dim SQL Server Stored Procedure DTS

  23. product_dim sales_stage customer_dim time_dim Populating the Sales Star Fact Table Sales Data File sales_stage DTS sales_fact DTS

  24. Designing Modular Packages • Creating Modular Packages • Simplify complex workflows • Create more readable packages • Produce smaller packages that are easier to debug • Using Outer Packages • Execute multiple packages within a single package • Combine modular packages into logical workflows • Reuse modular packages in different workflows • Execute packages in parallel

  25. Using DTS to Populate the Sales Star

More Related