330 likes | 637 Views
Populating a Data Warehouse. Overview. Process Overview Methods of Populating a Data Warehouse Tools for Populating a Data Warehouse Populating a Data Warehouse by Using DTS. Source OLTP Systems. Temporary Data Staging Area. Data Marts. SQL Server. Sales. Oracle. Data Warehouse.
E N D
Overview • Process Overview • Methods of Populating a Data Warehouse • Tools for Populating a Data Warehouse • Populating a Data Warehouse by Using DTS
Source OLTP Systems Temporary DataStaging Area Data Marts SQL Server Sales Oracle Data Warehouse Service Other Other Sales Data Hardware Data Validate, Gather , Transform Populate Data Distribute Make Data Consistent Data Warehouse Data Process Overview
Validating Data • Validate and Correct Data at the Source Before You Import It • Determine and Correct Processes That Invalidate Data • Save Invalid Data to a Log for Review
Making Data Consistent • Data Can Be Inconsistent in Several Ways: • Data in each source is consistent, but you want to represent it differently in the data warehouse • Data is represented differently in different sources • You Can Make Data Consistent by: • Translating codes or values to readable strings • Converting multiple versions of the same information into a single representation
Transform Change buyer_name reg_id total_sales buyer_name reg_id total_sales Barr, Adam II 17.60 Barr, Adam 2 17.60 Chai, Sean IV 52.80 Chai, Sean 4 52.80 O’Melia, Erin VI 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... Combine buyer_first buyer_last reg_id total_sales buyer_name reg_id total_sales Adam Barr 2 17.60 Barr, Adam 2 17.60 Sean Chai 4 52.80 Chai, Sean 4 52.80 Erin O’Melia 6 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... ... Calculate buyer_name price_id qty_id buyer_name price_id qty_id total_sales Barr, Adam .55 32 Barr, Adam .55 32 17.60 Chai, Sean 1.10 48 Chai, Sean 1.10 48 52.80 O’Melia, Erin .98 9 O’Melia, Erin .98 9 8.82 ... ... ... ... ... ... ... Transforming Data
Methods of Populating a Data Warehouse • Select the Method of Populating a Data WarehouseThat Suits Your Business Needs • Method 1: Validate, combine, and transform datain a temporary data staging area • Method 2: Validate, combine, and transform data during the loading process • Migrate Data During Periods of Relatively Low System Use
Tools for Populating a Data Warehouse • What Is the Appropriate Tool to Use • Transact-SQL Query • Distributed Query • bcp Utility and the BULK INSERT Statement • DTS
What Is the Appropriate Tool to Use • Format of Source and Destination Data • Location of Source and Destination Data • Import or Export of Database Objects • Frequency of Data Transfer • Interface Preference • Tool Performance
CustomerSummary Customer FullName Johnson, Steve Smith, Douglas FirstName LastName Wilson, Les Steve Johnson Salinger, Paul Douglas Smith Les Wilson Paul Salinger USE northwind_mart SELECT Lastname + ', ' + Firstname As Fullname INTO CustomerSummary FROM Northwind.dbo.Customer Transact-SQL Query
AccountingServer StockServer Sales Local SQL Server Suppliers Table Products Table Sales Item_Dim Table Distributed Query USE northwind_mart SELECT productname, companyname INTO item_dim FROM StockServer.sales.dbo.products p JOIN AccountingServer.sales.dbo.suppliers s ON p.supplierid = s.supplierid
bcp Utility and the BULK INSERT Statement bcp Utililty BCP accounting.dbo.orders in Orderstbl.dat –c –t, -r \n–Smysqlserver –Usa –Pmypassword BULK INSERT Statement BULK INSERT Accounting.dbo.orders FROM 'C:\ordersdir\orderstble.dat' WITH( DATAFILE TYPE = 'char' FIELDTERMINATOR = '|', ROWTERMINATOR = '|\n')
DTS • When to Use DTS • DTS Data Source and Destination Types • OLE DB • ODBC • ASCII text file • DTS Tools • DTS Import and Export wizards • DTS Designer • dtsrun utility • Custom • HTML • Spreadsheet
Populating a Data Warehouse by Using DTS • Building a DTS Package • Transforming Data by Using an ActiveX Script • Transforming Data by Using a Lookup Query • Defining Transactions • Tracking Data Lineage • Creating a DTS Package Programmatically
Building a DTS Package • Mapping Source and Destination Data • Defining Data Transformation Tasks • Creating and Saving a DTS Package • Executing a DTS Package • Scheduling and Securing a DTS Package
Mapping Source and Destination Data • Mapping Columns • Decide which columns to copy • Choose the columns in the target database that map to the source columns • Mapping Data Types • Specify transformation rules • Specify levels of data conversion
Defining Data Transformation Tasks • DTS Packages Contain Tasks • A Task Can: • Execute a Transact-SQL statement • Execute a script • Launch an external application • Transfer SQL Server 7.0 objects • Execute or retrieve results from a DTS package
Creating and Saving a DTS Package • Creating a DTS Package • By using DTS wizards • By using DTS Designer • By using a COM interface exposed by DTS • Saving a DTS Package • COM-structured storage file • Microsoft Repository • SQL Server msdb database
Executing a DTS Package • You Can Execute a DTS Package by Using SQL Server Enterprise Manager or dtsrun Command Prompt Utility • File Storage Location Determines the dtsrun Syntax dtsrun /sAccounts /uJose /nOrdersImport
Scheduling and Securing a DTS Package • Scheduling a DTS Package • Use DTS Import or DTS Export wizards when you save the DTS package to the msdb database • Use SQL Server Enterprise Manager when you usethe dtsrun command prompt utility • Implementing DTS Package Security • Login permissions • Owner and user passwords
Transforming Data by Using an ActiveX Script • Why Use an ActiveX Script • How to Use an ActiveX Script • Define a function to contain the transformation script • Specify the destination column • Specify the source columns • Use operators and VBScript or JScript functions and control-of-flow statements • Set the return code value for the function • How to Handle Errors with Return Codes
CustomerSummary Customer FullName Johnson, Steve Smith, Douglas FirstName LastName Wilson, Les Steve Johnson Salinger, Paul Douglas Smith Les Wilson Paul Salinger Function Main() DTSDestination(“FullName”) = DTSSource(“Lastname”) + “, ” + DTSSource(“Firstname”) Main = DTSTransformStat_OK End Function Examples of ActiveX Scripts
Source Data State_lookup Abbreviation State FL Florida Lookup Table WY Wyoming AR Arkansas Transform Customer_source Customer_dim Destination Data Name State Name State D. Smith FL D. Smith Florida L. Wilson WY L. Wilson Wyoming P. Salinger AR P. Salinger Arkansas Transforming Data by Using a Lookup Query
Implementing a Lookup Query • Set Up Connections to Source, Destination, and Lookup Tables • Create a Task, and Specify the Source and Destination • Add a Lookup Query Definition • Map the Source and Destination Columns, andCall the Lookup Query from the ActiveX Script
Defining Transactions • You Specifically Must Add a Step or Task to the Transaction • You Can Specify When a Transaction Commits • DTS Only Supports One Transaction Per Package • MS DTC Must Be Running • The Data Provider for the Data Destination Must Support Transactions
1 2 3 Tracking Data Lineage • Using Data Lineage • Tracks history of data at package and table row levels • Provides audit trail of data transformation and DTS package execution • Implementing Data Lineage • Create the table columns in the data warehouse • Add data lineage variables to the DTS package • Map data lineage source and destination columns • Viewing Data Lineage
Demonstration: Defining Transactions and Tracking Data Lineage
DTS Package Steps Steps Steps Steps Steps Precedence Constraints Create Process Send Mail Bulk Insert Transfer Objects Steps Steps Steps Steps Steps Steps Connections Global Variables Tasks Execute SQL Data-driven Query Custom ActiveX Data Pump Source Columns Destination Creating a DTS Package Programmatically
Correct and Validate Data at the Source Use a Temporary Data Storage Area Use an ActiveX Script or a Transact-SQL Script to Transferand Transform Data Save and Store DTS Packages in the Microsoft Repository to Maintain Data Lineage Recommended Practices
Review • Process Overview • Methods of Populating a Data Warehouse • Tools for Populating a Data Warehouse • Populating a Data Warehouse by Using DTS