290 likes | 401 Views
Copying, Managing, and Transforming Data With DTS. Defining Bulk Insert Task Functionality. Quickly Loads Data from a File into SQL Server Encapsulates the Transact-SQL Bulk Insert Statement Supports Table or View Destinations in SQL Server Loads Data with No Applied Transformations
E N D
Defining Bulk Insert Task Functionality • Quickly Loads Data from a File into SQL Server • Encapsulates the Transact-SQL Bulk Insert Statement • Supports Table or View Destinations in SQL Server • Loads Data with No Applied Transformations • Supports Format Files to Specify File Layout • Requires Sysadmin or Bulkadmin Fixed Server Roles Membership The Bulk Insert Task is One of Three Ways to Run SQL Server Bulk Copy Operations
Sidebar: SQL Server Bulk Copy Operations • What Do Bulk Copy Operations Offer? • Allow Fast Loading of Data into SQL Server • Configure Data Load Batches • Allow You to Control Logging Operations • Bcp Utility • Bulk Insert Task or T-SQL Bulk Insert Statement • Bulk Copy APIs for OLE DB, ODBC, DB-Library Applications • Ways to Access Bulk Copy Operations
Defining the Sales_stage Table Load • Using the Bulk Insert Task to Load Tab-delimited File Data into Sales_stage • Loading Sales_stage with Data Bound for Sales_fact DTS Tab Delimited File Polaris
Defining Execute SQL Task Functionality • Executing SQL Statements • Source database must understand SQL syntax • SQL statement determines task performance • Task supports single or multiple SQL statements • You can create queries in the DTS Query Designer • Running Parameterized Queries • Input parameters • Output parameters
Using Parameterized Queries • Understanding Global Variable Basics • User-defined storage locations • Information is shared across package steps • Using Parameters with Global Variables • Assign global variable values to query input parameters • Store query results to a global variable with output parameters
Global Variables Parameter ProductName CategoryName Parameter 1 Parameter 2 Parameter 1 The Parameter’s Position in the Query Determines Its Name Global Variables Provide Data to Input Parameters Creating Dynamic Queries ? Question Marks Represent Query Parameters SELECT * FROM product_dim WHERE product_name = ? AND category_name = ?
Storing Query Results Storing Row Values SELECT begin_date, end_date FROM financial_period WHERE quarter = 1 Output Parameter Global Variables begin_date end_date BeginDate EndDate Storing Entire Rowsets SELECT * FROM product Output Parameter Global Variable Entire Rowset Product Store Query Results in Global Variables
DTS Defining the Time_dim Data Load Time_dim_build Stored Procedure • Input Parameters • @p_start_date • @p_end_date
Defining the DTS Data Pump • DTS Mechanism for Moving and Transforming Data • Allows for High-speed Batch Copying of Data • Contains Supplied Data Transformations • Can Also Define ActiveX Script Transformations • Provides An Extendable COM-based Architecture That Allows for Custom Transformations (C++) • Permits the Application of Transformation Logic to Specific Phases of a Data Pump Operation • Multi Phase Data Pump
OLE DB ODBC Understanding How the Data Pump Processes Data X Forms Source Destination ActiveX Script Copy Trim String … Custom OLE DB ODBC • Connects to the source and destination • Reads OLE DB metadata about source and destination columns • Gathers data transformation definitions DTS Data Pump In Out Implements the transformation Writes completed record to the destination
Defining the Tasks That Transform Data • The Transform Data Task • Inserts • The Transform Data Task • Inserts • The Data Driven Query Task • Inserts • Updates • Deletes • The ParallelDataPumpTask • Processes hierarchical rowsets
Defining the Transform Data Task • Data Movement and Transformation Functionality • Copying data between heterogeneous data sources • Applying optional column level transformations • Extended Data Transfer Functionality • Supporting batch processing of data • Providing error-handling capabilities • Containing optimization settings for SQL Server destinations
Selecting Transformation Types Transformation Description ActiveX Script Invokes user-defined ActiveX scripts. Copy Column Copies data from source to destination. DateTime String Converts a date to a new destination format. Lowercase String Converts a string to lowercase characters. Uppercase String Converts a string to uppercase characters. Middle of String Extracts a sub string of source data. Trim String Removes white space from a source string. Read File Copies contents of a file to a destination column. File path is specified by a source column. Write File Copies contents of a source column to a file. File path is specified by a second source column.
Defining Column Mappings • One-to-One Mappings • Symmetric Many-to-Many Mappings • Asymmetric Mappings
Creating Efficient Column Mappings • Minimizing the Number of Column Mappings • Using Many-to-Many Mappings When Possible • Grouping Common Transformations Together
Loading Customer_dim Northwind OLTP SQL Server Database
Performance Settings • Enabling Fast Load • Using high-speed bulk copy processing • Accepting batches of transformed data • Only applies to SQL Server destinations • Using a Table Lock • Configuring Batch Size
Configuring Batch Size • Assembling Records into Groups • DTS commits records to database as a group • Insert batch size sets the number of records in the group • Understanding Default Behavior • Insert batch size is 0 • DTS assigns one batch for all records • Setting the Insert Batch Size • Value between 0-9999 • Setting value can improve performance
Defining SQL Solutions • You Can Use the Source Query of the Transform Data Task to Implement Data Transformations • The Source SQL Statement Must Be Understood by the Source Database • The Performance of the Source Query Depends on the SQL Statement • You Can Use Parameters in the Source Query to Create Dynamic Source SQL Statements • If You Use the Source Query to Manipulate Data, You Can Use the Copy Column Transformation to Load Data into the Destination
Applying SQL Solutions to Load Fact Tables • Using the Source Query to Join Staging Table Data to Dimension Tables • Retrieving Primary Key Values to Store as Foreign Keys on the Fact Table • Using a Copy Column Transformation in the Transform Data Task • Configuring Fast Load for SQL Server Destinations
time_dim 134 1/1/2000 Loading the Fact Table Source Data DimensionTables customer id product id order date quantity_sales amount_sales • Identifying Dimension Application Key Values in the Fact Table Source Data • Retrieving Primary Keys from Each Dimension Table to Assign Foreign Keys ALFI 123 123 1/1/2000 1/1/2000 400 400 10,789 10,789 customer_dim ALFI 201 ALFI Alfreds Sales Fact Data cust_key prod_key time_key quantity_sales amount_sales product_dim 25 123 Chai 201 123 25 1/1/2000 134 400 400 10,789
DTS Loading Sales_fact • Extracting Data from the Sales_stage Table • Assigning Foreign Keys by Retrieving Primary Keys from the Product_dim, Customer_dim, and Time_dim Dimensions
Best Practices - Performing Inserts • Bulk Insert Task • Accessing data in files • Loading data into SQL Server destinations • Copying data with no transformations • Transform Data Task • Accessing any source • Loading to any destination • Creating data transformations • Using input parameters in the source query • Applying custom logic to phases of the data pump
Best Practices - Performance Settings • Tuning the Transform Data Task • Fast load for SQL Server destinations • Batch size • Table lock • Tuning the Bulk Insert Task • Sort order for clustered indexes • Batch size • Table lock
Best Practices - Executing Flexible Queries • The Data Driven Query Task • Execute flexible queries on a row-by-row basis • Meet flexibility needs that outweigh performance needs • Perform non-insert queries • The Execute SQL Task • Execute SQL statements and extended SQL statements • Perform parameterized queries • Assign query outputs to global variables
Best Practices - Using Custom Tasks • Creating Reusable Functions and Utilities • Adding Functionality to DTS Package Designer • Implementing a Faster Alternative to ActiveX Script Tasks
Best Practices - Creating Efficient Column Mappings • Minimizing the Number of Column Mappings • Using Many-to-Many Mappings When Possible • Grouping Common Transformations Together
Best Practices - The Right Transformation Type • Using Supplied Transformations When Possible • Minimizing ActiveX Script Transformations When Performance Outweighs Flexibility • Using SQL Solutions with Copy Column Transformations • Developing Custom Transformations as a Faster Alternative to ActiveX Script Transformations